Use a less-volatile character to delimit replaced strings in Remarkup
Summary:
See T581. This is less of an issue than I thought.
A couple years ago, Reddit had a security hole in its markdown implementation
involving some stuff like this:
http://blog.reddit.com/2009/09/we-had-some-bugs-and-it-hurt-us.html
I was concerned there might be something more troubling here, but it's actually
specific to the rules we run.
When we replace a string, we use a delimiter like '~3%', but in the proxy image
rule any delimiters already present will be urlencoded. Use a less volatile
character to survive URL encoding. The character is arbitrary, it's just a
terminator so "~1" doesn't replace "~10", "~11", etc.
We run the replacements in reverse order so this isn't normally possible anyway,
but you could have a sequence like this:
input: ~1 blah blah <10+ other string groups> escapes into: ~11 blah blah <10+ other string groups>
...and that would be replaced with group 11, not group 1. The 'Z' makes sure
that turns out OK.
Test Plan:
php>
PhabricatorMarkupEngine::newDifferentialMarkupEngine()->markupText('http://www.example.com/~xyz.jpg');
"<p><a href=\"/file/proxy/?uri=http%3A%2F%2Fwww.example.com%2F~xyz.jpg\"
target=\"_blank\"><img
src=\"/file/proxy/?uri=http%3A%2F%2Fwww.example.com%2F~xyz.jpg\"
class=\"remarkup-proxy-image\" /></a></p>"
Reviewers: cpiro
Reviewed By: cpiro
CC: aran, cpiro, epriestley
Differential Revision: 1031