Use "\1", not "~", to delimit remarkup tokens
Summary:
- In general, markup engines need to handle rule precedence and prevent the output of rules from being incorrectly modified by further rules.
- For instance, without proper precedence rules, http://x/ might be marked up into <a href="http://x/">http://x/</a>, and then that output might be marked up again into <a href="http:<em>x/">http:</em>x/</a>. From there, it's a short jump to XSS (this particular case can't happen because we're more subtle with regexps, but it's a good example of the general problem).
- Remarkup handles rule precedence by doing token replacement. Once we've matched a block of text to a rule, we remove it from the corpus and replace it with a token that doesn't match any rules. Then we run other rules safely, and eventually go back and replace all the tokens with the stored text. See PhutilRemarkupBlockStorage for a description of this.
- We currently use "~1Z", "~2Z", etc., as tokens. These don't match other rules and survive HTML encoding, so they are appropriate selections.
- But, we want to introduce "~~xxx~~" for strikethrough, which conflicts with these tokens.
- Use "\11Z", "\12Z", etc., as tokens instead (that is, "\1" followed by "1Z", "2Z", and so on). These have the same properties as the "~" tokens but free up "~" for use.
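
As a rough illustration of the token replacement described above, here is a minimal Python sketch. It is not the PhutilRemarkupBlockStorage implementation: the BlockStorage class, the two regexps, and both render functions are hypothetical names made up for this example, and the link and "//" emphasis rules are simplified stand-ins for real rules. It only shows why swapping matched text for a "\1"-delimited token keeps a later rule from touching earlier output.

```python
import html
import re

# Simplified, assumed rules for illustration only.
URL_RE = re.compile(r"https?://[^\s<]+")
ITALIC_RE = re.compile(r"//(.+?)//")
# Tokens look like "\x01" + index + "Z", mirroring the "\11Z" style above.
TOKEN_RE = re.compile("\x01(\\d+)Z")


class BlockStorage:
    """Hypothetical storage: holds finished output and hands back tokens."""

    def __init__(self):
        self._blocks = {}

    def store(self, rendered):
        # Keep the rendered text aside and return a token that no rule matches.
        key = len(self._blocks) + 1
        self._blocks[key] = rendered
        return "\x01%dZ" % key

    def restore(self, corpus):
        # Single pass: swap each token back for its stored text.
        # This sketch does not handle input that already contains "\x01".
        return TOKEN_RE.sub(lambda m: self._blocks[int(m.group(1))], corpus)


def link(url):
    safe = html.escape(url, quote=True)
    return '<a href="%s">%s</a>' % (safe, safe)


def render_naive(text):
    # No precedence handling: the emphasis rule sees the link rule's output.
    text = URL_RE.sub(lambda m: link(m.group(0)), text)
    return ITALIC_RE.sub(r"<em>\1</em>", text)


def render_with_tokens(text):
    storage = BlockStorage()
    # 1. Link rule: replace each match with a token, store the real markup.
    text = URL_RE.sub(lambda m: storage.store(link(m.group(0))), text)
    # 2. HTML-encode the rest of the corpus; the tokens pass through unchanged.
    text = html.escape(text)
    # 3. Later rules run safely because the link markup is no longer present.
    text = ITALIC_RE.sub(r"<em>\1</em>", text)
    # 4. Put the stored markup back.
    return storage.restore(text)
```

With the input "see http://x/ and //this//", render_naive() mangles the anchor into <a href="http:<em>x/">http:</em>x/</a>, while render_with_tokens() leaves the anchor intact and still applies emphasis to "this".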
Test Plan: Ran unit tests, which have reasonably extensive coverage of this case.
Reviewers: 20after4, jungejason, btrahan
Reviewed By: btrahan
CC: aran, epriestley
Differential Revision: https://secure.phabricator.com/D1942