Homec4science

Prevent backtracking on long JSON strings with escape codes

Authored by Alex Vandiver <alexmv@dropbox.com> on Feb 15 2017, 23:11.

Description

Prevent backtracking on long JSON strings with escape codes

Summary:
The JSONLintLexer class uses a series of regular expression rules to verify that
lexed JSON tokens are formatted correctly. The string check attempts to express
that strings are well-formed by matching:

  • Start of token
  • A double-quote
  • A repetition of zero or more of:
    • A two-character escape sequence (i.e. \n)
    • A six-character unicode codepoint (i.e. \u1234)
    • Any number of printable characters that are not \ or ".
  • A closing quote

Key to this regex is that the three alternatives that make up the core of the
string are non-overlapping sets. The PCRE engine which backs PHP, however, does
not note this, and assumes that it must thus mark each run though this
alternation as a possible point to backtrack to. To do so, internally it uses a
stack frame. Long strings can thus cause the process to run out of stack space;
this can be replicated via:

preg_match('{^(?:x|y)*}', str_repeat("x", 10000));

PCRE does not consider this type of stack exhaustion a bug -- see
https://bugs.exim.org/show_bug.cgi?id=979

Specifically, long JSON strings with many escape codes (each of which
necessitates a stack frame) can trigger this stack exhaustion in the PCRE
engine, and thus segfault the PHP process. This failure mode does not trigger
on PHP 7.0 and up, which default to using PCRE's JIT mode.

Force the PCRE engine to realize that the alternation cases when parsing the
JSON string are non-overlapping, and that backtracking across them is never an
option, by using the non-backtracking (?>x|y) capture form, along with the
non-backtracking repetition symbols ++ and *+. These allow PCRE to use a
tail-recursive form, thus preventing stack exhaustion.

Test Plan:
In a repository configured to lint JSON files:

perl -le 'print q|{"k":"|.(q|a\\n|x10000).q|"}|' > test.json
arc lint test.json

...no longer segfaults.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: Korvin, epriestley

Differential Revision: https://secure.phabricator.com/D17363

Details

Committed
Alex Vandiver <alexmv@dropbox.com>Feb 16 2017, 23:42
Pushed
aubortMar 17 2017, 12:03
Parents
rPHUe78b0df10881: Remove unused class "PhutilConsoleConcatenatedView"
Branches
Unknown
Tags
Unknown

Event Timeline

Alex Vandiver <alexmv@dropbox.com> committed rPHU863226f4e56c: Prevent backtracking on long JSON strings with escape codes (authored by Alex Vandiver <alexmv@dropbox.com>).Feb 16 2017, 23:42