
Improve prose diffs for changes spanning very large blocks of intermediate text

Authored by epriestley <git@epriestley.com> on Nov 16 2016, 19:00.

Description


Summary:
Ref T7643. The failure case described in T7643#200778 is a change, followed by more than 128 sentences, followed by another change.

Because the coarsest level is "split on sentences", this case hits the maximum length guards and the diff just gives up, marking the whole text as changed.

Add a new level 0 for splitting on paragraphs. This allows us to accommodate a greater range of reasonable input texts.

This will still fail for a change, followed by more than 128 paragraphs, followed by another change. But hopefully that's outside the realm of cases which we reasonably need to handle.
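
To make the effect of the extra level concrete, here is a minimal Python sketch of a multi-level prose diff with a length guard. It is not the libphutil implementation: the names `MAX_PIECES`, `SPLITTERS`, `split_level`, and `prose_diff` are hypothetical, the splitting regexes are simplified, and applying the guard to the piece count at each level is an assumption based on the "128" figure above.

```python
import difflib
import re

MAX_PIECES = 128  # assumed guard size, taken from the "128" figure in the summary

# Hypothetical splitters, coarsest first. Each keeps its delimiters, so
# "".join(pieces) reproduces the input exactly.
SPLITTERS = [
    r"[^\n]*\n+|[^\n]+",     # level 0: paragraphs (text between newlines)
    r".+?(?:[.!?]+\s+|\Z)",  # level 1: sentences
    r"\s+|\S+",              # level 2: words and whitespace runs
]


def split_level(text, level):
    """Split text into pieces at the given granularity."""
    return re.findall(SPLITTERS[level], text, flags=re.S)


def prose_diff(old, new, level=0):
    """Return a list of (op, text) pairs, with op one of '=', '-', '+'."""
    a, b = split_level(old, level), split_level(new, level)
    if len(a) > MAX_PIECES or len(b) > MAX_PIECES:
        # Guard hit: give up and mark the whole block as changed.
        return [("-", old), ("+", new)]

    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_part, new_part = "".join(a[i1:i2]), "".join(b[j1:j2])
        if tag == "equal":
            out.append(("=", old_part))
        elif tag == "replace" and level < len(SPLITTERS) - 1:
            # Refine a changed block at the next, finer level.
            out.extend(prose_diff(old_part, new_part, level + 1))
        else:
            if old_part:
                out.append(("-", old_part))
            if new_part:
                out.append(("+", new_part))
    return out


# A change, then 130 unchanged sentences in 13 paragraphs, then another change.
middle = "".join(
    " ".join(f"Sentence {i}." for i in range(p * 10, p * 10 + 10)) + "\n"
    for p in range(13)
)
old = "Intro one.\n" + middle + "Outro one.\n"
new = "Intro two.\n" + middle + "Outro two.\n"

coarse = prose_diff(old, new, level=1)  # sentences as the coarsest level
fine = prose_diff(old, new, level=0)    # paragraphs as the coarsest level
```

In this sketch, `coarse` collapses to "everything removed, everything added" because the text has 132 sentences, while `fine` sees only 15 paragraphs, keeps the middle as one unchanged block, and narrows each change down to a single word.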

(Because a "paragraph" here is "text between newlines", some types of text may have a lot of "paragraphs" and we may need to continue tweaking this: for example, remarkup tables or inline code blocks.)
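
As a rough illustration of that caveat, the Python snippet below counts newline-delimited "paragraphs" in a table-like text. The helper `count_paragraphs`, the constant `MAX_PIECES`, and the sample table are hypothetical; the point is only that every row counts as its own paragraph under a "text between newlines" rule.

```python
MAX_PIECES = 128  # assumed guard size, taken from the "128" figure in the summary


def count_paragraphs(text):
    """Count non-empty pieces when a paragraph is just text between newlines."""
    return len([line for line in text.split("\n") if line])


# A table with one header row and 200 data rows, one row per line.
rows = ["| Key | Value |"] + [f"| item{i} | {i} |" for i in range(200)]
table = "\n".join(rows) + "\n"

table_paragraphs = count_paragraphs(table)

# Ordinary prose of similar size stays well under the guard.
prose = ("A sentence in a long paragraph. " * 40 + "\n\n") * 5
prose_paragraphs = count_paragraphs(prose)
```

Here the table alone produces more pieces than the assumed guard allows, while the prose sample produces only a handful.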

Also, reduce the amount of work we do after hitting an internal limit.

Test Plan: Added failing unit test; made it pass.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T7643

Differential Revision: https://secure.phabricator.com/D16881

Details

Committed: epriestley <git@epriestley.com>, Nov 16 2016, 19:09
Pushed: aubort, Mar 17 2017, 12:03
Parents: rPHU22305aa53e49: Translate windows timezones into Timezone DB timezones using the CLDR map
Branches: Unknown
Tags: Unknown

Event Timeline

epriestley <git@epriestley.com> committed rPHU086df1ba443c: Improve prose diffs for changes spanning very large blocks of intermediate text (authored by epriestley <git@epriestley.com>). Nov 16 2016, 19:09