Improve prose diff smoothing rules for whitespace and prefix/suffix changes
Summary:
Ref T7643. In D11297, I rewrote the test plan, but the algorithm chose to share spaces and produced a silly diff which a human would not produce:
{F1679097}
This diff is technically correct, but not particularly readable.
To improve this, first allow noisy changes to be smoothed at the beginning and end of runs, not just in the middle. Part of the problem was that apple and banana (both with spaces after them) were being diffed as "xxxxxs" or similar, since the spaces were removed early in the process and not smoothed. Pad the string before smoothing, and allow strings like "...xxxxs" to be smoothed into "...xxxxx".
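The padding idea can be sketched roughly as follows. This is a minimal illustration in Python, not the code in this change: the function name `smooth_edit_string`, the `max_gap` threshold, and the choice of "x" as the padding character are all assumptions made for the example. It treats the edit string as a sequence of "s" (same) and "x" (changed) characters and pads both ends so that a short "s" run at an edge is bounded by changes, just like one in the middle.

```python
import re


def smooth_edit_string(ops: str, max_gap: int = 1) -> str:
    """Smooth short 's' runs bounded by 'x' into 'x'.

    `ops` is a string of 's' (same) and 'x' (changed) characters.
    `max_gap` is a hypothetical threshold: 's' runs no longer than this,
    with a change on both sides, are treated as noise.
    """
    if "x" not in ops:
        # Nothing changed, so there is nothing to smooth.
        return ops

    # Pad both ends with a change marker (an assumption of this sketch) so
    # that a short 's' run at the start or end is bounded on both sides.
    padded = "x" + ops + "x"

    # Lookarounds keep the bounding 'x' characters out of the match, so
    # adjacent noisy runs such as "xsxsx" are all smoothed in one pass.
    pattern = re.compile(r"(?<=x)s{1,%d}(?=x)" % max_gap)
    smoothed = pattern.sub(lambda m: "x" * len(m.group(0)), padded)

    # Remove the padding again.
    return smoothed[1:-1]
```

With this sketch, `smooth_edit_string("xxxxxs")` gives `"xxxxxx"`, while a longer unchanged run such as the ones in `"ssssxxssss"` is left alone.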
Second, when merging runs of "-" and "+", humans would apply different rules depending on the content of the added and removed text. For example, if "elephants" is changed to "cats", it's easier for humans to read this:
- elephants + cats
...than this:
- elephan + ca = ts
This is basically the smoothing rule we already apply. However, if the suffix isn't letters like "ts" but something like ". " (period, space), humans would prefer this:
- in the past + once upon a time = .<space>
So when we merge runs of changes, find common "layout" prefixes and suffixes and merge them as "=" blocks. This eliminates cases where the smoothing rule smooths things out more than a human editor would.
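A rough sketch of the merge rule, written in Python for illustration only. The names `merge_run` and `LAYOUT` are hypothetical, and treating "layout" as ASCII punctuation plus whitespace is an assumption of this example, not necessarily the definition used in this change.

```python
import string

# Assumption for this sketch: "layout" means ASCII punctuation or whitespace.
LAYOUT = set(string.punctuation) | set(string.whitespace)


def _common_layout_len(a: str, b: str) -> int:
    """Length of the shared prefix of a and b made only of layout characters."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb or ca not in LAYOUT:
            break
        n += 1
    return n


def merge_run(removed: str, added: str) -> list:
    """Merge a "-"/"+" pair, splitting shared layout out as "=" blocks."""
    # Peel off the shared layout prefix first so the suffix cannot overlap it.
    p = _common_layout_len(removed, added)
    prefix = removed[:p]
    removed, added = removed[p:], added[p:]

    # The shared layout suffix is the shared layout prefix of the reversals.
    s = _common_layout_len(removed[::-1], added[::-1])
    suffix = removed[len(removed) - s:]
    if s:
        removed, added = removed[:-s], added[:-s]

    blocks = []
    if prefix:
        blocks.append(("=", prefix))
    if removed:
        blocks.append(("-", removed))
    if added:
        blocks.append(("+", added))
    if suffix:
        blocks.append(("=", suffix))
    return blocks
```

Here `merge_run("in the past. ", "once upon a time. ")` keeps the trailing ". " as an "=" block, while `merge_run("elephants", "cats")` stays a plain "-"/"+" pair because the shared "ts" is letters, not layout.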
Test Plan:
Ran unit tests. Generated a similar local change. Before, got this "correct" mishmash which a human would not produce:
{F1679108}
After, got this better version, which is what a human editor would produce:
{F1679109}
Reviewers: chad
Reviewed By: chad
Subscribers: asherkin
Maniphest Tasks: T7643
Differential Revision: https://secure.phabricator.com/D16071