
Make prose diff algorithm more iterative, to improve prose diffs for (among…

Authored by epriestley <git@epriestley.com> on Nov 10 2016, 21:21.

Description

Make prose diff algorithm more iterative, to improve prose diffs for (among other things) removed commas

Summary:
Ref T7643. This is a little hard to explain, but previously we did this:

  • Diff paragraphs.
  • For each different paragraph, diff sentences.
  • For each different sentence, diff characters.

Now, we do this:

  • Diff paragraphs.
  • Collect all the identical, purely added, and purely removed paragraphs and set them aside. We know we should have good diffs for these already.
  • What's left over are sequences of removed/added/changed paragraphs, which we may not have great diffs for yet. Smush each sequence together into one big diff block.
  • Now, for these blocks, diff sentences.
  • Repeat all of that on the sentence-level results to diff characters.
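
The steps above can be sketched roughly as follows. This is only an illustration in Python using the standard `difflib` module, not the code from this commit: the function names (`split_paragraphs`, `split_sentences`, `split_characters`, `prose_diff`) and the splitting rules are assumptions made for the example. `SequenceMatcher` already reports adjacent removed and added pieces as a single `replace` block, which stands in for the "smush together" step here.

```python
import difflib
import re


def split_paragraphs(text):
    # Hypothetical splitter: keep newline runs as their own pieces so that
    # joining the pieces reproduces the input exactly.
    return [p for p in re.split(r"(\n+)", text) if p]


def split_sentences(text):
    # Hypothetical splitter: break after sentence-ending punctuation,
    # keeping the punctuation and trailing whitespace as separate pieces.
    return [p for p in re.split(r"([.!?]+\s*)", text) if p]


def split_characters(text):
    return list(text)


LEVELS = [split_paragraphs, split_sentences, split_characters]


def prose_diff(old, new, levels=LEVELS):
    """Return a list of (op, text) pairs, where op is '=', '-' or '+'."""
    if not levels:
        # Nothing finer to try: report the whole block as removed + added.
        return [(op, t) for op, t in (("-", old), ("+", new)) if t]

    split, finer = levels[0], levels[1:]
    a, b = split(old), split(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)

    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_block = "".join(a[i1:i2])
        new_block = "".join(b[j1:j2])
        if tag == "equal":
            result.append(("=", old_block))
        elif tag == "delete":
            result.append(("-", old_block))
        elif tag == "insert":
            result.append(("+", new_block))
        else:
            # A run of changed pieces: treat it as one big block and
            # diff it again at the next, finer level.
            result.extend(prose_diff(old_block, new_block, finer))
    return result


if __name__ == "__main__":
    for op, text in prose_diff("Hello, world. Bye.", "Hello world. Bye."):
        print(op, repr(text))
```

In this sketch, diffing "Hello, world. Bye." against "Hello world. Bye." marks only the comma as removed; everything else is reported as unchanged.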

This seems to pass all the existing unit tests, and it also passes new unit tests that I was previously unable to make pass by fiddling with things without changing the algorithm.

Test Plan: Passed existing unit tests. Passed new unit tests.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T7643

Differential Revision: https://secure.phabricator.com/D16839

Details

Committed
epriestley <git@epriestley.com> Nov 10 2016, 21:40
Pushed
aubort Mar 17 2017, 12:03
Parents
rPHUc6634479d0f1: Add a `phutil_person()` wrapper for the string extractor
Branches
Unknown
Tags
Unknown

Event Timeline

epriestley <git@epriestley.com> committed rPHUf36c31c991ca: Make prose diff algorithm more iterative, to improve prose diffs for (among… (authored by epriestley <git@epriestley.com>). Nov 10 2016, 21:40