Implement basic ngram search for Owners Package names

Authored by epriestley <git@epriestley.com> on Dec 21 2015, 21:22.

Description

Summary:
Ref T9979. This uses ngrams (specifically, trigrams) to build a reasonably efficient index for substring matching. For example, for a package named "Example" with ID 123, we store rows like this:

< ex, 123>
<exa, 123>
<xam, 123>
<amp, 123>
<mpl, 123>
<ple, 123>
<le , 123>
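
The rows above can be reproduced with a short sketch. This is not the Phabricator implementation (which is PHP); the function names are hypothetical, and lowercasing plus padding each whitespace-separated word with one space on either side are assumptions inferred from the " ex" and "le " rows.

```python
def index_trigrams(name):
    """Return the trigrams to store for a name.

    Assumption: each whitespace-separated word is lowercased and padded
    with one leading and one trailing space before being cut into
    three-character windows.
    """
    grams = []
    for token in name.lower().split():
        padded = " " + token + " "
        grams.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def index_rows(name, package_id):
    """Pair every trigram with the package ID, one tuple per index row."""
    return [(gram, package_id) for gram in index_trigrams(name)]


for row in index_rows("Example", 123):
    print(row)
```

The leading-space trigram is what later lets a short query match only the beginning of a word.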

When the user searches for "exam", we join this table once per query trigram, selecting packages that have both the "exa" and "xam" tokens. MySQL can do this a lot more efficiently than it can process a LIKE "%exam%" query against a huge table.
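
A minimal sketch of that join, using an in-memory SQLite database as a stand-in for MySQL. The table names, column names, and helper functions are hypothetical and are not taken from the actual schema. The sketch assumes a query of at least three characters.

```python
import sqlite3


def query_trigrams(term):
    """Cut a search term into trigrams. No padding: we want substring matches."""
    term = term.lower()
    return [term[i:i + 3] for i in range(len(term) - 2)]


def build_search_sql(term):
    """Build one self-join against the ngram table per query trigram."""
    grams = query_trigrams(term)
    joins = [
        f"JOIN package_name_ngrams g{n} "
        f"ON g{n}.package_id = p.id AND g{n}.ngram = ?"
        for n in range(len(grams))
    ]
    sql = (
        "SELECT DISTINCT p.id, p.name FROM package p "
        + " ".join(joins)
        + " ORDER BY p.id"
    )
    return sql, grams


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE package_name_ngrams (ngram TEXT, package_id INTEGER)")
conn.execute("CREATE INDEX key_ngram ON package_name_ngrams (ngram, package_id)")


def add_package(package_id, name):
    """Insert a package and its padded trigrams (single-word names only)."""
    conn.execute("INSERT INTO package VALUES (?, ?)", (package_id, name))
    padded = " " + name.lower() + " "
    rows = [(padded[i:i + 3], package_id) for i in range(len(padded) - 2)]
    conn.executemany("INSERT INTO package_name_ngrams VALUES (?, ?)", rows)


add_package(123, "Example")
add_package(124, "Sample")

sql, params = build_search_sql("exam")
matches = conn.execute(sql, params).fetchall()
print(matches)
```

Each join is an equality lookup on an indexed column, which is why this scales better than scanning every name. Sharing all query trigrams is necessary but not sufficient for a true substring match (a name like "exaxam" contains both "exa" and "xam"), so a real implementation may want to verify candidates afterwards.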

When the user searches for a one-letter or two-letter string, we only search the beginnings of words. This is probably what they want, the only thing we can do quickly, and a reasonable/expected behavior for typeaheads.
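
One way to picture the short-query case: because stored trigrams for a word start with a space-padded token such as " ex", a one- or two-letter query can be turned into a prefix pattern against that token. The sketch below is an assumption about how this could work, not the committed code, and `plan_search` is a hypothetical name.

```python
def plan_search(term):
    """Choose a search strategy based on the length of the query.

    Returns ("prefix", like_pattern) for one- and two-letter queries and
    ("trigrams", [tokens]) for anything longer.

    Assumption: the term contains no LIKE wildcards ("%" or "_"); a real
    implementation would need to escape them.
    """
    term = term.lower()
    if len(term) < 3:
        # " e%" and " ex%" both match the stored word-start trigram " ex".
        return ("prefix", " " + term + "%")
    return ("trigrams", [term[i:i + 3] for i in range(len(term) - 2)])


print(plan_search("e"))
print(plan_search("ex"))
print(plan_search("exam"))
```

A prefix pattern with a fixed leading part can still use an index on the trigram column, which is consistent with this being the only short-query behavior that is quick.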

Test Plan:

  • Ran storage upgrades and search indexer.
  • Searched for stuff with "name contains".
  • Used typeahead and got sensible results.
  • Searched for aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz and saw only 16 joins.
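
The last check suggests the number of joins is capped. A minimal sketch of such a cap follows; the limit of 16 comes from the test plan above, but keeping the first 16 distinct trigrams is an assumption, since the message does not say how they are chosen.

```python
MAX_JOINS = 16  # count observed in the test plan; the selection rule below is assumed


def capped_trigrams(term, limit=MAX_JOINS):
    """Return at most `limit` distinct trigrams for a query, in order of appearance."""
    term = term.lower()
    grams = [term[i:i + 3] for i in range(len(term) - 2)]
    unique = list(dict.fromkeys(grams))  # drop duplicates, keep first occurrence
    return unique[:limit]


term = "aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz"
print(len(term) - 2, "trigrams in the query,", len(capped_trigrams(term)), "used for joins")
```

Capping keeps very long queries from generating an unbounded number of joins, at the cost of matching on only part of the search string.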

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9979

Differential Revision: https://secure.phabricator.com/D14846

Details

Committed
epriestley <git@epriestley.com> Dec 22 2015, 17:00
Pushed
aubort Jan 31 2017, 17:16
Parents
rPH5c8025c41dad: Add some more consistant NUX to Phame
Branches
Unknown
Tags
Unknown

Event Timeline

epriestley <git@epriestley.com> committed rPH96fe8c0b83cf: Implement basic ngram search for Owners Package names (authored by epriestley <git@epriestley.com>). Dec 22 2015, 17:00