BibIndex: fuzzy author name tokenizer
- Introduced fuzzy author name tokenizer.
- Get a tokenizer with b_e_t.BibIndexFuzzyNameTokenizer().
- Call tokenizer.tokenize(name) to get a (potentially long) list of expanded forms, suitable for phrase-indexing. Strings in, lists of strings out.
- Alternatively, call tokenizer.scan(name) to turn a name into the tokenizer-specific data structure consumed by tokenizer.parse_scanned. This structure is a dictionary that tags non-lastnames, lastnames, and titles.
- You can also call tokenizer.parse_scanned() on data that is already tagged to generate the expanded forms directly.
- Includes unit tests for all of the above, covering an estimated 98% of the common cases. Pathological names are tested as well.
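
The scan / parse_scanned / tokenize flow described above can be illustrated with a minimal, self-contained sketch. This is not the BibIndexFuzzyNameTokenizer code: the class name SketchNameTokenizer, the dictionary keys, the title list, and the expansion rules are all assumptions chosen for illustration, and the real tokenizer may produce different (and more numerous) forms.

```python
# Hypothetical stand-in for illustration only; NOT the real
# BibIndexFuzzyNameTokenizer. Key names and expansion rules are assumptions.
TITLES = {"dr", "prof", "mr", "mrs", "ms", "jr", "sr"}  # assumed title list


def _split_titles(words):
    """Separate title words (e.g. 'Dr.') from ordinary name words."""
    titles = [w for w in words if w.rstrip(".").lower() in TITLES]
    others = [w for w in words if w.rstrip(".").lower() not in TITLES]
    return others, titles


class SketchNameTokenizer:
    def scan(self, name):
        """Tag a name string as lastnames, nonlastnames, and titles."""
        last_part, comma, rest = name.partition(",")
        if comma:
            # "Last, First Middle" form: everything before the first comma
            # is treated as the last name.
            lastnames, titles_a = _split_titles(last_part.split())
            given, titles_b = _split_titles(rest.replace(",", " ").split())
            titles = titles_a + titles_b
        else:
            # "First Middle Last" form: the final non-title word is
            # treated as the last name.
            words, titles = _split_titles(name.split())
            lastnames, given = words[-1:], words[:-1]
        return {"lastnames": lastnames, "nonlastnames": given, "titles": titles}

    def parse_scanned(self, tagged):
        """Expand tagged data into a list of 'Last, Given' variants."""
        last = " ".join(tagged["lastnames"])
        given = tagged["nonlastnames"]
        if not last:
            return [" ".join(given)] if given else []
        forms = [last]
        for k in range(1, len(given) + 1):
            full = " ".join(given[:k])
            initials = " ".join(w[0] for w in given[:k])
            for variant in (full, initials):
                form = f"{last}, {variant}"
                if form not in forms:  # avoid duplicates, keep order
                    forms.append(form)
        return forms

    def tokenize(self, name):
        """String in, list of strings out."""
        return self.parse_scanned(self.scan(name))


tokenizer = SketchNameTokenizer()
expanded = tokenizer.tokenize("John Quincy Smith")
# expanded == ['Smith', 'Smith, John', 'Smith, J',
#              'Smith, John Quincy', 'Smith, J Q']
```

In this sketch tokenize() is simply parse_scanned(scan(name)), which mirrors the split described above: callers that already hold tagged data can skip the scanning step.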
(closes: #14366, #64426, #14513)