BibIndex: Greek stemmer improvements

Authored by Nikolaos Kasioumis <nikolaos.kasioumis@cern.ch> on Feb 25 2012, 16:38.

Description

  • Optimizes the Greek stemmer class by pre-compiling all regular expression patterns and other helper variables and declaring them as private constants. (closes #908)
  • Converts the Greek stemmer class to a new-style class that inherits from object, and renames it to GreekStemmer to match Invenio's class naming convention.
  • Renames the main stem function from stem_word to stemWord to match the corresponding function in Snowball's PyStemmer.
  • Adds the function stemWords, which accepts a list of words and returns a list of their stems, matching PyStemmer's stemWords function.
  • Adds one function that replaces accented vowels with their unaccented equivalents and another that converts all cased characters to uppercase. Together, these two functions prepare a word for the main stem function.
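
The shape of the class described above can be sketched roughly as follows. This is not the committed Invenio code. The constant names, the helper names (_remove_accents, _to_upper) and the single suffix pattern are hypothetical placeholders chosen for illustration, and the real stemmer applies a much larger rule set. The sketch also assumes that stemWord calls the two preparation helpers itself, which the commit message does not state.

```python
import re


class GreekStemmer(object):  # new-style class; "(object)" is redundant on Python 3
    """Skeleton only: the suffix rule is a placeholder, not Invenio's rule set."""

    # Hypothetical private constants, built once when the class is defined
    # instead of on every call.
    _ACCENTS = dict(zip("άέήίόύώΆΈΉΊΌΎΏ", "αεηιουωΑΕΗΙΟΥΩ"))
    _RE_ACCENTED = re.compile("[" + "".join(_ACCENTS) + "]")
    _RE_SUFFIX = re.compile("(ΟΣ|ΕΣ|ΩΝ)$")  # toy rule for illustration only

    def _remove_accents(self, word):
        # Replace each accented vowel with its unaccented equivalent.
        return self._RE_ACCENTED.sub(lambda m: self._ACCENTS[m.group(0)], word)

    def _to_upper(self, word):
        # Convert all cased characters to uppercase.
        return word.upper()

    def stemWord(self, word):
        # Assumption: preparation happens inside stemWord in this sketch.
        prepared = self._to_upper(self._remove_accents(word))
        return self._RE_SUFFIX.sub("", prepared)

    def stemWords(self, words):
        # Same calling shape as PyStemmer: list of words in, list of stems out.
        return [self.stemWord(word) for word in words]


stemmer = GreekStemmer()
print(stemmer.stemWords(["λόγος", "λόγων"]))
```

Keeping the compiled patterns as class-level constants means they are compiled once per process, which is the point of the optimization when the stemmer is called for every word being indexed.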

Details

Committed
Tibor Simko <tibor.simko@cern.ch> on Feb 29 2012, 11:48
Parents
R3600:4858ab802df1: BibRank: fix citation indexer time stamp updating

Event Timeline

Tibor Simko <tibor.simko@cern.ch> committed R3600:7c269a177e75: BibIndex: Greek stemmer improvements (authored by Nikolaos Kasioumis <nikolaos.kasioumis@cern.ch>) on Feb 29 2012, 11:48.