
BibRank: citesummary optimizations

Authored by Alessio Deiana <alessio.deiana@cern.ch> on Nov 2 2012, 14:00.

Description

BibRank: citesummary optimizations

  • Uses the key parameter of sort instead of the cmp parameter, so that citesummary sorts records by citation count faster.
  • Also in citesummary, builds lists with list comprehensions instead of appending to them in loops.
  • Instead of keeping the whole citation dict in memory, only the citation counts are now kept in memory.
  • Stores the citation dict in the database and removes it from memory.
  • Improves code quality in bibrank_citation_indexer.py.
  • In citesummary, sorts the citation counts first, so that sorting only has to be done once.
  • Uses 2 different methods for sorting citation counts in citesummary:
    • for fewer than 20000 records, looks up each record's count in the counts dictionary by key
    • for more than 20000 records, uses the presorted citation counts list (closes #1481)
  • Splits stats computation and HTML rendering in citesummary into 2 different functions.
  • With that change, citesummary iterates only once through the citation counts to compute all the stats. (closes #1217)
  • Fixes a bug where, when citesummary was computed for fewer than 20000 records, the number of self-citations was displayed instead of the number of citations.
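The first item refers to Python 2's `list.sort(cmp=...)`: a comparison function is called O(n log n) times, whereas a `key=` function is called once per element and the sort then compares the precomputed keys. The sketch below is a hypothetical illustration in Python 3, where `cmp=` no longer exists, so `functools.cmp_to_key` stands in for the old style; the function names and sample data are made up and are not taken from bibrank.

```python
from functools import cmp_to_key

# Hypothetical sample data: record ID -> number of citations.
citation_counts = {10: 5, 11: 42, 12: 0, 13: 42, 14: 7}

def sort_with_cmp(counts):
    """Old style: the comparison function runs for every pair the sort compares."""
    def by_citations_desc(a, b):
        # Python 3 has no built-in cmp(); this returns -1, 0 or 1 like it did.
        # Negative means "a sorts before b", i.e. a has more citations.
        return (counts[b] > counts[a]) - (counts[b] < counts[a])
    return sorted(counts, key=cmp_to_key(by_citations_desc))

def sort_with_key(counts):
    """New style: counts.get runs once per record; the sort compares plain ints."""
    # reverse=True keeps the sort stable, so ties keep their original order.
    return sorted(counts, key=counts.get, reverse=True)
```

Both functions return the same ordering, `[11, 13, 14, 10, 12]` for the sample data; only the number of Python-level function calls differs.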
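Keeping only counts in memory while the full citation dict lives in the database can be sketched as follows. This is a minimal sketch under stated assumptions: `sqlite3` and `json` stand in for Invenio's real database and serialization, and the table name `citation_dict` and both function names are hypothetical; the commit does not say how the dict is actually stored.

```python
import json
import sqlite3

def store_citations_and_keep_counts(conn, citation_dict):
    """Persist {recid: set of citing recids}; return only {recid: count}."""
    # Hypothetical table: one row per cited record, citers stored as a JSON list.
    conn.execute("CREATE TABLE IF NOT EXISTS citation_dict ("
                 "recid INTEGER PRIMARY KEY, citers TEXT NOT NULL)")
    conn.executemany(
        "INSERT OR REPLACE INTO citation_dict (recid, citers) VALUES (?, ?)",
        [(recid, json.dumps(sorted(citers)))
         for recid, citers in citation_dict.items()])
    conn.commit()
    # Only this small dict of integers needs to stay in memory.
    return {recid: len(citers) for recid, citers in citation_dict.items()}

def load_citers(conn, recid):
    """Fetch the citing records of one record on demand from the database."""
    row = conn.execute("SELECT citers FROM citation_dict WHERE recid = ?",
                       (recid,)).fetchone()
    return json.loads(row[0]) if row else []
```

A caller would then drop its own reference, for example `counts = store_citations_and_keep_counts(conn, citations)` followed by `del citations`, so the large dict can be garbage-collected; the function itself cannot free the caller's object.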
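The two sorting strategies can be sketched like this. It is a hypothetical illustration, not the bibrank code: `sorted_citation_counts` and its parameters are invented names, and it assumes `presorted` is a global list of `(recid, count)` pairs covering the same records as `counts`, already sorted by count, highest first. Only the 20000 threshold comes from the commit message. For a small record set, looking up and sorting a few counts is cheap; for a large one, a single linear scan of the presorted list avoids sorting altogether.

```python
SMALL_SET_THRESHOLD = 20000  # value taken from the commit message

def sorted_citation_counts(recids, counts, presorted,
                           threshold=SMALL_SET_THRESHOLD):
    """Return the citation counts of `recids`, highest first.

    counts:    dict recid -> citation count (all records)
    presorted: list of (recid, count) for the same records, count descending
    Records missing from `counts` are ignored by both strategies.
    """
    recids = set(recids)  # O(1) membership tests in the scan below
    if len(recids) < threshold:
        # Few records: access each count by key, then sort the small result.
        return sorted((counts[r] for r in recids if r in counts), reverse=True)
    # Many records: filter the presorted list; the order is already correct.
    return [count for recid, count in presorted if recid in recids]
```

With `counts = {1: 5, 2: 42, 3: 0, 4: 7}` and `presorted` built once via `sorted(counts.items(), key=lambda item: item[1], reverse=True)`, both branches give `[42, 7, 5]` for the records `{1, 2, 4}`.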
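Separating stats computation from HTML rendering, and computing every statistic in one pass over the sorted counts, might look like the sketch below. The commit does not list which statistics citesummary reports; the totals, average, h-index and the `BUCKETS` breakdown here are illustrative assumptions, and both function names are hypothetical. The h-index update is only correct because the input is sorted highest first, which is why sorting once up front matters.

```python
from html import escape

# Hypothetical breakdown categories: (label, minimum citation count), high to low.
BUCKETS = [("100+", 100), ("10-99", 10), ("1-9", 1), ("0", 0)]

def compute_stats(sorted_counts):
    """Compute all stats in a single pass over counts sorted highest first."""
    stats = {"papers": 0, "citations": 0, "h_index": 0,
             "breakdown": {label: 0 for label, _ in BUCKETS}}
    for position, count in enumerate(sorted_counts, start=1):
        stats["papers"] += 1
        stats["citations"] += count
        # h-index: the largest position whose paper has at least that many citations.
        if count >= position:
            stats["h_index"] = position
        # Put the paper in the first (highest) bucket it qualifies for.
        for label, minimum in BUCKETS:
            if count >= minimum:
                stats["breakdown"][label] += 1
                break
    papers = stats["papers"]
    stats["average"] = stats["citations"] / papers if papers else 0.0
    return stats

def render_stats_html(stats):
    """Pure presentation: turn a stats dict into an HTML table."""
    rows = [("Papers", stats["papers"]),
            ("Citations", stats["citations"]),
            ("Average citations", "%.1f" % stats["average"]),
            ("h-index", stats["h_index"])]
    rows += [("Papers with %s citations" % label, n)
             for label, n in stats["breakdown"].items()]
    body = "\n".join("<tr><td>%s</td><td>%s</td></tr>"
                     % (escape(str(name)), escape(str(value)))
                     for name, value in rows)
    return "<table>\n" + body + "\n</table>"
```

Because `compute_stats` returns plain data, it can be tested or reused (for example for a non-HTML output format) without going through the renderer.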

Signed-off-by: Alessio Deiana <alessio.deiana@cern.ch>


Event Timeline

Samuele Kaplun <samuele.kaplun@cern.ch> committed R3600:742f3500c86a: BibRank: citesummary optimizations (authored by Alessio Deiana <alessio.deiana@cern.ch>). Dec 18 2013, 16:21