CDS Invenio Bibrank Record Sorter can be called from within your Python programs via a high-level API and a mid-mid level API. 1. High-level API Description: The high-level access to the BibRank Record Sorter is provided by exactly the same function as called from the web interface when users submit their queries, if a rank method has been selected. This should guarantee exactly the same behaviour if the same parameters are given. There are three thing to note: (i) When a search is done, the search engine is sending a HitSet containing all the records that matches the query. Since only records in the HitSet are ranked, a HitSet must be created containing wished records to rank and be sent as a parameter to the function. (ii) Some rank methods may choose to ignore the HitSet, like the "Similar Records" function. (iii) In case of an error ranking the records, the returned data is different. Signature: def rank_records(rank_method_code, rank_limit_relevance, hitset_global, pattern, verbose=0): """ rank_method_code - 'jif', 'wrd' or other methods rank_limit_relevance - a number defining the threshold of which to rank records, may be ignored by rank method. hitset_global, search engine hits, if all records should be ranked, fill the HitSet with ones. pattern, search engine query or record ID, must be a list. ['CERN', 'fermilab'] or ['recid:12345'] verbose, verbose level - 0-9 defines how much debug information should be shown output if successfull: list of records - [123, 321, 12451, 123, 12, 4] list of rank values - ascending, same length as the list of records [0, 10, 20, 30, 40, 100] prefix - text to show before the rank value, '<--' hides rank value, defined in config file. postfix - text to show after the rank value, '-!>' hides rank value, defined in config file. verbose_output - the debug text depending on the verbose level. output if error: list of records - is None list of rank values - is None prefix - Contains headline of error postfix - Error message or error box if exception. verbose_output - the debug text depending on the verbose level. """ Examples: >>> # import the function: >>> from invenio.bibrank_record_sorter import rank_records >>> # rank all records with the words 'higgs boson' according to the method "wrd" >>> rank_records('wrd', 0, a_hitset, ['higgs', 'boson'], 0) >>> # find similar records to the record 12345, hitset is here ignored because of 'similar records' >>> rank_records('wrd', 0, a_hitset, ['recid:12345'], 0) >>> # rank all records based on jif value >>> rank_records('jif', 0, a_hitset, [], 0) 2. Mid-level API Description: Using the mid-level API, you can call directly the various methods for ranking. The functions will not return data in a way the search engine understands. They will neither find out if it is the correct function that is called, but return an error if wrong code/function is used. Signatures: def combine_method(rank_method_code, pattern, hitset, rank_limit_relevance,verbose): -This method calls each method mentioned in the config file and add the results together def find_similar(rank_method_code, recID, hitset, rank_limit_relevance, -This method finds similar records based on the one given in the recID field. The recID field must be a integer value. hitset is ignored. rank_method_code has to be 'wrd'. def word_frequency(rank_method_code, lwords, hitset, rank_limit_relevance,verbose): -This method ranks records based on the list of words in lwords field. rank_method_code has to be 'wrd'. Only records in hitset is ranked. def rank_by_method(rank_method_code, lwords, hitset, rank_limit_relevance,verbose): -All other rank methods uses this function together with data from the rnkMETHODDATA table (a dictionary of {recid: (text, value)} to rank the data. Only records in the hitset is ranked. These mid-level API functions demands that the function create_rnkmethod_cache() has been called, since it loads the config options needed. The rank methods returns all the same data: ([[recid, value],[recid, value]], prefix, postfix, verbose_output) Examples: >>> # import the function: >>> from invenio.bibrank_record_sorter import find_similar >>> # find records similar to 12345, hitset must be full >>> find_similar('wrd', 12345, hitset, 0, 0) >>> # rank records according to a method called jif, using the single_tag...based method. >>> # the list of words is here ignored, only the records in the hitset are used. >>> rank_by_method('jif',['higgs'], hitset, 0, 0) >>> # rank records containing ['higgs', 'boson'] using word similarity ('wrd') >>> word_similarity('wrd',['higgs', 'boson'], hitset, 0, 0) >>> # rank records using various methods, which methods to use is read from the config file. >>> combine_method('cmb', ['higgs','boson'], hitset, 0, 0)