api.html.wml
## $Id$
## This file is part of the CERN Document Server Software (CDSware).
## Copyright (C) 2002, 2003, 2004, 2005 CERN.
##
## The CDSware is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## The CDSware is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDSware; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
#include "cdspage.wml" \
title="BibRank Record Sorter API" \
navbar_name="hacking-websearch" \
navtrail_previous_links="<a class=navtrail href=<WEBURL>/hacking/>Hacking CDSware</a> > <a class=navtrail href=index.html>WebSearch Internals</a> " \
navbar_select="hacking-websearch-api"
<p>Version <: print generate_pretty_revision_date_string('$Id$'); :>
<protect>
<pre>
The CDSware BibRank Record Sorter can be called from within your Python
programs via a high-level API and a mid-level API.
1. High-level API
Description:
The high-level access to the BibRank Record Sorter is provided
by exactly the same function as called from the web interface
when users submit their queries, if a rank method has been
selected. This should guarantee exactly the same behaviour if
the same parameters are given.
There are three things to note: (i) When a search is done, the
search engine sends a HitSet containing all the records that
match the query. Since only records in the HitSet are ranked, a
HitSet containing the records to be ranked must be created and passed
as a parameter to the function. (ii) Some rank methods may choose to
ignore the HitSet, like the "Similar Records" function. (iii) If an
error occurs while ranking the records, the returned data is different
(see "output if error" below).
Signature:
def rank_records(rank_method_code, rank_limit_relevance,
                 hitset_global, pattern, verbose=0):
    """
    rank_method_code - 'jif', 'wrd' or another rank method code
    rank_limit_relevance - a number defining the relevance threshold
        used when ranking records; may be ignored by the rank method.
    hitset_global - search engine hits; if all records should be
        ranked, fill the HitSet with ones.
    pattern - search engine query or record ID; must be a list,
        e.g. ['CERN', 'fermilab'] or ['recid:12345']
    verbose - verbose level, 0-9; defines how much debug information
        should be shown

    output if successful:
        list of records - [123, 321, 12451, 123, 12, 4]
        list of rank values - ascending, same length as the list of
            records [0, 10, 20, 30, 40, 100]
        prefix - text to show before the rank value, '<--' hides rank
            value, defined in config file.
        postfix - text to show after the rank value, '-!>' hides rank
            value, defined in config file.
        verbose_output - the debug text depending on the verbose level.

    output if error:
        list of records - is None
        list of rank values - is None
        prefix - contains the headline of the error
        postfix - error message, or error box if an exception occurred.
        verbose_output - the debug text depending on the verbose level.
    """
Examples:
>>> # import the function:
>>> from cdsware.bibrank_record_sorter import rank_records
>>> # rank all records with the words 'higgs boson' according to the method "wrd"
>>> rank_records('wrd', 0, a_hitset, ['higgs', 'boson'], 0)
>>> # find records similar to record 12345; the hitset is ignored here because of 'similar records'
>>> rank_records('wrd', 0, a_hitset, ['recid:12345'], 0)
>>> # rank all records based on jif value
>>> rank_records('jif', 0, a_hitset, [], 0)
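
The two output shapes described in the docstring can be told apart by
checking whether the list of records is None. The following is a minimal
sketch that relies only on the documented five-element return value.
summarize_rank_result is a hypothetical helper, not part of CDSware, and
the sample tuples (including the error texts) are made up for illustration.

```python
def summarize_rank_result(result):
    """Turn the 5-tuple returned by rank_records() into a small dict.

    Hypothetical helper, not part of CDSware. It assumes only the
    documented shape:
    (records, rank_values, prefix, postfix, verbose_output)
    """
    records, rank_values, prefix, postfix, verbose_output = result
    if records is None:
        # Error case: prefix carries the headline, postfix the message.
        return {"ok": False, "headline": prefix, "message": postfix,
                "debug": verbose_output}
    # Success case: records and rank values are parallel lists.
    return {"ok": True, "ranked": list(zip(records, rank_values)),
            "debug": verbose_output}


# Made-up sample values shaped like the documented outputs.
success = ([123, 321, 12451], [0, 10, 20], "(", ")", "")
failure = (None, None, "Error", "Rank method not found", "")
```

In real use, the tuple would come from a call such as
rank_records('wrd', 0, a_hitset, ['higgs', 'boson'], 0).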
2. Mid-level API
Description:
Using the mid-level API, you can call the various ranking methods
directly. The functions do not return data in a form the search
engine understands. Nor do they check that the function being called
is the correct one for the given rank method code; they return an
error if the wrong code/function combination is used.
Signatures:
def combine_method(rank_method_code, pattern, hitset, rank_limit_relevance, verbose):
- This method calls each method mentioned in the config file and adds the results together.
def find_similar(rank_method_code, recID, hitset, rank_limit_relevance, verbose):
- This method finds records similar to the one given in the recID field. The recID field
must be an integer value. hitset is ignored. rank_method_code has to be 'wrd'.
def word_similarity(rank_method_code, lwords, hitset, rank_limit_relevance, verbose):
- This method ranks records based on the list of words in the lwords field. rank_method_code has to be
'wrd'. Only records in the hitset are ranked.
def rank_by_method(rank_method_code, lwords, hitset, rank_limit_relevance, verbose):
- All other rank methods use this function together with data from the rnkMETHODDATA table
(a dictionary of {recid: (text, value)}) to rank the data. Only records in the hitset are ranked.
These mid-level API functions require that the function create_rnkmethod_cache() has been called
beforehand, since it loads the config options they need.
All rank methods return data in the same form:
([[recid, value], [recid, value]], prefix, postfix, verbose_output)
Examples:
>>> # import the functions:
>>> from cdsware.bibrank_record_sorter import find_similar, rank_by_method, \
...     word_similarity, combine_method
>>> # find records similar to 12345, hitset must be full
>>> find_similar('wrd', 12345, hitset, 0, 0)
>>> # rank records according to a method called jif, using the single_tag...based method.
>>> # the list of words is here ignored, only the records in the hitset are used.
>>> rank_by_method('jif',['higgs'], hitset, 0, 0)
>>> # rank records containing ['higgs', 'boson'] using word similarity ('wrd')
>>> word_similarity('wrd',['higgs', 'boson'], hitset, 0, 0)
>>> # rank records using various methods, which methods to use is read from the config file.
>>> combine_method('cmb', ['higgs','boson'], hitset, 0, 0)
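
Because the mid-level functions return [[recid, value], ...] pairs rather
than the two parallel lists of the high-level API, a caller may want to
convert between the two shapes. The following is a minimal sketch, assuming
the pairs are not necessarily sorted and that ascending order by rank value
is wanted (as described for the high-level output). split_ranked_pairs is a
hypothetical helper, not a CDSware function, and the sample data is made up.

```python
def split_ranked_pairs(mid_level_result):
    """Convert a mid-level result into the high-level 5-tuple shape.

    Hypothetical helper, not part of CDSware. Input shape, as documented:
    ([[recid, value], ...], prefix, postfix, verbose_output)
    """
    pairs, prefix, postfix, verbose_output = mid_level_result
    # Sort by rank value, lowest first; sorted() is stable, so records
    # with equal values keep their original relative order.
    ordered = sorted(pairs, key=lambda pair: pair[1])
    records = [recid for recid, value in ordered]
    rank_values = [value for recid, value in ordered]
    return records, rank_values, prefix, postfix, verbose_output


# Made-up sample shaped like a mid-level return value.
sample = ([[12, 40], [123, 0], [321, 10]], "(", ")", "")
```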
</pre>
</protect>