api.html.wml
File Metadata

Created: Tue, Nov 5, 14:03


	## $Id$

	## This file is part of the CERN Document Server Software (CDSware).
	## Copyright (C) 2002, 2003, 2004, 2005 CERN.
	##
	## The CDSware is free software; you can redistribute it and/or
	## modify it under the terms of the GNU General Public License as
	## published by the Free Software Foundation; either version 2 of the
	## License, or (at your option) any later version.
	##
	## The CDSware is distributed in the hope that it will be useful, but
	## WITHOUT ANY WARRANTY; without even the implied warranty of
	## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
	## General Public License for more details.
	##
	## You should have received a copy of the GNU General Public License
	## along with CDSware; if not, write to the Free Software Foundation, Inc.,
	## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

	#include "cdspage.wml" \
	title="BibRank Record Sorter API" \
	navbar_name="hacking-websearch" \
	navtrail_previous_links="<a class=navtrail href=<WEBURL>/hacking/>Hacking CDSware</a> > <a class=navtrail href=index.html>WebSearch Internals</a> " \
	navbar_select="hacking-websearch-api"

	<p>Version <: print generate_pretty_revision_date_string('$Id$'); :>

	<protect>
	<pre>
	The CDSware BibRank Record Sorter can be called from within your Python
	programs via a high-level API and a mid-level API.

	1. High-level API

	Description:

	The high-level access to the BibRank Record Sorter is provided
	by exactly the same function as called from the web interface
	when users submit their queries, if a rank method has been
	selected. This should guarantee exactly the same behaviour if
	the same parameters are given.

	There are three things to note: (i) When a search is done, the
	search engine sends a HitSet containing all the records that
	match the query. Since only records in the HitSet are ranked, you
	must create a HitSet containing the records you wish to rank and pass
	it as a parameter to the function. (ii) Some rank methods, such as the
	"Similar Records" function, may ignore the HitSet. (iii) If an error
	occurs while ranking the records, the returned data is different
	(see "output if error" in the signature below).

	Signature:

	def rank_records(rank_method_code, rank_limit_relevance,
	                 hitset_global, pattern, verbose=0):
	    """
	    rank_method_code - 'jif', 'wrd' or other methods
	    rank_limit_relevance - a number defining the threshold for
	        ranking records; may be ignored by the rank method.
	    hitset_global - search engine hits; if all records should be
	        ranked, fill the HitSet with ones.
	    pattern - search engine query or record ID; must be a
	        list, e.g. ['CERN', 'fermilab'] or ['recid:12345']
	    verbose - verbose level, 0-9; defines how much debug information
	        should be shown

	    output if successful:
	        list of records - [123, 321, 12451, 123, 12, 4]
	        list of rank values - ascending, same length as the list of
	            records, e.g. [0, 10, 20, 30, 40, 100]
	        prefix - text to show before the rank value; '<--' hides the
	            rank value; defined in the config file.
	        postfix - text to show after the rank value; '-!>' hides the
	            rank value; defined in the config file.
	        verbose_output - the debug text, depending on the verbose level.

	    output if error:
	        list of records - is None
	        list of rank values - is None
	        prefix - contains the headline of the error
	        postfix - error message, or error box if an exception occurred.
	        verbose_output - the debug text, depending on the verbose level.
	    """


	Examples:

	>>> # import the function:
	>>> from cdsware.bibrank_record_sorter import rank_records
	>>> # rank all records with the words 'higgs boson' according to the method "wrd"
	>>> rank_records('wrd', 0, a_hitset, ['higgs', 'boson'], 0)
	>>> # find similar records to the record 12345, hitset is here ignored because of 'similar records'
	>>> rank_records('wrd', 0, a_hitset, ['recid:12345'], 0)
	>>> # rank all records based on jif value
	>>> rank_records('jif', 0, a_hitset, [], 0)
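
A minimal sketch of how a caller might handle the two output shapes
listed in the signature above. It assumes the five output items come
back as one tuple in the documented order; unpack_rank_result and the
sample tuples are hypothetical stand-ins, not part of CDSware.

```python
def unpack_rank_result(result):
    """Return (pairs, verbose_output) from a rank_records-style 5-tuple.

    Assumption: the five documented output items arrive as one tuple.
    Raises RuntimeError when the record list is None, which the
    documentation gives as the error signal.
    """
    records, values, prefix, postfix, verbose_output = result
    if records is None:
        # On error, prefix holds the headline and postfix the message.
        raise RuntimeError("%s: %s" % (prefix, postfix))
    # Pair every record ID with its rank value; both lists have equal length.
    return list(zip(records, values)), verbose_output


# Hypothetical tuples shaped like the documented output (not real data).
ok_result = ([123, 321, 12451], [0, 10, 100], "(", ")", "")
err_result = (None, None, "Error", "unknown rank method", "")

pairs, debug_text = unpack_rank_result(ok_result)
```

Checking the record list for None before using the rank values keeps
the error headline and message from being mistaken for display text.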

	2. Mid-level API

	Description:

	Using the mid-level API, you can call the various ranking methods
	directly. These functions do not return data in a form the search
	engine understands. Nor do they work out which function is the
	correct one to call; they return an error if the wrong code/function
	combination is used.

	Signatures:
	def combine_method(rank_method_code, pattern, hitset, rank_limit_relevance, verbose):
	- This method calls each method mentioned in the config file and adds the results together.

	def find_similar(rank_method_code, recID, hitset, rank_limit_relevance, verbose):
	- This method finds records similar to the one given in the recID field. The recID field
	must be an integer value. hitset is ignored. rank_method_code has to be 'wrd'.

	def word_similarity(rank_method_code, lwords, hitset, rank_limit_relevance, verbose):
	- This method ranks records based on the list of words in the lwords field. rank_method_code has to be
	'wrd'. Only records in hitset are ranked.

	def rank_by_method(rank_method_code, lwords, hitset, rank_limit_relevance, verbose):
	- All other rank methods use this function together with data from the rnkMETHODDATA table
	(a dictionary of {recid: (text, value)}) to rank the data. Only records in the hitset are ranked.

	These mid-level API functions require that the function create_rnkmethod_cache() has been called
	beforehand, since it loads the config options they need.
	The rank methods all return the same data structure:
	([[recid, value], [recid, value]], prefix, postfix, verbose_output)

	Examples:

	>>> # import the functions:
	>>> from cdsware.bibrank_record_sorter import (find_similar, rank_by_method,
	...     word_similarity, combine_method)
	>>> # find records similar to 12345, hitset must be full
	>>> find_similar('wrd', 12345, hitset, 0, 0)
	>>> # rank records according to a method called jif, using the single_tag...based method.
	>>> # the list of words is here ignored, only the records in the hitset are used.
	>>> rank_by_method('jif',['higgs'], hitset, 0, 0)
	>>> # rank records containing ['higgs', 'boson'] using word similarity ('wrd')
	>>> word_similarity('wrd',['higgs', 'boson'], hitset, 0, 0)
	>>> # rank records using various methods, which methods to use is read from the config file.
	>>> combine_method('cmb', ['higgs','boson'], hitset, 0, 0)
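
A minimal sketch of reshaping the mid-level return value into the five
items the high-level API lists. to_high_level and the sample data are
hypothetical; sorting by ascending rank value is an assumption based on
the high-level output description, and the real search engine glue may
do more than this.

```python
def to_high_level(mid_result):
    """Reshape ([[recid, value], ...], prefix, postfix, verbose_output)
    into (records, values, prefix, postfix, verbose_output)."""
    pairs, prefix, postfix, verbose_output = mid_result
    # Assumption: order by rank value, ascending, as the high-level
    # output description states for its list of rank values.
    ordered = sorted(pairs, key=lambda pair: pair[1])
    records = [recid for recid, value in ordered]
    values = [value for recid, value in ordered]
    return records, values, prefix, postfix, verbose_output


# Hypothetical data shaped like the documented mid-level return value.
mid_result = ([[12, 40], [123, 0], [321, 10]], "(", ")", "")
high = to_high_level(mid_result)
```

Keeping the record IDs and rank values in two parallel lists mirrors
the high-level output, so the same display code can consume both.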
	</pre>
	</protect>
