diff --git a/modules/bibrank/doc/admin/guide.html.wml b/modules/bibrank/doc/admin/guide.html.wml
index 57afea49e..f973032de 100644
--- a/modules/bibrank/doc/admin/guide.html.wml
+++ b/modules/bibrank/doc/admin/guide.html.wml
@@ -1,551 +1,564 @@
## $Id$
## This file is part of the CERN Document Server Software (CDSware).
## Copyright (C) 2002 CERN.
##
## The CDSware is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## The CDSware is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDSware; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
#include "cdspage.wml" \
title="BibRank Admin Guide" \
navtrail_previous_links="/admin/
Version <: print generate_pretty_revision_date_string('$Id$'); :>
The bibrank module currently consists of two tools:
bibrank - Generates ranking data for ranking search results based on methods like:
    Journal Impact Factor
    Word Similarity/Similar Records
    Combined Method
    ##Number of downloads
    ##Author Impact
    ##Citation Impact
bibrankgkb - For generating knowledge base files for use with bibrank
- comment line starts with '#' sign in the first column
- each section in a configuration file is declared inside '[' ']' signs
- values in knowledge base files are separated by '---'
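These rules are close to what Python's standard `configparser` module accepts, so a configuration file of this shape can be read roughly as sketched below. This is only an illustration and not necessarily how bibrank loads its files; the helper name `load_rank_config` is hypothetical, and interpolation is switched off on the assumption that values such as `245__%` should keep their literal percent sign.

```python
import configparser

SAMPLE_CFG = """\
# a comment line starts with '#' in the first column
[rank_method]
function = word_similarity

[word_similarity]
table = rnkWORD01F
tag4 = 245__%, 10, en
"""

def load_rank_config(text):
    """Parse bibrank-style configuration text into a dict of sections."""
    # interpolation=None keeps '%' in values such as '245__%' literal
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: dict(parser.items(name)) for name in parser.sections()}

config = load_rank_config(SAMPLE_CFG)
print(config["rank_method"]["function"])
print(config["word_similarity"]["tag4"])
```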
Rank method: A method responsible for creating the necessary data to rank a result.
Translations: Each rank method may have many names in many languages.
Collections: Which collections the rank method should be visible in.
Usage bibrank:
       bibrank -wjif -a --id=0-30000,30001-860000 --verbose=9
       bibrank -wjif -d --modified='2002-10-27 13:57:26'
       bibrank -wjif --rebalance --collection=Articles
       bibrank -wwrd -a -i 234-250,293,300-500 -u admin@cdsware

 Ranking options:
 -w, --run=r1[,r2]         runs each rank method in the order given

 -c, --collection=c1[,c2]  select according to collection
 -i, --id=low[-high]       select according to doc recID
 -m, --modified=from[,to]  select according to modification date
 -l, --lastupdate          select according to last update

 -a, --add                 add or update words for selected records
 -d, --del                 delete words for selected records
 -S, --stat                show statistics for a method

 -R, --rebalance           rebalancing rank data: does complete update.
                           if not used: quick update

 Repairing options:
 -k, --check               check consistency for all records in the table(s)
                           check if update of ranking data is necessary
 -r, --repair              try to repair all records in the table(s)

 Scheduling options:
 -u, --user=USER           user name to store task, password needed
 -s, --sleeptime=SLEEP     time after which to repeat tasks (no)
                           e.g.: 1s, 30m, 24h, 7d
 -t, --time=TIME           moment for the task to be active (now)
                           e.g.: +15s, 5m, 3h , 2002-10-27 13:57:26

 General options:
 -h, --help                print this help and exit
 -V, --version             print version and exit
 -v, --verbose=LEVEL       verbose level (from 0 to 9, default 1)
(jif.kb - journal impact factor knowledge base)
Phys. Rev., D---3.838
Phys. Rev. Lett.---6.462
Phys. Lett., B---4.213
Nucl. Instrum. Methods Phys. Res., A---0.964
J. High Energy Phys.---8.664
$ bibrank -wjif -r
Task #53 was successfully scheduled for execution.
It is advised to run the BibRank daemon using no parameters, since the default settings will then be used.
$ bibrank
Task #2505 was successfully scheduled for execution.
$ bibrank 2505
2004-09-07 17:51:46 --> Task #2505 started.
2004-09-07 17:51:46 -->
2004-09-07 17:51:46 --> Running rank method: Number of downloads.
2004-09-07 17:51:47 --> No new records added since last time method was run
2004-09-07 17:52:10 -->
2004-09-07 17:52:10 --> Running rank method: Journal Impact Factor.
2004-09-07 17:52:10 --> No new records added since last time method was run
2004-09-07 17:52:11 --> Reading knowledgebase file: /soft/cdsware-CDSCERNWIENERDEV/etc/bibrank/cern_jif.kb
2004-09-07 17:52:11 --> Number of lines read from knowledgebase file: 420
2004-09-07 17:52:11 --> Number of records available in rank method: 0
2004-09-07 17:52:12 -->
2004-09-07 17:52:12 --> Running rank method: Word frequency
2004-09-07 17:52:13 --> rnkWORD01F contains 256842 words from 677912 records
2004-09-07 17:52:14 --> rnkWORD01F is in consistent state
2004-09-07 17:52:14 --> Using the last update time for the rank method
2004-09-07 17:52:14 --> No new records added. rnkWORD01F is up to date
2004-09-07 17:52:14 --> rnkWORD01F contains 256842 words from 677912 records
2004-09-07 17:52:14 --> rnkWORD01F is in consistent state
2004-09-07 17:52:14 --> Task #2505 finished.
[rank_method]
function = single_tag_rank_method

[single_tag_rank]
tag = 909C4p
kb_src = /usr/local/cdsware-DEMO/etc/bibrank/jif.kb
check_mandatory_tags = 909C4c,909C4v,909C4y
Explanation:
[rank_method]
##The function which is responsible for doing the work. Should not be changed
function = single_tag_rank_method

##This section must be available if the single_tag_rank_method is going to be used
[single_tag_kb]
##The tag which holds the value to be searched for on the left side in the kb file (like the journal name)
tag = 909C4p
##The path to the kb file which has the content of the tag above on the left side, and the value on the right side
kb_src = /log/cdsware-DEMODEV/etc/bibrank/jif.kb
##Tags that must be included for a record to be added to the ranking data; to disable, remove the tags
check_mandatory_tags = 909C4c,909C4v,909C4y
+The kb_src file must contain data in the form:
+
+Phys. Rev., D---3.838
+Phys. Rev. Lett.---6.462
+Phys. Lett., B---4.213
+Nucl. Instrum. Methods Phys. Res., A---0.964
+J. High Energy Phys.---8.664
+
+The left side must match the content of the tag mentioned in the tag variable.
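As a rough illustration of how a file in this form could be consumed, the sketch below parses '---'-separated lines into a dictionary and looks up the value for a journal name taken from the configured tag. The function names `parse_kb` and `rank_value` are hypothetical and are not part of bibrank; skipping lines that start with '#' is likewise an assumption.

```python
KB_TEXT = """\
Phys. Rev., D---3.838
Phys. Rev. Lett.---6.462
Phys. Lett., B---4.213
Nucl. Instrum. Methods Phys. Res., A---0.964
J. High Energy Phys.---8.664
"""

def parse_kb(text):
    """Return {left side: float(right side)} for every 'key---value' line."""
    kb = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (assumed convention)
        key, sep, value = line.rpartition("---")
        if not sep:
            continue  # no separator found, so this is not a kb line
        kb[key.strip()] = float(value)
    return kb

def rank_value(kb, tag_content):
    """Look up the value for the content of the configured tag, or None."""
    return kb.get(tag_content.strip())

kb = parse_kb(KB_TEXT)
print(rank_value(kb, "Phys. Rev. Lett."))
print(rank_value(kb, "Unknown Journal"))
```

A record whose tag content has no exact match on the left side gets no value in this sketch, which is why the naming convention in the kb file matters.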
[rank_method]
function = word_similarity

[word_similarity]
-stem_if_avail = yes
-stem_query_language = en
+stemming = en
table = rnkWORD01F
-stopword = /soft/cdsware-CDSCERNWIENERDEV/etc/bibrank/stopwords.kb
+stopword = True
relevance_number_output_prologue = (
relevance_number_output_epilogue = )
-tag1 = 6531_a, 1, en
+#relevance_number_output_prologue =
+#MARC tag,tag points, tag language
+tag1 = 6531_a, 2, en
tag2 = 695__a, 1, en
tag3 = 6532_a, 1, en
tag4 = 245__%, 10, en
-tag5 = 246_% , 1, fr
+tag5 = 246_%, 1, fr
tag6 = 250__a, 1, en
tag7 = 711__a, 1, en
tag8 = 210__a, 1, en
tag9 = 222__a, 1, en
tag10 = 520__%, 1, en
tag11 = 590__%, 1, fr
tag12 = 111__a, 1, en
-tag13 = 100__%, 1, none
+tag13 = 100__%, 2, none
tag14 = 700__%, 1, none
tag15 = 721__a, 1, none
+
[find_similar]
max_word_occurence = 0.05
min_word_occurence = 0.00
min_word_length = 3
min_nr_words_docs = 3
max_nr_words_upper = 20
max_nr_words_lower = 10
-override_default_min_relevance = no
default_min_relevance = 75
Explanation:
[rank_method]
#internal name for the bibrank program, do not modify
function = word_similarity

[word_similarity]
-#if stemmer is available, use it. Adviced to turn off if not installed
-stem_if_avail = yes
-#the default language to stem the search query in
-stem_query_language = en
+#if stemmer is available, default stemming language should be given here. Advised to turn off if not installed
+stemming = en
#the internal table to load the index tables from.
table = rnkWORD01F
-#the path to the stopword list. One word per line, words in the file will not be indexed.
-stopword = /bibrank/stopwords.kb
+#remove stopwords?
+stopword = True
#text to show before the rank value when the search result is presented. <-- to hide result
relevance_number_output_prologue = (
#text to show after the rank value when the search result is presented. --> to hide result
relevance_number_output_epilogue = )
#MARC tag,tag points, tag language
-#a list of the tags to be used, together with a number describing the importance of the tag, and the most common language for the content. Not all languages are supported. Among the supported ones are: fr/french, en/english, no/norwegian, se/swedish, de/german, it/italian, pt/portugese
+#a list of the tags to be used, together with a number describing the importance of the tag, and the
+#most common language for the content. Not all languages are supported. Among the supported ones are:
+#fr/french, en/english, no/norwegian, se/swedish, de/german, it/italian, pt/portuguese
#keyword
tag1 = 6531_a, 1, en
#keyword
tag2 = 695__a, 1, en
#keyword
tag3 = 6532_a, 1, en
#keyword
tag4 = 245__%, 10, en
#title, the words in the title usually describe a record very well.
tag5 = 246_% , 1, fr
#french title
tag6 = 250__a, 1, en
#title
tag7 = 711__a, 1, en
#title
tag8 = 210__a, 1, en
#abbreviated
tag9 = 222__a, 1, en
#key title

[find_similar]
#term should exist in maximum X/100% of documents
max_word_occurence = 0.05
#term should exist in minimum X/100% of documents
min_word_occurence = 0.00
#term should be at least 3 characters long
min_word_length = 3
#term should be in at least 3 documents
min_nr_words_docs = 3
#do not use more than 20 terms for "find similar"
max_nr_words_upper = 20
#if a document contains fewer than 10 terms, also use frequently used terms; otherwise ignore them
max_nr_words_lower = 10
-#override minimum relevance value and use the one from search_engine?
-override_default_min_relevance = no
#default minimum relevance value to use for find similar
default_min_relevance = 75
+Tip: When executing a search using a ranking method, you can add "verbose=1" to the list of parameters
+in the URL to see which terms have been used in the ranking.
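To make the [find_similar] thresholds more concrete, here is a small sketch of how candidate terms might be filtered with them. It is not the bibrank implementation: the function `select_terms`, its inputs, and the reading of 0.05 as a fraction of all documents are assumptions, and the max_nr_words_lower fallback is deliberately left out.

```python
def select_terms(term_weights, doc_freq, total_docs,
                 max_word_occurence=0.05, min_word_occurence=0.00,
                 min_word_length=3, min_nr_words_docs=3,
                 max_nr_words_upper=20):
    """Keep the highest-weighted terms that pass every threshold."""
    kept = []
    for term, weight in term_weights.items():
        docs_with_term = doc_freq.get(term, 0)
        share = docs_with_term / total_docs  # assumed: thresholds are fractions
        if len(term) < min_word_length:
            continue  # term too short
        if docs_with_term < min_nr_words_docs:
            continue  # term appears in too few documents
        if share > max_word_occurence or share < min_word_occurence:
            continue  # term too common or too rare
        kept.append((weight, term))
    kept.sort(reverse=True)  # highest weight first
    return [term for _, term in kept[:max_nr_words_upper]]

# hypothetical statistics for a collection of 1000 records
doc_freq = {"higgs": 30, "the": 900, "of": 800, "boson": 40, "rare": 2}
weights = {"higgs": 2.0, "the": 5.0, "of": 4.0, "boson": 1.0, "rare": 3.0}
print(select_terms(weights, doc_freq, total_docs=1000))
```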
[rank_method]
function = combine_method

[combine_method]
method1 = cern_jif,33
method2 = cern_acc,33
method3 = wrd,33
relevance_number_output_prologue = (
relevance_number_output_epilogue = )
Explanation:
[rank_method]
#tells which method to use, do not change
function = combine_method

[combine_method]
-#each line tells which method to use, the code is the same as in the BibRank interface, the number describes how much of the total score the method should count.
+#each line tells which method to use, the code is the same as in the BibRank interface, the number describes how
+#much of the total score the method should count.
method1 = jif,50
method2 = wrd,50
#text to be shown before the rank value on the search result screen.
relevance_number_output_prologue = (
#text to be shown after the rank value on the search result screen.
relevance_number_output_epilogue = )
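The weighting can be pictured with the following sketch. It assumes that the numbers are percentages and that each method's score for a record is already on a common 0-100 scale; the guide does not state how bibrank actually combines the scores, so the weighted sum and the names `parse_methods` and `combine_scores` are assumptions made for illustration only.

```python
def parse_methods(lines):
    """Turn 'code,weight' strings such as 'jif,50' into (code, weight) pairs."""
    methods = []
    for line in lines:
        code, weight = line.split(",")
        methods.append((code.strip(), float(weight)))
    return methods

def combine_scores(methods, scores):
    """Weighted sum of per-method scores; weights are treated as percentages."""
    total = 0.0
    for code, weight in methods:
        # a record missing from a method contributes nothing for that method
        total += scores.get(code, 0.0) * weight / 100.0
    return total

methods = parse_methods(["jif,50", "wrd,50"])
record_scores = {"jif": 80.0, "wrd": 40.0}  # hypothetical scores for one record
print(combine_scores(methods, record_scores))
```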
Usage: bibrankgkb %s [options]
     Examples:
       bibrankgkb --input=bibrankgkb.cfg --output=test.kb
       bibrankgkb -otest.cfg -v9
       bibrankgkb

 Generate options:
 -i, --input=file          input file, default from /etc/bibrank/bibrankgkb.cfg
 -o, --output=file         output file, will be placed in current folder

 General options:
 -h, --help                print this help and exit
 -V, --version             print version and exit
 -v, --verbose=LEVEL       verbose level (from 0 to 9, default 1)
##The main section
[bibrankgkb]
##The url to a web page with the data to be read, does not need to have the same name as this one, but if there are several links, the url parameter should end with _0->
url_0 = http://www.taelinke.land.ru/impact_A.html
url_1 = http://www.taelinke.land.ru/impact_B.html
url_2 = http://www.taelinke.land.ru/impact_C.html
url_3 = http://www.taelinke.land.ru/impact_DE.html
url_4 = http://www.taelinke.land.ru/impact_FH.html
url_5 = http://www.taelinke.land.ru/impact_I.html
url_6 = http://www.taelinke.land.ru/impact_J.html
url_7 = http://www.taelinke.land.ru/impact_KN.html
url_8 = http://www.taelinke.land.ru/impact_QQ.html
url_9 = http://www.taelinke.land.ru/impact_RZ.html

##The regular expression for the url mentioned should be given here
url_regexp =

##The various sources that can be read in, can either be a file, web page or from the database
kb_1 = /home/trondaks/w/cdsware/modules/bibrank/etc/cern_jif.kb
kb_2 = /home/trondaks/w/cdsware/modules/bibrank/etc/cdsware_jif.kb
kb_2_filter = /home/trondaks/w/cdsware/modules/bibrank/etc/convert.kb
kb_3 = SELECT id_bibrec,value FROM bib93x,bibrec_bib93x WHERE tag='938__f' AND id_bibxxx=id
kb_4 = SELECT id_bibrec,value FROM bib21x,bibrec_bib21x WHERE tag='210__a' AND id_bibxxx=id
##This points to the url above (the common part of the url is 'url_' followed by a number)
kb_5 = url_%s

##This is the part that will be read by the bibrankgkb tool to determine what to read.
##The first two parts (separated by ,,) give where to look for the conversion file (which converts
##the names between two formats), and the second part is the data source. A conversion file is not
##needed, as shown in create_0. If the source is from a file, url or the database, it must be
##given with file,www or db. If several create lines exist, each will be read in turn, and added
##to a common kb file.
##So this means that:
##create_0: Load from file in variable kb_1 without converting
##create_1: Load from file in variable kb_2 using conversion from file kb_2_filter
##create_2: Load from www using url in variable kb_5 and regular expression in url_regexp
##create_3: Load from database using sql statements in kb_3 and kb_4
create_0 = ,, ,,file,,%(kb_1)s
create_1 = file,,%(kb_2_filter)s,,file,,%(kb_2)s
#create_2 = ,, ,,www,,%(kb_5)s,,%(url_regexp)s
#create_3 = ,, ,,db,,%(kb_4)s,,%(kb_4)s
If the source names a journal like this:
COLLOID SURFACE A---1.98
But in cdsware you are using:
Colloids Surf., A---1.98
By using a conversion file like:
COLLOID SURFACE A---Colloids Surf., A
You can convert the source to the correct naming convention.
Colloids Surf., A---1.98
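The conversion step amounts to a dictionary lookup on the left side of each line. A minimal sketch, assuming both files use the '---' separator; `read_pairs` and `convert_names` are hypothetical helpers rather than the bibrankgkb implementation, and keeping names that have no conversion entry unchanged is an assumption.

```python
def read_pairs(text):
    """Split 'left---right' lines into (left, right) tuples."""
    pairs = []
    for line in text.splitlines():
        if "---" in line:
            left, right = line.rsplit("---", 1)
            pairs.append((left.strip(), right.strip()))
    return pairs

def convert_names(source_text, conversion_text):
    """Rename the left side of each source line using the conversion table."""
    mapping = dict(read_pairs(conversion_text))
    converted = []
    for name, value in read_pairs(source_text):
        # names without a conversion entry are kept unchanged (assumption)
        converted.append("%s---%s" % (mapping.get(name, name), value))
    return converted

source = "COLLOID SURFACE A---1.98\nPhys. Rev., D---3.838"
conversion = "COLLOID SURFACE A---Colloids Surf., A"
print("\n".join(convert_names(source, conversion)))
```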
$ ./bibrankgkb -v9
2004-03-11 17:30:17 --> Running: Generate Knowledge base.
2004-03-11 17:30:17 --> Reading data from file: /log/cdsware-DEMODEV/etc/bibrank/jif.kb
2004-03-11 17:30:17 --> Reading data from file: /log/cdsware-DEMODEV/etc/bibrank/conv.kb
2004-03-11 17:30:17 --> Using last resource for converting values.
2004-03-11 17:30:17 --> Reading data from file: /log/cdsware-DEMODEV/etc/bibrank/jif2.kb
2004-03-11 17:30:17 --> Converting between naming conventions given.
2004-03-11 17:30:17 --> Colloids Surf., A---1.98
2004-03-11 17:30:17 --> Phys. Rev. Lett.---6.462
2004-03-11 17:30:17 --> J. High Energy Phys.---8.664
2004-03-11 17:30:17 --> Nucl. Instrum. Methods Phys. Res., A---0.964
2004-03-11 17:30:17 --> Phys. Lett., B---4.213
2004-03-11 17:30:17 --> Phys. Rev., D---3.838
2004-03-11 17:30:17 --> Total nr of lines: 6
2004-03-11 17:30:17 --> Time used: 0 second(s).