diff --git a/modules/bibindex/doc/admin/guide.html.wml b/modules/bibindex/doc/admin/guide.html.wml
index 9213e8df7..4f27fd6a8 100644
--- a/modules/bibindex/doc/admin/guide.html.wml
+++ b/modules/bibindex/doc/admin/guide.html.wml
@@ -1,139 +1,139 @@
 ## $Id$
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 #include "cdspage.wml" \
     title="BibIndex Admin Guide" \
     navtrail_previous_links="/admin/> > /admin/bibindex/>BibIndex Admin" \
     navbar_name="admin" \
     navbar_select="bibindex-admin-guide"

WARNING: BIBINDEX ADMIN GUIDE IS UNDER DEVELOPMENT
The BibIndex Admin Guide is not yet complete. Most of the admin-level functionality for BibIndex exists only in command-line mode. We are in the process of developing both the guide and the web admin interface. If you are interested in seeing some specific things implemented with high priority, please contact us at . Thanks for your interest!

Version <: print generate_pretty_revision_date_string('$Id$'); :>

Contents

1. Overview
2. Configure Metadata Tags and Fields
       2.1 Configure Physical MARC Tags
       2.2 Configure Logical Fields
3. Configure Word/Phrase Indexes
       3.1 Define New Index
       3.2 Configure Word-Breaking Procedure
       3.3 Configure Stopwords List
       3.4 Configure Stemming
       3.5 Configure Word Length
       3.6 Configure Removal of HTML Code
       3.7 Configure Accent Stripping
4. Run BibIndex Daemon

1. Overview

2. Configure Metadata Tags and Fields

2.1 Configure Physical MARC Tags

2.2 Configure Logical Fields

3. Configure Word/Phrase Indexes

3.1 Define New Index

 To define a new index, you must first give the index an internal name. An empty
 index is then created by preparing the database tables. 
 
 Before the index can be used for searching, the fields that should be included
 in the index must be selected.
 
 To fill the index based on the selected fields, schedule the update by
 running 'bibindex -w indexname' together with any other
 desired parameters.
 

3.2 Configure Word-Breaking Procedure

 Word breaking can be configured by changing 'cfg_chars_alphanumericseparators' and
 'cfg_chars_punctuation' in 'bibindex_engine_config.py'.
 
 How words are broken up defines what is added to the index. Should only
 "director-general" be added, or should "director", "general" and "director-general"
 all be added? The index can vary between 300 000 and 3 000 000 terms depending on the
 policy for breaking words.
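The effect of the word-breaking policy can be sketched as follows. This is only an illustration, not the BibIndex implementation: the separator pattern, the function name and the policy names are assumptions made for this example, and the real separators are whatever 'cfg_chars_alphanumericseparators' and 'cfg_chars_punctuation' are set to.

```python
import re

# Hypothetical separator pattern, chosen for this example only; the real
# value is configured in bibindex_engine_config.py and may differ.
SEPARATORS = re.compile(r"[-/.]")


def break_words(text, policy="both"):
    """Return index terms for `text` under one of three illustrative policies.

    "compound": keep only the unbroken token ("director-general")
    "parts":    keep only the pieces ("director", "general")
    "both":     keep the unbroken token and its pieces
    """
    terms = []
    for token in text.lower().split():
        parts = [p for p in SEPARATORS.split(token) if p]
        if len(parts) <= 1:
            # Nothing to break up: index the word as it is.
            terms.extend(parts)
        elif policy == "compound":
            terms.append(token)
        elif policy == "parts":
            terms.extend(parts)
        else:
            terms.append(token)
            terms.extend(parts)
    return terms


for policy in ("compound", "parts", "both"):
    print(policy, break_words("Director-General", policy))
```

The "both" policy yields three terms for one input word, which is why the total number of terms in the index depends so strongly on the policy.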
 

3.3 Configure Stopwords List

 BibIndex supports stopword removal: words that exist in a given stopword
 list are not added to the index. Stopword removal makes the index smaller by leaving out
 very frequently used words.
 
 The stopword list to use can be configured in the bibindex_engine_config.py
-file by changing the value of the variable cfg_path_stopwordlist. If no stopword list should
+file by changing the value of the variable cfg_path_to_stopwords_file. If no stopword list should
 be used, the value should be None.
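A minimal sketch of stopword removal, for illustration only. It assumes a stopword list with one word per line, which is a guess at the file format and not confirmed by this guide, and the helper names are hypothetical. Passing None mirrors the "no stopword list" setting described above.

```python
def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (assumed: one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(words, stopwords):
    """Drop words found in `stopwords`; None means stopword removal is disabled."""
    if stopwords is None:
        return list(words)
    return [w for w in words if w.lower() not in stopwords]


# In practice the lines would come from the configured stopword file,
# e.g. open(path_to_stopwords_file); a literal list is used here.
stopwords = load_stopwords(["the", "of", "and", ""])
print(remove_stopwords(["the", "theory", "of", "everything"], stopwords))
print(remove_stopwords(["the", "theory", "of", "everything"], None))
```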
 

3.4 Configure Stemming

 The BibIndex indexer supports stemming, which removes the endings of words and thus creates a
 smaller index. For example, in English, the word "information" will be stemmed to
 "inform", and "looking", "looks" and "looked" will all be stemmed to "look", thus giving more
 hits for each word.
 
 Currently only one stemmer is supported, so the stemmer language should be selected based on the
 most used language. All searches will also be stemmed based on the same language. For documents
 in other languages, it makes no difference whether the stemmer is used or not.
 
 The currently supported stemmer handles the following languages:
 French, English, Norwegian, Swedish, German, Italian and Portuguese.
 
 To use a stemmer other than the default one, the file 'bibindex_engine_stemmer.py'
 must be changed to support the desired stemmer's interface.
 
 To change the default language to use for the stemmer, change the variable 
-'cfg_use_stemmer_lang' in 'bibindex_engine_config.py'. 
+'cfg_stemmer_default_language' in 'bibindex_engine_config.py'. 
 To disable the stemmer, set the value to None.
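The following toy example shows why both the indexed words and the search terms have to be stemmed in the same way. The suffix-stripping function is a deliberately crude stand-in written for this illustration; it is not the stemmer used by BibIndex, and all names in it are hypothetical.

```python
# Toy suffix list, sufficient only for the examples in this section.
SUFFIXES = ("ation", "ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stemmed word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(toy_stem(word), set()).add(doc_id)
    return index


def search(index, query):
    """Stem the query with the same function before looking it up."""
    return index.get(toy_stem(query), set())


docs = {1: "looking for information", 2: "she looked", 3: "he looks"}
index = build_index(docs)
print(sorted(search(index, "looks")))
print(sorted(search(index, "information")))
```

Because "looking", "looked" and "looks" all share the index entry "look", a query for any one of them finds all three documents, and the index holds one entry instead of three.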
 

3.5 Configure Word Length

 By setting the value of 'cfg_min_word_length' in 'bibindex_engine_config.py'
 higher than 0, only words with more characters than this value will be added
 to the index.
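As a small sketch of this rule, assuming the comparison is strict as the wording above suggests (the function name is hypothetical and the actual BibIndex check may differ):

```python
def filter_by_length(words, min_word_length=0):
    """Keep only words with more characters than `min_word_length`."""
    return [w for w in words if len(w) > min_word_length]


words = ["a", "of", "cern", "index"]
print(filter_by_length(words, 0))
print(filter_by_length(words, 2))
```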
 

3.6 Configure Removal of HTML Code

 By setting the value of 'cfg_remove_html_code' in 'bibindex_engine_config.py'
 to True, the indexer will try to remove all HTML code from documents before indexing, and
 index only the remaining text. Setting it to False disables this. (HTML code is defined as
 everything between '<' and '>' in a text.)
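The definition above can be sketched with a regular expression. This is an illustration of the stated rule, not the BibIndex code, and the names are made up for the example.

```python
import re

# "Everything between '<' and '>'", taken literally.
HTML_TAG = re.compile(r"<[^>]*>")


def remove_html(text):
    """Replace each <...> span with a space and normalise whitespace."""
    return " ".join(HTML_TAG.sub(" ", text).split())


print(remove_html("<p>Higgs <b>boson</b> search</p>"))
print(remove_html("a < b and c > d"))
```

Note that such a literal rule also removes ordinary text that happens to sit between a '<' and a later '>', as the second call shows. This may be one reason the guide says the indexer will only "try" to remove HTML code.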
 

3.7 Configure Accent Stripping

4. Run BibIndex Daemon