Page MenuHomec4science

websearch-admin-guide.webdoc
No OneTemporary

File Metadata

Created
Wed, Jun 5, 01:01

websearch-admin-guide.webdoc

# -*- mode: html; coding: utf-8; -*-
# This file is part of Invenio.
# Copyright (C) 2007, 2008, 2009, 2010, 2011 CERN.
#
# Invenio is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 2 of the
# License, or (at your option) any later version.
#
# Invenio is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Invenio; if not, write to the Free Software Foundation, Inc.,
# 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
<!-- WebDoc-Page-Title: _(WebSearch Admin Guide)_ -->
<!-- WebDoc-Page-Navtrail: <a class="navtrail" href="<CFG_SITE_URL>/help/admin<lang:link/>">_(Admin Area)_</a> -->
<!-- WebDoc-Page-Revision: $Id$ -->
<p><table class="errorbox">
<thead>
<tr>
<th class="errorboxheader">
WARNING: THIS ADMIN GUIDE IS NOT FULLY COMPLETED
</th>
</tr>
</thead>
<tbody>
<tr>
<td class="errorboxbody">
This Admin Guide is not yet completed. Moreover, some
admin-level functionality for this module exists only in the form of
manual recipes. We are in the process of developing both the
guide as well as the web admin interface. If you are interested
in seeing some specific things implemented with high priority,
please contact us at <CFG_SITE_SUPPORT_EMAIL>. Thanks for your interest!
</td>
</tr>
</tbody>
</table>
<h2>Contents</h2>
<strong>1. <a href="#1">Overview</a></strong><br />
<strong>2. <a href="#2">Edit Collection Tree</a></strong><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.1 <a href="#2.1">Add new collection</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.2 <a href="#2.2">Add collection to tree</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.3 <a href="#2.3">Modify existing tree</a><br />
<strong>3. <a href="#3">Edit Collection Parameters</a></strong><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.1. <a href="#3.1">Modify collection query</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.2. <a href="#3.2">Modify access restrictions</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.3. <a href="#3.3">Modify translations</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.4. <a href="#3.4">Delete collection</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.5. <a href="#3.5">Modify portalboxes</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.6. <a href="#3.6">Modify search fields</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.7. <a href="#3.7">Modify search options</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.8. <a href="#3.8">Modify sort options</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.9. <a href="#3.9">Modify rank options</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.10. <a href="#3.10">Modify output formats</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.11. <a href="#3.11">Configuration of related external collections</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.12. <a href="#3.12">Detailed record page options</a><br />
<strong>4. <a href="#4">External and Hosted Collections</a></strong><br />
<strong>5. <a href="#5">Webcoll Status</a></strong><br />
<strong>6. <a href="#6">Collections Status</a></strong><br />
<strong>7. <a href="#7">Check External Collections</a></strong><br />
<strong>8. <a href="#8">Edit Search Engine Parameters</a></strong><br />
<strong>9. <a href="#9">Search Engine Cache</a></strong><br />
<strong>10. <a href="#10">Search Services</a></strong><br />
<strong>11. <a href="#11">Additional Information</a></strong><br />
<a name="1"></a><h2>1. Overview</h2>
<p>WebSearch Admin interface will help you to configure the search
collections that the end-users see. The WebSearch Admin functionality
can be basically separated into several parts: (i) how to organize
collections into <a href="#2">collection tree</a>; (ii) how to define
and edit <a href="#3">collection parameters</a>; (iii) how to update
collection cache via the <a href="#4">webcoll daemon</a>; and (iv) how
to influence the search engine behaviour and set various <a
href="#5">search engine parameters</a>. These issues will be
subsequently described in the rest of this guide.
<a name="2"></a><h2>2. Edit Collection Tree</h2>
<p>Metadata corpus in Invenio is organized into collections. The
collections are organized in a tree. The collection tree is what the
end-users see when they start to navigate at <a
href="<CFG_SITE_URL>"><CFG_SITE_NAME></a>. The collection tree is similar to what
other sites call Web Directories that organize Web into topical
categories, such as <a href="http://www.google.com/dirhp">Google
Directory</a>.
<p>Note that Invenio permits every collection in the tree to have
either "regular" or "virtual" sons. In other words, every node in the
collection tree may see either regular or virtual branches growing out
of it. This permits to create a tree with very complex, multi-level,
nested structures of regular and virtual branches, if needed, with the
aim to ease navigation to end-users from one branch to another. The
difference between a regular and a virtual branch will be explained in
detail further below in the <a href="#2.2">section 2.2</a>.
<a name="2.1"></a><h3>2.1 Add new collection</h3>
<p>To add a new collection, enter its default name in the default
language of the installation and click on the ADD button
to add it. There are two important actions that you have to perform
after adding a collection:
<ul>
<li>You have to define the set of records that belong to this
collection. This is done by defining a search engine query
that would return all records belonging to this collection.
See hints on <a href="#3.1">modify collection query</a> below.
<li>In order for the collection to appear in the collection
navigation tree, you will have to attach it to some existing
collection in the tree. See hints on <a href="#2.2">add
collection to tree</a> below.
</ul>
<p>After you edit these two things, the collection is fully usable for
the search interface. It will appear in the search interface after
the next run of the <a href="#4">WebColl Daemon</a>.
<p>However, you will probably want to customize further things, like
define collection name translation in various languages, define
collection web page portalboxes, define search options, etc, as
explained in this guide under the section <a href="#3">Edit
Collection Parameters</a>.
<a name="2.2"></a><h3>2.2 Add collection to tree</h3>
<p>To attach a collection to the tree, choose first which collection
do you want to attach, then choose the father collection to attach to,
and then choose the fathership relation type between them (regular,
virtual).
<p>The difference between the regular and the virtual relationship
goes as follows:
<ul>
<li><strong>regular relationship</strong>: If collection A is
composed of B and C, in a way that every document belonging to
A is either B or C, then this schema corresponds to the
regular type of relationship. For example, let A equals to
"Multimedia" and B and C to "Photos" and "Videos",
respectively. The latter collections would then be declared
as regular sons of "Multimedia" and they would appear in the
left-hand-side regular navigation tree entitled "Narrow by
Collection" in the collection tree.
<li><strong>virtual relationship</strong>: In addition to the
regular decomposition of "Multimedia" into "Photos" and
"Videos", it may be advantageous to present a different,
orthogonal point of view on "Multimedia", based not on the
document type as seen above, but rather on the document
creator information. Let us consider that some (large) part
of the multimedia material was created by the "University
Multimedia Service" and some (small) part by an external TV
company such as BBC. It may be advantageous to advertize this
point of view to the end users too, so they they would be able
to easily navigate down to the kind of multimedia material
they are looking for. We can create two more collections
named "University Multimedia Service" and "BBC Pictures and
Videos" and declare them as virtual sons of the "Multimedia"
collection. These collections would then appear in the
right-hand-side virtual navigation tree entitled "Focus on" in
the collection tree.
</ul>
The example presented above would then give us the following picture:
<blockquote>
<pre>
M u l t i m e d i a
Narrow by Collection: Focus on:
-------------------- ---------
[ ] Photos University Multimedia Service
[ ] Videos BBC Pictures and Videos
</pre>
</blockquote>
<p>It is important to note that if a collection A is composed of B and
C as its regular sons, and offers X and Y as its virtual sons, then
every document belonging to A must also belong to either B or C. This
requirement does not apply for X and Y, because X and Y offer only a
"focus-on" orthogonal view on a (possibly small) part of the document
corpus of A. If end-users search the collection A, then they are
actually searching inside B and C, not X and Y. If they want to
search inside X or Y, they have to click upon X or Y first. One can
consider virtual branches as a sort of non-essential searching aid to
the end-user that is activated only when users are interested in a
particular "focus-on" relationship, provided that this "virtual" point
of view on A interests her.
<a name="2.3"></a><h3>2.3 Modify existing tree</h3>
<p>To modify existing tree by WebSearch Admin Interface, click on
icons displayed next to collections. The meaning of icons is as
follows:
<table border="1">
<tr>
<td>
<img border="0" src="<CFG_SITE_URL>/img/iconcross.gif">
</td>
<td>
Remove chosen collection with its subcollections from the collection tree,
but do not delete the collection itself.
(For full deletion of a collection, see <a href="#3.4">section 3.4</a>.)
</td>
</tr>
<tr>
<td>
<img border="0" src="<CFG_SITE_URL>/img/arrow_up.gif"> &nbsp;
<img border="0" src="<CFG_SITE_URL>/img/arrow_down.gif">
</td>
<td>
Move chosen collection up or down among its brothers and sisters, i.e.
change the order of collections inside the same level of the tree.
</td>
</tr>
<tr>
<td>
<img border="0" src="<CFG_SITE_URL>/img/move_from.gif">
<img border="0" src="<CFG_SITE_URL>/img/move_to.gif">
</td>
<td>
Move chosen collection among branches of the tree.
Press the first icon (<img border="0" src="<CFG_SITE_URL>/img/move_from.gif">)
to choose a collection to move, and the second icon
(<img border="0" src="<CFG_SITE_URL>/img/move_to.gif">)
to select a new father collection that the chosen collection should be attached to.
</td>
</tr>
</table>
<a name="3"></a><h2>3. Edit Collection Parameters</h2>
<p>To finalize setting up of a collection, you could and should edit
many parameters, such as define list of records belonging to a
collection, define search fields, define search interface page
portalboxes, etc. In this section we will subsequently describe all
the various possibilities as they are presented in the <a
href="<CFG_SITE_URL>/admin/websearch/websearchadmin.py/editcollection?colID=1">Edit
Collection</a> pages of the WebSearch Admin Interface.
<a name="3.1"></a><h3>3.1 Modify collection query</h3>
<p>The <em>collection query</em> defines which documents belong to the
given collection. It is equal to the search term that retrieves all
documents belonging to the given collection, exactly as you would have
typed it into the search interface. For example, to define a
collection of all papers written by Ellis, you could set up your
collection query to be <code>author:Ellis</code>.
<p>Usually, the collection query is chosen on the basis of the
collection identifier that we store in MARC tag 980. This tag is
indexed in a logical field called <code>collection</code> so that a
collection of Theses could be defined via
<code>collection:THESIS</code>, supposing that every thesis metadata
record has got the text <code>THESIS</code> in MARC tag 980.
(Nitpick: we use the term `collection' in two contexts here: once as a
collection of metadata documents, but also and as a logical field
name. We should have probably called the latter
<code>collectionidentifier</code> or somesuch instead, but we hope the
difference is clear from... the context.)
<p>If a collection does not have any collection query defined, then
its content is defined by means of the content of its descendants
(subcollections). This is the case for composed collections. For
example, the composed collection <em>Articles & Preprints</em> (no
query defined) will be defined as a father of <em>Articles</em>
(query: <code>collection:ARTICLE</code>) and <em>Preprints</em>
(query: <code>collection:PREPRINT</code>). In this case the
collection query for <em>Articles & Preprints</em> can stay empty.
<p>Note that you should avoid defining non-empty collection query in
cases the collection has descendants, since it will prevail and the
descendants may not be taken into account. In the same way, if a
collection doesn't have any query nor any descendants defined, then
its contents will be empty.
<p>To define an external hosted collection set up the query to begin with
<code>hostedcollection:</code> (for more detailed information see <a href="#4">section 4</a>)
<p>To remove the collection query, set the parameter empty.
<a name="3.2"></a><h3>3.2 Modify access restrictions</h3>
<p>Until <em>Invenio-0.92.1</em> there was the possibility to directly
restrict a collection by specifying an Apache group. Users who had an
Apache user and password belonging to the given group would have been able
to access the restricted collection.</p>
<p>Collection restriction managament is now integrated with the wider
<a href="webaccess-admin-guide">Role Based Access Control</a>
facility of Invenio.</p>
<p>In order to restrict access to a collection you just have to create
at least an authorization for the action <code>viewrestrcoll</code>
specifying the name of the collection as the parameter</p>
<a name="3.3"></a><h3>3.3 Modify translations</h3>
<p>You may define translations of collection names into the languages
of your Invenio installation. Moreover, a collection name may be
different in different contexts (e.g. long name, short name, etc), so
that prior to modifying translations you will be asked to select which
name type you want to change.
<p>The interface also lets you customize the labelling (and
translations) of the default collection boxes: "Focus on:", "Narrow
by:" and "Latest addtions:".
<p>The translations aren't mandatory to define. If a translation does
not exist in a language chosen by the end user, the end user will be
shown the collection name in the default language of this
installation.
<p>Note also that the list of available languages depends on the
compile-time configuration (see the general <code>invenio.conf</code>
file).
<a name="3.4"></a><h3>3.4 Delete collection</h3>
<p>The collection to be deleted must be first removed from the
collection tree. Any metametadata associated with the collection
(such as association to portalboxes, association to records belonging
to this collection, etc) will be lost, but the metadata itself will be
preserved (such as portalboxes themselves, records themselves, etc).
In total, association to records, output formats, translations, search
options, sort options, search fields, ranking method, and access
restriction will be lost. Use with care!
<p>It may be a good idea only to remove the collection from the end
users interface, but to keep it "hidden" in a corner they don't see
and that they can't search when they search from Home. To achieve
this, do not delete the collection but simply remove it from the
collection tree so that it won't be attached to any father collection.
In this case the search interface page for this collection will stay
updated, but won't be neither shown in the tree nor searchable from
Home page. It will only be accessible via bookmarked URL, for
example.
<a name="3.5"></a><h3>3.5 Modify portalboxes</h3>
<p>The search interface HTML page for a given collection may be
customized by what we call <em>portalboxes</em>. Portalboxes are used
to show various kinds of information to the end user, such as a text
box with some inline help information about the given collection, an
illustrative picture, etc.
<p>To create a new portalbox, a title and a body must be given, where
the body can contain HTML if necessary.
<p>To add a portalbox to the collection, you must choose an existing
portalbox, the language for which the portalbox should be shown, the
position of the portalbox on the screen, and the ordering score of
portalboxes.
<ul>
<li>The <em>language</em> could be chosen depending on the language
used in the portalbox body. Since a portalbox is not necessarily
bound to one particular language, one portalbox may be reused for
several languages, which is particularly suitable for portalboxes
containing language-independent content such as images.
<li>The <em>position</em> of the portalbox on the screen is chosen
from several predefined positions, such as right-top, before-title,
after-title, before-narrow-by-collection-box, etc. You may present
several portalboxes on the same position in the same language, in
which case they will be shown by the order of decreasing score.
<li>The <em>score</em> defines the order of portalboxes that are to be
presented in the same position and in the same language context.
</ul>
<a name="3.6"></a><h3>3.6 Modify search fields</h3>
<p>The <em>search field</em> is a logical field (such as author,
title, etc) that will be proposed to the end users in Simple and
Advanced Search interface pages. If you do not set any search fields
for a collection, then a default list (author, title, year, etc) will
be shown.
<p>Note that if you want to add a new logical field or modify existing
physical MARC tags for a logical field, you have to use the <a
href="<CFG_SITE_URL>/admin/bibindex/bibindexadmin.py">BibIndex Admin</a> interface.
<a name="3.7"></a><h3>3.7 Modify search options</h3>
<p>The <em>search option</em> is like <a href="#3.6">search field</a>
in a way that it permits the end user to narrow down his search to
some logical field such as "subject", but unlike with the search field
the user is not required to type his query in a free text form;
rather, the search interface proposes to the end user several
interesting predefined values prepared by the administrators that the
end user may choose from. For example, an "author search" concept is
a good example of search field usage, since there is plenty of author
names to be matched, so that the end users would usually type the name
they wish to find in free text form; while a "subject search" concept
is a good example for search option usage, since usually there is a
limited number of subjects in the system given by local subject
classification scheme, that the end users do not necessarily know
about and that they are free to choose from a list. As a rule of
thumb, the search field concept denotes the case of unlimited number
possibilites of distinct values to be matched in a given field
(e.g. author, title, keyword); while the search option concept denotes
the case of only a handful or so distinct values to be matched in a
given field (e.g. subject, division, year).
<p>Search options are shown in the "Advanced Search" interfaces only,
while search fields are shown both in "Simple Search" and "Advanced
Search" interface. (Although if you want to add a search option to
the "Simple Search" interface, you can achieve it by creating
appropriate HTML code in a <a href="#3.5">portalbox</a>.) The search
options order, as well as the order of search option values, may be
defined by means of 'move' arrows in the WebSearch Admin interface.
<p>To add a new search option, a field name must first be chosen (for
example "subject") and then a list of possible field values must be
entered (for example "Mathematics", "Physics", "Chemistry", "Biology",
etc). Note that if you want to add a new logical field or modify
existing physical MARC tags for a logical field, you have to use the
<a href="<CFG_SITE_URL>/admin/bibindex/bibindexadmin.py">BibIndex Admin</a> interface.
<a name="3.8"></a><h3>3.8 Modify sort options</h3>
<p>You may define a list of logical fields that the end users will be
able to choose for the sorting purposes. For example, "first author"
or "year". If you don't select anything, a default list (author,
title, year, etc) will be shown.
<p>Note that if you want to add a new logical field or modify existing
physical MARC tags for a logical field, you have to use the <a
href="<CFG_SITE_URL>/admin/bibindex/bibindexadmin.py">BibIndex Admin</a> interface.
<a name="3.9"></a><h3>3.9 Modify rank options</h3>
<p>To enable a certain rank method for a collection, select the method
from the "enable rank method" box and add it. The documents in this
collection will then be included in the ranking sets the next time the
BibRank daemon will run. To disable a method the process is the same,
but select the method from the 'disable rank method' box.
<p>Note that if you want to add new ranking method or modify existing
ranking method, you have to use the <a
href="<CFG_SITE_URL>/admin/bibrank/bibrankadmin.py">BibRank Admin</a> interface.
<a name="3.10"></a><h3>3.10 Modify output formats</h3>
<p>Each collection may have several output formats defined. The end
users will be able to choose a format they want to see their search
results list in. Most formats like HTML brief or XML Dublin Core are
interesting for each collection, but some formats like HTML portfolio
are only interesting for Photographs collection, not for Articles
collection. The interface will permit you to choose the formats
appropriate for a given collection. The order of formats can be
changed using the 'move' arrows.
<p>Note that if you want to add new output format ('behaviour') or
modify existing output format, you have to use the <a
href="<CFG_SITE_URL>/admin/bibformat/bibformatadmin.py">BibFormat Admin</a> interface.
<a name="3.11"></a><h3>3.11 Configuration of related external collections</h3>
<p>You can customize each collection to provide your users an
additional source of information external to your repository: in a
<i>book</i> collection you might want for example to provide a link to
<i>Amazon</i> items corresponding to the user's query. Futhermore, for
some external services only, you can set the collection to display the
results directly in Invenio search results page.
<p>The following settings are available:
<dl>
<dt>Disabled</dt>
<dd>The external collection is not shown to the user.<dd>
<dt>See also</dt>
<dd>A link to the external collection listing the items corresponding to user's query is displayed (only once a query has been performed).</dd>
<dt>External search</dt>
<dd>User can ask to perform a search in parallel on your repository and on the external collection. Results are shown in the Invenio search results page. Not available for all external collections.<dd>
<dt>External search checked</dt>
<dd>Same as above, but the external collection is searched by default. Not available for all external collections.</dd>
<dl>
<p>You can also apply the settings to sub-collections, by checking the
"<i>Apply also to daughter collections</i>" checkboxes when you apply
your modifications.
<p>Note that in case you have defined an external hosted collection and you are
in fact configuring its related external collections there is no restriction on
setting even itself as "<em>See also</em>", "<em>External search</em>" or
"<em>External search checked</em>"; directly or recursively via the "<i>Apply
also to daughter collections</i>" option. It is up entirely to the admin to
keep a clean and consistent installation (for more detailed information see <a
href="#4">section 4</a>).
<a name="3.12"></a><h3>3.12 Detailed record page options</h3>
<p>These settings let you define how the detailed view (such as <a
href="<CFG_SITE_URL>/<CFG_SITE_RECORD>/1"><CFG_SITE_URL>/<CFG_SITE_RECORD>/1</a>) of records in this
collection will look like. <br/>
More details are available in the <a
href="<CFG_SITE_URL>/help/admin/webstyle-admin-guide#det_page">WebStyle admin
guide</a>.
<p> Please note that since a record might belong to several
collections, conflicts between collection settings might occur. This
is especially true in the case of <i>virtual</i> collections. It is
therefore the settings of the <i>primary collection</i> of the record
which are applied.
<a name="4"></a><h2>4. External and Hosted Collections</h2>
<p>External and hosted collections are a way to provide your users with
additional sources of information. The simplest option is the
"<em>See also</em>" one: it provides a link to the external collection listing
the items corresponding to the user's query. Another option is to set up the
external collection an "<em>External search [checked]</em>". This option implies
a parser implemented for that external collection and allows the user to perform
a parallel search on your repository and on that of the external collection. Read
more on how to set up the above options in section <a href="#3.11">section 3.11</a>.
Also please note that some external resourses might be under copyright restrictions.
<p>Another, more advanced option, are the external hosted collections. The purpose
of these collections is to behave just as if they were local ones. That means the
admin should set them up as local collections and attach them to the tree. These
collections however are not meant to store their records locally but rather to produce
them on the fly when asked to. Once attached to the tree an external hosted collection
appears in the search home page along with its number of records and a small graphic
(arrow in this case) to indicate their being external.
<p>The admin should define a new external collection (any of the above options)
starting with the <code>websearch_external_collections_config.py</code> file, which
consists basically of a python dictionary. Let us go through the process of defining
a new external collection, starting from the dictionary:
<ul>
<li>add a new <code>key:value</code> pair to the dictionary. The key is the
name of the external collection (eg. Amazon Books). The value is another
python dictionary with the parameters of the external collection. Let's go
through these parameters in <code>key:value</code> pairs:<br /><br />
<ul>
<li><code>'engine':the_name_of_engine</code><br />
The name of the search engine (no spaces or special characters allowed
and its implemented python class (eg. for the 'AmazonBooks' engine the
corresponding class should be named AmazonBooksSearchEngine). If not
defined the default ExternalSearchEngine class will be used.
<li><code>'base_url':the_base_url_of_the_external_collection</code><br />
The base url of the external collection, used to create actual hyper
references to the external collection (eg. 'http://books.amazon.com/' ,
'http://www.amazon.com/books/').
<li><code>'search_url':the_search_url_of_the_external_collection</code><br />
The search url of the external collection, to which the search terms
will be later appended and therefore looked up (eg.
'http://books.amazon.com/search.php?title=' ,
'http://www.amazon.com/books/lookup.asp?book=').
<li><code>'parser_params':dictionary_of_the_parameters_of_the_parser</code><br />
The parameters to be passed to the parser. This way a parser can be
dynamically reused for different external collections upon defining
different settings. Let's go through the various parameters:<br /><br />
<ul>
<li><code>'host':the_host_of_the_external_collection</code><br />
The host of the external collection is used to correct the urls
when printing out its results (eg. 'books.amazon.com',
'www.amazon.com').
<li><code>'path':the_path_on_the_host_of_the_external_collection</code><br />
The path, along with the host of the external collection, is used
to correct the urls when printing out its results (eg. '',
'books/').
<li><code>'parser':the_actual_parser_class</code><br />
The actual parser class to be used by the external collection engine.
It should be imported at the beggining of this configuration file
(eg. AmazonBooksExternalCollectionResultsParser,
AmazonExternalCollectionResultsParser).
<li><code>'fetch_format':the_format_to_be_used_to_fetch_data</code><br />
Usually an abbreviated string that defines the format in which
the data should be fetched. The parser must be able to parse this
format (eg. 'hb', 'xm').
<li><code>'num_results_regex_str':the_regular_expression_for_the_number_of_results</code><br />
The regular expression used to calculate the returned number of
results when the external collection is queried (eg.
r'<strong>([0-9,]+?)</strong> records found'). Should preferably
be a python raw string.
<li><code>'num_results_regex_str':the_regular_expression_for_the_total_number_of_records</code><br />
The regular expression used to calculate the total number of records
of an external collection (eg.
r'Searching <strong>([0-9,]+?)</strong> records in total'). This
is to be used by external hosted collections that present their
total number of records in the search home page. Should preferably
be a python raw string.
<li><code>'nbrecs_url':the_url_that_provides_the_total_number_of_records</code><br />
The url that provides information on the total number of records
of an external collection (eg.
'http://books.amazon.com/search.php?show_all=yes'). The regular
expression defined above will be used on the contents of this url.
Again, this is to be used by external hosted collections that
present their total number of records in the search home page.
</ul>
</ul>
</ul>
<p>Once the dictionary <code>key:value</code> pair has been added for the new
external collection the admin should implement (or simply use if already implemented)
the search engine python class defined for this external collection. For the
"<em>See also</em>" option the above steps are sufficient. If the admin wants
to enable the "<em>External search [checked]</em>" option as well a parser must
be (or have been) implemented. Finally to set up an external hosted collection
the admin also has to create a new local collection named exactly as the key of
the external hosted collection's <code>key:value</code> pair in the python
dictionary. The new local collection's query has to begin with
<code>hostedcollection:</code> (under the current configuration it is sufficient
for the query of any external hosted collection to just be defined as
<code>hostedcollection:</code>) and the collection itself has to be attached to
the tree to be visible in the search home page. Note that due to the nature of
external hosted collections their corresponding local collections cannot have any
other collections as sons; in other words they shouldn't have any other branches
growing from them.
<a name="5"></a><h2>5. Webcoll Status</h2>
<p>WebColl is the daemon that normally periodically runs via <a
href="<CFG_SITE_URL>/help/admin/bibsched-admin-guide">BibSched</a> and that updates the
collection cache with the collection parameters configured in the
previous section. Alternatively to running webcoll via BibSched, you
can also run it any time you want from the command line, either for
all collections or for selected collection only. See the --help
option.
<p>The WebSearch Admin interface has got a WebColl Status menu that
shows when the collection cache was last updated and when the next
update is scheduled. It warns in case something suspicious was
discovered.
<a name="6"></a><h2>6. Collections Status</h2>
<p>The Collection Status menu of the WebSearch Admin interface shows
the list of all collections and checks if there is anything wrong
regarding configuration of collections, together with the languages
the collection name has been translated into, etc. Here is the
detailed explanation of the functionality:
<blockquote>
<dl>
<dt><strong>ID</strong>
<dd>ID of the collection.
<dt><strong>Name</strong>
<dd>Name of the collection.
<dt><strong>Query</strong>
<dd>The collection definition query. Note that it should be empty if
a collection got subcollections. If not, then a query is needed.
<dt><strong>Subcollections</strong>
<dd>The subcollections that the collection is composed of. Note that
a collection which got defined by a query should not have any
subcollections.
<dt><strong>Restricted</strong>
<dd>A restricted collection can only be accessed by users belonging to
the Apache groups mentioned in this column.
<dt><strong>Hosted</strong>
<dd>A hosted collection is practicly an external one behaving just as if it were local.
<dt><strong>I18N</strong>
<dd>Show which languages the collection name has been translated into.
<dt><strong>Status</strong>
<dd>If no errors was found, <em>OK</em> is displayed for each
collection. If an error was found, then an error number and short
message are shown. The meaning of the error messages is the
following: <em>1:Conflict</em> means that the collection was defined
via a query but also via subcollections too; <em>2:Empty</em> means
that the collection wasn't defined neither via query nor via
subcollections.
</dl>
</blockquote>
<a name="7"></a><h2>7. Check External Collections</h2>
<p>The Check External Collections menu of the WebSearch Admin interface is a
simple tool to check and control the consistency of the external collections
the user has defined. External collections exist both in their own database
table as well in a user defined configuration file. This tool will check the
consistency between the two and report back to the user giving them the
option to fix any potential inconsistencies.
<a name="8"></a><h2>8. Edit Search Engine Parameters</h2>
<a name="9"></a><h2>9. Search Engine Cache</h2>
<a name="10"></a><h2>10. Search Services</h2>
<p>Search services are meant to display information contextual to a
search query in very specialized way, in the sense that they can
search/retrieve/display data beyond the traditional concept of
records. Typical search services could for example include:
<ul>
<li>Spell-check user queries by calling an external spellchecking library, and offering "<em>Did you mean ...?</em>" options.</li>
<li>Parse user input and display an author profile when searching for a well-defined author.</li>
<li>Search for submission names matching the user input.</li>
<li>Retrieve phone number from the institutional LDAP.</li>
<li>Etc.</li>
</ul>
</p>
<p>More information about search services might be found in each
plug-in file. An
advanced <a href="/help/hacking/search-services">Search Services hacking guide</a>
is also available.</p>
<a name="11"></a><h2>11. Additional Information</h2>
<a href="<CFG_SITE_URL>/help/hacking/search-engine-internals">WebSearch Internals</a>

Event Timeline