diff --git a/modules/bibformat/doc/admin/guide.html.wml b/modules/bibformat/doc/admin/guide.html.wml
index c626cd637..3883e5348 100644
--- a/modules/bibformat/doc/admin/guide.html.wml
+++ b/modules/bibformat/doc/admin/guide.html.wml
@@ -1,2539 +1,2558 @@
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
#include "cdspage.wml" \
title="BibFormat Admin Guide" \
navtrail_previous_links="/admin/>_(Admin Area)_ > /admin/bibformat/>BibFormat Admin" \
navbar_name="admin" \
navbar_select="bibformat-admin-guide"
Version <: print generate_pretty_revision_date_string('$Id$'); :>
Please note that the old PHP BibFormat administration guide can be found
further below.
BibFormat is in charge of formatting the bibliographic records that
are displayed to your users. It is called by the search engine when it has to
format a record.
As you might need different kinds of formatting depending
on the type of record, and potentially have a huge number of records in your database, you cannot specify
for each of them how it should look. Instead, BibFormat uses a rule-based decision process
to decide how to format a record.
The best way to understand how BibFormat works is to have a look at
a typical workflow:
Step 1:
When CDS Invenio has to display a record, it
asks BibFormat to format the record with the given output format
and language. For example here the requested output format is
hd, which is a short code
for "HTML Detailed". This means that somehow a user arrived on
the page of the record and asked for a detailed view of the
record.
Step 2:
Beside is a screenshot of the "hd" or "HTML Detailed" output format.
You can see that the output format does not specify how to format the record, but
contains a set of rules which define which template must be used.
The rules are evaluated from top to bottom.
Each rule defines a condition on a field of the record, and a format template to use to
format the record if the condition matches.
Let's say that the field 980.a of the record is equal to
"Picture". Then the first rule matches, and the format template
Picture HTML Detailed is
used for formatting by BibFormat.
You can add, remove or edit output formats here
Step 3:
We see an extract of the Picture HTML Detailed format on the right,
as it is shown in the template editor. As you can see, it
is mainly written using HTML. There are however some tags that
are not part of standard HTML. Tags that start with
<BFE_ are placeholders for the record values. For
example <BFE_MAIN_TITLE/> tells BibFormat to write the title
of the record. We call these tags "elements". Some
elements have parameters. This is the case of the <BFE_AUTHORS> element,
which can take separator and link as
parameters. The value of separator is used to separate
authors' names, and the link parameter tells whether links to authors'
websites have to be created.
All elements are described in the elements documentation.
You can add, remove or edit format templates here
Step 4:
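from urllib import quote  # Python 2 location of quote()
from invenio.config import weburl  # base URL of the installation (assumed import)

def format(bfo, separator='; ', link='no'):
    """
    Prints the list of authors for the record

    @param separator a character to separate the authors
    @param link if 'yes' print HTML links to authors
    """
    # All values of MARC field 100__a (author names)
    authors = bfo.fields("100__a")
    if link == 'yes':
        # Wrap each name in a link to an author search
        authors = map(lambda x: '<a href="' + weburl + '/search?f=author&p='
                      + quote(x) + '">' + x + '</a>', authors)
    # Join the names into a single string, as an element must return text
    return separator.join(authors)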
A format element is written in Python. It acts as a bridge
between the record in the database and the format
template. Typically you will not have to write or read format
elements, just call them from the templates. Each element outputs
some text that is written in the template where it is called.
Developers can add a new element by creating a new file, naming it
after the element, and writing a Python format
function that takes as parameters the parameters of the element
plus a special one, bfo. Regular Python code can be
used, including imports of other modules.
In summary, BibFormat is called by specifying a record and an output
format. The output format relies on different templates to do the formatting, and
the templates themselves rely on different format elements. Only developers need to modify
the format elements layer.
(Diagram: one Output Format uses several Templates, and each Template uses several Format Elements.)
You should now understand the philosophy behind BibFormat.
Let's try to create our own format.
This format will just print the title of a record.
First go to the main BibFormat admin page.
Then click on the "Manage Output Formats" link. You will see the list of all output formats:
This is where you can delete, create or check output formats.
The menu at the top of the page lets you go to other administration pages.
Click on the "Add New Output Format" button at the bottom of the page. You can then fill in some attributes
for the output format. Choose "title" as code, "Only Title" as name and "Prints only title" as description:
Leave the other fields blank, and click on the button "Update Output format Attributes". You are then
redirected to the rules editor. Notice the menu at the top, which lets you close the editor, change the attributes again
and check the output format. However do not click on these links before saving your modifications of the rules!
As our format does not need to behave differently depending on the record, we do not need to add new rules to the format. You just need to select a format template in the "By default use" list. However we first have to create our special format template that only prints titles. So close the editor using the menu at the top of the page, and in the menu that appears in its place, click on "Manage Format Templates". As with output formats, you see the list of format templates.
Click on the "Add New Format Template" button at the bottom of the page. As for the output format, fill in the attributes of the template with name "Title" and any relevant description.
Click on the "Update Output Format Attributes" button. You are redirected to the template editor. The editor is divided into three parts. The upper left part contains the code of the template. The bottom part is a preview of the template. The part on the right side is a short reminder of the format elements you can use in your template. You can hide this documentation by clicking on "Hide Documentation".
The above screenshot shows the template code already filled in. It calls the BFE_TITLE element. If you do not know the name of the element you want to call, you can search for it using the embedded documentation search. You can try to add other elements into your template, or write some HTML formatting.
When you are satisfied with your template, click on the save button, close the editor and go back to the "Only Title" output format rules editor. There, select the template you have just created in the "By default use" list and save the output format. You are done.
This tutorial does not cover all aspects of the management of formats (for example "Knowledge bases" or internationalization). It also does not show all the power of output formats, as the one we have created simply calls a template. However you have seen enough to configure BibFormat through the web interface. Read the sections below to learn more about it.
BibFormat can be administered in two ways. The first way is to use the provided web interface. It should be the most
convenient way for most users. The web interface is simple to use and provides great tools to manage your formats. Its only limitation concerns the format elements, which cannot be modified using it (but the web interface provides dynamically generated documentation of your elements).
The other way to administer BibFormat is to directly modify the configuration files using your preferred text editor. This approach gives much power to advanced users, but requires access to the server's files. It also requires that users double-check their modifications, or use the web interface to ensure the validity and correctness of their formats.
In this manual we show both ways. For each explanation we first show how to do it through the web interface, then how to do it by manipulating the configuration files. Non-power users can stop reading as soon as they encounter the text "For developers and adventurers only".
We generally recommend using the web interface, except for writing
format elements.
As you potentially have a huge amount of
bibliographic records, you cannot specify manually for each of them
how it should be formatted. This is why you can define rules that will
allow BibFormat to understand which kind of formatting to apply to a given
record. You define this set of rules in what is called an "output
format".
You can have different output formats, each with its own characteristics.
For example, when multiple bibliographic records are
displayed at the same time (as happens in search results), you certainly want only
short versions to be shown to the user, while a detailed record is
preferable when a single record is displayed, whatever the type of the record.
You might also want to
let your users decide which kind of output they want. For example you
might need to display HTML for regular web browsing, but also want to
offer a BibTeX version of the bibliographic reference for direct
inclusion in a LaTeX document.
To summarize, an output format groups similar kinds of formats, specifying which kind
of formatting has to be done, but not how it has to be done.
To add a new output format, go to the Manage Output Formats page and click on the "Add New Output Format" button at the bottom of the page. The format is then created, and you can specify its attributes. See Edit the Attributes of an Output Format to learn more about it.
For developers and adventurers only:
Alternatively you can directly add a new output format file into the
/etc/bibformat/outputs/ directory of your CDS Invenio installation, if you have
access to the server's files. Use the format extension .bfo for your file.
You should also check that user www-data has read/write access to the file,
if you want to be able to modify the rules through the web interface.
To remove an output format, go to the Manage Output Formats page and click on the "Delete" button facing the output format you want to delete. If you cannot click on the button (the button is not enabled), this means that you do not have sufficient privileges to do so (the format is protected; contact the administrator of the system).
For developers and adventurers only:
You can directly remove an output format from the /etc/bibformat/outputs/ directory of your CDS Invenio installation.
However you must make sure that it is also removed from the tables format and formatname in the database, so that other modules know that it is no longer available.
When you create a new output format, you can at first only specify the default template,
that is the one which is used when all rules fail. In the case of a basic output format,
this is enough. You can however add other rules, by clicking on the "Add New Rule" button.
Once you have added a rule, you can fill it with a condition, and a template that should be used
if the condition is true. For example the rule
will use the template named "Picture HTML Detailed" if the field 980.a of the record to format is equal to "Picture".
Note that the text "PICTURE" matches regardless of letter case, so "picture" and "Picture" match too.
Leading and trailing spaces are also ignored (" Picture " will match "PICTURE").
Tip: you can use a regular expression as text. For example "PICT.*" will match "pictures"
and "PICTURE".
The above configuration will use the format template "Default HTML Detailed" if all the rules above fail (in this case,
if field 980.a is different from "PICTURE"). If you have more rules, you decide in which order the conditions are evaluated. You can reorder rules by clicking on the small arrows on the left of the rules.
Note that when you are migrating your output formats from the old PHP BibFormat, you might not have translated all the formats to which your output formats refer. In that case you should use the "defined in old BibFormat" option in the format templates menu, to make BibFormat understand that a match for this rule must trigger a call to the Behaviour of the old BibFormat. See the section on Run old and new formats side by side for more details on this.
For developers and adventurers only:
To write an output format, use the following syntax:
First you
define which field code you use as the condition for the rule.
You suffix it with a colon. Then, on the next lines, define the values of
the condition, each followed by --- and then the filename of the template
to use:
This means that if value of field 980.a is equal to PICTURE, then we
will use format template PICTURE_HTML_BRIEF.bft. Note that you must
use the filename of the template, not the name. Also note that spaces
at the end or beginning are not considered. On the following lines,
you can either put other conditions on tag 980.a, or add another tag on
which you want to put conditions.
At the end you can add a default condition:
default: PREPRINT_HTML_BRIEF.bft
which means that if no condition is matched, a format suitable for
Preprints will be used to format the current record.
The output format file could then look like this:
tag 980.a:
PICTURE --- PICTURE_HTML_BRIEF.bft
PREPRINT --- PREPRINT_HTML_BRIEF.bft
PUBLICATION --- PUBLICATION_HTML_BRIEF.bft
tag 8560.f:
.*@cern.ch --- SPECIAL_MEMBER_FORMATTING.bft
default: PREPRINT_HTML_BRIEF.bft
You can add as many rules as you want. Keep in mind that they are read
in the order they are defined, and that only the first rule that
matches will be used.
Notice the condition on tag 8560.f: it uses a regular expression to
match any email address that ends with @cern.ch (the regular
expression must be understandable by Python).
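As an illustration of the rule syntax and evaluation order described above, here is a minimal sketch in Python. It is not BibFormat's own code: the function names parse_output_format and choose_template, and the use of a plain dictionary as the record, are assumptions made for this example. It also assumes that a condition must match the whole field value, ignoring letter case and surrounding spaces.

```python
import re

def parse_output_format(text):
    """Parse rule lines into (tag, pattern, template) triples plus a default."""
    rules, default, current_tag = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower().startswith("default:"):
            default = line.split(":", 1)[1].strip()
        elif line.lower().startswith("tag ") and line.endswith(":"):
            current_tag = line[4:-1].strip()
        elif "---" in line and current_tag is not None:
            pattern, template = line.split("---", 1)
            rules.append((current_tag, pattern.strip(), template.strip()))
    return rules, default

def choose_template(rules, default, record):
    """Return the template of the first matching rule, else the default.

    `record` is a hypothetical dict mapping a field code to its value.
    """
    for tag, pattern, template in rules:
        value = record.get(tag, "").strip()
        # Whole-value, case-insensitive regular expression match (assumption).
        if re.fullmatch(pattern, value, re.IGNORECASE):
            return template
    return default

EXAMPLE = """
tag 980.a:
PICTURE --- PICTURE_HTML_BRIEF.bft
PREPRINT --- PREPRINT_HTML_BRIEF.bft
tag 8560.f:
.*@cern.ch --- SPECIAL_MEMBER_FORMATTING.bft
default: PREPRINT_HTML_BRIEF.bft
"""

rules, default = parse_output_format(EXAMPLE)
print(choose_template(rules, default, {"980.a": " Picture "}))
```

Because the rules are kept in a list and scanned in order, moving a rule up or down changes which template wins when several conditions match.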
code: a short identifier that is used to identify the output format. It must be unique and contain a maximum of 6 letters. Note that the code is not case sensitive ("HB" is equal to "hb").
content type: this is the content type of the format, specified as a MIME type. For example if you were to produce an Excel output, you could use application/ms-excel as content type. If a content type is specified, CDS Invenio will not print the usual header and footer for the page, but will trigger a download in the client's browser when viewing the page (unless the browser handles this content type).
name: a generic name to display in the interface for this output format.
(*) name: internationalized names for the output format, used for displaying localized name in the search interface.
description: an optional description for the output format.
Please read this information regarding output format codes:
There are some reserved codes that you should not use, or at least be aware of when choosing a code for your
output format. The table below summarizes these special words:
Code
Purpose
HB
Used for displaying list of results of a search.
HD
Used when no format is specified when viewing a record.
HM
Used for Marc output. The format is special in the sense that it filters
fields to display according to the 'ot' GET parameter of the HTTP request.
Starting with letter 't'
Used for displaying the value of the field specified by the 'ot' GET parameter of the HTTP request.
Starting with 3 digits
Used for displaying the value of the field specified by the digits.
For developers and adventurers only:
Except for the code, output format attributes cannot be changed in the output format file. These
attributes are saved in the database. As for the code, it is the name of the output format file,
without its .bfo extension. If you change this name, do not forget to propagate the modification in the database.
To check the dependencies of an output format on format templates, format elements and tags,
go to the Manage Output Formats page, click on
the output format you want to check, and then in the menu click on "Check Dependencies".
The next page shows you:
the format templates which might be called by the rules of the output format
To check the validity of an output format, simply go to the Manage Output Formats page, and look at the column 'status' for the output format you want to check. If message "Ok" is there,
then no problem was found with the output format. If message 'Not Ok' is in the column, click on it to see
the problems that have been found for the output format.
A format template defines how a record should be formatted. For example it specifies which fields of the record are to be displayed, in which order and with which visual attributes. Basically the format template is written in HTML, so that it is easy for anyone to edit it.
To add a new format template, go to the Manage Format Templates page and click on the "Add New Format Template" button at the bottom of the page. The template is then created, and you can specify its attributes. See Edit the Attributes of a Format Template to learn more about it.
For developers and adventurers only:
Alternatively you can directly add a new format template file into the
/etc/bibformat/format_templates/ directory of your CDS Invenio installation, if you have
access to the server's files. Use the format extension .bft for your file.
You should also check that user www-data has read/write access to the file,
if you want to be able to modify the code and the attributes of the template through the web interface.
To remove a format template, go to the Manage Format Templates page and click on the "Delete" button facing the format template you want to delete. If you cannot click on the button (the button is not enabled), this means that you do not have sufficient privileges to do so (the format is protected; contact the administrator of the system).
For developers and adventurers only:
You can directly remove the format template from the /etc/bibformat/format_templates/ directory of your CDS Invenio installation.
You can change the formatting of records by modifying the code of a template.
To edit the code of a format template
go to the Manage Format Templates page. Click on
the format template you want to edit to load the template editor.
The format template editor contains three panels. The upper left panel is the code editor. This is where
you write the code that specifies the formatting of a template. The right-most panel is a short documentation
on the "bricks" you can use in your format template code. The panel at the bottom of the page allows you to preview the template.
The following sections explain how to write the code that specifies the formatting.
The first thing you have to know before editing the code is that everything you write in the
code editor is printed as such by BibFormat. Well almost everything (as you will discover later).
For example if you write "My Text", then for every record the output will be "My Text". Now let's say
you write "<b>My Text</b>": the output will still be "<b>My Text</b>", but as we display in a web browser, it will look like
"My Text" (The browser interprets the text inside tags <b></b> as "bold". Also note that the look may depend on the CSS style of your page).
Basically it means that you can write HTML to do the formatting. If you are not experienced with HTML you can use an HTML editor to create your layout, and then copy-paste the HTML code inside the template.
Do not forget to save your work by clicking on the save button before you leave the editor!
For developers and adventurers only:
You can edit the code of a template using exactly the same syntax as in the web interface. The code of the template
is in the template file located in the /etc/bibformat/format_templates/ directory of your CDS Invenio installation. You just
have to take care of the attributes of the template, which are saved in the same file as the code. See Edit the Attributes of a Format Template to learn more about it.
To add dynamic behaviour to your format templates, for example to display a different title
for each record or a different background color depending on the type of record, you can use format elements.
Format elements are the smart bricks you can copy-paste into your code to get the parts of the template
that change depending on the record. A format element looks like a regular HTML tag.
For example, to print
the title of a record, you can write <BFE_TITLE /> in your template code where you want to display the title.
Format elements can take values as parameters. This allows you to customize the behaviour of an element. For example you can write <BFE_TITLE prefix="Title: " />, and BibFormat will take care of printing the title for you, with prefix "Title: ". The difference between Title: <BFE_TITLE /> and <BFE_TITLE prefix="Title: " /> is that the first option will always write "Title: " while the second one will only print "Title: " if there exists a title for the record in the database. Of course it is likely that every record has a title, but this can be useful for less common fields.
Some parameters are available for all elements. This is the case for the following ones:
prefix: a prefix printed only if the record has a value for the element.
suffix: a suffix printed only if the record has a value for the element.
default: a default value printed if the record has no value for the element. In that case prefix and suffix are not printed.
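The behaviour of these three common parameters can be summarized by a small sketch. The helper name apply_common_parameters is hypothetical and only restates the rules listed above; it is not part of BibFormat.

```python
def apply_common_parameters(value, prefix="", suffix="", default=""):
    """Wrap an element's value the way the common parameters are described.

    `value` is the text the element produced for the record ('' if none).
    """
    if value:
        return prefix + value + suffix
    # No value: print the default alone, without prefix or suffix.
    return default

print(apply_common_parameters("Higgs boson", prefix="Title: "))
```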
Some parameters are specific to elements. To get information on all available format elements you can read the Format Elements Documentation, which is generated dynamically for all existing elements. It shows you what each element does and which parameters it can take.
While format elements look like HTML tags, they differ from traditional ones in the following ways:
A format element is a single tag: you cannot have <BFE_TITLE>some text</BFE_TITLE> but only <BFE_TITLE />.
The values of the parameters accept any characters, including < and >. The only limitation is that you cannot use the type of quotes that delimit that value: you can have for example <BFE_TITLE someParam="a lot of single quotes ' ' ' ' "/> or <BFE_TITLE someParam='a lot of double quotes " " " '/>, but not <BFE_TITLE someParam="a lot of same quotes as delimiter " " " "/>.
Format elements names always start with BFE_.
A format element can span multiple lines.
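To make these syntax rules concrete, here is a rough sketch of how element calls could be located in a template with regular expressions. The names VALUE, ELEMENT, PARAM and find_elements are made up for this example, and BibFormat's real parser may work differently.

```python
import re

# A quoted value: anything except the quote character that delimits it.
VALUE = r'''(?:"[^"]*"|'[^']*')'''

# One element call: <BFE_NAME param="value" ... /> (may span several lines).
ELEMENT = re.compile(
    r"<BFE_(\w+)((?:\s+\w+\s*=\s*" + VALUE + r")*)\s*/>", re.IGNORECASE
)
PARAM = re.compile(r"(\w+)\s*=\s*(" + VALUE + ")")

def find_elements(template):
    """Return (name, params) for each element call found in `template`."""
    calls = []
    for match in ELEMENT.finditer(template):
        # Strip the delimiting quotes from each parameter value.
        params = {key: quoted[1:-1]
                  for key, quoted in PARAM.findall(match.group(2))}
        calls.append((match.group(1).upper(), params))
    return calls

TEMPLATE = (
    '<h1><BFE_TITLE prefix="Title: " /></h1>\n'
    "<BFE_AUTHORS separator=' <b>and</b> '\n"
    '   link="yes"/>'
)
print(find_elements(TEMPLATE))
```

Note how the separator value contains < and > without confusing the parser, because a value only ends at the quote character that opened it.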
Tip: you can use the special element <BFE_FIELD tag="" /> to print the value
of any field of a record in your templates. This practice is however not
recommended because you would have to revise all format
templates if the meaning of the MARC codes changed.
To preview a format template go to the Manage Format Templates page and click on the format template you want to preview to open the template editor. The editor contains a preview panel at the bottom of the page.
Simply click on the "Reload Preview" button to preview the template (you do not need to save the code before previewing).
Use the "Language" menu to preview the template in a given language.
You can fill in the "Search Pattern" field to preview a specific record. The search pattern uses exactly the same
syntax as the one used in the web interface. The only difference with the regular search engine is that only the first matching record is shown.
For developers and adventurers only:
If you do not want to use the web interface to edit the templates but still would like to get previews, you can open the preview frame of any format in a new window/tab. In this mode you get a preview of the template (if it is placed in the /etc/bibformat/format_templates/ directory of your CDS Invenio installation). The parameters of the preview are specified in the url:
bft: the filename of the format template to preview
ln: the language to use for the preview
pattern_for_preview: the search pattern to use for the preview
You can add translations to your format templates. To do so enclose the text you want to localize
with tags corresponding to the two letters of the language. For example if we want to localize "title", write <en>Title</en>. Repeat this for each language in which you want to make "title" available: <en>Title</en><fr>Titre</fr><de>Titel</de>.
Finally enclose everything with <lang> </lang> tags: <lang><en>Title</en><fr>Titre</fr><de>Titel</de></lang>
For each <lang> group only the text in the user's language is displayed. If user's language is not
available in the <lang> group, your default CDS Invenio language is used.
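The selection of the text inside a <lang> group can be pictured with the following sketch. The function filter_languages and its default_ln argument are assumptions for illustration; they only mimic the behaviour described above and are not taken from BibFormat.

```python
import re

LANG_GROUP = re.compile(r"<lang>(.*?)</lang>", re.DOTALL | re.IGNORECASE)
TRANSLATION = re.compile(r"<(\w{2})>(.*?)</\1>", re.DOTALL)

def filter_languages(template, ln, default_ln="en"):
    """Keep, in each <lang> group, only the text for language `ln`."""
    def pick(match):
        texts = dict(TRANSLATION.findall(match.group(1)))
        # Fall back to the default language, then to nothing at all.
        return texts.get(ln, texts.get(default_ln, ""))
    return LANG_GROUP.sub(pick, template)

CODE = "<lang><en>Title</en><fr>Titre</fr><de>Titel</de></lang>: <BFE_TITLE />"
print(filter_languages(CODE, "fr"))
```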
To edit the attributes of a format template
go to the Manage Format Templates page, click on
the format template you want to edit, and then in the menu click on "Modify Template Attributes".
A format template contains two attributes:
Name: the name of the template
Description: a short description of the template
Note that changing these parameters has no impact on the formatting. Their purpose is only to
document the template.
If the name you have chosen already exists for another template, your name will be suffixed with an integer so that the name is unique.
You should also be aware that if you change the name of a format template, all output formats that were linking to this template will be changed to match the new name.
For developers and adventurers only:
You can change the attributes of a template by editing its file in the /etc/bibformat/format_templates/ directory of your CDS Invenio installation. The attributes must be enclosed with tags <name> </name> and <description> </description> and should ideally be placed at the beginning of the file.
Also note that the admin web interface tries to keep the name of the template in sync with the filename of the template. If the name is changed through the web interface, the filename of the template is changed, and all output formats that use this template are updated. You have to update the output formats manually if you change the filename of the template without the web interface.
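As a rough illustration of this file layout, the sketch below separates the two attributes from the rest of the template code. The function read_attributes is hypothetical, and treating everything outside the two tags as template code is an assumption of this example.

```python
import re

def read_attributes(template_text):
    """Return (name, description, code) for the text of a template file."""
    attributes = {}
    code = template_text
    for tag in ("name", "description"):
        pattern = re.compile(r"<%s>(.*?)</%s>\s*" % (tag, tag), re.DOTALL)
        match = pattern.search(code)
        attributes[tag] = match.group(1).strip() if match else ""
        # Remove the attribute so that only the template code remains.
        code = pattern.sub("", code, count=1)
    return attributes["name"], attributes["description"], code

FILE_TEXT = (
    "<name>Title</name>\n"
    "<description>Prints only the title</description>\n"
    "<BFE_TITLE />"
)
print(read_attributes(FILE_TEXT))
```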
To check the dependencies of a format template
go to the Manage Format Templates page, click on
the format template you want to check, and then in the menu click on "Check Dependencies".
The next page shows you:
The output formats that use this format template
The elements used in the template (and the MARC tags used in these elements, in parentheses)
A summary of all the MARC tags involved in the elements of the template
To check the validity of a format template, simply go to the Manage Format Templates page, and look at the column 'status' for the format template you want to check. If message "Ok" is there,
then no problem was found with the template. If message 'Not Ok' is in the column, click on it to see
the problems that have been found for the template.
Format elements are the bricks used in format templates to provide dynamic content to the formatting process.
Their purpose is to allow people without programming skills to easily integrate data from the records in the database into their templates.
Format elements are typically written in Python (there is an exception, which is discussed in Add a Format Element). This brings great flexibility and power to the formatting process. It however restricts the creation of format elements to developers.
The most typical way of adding a format element is to drop a .py file in the lib/python/invenio/bibformat_elements directory of your CDS Invenio installation. See Edit the Code of a Format Element to learn how to implement an element.
The simplest way to add a format element is to add an entry in the "Logical Fields" management interface of the BibIndex module. When BibFormat cannot find the Python format element corresponding to a given name, it looks into this table for the name and prints the value of the field declared for this name. This lightweight approach is straightforward but does not allow complex handling of the data (it is limited to printing the value of the field, or the values of the fields if multiple fields are declared under the same label).
To remove a Python format element simply remove the corresponding file from the lib/python/invenio/bibformat_elements directory of your CDS Invenio installation.
To remove a format element declared in the "Logical Fields" management interface of the BibIndex module simply remove the entry from the table.
This section only applies to Python format elements. Basic format elements declared in "Logical Fields" have non configurable behaviour.
A format element file is like any regular Python program. It has to implement a format function, which returns a string and takes at least bfo as first parameter (but can take as many others as needed).
Here is for example the code of the "bfe_title.py" element:
def format(bfo, separator=" "):
    """
    Prints the title of a record.

    @param separator separator between the different titles
    """
    titles = []

    title = bfo.field('245.a')
    title_remainder = bfo.field('245.b')
    titles.append( title + title_remainder )

    title = bfo.field('0248_a')
    if len(title) > 0:
        titles.append( title )

    title = bfo.field('246.a')
    if len(title) > 0:
        titles.append( title )

    title = bfo.field('246_1.a')
    if len(title) > 0:
        titles.append( title )

    return separator.join(titles)
In format templates this element can be called like a function, using HTML syntax: <BFE_TITLE separator="; "/>
Notice that the call uses (almost) the filename of your element. To find out which element to use, BibFormat tries different filenames until the element is found: it tries to
ignore the letter case
replace underscore with spaces
remove the BFE_ from the name
This means that even if the filename of your element is "my element.py", BibFormat can resolve the call <BFE_MY_ELEMENT /> in a format template. This also means that you must take care not to have two format element filenames that differ only in the above respects.
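The resolution rules can be approximated by the sketch below. It is a simplification: instead of trying several candidate filenames as BibFormat is described to do, it reduces both the tag name and each filename to a common key. The names normalize and resolve_element are invented for this example.

```python
def normalize(name):
    """Reduce a tag name or filename to a comparable key."""
    key = name.lower()                # ignore the letter case
    if key.endswith(".py"):
        key = key[:-3]
    key = key.replace(" ", "_")       # treat spaces and underscores alike
    if key.startswith("bfe_"):
        key = key[4:]                 # drop the BFE_ prefix
    return key

def resolve_element(tag_name, filenames):
    """Return the filename matching `tag_name`, or None if there is none."""
    wanted = normalize(tag_name)
    for filename in filenames:
        if normalize(filename) == wanted:
            return filename
    return None

print(resolve_element("BFE_MY_ELEMENT", ["bfe_title.py", "my element.py"]))
```

With this kind of matching, "my element.py" and "bfe_my_element.py" would be indistinguishable, which is why such pairs of filenames must be avoided.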
The string returned by the format function corresponds to the value that is printed instead of the format element name in the format template.
The bfo object taken as parameter by format stands for BibFormatObject: it is an object that represents the context in which the formatting takes place. For example it lets you retrieve the value of a given field for the record that is being formatted, or the language of the user. We see the details of the BibFormatObject further below.
The format function of an element can take other parameters, as well as default values for these parameters. The idea is that these parameters are accessible from the format template when calling the element, and allow you to parametrize the behaviour of the format element.
It is very important to document your element: this makes it possible to generate documentation of the elements for people writing format templates. It is the only way for them to know what your element does. The key points are:
Provide a docstring for the format function
For each of the parameters of the format function (except for bfo), provide a description using a Java-like doc syntax in the doc string: @param my_param description for my param (one line per parameter)
You can use one @see followed by a comma-separated list of element filenames to provide references to related elements of interest: @see my_element1.py, my element2.py
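To show why this docstring convention matters, here is a minimal sketch of how parameter descriptions could be extracted from an element. The function describe_parameters is hypothetical and is not the documentation generator that ships with BibFormat.

```python
import inspect
import re

def describe_parameters(function):
    """Map each documented parameter of `function` to its description."""
    doc = inspect.getdoc(function) or ""
    # One "@param name description" entry per line of the docstring.
    return dict(re.findall(r"^@param\s+(\w+)\s+(.*)$", doc, re.MULTILINE))

def format(bfo, separator=" "):
    """
    Prints the title of a record.

    @param separator separator between the different titles
    """
    return ""

print(describe_parameters(format))
```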
Typically you will need to access some fields of a record to display as output. There are two ways to do this: you can access the bfo object given as parameter and use the provided (basic) accessors, or import a dedicated module and use its advanced functionalities.
Method 1: Use accessors of bfo: bfo is an instance of the BibFormatObject class. The following methods are available:
get_record(): Returns the record of this BibFormatObject instance as a BibRecord structure. Allows advanced access on the structure using BibRecord.
control_field(tag): Returns the value of control field given by MARC tag.
field(tag): Returns the value of the field corresponding to MARC tag. If the value does not exist, returns an empty string.
fields(tag): Returns the list of values corresponding to MARC tag. If tag has an undefined subcode (such as 999C5), the function returns a list of dictionaries, whose keys are the subcodes and whose values are the values of tag.subcode. If the tag has a subcode, it simply returns the list of values corresponding to tag.
kb(kb, string, default=""): Returns the value of the string in the knowledge base kb. If kb does not exist or string does not exist in kb, returns default string.
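As a rough illustration of Method 1, the element below combines field, fields and kb. The tags "773__p" and "999C5", the knowledge base name "journals" and the StubBfo class are assumptions for this sketch; StubBfo only mimics the accessor signatures listed above.

```python
def format(bfo, kb_name="journals"):
    """
    Prints the normalized journal name followed by the number of references.

    @param kb_name the knowledge base used to normalize the journal name
    """
    journal = bfo.field("773__p")                  # "" when the field is missing
    full_name = bfo.kb(kb_name, journal, journal)  # fall back to the raw value
    references = bfo.fields("999C5")               # list of {subcode: value} dicts
    return "%s (%d references)" % (full_name, len(references))


class StubBfo:
    """Hypothetical stand-in exposing the accessors described above."""

    def __init__(self, record, knowledge_bases):
        self._record = record
        self._kbs = knowledge_bases

    def field(self, tag):
        values = self._record.get(tag, [])
        return values[0] if values else ""

    def fields(self, tag):
        return self._record.get(tag, [])

    def kb(self, kb, string, default=""):
        return self._kbs.get(kb, {}).get(string, default)
```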
Method 2: Use module BibRecord:
BibRecord is a module that provides advanced functionality for accessing the fields of a record.
bfo.get_record() returns a structure that can be understood by BibRecord's functions. Therefore you can import the module's functions to get access to the fields you want.
You can follow the standard internationalization procedure in use across CDS Invenio sources. For example the following code will get you the translation for "Welcome" (assuming "Welcome" has been translated):
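A minimal sketch of this pattern, using Python's standard gettext module as a stand-in for the CDS Invenio translation helper. The domain name "cds-invenio-example" is made up, and with fallback=True the untranslated text is returned when no catalogue is installed for the language.

```python
import gettext


def format(bfo):
    """Prints a greeting in the language of the user."""
    # Stand-in for the CDS Invenio translation helper: the domain name is
    # hypothetical, and fallback=True returns the original text when no
    # catalogue is found for bfo.ln.
    translation = gettext.translation(
        "cds-invenio-example", languages=[bfo.ln], fallback=True
    )
    _ = translation.gettext
    return _("Welcome")
```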
Notice the access to bfo.ln to get access to the current language of the user. For simpler translations or behaviour depending on the language you can simply check the value bfo.ln to return your custom text.
A format element has mainly four kinds of attributes:
Name: it corresponds to the filename of the element.
Description: the description is in the docstring of the format function (except the lines prefixed with @param and @see).
Parameters descriptions: for each parameter of the format function, a line beginning with @param parameter_name and followed by the description of the parameter is present in the docstring of the format function.
Reference to other elements: one line beginning with @see and followed by a list of comma-separated format element filenames in the docstring of the format function provides a link to related elements.
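To make the mapping between the docstring and these attributes concrete, here is a small sketch that extracts them. It is not the parser used by BibFormat; the function name parse_element_docstring and the returned dictionary layout are assumptions.

```python
def parse_element_docstring(docstring):
    """Split a format function docstring into description, params and see."""
    description, params, see = [], {}, []
    for line in (docstring or "").splitlines():
        line = line.strip()
        if line.startswith("@param"):
            # "@param name description" -> {"name": "description"}
            parts = line[len("@param"):].split(None, 1)
            if parts:
                params[parts[0]] = parts[1] if len(parts) > 1 else ""
        elif line.startswith("@see"):
            names = line[len("@see"):].split(",")
            see = [name.strip() for name in names if name.strip()]
        elif line:
            description.append(line)
    return {"description": " ".join(description), "params": params, "see": see}
```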
There are two ways to check the dependencies of a format element. The simplest way is to go to the format elements documentation and click on "Dependencies of this element" for the element you want to check.
The second method to check the dependencies of an element is through regular unix tools: for example $ grep -r -i 'bfe_your_element_name' . inside the format templates directory will tell you which templates call your element.
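The same check can be scripted in Python, for instance to run it for several elements at once. This sketch does what the grep command does (a case-insensitive substring search in every file under a directory); the function name templates_calling is made up.

```python
import os


def templates_calling(element_name, templates_dir):
    """Return the files under templates_dir that mention element_name."""
    wanted = element_name.lower()
    matches = []
    for root, _dirs, files in os.walk(templates_dir):
        for filename in sorted(files):
            path = os.path.join(root, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if wanted in handle.read().lower():
                    matches.append(path)
    return matches
```

Like grep, this also reports templates that call a longer element name containing the one you searched for.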
There are two ways to check the validity of an element. The simplest one is to go to the format elements documentation and click on "Correctness of this element" for the element you want to check.
The second method to check the validity of an element is through regular Python methods: you can for example import the element in the interactive interpreter and feed it with test parameters. Notice that you will need to build a BibFormatObject instance to pass as bfo parameter to the format function of your element.
Go to the format elements documentation. There is a summary of all available format elements at the top of the page. You can click on an element to go to its detailed description in the second part of the page.
Each detailed documentation shows you:
A description of what the element does.
A list of all parameters you can use for this element.
For each parameter, a description and the default value when the parameter is omitted.
A link to a tool to track the dependencies of your element.
A link to a tool to check the correctness of your element.
A link to a tool to test your element with custom parameters.
You can play with a format element's parameters and see the result of the element directly in the format elements documentation: for each element, under the section "See also", click on "Test this element". You are redirected to a page where you can enter a value for each parameter. A description is associated with each parameter, as well as an indication of the default value used if you do not provide a custom value. Click on the "Test!" button to see the result of the element with your parameters.
Knowledge bases are a way to define easily extendable repositories of mappings. They have various uses, but their main purpose is to get, given a value, the normalized version of this value. For example you may use a knowledge base to hold a list of all ways to abbreviate the name of a journal, and map these abbreviations to the full journal name. This would be useful to get a normalized journal name across all of your records.
The knowledge base itself offers no method to do this normalization. It is limited to the archiving of this knowledge. To benefit from the normalization you need to use a format element which is knowledge-base-aware. The element will look by itself into the knowledge base to format a record. In that way you can extend the formatting capabilities of this element without having to modify it.
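The division of labour between a knowledge base and a knowledge-base-aware element can be pictured with a plain dictionary. The mappings, the names JOURNAL_KB, kb_lookup and format_journal are all invented for this sketch; a real knowledge base is managed through the administration pages described next.

```python
# Hypothetical knowledge base: abbreviation -> full journal name.
JOURNAL_KB = {
    "Phys Rev D": "Physical Review D",
    "Phys. Rev. D": "Physical Review D",
    "PRD": "Physical Review D",
}


def kb_lookup(kb, string, default=""):
    """Return the value mapped to string in kb, or default if absent."""
    return kb.get(string, default)


def format_journal(raw_name, kb=JOURNAL_KB):
    """Knowledge-base-aware formatting: unknown names are printed unchanged."""
    return kb_lookup(kb, raw_name, raw_name)
```

Adding a new abbreviation to the mapping changes what format_journal prints without touching its code, which is the point of keeping the knowledge outside the element.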
To add a knowledge base go to the Manage Knowledge Bases administration page.
At the bottom of the page click on the "Add New Knowledge Base" button. The knowledge base is created and you are asked to fill in its attributes. See Edit the Attributes of a Knowledge Base to learn more about the attributes of knowledge bases.
To remove a knowledge base go to the Manage Knowledge Bases administration page. Click on the "Delete" button facing the knowledge base you want to remove and confirm. The knowledge base and all the mappings it includes are removed.
Go to the Manage Knowledge Bases administration page and click on the knowledge base for which you want to add a mapping. Fill in the form of the "Add New Mapping" section on the left of the page with the new mapping, and click on "Add New Mapping". The mapping has been created. Alternatively you can create the mapping without its attributes, and fill them in afterwards (see Edit a Mapping).
Go to the Manage Knowledge Bases administration page and click on the knowledge base for which you want to remove a mapping. Click on the "Delete" button facing the mapping you want to delete.
Go to the Manage Knowledge Bases administration page and click on the knowledge base for which you want to edit a mapping. Locate the mapping in the list. You can click on the column headers to order the list by Map From or by Map To to help you find it. Once you have edited the mapping click on the corresponding "Save" button.
Go to the Manage Knowledge Bases administration page and click on the knowledge base you want to edit. In the top menu, click on "Knowledge Base Attributes". You can then give your knowledge base a name and a description. Finally click on the "Update Base Attributes" button.
To check the dependencies of a knowledge base
go to the Manage Knowledge Bases page, click on
the knowledge base you want to check, and then in the menu click on "Knowledge Base Dependencies".
The next page shows you the list of format elements that use this knowledge base.
The notation for accessing fields of a record is quite flexible. You can use a syntax that strictly follows MARC 21, but also
a shortcut syntax, or a syntax that can have a special meaning.
The MARC syntax is the following one:
tag[indicator1][indicator2] [$ subfield] where tag is 3 digits, indicator1 and indicator2 are 1 character each, and subfield is 1 letter.
For example to get access to an abstract you can use the MARC notation 520 $a. You can use this syntax in BibFormat. However you can also:
Omit any whitespace character (or use as many as you want)
Omit the $ character (or use as many as you want)
Omit or use both indicators. You cannot specify only one indicator. If you need to use only one, use underscore _ character for the other indicator.
Use percent % in place of any character as a wildcard ("don't care") for that character.
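The rules above can be sketched as a small parser. It covers only the cases listed here (optional whitespace and $, both indicators or none, % kept as-is) and is not the parser used by BibFormat; the function name parse_tag and the use of an empty string for omitted parts are assumptions.

```python
def parse_tag(notation):
    """Split a field notation into (tag, indicator1, indicator2, subfield)."""
    # Whitespace and "$" may be omitted or repeated, so drop them all.
    compact = "".join(ch for ch in notation if not ch.isspace() and ch != "$")
    if len(compact) not in (3, 4, 5, 6):
        raise ValueError("cannot parse field notation: %r" % notation)
    tag = compact[:3]
    rest = compact[3:]
    if len(rest) >= 2:
        # Both indicators are given, optionally followed by a subfield.
        ind1, ind2, subfield = rest[0], rest[1], rest[2:]
    else:
        # Indicators omitted: what remains (if anything) is the subfield.
        ind1, ind2, subfield = "", "", rest
    return tag, ind1, ind2, subfield
```

With this sketch, "520 $a" and "520a" parse to the same tuple, and "999C5" is read as tag 999 with indicators C and 5 and no subfield.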
The new Python BibFormat formats are not backward compatible with the previous formats. New concepts and capabilities have been introduced and some have been dropped. If you have not modified the "Formats", or have only slightly
modified the "Behaviours" (or modified "Knowledge Bases"), then the transition will be painless and
automatic. Otherwise you will have to manually rewrite some of the
formats. This should however not be a big problem. Firstly because the
CDS Invenio installation will provide both versions of BibFormat for
some time. Secondly because both BibFormat versions can run side by
side, so that you can migrate your formats while your server still
works with the old formats. Thirdly because we provide a migration
kit that can help you go through this process. Finally because the
migration is not so difficult, and because it will be much easier for
you to customize how BibFormat formats your bibliographic data.
The first thing you should do is to read the Five Minutes Introduction to BibFormat to understand how the new BibFormat works. We also assume that you are familiar with the concepts of the old BibFormat. As the new formats separate the presentation from the business logic (i.e. the bindings to the database), it is not possible to automatically handle the translation. This is why you should at least be able to read and understand the formats that you want to migrate.
Differences between old and new BibFormat
The most noticeable differences are:
a) "Behaviours" have been renamed "Output formats".
b) "Formats" have been renamed "Format templates". They are now
written in HTML.
c) "User defined functions" have been dropped.
d) "Extraction rules" have been dropped.
e) "Link rules" have been dropped.
f) "File formats" have been dropped.
g) "Format elements" have been introduced. They are written in Python,
and can simulate c), d) and e).
h) Formats can be managed through the web interface or through
human-readable config files.
i) Introduction of tools like validator and dependencies checker.
j) Better support for multi-language formatting.
Some of the advantages are:
+ Management of formats is much clearer and easier (fewer concepts,
more tools).
+ Writing formats is easier to learn: fewer concepts
to learn, redesigned workflow, use of existing well-known and
well-documented languages.
+ Editing formats is easier: You can use your preferred HTML editor such as
Emacs, Dreamweaver or Frontpage to modify templates, or any text
editor for output formats and format elements. You can also use the
simplified web administration interface.
+ Faster and more powerful templating system.
+ Separation of business logic (output formats, format elements)
and presentation layer (format templates). This makes the management
of formats simpler.
The disadvantages are:
- No backward compatibility with old formats.
- Stricter separation of business logic and presentation layer:
no more use of statements such as if(), forall() inside templates,
and this requires more work to put logic inside format elements.
Migrating behaviours to output formats
Behaviours were previously stored in the database and required the evaluation language to
provide the logic that chose which format to use for a record. They also let you enrich records
with some custom data. Now their use has been simplified and restricted to equivalence tests on the value of a field
of the record to define the format template to use.
translates to the following output format (in textual configuration file):
tag 980.a:
PICTURE --- Picture_HTML_brief.bft
default: Default_HTML_brief.bft
or visual representation through web interface:
We suggest that you use the migration kit to produce initial output formats from
your behaviours, but that you go through the created .bfo files in the /etc/bibformat/output_formats/ directory of your CDS Invenio installation to check that they correspond to your behaviours.
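When reviewing the generated files, it may help to picture how such a rule set selects a template. The sketch below reads the textual form shown above for a single tag block and picks a template. It is an illustration only: the function names are made up, and comparing values without regard to case is an assumption, not documented behaviour.

```python
OUTPUT_FORMAT = """tag 980.a:
PICTURE --- Picture_HTML_brief.bft
default: Default_HTML_brief.bft"""


def parse_output_format(text):
    """Parse one tag block into (tag, [(value, template)], default)."""
    tag, rules, default = None, [], None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.lower().startswith("tag "):
            tag = line[4:].rstrip(":").strip()
        elif line.lower().startswith("default:"):
            default = line.split(":", 1)[1].strip()
        elif "---" in line:
            value, template = line.split("---", 1)
            rules.append((value.strip(), template.strip()))
    return tag, rules, default


def choose_template(text, get_field_value):
    """Return the template of the first matching rule, else the default."""
    tag, rules, default = parse_output_format(text)
    actual = get_field_value(tag) if tag else ""
    for value, template in rules:
        # Assumption: equivalence test ignoring case and outer whitespace.
        if actual.strip().lower() == value.lower():
            return template
    return default
```

Here get_field_value is any callable that returns the value of the given tag for the record being formatted.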
Migrating formats to format templates and format elements
The migration of formats is the most difficult part of the migration. You will need to separate the presentation code (HTML) from the business code (iterations, tests and calls to the database). Here are some tips on how you can do this:
If you want to save the time of unescaping all HTML characters and working out what the layout should look like, just go with your web browser to a formatted version of the format in your CDS Invenio installation, and copy the source of the web page. Identify the parts of the HTML code which are specific to the current record, and replace them with a call to the corresponding format element.
If you have made small modifications to the old default provided formats, we suggest that you use the new provided ones and modify them according to your needs.
We recommend that you do not use the migration kit for this part: it can help you create the initial files, but will never be able to provide a working implementation of the formats.
Migrating Knowledge Bases
We recommend that you use the migration kit to migrate your knowledge bases. It should have no problem migrating this part of your configuration.
Migrating UDFs and Link rules
User Defined Functions and Link rules have been dropped in the new BibFormat. These concepts are no longer needed, as they can be fully implemented in the format elements. For example the AUTHOR_SEARCH link rule can directly be implemented in the Authors.bfe element.
As for the UDFs, most of them are directly built-in functions of Python. Whenever a special function has to be implemented, it can be defined in a regular Python file and used in any element.
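As an example of what replaces a link rule, an author search link can be built by an ordinary Python function and reused by any element. This is a sketch under assumptions: the function name, the "/search" path and the f and p query parameters are illustrative and should be adapted to your installation.

```python
from html import escape
from urllib.parse import urlencode


def author_search_link(author, search_url="/search"):
    """Return an HTML link that searches for the given author name."""
    # The path and the query parameter names are assumptions for this sketch.
    query = urlencode({"f": "author", "p": author})
    return '<a href="%s?%s">%s</a>' % (search_url, escape(query), escape(author))
```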
The Migration Kit
The migration kit is available from the main BibFormat admin webpage or
directly here. The migration
kit has 3 steps, each migrating some part of your configuration. Just
click on the links to migrate each part and get the status of the migration.
You should note that each migration will create new files or entries in the database, such that
you will certainly want to click only once on each step (otherwise you will get duplicates).
The migration kit can:
a) Effortlessly migrate your behaviours, unless they include complex
logic, which usually they don't.
b) Help you migrate formats to format templates and format elements.
c) Effortlessly migrate your knowledge bases.
Point b) is the most difficult to achieve: previous formats did mix
business logic and code for the presentation, and could use PHP
functions. The new BibFormat separates business logic and
presentation, and does not support PHP. The migration kit will try to
move business logic to the format elements, and the presentation to
the format templates. These files will be created for you, including
the original code and, if possible, a proposal of Python
translation. We recommend that you do not use the migration kit to
translate formats, especially if you have not modified default
formats, or only modified default formats in some limited places. You
will get cleaner code if you write format elements and format
templates yourself.
You might want to migrate your formats over a long period of time, making new formats available to your
users once they have been migrated, while old formats are still being used if they have not been translated.
BibFormat will do this almost automatically. This section tells you what you should be aware of if you want this to work seamlessly.
When BibFormat has to format a record with a given output format code, it first tries to find a corresponding
output format in the (new) output formats directory. If the output format cannot be found, it hands the formatting process
over to the old BibFormat, which will look for a behaviour with a name corresponding to code. This leads to the first rule you should follow:
For each of the Behaviours you want to migrate, you should have an Output Format with a code corresponding to the name of the Behaviour.
The second (and last) rule is as simple as the first one. Imagine you have a Behaviour "HD" that you want to migrate to Output Format "HD". Let's say that "HD" links to 'picture_HTML_detailed' format if field 980$a is equal to "Picture", and links to 'default_HTML_detailed' in all other cases, but that 'picture_HTML_detailed' has not been migrated to a new format template. Then the second rule says:
Output Formats should have the same conditions on tags as Behaviours, even if the format for that condition has not been migrated.
In our example, if you open the "HD" output format in the web interface, you can add a rule that works on condition "If 980$a is PICTURE" and set the template to be used to "defined in old BibFormat" in the template menu. This looks strange, but it is the only way to tell BibFormat that it should consider this condition and not go to the default rule and use the default template.
For developers and adventurers only:
If you are to write Output Formats without the web interface, you should use the name migration_in_progress for each template which has not been migrated. The above example would therefore become:
tag 980.a:
PICTURE --- migration_in_progress
default: Default_HTML_detailed.bft
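The two rules can be summarized by the following decision sketch. It is not BibFormat code: the function name pick_engine, the dictionary layout of the output formats and the case-insensitive comparison are assumptions made to show when formatting is delegated to the old BibFormat.

```python
MIGRATION_IN_PROGRESS = "migration_in_progress"


def pick_engine(code, new_output_formats, field_value):
    """
    Decide which BibFormat handles a record.

    new_output_formats maps an output format code to
    {"rules": [(value, template), ...], "default": template}.
    Returns ("new", template) or ("old", code).
    """
    output_format = new_output_formats.get(code)
    if output_format is None:
        # Rule 1: no output format with this code, the old behaviour is used.
        return ("old", code)
    template = output_format["default"]
    for value, candidate in output_format["rules"]:
        # Assumption: equivalence test ignoring case.
        if field_value.lower() == value.lower():
            template = candidate
            break
    if template == MIGRATION_IN_PROGRESS:
        # Rule 2: the condition exists but its template is not migrated yet.
        return ("old", code)
    return ("new", template)
```

Without the migration_in_progress rule, a picture record would fall through to the default template instead of being formatted by the old 'picture_HTML_detailed' format.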
Why do we need output formats? Wouldn't format templates be sufficient?
As you potentially have a lot of records, it is not conceivable to specify for each of them which
format template they should use. This is why this rule-based decision layer has been introduced.
How can I protect a format?
As a web user, you cannot protect a format. If you are an administrator of the
system and have access to the format files, you can simply use the permission rights of your system, as BibFormat
is aware of them.
Why can't I edit/delete a format?
The format file has most likely been protected by the administrator of the server. You must ask the
administrator to unprotect the file if you want to edit it.
How can I add a format element from the web interface?
Format elements cannot be added, removed or edited through the web interface. This limitation
has been introduced to limit the security risks caused by the upload of Python files on the server. The only possibility to add a basic format element from the web interface is to add an entry in the "Logical Fields" management interface of the BibIndex module (see Add a Format Element).
Why are some Marc codes omitted in the "Check Dependencies" pages?
When you check the dependencies of a format, the page reminds you that
some use of Marc codes might not be indicated. This is because it is not
possible (or at least not trivial) to guess that the call to field(str(5+4)+"80"+".a")
is equal to a call to field("980.a"). You should then not completely rely on this indication.
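The limitation is easy to reproduce with a naive scanner that only recognizes calls whose argument is a single string literal. This is a sketch of the general problem, not the dependency checker itself; the regular expression and the sample source are made up for the demonstration.

```python
import re

# Matches field("...") or fields('...') with one literal string argument.
LITERAL_FIELD_CALL = re.compile(r"""\bfields?\(\s*(["'])(.*?)\1\s*\)""")


def literal_field_tags(source):
    """Return the tags passed as plain string literals to field()/fields()."""
    return [match.group(2) for match in LITERAL_FIELD_CALL.finditer(source)]


SOURCE = '''
title = bfo.field("245__a")
kind = bfo.field(str(5+4)+"80"+".a")
'''
```

Scanning SOURCE reports only the literal tag of the first call; the second call is missed because its argument is computed at run time.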
How are deleted records displayed?
By default, CDS Invenio displays a standard "The record has been deleted." message for all
output formats with a 'text/html' content type. Your output format, format templates and format elements
are bypassed by the engine.
However, for more advanced output formats, CDS Invenio
goes through the regular formatting process and lets your formats do the job. This allows you to customize how a record should be displayed once it has been deleted.
Why are some format elements omitted in the "Knowledge Base Dependencies" page?
When you check the dependencies of a knowledge base, the page reminds you that
format elements using this knowledge base might not be indicated. This is because it is not
possible (or at least not trivial) to guess that the call to kb("e".upper()+"journal"+"s") in a format element
is equal to a call to kb("Ejournals"). You should then not completely rely on this indication.
Why are some format elements defined in field table omitted in the format element documentation?
Some format elements defined in the "Logical Fields" management interface of the BibIndex module (the
basic format elements) are not shown in the format elements documentation pages. We do not show such an element if its name starts with a number.
This is to reduce the number of elements shown in the documentation, as the logical fields
table contains many fields that are not very useful in templates.
The BibFormat admin interface enables you to specify how the
bibliographic data is presented to the end user in the search
interface and search results pages. For example, you may specify that
titles should be printed in bold font, the abstract in small italic,
etc. Moreover, BibFormat is not only a simple bibliographic data
output formatter, but also an automated link
constructor. For example, from the information on journal name
and pages, it may automatically create links to publisher's site based
on some configuration rules.
2. Configuring BibFormat
By default, a simple HTML format based on the most common fields
(title, author, abstract, keywords, fulltext link, etc) is defined.
You certainly want to define your own output formats in case you have a
specific metadata structure.
Define one or more output BibFormat behaviours. These are then
passed as parameters to the BibFormat modules while executing
formatting.
Example: You can tell BibFormat that it has to enrich the
incoming metadata file by the created format, or that it only has to
print the format out.
Define how the metadata tags from input are mapped into internal
BibFormat variable names. The variable names can afterwards be used
in formatting and linking rules.
Example: You can tell that 100 $a field
should be mapped into $100.a internal variable that you
could use later.
Define rules for automated creation of URI links from mapped
internal variables.
Example: You can tell a rule how to create a link to
People database out of the $100.a internal variable
representing author's name. (The $100.a variable was mapped
in the previous step, see the Extraction Rules.)
Define file format types based on file extensions. This will be
used when proposing various fulltext services.
Example: You can tell that *.pdf files will
be treated as PDF files.
Define your own functions that you can reuse when creating your
own output formats. This enables you to do complex formatting without
ever touching the BibFormat core code.
Example: You can define a function how to match and
extract email addresses out of a text file.
Define the output formats, i.e. how to create the output out of
internal BibFormat variables that were extracted in a previous step.
This is the functionality you would want to configure most of the
time. It may reuse formats, user defined functions, knowledge bases,
etc.
Example: You can tell that authors should be printed in
italic, that if there are more than 10 authors only the first three
should be printed, etc.
Define one or more knowledge bases that enable you to transform
various forms of input data values into the unique standard form on
the output.
Example: You can tell that Phys Rev D and
Physical Review D are both the same journal and that these
names should be standardized to Phys Rev : D.
Enables you to test your formats on your sample data file. Useful
when debugging newly created formats.
To learn more on BibFormat configuration, you can consult the BibFormat Admin Guide.
3. Running BibFormat
3.1. From the Web interface
Run Reformat Records tool.
This tool permits you to update stored formats for bibliographic records.
It should normally be used after configuring BibFormat's
Behaviours and
Formats.
When these are ready, you can choose to rebuild formats for selected
collections or you can manually enter a search query and the web interface
will accomplish all necessary formatting steps.
Example: You can request Photo collections to have their HTML
brief formats rebuilt, or you can reformat all the records written by Ellis.
3.2. From the command-line interface
Consider having an XML MARC data file that is to be uploaded into
the CDS Invenio. (For example, it might have been harvested from other
sources and processed via BibConvert.)
Having configured BibFormat and its default output type behaviour, you
would then run this file through BibFormat as follows:
that would create default HTML formats and would "enrich" the input
XML data file by this format. (You would then continue the upload
procedure by calling successively BibUpload and BibWords.)
Now consider a different situation. You would like to add a new
possible format, say "HTML portfolio" and "HTML captions" in order to
nicely format multiple photographs in one page. Let us suppose that
these two formats are called hp and hc and
are already loaded in the collection_format table.
(TODO: describe how this is done via WebAdmin.) You would then
proceed as follows: firstly, you would prepare the corresponding output behaviours called HP
and HC (TODO: note the uppercase!) that would not enrich
the input file but that would produce an XML file with only
001 and FMT tags. (This is in order not to
update the bibliographic information but the formats only.) You would
also prepare corresponding formats
at the same time. Secondly, you would launch the formatting as
follows:
that should give you an XML file containing only 001 and FMT tags.
Finally, you would upload the formats:
$ bibupload < /tmp/sample_fmts_only.xml
and that's it. The new formats should now appear in WebSearch.
4. Detailed Configuration Manual
What follows is a transcription of an old
FlexElink Configuration Manual v0.3 (2002-07-31). The text suffers
from missing screen snapshots, and the terminology may not be fully up-to-date
at places.
4.1. About BibFormat
BibFormat is a piece
of software that is part of the CDS Invenio (http://cdsweb.cern.ch).
Its mission, in a few
words, is to provide a flexible mechanism to format the bibliographic records
that are shown as a result of CDS Search user queries, allowing the
administrators or users to customize the view of them. Besides, it offers the
possibility of using a linking system that can automatically generate all the
links included in the displayed records (fulltext access, electronic journals
reference, etc), considerably reducing maintenance.
To clarify this too
formal definition, we'll try to illustrate the role of BibFormat inside the CDS
Search module by showing the following figure. Please, note that this drawing
is trying to show the main role that BibFormat plays in the CDS structure and
it's quite simplified, but of course the underlying logic is a bit more
complex.
[Fig. 0]
As you can see, when a
user query is received, Weblib determines which records from the database match
it; then it asks BibFormat to format the obtained records. BibFormat looks at
its rule repository and for each record determines which format has to be
taken, applies the format specification and solves the possible links; gives
all this (in a formatted way) back to Weblib and it makes a nice HTML page
including the formatted results given by BibFormat among other info.
The good point in all
this is that anyone that has access to BibFormat rule repository is able to
modify the final appearance of a query result in the CDS Search module without
altering the logic of the search engine.
In order to be able to
modify this BibFormat rule repository, a web configuration interface is
provided. Through this paper, we'll try to explain (in a friendly way and from
the user's point of view) how to access this interface, how it's structured and
how to configure BibFormat through it to achieve desired results.
4.2. How it works?
We've outlined which is
the role of BibFormat inside the CDS, so it's time now to have an overview of
how it works and how it's organized. We'll try not to be very technical,
however a little explanation about the BibFormat repository and architecture is
needed to understand how it works.
BibFormat, basically,
takes some bibliographic records as input and produces a formatted & linked
version of them as output. By "formatted" we mean that BibFormat can produce an
output containing a transformed version of the input data (normally an HTML
view); the good part is that you can entirely specify the transformation to
apply. At the same time, by "linked" we mean that you can ask BibFormat to
include (if necessary) inside this formatted version references to some
Internet resources that are related to the data from some pre-configured rules.
As an example, we could
imagine that you'd want to see the resulting records from CDS Search queries to
show their title in bold followed by their authors separated by commas. For
achieving this you'll have to go to the BibFormat configuration interface and
define a behavior for BibFormat in which you describe how to format incoming
records:
Figure 1.-
A very first Evaluation Language example
Don't be scared!! It's a
first approach to the way BibFormat allows you to describe formats. As you can
see, BibFormat uses a special language that you'll have to learn if you want to
be able to specify formats or links; it seems difficult (as much as a
programming language) but you'll see that it's much easier than it seems at
first sight.
The next figure shows
how BibFormat works internally. When BibFormat is called, it receives a
set of bibliographic records to format. It separates each record and translates
it into a set of what we call "internal variables"; these "internal variables"
are simply an internal representation of the bibliographic record; the
important thing with them is that they will be available when you have to describe
the formats. Once it has these "internal vars", the processor module looks into
the behavior repository for that one (let's say format) you've asked BibFormat
to apply (when BibFormat is called, you can indicate which of the
pre-configured behaviors to apply; this allows it to have more than one
behavior); inside this behavior you can specify which data you want to appear,
how it has to appear, some links if they exist... in other words, the format
(actually, it's something more than a format, it describes how BibFormat has to
behave for a given input; that's why we refer to it as behavior). As we've already said, you can include links
in a behavior specification; links are a special BibFormat feature that helps
you to reduce the maintenance of your formats: you can include a link in
several formats or behaviors.
The picture below
illustrates this explanation.
[Fig. 2]
Summarizing, BibFormat can
transform an input made up of bibliographic records into an HTML output (not only
HTML but any text-based output) according to certain pre-configured
specifications (behaviors) that you can entirely define using a certain
language.
Just to mention, currently
BibFormat is working taking OAI MARC XML as format for input records, but it
can be adapted to other ways of inputs (reading a database, function call, etc)
with a little development.
4.3. A first look at the web configuration interface
BibFormat can be
configured through its configuration interface that is accessible via web. It's
made up of a bunch of web pages that present you the main configuration aspects
of BibFormat allowing you to change them. In this section we are going to have
a first look at this web interface, how it's structured and its correspondence
with BibFormat features.
Before entering
these web pages you'll be asked for your username & password.
Only certain users are allowed to access BibFormat WI; first you need a CDS
account that you can create easily by using the standard CDS account manager;
then you have to ask the BibFormat administrator to give you privileges to access the
WI.
Once your password is accepted you'll access
the configuration interface. You'll see that it is quite simple: It's structured
in different sections; each of them corresponds to a BibFormat feature and you
can navigate through them by using a navigation bar that is always present on
the left.
[Fig. 3]
Here is a list
of the different sections the interface offers and their correspondence
with BibFormat features:
Behaviors: This is the main section, the one you
enter by default when you access the web interface. It contains definitions for
the different pre-configured output types or behaviors that allow you to define
how you want BibFormat to behave when each output type is selected. More information
in chapter Defining output types: Behaviors of this manual.
OAI Extraction Rules: The input types and
mapping rules for OAI MARC XML inputs are defined here. You'll find here the
information about all the internal variables and their correspondence with the
input XML tags. See chapter Mapping the input of this manual for more
information.
Link Rules: Allows you to access the link rules
repository for defining the way links are generated. See chapter Defining
Links for a more detailed description about the BibFormat linking system.
UDFs: Presents a list of all the User
Defined Functions (UDFs) that you can use inside the Evaluation Language (EL)
statements that specify different configuration aspects.
You can also modify or extend this list within this section.
Using UDFs and defining new ones is covered in chapter User Defined
Functions (UDFs).
Formats: Another EL feature: You can
define a certain piece of EL code under a name for re-using it whenever
you want. See chapter Formats.
KBs: A complete management interface for Knowledge
Bases (KBs); those KBs will also be available inside EL
statements. See chapter Knowledge bases (KBs) for more specific
information.
Execution Test: You'll be able to execute
BibFormat from this section and view the results and some debug info in a web
page. You have to specify an input data file (through a URL).
User management: Allows you to define which CDS
users can access the BibFormat web interface.
Each section has
its own particularities, but the way of working with them follows a common
pattern throughout the interface. The common features and the
particular characteristics of each section are covered in the following chapters of this
manual.
4.4. Mapping the input (OAI Extraction Rules)
We have already spoken a bit about BibFormat internal
variables. These are key to understanding the way BibFormat
works. As you know, BibFormat takes some bibliographic records as input and,
according to some pre-configured behavior, formats them into HTML, for example.
The problem is that these input records can come in several formats: different
XML conventions, database records, etc. For now, at CDS we only consider that
the input comes in OAI MARC XML, but in the near future we may have to
extend it to accept other input formats.
That's the reason why internal variables exist: they
provide a common way to refer to input data without relying on any concrete
format. In other words, we define BibFormat links and behaviors that refer
to these internal variables, and we have some rules that define how to
map an input format to them, so that we can use any BibFormat
behavior with any input that can be mapped to internal variables.
[Fig. 4]
You shouldn't worry about this because it is more on the
development/administration side, but it's important to know where internal
variables come from and what they refer to. Besides, for CDS we only
consider incoming data in OAI MARC XML format, so we'll talk only about
this case.
Internal variables are quite a simple concept: a variable is just
a label that represents some values from the input. Besides, a variable can
have fields, which are also labels that represent values from the input
but that are related to each other under the variable (e.g. you can have a variable
that maps authors and another that maps authors' home institutes independently;
but if you want to represent an author together with his home institute you need to
relate these two values in some way). Variables and their fields also
support multiple values.
Focusing on OAI MARC XML, the concept of variable and field is
already in the input structure:
Each occurrence of OAI MARC XML varfield element
will correspond to a different variable value.
Each occurrence of OAI MARC XML subfield inside
a certain varfield element will correspond to a different field value of
the variable that maps the varfield.
So what we have in BibFormat is a set of rules that tell, for each
variable name, which varfield element it corresponds to and, for each variable
field name, which subfield element it maps. Through the web interface you can
add or delete variables and their fields, and
you can even modify the mapping tags of variables (this way you can
keep your formats independent of changes in the meaning of MARC tags).
In the web interface, all this is located in the OAI Extraction Rules
section, as you can see in the following figure:
[Fig. 5]
Let's illustrate how BibFormat maps a certain input to variables
and fields with an example:
We have this variable and field definition in BibFormat:
Var. label   Mapping tag                          Mult. V.   Fields (field label: mapping tag)
100          <varfield id="100" i1="" i2="">      Yes        a: <subfield label="a">
                                                             e: <subfield label="e">
909C0        <varfield id="909" i1="C" i2="0">    No         b: <subfield label="b">
And then a record like the following arrives as input:
Notice how varfield 037 is not considered because there
isn't an entry for it in the BibFormat configuration. Also notice how the values are
created: if "allow multiple values" is set to "Yes", each occurrence of a varfield
element determines a new value (variable "100"); otherwise, the last value
is taken as the single value for the variable (variable "909C0").
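The mapping described above can be sketched in a few lines of Python. This is only an illustration of the rules, not BibFormat's actual implementation: the record below is a hypothetical one invented for the sketch (including the record root element name), and the names RULES and map_record are assumptions of ours.

```python
import xml.etree.ElementTree as ET

# Mapping rules mirroring the variable and field definition above:
# variable label -> varfield (id, i1, i2), multiple values flag, field labels.
RULES = {
    "100": {"tag": ("100", "", ""), "multiple": True, "fields": ["a", "e"]},
    "909C0": {"tag": ("909", "C", "0"), "multiple": False, "fields": ["b"]},
}

# Hypothetical input record, invented for this sketch.
RECORD = """
<record>
  <varfield id="037" i1="" i2="">
    <subfield label="a">EXAMPLE-2001-001</subfield>
  </varfield>
  <varfield id="100" i1="" i2="">
    <subfield label="a">Doe, J</subfield>
    <subfield label="e">ed.</subfield>
  </varfield>
  <varfield id="100" i1="" i2="">
    <subfield label="a">Roe, R</subfield>
  </varfield>
  <varfield id="909" i1="C" i2="0">
    <subfield label="b">11</subfield>
  </varfield>
  <varfield id="909" i1="C" i2="0">
    <subfield label="b">27</subfield>
  </varfield>
</record>
"""


def map_record(xml_text, rules):
    """Return {variable label: [value, ...]}; each value maps field -> text."""
    variables = {}
    for varfield in ET.fromstring(xml_text).iter("varfield"):
        tag = (varfield.get("id"), varfield.get("i1", ""), varfield.get("i2", ""))
        for label, rule in rules.items():
            if tag != rule["tag"]:
                continue  # no rule for this varfield: it is ignored
            value = {
                sub.get("label"): sub.text or ""
                for sub in varfield.iter("subfield")
                if sub.get("label") in rule["fields"]
            }
            if rule["multiple"]:
                variables.setdefault(label, []).append(value)
            else:
                variables[label] = [value]  # the last occurrence wins
    return variables


variables = map_record(RECORD, RULES)
```

With this record, variable "100" ends up with two values, variable "909C0" keeps only the value from its last occurrence, and varfield 037 is dropped because no rule mentions it. For simplicity the sketch keeps a single value per field inside each variable value.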
4.5. Defining output types: Behaviors
Now that we already know how internal variables are structured
and what they represent in the input, it's time to have a look at how to
configure BibFormat to transform that input data mapped into variables into
HTML results (although any text-based output could be generated).
When BibFormat is asked to format a set of bibliographic
records, it is also necessary to specify which output type it has to
use. This output type is a string that identifies a pre-configured set of
conditions and actions that tell BibFormat how to behave with the given input
data (that's why the terms output type and behavior are used
interchangeably throughout this document).
BibFormat can have several pre-configured behaviors each one
identified by a different label. There are two different types of behaviors
(you can choose the behavior type when you define it):
Normal:
A behavior that outputs exactly the result of its evaluation.
Input Enrich (only for XML inputs): It echoes each XML record
from the input, inserting the behavior result just before the XML closing
element of the record.
Each behavior contains an ordered list of conditions; a
condition can contain zero or more associated actions (actions are ordered
inside a condition). A condition is a behavior item described by an Evaluation
Language expression that gives "TRUE" or "FALSE" as its result. An action is an
Evaluation Language (EL) statement that produces some output.
When BibFormat is called to format a set of input records with a
given behavior label, it looks for the behavior conditions. It evaluates their EL
in order and, as soon as one of them produces "TRUE" as result, it looks for its
associated actions. Then BibFormat evaluates the actions in the specified order
and concatenates their results.
By using different conditions you can specify alternative
formats inside a behavior (imagine that you want to format a record differently
depending on its base number). It's true that you could also achieve this
by using EL IF statements, but using conditions is clearer, more efficient and
more re-usable (you can change one condition without touching the rest, or you can
give it more priority than others, that is, give it the chance to be
evaluated before others, by changing its apply order).
Actions are used for specifying the format itself or the actions
you want to carry out if the condition is met.
Through the web interface you can define new output types or
modify the ones that already exist. It is quite easy to use: you just have to
select the link in the desired item for the operation you want to perform on it.
[Fig. 6]
Let's have a look at a simple example to illustrate how to
define a behavior that fits our needs:
Imagine a typical case where you want to format bibliographic
records but depending on their base number you want to apply different formats.
Whenever a record from base 27 (standards) arrives we want to show only its
title and the standard numbers; otherwise a default format will be applied
in which the title and authors are shown. We'll assume CDS variable notation
and that the input rules are defined properly.
We are going to define a new NORMAL behavior for this new
situation, let's call it SIMPLE. In it we'll need two conditions to be
defined: one for applying the default format and another one for the 27-base
special one. The base number comes in variable 909C0.b, so the conditions would
be based on this variable content.
As you can see we have defined two conditions: one for
the 27-format and another for the default format. The point that is important
is the order in which we put the conditions: For each record in the input the
special one is evaluated first (because it has a lower evaluation number, 10)
and if the condition is true the format will be applied; in case the base is
not 27 the default condition is evaluated and because its condition EL code
is always true the default will be used to format the record.
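To make the evaluation order concrete, here is a minimal Python sketch of the SIMPLE behavior. It is not BibFormat code: conditions and actions are modelled as plain Python functions instead of EL, records are plain dictionaries with hypothetical keys, the evaluation number 20 for the default condition is our assumption (only the number 10 is given above), and returning an empty string when no condition matches is also an assumption.

```python
def format_record(record, conditions):
    """Apply the actions of the first condition (in evaluation order) that is true."""
    for _order, condition, actions in sorted(conditions, key=lambda c: c[0]):
        if condition(record):
            # Actions are evaluated in order and their results concatenated.
            return "".join(action(record) for action in actions)
    return ""  # assumption: nothing is output when no condition is true


# Each entry: (evaluation number, condition, ordered list of actions).
SIMPLE = [
    (20, lambda r: True, [  # default format: title and authors
        lambda r: r["title"],
        lambda r: " / " + "; ".join(r["authors"]),
    ]),
    (10, lambda r: r["base"] == "27", [  # base 27: title and standard numbers
        lambda r: r["title"],
        lambda r: " (" + ", ".join(r["standard_numbers"]) + ")",
    ]),
]

# Hypothetical records, invented for this sketch.
standard = {"base": "27", "title": "Quality systems",
            "authors": [], "standard_numbers": ["EX 9001"]}
report = {"base": "11", "title": "A report",
          "authors": ["Doe, J", "Roe, R"], "standard_numbers": []}

print(format_record(standard, SIMPLE))
print(format_record(report, SIMPLE))
```

Although the default condition is listed first, the base-27 condition is tried first because of its lower evaluation number, which is exactly why the order matters.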
Don't worry too much about the action code because it's
quite trivial. There are some "strange" things like the use of the functions rep_prefix
and separator. These are special UDFs that have a special
behavior inside a FORALL statement:
rep_prefix: Prints the string argument only when we are in the
first iteration of a FORALL. In other words, it puts a prefix in front of the
string generated by the FORALL statement.
separator: Prints the string argument in every FORALL
iteration except the last one.
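The effect of these two functions can be sketched in Python as follows. The forall helper below only imitates a FORALL loop over a list of values; it is an assumption made for illustration and says nothing about how BibFormat implements FORALL internally.

```python
def rep_prefix(text, first_iteration):
    """Output text only in the first iteration."""
    return text if first_iteration else ""


def separator(text, last_iteration):
    """Output text in every iteration except the last one."""
    return "" if last_iteration else text


def forall(values, prefix, sep):
    """Imitate a FORALL that prints a prefix, each value and a separator."""
    pieces = []
    for index, value in enumerate(values):
        first = index == 0
        last = index == len(values) - 1
        pieces.append(rep_prefix(prefix, first) + value + separator(sep, last))
    return "".join(pieces)


print(forall(["Doe, J", "Roe, R"], "Authors: ", "; "))
# prints: Authors: Doe, J; Roe, R
```

Note that with an empty list nothing is printed at all, not even the prefix, which is the reason for using rep_prefix instead of a plain literal in front of the loop.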
4.6. Formats
Formats are a special construction that the BibFormat Evaluation
Language (EL) offers. A format allows you to group some EL code under an
identifier and then call it from any EL statement.
You can manage these formats using the web interface. It is
quite easy to do so: when you access the Formats section it presents
a list of all the format identifiers that are already defined, each with a short
description of what the format is for. From there you can see the whole EL
code by using the link [Code]. You can add a new format by using the
set of input boxes that you'll find at the end of the page. Already defined
formats can also be deleted and modified.
[Fig. 7]
Note: When defining formats,
you have to take care not to use "recursive" format calls (either direct or
indirect), because they can lead to execution problems. For example, imagine that we
have a format called "ex_1" that contains a call to itself:
Format "ex_1"
"hello world"
format("ex_1")
This is a "direct" recursive call; you
should never end up with this kind of call, as the web interface should warn you if
it finds one. However, "indirect" calls are not detected by
the web interface, so you have to watch out for them yourself. One example of "indirect"
recursion, which arises if format "ex_2" in turn contains a call to format("ex_1"):
Format "ex_1"
"hello world"
format("ex_2")
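Since indirect recursion is not detected by the web interface, it can be useful to check a set of formats by hand. The following Python sketch shows one way to do it, assuming you have written down which formats each format calls; the function name find_recursion and the dictionary layout are our own, and the example assumes that "ex_2" calls "ex_1" back.

```python
def find_recursion(calls):
    """Return a list of format names forming a recursive chain, or None.

    calls maps each format name to the list of formats it calls.
    """
    def visit(name, path):
        if name in path:
            # We came back to a format that is already being expanded.
            return path[path.index(name):] + [name]
        for callee in calls.get(name, []):
            cycle = visit(callee, path + [name])
            if cycle:
                return cycle
        return None

    for name in calls:
        cycle = visit(name, [])
        if cycle:
            return cycle
    return None


direct = {"ex_1": ["ex_1"]}
indirect = {"ex_1": ["ex_2"], "ex_2": ["ex_1"]}  # assumption: ex_2 calls ex_1

print(find_recursion(direct))    # ['ex_1', 'ex_1']
print(find_recursion(indirect))  # ['ex_1', 'ex_2', 'ex_1']
```

The search simply follows every chain of calls and stops as soon as a format appears twice in the same chain, so it is fine for a handful of formats but not tuned for large sets.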
4.7. Knowledge bases (KBs)
This is yet another special feature provided by the BibFormat Evaluation
Language. In a few words, it allows you to map one string value to
another according to a pre-stored set of key values that map to other values
(the knowledge bases). Every knowledge base is identified by a label that
has to be unique (among KB identifiers); remember that identifiers are
not case-sensitive.
These sets of values normally lived in a file, but with this
new development there was a need for easy KB management
integrated in BibFormat. For this reason, you can manage KBs from the BibFormat
configuration interface, in the KBs section.
When you access the KBs section, the list of all the KB
identifiers defined is displayed. Below it you'll find a set of controls
to add new KBs; these controls work as elsewhere in the interface, but
there's something a bit special: normally, you shouldn't fill in the input box
that asks you for the Knowledge base table name. All the knowledge base
data is handled by a database in which each KB corresponds to a DB table; this
input box holds the internal table name for that KB. Normally the KB manager
will generate it for you, so you shouldn't need to use it.
[Fig. 8]
Each KB has a link for accessing the list of values that it
contains. If you click on it, a new window will show you the list of current
values (key and mapped ones) and a very easy interface to add new values or to
delete existing ones (KB values are case sensitive).
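As a rough mental model, a set of knowledge bases behaves like a dictionary of dictionaries. The Python sketch below illustrates the two case rules mentioned in this chapter (KB identifiers are not case-sensitive, KB values are). The class, the KB label "LANGUAGES" and the choice of returning an empty string for an unknown key are all assumptions made for the example; the real KBs live in database tables.

```python
class KnowledgeBases:
    """Toy model of KBs: label -> {key value: mapped value}."""

    def __init__(self):
        self._kbs = {}

    def add_kb(self, label):
        key = label.lower()  # identifiers are not case-sensitive
        if key in self._kbs:
            raise ValueError("KB identifier already in use: " + label)
        self._kbs[key] = {}

    def add_value(self, label, key, mapped):
        self._kbs[label.lower()][key] = mapped  # values keep their case

    def lookup(self, label, key, default=""):
        # Assumption: an unknown key yields the default (empty) string.
        return self._kbs[label.lower()].get(key, default)


kbs = KnowledgeBases()
kbs.add_kb("LANGUAGES")  # hypothetical KB label
kbs.add_value("LANGUAGES", "en", "English")

print(kbs.lookup("languages", "en"))  # English
print(kbs.lookup("LANGUAGES", "EN"))  # empty: KB values are case sensitive
```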
[Fig. 9]
4.8. User Defined Functions (UDFs)
The use of User Defined Functions (UDFs) is one of
the most powerful features of the BibFormat Evaluation Language (EL).
The idea is that inside EL you can use operations or functions over
strings; normally a large number of different string transformations are needed
for formatting, but we cannot hope to implement all these
operations inside EL because their number keeps growing and new needs
appear all the time. To deal with this problem, BibFormat provides a
mechanism that allows you to define as many functions (UDFs) as you
want and use them inside any EL statement.
These functions are identified by a unique name and they receive
data (over which they do operations) by parameters. These functions are defined
in a programming language (PHP) and therefore good knowledge of this language
is needed.
BibFormat offers complete UDF management through the UDFs
section of the web interface. There you'll see a complete list of all defined UDFs
with their identifier, parameters and a short description of what the UDF
does. You can also add, delete or modify UDFs or even have a look at the
PHP code of an already defined function (there you'll be able to launch small
tests on the defined functions).
[Fig. 10]
The definition of these functions should be reserved to
administrators and some particularities have to be taken into account when
defining UDFs:
When you want to add or modify a UDF you are
asked for the parameter list; you have to enter the parameter names separated
by commas. Example: you want to define a new function for prefixing a given string
with another, so you need two parameters (one for the string which is going to
be prefixed, let's name it str, and another one for the prefix itself,
let's name it prefix); you should enter them in the parameter input box
like this: prefix, str
The order in which you specify the parameters when
defining a function is the order in which they have to be passed to the UDF
from an EL statement.
When defining the PHP code of a function, there are
some important things to consider:
The result of a function has to be a string.
The parameters are available inside the PHP code as
variables with the parameter name.
The result of the function has to be given by a PHP
return statement that returns the resulting string.
Make sure the PHP code is correct (BibFormat has no way of
checking the code and it won't warn you if it is wrong).
There are some special variables available inside the
PHP definition:
$FIRST_ITERATION: Is equal to "1" when we are in the first iteration
of an EL FORALL statement, and "0" otherwise. If the call is made outside
a FORALL, it is set to "1".
$LAST_ITERATION: The counterpart for the last iteration: it is equal to "1"
when we are in the last iteration of an EL FORALL statement.
With these two
variables you can define special FORALL functions, such as a function to
print a separator.
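The rules about parameter lists can be illustrated with a small sketch. Real UDFs are written in PHP and stored through the web interface; the Python below is only a model of the rules above (parameter names entered separated by commas, values passed in the same order, result always a string), and the names define_udf and call_udf are hypothetical.

```python
UDFS = {}


def define_udf(name, parameter_list, body):
    """Register a UDF; parameter_list is the comma-separated text from the form."""
    names = [p.strip() for p in parameter_list.split(",") if p.strip()]
    UDFS[name] = (names, body)


def call_udf(name, *values):
    """Call a UDF, binding values to parameter names by position."""
    names, body = UDFS[name]
    if len(values) != len(names):
        raise TypeError("%s expects %d parameter(s)" % (name, len(names)))
    result = body(dict(zip(names, values)))
    if not isinstance(result, str):
        raise TypeError("a UDF has to return a string")
    return result


# The prefixing function from the example: parameters entered as "prefix, str".
define_udf("prefix", "prefix, str", lambda p: p["prefix"] + p["str"])

print(call_udf("prefix", "Dr. ", "Doe"))  # Dr. Doe
```

Because the parameters were declared as prefix, str, the prefix has to be passed first when the function is called.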
4.9. Defining links
As we've already said, BibFormat is not only a formatter: it
also provides a link manager. But what do we mean by 'link manager'? The idea
is to have a set of rules that describe how to generate a link using certain
data; if the link can be generated from those rules, then the link manager can
check different things (e.g. see if the link is valid or, if it's a link to a file,
check if the file exists and in which formats it exists, etc.) and
finally return the solved link. In other words, if you have a set of
bibliographic records that can contain a certain link and that link can be
coded in the link manager rules, you don't need to store each link in each
bibliographic record; you just use the link manager to generate them
dynamically. This way, you only have to maintain a small set of rules and not
thousands of static links in records.
BibFormat allows you to configure different link definitions,
each of them identified by a unique name; each of these link definitions
has some associated parameters, which are the information passed to the
rules defined for it. Then, when you call the link manager to solve a link
(from an EL statement, for example) you have to specify the
identifier of the link definition you want to be used and the value for
each of the parameters used by that link definition (always string
values). The link manager will retrieve the rules associated with the link
definition specified and will interpret those rules using the given
parameter values, telling you whether the link was generated correctly and giving you the result
(the solved link).
BibFormat provides this mechanism, and through the web interface
you can access the rule repository to see which
link definitions are available, define new link rules or maintain already
defined ones. When adding or modifying a link definition you'll have to
specify the parameters; please remember to separate them with commas.
[Fig. 10]
Link definitions are structurally quite similar to
behaviors: although there can be different types of them (as we'll see later),
a link definition is made up of one or more conditions and each of these
conditions can have one or more actions that tell how the link has to be built
if its condition is met. In general, link rules (this includes
conditions and actions) have a particular structure and they are described in Evaluation
Language (EL) with one restriction: the EL LINK statement
cannot be used. Each group of conditions-actions of a link definition can be of
a different solving type (when you create a new link
definition, you are asked for its solving type, but this is only because all
conditions created for that link definition will have the selected
solving type as default; you can change it afterwards and so obtain a
"mixed" link definition). Their structure and the way the link manager interprets
them depend on their solving type. Currently you can define
link conditions of two different solving types: EXTERNAL or INTERNAL. A
more detailed explanation of each type is given later.
As we've said, a link definition is made up of several link
conditions. When the link manager is asked to solve a link for a given link definition, it
retrieves all link conditions associated with it. Then it takes the
first of them (following the evaluation order: the lower the
evaluation order number, the earlier the condition is considered), evaluates
its EL code with the parameter values passed and, if the result is "TRUE",
executes the associated actions, returns the link and finishes the solving process.
If a condition fails, it moves on to the next one. If all the
conditions fail, the link manager reports that the link couldn't be solved.
This is the general behavior of the link manager, but the way of determining whether
a link has been solved and the way the link is built depend on the condition's solving
type.
4.9.1. EXTERNAL link conditions
This is the simplest way of solving links. It's intended to be
used when you want to generate a link that points to an external resource
(normally a web page). In this case the link condition has only one
action, which is evaluated if the associated condition is "TRUE". When a
condition of this type evaluates to "TRUE" and the action is executed, the
result of the action is given as the solved link and the link manager finishes.
[Fig. 11]
4.9.2. INTERNAL link conditions
This condition solving type is intended to be used when you want
to link to a document which is a file (inside or outside your file system) and
that can be in different file formats.
This case is a bit more complex than the previous one, so we'll
go step-by-step explaining differences and special features:
An INTERNAL condition has a base file path and a
base URL associated. The base file path is the string that will
be used as prefix when looking for a file generated by the actions associated
with that condition. The base URL, on the other hand, is a string to
which the link string (resulting from the actions) will be appended (e.g. if the base
file path of a condition is /tmp/docs
and the base URL is http://doc.cern.ch/,
the condition is true and the result of the actions is test.pdf, then the file path the link manager
has to check is /tmp/docs/test.pdf
and, if the file exists, the generated link is http://doc.cern.ch/test.pdf).
Any condition of this type can have several associated file
formats. This is a new concept that is only used for INTERNAL condition
solving. A file format is simply a set of file extensions that are
grouped under an identifier. Then, you can associate a file format
identifier with a link condition. When the condition is true the link manager
will combine each result from the condition actions with the associated file
formats to check the existence of a file of any format; this means that when an
action is evaluated, the link manager takes the file extensions of each
associated file format identifier and checks if the file base path +
resulting action string + file extension exists in the file system.
One condition of this type can have more than one
associated action. Each of its actions describes an alternative way of building
the file path. When a condition of this type evaluates to "TRUE", the link
manager retrieves its actions (following the actions' apply order) and
evaluates the first one; with the action result it builds the file path in this
way: file base path + resulting action string, and then combines this
string with each of the file extensions. If any of the combinations
exists in the file system, the link is generated (if more than one
file format combination exists, the link variable will have multiple values
containing the different links); if not, it starts the same process with the
next action. If none of the actions leads to an existing file, the link is not
generated.
When calling the link manager from an EL
statement (see chapter Evaluation Language Reference), if the link is
solved we can access a special internal variable whose
value is the resulting link. For INTERNAL condition links, we have said that
this variable can contain multiple values if the link manager finds
different file formats. In this case, there's another extension: some
special variable fields contain additional information for each value
in the LINK variable, and you can access them when the link is solved.
Here's a table detailing the different LINK variable fields which are defined
when an INTERNAL condition link is solved:
Field name    Value it contains
url           The same value as the LINK variable: the solved URL.
file          The local full path to the file the solved URL points to.
format_id     The file format id string.
format_desc   The file format description string (this is defined for each file format).
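The INTERNAL solving steps described above can be sketched in Python. This is a simplified model and not the link manager's code: it covers only what happens once a condition has evaluated to "TRUE", actions are plain Python functions instead of EL, and the file format table, the example.org URL and the function name solve_internal are assumptions made for the illustration.

```python
import os
import tempfile

# Hypothetical file formats: identifier -> description and file extensions.
FILE_FORMATS = {
    "PDF": {"description": "Portable Document Format", "extensions": [".pdf"]},
    "PS": {"description": "PostScript", "extensions": [".ps", ".ps.gz"]},
}


def solve_internal(base_path, base_url, format_ids, actions, params):
    """Return one entry per existing file for the first action that finds any."""
    for action in actions:  # actions are tried in their apply order
        name = action(params)
        links = []
        for format_id in format_ids:
            file_format = FILE_FORMATS[format_id]
            for extension in file_format["extensions"]:
                path = os.path.join(base_path, name + extension)
                if os.path.isfile(path):
                    links.append({
                        "url": base_url + name + extension,
                        "file": path,
                        "format_id": format_id,
                        "format_desc": file_format["description"],
                    })
        if links:
            return links
    return []  # no action led to an existing file: the link is not generated


# Small demonstration in a temporary directory with a single PDF file.
with tempfile.TemporaryDirectory() as docs:
    open(os.path.join(docs, "0001.pdf"), "w").close()
    actions = [
        lambda p: p["category"] + "-" + p["id"],  # first criterion, no file here
        lambda p: p["id"],                        # alternative criterion
    ]
    params = {"category": "hep-th", "id": "0001"}
    links = solve_internal(docs, "http://example.org/", ["PDF", "PS"], actions, params)
    expected_file = os.path.join(docs, "0001.pdf")
    unsolved = solve_internal(docs, "http://example.org/", ["PDF"],
                              [lambda p: "missing"], params)

print([link["url"] for link in links])
```

Each returned entry carries the same four fields as the table above, so a solved link with several file formats simply yields several entries.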
4.9.3. Example
As link generation is quite a complex topic (especially when
talking about INTERNAL linking) we'll try to illustrate it with a simple
example.
Let's imagine we want to create a new link definition for
generating full-text access to the documents that are archived on a document
server (a file system which contains the electronic versions of documents). These
documents are organized systematically depending on three characteristics that
are included in the bibliographic records: BASE, CATEGORY and ID. When the base
corresponds to "CERNREP" the files are archived below directory /pub/www/home/cernrep/
and can be stored following two different criteria that depend on the CATEGORY
and ID values; the documents are all HTML. However, if the base is "PREPRINT"
and the CATEGORY is either "HEP-TH" or "HEP-PH" they are stored under directory
/archive/electronic|/pub/www/home/ following certain criteria; in this
case the documents can be in several file formats: PDF, Postscript, MS Word.
Of course, we only want the link to be created if the files
corresponding to the bibliographic records exist.
So we start by creating a new link definition that we'll call FULLTEXT.
It will receive three parameters that are the information we need for
generating this kind of link: BASE, CATEGORY and ID. We select INTERNAL as the
default solving type and then we fill in the base file path and URL with
some default values (these values are not important; they will be copied by
default to the conditions we are going to create afterwards).
[Fig. 12]
Then we create a condition for the first possibility: when BASE
is "CERNREP". We select INTERNAL as link condition because we want to link to a
file and we want to check its existence and we fill in the base file path and
URL with the corresponding values. Then we assign the file format types and we
enter the file archiving criteria as different actions.
[Fig. 13]
For the other possibility we proceed in the same way by adapting
the definition to the requirements; we'll have something like this as result:
[Fig. 14]
Once we have finished the link definition, we can insert links
of this type from a BibFormat behavior, for example. Let's imagine we have
included a piece of EL code like this in a behavior because we want to
insert a link to the full-text documents of any record:
This EL statement will include the string "Fulltext: "
followed by a link to all the documents found for the values of internal
variables $base, $category, $id separated by " - ".
4.10. User management
The BibFormat web interface (WI) comes with a security mechanism
which allows you to define which users can access the WI. BibFormat doesn't
have its own user management; instead it uses the CDS user schema (as it is a
part of CDS). So if you are not registered as a CDS user and you want to have
access to the BibFormat WI, the first thing to do is to register in CDS through the
standard procedure (for example via the CDS Search interface you can access the
CDS account management system).
The BibFormat WI access policy is rather simple: it keeps a list of
CDS users that can access the WI. If someone tries to access any part of
the WI, the system asks the user to identify himself as a CDS user. If the CDS
login is successful and the user is in BibFormat's access list, then the user
gains access to the WI.
There's a section in the WI which allows you to define which CDS
users have access to the WI. It is rather simple to use: you can add CDS
users to the access list by specifying either their CDS user id or their CDS
login; you can delete a CDS user from the access list by simply selecting
the link "delete" for the corresponding user.
[Fig. 15]
When you install BibFormat for the first time and you access
the WI, you'll see that no login or password is asked for. The security mechanism
doesn't get activated until at least one user is added to BibFormat's access
list. So if you don't want to limit access to the BibFormat WI, keep the access
list empty.
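The access policy boils down to a very small rule, sketched here in Python. The function name can_access is hypothetical, users are represented simply by their CDS login, and None stands for a visitor who has not logged in.

```python
def can_access(access_list, user=None):
    """Decide whether a visitor may enter the BibFormat web interface."""
    if not access_list:
        # Security is not activated until at least one user is in the list.
        return True
    return user is not None and user in access_list


print(can_access(set()))               # True: empty list, no login asked
print(can_access({"jdoe"}, "jdoe"))    # True: logged in and listed
print(can_access({"jdoe"}, "rroe"))    # False: logged in but not listed
print(can_access({"jdoe"}))            # False: not logged in
```

Keep in mind the consequence of the first branch: adding the first user to the access list is what switches the protection on for everybody else.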
4.11. Evaluation Language Reference
In this section we'll present a more or less formal definition
of the Evaluation Language (EL); although we use some formal
methods to describe it, we'll also give a quick explanation of the elements
that make up the language and how to combine them to obtain the desired results.
Just below you can find the EL definition, expressed in terms of
EBNF (Extended Backus-Naur Form) notation. We have used capital letters
to express non-terminal elements and non-capital/bold characters for the
terminal ones. There's one remark to make: Whenever you find the mark
[REX] after
any definition, it means that we have used a regular expression just before in
order to express a set of non-terminals.
This is just a formal way of describing the language, but don't
worry if you don't understand it very well because just below these lines we'll
try to describe it in a more informal way.
To begin with, you should know that EL is a language designed to
work with strings (a string is a sequence of characters), but it also has some
logic and comparison operations. One important thing you have to be aware of is
that in EL blank spaces, tabs or carriage returns have no meaning other
than separating elements of the language; that means that between two basic
elements you can have as many spaces or carriage returns as you want.
One of the basic elements of the language is what we call LITERALS.
These represent constant string values; they are delimited by a pair of
double quote (") symbols surrounding the string you want to express. Everything
you put inside the double quotes will be taken as it is, so inside a
literal spaces and carriage returns do have meaning (it's the only case). If you
want to express a double quote symbol inside a literal you have to escape
it using \.
Some examples of literals:
If you want to represent the string hello,
inside the EL you'll have to use "hello".
For the string hello "big" man, the representation in EL is "hello \"big\" man"
(notice the escape characters and that spaces have meaning).
The string Let's see \"" has to be expressed in this
way "Let's see \\\"\"".
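The escaping rule can be captured in a tiny helper, shown here in Python. The function name el_literal is our own; the sketch assumes, as the third example suggests, that a backslash inside a literal is itself escaped with another backslash.

```python
def el_literal(text):
    """Write text as an EL literal: escape backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(el_literal("hello"))            # prints: "hello"
print(el_literal('hello "big" man'))  # prints: "hello \"big\" man"
```

Backslashes are escaped first so that the backslashes added in front of the double quotes are not escaped a second time.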
Another important basic element of the language is VARIABLES.
These elements represent string data from the input to which you can refer
inside the language (a variable is also treated as a string). Variables are defined
in advance by the administrator (or even users), so you have to know which of
them you have access to. Additionally, variables can contain FIELDS, which
are simply other input values that are grouped under a variable because they
are related to each other in some way (for example, you could have a
variable for the information about the author, with fields like name, birthplace,
etc.). If you want to know more about variables and their correspondence
with the input you can look at the Mapping the Input section. A variable is
expressed in EL by a dollar symbol followed by any combination of letters,
numbers or underscores; variables are case-insensitive. To refer to a field of
a variable, you simply add a dot followed by the field name (which is also made
up of letters, numbers or underscores).
Some examples about variables and fields:
Imagine you have a variable which contains the author
information and which is called author; to represent it in EL you would
write $author. Wherever
$author appears, BibFormat will use the value defined for it from the
current input record.
Then you know that the field name of variable author
contains the author's full name and you want to refer to it inside an EL
statement, so you'd write $author.name.
If we speak about the CDS configuration, variable and field
names correspond to MARC 21 tag & indicator names; so to refer to the main
title of a bibliographic record we should use variable 245 field b,
in EL terms: $245.b.
Now that we know the basic elements of the language we can start
thinking about how to combine them. The most important (and only) string
operation is concatenation: joining strings. This operation is implicit in the
language, so we just write language elements one after another, and the
result is the values of those elements one after another.
Some samples:
To represent the constant string Author:
followed by the name of the author of the input record you should write "Author: " $100.a (assuming the CDS
configuration, in which MARC 21 notation is used; authors correspond to variable
100, field a).
You want to output the title in bold (in HTML
terms) followed by the author in normal characters, separated from the title by the character
/: "<b>" $245.b "</b>/" $100.a
These two, literals and variables, are only the basic elements of
the EL. You can combine them using concatenation to get new strings. But, of
course, there are more operations you can apply to strings: UDFs (User
Defined Functions). We will also call these elements functions, because
that is what they are: functions or operations to be applied to strings (by
strings we mean both basic elements and the strings that result from applying any
operation). A UDF has a name that identifies it uniquely and takes some information
that we call parameters. A UDF returns another string as its result, depending
on the parameter values (always strings). So to represent a function in EL you
need its name followed by an opening parenthesis, the parameter values separated
by commas and a closing parenthesis. There is a list of UDFs you can look at
through the interface, and this list can be extended to fit your needs (see the UDFs
section of this manual).
Some examples:
You want to ensure that the title of a bibliographic
record is always in capital letters; there is a function
called upper that takes one parameter and returns that parameter
converted to capital letters. You have to write the call like this: upper($245.b).
You want only the first 3 characters of an author name to
appear, in capital letters. We have seen there is a function for uppercasing a
string; there is another one, called copy, that extracts a substring from
the string passed as first parameter, starting at the character position given by the 2nd
parameter and with the length given by the 3rd one: copy(
upper($100.a), "0", "3").
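The nested call above can be mimicked outside of EL. The following Python sketch is only an illustration of the described semantics, not BibFormat's own implementation of these UDFs; it assumes that positions are zero-based (as the "0" in the example suggests) and uses a made-up author value in place of $100.a.

```python
def upper(value: str) -> str:
    """Return the value converted to capital letters."""
    return value.upper()


def copy(value: str, start: str, length: str) -> str:
    """Return `length` characters of `value`, starting at position `start`.

    Parameters arrive as strings, as every EL value does. Zero-based
    positions are an assumption made for this sketch.
    """
    begin = int(start)
    return value[begin:begin + int(length)]


author = "Ellis, John"  # made-up stand-in for the value of $100.a
print(copy(upper(author), "0", "3"))  # prints ELL
```

Because every function takes strings and returns a string, the result of one call can be passed directly as a parameter of another.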
As you can see, these UDFs are very powerful because you can
concatenate their result with another element (literal, variable or even
function) and the parameters can be basic elements or expressions. More
generally, any element or expression of the EL that produces
a string value can be combined with other EL expressions or elements.
Another very useful feature of EL is the possibility to use KNOWLEDGE
BASES (KBs). A KB is just a set of key values that map (one-to-one) to another
set of values; maybe "knowledge base" is not a very appropriate name, because
they are more like translation tables. BibFormat offers tools to create and
maintain KBs that can be used in the EL afterwards (see chapter KBs
management in this manual). You can see a KB invocation as a special function
(the syntax for calling it is the same) named kb that takes two
parameters: one indicating the KB name (BibFormat can handle several KBs)
and another one for the key value to translate. The result is the mapped KB
value, or an empty string if the key does not exist in the specified
KB. A typical example is when you have months as numbers and you want to
translate them into month names; you could have a KB that maps all the month
numbers to month names and then call it like this: kb("MONTH", $m).
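The lookup behaviour described above can be sketched in Python as follows. This is a hedged illustration only: the KNOWLEDGE_BASES dictionary is a hypothetical in-memory stand-in for the KBs managed through BibFormat, and returning an empty string for an unknown KB name is an assumption of the sketch (the text only specifies the result for an unknown key).

```python
# Hypothetical in-memory stand-in for the KBs maintained in BibFormat.
KNOWLEDGE_BASES = {
    "MONTH": {"1": "January", "2": "February", "3": "March"},
}


def kb(kb_name: str, key: str) -> str:
    """Translate `key` through the KB called `kb_name`.

    Returns the mapped value, or an empty string when the key is not
    defined in that KB.
    """
    mapping = KNOWLEDGE_BASES.get(kb_name, {})  # unknown KB: assumed empty
    return mapping.get(key, "")


print(kb("MONTH", "2"))         # prints February
print(repr(kb("MONTH", "13")))  # prints ''
```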
Now let's move to FORMATS. A format is a piece of EL
code which is grouped under a label (a name) and that can be used in any other
EL statement. BibFormat allows users to define as many formats as they want
and to identify each of them with a simple name. In short, formats allow you
to reuse EL code; within a format you can put any EL code (even other format
calls) and all the variable values are fully available. Again, a format call in EL follows the same
convention as functions: the word format followed by the format name (a
string) between parentheses. Calling a format has the same effect as pasting the EL code
defined inside that format, as it is, at the place where you make the call.
Example: Imagine you have to write the title of bibliographic
records with a certain formatting, let's say in bold and red, and you are going
to use this formatted title in several places. You can take advantage of EL
formats and define a format called TITLE that contains the code "<font color=\"red\"><b>" $245.b
"</b></font>". Once this is done, you could use it to format
records by printing their title in that way and their author after it:
format("TITLE") "/" $100.a. The good thing
is that if some day you decide to change the title formatting you would only need
to modify the TITLE format definition and not all the places where you
show the title.
At this point, you have seen the basic elements and operations of
EL. You may think that this is powerful enough to express your formatting work, but
there are more complex situations that you will have to face. We have tried to
design the EL to be easy to use, but the advanced structures described next
can sometimes become a bit complex.
All these basic elements and operations are quite OK. But there
are times when you want to compare expressions and decide what to do
depending on the result of the comparison. For this purpose, EL has an IF
statement and a few comparison and logical operators built in (don't forget that
any functionality needed can be achieved by defining new UDFs; EL gives basic
operations to provide this possibility). Let's go step by step. First let's
talk about the set of operators that can be used in a comparison:
Comparison operators: Equal and non-equal (=, !=). They take
two operands, which have to be strings, and produce a logical (true or false)
value.
Logical operators: AND, OR and NOT (&&, ||, !). All of
them have to be applied to logical values; AND and OR take two operands, and
NOT takes one.
All of them are right associative (except NOT which is unary
left-associative) and their precedence goes like this (highest to lowest): NOT,
(EQUAL, NON-EQUAL), (AND, OR). These operators cannot be used just anywhere, only
inside statements that expect a logical value as result, in other words, inside
condition statements.
The IF structure is quite easy to learn: first we write the
word IF followed by a condition statement surrounded by parentheses;
then comes an EL statement in braces, which will be
executed only if the condition is true; optionally, we can add the word ELSE
followed by another EL statement in braces, which will only be executed
if the IF condition is not true.
Let's have a look at some examples:
I want the title of a record to appear, followed by the
constant Author: and then its author. It would be nicer, though, if the
constant string appeared only when the record has an author:
BibFormat is not only an EL processor. Among other things, it contains
a link solver that has its own rule repository in order to be able to
solve links automatically (see chapter Link solver of this manual). EL has one
special structure for asking the link solver for some links and including them
in the formatted version of the bibliographic record. This way links are easy
to maintain (you modify the rules independently of where the link is being
used) and as reusable as formats or UDFs. Links are identified by a label and
need some information to be passed as parameters; then an EL statement has to
be specified which will take effect only if the link is solved and inside
which you will have access to a special variable, named LINK, which
contains the solved link among other information (see chapter Link solver for
more information about which values are accessible); additionally, an else
statement can be added (following the same syntax as in the IF construction)
that will take effect only if the link cannot be solved by the Link solver.
Example:
We return to our typical example of the simple format
that contains the title and the author, but now we want the author to be linked
to the search. Supposing that this kind of link is already defined under the
label "AUTHOR_SEARCH", we should proceed like this:
The next step when talking about EL components is to deal with
multiple values. Life is not so easy and, of course, a bibliographic record
can have more than one author, or can have a related document which exists in more
than one format and has to be linked. In other words, BibFormat supports
variables and fields with multiple values (see chapter Mapping input);
consequently, a way of applying an EL statement to all the values of a
variable or a field is quite useful. FORALL is the construction for this.
It allows you to specify a variable or a field followed by an EL statement
(between braces) that will be applied to every value of the variable or the
field; any reference to the iteration variable inside the FORALL EL
statement refers to the value of the current iteration (if you refer
to a variable that has multiple values outside a FORALL, the first value
is used). One limitation is that you should not nest FORALL
statements; in other words, never put a FORALL inside another one. This
construction also lets you limit the number of times you iterate over
a variable or field by adding a literal with the number of iterations.
Some examples:
Let's continue refining our simple format; now we have
to consider that there can be more than one author for one bibliographic
record, so we want to show all of them with the link included, of course.
Although this FORALL construction may not seem
very useful, it is used a lot when defining formats or behaviors. Quite often
you will want a piece of EL code to take
effect only if a certain variable or field exists; FORALL can also be used
in that situation, and it is the most convenient way of
doing it. Imagine you want the title, then the constant string "Author: "
followed by the authors of a bibliographic record; but you don't want the
constant "Author: " to appear if there is no author at all. You could use
something like this:
As you can see, we are
using a new function: rep_prefix. This is a UDF which prints
the string passed as parameter only once, at the beginning, inside a FORALL
statement. But the interesting thing here is the use of FORALL.
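The iteration behaviour described above (apply a body to every value, print a prefix only once, optionally limit the number of iterations) can be sketched in Python. This is not EL code and not how BibFormat implements FORALL or rep_prefix; the function name, the " Author: " spacing and the "; " separator between authors are assumptions chosen for the illustration.

```python
from typing import List, Optional


def format_title_and_authors(title: str,
                             authors: List[str],
                             limit: Optional[int] = None) -> str:
    """Mimic a FORALL over the authors with a prefix printed only once."""
    # An optional limit restricts the number of iterations.
    selected = authors if limit is None else authors[:limit]
    out = title
    for index, author in enumerate(selected):
        if index == 0:
            out += " Author: "  # role of rep_prefix: emitted once only
        else:
            out += "; "         # hypothetical separator between authors
        out += author
    return out


print(format_title_and_authors("A title", []))
print(format_title_and_authors("A title", ["Ellis, J", "Smith, A"]))
```

With an empty author list the loop body never runs, so the constant prefix does not appear at all, which is exactly why FORALL is convenient for "only if the field exists" situations.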
Finally, there is still one special EL function: COUNT.
Because of certain special situations or strange input data in the variables,
it is sometimes useful to know how many values a variable or a field contains.
This function simply takes a variable or field as argument and returns a
string with the number of values it contains; if the value returned is 0,
there is no value in the variable, which means that the variable doesn't
exist or that no values were mapped to it from the input.
Examples:
As this is the last example, let's make it a bit more
complicated. Continuing with our well-known simple format, we want all the
authors of the record to appear if there are fewer than 10; otherwise we
want only the first one to appear, followed by the string "et al.". We will also
use a function called GT which returns a non-empty string if the first
parameter is greater than the second one.
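The decision logic of this last example can be sketched in Python. It is only an illustration of the described behaviour of COUNT and GT, not EL syntax and not BibFormat code; the function names, the "1" returned by gt and the "; " separator are assumptions of the sketch.

```python
from typing import List


def count(values: List[str]) -> str:
    """Like COUNT: the number of values, returned as a string."""
    return str(len(values))


def gt(first: str, second: str) -> str:
    """Like GT: a non-empty string if first > second, else an empty one."""
    return "1" if int(first) > int(second) else ""


def format_authors(authors: List[str]) -> str:
    """All authors if there are fewer than 10, else the first plus 'et al.'."""
    if gt(count(authors), "9"):  # more than 9 means 10 or more
        return authors[0] + " et al."
    return "; ".join(authors)    # hypothetical separator


print(format_authors(["Ellis, J", "Smith, A"]))
print(format_authors(["Author %d" % i for i in range(12)]))
```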
diff --git a/modules/bibformat/lib/bibformat.py b/modules/bibformat/lib/bibformat.py
index 6aaed7067..8da67ae70 100644
--- a/modules/bibformat/lib/bibformat.py
+++ b/modules/bibformat/lib/bibformat.py
@@ -1,278 +1,285 @@
# -*- coding: utf-8 -*-
## $Id$
## BibFormat. Format records using specified format.
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005 CERN.
##
## The CDSware is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## The CDSware is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDSware; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
Format records using specified format.
API functions: format_record, format_records, create_excel, get_output_format_content_type
Used to wrap the BibFormat engine and associated functions. This is also where
special formatting of multiple records (that the engine does not handle, as it works
on a single record basis) should be put, with name create_*.
SEE: bibformat_utils.py
FIXME: currently copies record_exists() code from search engine. Refactor later.
"""
import zlib
from invenio import bibformat_dblayer
from invenio import bibformat_engine
from invenio import bibformat_utils
from invenio.config import cdslang, weburl, php
from invenio.bibformat_config import use_old_bibformat
try:
import invenio.template
websearch_templates = invenio.template.load('websearch')
except:
pass
# Functions to format a single record
##
def format_record(recID, of, ln=cdslang, verbose=0, search_pattern=[], xml_record=None, uid=None):
"""
Formats a record given output format.
Returns a formatted version of the record in
the specified language, search pattern, and with the specified output format.
The function will define which format template must be applied.
The record to be formatted can be specified with its ID (with 'recID' parameter) or given
as XML representation(with 'xml_record' parameter). If both are specified 'recID' is ignored.
'uid' allows to grant access to some functionalities on a page depending
on the user's priviledges. Typically use webuser.getUid(req). This uid has sense
only in the case of on-the-fly formatting.
@param recID the ID of record to format
@param of an output format code (or short identifier for the output format)
@param ln the language to use to format the record
@param verbose the level of verbosity from 0 to 9 (O: silent,
5: errors,
7: errors and warnings, stop if error in format elements
9: errors and warnings, stop if error (debug mode ))
@param search_pattern list of strings representing the user request in web interface
@param xml_record an xml string represention of the record to format
@param uid the user id of the person who will view the formatted page (if applicable)
@return formatted record
"""
############### FIXME: REMOVE WHEN MIGRATION IS DONE ###############
if use_old_bibformat and php:
return bibformat_engine.call_old_bibformat(recID, format=of)
############################# END ##################################
+ return bibformat_engine.format_record(recID=recID,
+ of=of,
+ ln=ln,
+ verbose=verbose,
+ search_pattern=search_pattern,
+ xml_record=xml_record,
+ uid=uid)
try:
return bibformat_engine.format_record(recID=recID,
of=of,
ln=ln,
verbose=verbose,
search_pattern=search_pattern,
xml_record=xml_record,
uid=uid)
except:
#Failsafe execution mode
if of == 'hd':
return websearch_templates.tmpl_print_record_detailed(
ln = ln,
recID = recID,
weburl = weburl,
)
return websearch_templates.tmpl_print_record_brief(ln = ln,
recID = recID,
weburl = weburl,
)
def record_get_xml(recID, format='xm', decompress=zlib.decompress):
"""
Returns an XML string of the record given by recID.
The function builds the XML directly from the database,
without using the standard formatting process.
'format' allows to define the flavour of XML:
- 'xm' for standard XML
- 'marcxml' for MARC XML
- 'oai_dc' for OAI Dublin Core
- 'xd' for XML Dublin Core
If record does not exist, returns empty string.
@param recID the id of the record to retrieve
@return the xml string of the record
"""
return bibformat_utils.record_get_xml(recID=recID, format=format)
# Helper functions to do complex formatting of multiple records
#
# You should not modify format_records when adding a complex
# formatting of multiple records, but add a create_* method
# that relies on format_records to do the formatting.
##
def format_records(recIDs, of, ln=cdslang, verbose=0, search_pattern=None, xml_records=None, uid=None,
prefix=None, separator=None, suffix=None, req=None):
"""
Returns a list of formatted records given by a list of record IDs or a list of records as xml.
Adds a prefix before each record, a suffix after each record, plus a separator between records.
You can either specify a list of record IDs to format, or a list of xml records,
but not both (if both are specified recIDs is ignored).
'separator' is a function that returns a string as separator between records.
The function must take an integer as unique parameter, which is the index
in recIDs (or xml_records) of the record that has just been formatted. For example
separator(i) must return the separator between recID[i] and recID[i+1].
Alternatively separator can be a single string, which will be used to separate
all formatted records.
'req' is an optional parameter on which the result of the function
are printed lively (prints records after records) if it is given.
This function takes the same parameters as 'format_record' except for:
@param recIDs a list of record IDs
@param xml_records a list of xml string representions of the records to format
@param header a string printed before all formatted records
@param separator either a string or a function that returns string to separate formatted records
@param req an optional request object where to print records
"""
formatted_records = ''
#Fill one of the lists with Nones
if xml_records != None:
recIDs = map(lambda x:None, xml_records)
else:
xml_records = map(lambda x:None, recIDs)
total_rec = len(recIDs)
last_iteration = False
for i in range(total_rec):
if i == total_rec - 1:
last_iteration = True
#Print prefix
if prefix != None:
if isinstance(prefix, str):
formatted_records += prefix
if req != None:
req.write(prefix)
else:
string_prefix = prefix(i)
formatted_records += string_prefix
if req != None:
req.write(string_prefix)
#Print formatted record
formatted_record = format_record(recIDs[i], of, ln, verbose, search_pattern, xml_records[i], uid)
formatted_records += formatted_record
if req != None:
req.write(formatted_record)
#Print suffix
if suffix != None:
if isinstance(suffix, str):
formatted_records += suffix
if req != None:
req.write(suffix)
else:
string_suffix = suffix(i)
formatted_records += string_suffix
if req != None:
req.write(string_suffix)
#Print separator if needed
if separator != None and not last_iteration:
if isinstance(separator, str):
formatted_records += separator
if req != None:
req.write(separator)
else:
string_separator = separator(i)
formatted_records += string_separator
if req != None:
req.write(string_separator)
return formatted_records
def create_excel(recIDs, req=None, ln=cdslang):
"""
Returns an Excel readable format containing the given recIDs.
If 'req' is given, also prints the output in 'req' while individual
records are being formatted.
This method shows how to create a custom formatting of multiple
records.
The excel format is a basic HTML table that most spreadsheets
applications can parse.
@param recIDs a list of record IDs
@return a string in Excel format
"""
# Prepare the column headers to display in the Excel file
column_headers_list = ['Title',
'Authors',
'Addresses',
'Affiliation',
'Date',
'Publisher',
'Place',
'Abstract',
'Keywords',
'Notes']
# Prepare Content
column_headers = '
'
#Apply content_type and print column headers
if req != None:
req.content_type = get_output_format_content_type('excel')
req.headers_out["Content-Disposition"] = "inline; filename=%s" % 'results.xls'
req.send_http_header()
req.write(column_headers)
#Format the records
excel_formatted_records = format_records(recIDs, 'excel', ln=cdslang,
separator='\n', req=req)
if req != None:
req.write(footer)
return column_headers + excel_formatted_records + footer
# Utility functions
##
def get_output_format_content_type(of):
"""
Returns the content type (eg. 'text/html' or 'application/ms-excel') \
of the given output format.
@param of the code of output format for which we want to get the content type
"""
content_type = bibformat_dblayer.get_output_format_content_type(of)
if content_type == '':
content_type = 'text/html'
return content_type
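The way format_records above treats prefix, suffix and separator (each may be either a plain string or a function of the record index, and the separator is skipped after the last record) can be isolated in a small standalone sketch. The helper names resolve and join_formatted are hypothetical and the records are assumed to be already formatted strings; this is not part of the BibFormat API.

```python
from typing import Callable, List, Optional, Union

# A piece is either a fixed string, a function of the record index, or None.
Piece = Optional[Union[str, Callable[[int], str]]]


def resolve(piece: Piece, index: int) -> str:
    """Return the text for `piece` at position `index`."""
    if piece is None:
        return ""
    if isinstance(piece, str):
        return piece
    return piece(index)


def join_formatted(records: List[str],
                   prefix: Piece = None,
                   separator: Piece = None,
                   suffix: Piece = None) -> str:
    """Concatenate records with per-record prefix/suffix and separators."""
    out = []
    last = len(records) - 1
    for i, record in enumerate(records):
        out.append(resolve(prefix, i))
        out.append(record)
        out.append(resolve(suffix, i))
        if i != last:  # no separator after the last record
            out.append(resolve(separator, i))
    return "".join(out)


print(join_formatted(["a", "b", "c"], separator=", "))
print(join_formatted(["a", "b", "c"], separator=lambda i: "|%d|" % i))
```

Here separator(i) yields the text placed between record i and record i+1, matching the convention stated in the docstring of format_records.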
diff --git a/modules/bibformat/lib/bibformat_engine.py b/modules/bibformat/lib/bibformat_engine.py
index d1a08436e..b866b1d8c 100644
--- a/modules/bibformat/lib/bibformat_engine.py
+++ b/modules/bibformat/lib/bibformat_engine.py
@@ -1,1603 +1,1606 @@
# -*- coding: utf-8 -*-
## $Id$
## Bibformt engine. Format XML Marc record using specified format.
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005 CERN.
##
## The CDSware is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## The CDSware is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDSware; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
Formats a single XML Marc record using specified format.
There is no API for the engine. Instead use bibformat.py.
SEE: bibformat.py, bibformat_utils.py
"""
import re
import sys
import os
import inspect
import traceback
import zlib
from invenio.errorlib import register_errors, get_msgs_for_code_list
from invenio.config import *
from invenio.bibrecord import create_record, record_get_field_instances, record_get_field_value, record_get_field_values
from invenio.dbquery import run_sql
from invenio.messages import language_list_long, wash_language
from invenio import bibformat_dblayer
from invenio.bibformat_config import format_template_extension, format_output_extension, templates_path, elements_path, outputs_path, elements_import_path
from bibformat_utils import record_get_xml
+from xml.dom import minidom #Remove when call_old_bibformat is removed
+
__lastupdated__ = """$Date$"""
#Cache for data we have allready read and parsed
format_templates_cache = {}
format_elements_cache = {}
format_outputs_cache = {}
kb_mappings_cache = {}
cdslangs = language_list_long()
#Regular expression for finding ... tag in format templates
pattern_lang = re.compile(r'''
#closing start tag
(?P.*?) #anything but the next group (greedy)
() #end tag
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)
#Builds regular expression for finding each known language in tags
ln_pattern_text = r"<("
for lang in cdslangs:
ln_pattern_text += lang[0] +r"|"
ln_pattern_text = ln_pattern_text.rstrip(r"|")
ln_pattern_text += r")>(.*?)\1>"
ln_pattern = re.compile(ln_pattern_text)
#Regular expression for finding tag in format templates
pattern_format_template_name = re.compile(r'''
#closing start tag
(?P.*?) #name value. any char that is not end tag
()(\n)? #end tag
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)
#Regular expression for finding tag in format templates
pattern_format_template_desc = re.compile(r'''
#closing start tag
(?P.*?) #description value. any char that is not end tag
(\n)? #end tag
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)
#Regular expression for finding tags in format templates
pattern_tag = re.compile(r'''
[^/\s]+) #any char but a space or slash
\s* #any number of spaces
(?P(\s* #params here
(?P([^=\s])*)\s* #param name: any chars that is not a white space or equality. Followed by space(s)
=\s* #equality: = followed by any number of spaces
(?P[\'"]) #one of the separators
(?P.*?) #param value: any chars that is not a separator like previous one
(?P=sep) #same separator as starting one
)*) #many params
\s* #any number of spaces
(/)?> #end of the tag
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)
#Regular expression for finding params inside tags in format templates
pattern_function_params = re.compile('''
(?P<param>([^=\s])*)\s* # Param name: any chars that is not a white space or equality. Followed by space(s)
=\s* # Equality: = followed by any number of spaces
(?P<sep>[\'"]) # One of the separators
(?P<value>.*?) # Param value: any chars that is not a separator like previous one
(?P=sep) # Same separator as starting one
''', re.VERBOSE | re.DOTALL )
#Regular expression for finding format elements "params" attributes (defined by @param)
pattern_format_element_params = re.compile('''
@param\s* # Begins with @param keyword followed by space(s)
(?P[^\s=]*)\s* # A single keyword, and then space(s)
#(=\s*(?P[\'"]) # Equality, space(s) and then one of the separators
#(?P.*?) # Default value: any chars that is not a separator like previous one
#(?P=sep) # Same separator as starting one
#)?\s* # Default value for param is optional. Followed by space(s)
(?P.*) # Any text that is not end of line (thanks to MULTILINE parameter)
''', re.VERBOSE | re.MULTILINE)
#Regular expression for finding format elements "see also" attribute (defined by @see)
pattern_format_element_seealso = re.compile('''@see\s*(?P.*)''', re.VERBOSE | re.MULTILINE)
#Regular expression for finding 2 expressions in quotes, separated by comma (as in template("1st","2nd") )
#Used when parsing output formats
## pattern_parse_tuple_in_quotes = re.compile('''
## (?P[\'"])
## (?P.*)
## (?P=sep1)
## \s*,\s*
## (?P[\'"])
## (?P.*)
## (?P=sep2)
## ''', re.VERBOSE | re.MULTILINE)
def call_old_bibformat(recID, format="HD"):
"""
FIXME: REMOVE FUNCTION WHEN MIGRATION IS DONE
Calls BibFormat for the record RECID in the desired output format FORMAT.
Note: this functions always try to return HTML, so when
bibformat returns XML with embedded HTML format inside the tag
FMT $g, as is suitable for prestoring output formats, we
perform un-XML-izing here in order to return HTML body only.
"""
# look for formatted notice existence:
- query = "SELECT value FROM bibfmt WHERE id_bibrec='%s' AND format='%s'" % (recID, of)
+ query = "SELECT value FROM bibfmt WHERE id_bibrec='%s' AND format='%s'" % (recID, format)
res = run_sql(query, None, 1)
if res:
# record 'recID' is formatted in 'format', so print it
decompress = zlib.decompress
return "%s" % decompress(res[0][0])
else:
# record 'recID' is not formatted in 'format', so try to call BibFormat on the fly or use default format:
out = ""
pipe_input, pipe_output, pipe_error = os.popen3(["%s/bibformat" % bindir, "otype=%s" % format], 'rw')
#pipe_input.write(print_record(recID, "xm"))
pipe_input.write(record_get_xml(recID, "xm"))
pipe_input.close()
bibformat_output = pipe_output.read()
pipe_output.close()
pipe_error.close()
if bibformat_output.startswith(""):
dom = minidom.parseString(bibformat_output)
for e in dom.getElementsByTagName('subfield'):
if e.getAttribute('code') == 'g':
for t in e.childNodes:
out += t.data.encode('utf-8')
else:
out = bibformat_output
return out
def format_record(recID, of, ln=cdslang, verbose=0, search_pattern=[], xml_record=None, uid=None):
"""
Formats a record given output format. Main entry function of bibformat engine.
Returns a formatted version of the record in
the specified language, search pattern, and with the specified output format.
The function will define which format template must be applied.
You can either specify an record ID to format, or give its xml representation.
if 'xml_record' != None, then use it instead of recID.
'uid' allows to grant access to some functionalities on a page depending
on the user's priviledges.
@param recID the ID of record to format
@param of an output format code (or short identifier for the output format)
@param ln the language to use to format the record
@param verbose the level of verbosity from 0 to 9 (O: silent,
5: errors,
7: errors and warnings, stop if error in format elements
9: errors and warnings, stop if error (debug mode ))
@param search_pattern list of strings representing the user request in web interface
@param xml_record an xml string representing the record to format
@param uid the user id of the person who will view the formatted page
@return formatted record
"""
errors_ = []
# Temporary workflow (during migration of formats):
# Call new BibFormat
# But if format not found for new BibFormat, then call old BibFormat
#Create a BibFormat Object to pass that contain record and context
bfo = BibFormatObject(recID, ln, search_pattern, xml_record, uid)
#Find out which format template to use based on record and output format.
template = decide_format_template(bfo, of)
- if template == None:
- ############### FIXME: REMOVE WHEN MIGRATION IS DONE ###############
+ ############### FIXME: REMOVE WHEN MIGRATION IS DONE ###############
+ path = "%s%s%s" % (templates_path, os.sep, template)
+ if template == None or not os.access(path, os.R_OK):
# template not found in new BibFormat. Call old one
if php:
return call_old_bibformat(recID, format=of)
- ############################# END ##################################
+ ############################# END ##################################
error = get_msgs_for_code_list([("ERR_BIBFORMAT_NO_TEMPLATE_FOUND", of)],
file='error', ln=cdslang)
errors_.append(error)
if verbose == 0:
register_errors(error, 'error')
elif verbose > 5:
return error[0][1]
return ""
#Format with template
(out, errors) = format_with_format_template(template, bfo, verbose)
errors_.extend(errors)
return out
def decide_format_template(bfo, of):
"""
Returns the format template name that should be used for formatting
given output format and BibFormatObject.
Look at of rules, and take the first matching one.
If no rule matches, returns None
To match we ignore lettercase and spaces before and after value of
rule and value of record
@param bfo a BibFormatObject
@param of the code of the output format to use
"""
output_format = get_output_format(of)
for rule in output_format['rules']:
value = bfo.field(rule['field']).strip()#Remove spaces
pattern = rule['value'].strip() #Remove spaces
if re.match(pattern, value, re.IGNORECASE) != None:
return rule['template']
template = output_format['default']
if template != '':
return template
else:
return None
def format_with_format_template(format_template_filename, bfo, verbose=0, format_template_code=None):
"""
Format a record given a format template. Also returns errors
Returns a formatted version of the record represented by bfo,
in the language specified in bfo, and with the specified format template.
Parameter format_template_filename will be ignored if format_template_code is provided.
This allows to preview format code without having to save file on disk
@param format_template_filename the dilename of a format template
@param bfo the object containing parameters for the current formatting
@param format_template_code if not empty, use code as template instead of reading format_template_filename (used for previews)
@param verbose the level of verbosity from 0 to 9 (O: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@return tuple (formatted text, errors)
"""
errors_ = []
if format_template_code != None:
format_content = str(format_template_code)
else:
format_content = get_format_template(format_template_filename)['code']
localized_format = filter_languages(format_content, bfo.lang)
(evaluated_format, errors) = eval_format_template_elements(localized_format, bfo, verbose)
errors_ = errors
return (evaluated_format, errors)
def eval_format_template_elements(format_template, bfo, verbose=0):
"""
Evalutes the format elements of the given template and replace each element with its value.
Also returns errors.
Prepare the format template content so that we can directly replace the marc code by their value.
This implies: 1) Look for special tags
2) replace special tags by their evaluation
@param format_template the format template code
@param bfo the object containing parameters for the current formatting
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@return tuple (result, errors)
"""
errors_ = []
#First define insert_element_code(match), used in re.sub() function
def insert_element_code(match):
"""
Analyses 'match', interprets the corresponding code, and returns the result of the evaluation.
Called by substitution in 'eval_format_template_elements(...)'
@param match a match object corresponding to the special tag that must be interpreted
"""
function_name = match.group("function_name")
format_element = get_format_element(function_name, verbose)
params = {}
#look for function parameters given in format template code
all_params = match.group('params')
if all_params != None:
function_params_iterator = pattern_function_params.finditer(all_params)
for param_match in function_params_iterator:
name = param_match.group('param')
value = param_match.group('value')
params[name] = value
#Evaluate element with params and return (Do not return errors)
(result, errors) = eval_format_element(format_element, bfo, params, verbose)
errors_.extend(errors) #Extend the enclosing list: a plain assignment would only rebind a local name
return result
#Substitute special tags in the format by our own text.
#Special tags are the ones matched by pattern_tag
format = pattern_tag.sub(insert_element_code, format_template)
return (format, errors_)
def eval_format_element(format_element, bfo, parameters={}, verbose=0):
"""
Returns the result of the evaluation of the given format element
name, with given BibFormatObject and parameters. Also returns
the errors of the evaluation.
@param format_element a format element structure as returned by get_format_element
@param bfo a BibFormatObject used for formatting
@param parameters a dict of parameters to be used for formatting. Key is parameter and value is value of parameter
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@return tuple (result, errors)
"""
errors = []
#Load special values given as parameters
prefix = parameters.get('prefix', "")
suffix = parameters.get('suffix', "")
default_value = parameters.get('default', "")
#3 possible cases:
#a) format element file is found: we execute it
#b) format element file is not found, but exist in tag table (e.g. bfe_isbn)
#c) format element is totally unknown. Do nothing or report error
if format_element != None and format_element['type'] == "python":
#a)
#We found an element with the tag name, of type "python"
#Prepare a dict 'params' to pass as parameter to 'format' function of element
params = {}
#look for parameters defined in format element
#fill them with specified default values and values
#given as parameters
for param in format_element['attrs']['params']:
name = param['name']
default = param['default']
params[name] = parameters.get(name, default)
#Add BibFormatObject
params['bfo'] = bfo
#execute function with given parameters and return result.
output_text = ""
function = format_element['code']
try:
output_text = apply(function, (), params)
except Exception, e:
output_text = ""
name = format_element['attrs']['name']
error = ("ERR_BIBFORMAT_EVALUATING_ELEMENT", name, str(params))
errors.append(error)
if verbose == 0:
register_errors(errors, 'error')
elif verbose >=5:
tb = sys.exc_info()[2]
error_string = get_msgs_for_code_list(error, file='error', ln=cdslang)
stack = traceback.format_exception(Exception, e, tb, limit=None)
output_text = ''+error_string[0][1] + "".join(stack) +' '
if output_text == None:
output_text = ""
else:
output_text = str(output_text)
#Add prefix and suffix if they have been given as parameters and if
#the evaluation of element is not empty
if output_text.strip() != "":
output_text = prefix + output_text + suffix
#Add the default value if output_text is empty
if output_text == "":
output_text = default_value
return (output_text, errors)
elif format_element != None and format_element['type'] =="field":
#b)
#We have not found an element in files that has the tag name. Then look for it
#in the table "tag"
#
#
#
#Load special values given as parameters
separator = parameters.get('separator', "")
nbMax = parameters.get('nbMax', "")
#Get the fields tags that have to be printed
tags = format_element['attrs']['tags']
output_text = []
#Get values corresponding to tags
for tag in tags:
values = bfo.fields(tag)#Retrieve record values for tag
if len(values)>0 and isinstance(values[0], dict):#flatten dict to its values only
values_list = map(lambda x: x.values(), values)
#output_text.extend(values)
for values in values_list:
output_text.extend(values)
else:
output_text.extend(values)
if nbMax != "":
try:
nbMax = int(nbMax)
output_text = output_text[:nbMax]
except:
name = format_element['attrs']['name']
error = ("ERR_BIBFORMAT_NBMAX_NOT_INT", name)
errors.append(error)
if verbose < 5:
register_errors(error, 'error')
elif verbose >=5:
error_string = get_msgs_for_code_list(error, file='error', ln=cdslang)
output_text.append(error_string[0][1]) #list.append() returns None, so do not reassign
#Add prefix and suffix if they have been given as parameters and if
#the evaluation of element is not empty.
#If evaluation is empty string, return default value if it exists. Else return empty string
if ("".join(output_text)).strip() != "":
return (prefix + separator.join(output_text) + suffix, errors)
else:
#Return default value
return (default_value, errors)
else:
#c) Element is unknown
error = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_RESOLVE_ELEMENT_NAME", format_element)],
file='error', ln=cdslang)
errors.append(error)
if verbose < 5:
register_errors(error, 'error')
return ("", errors)
elif verbose >=5:
if verbose >= 9:
sys.exit(error[0][1])
return (''+error[0][1]+'', errors)
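# Illustration only (not part of the engine): a minimal sketch of how the
# built-in 'prefix', 'suffix' and 'default' parameters are applied to the
# output of a "python" element in case a) above. _example_decorate is a
# hypothetical name. Whitespace-only output is left untouched: it gets no
# prefix or suffix, and it is not replaced by the default value either.
def _example_decorate(output_text, parameters):
    prefix = parameters.get('prefix', "")
    suffix = parameters.get('suffix', "")
    default_value = parameters.get('default', "")
    if output_text is None:
        output_text = ""
    else:
        output_text = str(output_text)
    # Prefix and suffix are only added around a non-empty evaluation
    if output_text.strip() != "":
        output_text = prefix + output_text + suffix
    # The default value only replaces a strictly empty evaluation
    if output_text == "":
        output_text = default_value
    return output_text

_example_decorated = _example_decorate("Smith", {'prefix': "by ", 'suffix': "."})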
def filter_languages(format_template, ln='en'):
"""
Filters the language tags that do not correspond to the specified language.
@param format_template the format template code
@param ln the language that is NOT filtered out from the template
@return the format template with unnecessary languages filtered out
"""
#First define search_lang_tag(match) and clean_language_tag(match), used
#in re.sub() function
def search_lang_tag(match):
"""
Searches for the <lang>...</lang> tag and removes inner localized tags,
such as <en> or <fr>, that are not current_lang.
If current_lang cannot be found inside <lang>...</lang>, try to use 'cdslang'
@param match a match object corresponding to the special tag that must be interpreted
"""
current_lang = ln
def clean_language_tag(match):
"""
Return tag text content if tag language of match is output language.
Called by substitution in 'filter_languages(...)'
@param match a match object corresponding to the special tag that must be interpreted
"""
if match.group(1) == current_lang:
return match.group(2)
else:
return ""
#End of clean_language_tag
lang_tag_content = match.group("langs")
#Try to find tag with current lang. If it does not exists, then current_lang
#becomes cdslang until the end of this replace
pattern_current_lang = re.compile(r"<"+current_lang+"\s*>(.*?)</"+current_lang+"\s*>")
if re.search(pattern_current_lang, lang_tag_content) == None:
current_lang = cdslang
cleaned_lang_tag = ln_pattern.sub(clean_language_tag, lang_tag_content)
return cleaned_lang_tag
#End of search_lang_tag
filtered_format_template = pattern_lang.sub(search_lang_tag, format_template)
return filtered_format_template
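# Illustration only (not part of the engine): a self-contained sketch of the
# language filtering done above. The two regular expressions are assumptions
# standing in for pattern_lang and ln_pattern, which are defined elsewhere in
# the module; 'fallback' plays the role of cdslang. All _example_* names are
# hypothetical.
import re

_example_pattern_lang = re.compile(r"<lang\s*>(?P<langs>.*?)</lang\s*>",
                                   re.IGNORECASE | re.DOTALL)
_example_ln_pattern = re.compile(r"<(\w+)\s*>(.*?)</\1\s*>", re.DOTALL)

def _example_filter_languages(template, ln='en', fallback='en'):
    def keep_one_language(match):
        content = match.group('langs')
        available = [m.group(1) for m in _example_ln_pattern.finditer(content)]
        # Use the fallback language when the requested one is not provided
        if ln in available:
            wanted = ln
        else:
            wanted = fallback
        def pick(inner):
            if inner.group(1) == wanted:
                return inner.group(2)
            return ""
        return _example_ln_pattern.sub(pick, content)
    return _example_pattern_lang.sub(keep_one_language, template)

_example_template = "<lang><en>Title</en><fr>Titre</fr></lang>: x"
_example_french = _example_filter_languages(_example_template, 'fr')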
def parse_tag(tag):
"""
Parse a marc code and decompose it in a table with: 0-tag 1-indicator1 2-indicator2 3-subfield
The first 3 chars always correspond to tag.
The indicators are optional. However they must both be indicated, or both omitted.
If indicators are omitted or indicated with underscore '_', they mean "No indicator".
The subfield is optional. It can optionally be preceded by a dot '.' or '$$' or '$'
Any of the chars can be replaced by wildcard %
THE FUNCTION DOES NOT CHECK THE WELL-FORMEDNESS OF 'tag'
Empty (space) characters are ignored
For example:
>> parse_tag('245COc') = ['245', 'C', 'O', 'c']
>> parse_tag('245C_c') = ['245', 'C', '', 'c']
>> parse_tag('245__c') = ['245', '', '', 'c']
>> parse_tag('245__$$c') = ['245', '', '', 'c']
>> parse_tag('245__$c') = ['245', '', '', 'c']
>> parse_tag('245 $c') = ['245', '', '', 'c']
>> parse_tag('245 $$c') = ['245', '', '', 'c']
>> parse_tag('245__.c') = ['245', '', '', 'c']
>> parse_tag('245 .c') = ['245', '', '', 'c']
>> parse_tag('245C_$c') = ['245', 'C', '', 'c']
>> parse_tag('245CO$$c') = ['245', 'C', 'O', 'c']
>> parse_tag('245C_.c') = ['245', 'C', '', 'c']
>> parse_tag('245$c') = ['245', '', '', 'c']
>> parse_tag('245.c') = ['245', '', '', 'c']
>> parse_tag('245$$c') = ['245', '', '', 'c']
>> parse_tag('245__%') = ['245', '', '', '%']
>> parse_tag('245__$$%') = ['245', '', '', '%']
>> parse_tag('245__$%') = ['245', '', '', '%']
>> parse_tag('245 $%') = ['245', '', '', '%']
>> parse_tag('245 $$%') = ['245', '', '', '%']
>> parse_tag('245$%') = ['245', '', '', '%']
>> parse_tag('245.%') = ['245', '', '', '%']
>> parse_tag('245$$%') = ['245', '', '', '%']
>> parse_tag('2%5$$a') = ['2%5', '', '', 'a']
"""
p_tag = ['', '', '', '']
tag = tag.replace(" ", "") #Remove empty characters
tag = tag.replace("$", "") #Remove $ characters
tag = tag.replace(".", "") #Remove . characters
#tag = tag.replace("_", "") #Remove _ characters
p_tag[0] = tag[0:3] #tag
if len(tag) == 4:
p_tag[3] = tag[3] #subfield
elif len(tag) == 5:
ind1 = tag[3]#indicator 1
if ind1 != "_":
p_tag[1] = ind1
ind2 = tag[4]#indicator 2
if ind2 != "_":
p_tag[2] = ind2
elif len(tag) == 6:
p_tag[3] = tag[5]#subfield
ind1 = tag[3]#indicator 1
if ind1 != "_":
p_tag[1] = ind1
ind2 = tag[4]#indicator 2
if ind2 != "_":
p_tag[2] = ind2
return p_tag
def get_format_template(filename, with_attributes=False):
"""
Returns the structured content of the given format template.
if 'with_attributes' is True, returns the name and description. Else 'attrs' is not
returned as key in dictionary (it might, if it has already been loaded previously)
{'code':"Some template code"
'attrs': {'name': "a name", 'description': "a description"}
}
@param filename the filename of a format template
@param with_attributes if True, fetch the attributes (names and description) for format
@return structured content of format template
"""
#Get from cache whenever possible
global format_templates_cache
if not filename.endswith("."+format_template_extension):
return None
if format_templates_cache.has_key(filename):
#If we must return with attributes and template exist in cache with attributes
#then return cache. Else reload with attributes
if with_attributes == True and format_templates_cache[filename].has_key('attrs'):
return format_templates_cache[filename]
format_template = {'code':""}
try:
path = "%s%s%s" % (templates_path, os.sep, filename)
format_file = open(path)
format_content = format_file.read()
format_file.close()
#Load format template code
#Remove name and description
code_and_description = pattern_format_template_name.sub("", format_content)
code = pattern_format_template_desc.sub("", code_and_description)
# Escape % chars in code (because we will use python formatting capabilities)
format_template['code'] = code
except Exception, e:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_READ_TEMPLATE_FILE", filename, str(e))],
file='error', ln=cdslang)
register_errors(errors, 'error')
#Save attributes if necessary
if with_attributes:
format_template['attrs'] = get_format_template_attrs(filename)
#cache and return
format_templates_cache[filename] = format_template
return format_template
def get_format_templates(with_attributes=False):
"""
Returns all format templates, as a dictionary with their filename as key
if 'with_attributes' is True, returns the name and description. Else 'attrs' is not
returned as key in each dictionary (it might, if it has already been loaded previously)
{'filename_1.bft': {'code':"Some template code"
'attrs': {'name': "a name", 'description': "a description"}
},
...
}
@param with_attributes if True, fetch the attributes (names and description) for formats
"""
format_templates = {}
files = os.listdir(templates_path)
for filename in files:
if filename.endswith("."+format_template_extension):
format_templates[filename] = get_format_template(filename, with_attributes)
return format_templates
def get_format_template_attrs(filename):
"""
Returns the attributes of the format template with given filename
The attributes are {'name', 'description'}
Caution: the function does not check that path exists or
that the format template is valid.
@param filename the filename of a format template
"""
attrs = {}
attrs['name'] = ""
attrs['description'] = ""
try:
template_file = open("%s%s%s"%(templates_path, os.sep, filename))
code = template_file.read()
template_file.close()
match = pattern_format_template_name.search(code)
if match != None:
attrs['name'] = match.group('name')
else:
attrs['name'] = filename
match = pattern_format_template_desc.search(code)
if match != None:
attrs['description'] = match.group('desc').rstrip('.')
except Exception, e:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_READ_TEMPLATE_FILE", filename, str(e))],
file='error', ln=cdslang)
register_errors(errors, 'error')
attrs['name'] = filename
return attrs
def get_format_element(element_name, verbose=0, with_built_in_params=False):
"""
Returns the format element structured content.
Return None if element cannot be loaded (file not found, not readable or
invalid)
The returned structure is {'attrs': {some attributes in dict. See get_format_element_attrs_from_*}
'code': the_function_code,
'type':"field" or "python" depending if element is defined in file or table}
@param element_name the name of the format element to load
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@param with_built_in_params if True, load the parameters built in all elements
@return a dictionary with format element attributes
"""
#Get from cache whenever possible
global format_elements_cache
#Resolve filename and prepare 'name' as key for the cache
filename = resolve_format_element_filename(element_name)
if filename != None:
name = filename.upper()
else:
name = element_name.upper()
if format_elements_cache.has_key(name):
element = format_elements_cache[name]
if with_built_in_params == False or (with_built_in_params == True and element['attrs'].has_key('builtin_params') ):
return element
if filename == None:
#element is maybe in tag table
if bibformat_dblayer.tag_exists_for_name(element_name):
format_element = {'attrs': get_format_element_attrs_from_table(element_name, with_built_in_params),
'code':None,
'type':"field"}
#Cache and returns
format_elements_cache[name] = format_element
return format_element
else:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_FORMAT_ELEMENT_NOT_FOUND", element_name)],
file='error', ln=cdslang)
if verbose == 0:
register_errors(errors, 'error')
elif verbose >=5:
sys.stderr.write(errors[0][1])
return None
else:
format_element = {}
module_name = filename
if module_name.endswith(".py"):
module_name = module_name[:-3]
try:
module = __import__(elements_import_path+"."+module_name)
#Load last module in import path
#For eg. load bibformat_elements in invenio.elements.bibformat_element
#Used to keep flexibility regarding where elements directory is (for eg. test cases)
components = elements_import_path.split(".")
for comp in components[1:]:
module = getattr(module, comp)
function_format = module.__dict__[module_name].format
format_element['code'] = function_format
format_element['attrs'] = get_format_element_attrs_from_function(function_format,
element_name,
with_built_in_params)
format_element['type'] = "python"
#cache and return
format_elements_cache[name] = format_element
return format_element
except Exception, e:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_FORMAT_ELEMENT_NOT_FOUND", element_name)],
file='error', ln=cdslang)
if verbose == 0:
register_errors(errors, 'error')
elif verbose >= 5:
sys.stderr.write(str(e))
sys.stderr.write(errors[0][1])
if verbose >= 7:
raise e
return None
def get_format_elements(with_built_in_params=False):
"""
Returns the list of format elements attributes as dictionary structure
Elements declared in files have priority over element declared in 'tag' table
The returned object has this format:
{element_name1: {'attrs': {'description':..., 'seealso':...
'params':[{'name':..., 'default':..., 'description':...}, ...]
'builtin_params':[{'name':..., 'default':..., 'description':...}, ...]
},
'code': code_of_the_element
},
element_name2: {...},
...}
Returns only elements that could be loaded (not error in code)
@return a dict of format elements with name as key, and a dict as attributes
@param with_built_in_params if True, load the parameters built in all elements
"""
format_elements = {}
mappings = bibformat_dblayer.get_all_name_tag_mappings()
for name in mappings:
format_elements[name.upper().replace(" ", "_").strip()] = get_format_element(name, with_built_in_params=with_built_in_params)
files = os.listdir(elements_path)
for filename in files:
filename_test = filename.upper().replace(" ", "_")
if filename_test.endswith(".PY") and filename_test != "__INIT__.PY":
if filename_test.startswith("BFE_"):
filename_test = filename_test[4:]
element_name = filename_test[:-3]
element = get_format_element(element_name, with_built_in_params=with_built_in_params)
if element != None:
format_elements[element_name] = element
return format_elements
def get_format_element_attrs_from_function(function, element_name, with_built_in_params=False):
"""
Returns the attributes of the function given as parameter.
It looks for standard parameters of the function, default
values and comments in the docstring.
The attributes are {'name' : "name of element" #basically the name of 'element_name' parameter
'description': "a string description of the element",
'seealso' : ["element_1.py", "element_2.py", ...] #a list of related elements
'params': [{'name':"param_name", #a list of parameters for this element (except 'bfo')
'default':"default value",
'description': "a description"}, ...],
'builtin_params': [{'name':"param_name",#the parameters builtin for all elem of this kind
'default':"default value",
'description': "a description"}, ...],
}
@param function the formatting function of a format element
@param element_name the name of the element
@param with_built_in_params if True, load the parameters built in all elements
"""
attrs = {}
attrs['description'] = ""
attrs['name'] = element_name.replace(" ", "_").upper()
attrs['seealso'] = []
docstring = function.__doc__
if isinstance(docstring, str):
#Look for function description in docstring
#match = pattern_format_element_desc.search(docstring)
description = docstring.split("@param")[0]
description = description.split("@see")[0]
attrs['description'] = description.strip().rstrip('.')
#Look for @see in docstring
match = pattern_format_element_seealso.search(docstring)
if match != None:
elements = match.group('see').rstrip('.').split(",")
for element in elements:
attrs['seealso'].append(element.strip())
params = {}
#Look for parameters in function definition
(args, varargs, varkw, defaults) = inspect.getargspec(function)
#Prepare args and defaults_list such that we can have a mapping from args to defaults
args.reverse()
if defaults != None:
defaults_list = list(defaults)
defaults_list.reverse()
else:
defaults_list = []
for arg, default in map(None, args, defaults_list):
if arg == "bfo":
continue #Don't keep this as parameter. It is hidden to users, and exists in all elements of this kind
param = {}
param['name'] = arg
if default == None:
param['default'] = "" #In case no check is made inside element, we prefer to print "" (nothing) than None in output
else:
param['default'] = default
param['description'] = "(no description provided)"
params[arg] = param
if isinstance(docstring, str):
#Look for @param descriptions in docstring.
#Add description to existing parameters in params dict
params_iterator = pattern_format_element_params.finditer(docstring)
for match in params_iterator:
name = match.group('name')
if params.has_key(name):
params[name]['description'] = match.group('desc').rstrip('.')
attrs['params'] = params.values()
#Load built-in parameters if necessary
if with_built_in_params == True:
builtin_params = []
#Add 'prefix' parameter
param_prefix = {}
param_prefix['name'] = "prefix"
param_prefix['default'] = ""
param_prefix['description'] = "A prefix printed only if the record has a value for this element"
builtin_params.append(param_prefix)
#Add 'suffix' parameter
param_suffix = {}
param_suffix['name'] = "suffix"
param_suffix['default'] = ""
param_suffix['description'] = "A suffix printed only if the record has a value for this element"
builtin_params.append(param_suffix)
#Add 'default' parameter
param_default = {}
param_default['name'] = "default"
param_default['default'] = ""
param_default['description'] = "A default value printed if the record has no value for this element"
builtin_params.append(param_default)
attrs['builtin_params'] = builtin_params
return attrs
def get_format_element_attrs_from_table(element_name, with_built_in_params=False):
"""
Returns the attributes of the format element with given name in 'tag' table.
Returns None if element_name does not exist in tag table.
The attributes are {'name' : "name of element" #basically the name of 'element_name' parameter
'description': "a string description of the element",
'seealso' : [] #a list of related elements. Always empty in this case
'params': [], #a list of parameters for this element. Always empty in this case
'builtin_params': [{'name':"param_name", #the parameters builtin for all elem of this kind
'default':"default value",
'description': "a description"}, ...],
'tags':["950.1", "203.a"] #the list of tags printed by this element
}
@param element_name the name of an element in the database
@param with_built_in_params if True, load the parameters built in all elements
"""
attrs = {}
tags = bibformat_dblayer.get_tags_from_name(element_name)
field_label = "field"
if len(tags)>1:
field_label = "fields"
attrs['description'] = "Prints %s %s of the record" % (field_label, ", ".join(tags))
attrs['name'] = element_name.replace(" ", "_").upper()
attrs['seealso'] = []
attrs['params'] = []
attrs['tags'] = tags
#Load built-in parameters if necessary
if with_built_in_params == True:
builtin_params = []
#Add 'prefix' parameter
param_prefix = {}
param_prefix['name'] = "prefix"
param_prefix['default'] = ""
param_prefix['description'] = "A prefix printed only if the record has a value for this element"
builtin_params.append(param_prefix)
#Add 'suffix' parameter
param_suffix = {}
param_suffix['name'] = "suffix"
param_suffix['default'] = ""
param_suffix['description'] = "A suffix printed only if the record has a value for this element"
builtin_params.append(param_suffix)
#Add 'separator' parameter
param_separator = {}
param_separator['name'] = "separator"
param_separator['default'] = " "
param_separator['description'] = "A separator between elements of the field"
builtin_params.append(param_separator)
#Add 'nbMax' parameter
param_nbMax = {}
param_nbMax['name'] = "nbMax"
param_nbMax['default'] = ""
param_nbMax['description'] = "The maximum number of values to print for this element. No limit if not specified"
builtin_params.append(param_nbMax)
#Add 'default' parameter
param_default = {}
param_default['name'] = "default"
param_default['default'] = ""
param_default['description'] = "A default value printed if the record has no value for this element"
builtin_params.append(param_default)
attrs['builtin_params'] = builtin_params
return attrs
def get_output_format(code, with_attributes=False, verbose=0):
"""
Returns the structured content of the given output format
If 'with_attributes' is True, also returns the names and description of the output formats,
else 'attrs' is not returned in dict (it might, if it has already been loaded previously).
if output format corresponding to 'code' is not found return an empty structure.
See get_output_format_attrs() to learn more on the attributes
{'rules': [ {'field': "980__a",
'value': "PREPRINT",
'template': "filename_a.bft",
},
{...}
],
'attrs': {'names': {'generic':"a name", 'sn':{'en': "a name", 'fr':"un nom"}, 'ln':{'en':"a long name"}}
'description': "a description"
'code': "fnm1",
'content_type': "application/ms-excel"
}
'default':"filename_b.bft"
}
@param code the code of an output_format
@param with_attributes if True, fetch the attributes (names and description) for format
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@return structured content of output format
"""
output_format = {'rules':[], 'default':""}
filename = resolve_output_format_filename(code, verbose)
if filename == None:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_OUTPUT_FORMAT_CODE_UNKNOWN", code)],
file='error', ln=cdslang)
register_errors(errors, 'error')
if with_attributes == True: #Create empty attrs if asked for attributes
output_format['attrs'] = get_output_format_attrs(code, verbose)
return output_format
#Get from cache whenever possible
global format_outputs_cache
if format_outputs_cache.has_key(filename):
#If we must return with attributes but the cache has no attributes, then load attributes
if with_attributes == True and not format_outputs_cache[filename].has_key('attrs'):
format_outputs_cache[filename]['attrs'] = get_output_format_attrs(code, verbose)
return format_outputs_cache[filename]
try:
if with_attributes == True:
output_format['attrs'] = get_output_format_attrs(code, verbose)
path = "%s%s%s" % (outputs_path, os.sep, filename )
format_file = open(path)
current_tag = ''
for line in format_file:
line = line.strip()
if line == "":
#ignore blank lines
continue
if line.endswith(":"):
#retrieve tag
clean_line = line.rstrip(": \n\r") #remove : spaces and eol at the end of line
current_tag = "".join(clean_line.split()[1:]).strip() #the tag starts at second position
elif line.find('---') != -1:
words = line.split('---')
template = words[-1].strip()
condition = ''.join(words[:-1])
value = ""
output_format['rules'].append({'field': current_tag,
'value': condition,
'template': template,
})
elif line.find(':') != -1:
#Default case
default = line.split(':')[1].strip()
output_format['default'] = default
except Exception, e:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_READ_OUTPUT_FILE", filename, str(e))],
file='error', ln=cdslang)
register_errors(errors, 'error')
#cache and return
format_outputs_cache[filename] = output_format
return output_format
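# Illustration only (not part of the engine): a self-contained sketch of the
# line-by-line parsing done above, working on a list of strings instead of a
# file. The sample syntax ("tag 980__a:", "VALUE --- template", "default: ...")
# is inferred from the parsing code and is an assumption, as are all the
# _example_* names. The rule value keeps its surrounding spaces here; they are
# stripped later, when the rule is matched against a record.
def _example_parse_output_format(lines):
    output_format = {'rules': [], 'default': ""}
    current_tag = ''
    for line in lines:
        line = line.strip()
        if line == "":
            continue
        if line.endswith(":"):
            # A line such as "tag 980__a:" sets the field for the next rules
            clean_line = line.rstrip(": \n\r")
            current_tag = "".join(clean_line.split()[1:]).strip()
        elif line.find('---') != -1:
            # A rule line: condition --- template filename
            words = line.split('---')
            output_format['rules'].append({'field': current_tag,
                                           'value': ''.join(words[:-1]),
                                           'template': words[-1].strip()})
        elif line.find(':') != -1:
            # The default template
            output_format['default'] = line.split(':')[1].strip()
    return output_format

_example_parsed = _example_parse_output_format(["tag 980__a:",
                                                "PREPRINT --- preprint.bft",
                                                "",
                                                "THESIS --- thesis.bft",
                                                "default: default.bft"])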
def get_output_format_attrs(code, verbose=0):
"""
Returns the attributes of an output format.
The attributes contain 'code', which is the short identifier of the output format
(to be given as parameter in format_record function to specify the output format),
'description', a description of the output format, and 'names', the localized names
of the output format. If 'content_type' is specified then the search_engine will
send a file with this content type and with result of formatting as content to the user.
The 'names' dict always contains 'generic', 'ln' (for long name) and 'sn' (for short names)
keys. 'generic' is the default name for output format. 'ln' and 'sn' contain long and short
localized names of the output format. Only the languages for which a localization exists
are used.
{'names': {'generic':"a name", 'sn':{'en': "a name", 'fr':"un nom"}, 'ln':{'en':"a long name"}}
'description': "a description"
'code': "fnm1",
'content_type': "application/ms-excel"
}
@param code the short identifier of the format
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@return structured content of output format attributes
"""
if code.endswith("."+format_output_extension):
code = code[:-(len(format_output_extension) + 1)]
attrs = {'names':{'generic':"",
'ln':{},
'sn':{}},
'description':'',
'code':code.upper(),
'content_type':""}
filename = resolve_output_format_filename(code, verbose)
if filename == None:
return attrs
attrs['names'] = bibformat_dblayer.get_output_format_names(code)
attrs['description'] = bibformat_dblayer.get_output_format_description(code)
attrs['content_type'] = bibformat_dblayer.get_output_format_content_type(code)
return attrs
def get_output_formats(with_attributes=False):
"""
Returns all output formats, as a dictionary with their filename as key
If 'with_attributes' is True, also returns the names and description of the output formats,
else 'attrs' is not returned in dicts (it might, if it has already been loaded previously).
See get_output_format_attrs() to learn more on the attributes
{'filename_1.bfo': {'rules': [ {'field': "980__a",
'value': "PREPRINT",
'template': "filename_a.bft",
},
{...}
],
'attrs': {'names': {'generic':"a name", 'sn':{'en': "a name", 'fr':"un nom"}, 'ln':{'en':"a long name"}}
'description': "a description"
'code': "fnm1"
}
'default':"filename_b.bft"
},
'filename_2.bfo': {...},
...
}
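@param with_attributes if True, fetch the attributes (names and description) for formats
@return a dictionary of output formats, with their filename as key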
"""
output_formats = {}
files = os.listdir(outputs_path)
for filename in files:
if filename.endswith("."+format_output_extension):
code = "".join(filename.split(".")[:-1])
output_formats[filename] = get_output_format(code, with_attributes)
return output_formats
def get_kb_mapping(kb, string, default=""):
"""
Returns the value of 'string' in the knowledge base 'kb'.
If kb does not exist or string does not exist in kb, returns 'default'
string value.
@param kb a knowledge base name
@param string a key in a knowledge base
@param default a default value if 'string' is not in 'kb'
@return the value corresponding to the given string in given kb
"""
global kb_mappings_cache
if kb_mappings_cache.has_key(kb):
kb_cache = kb_mappings_cache[kb]
if kb_cache.has_key(string):
value = kb_mappings_cache[kb][string]
if value == None:
return default
else:
return value
else:
#Precreate for caching this kb
kb_mappings_cache[kb] = {}
value = bibformat_dblayer.get_kb_mapping_value(kb, string)
kb_mappings_cache[kb][str(string)] = value
if value == None:
return default
else:
return value
def resolve_format_element_filename(string):
"""
Returns the filename of element corresponding to string
This is necessary since format template code calls
elements ignoring case: for example, an element tag written
in upper case refers to the same element as the same tag in lower case.
It is also recommended that format element filenames are
prefixed with bfe_ . We need to look for these too.
The name of the element has to start with "BFE_".
@param string a name for a format element
@return the corresponding filename, with right case
"""
if not string.endswith(".py"):
name = string.replace(" ", "_").upper() +".PY"
else:
name = string.replace(" ", "_").upper()
files = os.listdir(elements_path)
for filename in files:
test_filename = filename.replace(" ", "_").upper()
if test_filename == name or \
test_filename == "BFE_" + name or \
"BFE_" + test_filename == name:
return filename
#No element with that name found
#Do not log error, as it might be a normal execution case:
#element can be in database
return None
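# Illustration only (not part of the engine): a self-contained sketch of the
# case-insensitive lookup done above, taking the directory listing as a
# parameter instead of calling os.listdir(). The function name and the sample
# filenames are hypothetical. A name matches a file when both are equal once
# upper-cased, with or without the "BFE_" prefix on either side.
def _example_resolve_element_filename(string, files):
    if not string.endswith(".py"):
        name = string.replace(" ", "_").upper() + ".PY"
    else:
        name = string.replace(" ", "_").upper()
    for filename in files:
        test_filename = filename.replace(" ", "_").upper()
        if test_filename == name or \
           test_filename == "BFE_" + name or \
           "BFE_" + test_filename == name:
            return filename
    return None

_example_files = ["bfe_title.py", "Authors.py"]
_example_found = _example_resolve_element_filename("TITLE", _example_files)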
def resolve_output_format_filename(code, verbose=0):
"""
Returns the filename of output corresponding to code
This is necessary since output formats names are not case sensitive
but most file systems are.
@param code the code for an output format
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@return the corresponding filename, with right case, or None if not found
"""
code = re.sub(r"[^.0-9a-zA-Z]", "", code) #Remove non alphanumeric chars (except .)
if not code.endswith("."+format_output_extension):
code = re.sub(r"\W", "", code)
code += "."+format_output_extension
files = os.listdir(outputs_path)
for filename in files:
if filename.upper() == code.upper():
return filename
#No output format with that name found
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_RESOLVE_OUTPUT_NAME", code)],
file='error', ln=cdslang)
if verbose == 0:
register_errors(errors, 'error')
elif verbose >= 5:
sys.stderr.write(errors[0][1])
if verbose >= 9:
sys.exit(errors[0][1])
return None
def get_fresh_format_template_filename(name):
"""
Returns a new filename and name for template with given name.
Used when writing a new template to a file, so that the name
has no space, is unique in template directory
Returns (unique_filename, modified_name)
@param name a name for a format template
@return the corresponding filename, and modified name if necessary
"""
#name = re.sub(r"\W", "", name) #Remove non alphanumeric chars
name = name.replace(" ", "_")
filename = name
filename = re.sub(r"[^.0-9a-zA-Z]", "", filename) #Remove non alphanumeric chars (except .)
path = templates_path + os.sep + filename + "." + format_template_extension
index = 1
while os.path.exists(path):
index += 1
filename = name + str(index)
path = templates_path + os.sep + filename + "." + format_template_extension
if index > 1:
returned_name = (name + str(index)).replace("_", " ")
else:
returned_name = name.replace("_", " ")
return (filename + "." + format_template_extension, returned_name) #filename.replace("_", " "))
def get_fresh_output_format_filename(code):
"""
Returns a new filename for output format with given code.
Used when writing a new output format to a file, so that the code
has no space, is unique in output format directory. The filename
also needs to be at most 6 chars long, as the convention is that
filename == output format code (+ .extension)
We return an uppercase code
Returns (unique_filename, modified_code)
@param code the code of an output format
@return the corresponding filename, and modified code if necessary
"""
#code = re.sub(r"\W", "", code) #Remove non alphanumeric chars
code = code.upper().replace(" ", "_")
code = re.sub(r"[^.0-9a-zA-Z]", "", code) #Remove non alphanumeric chars (except .)
if len(code) > 6:
code = code[:6]
filename = code
path = outputs_path + os.sep + filename + "." + format_output_extension
index = 2
while os.path.exists(path):
filename = code + str(index)
if len(filename) > 6:
filename = code[:-(len(str(index)))]+str(index)
index += 1
path = outputs_path + os.sep + filename + "." + format_output_extension
#We should not try more than 99999... Well I don't see how we could get there.. Sanity check.
if index >= 99999:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_NB_OUTPUTS_LIMIT_REACHED", code)],
file='error', ln=cdslang)
register_errors(errors, 'error')
sys.exit("Output format cannot be named as %s"%code)
return (filename + "." + format_output_extension, filename)
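# Illustrative sketch of how get_fresh_output_format_filename() above derives a
# unique code of at most 6 characters. It is standalone: `existing` is a
# hypothetical set of codes already in use (the original checks the file system
# with os.path.exists), and `fresh_output_code` is a name invented here. The
# truncation of long candidates is simplified compared with the original.

```python
import re


def fresh_output_code(code, existing, max_len=6):
    """Return an uppercase code, unique within `existing`, at most `max_len` chars long."""
    # Keep only alphanumeric characters and dots, then truncate
    code = re.sub(r"[^.0-9a-zA-Z]", "", code.upper())[:max_len]
    candidate = code
    index = 2
    while candidate in existing:
        suffix = str(index)
        candidate = code + suffix
        if len(candidate) > max_len:
            # Make room for the numeric suffix by shortening the code
            candidate = code[:max_len - len(suffix)] + suffix
        index += 1
    return candidate
```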
def clear_caches():
"""
Clear the caches (Output Format, Format Templates and Format Elements)
"""
global format_templates_cache, format_elements_cache , format_outputs_cache, kb_mappings_cache
format_templates_cache = {}
format_elements_cache = {}
format_outputs_cache = {}
kb_mappings_cache = {}
class BibFormatObject:
"""
An object that encapsulates a record and associated methods, and that is given
as parameter to all format elements 'format' function.
The object is made specifically for a given formatting, i.e. it includes
for example the language for the formatting.
The object provides basic accessors to the record. For full access, one can get
the record with get_record() and then use BibRecord methods on the returned object.
"""
#The record
record = None
#The language in which the formatting has to be done
lang = cdslang
#A list of strings describing the context in which the record has to be formatted.
#It represents the words of the user request in web interface search
search_pattern = []
#The id of the record
recID = 0
#The user id of the person who will view the formatted page (if applicable)
#This allows for example to print a "edit record" link for people
#who have right to edit a record.
uid = None
def __init__(self, recID, ln=cdslang, search_pattern=[], xml_record=None, uid=None):
"""
Creates a new bibformat object, with given record.
You can either specify a record ID to format, or give its xml representation.
if 'xml_record' != None, use 'xml_record' instead of recID for the record.
'uid' allows granting access to some functionalities on a page depending
on the user's privileges.
@param recID the id of a record
@param ln the language in which the record has to be formatted
@param search_pattern list of string representing the request used by the user in web interface
@param xml_record a xml string of the record to format
@param uid the user id of the person who will view the formatted page
"""
if xml_record != None:
#If record is given as parameter
self.record = create_record(xml_record)[0]
recID = record_get_field_value(self.record,"001")
self.lang = wash_language(ln)
self.search_pattern = search_pattern
self.recID = recID
self.uid = uid
def get_record(self):
"""
Returns the record of this BibFormatObject instance
@return the record structure as returned by BibRecord
"""
#Create record if necessary
if self.record == None:
record = create_record(record_get_xml(self.recID, 'xm'))
self.record = record[0]
return self.record
def control_field(self, tag):
"""
Returns the value of control field given by tag in record
@param tag the marc code of a field
@return value of field tag in record
"""
if self.get_record() == None: #Case where BibRecord could not parse object
return ''
p_tag = parse_tag(tag)
return record_get_field_value(self.get_record(),
p_tag[0],
p_tag[1],
p_tag[2],
p_tag[3])
def field(self, tag):
"""
Returns the value of the field corresponding to tag in the
current record.
If the value does not exist, returns an empty string.
@param tag the marc code of a field
@return value of field tag in record
"""
list_of_fields = self.fields(tag)
if len(list_of_fields) > 0:
return list_of_fields[0]
else:
return ""
def fields(self, tag):
"""
Returns the list of values corresponding to "tag".
If tag has an undefined subcode (such as 999C5),
the function returns a list of dictionaries, whose keys
are the subcodes and the values are the values of tag.subcode.
If the tag has a subcode, simply returns list of values
corresponding to tag.
@param tag the marc code of a field
@return values of field tag in record
"""
if self.get_record() == None: #Case where BibRecord could not parse object
return []
p_tag = parse_tag(tag)
if p_tag[3] != "":
#Subcode has been defined. Simply returns list of values
return record_get_field_values(self.get_record(),
p_tag[0],
p_tag[1],
p_tag[2],
p_tag[3])
else:
#Subcode is undefined. Returns list of dicts.
#However it might be the case of a control field.
list_of_dicts = []
instances = record_get_field_instances(self.get_record(),
p_tag[0],
p_tag[1],
p_tag[2])
for instance in instances:
instance_dict = dict(instance[0])
list_of_dicts.append(instance_dict)
return list_of_dicts
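# Illustrative sketch of the two return shapes of fields() above. It does not
# use BibRecord: the record layout here (a dict mapping tag+indicators to a
# list of subfield lists) and the function name `fields_sketch` are simplifying
# assumptions made only for this example.

```python
def fields_sketch(record, tag):
    """Return values for `tag` (with subcode) or one dict per instance (without)."""
    key, subcode = tag[:5], tag[5:]
    instances = record.get(key, [])
    if subcode:
        # Subcode given: flat list of the matching subfield values
        return [value
                for subfields in instances
                for (code, value) in subfields
                if code == subcode]
    # No subcode: one {subcode: value} dictionary per field instance
    return [dict(subfields) for subfields in instances]


record = {"999C5": [[("m", "Ref one"), ("s", "Phys Rev D")],
                    [("m", "Ref two")]]}
```

# With this sample record, "999C5m" yields a list of strings, while "999C5"
# yields a list of dictionaries keyed by subcode.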
def kb(self, kb, string, default=""):
"""
Returns the value of the "string" in the knowledge base "kb".
If kb does not exist or string does not exist in kb,
returns 'default' string or empty string if not specified.
@param kb a knowledge base name
@param string the string we want to translate
@param default a default value returned if 'string' not found in 'kb'
"""
if string == None:
return default
val = get_kb_mapping(kb, string, default)
if val == None:
return default
else:
return val
def bf_profile():
"""
Runs a benchmark
"""
for i in range(50):
format_record(i, "HD", ln=cdslang, verbose=9, search_pattern=[])
return
if __name__ == "__main__":
import profile
import pstats
bf_profile()
profile.run('bf_profile()', "bibformat_profile")
p = pstats.Stats("bibformat_profile")
p.strip_dirs().sort_stats("cumulative").print_stats()
diff --git a/modules/bibformat/lib/bibformat_templates.py b/modules/bibformat/lib/bibformat_templates.py
index 5301ebb10..0be65bc43 100644
--- a/modules/bibformat/lib/bibformat_templates.py
+++ b/modules/bibformat/lib/bibformat_templates.py
@@ -1,2044 +1,2063 @@
# -*- coding: utf-8 -*-
## $Id$
## Administration of BibFormat config files
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005 CERN.
##
## The CDSware is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## The CDSware is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDSware; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""HTML Templates for BibFormat administration"""
__lastupdated__ = """$Date$"""
# non Invenio imports
import cgi
# Invenio imports
from invenio.messages import gettext_set_language
from invenio.textutils import indent_text
from invenio.config import weburl, sweburl
from invenio.messages import language_list_long
from invenio.config import php
class Template:
"""Templating class, refer to bibformat.py for examples of call"""
def tmpl_admin_index(self, ln, warnings, is_admin):
"""
Returns the main BibFormat admin page.
@param ln language
@param warnings a list of warnings to display at top of page. None if no warning
@param is_admin indicate if user is authorized to use BibFormat
@return main BibFormat admin page
"""
_ = gettext_set_language(ln) # load the right message language
out = ''
if warnings:
out += '''
%(warnings)s
''' % {'warnings': ' '.join(warnings)}
if php:
# If PHP enabled, old bibformat can still run
comment_on_php_admin_interface = '''
For some time the old BibFormat will still run alongside the new one, so that you can transition smoothly (See old Admin Interface further below).
'''
out += '''
BibFormat has changed!
You will need to migrate your old formats if you are not a new user. You can read the documentation to learn how to write
formats, or use the migration assistant.
%(comment_on_php_admin_interface)s
''' % {'weburl':weburl,
'comment_on_php_admin_interface':comment_on_php_admin_interface}
out += '''
This is where you can edit the formatting styles available for the records. '''
if not is_admin:
out += '''You need to
login to enter.
''' % {'weburl':weburl}
out += '''
'''% {'weburl':weburl, 'ln':ln}
if php:
#Show PHP admin only if PHP is enabled
out += '''
Old
BibFormat admin interface (in gray box)
The BibFormat admin interface enables you to specify how the
bibliographic data is presented to the end user in the search
interface and search results pages. For example, you may specify that
titles should be printed in bold font, the abstract in small italic,
etc. Moreover, the BibFormat is not only a simple bibliographic data
output formatter, but also an automated link
constructor. For example, from the information on journal name
and pages, it may automatically create links to publisher's site based
on some configuration rules.
Configuring BibFormat
By default, a simple HTML format based on the most common fields
(title, author, abstract, keywords, fulltext link, etc) is defined.
You certainly want to define your own output formats in case you have a
specific metadata structure.
Define one or more output BibFormat behaviours. These are then
passed as parameters to the BibFormat modules while executing
formatting.
Example: You can tell BibFormat that it has to enrich the
incoming metadata file by the created format, or that it only has to
print the format out.
Define how the metadata tags from input are mapped into internal
BibFormat variable names. The variable names can afterwards be used
in formatting and linking rules.
Example: You can tell that 100 $a field
should be mapped into $100.a internal variable that you
could use later.
Define rules for automated creation of URI links from mapped
internal variables.
Example: You can tell a rule how to create a link to
People database out of the $100.a internal variable
representing author's name. (The $100.a variable was mapped
in the previous step, see the Extraction Rules.)
Define file format types based on file extensions. This will be
used when proposing various fulltext services.
Example: You can tell that *.pdf files will
be treated as PDF files.
Define your own functions that you can reuse when creating your
own output formats. This enables you to do complex formatting without
ever touching the BibFormat core code.
Example: You can define a function how to match and
extract email addresses out of a text file.
Define the output formats, i.e. how to create the output out of
internal BibFormat variables that were extracted in a previous step.
This is the functionality you would want to configure most of the
time. It may reuse formats, user defined functions, knowledge bases,
etc.
Example: You can tell that authors should be printed in
italic, that if there are more than 10 authors only the first three
should be printed, etc.
Define one or more knowledge bases that enable you to transform
various forms of input data values into the unique standard form on
the output.
Example: You can tell that Phys Rev D and
Physical Review D are both the same journal and that these
names should be standardized to Phys Rev : D.
Enables you to test your formats on your sample data file. Useful
when debugging newly created formats.
To learn more on BibFormat configuration, you can consult the BibFormat Admin Guide.
Running BibFormat
From the Web interface
Run Reformat Records tool.
This tool permits you to update stored formats for bibliographic records.
It should normally be used after configuring BibFormat's
Behaviours and
Formats.
When these are ready, you can choose to rebuild formats for selected
collections or you can manually enter a search query and the web interface
will accomplish all necessary formatting steps.
Example: You can request Photo collections to have their HTML
brief formats rebuilt, or you can reformat all the records written by Ellis.
From the command-line interface
Consider having an XML MARC data file that is to be uploaded into
the CDS Invenio. (For example, it might have been harvested from other
sources and processed via BibConvert.)
Having configured BibFormat and its default output type behaviour, you
would then run this file through BibFormat as follows:
that would create default HTML formats and would "enrich" the input
XML data file by this format. (You would then continue the upload
procedure by calling successively BibUpload and BibWords.)
Now consider a different situation. You would like to add a new
possible format, say "HTML portfolio" and "HTML captions" in order to
nicely format multiple photographs in one page. Let us suppose that
these two formats are called hp and hc and
are already loaded in the collection_format table.
(TODO: describe how this is done via WebAdmin.) You would then
proceed as follows: firstly, you would prepare the corresponding output behaviours called HP
and HC (TODO: note the uppercase!) that would not enrich
the input file but that would produce an XML file with only
001 and FMT tags. (This is in order not to
update the bibliographic information but the formats only.) You would
also prepare corresponding formats
at the same time. Secondly, you would launch the formatting as
follows:
that should give you an XML file containing only 001 and FMT tags.
Finally, you would upload the formats:
$ bibupload < /tmp/sample_fmts_only.xml
and that's it. The new formats should now appear in WebSearch.
''' % {'weburl':weburl, 'ln':ln}
return indent_text(out)
def tmpl_admin_format_template_show_attributes(self, ln, name, description, filename, editable):
"""
Returns a page to change format template name and description
@param ln language
@param name the name of the format
@param description the description of the format
@param filename the filename of the template
@param editable True if we let user edit, else False
@return editor for 'format'
"""
_ = gettext_set_language(ln) # load the right message language
out = ""
out += '''
''' % {'ln':ln,
'menu':_("Menu"),
'filename':filename,
'close_editor': _("Close Editor"),
'modify_template_attributes': _("Modify Template Attributes"),
'template_editor': _("Template Editor"),
'check_dependencies': _("Check Dependencies")
}
disabled = ""
readonly = ""
if not editable:
disabled = 'disabled="disabled"'
readonly = 'readonly="readonly"'
out += '''
''' % {"name": name,
"description": description,
'ln':ln,
'filename':filename,
'disabled':disabled,
'readonly':readonly,
'description_label': _("Description"),
'name_label': _("Name"),
'update_format_attributes': _("Update Format Attributes"),
'weburl':weburl
}
return out
def tmpl_admin_format_template_show_dependencies(self, ln, name, filename, output_formats, format_elements, tags):
"""
Shows the dependencies (on elements) of the given format.
@param name the name of the template
@param filename the filename of the template
@param format_elements the elements (and list of tags in each element) this template depends on
@param output_formats the output format that depend on this template
@param tags the tags that are called by format elements this template depends on.
"""
_ = gettext_set_language(ln) # load the right message language
out = '''
'
for output_format in output_formats:
name = output_format['names']['generic']
filename = output_format['filename']
out += ''' %(name)s''' % {'filename':filename,
'name':name,
'ln':ln}
if len(output_format['tags']) > 0:
out += "("+", ".join(output_format['tags'])+")"
out += " "
#Print format elements (and tags)
out += '
'
if len(format_elements) == 0:
out += '
This format template uses no format element.
'
for format_element in format_elements:
name = format_element['name']
out += ''' %(name)s''' % {'name':"bfe_"+name.lower(),
'anchor':name.upper(),
'ln':ln}
if len(format_element['tags']) > 0:
out += "("+", ".join(format_element['tags'])+")"
out += " "
#Print tags
out += '
'
if len(tags) == 0:
out += '
This format template uses no tag.
'
for tag in tags:
out += '''%(tag)s ''' % { 'tag':tag}
out += '''
*Note: Some tags linked with this format template might not be shown. Check manually.
'''
return out
def tmpl_admin_format_template_show(self, ln, name, description, code, filename, ln_for_preview, pattern_for_preview, editable, content_type_for_preview, content_types):
"""
Returns the editor for format templates. Edit 'format'
@param ln language
@param format the format to edit
@param filename the filename of the template
@param ln_for_preview the language for the preview (for bfo)
@param pattern_for_preview the search pattern to be used for the preview (for bfo)
@param editable True if we let user edit, else False
@return editor for 'format'
"""
_ = gettext_set_language(ln) # load the right message language
out = ""
out += '''
''' % {'weburl':weburl, 'ln':ln}
return out
def tmpl_admin_format_template_show_short_doc(self, ln, format_elements):
"""
Prints the format element documentation in a condensed way to display
inside format template editor.
This page is different from others: it is displayed inside a
''' % {'kb_name': kb_name,
'kb_description': description,
'kb_id':kb_id}
return indent_text(out)
def tmpl_admin_kb_show_dependencies(self, ln, kb_id, kb_name, sortby, format_elements):
"""
Returns the dependencies of a knowledge base.
@param ln language
@param kb_id the id of the kb
@param kb_name the name of the kb
@param sortby the sorting criteria ('from' or 'to')
@param format_elements the elements that use this kb
"""
_ = gettext_set_language(ln) # load the right message language
out = '''
''' % {'ln':ln,
'kb_id':kb_id,
'sortby':sortby,
'close': _("Close Editor"),
'menu' : _("Menu"),
'mappings': _("Knowledge Base Mappings"),
'attributes':_("Knowledge Base Attributes"),
'dependencies':_("Knowledge Base Dependencies")}
out += '''
'''
out += '''
Format Elements used by %(name)s*
''' % {"name": kb_name}
if len(format_elements) == 0:
out += '
This knowledge base is not used in any format elements.
'
for format_element in format_elements:
name = format_element['name']
out += '''%(name)s ''' % {'name':"bfe_"+name.lower(),
'anchor':name.upper(),
'ln':ln}
out += '''
*Note: Some knowledge base usages might not be shown. Check manually.
'''
return indent_text(out)
def tmpl_admin_validate_format(self, ln, errors):
"""
Prints the errors of the validation of a format (might be any
kind of format)
@param ln language
@param errors a list of tuples (error code, string error message)
"""
_ = gettext_set_language(ln) # load the right message language
out = ""
if len(errors) == 0:
out += '''%s.''' % _('No problem found with format')
elif len(errors) == 1:
out += '''%s: ''' % _('An error has been found')
else:
out += '''%s: ''' % _('The following errors have been found')
for error in errors:
out += error + " "
return indent_text(out)
def tmpl_admin_dialog_box(self, url, ln, title, message, options):
"""
Prints a dialog box with given title, message and options
@param url the url of the page that must process the result of the dialog box
@param ln language
@param title the title of the dialog box
@param message a formatted message to display inside dialog box
@param options a list of string options to display as button to the user
"""
out = ""
out += '''
'''
return indent_text(out)
diff --git a/modules/bibformat/lib/bibformatadminlib.py b/modules/bibformat/lib/bibformatadminlib.py
index 5d27e1424..a35876f50 100644
--- a/modules/bibformat/lib/bibformatadminlib.py
+++ b/modules/bibformat/lib/bibformatadminlib.py
@@ -1,1474 +1,1476 @@
# -*- coding: utf-8 -*-
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005 CERN.
##
## The CDSware is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## The CDSware is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDSware; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
Handle requests from the web interface to configure BibFormat.
"""
__lastupdated__ = """$Date$"""
import os
import re
import stat
import time
from invenio.config import cdslang, weburl, etcdir
from invenio.bibformat_config import templates_path, outputs_path, elements_path, format_template_extension
from invenio.urlutils import wash_url_argument
from invenio.errorlib import get_msgs_for_code_list
from invenio.messages import gettext_set_language, wash_language, language_list_long
from invenio.search_engine import perform_request_search, encode_for_xml
from invenio import bibformat_dblayer
from invenio import bibformat_engine
import invenio.template
bibformat_templates = invenio.template.load('bibformat')
def getnavtrail(previous = '', ln=cdslang):
"""Get the navtrail"""
previous = wash_url_argument(previous, 'str')
ln = wash_language(ln)
_ = gettext_set_language(ln)
navtrail = '''%s > %s ''' % (weburl, ln, _("Admin Area"), weburl, ln, _("BibFormat Admin"))
navtrail = navtrail + previous
return navtrail
def perform_request_index(ln=cdslang, warnings=None, is_admin=False):
"""
Returns the main BibFormat admin page.
This is the only page where the code needs to be cleaned
when the migration kit will be removed. #TODO: remove when removing migration_kit
@param ln language
@param warnings a list of messages to display at top of the page, that prevents writability in etc
@param is_admin indicate if user is authorized to use BibFormat
@return the main admin page
"""
if warnings != None and len(warnings) > 0:
warnings = get_msgs_for_code_list(warnings, 'warning', ln)
warnings = [x[1] for x in warnings] # Get only message, not code
return bibformat_templates.tmpl_admin_index(ln, warnings, is_admin)
def perform_request_format_templates_management(ln=cdslang, checking=0):
"""
Returns the main management console for format templates
@param ln language
@param checking the level of checking (0: basic, 1:extensive (time consuming) )
@return the main page for format templates management
"""
#Reload in case a format was changed
bibformat_engine.clear_caches()
#get formats lists of attributes
formats = bibformat_engine.get_format_templates(with_attributes=True)
formats_attrs = []
for filename in formats:
attrs = formats[filename]['attrs']
attrs['filename'] = filename
attrs['editable'] = can_write_format_template(filename)
path = templates_path + os.sep + filename
attrs['last_mod_date'] = time.ctime(os.stat(path)[stat.ST_MTIME])
status = check_format_template(filename, checking)
if len(status) > 1 or (len(status)==1 and status[0][0] != 'ERR_BIBFORMAT_CANNOT_READ_TEMPLATE_FILE'):
status = '''
Not OK
''' % {'weburl':weburl,
'ln':ln,
'bft':filename}
else:
status = 'OK'
attrs['status'] = status
formats_attrs.append(attrs)
def sort_by_attr(seq):
intermed = [ (x['name'], i, x) for i, x in enumerate(seq)]
intermed.sort()
return [x[-1] for x in intermed]
sorted_format_templates = sort_by_attr(formats_attrs)
return bibformat_templates.tmpl_admin_format_templates_management(ln, sorted_format_templates)
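# Illustrative sketch of the decorate-sort-undecorate idiom used by
# sort_by_attr() above. The position `index` is part of each sort key, so items
# with equal names keep their original order and the dictionaries themselves
# are never compared. `sort_by_name` is a name invented for this example.

```python
def sort_by_name(items):
    """Return `items` (a list of dicts with a 'name' key) sorted by name, stably."""
    decorated = [(item["name"], index, item) for index, item in enumerate(items)]
    decorated.sort()
    return [item for (_name, _index, item) in decorated]
```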
def perform_request_format_template_show(bft, ln=cdslang, code=None,
ln_for_preview=cdslang, pattern_for_preview="",
content_type_for_preview="text/html"):
"""
Returns the editor for format templates.
@param ln language
@param bft the template to edit
@param code the code being edited
@param ln_for_preview the language for the preview (for bfo)
@param pattern_for_preview the search pattern to be used for the preview (for bfo)
@return the main page for formats management
"""
format_template = bibformat_engine.get_format_template(filename=bft, with_attributes=True)
#Either use code being edited, or the original code inside template
if code == None:
code = format_template['code']#.replace('%%','%') #.replace("<","<").replace(">","/>").replace("&","&")
#Build a default pattern if it is empty
if pattern_for_preview == "":
recIDs = perform_request_search()
if len(recIDs) > 0:
recID = recIDs[0]
pattern_for_preview = "recid:%s" % recID
editable = can_write_format_template(bft)
#Look for all existing content_types
content_types = bibformat_dblayer.get_existing_content_types()
return bibformat_templates.tmpl_admin_format_template_show(ln, format_template['attrs']['name'],
format_template['attrs']['description'],
code, bft,
ln_for_preview=ln_for_preview,
pattern_for_preview=pattern_for_preview,
editable=editable,
content_type_for_preview=content_type_for_preview,
content_types=content_types)
def perform_request_format_template_show_dependencies(bft, ln=cdslang):
"""
Show the dependencies (on elements) of the given format.
@param ln language
@param bft the filename of the template to show
"""
format_template = bibformat_engine.get_format_template(filename=bft, with_attributes=True)
name = format_template['attrs']['name']
output_formats = get_outputs_that_use_template(bft)
format_elements = get_elements_used_by_template(bft)
tags = []
for output_format in output_formats:
for tag in output_format['tags']:
tags.append(tag)
for format_element in format_elements:
for tag in format_element['tags']:
tags.append(tag)
tags.sort()
return bibformat_templates.tmpl_admin_format_template_show_dependencies(ln,
name,
bft,
output_formats,
format_elements,
tags)
def perform_request_format_template_show_attributes(bft, ln=cdslang):
"""
Page for editing the template name and description attributes.
@param ln language
@param bft the template to edit
@return the main page for format templates attributes edition
"""
format_template = bibformat_engine.get_format_template(filename=bft, with_attributes=True)
name = format_template['attrs']['name']
description = format_template['attrs']['description']
editable = can_write_format_template(bft)
return bibformat_templates.tmpl_admin_format_template_show_attributes(ln,
name,
description,
bft,
editable)
def perform_request_format_template_show_short_doc(ln=cdslang, search_doc_pattern=""):
"""
Returns the format elements documentation to be included inside format template editor.
Keep only elements that have 'search_doc_pattern' text inside description,
if pattern not empty
@param ln language
@param search_doc_pattern a search pattern that specifies which elements to display
@return a brief version of the format element documentation
"""
#get format elements lists of attributes
elements = bibformat_engine.get_format_elements(with_built_in_params=True)
keys = elements.keys()
keys.sort()
elements = map(elements.get, keys)
def filter_elem(element):
"""Keep element if its string representation contains all keywords of search_doc_pattern,
and if its name does not start with a number (to remove 'garbage' from elements in tags table)"""
if element['type'] != 'python' and \
element['attrs']['name'][0] in ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']:
return False
text = str(element).upper() #Basic text representation
if search_doc_pattern != "":
for word in search_doc_pattern.split():
if word.upper() != "AND" and text.find(word.upper()) == -1:
return False
return True
elements = filter(filter_elem, elements)
return bibformat_templates.tmpl_admin_format_template_show_short_doc(ln, elements)
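# Illustrative sketch of the keyword test performed inside filter_elem() above:
# every word of the pattern must occur in the element's text, ignoring case,
# and the literal word "AND" is skipped. `matches_all_keywords` is a name
# invented for this example; it takes plain strings rather than element dicts.

```python
def matches_all_keywords(text, pattern):
    """Return True if every word of `pattern` (except 'AND') occurs in `text`."""
    haystack = text.upper()
    for word in pattern.split():
        word = word.upper()
        if word != "AND" and word not in haystack:
            return False
    return True
```

# An empty pattern matches everything, which mirrors the behaviour above when
# search_doc_pattern is "".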
def perform_request_format_elements_documentation(ln=cdslang):
"""
Returns the main management console for format elements.
Includes list of format elements and associated administration tools.
@param ln language
@return the main page for format elements management
"""
#get format elements lists of attributes
elements = bibformat_engine.get_format_elements(with_built_in_params=True)
keys = elements.keys()
keys.sort()
elements = map(elements.get, keys)
#Remove all elements found in table and that begin with a number (to remove 'garbage')
filtered_elements = [element for element in elements if element['type'] == 'python' or \
element['attrs']['name'][0] not in ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']]
return bibformat_templates.tmpl_admin_format_elements_documentation(ln, filtered_elements)
def perform_request_format_element_show_dependencies(bfe, ln=cdslang):
"""
Show the dependencies of the given format.
@param ln language
@param bfe the filename of the format element to show
"""
format_templates = get_templates_that_use_element(bfe)
tags = get_tags_used_by_element(bfe)
return bibformat_templates.tmpl_admin_format_element_show_dependencies(ln,
bfe,
format_templates,
tags)
def perform_request_format_element_test(bfe, ln=cdslang, param_values=None, uid=None):
"""
Tests the given format element with the given parameter values.
'param_values' is the list of values to pass to 'format'
function of the element as parameters, in the order ...
If 'param_values' is None, this means that they have not been defined by user yet.
@param ln language
@param bfe the name of the format element to show
@param param_values the list of parameters to pass to element format function
@param uid the user id for this request
"""
_ = gettext_set_language(ln)
format_element = bibformat_engine.get_format_element(bfe, with_built_in_params=True)
#Load parameter names and description
##
param_names = []
param_descriptions = []
#First value is a search pattern to choose the record
param_names.append(_("Test with record:")) # Caution: keep in sync with same text below
param_descriptions.append(_("Enter a search query here."))
#Parameters defined in this element
for param in format_element['attrs']['params']:
param_names.append(param['name'])
param_descriptions.append(param['description'])
#Parameters common to all elements of a kind
for param in format_element['attrs']['builtin_params']:
param_names.append(param['name'])
param_descriptions.append(param['description'])
#Load parameters values
##
if param_values == None: #First time the page is loaded
param_values = []
#Propose an existing record id by default
recIDs = perform_request_search()
if len(recIDs) > 0:
recID = recIDs[0]
param_values.append("recid:%s" % recID)
#Default values defined in this element
for param in format_element['attrs']['params']:
param_values.append(param['default'])
#Parameters common to all elements of a kind
for param in format_element['attrs']['builtin_params']:
param_values.append(param['default'])
#Execute element with parameters
##
params = dict(zip(param_names, param_values))
#Find a record corresponding to search pattern
search_pattern = params[_("Test with record:")] # Caution keep in sync with same text above and below
recIDs = perform_request_search(p=search_pattern)
del params[_("Test with record:")] # Caution keep in sync with same text above
if len(recIDs) > 0:
bfo = bibformat_engine.BibFormatObject(recIDs[0], ln, search_pattern, None, uid)
(result, errors) = bibformat_engine.eval_format_element(format_element, bfo, params)
else:
result = get_msgs_for_code_list([("ERR_BIBFORMAT_NO_RECORD_FOUND_FOR_PATTERN", search_pattern)],
file='error', ln=ln)[0][1]
return bibformat_templates.tmpl_admin_format_element_test(ln,
bfe,
format_element['attrs']['description'],
param_names,
param_values,
param_descriptions,
result)
def perform_request_output_formats_management(ln=cdslang, sortby="code"):
"""
Returns the main management console for output formats.
Includes list of output formats and associated administration tools.
@param ln language
@param sortby the sorting criteria (can be 'code' or 'name')
@return the main page for output formats management
"""
#Reload in case a format was changed
bibformat_engine.clear_caches()
#get output formats lists of attributes
output_formats_list = bibformat_engine.get_output_formats(with_attributes=True)
output_formats = {}
for filename in output_formats_list:
output_format = output_formats_list[filename]
code = output_format['attrs']['code']
path = outputs_path + os.sep + filename
output_format['editable'] = can_write_output_format(code)
output_format['last_mod_date'] = time.ctime(os.stat(path)[stat.ST_MTIME])
#Validate the output format
status = check_output_format(code)
# If there is an error but the error is just 'format is not writable', do not display as error
if len(status) > 1 or (len(status)==1 and status[0][0] != 'ERR_BIBFORMAT_CANNOT_WRITE_OUTPUT_FILE'):
status = '''
Not OK
''' % {'weburl':weburl,
'ln':ln,
'bfo':code}
else:
status = 'OK'
output_format['status'] = status
output_formats[filename] = output_format
#sort according to code or name, inspired from Python Cookbook
def get_attr(dic, attr):
if attr == "code":
return dic['attrs']['code']
else:
return dic['attrs']['names']['generic']
def sort_by_attr(seq, attr):
intermed = [ (get_attr(x, attr), i, x) for i, x in enumerate(seq)]
intermed.sort()
return [x[-1] for x in intermed]
if sortby != "code" and sortby != "name":
sortby = "code"
sorted_output_formats = sort_by_attr(output_formats.values(), sortby)
return bibformat_templates.tmpl_admin_output_formats_management(ln, sorted_output_formats)
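The nested `sort_by_attr` helper above follows the decorate-sort-undecorate idiom. A minimal standalone sketch of that idiom is shown here, written for current Python rather than the Python 2 of this module. The sample format codes and names are made up for illustration, and `get_key` is a hypothetical parameter that stands in for the `get_attr` lookup.

```python
def sort_by_attr(seq, get_key):
    """Sort seq by get_key(item) using decorate-sort-undecorate."""
    # Decorate: (key, original index, item). The index breaks ties, so the
    # items themselves (dicts here) are never compared with each other.
    decorated = [(get_key(item), index, item) for index, item in enumerate(seq)]
    decorated.sort()
    # Undecorate: keep only the original items, now in key order.
    return [entry[-1] for entry in decorated]


# Illustrative data only: these codes and names are assumptions.
formats = [
    {'attrs': {'code': 'HX', 'names': {'generic': 'BibTeX'}}},
    {'attrs': {'code': 'HB', 'names': {'generic': 'HTML brief'}}},
    {'attrs': {'code': 'HD', 'names': {'generic': 'HTML detailed'}}},
]

by_code = sort_by_attr(formats, lambda f: f['attrs']['code'])
codes = [f['attrs']['code'] for f in by_code]

by_name = sort_by_attr(formats, lambda f: f['attrs']['names']['generic'])
names = [f['attrs']['names']['generic'] for f in by_name]
```

Keeping the original index in the tuple is what makes the sort safe when two formats share the same key, since dictionaries cannot be ordered in current Python.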
def perform_request_output_format_show(bfo, ln=cdslang, r_fld=[], r_val=[], r_tpl=[], default="", r_upd="", args={}):
"""
Returns the editing tools for a given output format.
The page shows the output format either from file or from the user's
POST session, as we want to let the user edit the rules without
saving. Policy is: r_fld, r_val and r_tpl are lists of attributes
of the rules. If they are empty, load from file. Else use
POST. The i-th value of each list is one of the attributes of rule
i. Rule i is the i-th rule in order of evaluation. All lists have
the same number of items.
r_upd contains an action that has to be performed on rules. It
can be composed of an operator ("save" to save the rules, "add" or
"del") and a number (i, the rule we want to modify).
syntax: operator [number]
E.g. r_upd = _("Save Changes") saves all rules (no int should be specified).
E.g. r_upd = _("Add New Rule") adds a rule (no int should be specified).
E.g. r_upd = _("Remove Rule") + " 5" deletes rule at position 5.
The number is used only for the delete operation.
An action can also be in **args. We must look there for a string starting
with '(+|-) [number]' to increase (+) or decrease (-) the priority of a
rule given by its index (number).
For example "+ 5" increases the priority of rule 5 (puts it at fourth position).
The string in **args can be followed by some garbage that looks like .x
or .y, as this is returned as the coordinate of the click on the
image input. We HAVE to use args and reason on its keys, because for inputs of
type image, iexplorer does not return the value of the tag, but only the name.
Action is executed only if we are working from user's POST session
(means we must have loaded the output format first, which is
totally normal and expected behaviour)
IMPORTANT: we display rules evaluation index starting at 1 in
interface, but we start internally at 0
@param ln language
@param bfo the filename of the output format to show
@param r_fld the list of 'field' attribute for each rule
@param r_val the list of 'value' attribute for each rule
@param r_tpl the list of 'template' attribute for each rule
@param default the default format template used by this output format
@param r_upd the action to perform on the rules (save, add or remove, see above)
@param args the other form arguments, whose keys may hold a '+' or '-' move action
"""
output_format = bibformat_engine.get_output_format(bfo, with_attributes=True)
format_templates = bibformat_engine.get_format_templates(with_attributes=True)
name = output_format['attrs']['names']['generic']
rules = []
debug = ""
if len(r_fld) == 0 and r_upd=="":
#Retrieve rules from file
rules = output_format['rules']
default = output_format['default']
else:
#Retrieve rules from given lists
#Transform a single rule (not considered as a list with length
#1 by the templating system) into a list
if not isinstance(r_fld, list):
r_fld = [r_fld]
r_val = [r_val]
r_tpl = [r_tpl]
for i in range(len(r_fld)):
rule = {'field': r_fld[i],
'value': r_val[i],
'template': r_tpl[i]}
rules.append(rule)
#Execute action
_ = gettext_set_language(ln)
if r_upd.startswith(_("Remove Rule")):
#Remove rule
index = int(r_upd.split(" ")[-1]) -1
del rules[index]
elif r_upd.startswith(_("Save Changes")):
#Save
update_output_format_rules(bfo, rules, default)
elif r_upd.startswith(_("Add New Rule")):
#Add new rule
rule = {'field': "",
'value': "",
'template': ""}
rules.append(rule)
else:
#Get the action in 'args'
#The action must be constructed from string of the kind:
# + 5 or - 4 or + 5.x or -4.y
for button_val in args.keys():#for all elements of form not handled yet
action = button_val.split(" ")
if action[0] == '-' or action[0] == '+':
index = int(action[1].split(".")[0]) -1
if action[0] == '-':
#Decrease priority
rule = rules[index]
del rules[index]
rules.insert(index + 1, rule)
#debug = 'Decrease rule '+ str(index)
break
elif action[0] == '+':
#Increase priority
rule = rules[index]
del rules[index]
rules.insert(index - 1, rule)
#debug = 'Increase rule ' + str(index)
break
editable = can_write_output_format(bfo)
return bibformat_templates.tmpl_admin_output_format_show(ln,
bfo,
name,
rules,
default,
format_templates,
editable)
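The move actions described in the docstring above ("+ 5", "- 4", "+ 5.x") can be decoded and applied as in the following sketch. It is an illustration in current Python, not the code of this module: `parse_move_action` and `move_rule` are hypothetical helper names, and unlike the loop above the sketch ignores moves that would go past either end of the list.

```python
def parse_move_action(button_name):
    """Return (direction, zero_based_index), or None if not a move action."""
    parts = button_name.split(" ")
    if len(parts) < 2 or parts[0] not in ("+", "-"):
        return None
    try:
        # Drop the ".x"/".y" click coordinate suffix sent for image inputs.
        # The interface numbers rules from 1, the list from 0.
        index = int(parts[1].split(".")[0]) - 1
    except ValueError:
        return None
    return parts[0], index


def move_rule(rules, direction, index):
    """Return a new list with the rule at index moved up ('+') or down ('-')."""
    target = index - 1 if direction == "+" else index + 1
    in_range = 0 <= index < len(rules) and 0 <= target < len(rules)
    if not in_range:
        # Assumption of this sketch: out-of-range moves are ignored.
        return list(rules)
    moved = list(rules)
    moved.insert(target, moved.pop(index))
    return moved


rules = ['rule 1', 'rule 2', 'rule 3']
action = parse_move_action("+ 2.x")        # raise priority of rule 2
reordered = move_rule(rules, *action)
```

Returning a new list instead of mutating `rules` keeps the rules loaded from the POST data untouched until the action is known to be valid.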
def perform_request_output_format_show_dependencies(bfo, ln=cdslang):
"""
Show the dependencies of the given output format.
@param ln language
@param bfo the filename of the output format to show
"""
output_format = bibformat_engine.get_output_format(code=bfo, with_attributes=True)
name = output_format['attrs']['names']['generic']
format_templates = get_templates_used_by_output(bfo)
return bibformat_templates.tmpl_admin_output_format_show_dependencies(ln,
name,
bfo,
format_templates)
def perform_request_output_format_show_attributes(bfo, ln=cdslang):
"""
Returns the page for editing the names and description attributes of an output format.
@param ln language
@param bfo filename of output format to edit
@return the main page for output format attributes edition
"""
output_format = bibformat_engine.get_output_format(code=bfo, with_attributes=True)
name = output_format['attrs']['names']['generic']
description = output_format['attrs']['description']
content_type = output_format['attrs']['content_type']
#Get translated names. Limit to long names now.
#Translation are given in order of languages in language_list_long()
names_trans = []
for lang in language_list_long():
name_trans = output_format['attrs']['names']['ln'].get(lang[0], "")
names_trans.append({'lang':lang[1], 'trans':name_trans})
editable = can_write_output_format(bfo)
return bibformat_templates.tmpl_admin_output_format_show_attributes(ln,
name,
description,
content_type,
bfo,
names_trans,
editable)
def perform_request_knowledge_bases_management(ln=cdslang):
"""
Returns the main page for knowledge bases management.
@param ln language
@return the main page for knowledge bases management
"""
kbs = bibformat_dblayer.get_kbs()
return bibformat_templates.tmpl_admin_kbs_management(ln, kbs)
def perform_request_knowledge_base_show(kb_id, ln=cdslang, sortby="to"):
"""
Show the content of a knowledge base
@param ln language
@param kb_id a knowledge base id
@param sortby the sorting criteria ('from' or 'to')
@return the content of the given knowledge base
"""
name = bibformat_dblayer.get_kb_name(kb_id)
mappings = bibformat_dblayer.get_kb_mappings(name, sortby)
return bibformat_templates.tmpl_admin_kb_show(ln, kb_id, name, mappings, sortby)
def perform_request_knowledge_base_show_attributes(kb_id, ln=cdslang, sortby="to"):
"""
Show the attributes of a knowledge base
@param ln language
@param kb_id a knowledge base id
@param sortby the sorting criteria ('from' or 'to')
@return the attributes of the given knowledge base
"""
name = bibformat_dblayer.get_kb_name(kb_id)
description = bibformat_dblayer.get_kb_description(name)
return bibformat_templates.tmpl_admin_kb_show_attributes(ln, kb_id, name, description, sortby)
def perform_request_knowledge_base_show_dependencies(kb_id, ln=cdslang, sortby="to"):
"""
Show the dependencies of a kb
@param ln language
@param kb_id a knowledge base id
@param sortby the sorting criteria ('from' or 'to')
@return the dependencies of the given knowledge base
"""
name = bibformat_dblayer.get_kb_name(kb_id)
format_elements = get_elements_that_use_kb(name)
return bibformat_templates.tmpl_admin_kb_show_dependencies(ln, kb_id, name, sortby, format_elements)
def add_format_template():
"""
Adds a new format template (mainly create file with unique name)
@return the filename of the created format
"""
(filename, name) = bibformat_engine.get_fresh_format_template_filename("Untitled")
out = '%(name)s' % {'name':name}
path = templates_path + os.sep + filename
format = open(path, 'w')
format.write(out)
format.close()
return filename
def delete_format_template(filename):
"""
Delete a format template given by its filename
If format template is not writable, do not remove
@param filename the format template filename
"""
if not can_write_format_template(filename):
return
path = templates_path + os.sep + filename
os.remove(path)
bibformat_engine.clear_caches()
def update_format_template_code(filename, code=""):
"""
Saves code inside template given by filename
"""
format_template = bibformat_engine.get_format_template_attrs(filename)
name = format_template['name']
description = format_template['description']
out = '''
%(name)s%(description)s
%(code)s
''' % {'name':name, 'description':description, 'code':code}
path = templates_path + os.sep + filename
format = open(path, 'w')
format.write(out)
format.close()
bibformat_engine.clear_caches()
def update_format_template_attributes(filename, name="", description=""):
"""
Saves name and description inside template given by filename.
The filename must change according to name, and every output format
that references filename must be updated.
If name already exists, use a fresh filename (we never overwrite other templates) and
remove the old one.
@return the filename of the modified format
"""
format_template = bibformat_engine.get_format_template(filename, with_attributes=True)
code = format_template['code']
if format_template['attrs']['name'] != name:
#name has changed, so update filename
old_filename = filename
old_path = templates_path + os.sep + old_filename
#Remove old one
os.remove(old_path)
(filename, name) = bibformat_engine.get_fresh_format_template_filename(name)
#Change output formats that calls this template
output_formats = bibformat_engine.get_output_formats()
for output_format_filename in output_formats:
if can_read_output_format(output_format_filename) and can_write_output_format(output_format_filename):
output_path = outputs_path + os.sep + output_format_filename
format = open(output_path, 'r')
output_text = format.read()
format.close()
output_pattern = re.compile("---(\s)*" + re.escape(old_filename), re.IGNORECASE) #Escape filename so that '.' matches literally
mod_output_text = output_pattern.sub("--- " + filename, output_text)
if output_text != mod_output_text:
format = open(output_path, 'w')
format.write(mod_output_text)
format.close()
#Write updated format template
out = '''%(name)s%(description)s%(code)s''' % {'name':name,
'description':description,
'code':code}
path = templates_path + os.sep + filename
format = open(path, 'w')
format.write(out)
format.close()
bibformat_engine.clear_caches()
return filename
def add_output_format():
"""
Adds a new output format (mainly create file with unique name)
@return the code of the created format
"""
(filename, code) = bibformat_engine.get_fresh_output_format_filename("UNTLD")
#Add entry in database
bibformat_dblayer.add_output_format(code)
bibformat_dblayer.set_output_format_name(code, "Untitled", lang="generic")
bibformat_dblayer.set_output_format_content_type(code, "text/html")
#Add file
out = ""
path = outputs_path + os.sep + filename
format = open(path, 'w')
format.write(out)
format.close()
return code
def delete_output_format(code):
"""
Delete an output format given by its code
if file is not writable, don't remove
@param code the 6 letters code of the output format to remove
"""
if not can_write_output_format(code):
return
#Remove entry from database
bibformat_dblayer.remove_output_format(code)
#Remove file
filename = bibformat_engine.resolve_output_format_filename(code)
path = outputs_path + os.sep + filename
os.remove(path)
bibformat_engine.clear_caches()
def update_output_format_rules(code, rules=[], default=""):
"""
Saves rules inside output format given by code
"""
#Generate output format syntax
#Try to group rules by field
previous_field = ""
out = ""
for rule in rules:
field = rule["field"]
value = rule["value"]
template = rule["template"]
if previous_field != field:
out += "tag %s:\n" % field
out +="%(value)s --- %(template)s\n" % {'value':value, 'template':template}
previous_field = field
out += "default: %s" % default
filename = bibformat_engine.resolve_output_format_filename(code)
path = outputs_path + os.sep + filename
format = open(path, 'w')
format.write(out)
format.close()
bibformat_engine.clear_caches()
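The serialization done by `update_output_format_rules` can be illustrated in isolation. The sketch below, in current Python, mirrors the loop above but returns the text instead of writing it to a file. `render_output_format` is a hypothetical name, and the tag, values and template filenames in the example are assumptions chosen for illustration.

```python
def render_output_format(rules, default=""):
    """Build the text of an output format file from a list of rules."""
    lines = []
    previous_field = ""
    for rule in rules:
        # Consecutive rules on the same field share one 'tag' header.
        if rule["field"] != previous_field:
            lines.append("tag %s:" % rule["field"])
        lines.append("%s --- %s" % (rule["value"], rule["template"]))
        previous_field = rule["field"]
    lines.append("default: %s" % default)
    return "\n".join(lines)


# Illustrative rules only: the tag and template names are assumptions.
example_rules = [
    {"field": "980__a", "value": "PICTURE", "template": "Picture_HTML_detailed.bft"},
    {"field": "980__a", "value": "PREPRINT", "template": "Default_HTML_detailed.bft"},
]
text = render_output_format(example_rules, "Default_HTML_detailed.bft")
```

With these inputs, both conditions are grouped under a single `tag 980__a:` line, followed by the `default:` line.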
def update_output_format_attributes(code, name="", description="", new_code="", content_type="", names_trans=[]):
"""
Saves name and description inside output format given by code.
If new_code already exists, use a fresh code (we never overwrite other output formats).
@param description the new description
@param name the new name
@param new_code the new short code (== new bfo) of the output format
@param code the code of the output format to update
@param names_trans the translations in the same order as the languages from language_list_long()
@param content_type the new content_type of the output format
@return the (possibly new) code of the modified format
"""
bibformat_dblayer.set_output_format_description(code, description)
bibformat_dblayer.set_output_format_content_type(code, content_type)
bibformat_dblayer.set_output_format_name(code, name, lang="generic")
i = 0
for lang in language_list_long():
if names_trans[i] != "":
bibformat_dblayer.set_output_format_name(code, names_trans[i], lang[0])
i += 1
new_code = new_code.upper()
if code != new_code:
#If code has changed, we must update filename with a new unique code
old_filename = bibformat_engine.resolve_output_format_filename(code)
old_path = outputs_path + os.sep + old_filename
(new_filename, new_code) = bibformat_engine.get_fresh_output_format_filename(new_code)
new_path = outputs_path + os.sep + new_filename
os.rename(old_path, new_path)
bibformat_dblayer.change_output_format_code(code, new_code)
bibformat_engine.clear_caches()
return new_code
def add_kb_mapping(kb_name, key, value=""):
"""
Adds a new mapping to given kb
@param kb_name the name of the kb where to insert the new value
@param key the key of the mapping
@param value the value of the mapping
"""
bibformat_dblayer.add_kb_mapping(kb_name, key, value)
def remove_kb_mapping(kb_name, key):
"""
Delete an existing kb mapping in kb
@param kb_name the name of the kb from which to remove the mapping
@param key the key of the mapping
"""
bibformat_dblayer.remove_kb_mapping(kb_name, key)
def update_kb_mapping(kb_name, old_key, key, value):
"""
Update an existing kb mapping with key old_key with a new key and value
@param kb_name the name of the kb that contains the mapping to update
@param old_key the current key of the mapping in the kb
@param key the new key of the mapping
@param value the new value of the mapping
"""
bibformat_dblayer.update_kb_mapping(kb_name, old_key, key, value)
def get_kb_name(kb_id):
"""
Returns the name of the kb given by id
"""
return bibformat_dblayer.get_kb_name(kb_id)
def update_kb_attributes(kb_name, new_name, new_description):
"""
Updates given kb_name with a new name and new description
@param kb_name the name of the kb to update
@param new_name the new name for the kb
@param new_description the new description for the kb
"""
bibformat_dblayer.update_kb(kb_name, new_name, new_description)
def add_kb(kb_name="Untitled"):
"""
Adds a new kb in database, and returns its id
The kb is named kb_name, followed by a space and a number if
needed, such that the name is unique.
@param kb_name the name of the kb
@return the id of the newly created kb
"""
name = kb_name
i = 1
while bibformat_dblayer.kb_exists(name):
name = kb_name + " " + str(i)
i += 1
kb_id = bibformat_dblayer.add_kb(name, "")
return kb_id
def delete_kb(kb_name):
"""
Deletes given kb from database
"""
bibformat_dblayer.delete_kb(kb_name)
def can_read_format_template(filename):
"""
Returns True if we have read permission on given format template, else
returns False
"""
path = "%s%s%s" % (templates_path, os.sep, filename)
return os.access(path, os.R_OK)
def can_read_output_format(bfo):
"""
Returns True if we have read permission on given output format, else
returns False
"""
filename = bibformat_engine.resolve_output_format_filename(bfo)
path = "%s%s%s" % (outputs_path, os.sep, filename)
return os.access(path, os.R_OK)
def can_read_format_element(name):
"""
Returns True if we have read permission on given format element, else
returns False
"""
filename = bibformat_engine.resolve_format_element_filename(name)
path = "%s%s%s" % (elements_path, os.sep, filename)
return os.access(path, os.R_OK)
def can_write_format_template(bft):
"""
Returns True if we have write permission on given format template, else
returns False
"""
if not can_read_format_template(bft):
return False
path = "%s%s%s" % (templates_path, os.sep, bft)
return os.access(path, os.W_OK)
def can_write_output_format(bfo):
"""
Returns True if we have write permission on given output format, else
returns False
"""
if not can_read_output_format(bfo):
return False
filename = bibformat_engine.resolve_output_format_filename(bfo)
path = "%s%s%s" % (outputs_path, os.sep, filename)
return os.access(path, os.W_OK)
def can_write_etc_bibformat_dir():
"""
Returns true if we can write in etc/bibformat dir.
"""
path = "%s%sbibformat" % (etcdir, os.sep)
return os.access(path, os.W_OK)
def get_outputs_that_use_template(filename):
"""
Returns a list of output formats that call the given format template.
The returned output formats also give their dependencies on tags.
We don't return the complete output formats but some reference to
them (filename + names)
[ {'filename':"filename_1.bfo"
'names': {'en':"a name", 'fr': "un nom", 'generic':"a name"}
'tags': ['710__a', '920__']
},
...
]
Returns output formats references sorted by (generic) name
@param filename a format template filename
"""
output_formats_list = {}
output_formats = bibformat_engine.get_output_formats(with_attributes=True)
for output_format in output_formats:
tags = [] #Collect tags separately for each output format
name = output_formats[output_format]['attrs']['names']['generic']
#First look at default template, and add it if necessary
if output_formats[output_format]['default'] == filename:
output_formats_list[name] = {'filename':output_format,
'names':output_formats[output_format]['attrs']['names'],
'tags':[]}
#Second look at each rule
found = False
for rule in output_formats[output_format]['rules']:
if rule['template'] == filename:
found = True
tags.append(rule['field']) #Also build dependencies on tags
#Finally add dependency on template from rule (overwrite default dependency,
#which is weaker in term of tag)
if found == True:
output_formats_list[name] = {'filename':output_format,
'names':output_formats[output_format]['attrs']['names'],
'tags':tags}
keys = output_formats_list.keys()
keys.sort()
return map(output_formats_list.get, keys)
def get_elements_used_by_template(filename):
"""
Returns a list of format elements that are called by the given format template.
The returned elements also give their dependencies on tags
The list is returned sorted by name
[ {'filename':"filename_1.py"
'name':"filename_1"
'tags': ['710__a', '920__']
},
...
]
Returns elements sorted by name
@param filename a format template filename
"""
format_elements = {}
format_template = bibformat_engine.get_format_template(filename=filename, with_attributes=True)
code = format_template['code']
format_elements_iter = bibformat_engine.pattern_tag.finditer(code)
for result in format_elements_iter:
function_name = result.group("function_name").lower()
if function_name != None and not format_elements.has_key(function_name):
filename = bibformat_engine.resolve_format_element_filename("BFE_"+function_name)
if filename != None:
tags = get_tags_used_by_element(filename)
format_elements[function_name] = {'name':function_name.lower(),
'filename':filename,
'tags':tags}
keys = format_elements.keys()
keys.sort()
return map(format_elements.get, keys)
# Format Elements Dependencies
##
def get_tags_used_by_element(filename):
"""
Returns a list of tags used by given format element
APPROXIMATE RESULTS: the tags are retrieved from calls to the field(),
fields() and control_field() functions. If they are computed, or saved
in a variable somewhere else, they are not retrieved
@TODO: There is room for improvements. For example catch call
to BibRecord functions, or use of
Returns tags sorted by value
@param filename a format element filename
"""
tags = {}
format_element = bibformat_engine.get_format_element(filename)
if format_element == None:
return []
elif format_element['type']=="field":
tags = format_element['attrs']['tags']
return tags
filename = bibformat_engine.resolve_format_element_filename(filename)
path = elements_path + os.sep + filename
format = open(path, 'r')
code = format.read()
format.close()
tags_pattern = re.compile('''
(field|fields|control_field)\s* #Function call
\(\s* #Opening parenthesis
[\'"]+ #Single or double quote
(?P<tag>.+?) #Tag
[\'"]+\s* #Single or double quote
\) #Closing parenthesis
''', re.VERBOSE | re.MULTILINE)
tags_iter = tags_pattern.finditer(code)
for result in tags_iter:
tags[result.group("tag")] = result.group("tag")
return tags.values()
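To make the approximation in `get_tags_used_by_element` concrete, here is a self-contained sketch of the same pattern applied to a sample element body. It is written in current Python. The sample source string and the `find_tags` helper are assumptions for illustration, and the named group `tag` matches the `result.group("tag")` lookup above.

```python
import re

# Same idea as the pattern above: a call to field(), fields() or
# control_field() whose first argument is a quoted literal.
TAGS_PATTERN = re.compile(r'''
    (?:field|fields|control_field)\s*  # Function call
    \(\s*                              # Opening parenthesis
    ['"]+                              # Single or double quote
    (?P<tag>.+?)                       # Tag
    ['"]+\s*                           # Single or double quote
    \)                                 # Closing parenthesis
    ''', re.VERBOSE)


def find_tags(code):
    """Return the sorted, de-duplicated tags found in element source code."""
    return sorted(set(match.group("tag") for match in TAGS_PATTERN.finditer(code)))


# Hypothetical element body, used only to exercise the pattern.
sample = '''
def format(bfo):
    title = bfo.field("245__a")
    authors = bfo.fields('100__a')
    number = bfo.control_field("001")
    again = bfo.field("245__a")
    computed = bfo.field(prefix + "a")
'''

found = find_tags(sample)
```

The last call in the sample is not reported, since its argument is computed rather than a literal. This is the limitation the docstring warns about.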
def get_templates_that_use_element(name):
"""
Returns a list of format templates that call the given format element.
The returned format templates also give their dependencies on tags.
[ {'filename':"filename_1.bft"
'name': "a name"
'tags': ['710__a', '920__']
},
...
]
Returns templates sorted by name
@param name a format element name
"""
format_templates = {}
tags = []
files = os.listdir(templates_path) #Retrieve all templates
for file in files:
if file.endswith(format_template_extension):
format_elements = get_elements_used_by_template(file) #Look for elements used in template
format_elements = map(lambda x: x['name'].lower(), format_elements)
try: #Look for element
format_elements.index(name.lower()) #If not found, get out of "try" statement
format_template = bibformat_engine.get_format_template(filename=file, with_attributes=True)
template_name = format_template['attrs']['name']
format_templates[template_name] = {'name':template_name,
'filename':file}
except:
#Element is not used by this template (or template could not be loaded)
pass
keys = format_templates.keys()
keys.sort()
return map(format_templates.get, keys)
# Output Formats Dependencies
##
def get_templates_used_by_output(code):
"""
Returns a list of templates used inside an output format given by its code
The returned format templates also give their dependencies on elements and tags
[ {'filename':"filename_1.bft"
'name': "a name"
'elements': [{'filename':"filename_1.py", 'name':"filename_1", 'tags': ['710__a', '920__']
}, ...]
},
...
]
Returns templates sorted by name
"""
format_templates = {}
output_format = bibformat_engine.get_output_format(code, with_attributes=True)
filenames = map(lambda x: x['template'], output_format['rules'])
if output_format['default'] != "":
filenames.append(output_format['default'])
for filename in filenames:
template = bibformat_engine.get_format_template(filename, with_attributes=True)
name = template['attrs']['name']
elements = get_elements_used_by_template(filename)
format_templates[name] = {'name':name,
'filename':filename,
'elements':elements}
keys = format_templates.keys()
keys.sort()
return map(format_templates.get, keys)
# Knowledge Bases Dependencies
##
def get_elements_that_use_kb(name):
"""
Returns a list of elements that call given kb
[ {'filename':"filename_1.py"
'name': "a name"
},
...
]
Returns elements sorted by name
"""
format_elements = {}
files = os.listdir(elements_path) #Retrieve all elements in files
for filename in files:
if filename.endswith(".py"):
path = elements_path + os.sep + filename
format = open(path, 'r')
code = format.read()
format.close()
#Search for use of kb inside code
kb_pattern = re.compile('''
(bfo.kb)\s* #Function call
\(\s* #Opening parenthesis
[\'"]+ #Single or double quote
(?P<kb>%s) #kb
[\'"]+\s* #Single or double quote
, #comma
''' % name, re.VERBOSE | re.MULTILINE | re.IGNORECASE)
result = kb_pattern.search(code)
if result != None:
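#Use a separate variable: 'name' must keep the kb name for next iterations
element_name = ("".join(filename.split(".")[:-1])).lower()
if element_name.startswith("bfe_"):
element_name = element_name[4:]
format_elements[element_name] = {'filename':filename, 'name': element_name}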
keys = format_elements.keys()
keys.sort()
return map(format_elements.get, keys)
# Validation tools
##
def perform_request_format_validate(ln=cdslang, bfo=None, bft=None, bfe=None):
"""
Returns a page showing the status of an output format or format
template or format element. This page is called from output
formats management page or format template management page or
format elements documentation.
The page only shows the status of one of the format, depending on
the specified one. If multiple are specified, shows the first one.
@param ln language
@param bfo an output format 6 chars code
@param bft a format template filename
@param bfe a format element name
"""
messages = None #Stays None if no format is specified
if bfo != None:
errors = check_output_format(bfo)
messages = get_msgs_for_code_list(code_list = errors, ln=ln)
elif bft != None:
errors = check_format_template(bft, checking=1)
messages = get_msgs_for_code_list(code_list = errors, ln=ln)
elif bfe != None:
errors = check_format_element(bfe)
messages = get_msgs_for_code_list(code_list = errors, ln=ln)
if messages == None:
messages = []
messages = map(lambda x: encode_for_xml(x[1]), messages)
return bibformat_templates.tmpl_admin_validate_format(ln, messages)
def check_output_format(code):
"""
Returns the list of errors in the output format given by code
The errors are the formatted errors defined in bibformat_config.py file.
@param code the 6 chars code of the output format to check
@return a list of errors
"""
errors = []
filename = bibformat_engine.resolve_output_format_filename(code)
if can_read_output_format(code):
path = outputs_path + os.sep + filename
format = open(path)
current_tag = ''
i = 0
for line in format:
i += 1
if line.strip() == "":
#ignore blank lines
continue
clean_line = line.rstrip("\n\r ") #remove spaces and eol
if line.strip().endswith(":") or (line.strip().lower().startswith("tag") and line.find('---') == -1):
#check tag
if not clean_line.endswith(":"):
#colon is missing at the end of line
errors.append(("ERR_BIBFORMAT_OUTPUT_RULE_FIELD_COL", line, i))
if not clean_line.lower().startswith("tag"):
#tag keyword is missing
errors.append(("ERR_BIBFORMAT_OUTPUT_TAG_MISSING", line, i))
elif not clean_line.startswith("tag"):
#tag was not lower case
errors.append(("ERR_BIBFORMAT_OUTPUT_WRONG_TAG_CASE", line, i))
clean_line = clean_line.rstrip(": ") #remove : and spaces at the end of line
current_tag = "".join(clean_line.split()[1:]).strip() #the tag starts at second position
if len(clean_line.split()) > 2: #We should only have 'tag' keyword and tag
errors.append(("ERR_BIBFORMAT_INVALID_OUTPUT_RULE_FIELD", i))
else:
if len(check_tag(current_tag)) > 0:
#Invalid tag
errors.append(("ERR_BIBFORMAT_INVALID_OUTPUT_RULE_FIELD_tag", current_tag, i))
if not clean_line.startswith("tag"):
errors.append(("ERR_BIBFORMAT_INVALID_OUTPUT_RULE_FIELD", i))
elif line.find('---') != -1:
#check condition
if current_tag == "":
errors.append(("ERR_BIBFORMAT_OUTPUT_CONDITION_OUTSIDE_FIELD", line, i))
words = line.split('---')
if len(words) != 2:
errors.append(("ERR_BIBFORMAT_INVALID_OUTPUT_CONDITION", line, i))
template = words[-1].strip()
path = templates_path + os.sep + template
if not os.path.exists(path):
errors.append(("ERR_BIBFORMAT_WRONG_OUTPUT_RULE_TEMPLATE_REF", template, i))
elif line.find(':') != -1 or (line.strip().lower().startswith("default") and line.find('---') == -1):
#check default template
clean_line = line.strip()
if line.find(':') == -1:
#colon is missing after default
errors.append(("ERR_BIBFORMAT_OUTPUT_RULE_DEFAULT_COL", line, i))
if not clean_line.lower().startswith("default"):
#default keyword is missing
errors.append(("ERR_BIBFORMAT_OUTPUT_DEFAULT_MISSING", line, i))
elif not clean_line.startswith("default"):
#default was not lower case
errors.append(("ERR_BIBFORMAT_OUTPUT_WRONG_DEFAULT_CASE", line, i))
default = "".join(line.split(':')[1]).strip()
path = templates_path + os.sep + default
if not os.path.exists(path):
errors.append(("ERR_BIBFORMAT_WRONG_OUTPUT_RULE_TEMPLATE_REF", default, i))
else:
#check others
errors.append(("ERR_BIBFORMAT_WRONG_OUTPUT_LINE", line, i))
else:
errors.append(("ERR_BIBFORMAT_CANNOT_READ_OUTPUT_FILE", filename, ""))
return errors
def check_format_template(filename, checking=0):
"""
Returns the list of errors in the format template given by its filename
The errors are the formatted errors defined in bibformat_config.py file.
@param filename the filename of the format template to check
@param checking the level of checking (0:basic, >=1 extensive (time-consuming))
@return a list of errors
"""
errors = []
if can_read_format_template(filename):#Can template be read?
#format_template = bibformat_engine.get_format_template(filename, with_attributes=True)
format = open("%s%s%s" % (templates_path, os.sep, filename))
code = format.read()
format.close()
#Look for name
match = bibformat_engine.pattern_format_template_name.search(code)
if match is None:#Is the name defined in template?
errors.append(("ERR_BIBFORMAT_TEMPLATE_HAS_NO_NAME", filename))
#Look for description
match = bibformat_engine.pattern_format_template_desc.search(code)
if match is None:#Is the description defined in template?
errors.append(("ERR_BIBFORMAT_TEMPLATE_HAS_NO_DESCRIPTION", filename))
format_template = bibformat_engine.get_format_template(filename, with_attributes=False)
code = format_template['code']
#Look for calls to format elements
#Check existence of elements and attributes used in call
elements_call = bibformat_engine.pattern_tag.finditer(code)
for element_match in elements_call:
element_name = element_match.group("function_name")
#use a separate name so that 'filename' keeps referring to the template in error reports
element_filename = bibformat_engine.resolve_format_element_filename(element_name)
if element_filename is None and not bibformat_dblayer.tag_exists_for_name(element_name): #Is element defined?
errors.append(("ERR_BIBFORMAT_TEMPLATE_CALLS_UNDEFINED_ELEM", filename, element_name))
else:
format_element = bibformat_engine.get_format_element(element_name, with_built_in_params=True)
if format_element is None:#Can element be loaded?
if not can_read_format_element(element_name):
errors.append(("ERR_BIBFORMAT_TEMPLATE_CALLS_UNREADABLE_ELEM", filename, element_name))
else:
errors.append(("ERR_BIBFORMAT_TEMPLATE_CALLS_UNLOADABLE_ELEM", element_name, filename))
else:
#are the parameters used defined in element?
params_call = bibformat_engine.pattern_function_params.finditer(element_match.group())
all_params = {}
for param_match in params_call:
param = param_match.group("param")
value = param_match.group("value")
all_params[param] = value
allowed_params = []
#Built-in params
for allowed_param in format_element['attrs']['builtin_params']:
allowed_params.append(allowed_param['name'])
#Params defined in element
for allowed_param in format_element['attrs']['params']:
allowed_params.append(allowed_param['name'])
if param not in allowed_params:
errors.append(("ERR_BIBFORMAT_TEMPLATE_WRONG_ELEM_ARG",
element_name, param, filename))
# The following check is time-consuming: run it only when explicitly requested
if checking > 0:
#Try to evaluate, with any object and pattern
recIDs = perform_request_search()
if len(recIDs) > 0:
recID = recIDs[0]
bfo = bibformat_engine.BibFormatObject(recID, search_pattern="Test")
(result, errors_) = bibformat_engine.eval_format_element(format_element, bfo, all_params, verbose=7)
errors.extend(errors_)
else:#Template cannot be read
errors.append(("ERR_BIBFORMAT_CANNOT_READ_TEMPLATE_FILE", filename, ""))
return errors
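# Illustrative sketch, not part of the Invenio code base: the parameter check
# in check_format_template boils down to "collect the attributes used in the
# element call, then report those the element does not declare". PARAM_PATTERN
# is a simplified stand-in for bibformat_engine.pattern_function_params, and
# find_unknown_params is a hypothetical helper name.

```python
import re

# Simplified assumption: attributes look like name="value" with double quotes
PARAM_PATTERN = re.compile(r'(?P<param>\w+)\s*=\s*"(?P<value>[^"]*)"')


def find_unknown_params(element_call, allowed_params):
    """Return the sorted attribute names of element_call not in allowed_params."""
    all_params = {}
    for param_match in PARAM_PATTERN.finditer(element_call):
        all_params[param_match.group("param")] = param_match.group("value")
    return sorted(param for param in all_params if param not in allowed_params)


if __name__ == "__main__":
    call = '<BFE_TITLE prefix="<b>" sufix="</b>" />'
    # "sufix" is a deliberate typo that the check is expected to report
    print(find_unknown_params(call, ["prefix", "suffix"]))
```

# In the real checker the allowed names come from the element's
# 'builtin_params' and 'params' attributes rather than from a literal list.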
def check_format_element(name):
"""
Returns the list of errors found in the format element given by its name.
The errors are the formatted errors defined in the bibformat_config.py file.
@param name the name of the format element to check
@return a list of errors
"""
errors = []
filename = bibformat_engine.resolve_format_element_filename(name)
if filename is not None:#Can element be found in files?
if can_read_format_element(name):#Can element be read?
#Try to load
try:
module_name = filename
if module_name.endswith(".py"):
module_name = module_name[:-3]
module = __import__("invenio.bibformat_elements."+module_name)
function_format = module.bibformat_elements.__dict__[module_name].format
#Try to evaluate, with any object and pattern
recIDs = perform_request_search()
if len(recIDs) > 0:
recID = recIDs[0]
bfo = bibformat_engine.BibFormatObject(recID, search_pattern="Test")
element = bibformat_engine.get_format_element(name)
(result, errors_) = bibformat_engine.eval_format_element(element, bfo, verbose=7)
errors.extend(errors_)
except Exception, e:
errors.append(("ERR_BIBFORMAT_IN_FORMAT_ELEMENT", name, e))
else:
errors.append(("ERR_BIBFORMAT_CANNOT_READ_ELEMENT_FILE", filename, ""))
elif bibformat_dblayer.tag_exists_for_name(name):#Can element be found in database?
pass
else:
errors.append(("ERR_BIBFORMAT_CANNOT_RESOLVE_ELEMENT_NAME", name))
return errors
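# Illustrative sketch, not part of the Invenio code base: check_format_element
# reaches the element through module.bibformat_elements.__dict__[module_name]
# because __import__("a.b.c") returns the top-level package "a", not the
# submodule. The helper names below are hypothetical, and the standard json
# package stands in for invenio.bibformat_elements. importlib.import_module
# (Python 2.7 and later) returns the submodule directly.

```python
import importlib


def load_attr_via_dunder_import(package, module_name, attr):
    """Look up attr the way check_format_element does, using __import__."""
    dotted_name = package + "." + module_name
    # __import__ returns the top-level package of the dotted name
    obj = __import__(dotted_name)
    # Walk down from the top-level package to the requested submodule
    for part in dotted_name.split(".")[1:]:
        obj = getattr(obj, part)
    return getattr(obj, attr)


def load_attr_via_importlib(package, module_name, attr):
    """Same lookup with importlib, which returns the submodule itself."""
    module = importlib.import_module(package + "." + module_name)
    return getattr(module, attr)


if __name__ == "__main__":
    print(__import__("json.decoder").__name__)
    print(load_attr_via_dunder_import("json", "decoder", "JSONDecoder"))
    print(load_attr_via_importlib("json", "decoder", "JSONDecoder"))
```

# Both helpers return the same object; the importlib variant avoids walking
# the package attributes by hand.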
def check_tag(tag):
"""
Checks the validity of a tag.
Not implemented yet: always returns an empty list of errors.
"""
errors = []
return errors