## -*- mode: html; coding: utf-8; -*- ## This file is part of Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

BibMatch matches bibliographic data in a MARCXML file against the database content. With a MARCXML input file, the produced output shows a selection of records in the input that match the database content. This way, it is possible to identify potential duplicate entries, before they are uploaded in a database.

Note: BibMatch only matches against public records attached to the home collection.

BibMatch commmand-line tool



Examples

To match records on title and print out only new (unmatched) ones:
 $ bibmatch [--print-new] --field=\"title\" < input.xml > output.xml
To print potential duplicate entries before manual upload, run:
 $ bibmatch --print-match --field=\"245__a\" --mode=\"a\" < input.xml > output.xml
To print undecided result from a match on multiple fields, including predefined fields (title, author etc.):
 $ bibmatch --print-ambiguous --query-string=\"245__a||author\" < input.xml > output.xml
To print "fuzzy" (almost matching by title) records:
 $ bibmatch --print-fuzzy  < input.xml > output.xml
To match against public records on an remote Invenio installation (i.e http://cdsweb.cern.ch):
 $ bibmatch --print-match -i input.xml -r 'http://cdsweb.cern.ch'
Using text-marc as output-format:
 $ bibmatch -b out.marc -t < input.xml
To print matched or fuzzy matched records replacing old identifier (controlfield 001) with one from the matched record, i.e to then be used with BibUpload to update record:
 $ bibmatch -a -1 < input.xml > modified_match.xml
Command line options:
 -0 --print-new (default) print unmatched in stdout
 -1 --print-match print matched records in stdout
 -2 --print-ambiguous print records that match more than 1 existing records
 -3 --print-fuzzy print records that match the longest words in existing records

 -b --batch-output=(filename). filename.0 will be new records, filename.1 will be matched,
      filename.2 will be ambiguous, filename.3 will be fuzzy match
 -t --text-marc-output transform the output to text-marc format instead of the default MARCXML

 Simple query:

 -f --field=(field)

 Advanced query:

 -c --config=(config-filename)
 -q --query-string=(uploader_querystring)
 -m --mode=(a|e|o|p|r)
 -o --operator=(a|o)

 Where mode is:
  "a" all of the words,
  "o" any of the words,
  "e" exact phrase,
  "p" partial phrase,
  "r" regular expression.

 Operator is:
  "a" and,
  "o" or.

 General options:

 -n   --noprocess          Do not print records in stdout.
 -i,  --input              use a named file instead of stdin for input
 -h,  --help               print this help and exit
 -V,  --version            print version information and exit
 -v,  --verbose=LEVEL      verbose level (from 0 to 9, default 1)
 -r,  --remote=URL         match against a remote invenio installation (URL, no trailing '/')
                           Beware: Only searches public records attached to home collection
 -a,  --alter-recid        The recid (controlfield 001) of matched or fuzzy matched records in
                           output will be replaced by the 001 value of the matched record.
                           Useful to prepare files to then be used with BibUpload.

 Common predefined fields used in querystrings: (for Invenio demo site, your fields may vary!)

 'abstract', 'affiliation', 'anyfield', 'author', 'coden', 'collaboration',
 'collection', 'datecreated', 'datemodified', 'division', 'exactauthor',
 'experiment', 'fulltext', 'isbn', 'issn', 'journal', 'keyword', 'recid',
 'reference', 'reportnumber', 'subject', 'title', 'year'