<h3><a name="overview">1.1 What is BibMatch</a></h3>
<p>BibMatch is a tool for matching bibliographic meta-data records against a local or remote Invenio repository.
The incoming records can be matched against zero, one or more than one record. This way,
it is possible to identify potential duplicate entries, before they are uploaded into the repository.
This can also be helpful in detecting already existing duplicates in the database. </p>
<p>BibMatch also acts as a filter for incoming records, by splitting them into 'new records' and 'existing records'.
In most cases this separation simplifies ingesting content into a digital repository, because new and existing records usually need different handling.</p>
<h3><a name="features">1.2 Features</a></h3>
<ul>
<li>Matches meta-data records locally and remotely</li>
<li>Supports user authentication to allow matching against restricted collections</li>
<li>Highly configurable match validation step for reliable matching results</li>
<li>Allows full customisation of the search queries used to find matching candidates</li>
<li>Incoming meta-data can be manipulated through BibConvert formatting functions</li>
<li>Supports transliteration of Unicode meta-data into ASCII (useful for legacy systems)</li>
</ul>
<h2><a name="usage">2. Usage</a></h2>
<h3><a name="basic">2.1 Basic usage</a></h3>
<h4>Input records</h4>
<p>BibMatch needs a set of records to match. You can give BibMatch these records in two ways:</p>
<p>By standard input:</p>
<blockquote>
<pre>
$ bibmatch &lt; input.xml
</pre>
</blockquote>
<p>or, by the <em>-i</em> parameter:</p>
<blockquote>
<pre>
$ bibmatch -i input.xml
</pre>
</blockquote>
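<p>If you drive BibMatch from a script, the two input styles above can be wrapped as follows. This is a minimal sketch, assuming <em>bibmatch</em> is on the PATH; the helper names <em>bibmatch_argv</em> and <em>run_bibmatch</em> are hypothetical and not part of BibMatch.</p>

```python
import subprocess


def bibmatch_argv(input_path=None):
    """Build the argument list for a bibmatch call.

    With input_path, the file is handed over through -i. Without it,
    the caller is expected to supply the records on standard input.
    """
    argv = ["bibmatch"]
    if input_path is not None:
        argv += ["-i", input_path]
    return argv


def run_bibmatch(marcxml_text=None, input_path=None):
    """Run bibmatch and return whatever it writes to standard output.

    marcxml_text is sent on standard input and is only meaningful
    when input_path is not given.
    """
    completed = subprocess.run(
        bibmatch_argv(input_path),
        input=marcxml_text,
        capture_output=True,
        text=True,
        check=True,  # raise if bibmatch exits with a non-zero status
    )
    return completed.stdout
```

<p>Calling <em>run_bibmatch(input_path="input.xml")</em> corresponds to the <em>-i</em> form, and <em>run_bibmatch(marcxml_text=...)</em> to the standard-input form.</p>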
<h4>Output records</h4>
<p>When BibMatch matches records, each record is classified in one of four ways:</p>
<ul>
<li><strong>Match</strong> - exact match found</li>
<li><strong>Ambiguous</strong> - record matches more than one record</li>
<li><strong>Fuzzy</strong> - record <em>may</em> match a record</li>
<li><strong>New</strong> - record does not match any record</li>
</ul>
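<p>The four categories can be pictured as a decision on how many repository records were found. The sketch below is illustrative only: the function <em>classify</em>, its parameters and the order in which the categories are checked are assumptions, not BibMatch's actual validation logic.</p>

```python
from enum import Enum


class MatchResult(Enum):
    MATCH = "matched"
    AMBIGUOUS = "ambiguous"
    FUZZY = "fuzzy"
    NEW = "new"


def classify(exact_hits, fuzzy_hits=0):
    """Map hit counts onto the four result categories.

    exact_hits: number of repository records confirmed as the same record
    fuzzy_hits: number of repository records that only possibly match
    The precedence used here (exact before fuzzy) is an assumption.
    """
    if exact_hits == 1:
        return MatchResult.MATCH
    if exact_hits > 1:
        return MatchResult.AMBIGUOUS
    if fuzzy_hits > 0:
        return MatchResult.FUZZY
    return MatchResult.NEW
```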
<p>You can choose which types of records to output once matching has completed by adding the corresponding option to the command:</p>
<blockquote>
<pre>
$ bibmatch --print-match &lt; input.xml
</pre>
</blockquote>
<blockquote>
<pre>
$ bibmatch --print-new &lt; input.xml
</pre>
</blockquote>
<p>You can also write all matching results to a set of files, by passing a filename prefix to the command-line option <em>-b</em>:</p>
<blockquote>
<pre>
$ bibmatch -b output_results &lt; input.xml
</pre>
</blockquote>
<p>This will create a set of four files: <em>output_results.matched.xml</em>, <em>output_results.new.xml</em>, <em>output_results.fuzzy.xml</em> and <em>output_results.ambiguous.xml</em>.</p>
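<p>A post-processing script can derive the four result filenames from the prefix given to <em>-b</em>. The following is a small sketch based on the naming shown above; the helper names <em>result_filenames</em> and <em>existing_results</em> are hypothetical.</p>

```python
import os

# Suffixes taken from the result filenames listed above.
RESULT_KINDS = ("matched", "new", "fuzzy", "ambiguous")


def result_filenames(prefix):
    """Return a dict mapping each result kind to its expected filename."""
    return {kind: "{0}.{1}.xml".format(prefix, kind) for kind in RESULT_KINDS}


def existing_results(prefix):
    """Return only the result files that are actually present on disk."""
    return {
        kind: path
        for kind, path in result_filenames(prefix).items()
        if os.path.exists(path)
    }
```

<p>For example, <em>result_filenames("output_results")["new"]</em> gives <em>output_results.new.xml</em>.</p>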
<h4>Matching queries</h4>
<p>By default, BibMatch will try to find potential record matches using the MARC tag 245__$$a, i.e. the title. In many cases the title is not a reliable criterion
for finding all potential matching records, because titles are often ambiguous. BibMatch therefore provides users with a way to specify the exact queries to use
and which meta-data to extract from the record to fill them.</p>
<p>Using the command-line option <em>-q</em> you can specify your own <em>querystrings</em> to use when searching for records.
Read more about querystrings <a href="#querystrings">here</a>.</p>
<p>For example, if they are available in the meta-data, you can search using the ISBN or DOI, which are <em>usually</em> stable identifiers:</p>
<blockquote>
<pre>
$ bibmatch -q "[020__a] or [0247_a]" -b output_results &lt; input.xml
</pre>
</blockquote>
<p>As you can see, any data from the input record that you want substituted into the query is referenced using square brackets [] containing the exact MARC notation.</p>
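<p>The substitution of bracketed MARC references can be sketched in a few lines of Python. This is an illustration of the idea only, not BibMatch's implementation: the function <em>fill_query</em>, the flat dictionary used as a record and the sample identifier values are all assumptions, and the real tool may handle repeated fields, quoting and missing values differently.</p>

```python
import re

# Matches a bracketed reference such as [020__a] and captures its content.
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_query(querystring, record):
    """Replace each [tag] placeholder with the record's value for that tag.

    record is a plain dict mapping MARC notation to a single value.
    Tags missing from the record are replaced by an empty string.
    """
    def substitute(match):
        return record.get(match.group(1), "")

    return PLACEHOLDER.sub(substitute, querystring)


# Made-up sample values, for illustration only.
record = {"020__a": "9780131103627", "0247_a": "10.1000/182"}
print(fill_query("[020__a] or [0247_a]", record))
# prints: 9780131103627 or 10.1000/182
```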
<p>You can specify several <em>-q</em> queries and they will all be performed <u>in order</u>, until a match is found. To avoid
specifying long, complicated querystrings every time, you can use short-hand <em>template queries</em> that can be defined in the configuration of BibMatch (see the hacking guide).</p>
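<p>The "in order, until a match is found" behaviour amounts to a loop that stops at the first query returning results. The sketch below assumes a caller-supplied <em>search</em> function that returns a list of record IDs; both that function and <em>first_matching_query</em> are hypothetical names used for illustration.</p>

```python
def first_matching_query(querystrings, search):
    """Try each querystring in order and stop at the first one with hits.

    search is any callable taking a querystring and returning a list
    of record IDs. Returns (querystring, hits), or (None, []) when no
    query produced a result.
    """
    for querystring in querystrings:
        hits = search(querystring)
        if hits:
            return querystring, hits
    return None, []


def fake_search(querystring):
    """Stand-in for a repository search, with canned results."""
    canned = {"isbn-query": [], "title-query": [42]}
    return canned.get(querystring, [])
```

<p>With <em>fake_search</em>, the call <em>first_matching_query(["isbn-query", "title-query"], fake_search)</em> skips the first query, which finds nothing, and returns the result of the second.</p>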
<h4>Match remote installations</h4>
<p>By default, BibMatch will try to match records on the local installation. In order to match against records on a remote Invenio installation
(like <a href="http://cds.cern.ch" target=_blank>http://cds.cern.ch</a> or <a href="http://inspire-hep.net" target=_blank>http://inspire-hep.net</a>)