Page MenuHomec4science

websubmit-file-metadata.webdoc
No OneTemporary

File Metadata

Created
Sun, Aug 4, 14:21

websubmit-file-metadata.webdoc

# -*- mode: html; coding: utf-8; -*-
# This file is part of Invenio.
# Copyright (C) 2009, 2010, 2011 CERN.
#
# Invenio is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 2 of the
# License, or (at your option) any later version.
#
# Invenio is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Invenio; if not, write to the Free Software Foundation, Inc.,
# 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
<!-- WebDoc-Page-Title: Websubmit_file_metadata APIs and Plugin Development-->
<!-- WebDoc-Page-Navtrail: <a class="navtrail" href="<CFG_SITE_URL>/help/hacking">Hacking Invenio</a> -->
<!-- WebDoc-Page-Navbar-Select: hacking -->
<h2>Contents</h2>
<ul style="list-style-type:None">
<li><strong>1. <a href="#apis">APIs</a></strong></li>
<li><strong>2. <a href="#plugins">Plugin development</a></strong>
<ul style="list-style-type:None">
<li>2.1&nbsp;&nbsp;<a href="#specifications">Specifications</a></li>
<li>2.2&nbsp;&nbsp;<a href="#dependencies">Dependencies on External Libraries</a></li>
<li>2.3&nbsp;&nbsp;<a href="#conflicts">Conflict With Other Plugins</a></li>
</ul>
</li>
</ul>
<style type="text/css">
<!--
.code {
background-color: #ddd;
border: 1px solid #bbb;
}-->
</style>
<p>The websubmit_file_metadata library enables extraction and update of file metadata.<br/>
It can be called from Python sources or run from the command line.<br/>
The library can be extended to support various formats thanks to plugins (which must be dropped in <code>/opt/invenio/lib/python/invenio/websubmit_file_metadata_plugins/</code> directory).</p>
<h2><a name="apis">1. APIs</a></h2>
<p>Two main functions can be imported from websubmit_file_metadata:</p>
<ul>
<li><a href="#read_metadata">read_metadata</a></li>
<li><a href="#write_metadata">write_metadata</a></li>
</ul>
<a name="read_metadata"></a>
<b>def read_metadata(inputfile, force=None, remote=False, loginpw=None, verbose=0):</b>
<pre>
Returns metadata extracted from given file as dictionary.
Availability depends on input file format and installed plugins
(return TypeError if unsupported file format).
Parameters:
* inputfile (string) - path to a file
* force (string) - name of plugin to use, to skip plugin auto-discovery
* remote (boolean) - if the file is accessed remotely or not
* loginpw (string) - credentials to access secure servers (username:password)
* verbose (int) - verbosity
Returns: dict
dictionary of metadata tags as keys, and (interpreted) value as value
Raises:
* TypeError - if file format is not supported.
* RuntimeError - if required library to process file is missing.
* InvenioWebSubmitFileMetadataRuntimeError - when metadata cannot be read.
</pre>
<a name="write_metadata"></a>
<b>def write_metadata(inputfile, outputfile, metadata_dictionary, force=None, verbose=0):</b>
<pre>
Writes metadata to given file.
Availability depends on input file format and installed plugins
(return TypeError if unsupported file format).
Parameters:
* inputfile (string) - path to a file
* outputfile (string) - path to the resulting file.
* metadata_dictionary (dict) - keys and values of metadata to update.
* force (string) - name of plugin to use, to skip plugin auto-discovery
* verbose (int) - verbosity
Returns: string
output of the plugin
Raises:
* TypeError - if file format is not supported.
* RuntimeError - if required library to process file is missing.
* InvenioWebSubmitFileMetadataRuntimeError - when metadata cannot be updated.
</pre>
<h2><a name="plugins">2. Plugin development</a></h2>
<p>You can develop new plugins to extend the compatibility of the
library with additional file formats.</p>
<h3><a name="specifications">2.1 Specifications</a></h3>
<p>Your plugin name must start with "<code>wsm_</code>" and end with "<code>.py</code>". For eg. <code>wsm_myplugin.py</code>.<br/>
Once ready, it must be dropped into <code>/opt/invenio/lib/python/invenio/websubmit_file_metadata_plugins/</code> directory.<br/>
Your plugin can define the following interface:
</p>
<ul>
<li><a href="#can_read_local">def can_read_local(..)</a></li>
<li><a href="#can_read_remote">def can_read_remote(..)</a></li>
<li><a href="#can_write_local">def can_write_local(..)</a></li>
<li><a href="#read_metadata_local">def read_metadata_local(..)</a></li>
<li><a href="#read_metadata_remote">def read_metadata_remote(..)</a></li>
<li><a href="#write_metadata_local">def write_metadata_local(..)</a></li>
</ul>
<p>The
functions <code>can_read_local(..)</code>, <code>can_read_remote(..)</code>,
and <code>can_write_local(..)</code> are called at runtime by the
library on all installed plugin to check which one can process the
given file for the given action. If one of these functions return
true, your plugin will be selected to process the file. You can omit
one or several of these functions (for eg. if you don't support
reading from remote server, simply omit <code>can_read_remote(..)</code>).</p>
<p>If your plugin returned <code>True</code> for a given action, the
corresponding
function <code>read_metadata_local(..)</code>, <code>read_metadata_remote(..)</code>
or <code>write_metadata_local(..)</code> is then called. You <b>must</b> therefore implement the corresponding function (for eg. if you return <code>True</code> for some file with <code>can_write_local(..)</code>, then you must implement <code>write_metadata_local(..)</code>).</p>
<p>Your plugin code should also define
the <code>__required_plugin_API_version__</code>variable, to define
the interface version your plugin is compatible with. For
eg. set <code>__required_plugin_API_version__ = "WebSubmit File Metadata Plugin API 1.0"</code>
</p>
<a name="can_read_local"></a>
<b>def can_read_local(inputfile):</b>
<pre>
Returns True if file can be processed by this plugin.
Parameters:
* inputfile (string) - path to a file to read metadata from
Returns: boolean
True if file can be processed
</pre>
<a name="can_read_remote"></a>
<b>def can_read_remote(inputfile):</b>
<pre>
Returns True if file at remote location can be processed by this plugin.
Parameters:
* inputfile (string) - URL to a file to read metadata from
Returns: boolean
True if file can be processed
</pre>
<a name="can_read_remote"></a>
<b>def can_write_local(inputfile):</b>
<pre>
Returns True if file can be processed by this plugin for writing.
Parameters:
* inputfile (string) - path to a file to update metadata
Returns: boolean
True if file can be processed
</pre>
<a name="read_metadata_local"></a>
<b>def read_metadata_local(inputfile, verbose):</b>
<pre>
Returns a dictionary of metadata read from inputfile.
Parameters:
* inputfile (string) - path to file to read from
* verbose (int) - verbosity
Returns: dict
dictionary with metadata
</pre>
<a name="read_metadata_remote"></a>
<b>def read_metadata_remote(inputfile, verbose):</b>
<pre>
Returns a dictionary of metadata read from remote inputfile.
Parameters:
* inputfile (string) - URL to file to read from
* verbose (int) - verbosity
Returns: dict
dictionary with metadata
</pre>
<a name="write_metadata_local"></a>
<b>write_metadata_local(inputfile, verbose):</b>
<pre>
Update metadata of given inputfile.
Parameters:
* inputfile (string) - path to file to update
* verbose (int) - verbosity
Returns: dict
dictionary with metadata
</pre>
<h3><a name="dependencies">2.2 Dependencies on External Libraries</a></h3>
<p>If your plugin depends on some other
external library, you should check that this library is installed at load
time (that is in the main scope of the plugin). If the library is
missing, it should raise an <code>ImportError</code> exception. For example:
<pre class="code">
"""
WebSubmit Metadata Plugin - My custom plugin
Dependencies: extractor
"""
__plugin_version__ = "WebSubmit File Metadata Plugin API 1.0"
import extractor
def can_read_local(inputfile):
[...]
</pre>
The <code>import extractor</code> will generate
such <code>ImportError</code> exception if <code>extractor</code> is
missing.
</p>
<h3><a name="conflicts">2.3 Conflict With Other Plugins</a></h3>
<p>If your plugin can read the same file type as other installed
plugins, the system will combine the information returned by all
compatible plugins in a single dictionary, so that there is no
conflict.<br/>
The behaviour is different when writing to a file: in that case
the first library found is used to update the metadata of a file.
There is no way for the developer to prioritize libraries. Only
the user can specify the <code>--force</code> option to select
a given library.
</p>

Event Timeline