Page MenuHomec4science

refextract.in
No OneTemporary

File Metadata

Created
Fri, May 24, 16:09

refextract.in

#!@PYTHON@
## -*- mode: python; coding: utf-8; -*-
##
## This file is part of Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 CERN.
##
## Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
"refextract" is used to extract and process the "references"
or "citations" made to other documents from within a document.
A document's "references" section is usually found at the end of
the document, and generally consists of a list of the works
cited during the course of the document.
"bibrefextract" can attempt to identify a document's references
section and extract it from the document. It can also attempt
to standardise the references (correct the names of journals
etc so that they are written in a standard format), and mark them
up so that they can be linked to the full articles on the Web by
means of hyper-links.
"bibrefextract" has 4 phases of processing (passes):
1. Convert PDF file to plaintext (UTF-8).
2. Extract References from plaintext.
3. Recognise and standardise citations in the extracted
reference lines. (Periodical titles and institutional
report numbers are standardised with the aid of
dedicated knowledge-bases.)
4. Markup standardised citations in MARC XML and output
them.
Options:
--help, -h Display help/usage message and exit.
--version, -V Print version number and exit.
--output-raw-refs, -r output raw references, as extracted
from the document.
"""
try:
from invenio.refextract import main
except ImportError, err:
import sys
sys.stderr.write("Error: %s" % err)
sys.stderr.flush()
sys.exit(1)
if __name__ == '__main__':
main()

Event Timeline