File Metadata

Created: Fri, Feb 28, 04:51

refextract.in

	#!@PYTHON@
	## -*- mode: python; coding: utf-8; -*-
	##
	## This file is part of Invenio.
	## Copyright (C) 2005, 2006, 2007, 2008, 2010, 2011 CERN.
	##
	## Invenio is free software; you can redistribute it and/or
	## modify it under the terms of the GNU General Public License as
	## published by the Free Software Foundation; either version 2 of the
	## License, or (at your option) any later version.
	##
	## Invenio is distributed in the hope that it will be useful, but
	## WITHOUT ANY WARRANTY; without even the implied warranty of
	## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
	## General Public License for more details.
	##
	## You should have received a copy of the GNU General Public License
	## along with Invenio; if not, write to the Free Software Foundation, Inc.,
	## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

	"""
	"refextract" is used to extract and process the "references"
	or "citations" made to other documents from within a document.
	A document's "references" section is usually found at the end of
	the document, and generally consists of a list of the works
	cited during the course of the document.
	"refextract" can attempt to identify a document's references
	section and extract it from the document. It can also attempt
	to standardise the references (correct the names of journals
	etc so that they are written in a standard format), and mark them
	up so that they can be linked to the full articles on the Web by
	means of hyper-links.

	"refextract" has 4 phases of processing (passes):
	1. Convert PDF file to plaintext (UTF-8).
	2. Extract References from plaintext.
	3. Recognise and standardise citations in the extracted
	   reference lines. (Periodical titles and institutional
	   report numbers are standardised with the aid of
	   dedicated knowledge-bases.)
	4. Mark up standardised citations in MARC XML and output
	   them.

	Can be called either in daemon mode, by specifying a collection,
	record id, or extraction job file as input, or in standalone
	mode, by providing a physical fulltext file using
	[-f, --fulltext].
	"""
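	try:
	    from invenio.refextract_cli import main
	except ImportError as err:  # "as" form is valid on Python 2.6+ and Python 3
	    # The Invenio package is missing or broken: report it and exit non-zero.
	    import sys
	    sys.stderr.write("Error: %s" % err)
	    sys.stderr.flush()
	    sys.exit(1)

	if __name__ == '__main__':
	    main()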
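
The docstring above lists four passes; the launcher itself only delegates to `invenio.refextract_cli.main`. As a rough illustration of passes 2 to 4 only, here is a minimal sketch that assumes the plaintext from pass 1 is already available. Every name in it (`JOURNAL_KB`, `extract_reference_lines`, `standardise`, `to_marcxml`) is hypothetical and not part of Invenio. The tiny knowledge base is made up, and the MARC tag and subfield code are illustrative choices rather than something this file confirms.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical toy knowledge base: lower-case title variant -> standard form.
JOURNAL_KB = {
    "phys rev lett": "Phys. Rev. Lett.",
    "nucl phys b": "Nucl. Phys. B",
}


def extract_reference_lines(plaintext):
    """Pass 2 (toy): return the non-empty lines after a 'References' heading."""
    lines = plaintext.splitlines()
    for index, line in enumerate(lines):
        if line.strip().lower() in ("references", "bibliography"):
            return [l.strip() for l in lines[index + 1:] if l.strip()]
    return []


def standardise(line):
    """Pass 3 (toy): rewrite known journal-title variants to the standard form."""
    for variant, standard in JOURNAL_KB.items():
        # Allow an optional dot after each word of the variant.
        words = [re.escape(word) for word in variant.split()]
        pattern = re.compile(r"\b" + r"\.?\s+".join(words) + r"\.?", re.IGNORECASE)
        line = pattern.sub(standard, line)
    return line


def to_marcxml(line):
    """Pass 4 (toy): wrap one reference line in a MARC XML datafield.

    The tag/indicator/subfield values are illustrative assumptions.
    """
    return ('<datafield tag="999" ind1="C" ind2="5">'
            '<subfield code="m">%s</subfield></datafield>' % escape(line))


if __name__ == "__main__":
    text = "Some body text\nReferences\n[1] A. Author, phys rev lett 100, 1 (2008)\n"
    for ref in extract_reference_lines(text):
        print(to_marcxml(standardise(ref)))
```

The real passes are considerably more involved (heading detection, multi-line references, report-number knowledge bases); the sketch only shows how the stages feed into one another.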
