File Metadata

Created: Mon, Dec 2, 14:49

refextract.py
View Options

	# -- coding: utf-8 --
	##
	## This file is part of Invenio.
	## Copyright (C) 2005, 2006, 2007, 2008, 2010, 2011, 2013 CERN.
	##
	## Invenio is free software; you can redistribute it and/or
	## modify it under the terms of the GNU General Public License as
	## published by the Free Software Foundation; either version 2 of the
	## License, or (at your option) any later version.
	##
	## Invenio is distributed in the hope that it will be useful, but
	## WITHOUT ANY WARRANTY; without even the implied warranty of
	## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
	## General Public License for more details.
	##
	## You should have received a copy of the GNU General Public License
	## along with Invenio; if not, write to the Free Software Foundation, Inc.,
	## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

	from __future__ import print_function

	"""
	"refextract" is used to extract and process the "references"
	or "citations" made to other documents from within a document.
	A document's "references" section is usually found at the end of
	the document, and generally consists of a list of the works
	cited during the course of the document.
	"refextract" can attempt to identify a document's references
	section and extract it from the document. It can also attempt
	to standardise the references (correct the names of journals
	etc so that they are written in a standard format), and mark them
	up so that they can be linked to the full articles on the Web by
	means of hyper-links.

	"refextract" has 4 phases of processing (passes):
	1. Convert PDF file to plaintext (UTF-8).
	2. Extract References from plaintext.
	3. Recognise and standardise citations in the extracted
	reference lines. (Periodical titles and institutional
	report numbers are standardised with the aid of
	dedicated knowledge-bases.)
	4. Markup standardised citations in MARC XML and output
	them.

	Can be called in either a daemon mode by specifying a collection
	or record id and also in a standalone mode by providing a physical
	fulltext file using
	[-f, --fulltext].
	"""
	from invenio.base.factory import with_app_context


	@with_app_context()
	def main():
	from invenio.legacy.refextract.task import main as daemon_main
	try:
	return daemon_main()
	except KeyboardInterrupt:
	# Exit cleanly
	print('Interrupted')

refextract.py
No OneTemporary
Actions

File Metadata

refextract.py
View Options

Event Timeline

refextract.pyNo OneTemporaryActions

File Metadata

refextract.pyView Options

Event Timeline

refextract.py
No OneTemporary
Actions

refextract.py
View Options