# -*- coding: utf-8 -*-
##
## This file is part of Invenio.
## Copyright (C) 2005, 2006, 2007, 2008, 2010, 2011, 2013 CERN.
##
## Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
from __future__ import print_function

"""
"refextract" is used to extract and process the "references"
or "citations" made to other documents from within a document.
A document's "references" section is usually found at the end of
the document, and generally consists of a list of the works
cited during the course of the document.

"refextract" can attempt to identify a document's references
section and extract it from the document. It can also attempt
to standardise the references (correct the names of journals,
etc., so that they are written in a standard format), and mark them
up so that they can be linked to the full articles on the Web by
means of hyperlinks.

"refextract" has 4 phases of processing (passes):
  1. Convert PDF file to plaintext (UTF-8).
  2. Extract references from plaintext.
  3. Recognise and standardise citations in the extracted
     reference lines. (Periodical titles and institutional
     report numbers are standardised with the aid of
     dedicated knowledge bases.)
  4. Mark up standardised citations in MARC XML and output
     them.

"refextract" can be run either in daemon mode, by specifying a
collection or record ID, or in standalone mode, by providing a
fulltext file on disk using [-f, --fulltext].
"""

from invenio.base.factory import with_app_context


@with_app_context()
def main():
    from invenio.legacy.refextract.task import main as daemon_main
    try:
        return daemon_main()
    except KeyboardInterrupt:
        # Exit cleanly
        print('Interrupted')
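
Passes 2 and 3 described in the module docstring can be illustrated with a toy sketch. This is not the Invenio implementation: the function names `extract_reference_lines` and `standardise_titles`, the heading heuristic, and the two-entry `JOURNAL_KB` dictionary are all hypothetical stand-ins chosen for illustration, and the real knowledge bases and heuristics are considerably more involved.

```python
import re

# Hypothetical knowledge base: lower-cased title variant -> standard form.
# Illustrative only; not taken from the refextract knowledge bases.
JOURNAL_KB = {
    "phys. rev. lett.": "Phys.Rev.Lett.",
    "physical review letters": "Phys.Rev.Lett.",
}


def extract_reference_lines(plaintext):
    """Pass 2 (sketch): return non-blank lines after a references heading."""
    lines = plaintext.splitlines()
    for index, line in enumerate(lines):
        # Assumed heuristic: the section starts at a line that is exactly
        # "References" or "Bibliography", ignoring case and whitespace.
        if line.strip().lower() in ("references", "bibliography"):
            return [l.strip() for l in lines[index + 1:] if l.strip()]
    return []


def standardise_titles(reference_line, kb=JOURNAL_KB):
    """Pass 3 (sketch): replace known title variants, ignoring case."""
    for variant, standard in kb.items():
        # A function replacement avoids backslash handling in `standard`.
        reference_line = re.sub(
            re.escape(variant),
            lambda match, standard=standard: standard,
            reference_line,
            flags=re.IGNORECASE,
        )
    return reference_line


if __name__ == "__main__":
    sample = (
        "Intro text\n"
        "References\n"
        "[1] A. Author, Physical Review Letters 10 (1963) 531.\n"
        "\n"
        "[2] B. Author, Phys. Rev. Lett. 19 (1967) 1264.\n"
    )
    for ref in extract_reference_lines(sample):
        print(standardise_titles(ref))
```

Keeping the knowledge base as plain data, separate from the matching code, mirrors the docstring's point that standardisation is driven by dedicated knowledge bases rather than hard-coded rules.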