extract_participants.py
R10013 cop-mining-participants
""" The main script of the cop participants extraction.
Takes as an argument the number of the cop to process.
"""
import os
import sys

import partlistproc
from partlistproc.MeetingAnalyzerFactory import MeetingAnalyzerFactory
from partlistproc.PdfExtractorFactory import PdfExtractorFactory

txt_prefix = "../results/participants-txt/"
csv_prefix = "../results/participants-csv/"
default_intermediate_name = txt_prefix + "raw_X.txt"
default_output_name = csv_prefix + "participants_X.csv"

# usage:
#   extract_participants.py <numberOfCop> [<intermediateFilename> <outputFilename>]
# the last two options are given together if the OCR has already been done
# (for cop 1 - 4)

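# parse arguments
arguments = sys.argv
label = arguments[1]
intermediate_name = default_intermediate_name.replace("X", label)
output_name = default_output_name.replace("X", label)
# both optional filenames are needed, so check for 4 entries in argv
# (the previous check, len(arguments) > 2, raised an IndexError on arguments[3]
# when only the intermediate filename was given)
if len(arguments) > 3:
    intermediate_name = txt_prefix + arguments[2]
    output_name = csv_prefix + arguments[3]
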
# First, extract the text from the pdf if not already done
if not os.path.isfile(intermediate_name):
    extr_factory = PdfExtractorFactory(label, intermediate_name)
    extr = extr_factory.createPdfExtractor()
    extr.extract_text()

# Second, extract the data from the text
ana_factory = MeetingAnalyzerFactory(label, intermediate_name)
ana = ana_factory.get_analyzer()
ana.get_data(output_name)
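
The filename handling in the script can be checked in isolation. The following is a minimal sketch, assuming the same prefixes and naming pattern as above; the helper name `resolve_names` is hypothetical and not part of the repository. It takes an argv-style list and returns the label, the intermediate text file path, and the output CSV path, without touching the file system or `partlistproc`.

```python
TXT_PREFIX = "../results/participants-txt/"
CSV_PREFIX = "../results/participants-csv/"


def resolve_names(argv):
    """Return (label, intermediate_name, output_name) for an argv-style list.

    Hypothetical helper for illustration; it mirrors the argument handling
    of the script but is not taken from the repository.
    """
    if len(argv) < 2:
        raise ValueError("missing cop number")
    label = argv[1]
    if len(argv) > 3:
        # Explicit filenames: both the intermediate and the output name are given.
        return label, TXT_PREFIX + argv[2], CSV_PREFIX + argv[3]
    # Defaults: substitute the cop number for the "X" placeholder.
    intermediate_name = TXT_PREFIX + "raw_X.txt".replace("X", label)
    output_name = CSV_PREFIX + "participants_X.csv".replace("X", label)
    return label, intermediate_name, output_name
```

Calling `resolve_names(["extract_participants.py", "5"])` yields the default paths for cop 5, while passing two additional filenames overrides both defaults.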