\section{Conclusion}
This semester project extracts the data contained in the participant lists of UNFCCC meetings.
It contributes to a larger research project that investigates the gap between international commitments to
climate change policies and their national implementation.
The collected data helps to better understand the composition of delegations to international climate negotiations.
\\
In a first step, we extract the text from the official participant lists provided by the UNFCCC.
Because the lists come in various formats, including paper scans for the earliest meetings, we need different approaches:
we use Optical Character Recognition (OCR) for the paper scans and a PDF processing library for the later lists.
We then process the resulting text files to extract the name, affiliation, affiliation category and
description of each listed participant.
This yields a complete dataset covering all 54 processed meetings.
We find 271,434 participants, among which we identify 138,940 unique persons.
Note that format variations in the original lists limit the accuracy of the extraction,
so we sometimes extract wrong affiliations.
\\
In a second step, we derive further attributes, such as the gender, role and experience of the participants,
by processing the available data. The experience of participants is of special interest,
as it links participations across the different meetings. We introduce a method to determine whether two names refer to the same person,
using edit distances and other rules adapted to the particularities of the raw data.
\\
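To illustrate the idea of name matching with edit distances, the following minimal sketch compares two names after a simple normalization. It is not the method used in this project: the function names, the normalization rules (accent removal, ignoring token order) and the threshold \texttt{max\_distance} are assumptions chosen for illustration only.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(cleaned.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def same_person(name_a: str, name_b: str, max_distance: int = 1) -> bool:
    """Hypothetical rule: names match if their normalized, token-sorted
    forms are within max_distance edits of each other."""
    # Sorting the tokens ignores the order "Last, First" vs "First Last".
    key_a = " ".join(sorted(normalize(name_a).split()))
    key_b = " ".join(sorted(normalize(name_b).split()))
    return levenshtein(key_a, key_b) <= max_distance
```

With these assumed rules, \texttt{same\_person("Lopez, Maria", "Maria Lopez")} holds, while \texttt{same\_person("John Smith", "Jane Smith")} does not. A fixed threshold is a simplification; in practice it would likely have to depend on the name length and on the error patterns of the raw data.
\\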
Finally, we propose some basic models to predict the number of interventions a Party to the Convention makes at a meeting using
the collected delegation composition information.
A linear regression model shows that the number of interventions depends mostly on the party itself,
identifying very active parties such as the European Union and the United States.
The two-step model we introduce, which aims to better predict data in which many labels are equal to zero,
does not improve the prediction much. Due to time constraints, we cannot go further on this topic within this project.
We believe that the potential for this part is large.
% TODO change
Other models to predict interventions could yield better performance, and linking the dataset to other data,
such as the interactions between parties, could be very interesting.
\\
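As an illustration of how a two-step model can handle labels that are mostly zero, the sketch below first decides whether a party intervenes at all and then predicts the number of interventions only for the predicted non-zero cases. This is an assumed reading of "two-step", not the implementation used in this project: the single feature (delegation size), the threshold classifier and the toy data are all hypothetical.

```python
from statistics import mean


def fit_stump(xs, is_positive):
    """Step 1: choose the threshold t that maximizes training accuracy
    of the rule 'x >= t means at least one intervention'."""
    best_t, best_correct = None, -1
    for t in sorted(set(xs)):
        correct = sum((x >= t) == p for x, p in zip(xs, is_positive))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def fit_line(xs, ys):
    """Step 2: ordinary least squares for y = slope * x + intercept.
    Assumes at least two distinct x values."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx


def fit_two_step(xs, ys):
    """Return a predict(x) function combining both steps."""
    threshold = fit_stump(xs, [y > 0 for y in ys])
    # The regression is fitted on the non-zero labels only.
    positive = [(x, y) for x, y in zip(xs, ys) if y > 0]
    slope, intercept = fit_line([x for x, _ in positive],
                                [y for _, y in positive])

    def predict(x):
        if x < threshold:
            return 0.0  # step 1 predicts "no intervention"
        return max(0.0, slope * x + intercept)

    return predict


# Hypothetical toy data: delegation sizes and intervention counts.
sizes = [2, 3, 4, 5, 10, 20, 30, 40]
interventions = [0, 0, 0, 0, 2, 4, 6, 8]
predict = fit_two_step(sizes, interventions)
```

Separating the two steps keeps the many zero labels from pulling the regression towards zero. Whether this helps on real data depends on how well the first step separates active from inactive parties.
\\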
Our main accomplishment is the extraction of the complete dataset containing the data of the UNFCCC participant lists.
Our scripts can process new participant lists in the future, provided the format does not change fundamentally, which gives fast and simple
access to delegation characteristics at international climate negotiations.
%\subsection{Critics on methodology}
% mention errors of OCR, for example cop 7 (marocco in france)