\section{Conclusion}
This semester project aims to extract the data contained in the participant lists of UNFCCC meetings.
It contributes to a larger research project that aims to investigate the gap between international commitments to
climate change policies and their national implementation.
Our collected data helps to better understand the composition of delegations to international climate negotiations.\\
In a first step, we extract the text from the official participant lists provided by the UNFCCC.
Because the lists come in various formats, including paper scans for the earliest meetings, we need different approaches.
We use Optical Character Recognition for the paper scans and a PDF processing library for the later lists.
Then, we process the text files generated in step one to extract the name, affiliation, affiliation category and
description of each listed participant.
This creates a complete dataset containing the data from all 54 processed meetings.
We find 271,434 participants, among whom we identify 138,940 unique persons.
Note that format variations in the original lists impose some limitations:
in particular, we sometimes extract wrong affiliations. \\
In a second step, we derive other attributes, such as the gender, role and experience of the participants,
by processing the available data. The experience of participants is of special interest,
as it links participations across the different meetings. We introduce a method to determine whether two names refer to the same person,
using edit distances and other rules adapted to the particularities of the raw data. \\
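As an illustration of the edit-distance component only, and not necessarily the exact rule used in this project,
one common choice is the Levenshtein distance.
For two names $a$ and $b$ of lengths $m$ and $n$, it is defined as $d(a,b) = D_{m,n}$ with
\[
D_{i,0} = i, \qquad D_{0,j} = j, \qquad
D_{i,j} = \min\bigl\{ D_{i-1,j} + 1,\; D_{i,j-1} + 1,\; D_{i-1,j-1} + [a_i \neq b_j] \bigr\},
\]
where $[a_i \neq b_j]$ is $1$ if the two characters differ and $0$ otherwise.
Under this assumption, two names would be candidates for the same person if $d(a,b) \leq \tau$;
the threshold $\tau$ is a hypothetical parameter introduced here for illustration.
For example, $d(\textrm{Meier}, \textrm{Meyer}) = 1$, since a single substitution suffices. \\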
Finally, we propose some basic models to predict the number of interventions a Party to the Convention makes at a meeting, using
the collected information on delegation composition.
A linear regression model shows that the number of interventions depends mostly on the party itself,
identifying very active parties such as the European Union and the United States.
The two-step model we introduce, which aims to better predict data in which many labels are equal to zero,
does not improve the prediction much. Due to time constraints, we cannot go further on this topic within this project.
We believe that this part has large potential. % TODO change
Other models to predict interventions could yield better performance, and linking the dataset to other data,
such as the interactions between parties, could be very interesting. \\
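For reference, a two-step model for targets with many zeros is often written in the following generic form.
This is a sketch and not necessarily the exact specification used in this project;
the symbols $\hat{p}$, $\hat{g}$ and $\theta$ are introduced here for illustration only.
A classifier first estimates the probability $\hat{p}(x)$ that a party with features $x$ makes at least one intervention,
and a regressor $\hat{g}(x)$, fitted only on the non-zero observations, estimates the number of interventions:
\[
\hat{y}(x) = \left\{
\begin{array}{ll}
0 & \textrm{if } \hat{p}(x) < \theta, \\
\hat{g}(x) & \textrm{otherwise,}
\end{array}
\right.
\]
where $\theta$ is a decision threshold, for instance $\theta = 0.5$.
In such a formulation, errors of the first step propagate directly to the final prediction,
which is one possible reason why a two-step model may bring only a limited improvement. \\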
Our main accomplishment is the extraction of the complete dataset containing the data of the UNFCCC participant lists.
Our scripts can process new participant lists in the future, provided the format does not change fundamentally, which gives fast and simple
access to the characteristics of delegations at international climate negotiations.
%\subsection{Critique of methodology}
% mention errors of OCR, for example COP 7 (Morocco in France)
