File Metadata

Created: Sun, Jan 12, 06:48


	diff --git a/report/conclusion.tex b/report/conclusion.tex
	index 9488843..dfc2e6a 100644
	--- a/report/conclusion.tex
	+++ b/report/conclusion.tex
	@@ -1,35 +1,34 @@
	\section{Conclusion}
	-This semester project aims to get the data contained in the participant lists to meetings of the UNFCCC.
	+This semester project aims to extract the data contained in the participant lists of UNFCCC meetings.
	It contributes to a larger research project that aims to investigate the gap between international commitments to
	climate change policies and their national implementation.
	-We extract data that helps to better understand the composition of delegations to international climate negotiations.\\
	+Our collected data helps to better understand the composition of delegations to international climate negotiations.\\
	In a first step, we extract the text from the official participant lists provided by the UNFCCC.
	As there are various formats, especially paper scans for the earliest meetings, we need different approaches.
	We use Optical Character Recognition for the paper scans and a PDF processing library for the later lists.
	-Then, we gather the information out of the text files generated in step one.
	-We find the name, affiliation, affiliation category and the description for each listed participant.
	-This creates us a complete dataset containing all the data from all the 54 processed meetings.
	+Then, we process the text files generated in step one to extract the name, affiliation, affiliation category and
	+description of each listed participant.
	+This creates a complete dataset containing the data from all 54 processed meetings.
	We find 271,434 participants, among which we identify 138,940 as unique persons.
	Note that there are some limitations due to format variations in the original lists.
	Therefore, we sometimes extract wrong affiliations. \\
	In a second step, we derive further attributes, such as the gender, role and experience of the participants,
	from the available data. The experience of participants is of special interest,
	as it links participations across the different meetings. We introduce a method to determine whether two names refer to the same person,
	using edit distances and other rules adapted to the particularities of the raw data. \\
	-Finally, we propose some basic models to predict the number of interventions a Party to the Convention makes.
	+Finally, we propose some basic models to predict the number of interventions a Party to the Convention makes at a meeting, using
	+the collected information on delegation composition.
	A linear regression model shows that the number of interventions depends mostly on the party itself,
	identifying very active parties such as the European Union and the United States.
	The two-step model we introduce to better handle the many labels that are equal to zero
	does not substantially improve the prediction. Due to time constraints, we could not pursue this topic further within this project.
	We believe that the potential for this part is large. % TODO change
	Other models to predict interventions could yield better performance, and linking the dataset to other data,
	such as the interactions between parties, could be very interesting. \\
	Our main accomplishment is the extraction of the complete dataset containing the data of the UNFCCC participant lists.
	Our scripts are able to process new participant lists in the future if the format doesn't change fundamentally, which provides fast and simple
	-access to delegation characteristics on climate negotiations.
	+access to delegation characteristics on international climate negotiations.


	-% other applications
	-
	%\subsection{Critics on methodology}
	% mention errors of OCR, for example COP 7 (Morocco in France)
	\ No newline at end of file
	diff --git a/report/data_processing.tex b/report/data_processing.tex
	index 3e1be9e..367e280 100644
	--- a/report/data_processing.tex
	+++ b/report/data_processing.tex
	@@ -1,444 +1,442 @@
	\section{Data Extraction and Processing}
	The main part of this project is to extract the information contained in the participant lists of UNFCCC meetings.
	We explain in this section how we extract and process the data of the PDF lists.


	\subsection{Data Extraction}
	This section describes the first problem of the project, i.e., how we extract the data from the PDF participant lists.
	The first step consists of transforming the available PDF files into text files. It is important to keep some information
	about the structure of the text to be able to find the relevant data in the resulting text file.
	The second step consists of transforming the text files into comma-separated values (CSV) files.
	The result of this task is a CSV file for each processed participant list that contains the entries
	\textit{affiliation category, affiliation, name, description}.

	\subsubsection{Raw Dataset} \label{dataset}
	We download the participant lists from the document webpage of the UNFCCC secretariat~\cite{UNFCCC_docs}.
	The lists we process cover all COP meetings and almost all SB meetings. Note that during a COP, an SB meeting is usually
	held in parallel, for which there is no separate participant list. \\
	A participant list has the following general structure: participants are listed under the affiliation they belong to. A member of the
	Swiss government, for example, is listed under the affiliation “Switzerland”. Each participant is listed with a salutary address that contains
	at least “Mr.” or “Ms.”, but may also contain titles such as “H.E.” (i.e., “His/Her Excellency”). Some participants, but not all of them, have
	a description that explains their role within the delegation. This description could for example be “Minister of Foreign Affairs”.
	Affiliations are sorted according to their affiliation category and then alphabetically. The affiliation categories are:
	\begin{itemize}
	\item Parties
	\item Observer States
	\item United Nations Secretariat units and bodies
	\item Specialized agencies and related organizations
	\item Intergovernmental organizations
	\item Non-governmental organizations
	\end{itemize}
	Furthermore, most of the lists contain an index that states the total number of participants per category.
	The category “Media” exists in this index for newer participant lists, but the corresponding participants are not listed. We therefore exclude this category.
	\\
	The format of the participant lists varies over time. For the first meetings, the participant lists are paper scans, which means that
	we need to convert images to text. Furthermore, the way affiliations are indicated varies: in the first meetings they are
	always written in uppercase letters, which changed in later meetings. Figures \ref{fig:raw_scan} and \ref{fig:raw_well} show the first page of the
	participant lists of COP 3 and COP 25 respectively, the one for COP 3 being a scan.

	\begin{figure}
	\centering
	\begin{minipage}[ht]{.5\textwidth}
	\captionsetup{width=.8\linewidth}
	\captionof{figure}{Example page of participant list of COP 3}
	\centering
	\includegraphics[width=0.9\textwidth]{raw_scan.PNG}
	\label{fig:raw_scan}
	\end{minipage}%
	\begin{minipage}[ht]{.5\textwidth}
	\captionsetup{width=.8\linewidth}
	\captionof{figure}{Example page of participant list of COP 25}
	\centering
	\includegraphics[width=0.9\textwidth]{raw_well_formatted.PNG}
	\label{fig:raw_well}
	\end{minipage}
	\end{figure}

	We choose the version of the participant lists that is published during the last days of a meeting.
	We exclude the corrigenda, i.e., documents published later for some participant lists that contain corrections of the lists,
	because their format varies a lot
	and many of the listed corrections are minor (changes in the order of participants within an affiliation, changes of descriptions).


	\subsubsection{Optical Character Recognition}
	To extract the data from the scanned lists, we use Optical Character Recognition (OCR), more precisely Python-Tesseract (pytesseract \cite{pytesseract}).
	Python-Tesseract is a wrapper for the OCR engine Tesseract developed by Google since 2006. \\
	Tesseract works as follows. First, it looks for regions in the image that contain dense elements to find connected components that are then organized as text lines.
	This first step determines the format of a page that is extracted.
	Then, a two-pass process for recognition is applied.
	In the first pass, the program tries to recognize each word. Each word that is recognized satisfactorily is used as training data for
	the words that follow. To make use of all the training data, the second pass goes over the unrecognized words a second time~\cite{tesseract_expl}. \\
	The version of Tesseract that we use introduces a recognition engine based on LSTM neural networks. % TODO LSTM neural nets
	On the dataset of this project, the Tesseract OCR engine fails for some specific pages that contain only sparsely distributed participants
	without descriptions, and mixes up their order. To obtain more accurate connected components, we insert half-transparent boxes on pages
	that encounter this problem (see Figure \ref{fig:boxes}). This ensures the correct order of names in the resulting text file.
	\begin{figure}[ht]
	\caption{Page with an inserted half-transparent box before OCR}
	\centering
	\includegraphics[width=0.4\textwidth]{boxes_tesseract.png}
	\label{fig:boxes}
	\end{figure}


	% TODO change title
	\subsubsection{Well-formatted PDF Extraction}
	To extract the data from the well-formatted PDF files, we use a PDF processing package called Pdfminer.six~\cite{pdfminer.six}.
	Again, the main difficulty is to extract the text of the list in the correct order. This is especially difficult for documents
	with three columns. For this reason, we adapt the use of Pdfminer.six by rewriting one of its classes, the \texttt{PDFPageAggregator}. \\
	-First, we explain quickly how pdfminer extracts text from PDF files. Pdfminer.six performs a layout analysis on every page before
	+First, we explain quickly how Pdfminer.six extracts text from PDF files. Pdfminer.six performs a layout analysis on every page before
	extracting the text. This analysis is done in three stages:
	\begin{itemize}
	\item Group characters to words and lines
	\item Group lines to boxes
	\item Group text boxes hierarchically
	\end{itemize}
	The output of the layout analysis is visualized in Figure \ref{fig:pdfminer}.\\

	\begin{figure}[ht]
	\caption{Output of the layout analysis of pdfminer.six}
	\centering
	\includegraphics[width=0.9\textwidth]{pdfminer.png}
	\label{fig:pdfminer}
	\end{figure}

	The class we want to modify, \texttt{PDFPageAggregator}, is responsible for outputting the text lines of a page in the determined order.
	To be able to sort the text lines according to our own rules later, we modify the function \texttt{receive\_layout} such that it outputs
	the $x$ and $y$ positions within the page for each \texttt{LTTextLine}. In our extraction script, we then define rules to
	determine in which column a text line is situated. \\
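A minimal sketch of such column rules, assuming that the modified \texttt{receive\_layout} yields every text line as a tuple \texttt{(x0, y1, text)} taken from its bounding box. The page width of 595 points (A4) and the split into three equal-width columns are hypothetical stand-ins for the actual rules, and the special handling of affiliation category titles is left out:

```python
# Hypothetical layout constants: an A4 page is about 595 points wide and
# we assume three columns of equal width.
PAGE_WIDTH = 595.0
N_COLUMNS = 3


def column_of(x0, page_width=PAGE_WIDTH, n_columns=N_COLUMNS):
    """Return the index (0 = leftmost) of the column in which a line starts."""
    column_width = page_width / n_columns
    index = int(max(x0, 0.0) // column_width)
    return min(index, n_columns - 1)  # clamp lines that start at the right edge


def reading_order(lines):
    """Sort (x0, y1, text) tuples column by column, top to bottom.

    PDF coordinates grow upwards, so a larger y1 means higher on the page.
    """
    ordered = sorted(lines, key=lambda line: (column_of(line[0]), -line[1]))
    return [text for _, _, text in ordered]


page = [(400.0, 700.0, "Ms. C"), (50.0, 500.0, "Mr. B"),
        (220.0, 720.0, "Mr. X"), (50.0, 720.0, "Ms. A")]
print(reading_order(page))  # ['Ms. A', 'Mr. B', 'Mr. X', 'Ms. C']
```

Sorting by column first and by descending $y$ second reproduces the reading order of a multi-column page, which the default output order of the layout analysis does not always do.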
	Affiliation category titles are a special case for the page layout: they break the column system in the middle of a page. We therefore
	need to recognize them by their text and introduce special rules for pages that contain affiliation category titles.
	Another difficulty is the recognition of new affiliations. In our setup, we do not obtain font style information from Pdfminer.six, so the
	only way to detect new affiliations is through line breaks and the fact that names always start with a salutary address.
	As line breaks are automatically preserved by Pdfminer.six, we encounter problems only in special situations: when a new affiliation
	is at the top of a column and longer than two lines, we cannot distinguish it from the description of a previous participant that is
	split across two columns.


	\subsubsection{Extraction from Text Files}
	We now need to extract the information from the generated text files.
	We do this with the following procedure:
	\begin{enumerate}
	\item Remove unnecessary elements from the text file, e.g. page numbers and page headers.
	\item Iterate through the rows of the text file and repeatedly apply:
	\begin{enumerate}
	\item Check if the current line is the beginning of a new affiliation category. We do this by keyword checking.
	\item Check if the current line is a new affiliation.
	We look for format cues like a row in uppercase letters (for early meetings) or lines that are positioned after a double line
	break and don't start with a salutary address.
	\item Check if the current line is a new name for the current affiliation by detecting a salutary address.
	\item If none of the above is the case, add the line to the description of the last participant.
	\end{enumerate}
	+ \item Store the data structure to a CSV file.
	\end{enumerate}

	Note that this algorithm fails for participants whose names do not start with a salutary address. As this case only occurs
	a few times in all the processed lists, we neglect this error.
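The procedure above can be sketched as a small state machine. This is a simplified illustration rather than the project's script: the category titles, the salutation list and the CSV column names are assumptions, page-header removal is omitted, and joining an affiliation that spans two consecutive lines is an added heuristic:

```python
import csv
import io

# Hypothetical cue lists for illustration; the real ones are longer.
CATEGORY_TITLES = {
    "parties": "Parties",
    "observer states": "Observer States",
    "intergovernmental organizations": "Intergovernmental organizations",
    "non-governmental organizations": "Non-governmental organizations",
}
SALUTATIONS = ("Mr.", "Ms.", "Mrs.", "Sr.", "Sra.", "H.E.", "Dr.", "Prof.")
FIELDS = ["affiliation category", "affiliation", "name", "description"]


def parse_lines(lines):
    """Turn the lines of one extracted text file into participant records."""
    records = []
    category = affiliation = None
    previous_blank = True  # the start of the file acts like a line break
    last_kind = None       # "category", "affiliation" or "name"
    for raw in lines:
        line = raw.strip()
        if line.isdigit():          # step 1 (simplified): drop page numbers
            continue
        if not line:
            previous_blank = True
            continue
        if line.lower() in CATEGORY_TITLES:                   # step 2a
            category = CATEGORY_TITLES[line.lower()]
            affiliation, last_kind = None, "category"
        elif line.startswith(SALUTATIONS):                    # step 2c
            records.append({"affiliation category": category,
                            "affiliation": affiliation,
                            "name": line, "description": ""})
            last_kind = "name"
        elif last_kind == "affiliation" and not previous_blank:
            affiliation += " " + line  # affiliation name continues on the next line
        elif line.isupper() or previous_blank:                # step 2b
            affiliation, last_kind = line, "affiliation"
        elif records:                                         # step 2d
            last = records[-1]
            last["description"] = (last["description"] + " " + line).strip()
        previous_blank = False
    return records


def write_csv(records, handle):
    """Step 3: store the records in CSV format."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


sample = ["Parties", "", "SWITZERLAND", "H.E. Ms. Simonetta Sommaruga",
          "Federal Councillor", "Mr. Franz Perrez", "12", "",
          "United Kingdom of Great Britain", "and Northern Ireland",
          "Mr. John Smith"]
buffer = io.StringIO()
write_csv(parse_lines(sample), buffer)
print(buffer.getvalue())
```

The name check runs before the affiliation check here; this is equivalent to the order in the list above, because an affiliation is only recognized on lines that do not start with a salutary address.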



	\subsection{Data Processing}
	-This section describes task two of the project, i.e., to gain more information in form of attributes
	+This section describes the second problem of the project, i.e., how we derive additional attributes
	from the extracted data.
	The goal is to post-process the CSV files and merge all the meetings into one dataset that contains more
	attributes per participant.

	\subsubsection{Unification of Meetings}
	In order to make our complete dataset as consistent as possible, we need the same affiliation to be named the same
	throughout all the meetings.
	For some earlier meetings, e.g. COP 2, the English versions of the participant lists were not available.
	We therefore process the French versions of their participant lists. With the help of a dictionary, we translate
	the names of all the parties to English.
	Once all the country names are in English, we still need to unify them so that each country has a single denotation.
	-For example, the party Venezuela is named "Venezuela" in the participant list of COP 6, but
	-"Venezuela (Bolivarian Republic of)" in COP 25.
	+For example, the party Venezuela is named “Venezuela” in the participant list of COP 6, but
	+“Venezuela (Bolivarian Republic of)” in COP 25.
	To unify the English country names, we use the Python package country-converter~\cite{coco}.
	-We use it to change every country name to its "short" name. "Venezuela (Bolivarian Republic of)" then becomes
	-"Venezuela". This packet has some limits when misspelling occur. For example, an error in the OCR process caused
	-Iran to be spelled "Tran" in COP 1, which provokes that the country-converter doesn't recognize it correctly. \\
	-Note that we applied translation and unification only to the parties. Even if in some earlier meetings there are
	-other affiliations that are written in different ways, the unification would have been more difficult and more
	-error-prone due to the larger number of possible names. We therefore decided not to apply unification for the
	-rest of the participant lists, also including the descriptions.
	+We use it to change every country name to its “short” name: “Venezuela (Bolivarian Republic of)” then becomes
	+“Venezuela”. This package has limitations when misspellings occur. For example, an error in the OCR process caused
	+Iran to be spelled “Tran” in COP 1, so that country-converter does not recognize it correctly. \\
	+Note that we apply translation and unification only to the affiliation names of the parties. The unification would be more difficult and more
	+error-prone for other affiliations due to the larger number of possible names.
	+Similarly, we do not modify the descriptions of participants, even though they are sometimes written in another language.
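A minimal sketch of the two-step unification for party names. Both dictionaries are hypothetical excerpts; in the actual pipeline the second lookup is done by country-converter, e.g. via \texttt{coco.convert(names=..., to='name\_short')}. Unknown spellings such as “Tran” pass through unchanged in this sketch, so they remain visible for manual correction:

```python
# Hypothetical excerpts of the two lookup tables.
FRENCH_TO_ENGLISH = {
    "Suisse": "Switzerland",
    "Allemagne": "Germany",
    "Iran (République islamique d')": "Iran (Islamic Republic of)",
}
SHORT_NAMES = {  # stand-in for country-converter's "short" names
    "Venezuela (Bolivarian Republic of)": "Venezuela",
    "Iran (Islamic Republic of)": "Iran",
}


def unify_party(name, french=False):
    """Translate a party name if needed, then map it to its short name."""
    if french:
        name = FRENCH_TO_ENGLISH.get(name, name)
    return SHORT_NAMES.get(name, name)  # unknown names are kept unchanged


print(unify_party("Venezuela (Bolivarian Republic of)"))  # Venezuela
```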

	\subsubsection{Gender and Title}
	The easiest additional attributes to extract are gender and title of participants. This is due to the very static
	-structure of names in the UNFCCC participant lists: Each name starts with a salutary address ("Mr.", "Ms.", "Sr.", "Sra." etc.)
	+structure of names in the UNFCCC participant lists: Each name starts with a salutary address (“Mr.”, “Ms.”, “Sr.”, “Sra.” etc.)
	that is associated with either the male or the female gender. By simply checking this salutary address, we can extract the gender
	of each participant.
	-Optionally, the salutary address contains some title like "H.E." ("Her Excellence"), "Dr." or "Prof.".
	+Optionally, the salutary address contains a title such as “H.E.” (“His/Her Excellency”), “Dr.” or “Prof.”.
	We set a binary attribute \textit{has\_title} to 1 if a participant is listed with such a title, and to 0 otherwise.
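Because the salutary address always comes first, both attributes can be read off the leading tokens of a name. The salutation-to-gender mapping and the title list below are hypothetical excerpts, not the project's full lists:

```python
# Hypothetical lists; the project's lists may contain more variants.
GENDER_OF_SALUTATION = {
    "Mr.": "male", "Sr.": "male",
    "Ms.": "female", "Mrs.": "female", "Sra.": "female",
}
TITLES = {"H.E.", "Dr.", "Prof."}


def gender_and_title(name):
    """Return (gender, has_title) read from the salutary address of a name."""
    gender, has_title = None, 0
    for token in name.split():
        if token in GENDER_OF_SALUTATION and gender is None:
            gender = GENDER_OF_SALUTATION[token]
        elif token in TITLES:
            has_title = 1
        else:
            break  # the salutary address ends at the first other word
    return gender, has_title


print(gender_and_title("H.E. Ms. Simonetta Sommaruga"))  # ('female', 1)
```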

	\subsubsection{Roles} \label{roles}
	-The descriptions of the participants contain more information about the participant, but in a very inconsistent
	+The description of a participant contains more information about them, but in a very inconsistent
	format. Every affiliation can decide what to provide as descriptions of their participants.
	-We extract information out of the descriptions by defining roles. These roles define the
	+We try to categorize descriptions by defining roles. These roles define the
	role of a participant within its affiliation. \\
	We assign a role to a participant by looking for keywords in its description. The following list contains the
	roles that we look for and some corresponding keywords in order of decreasing priority. If a description contains
	keywords from more than one role, it is assigned the role with the highest priority.
	\begin{itemize}
	\item Security (Security Officer, Security Service)
	\item Diplomacy (Ambassador, Embassy, Diplomatic)
	\item Government (Ministry, Minister, Government, Parliament, Agency, Department of, European Commission, Presidential Office)
	\item Press (Journalist, Reporter, Radio, Press)
	\item Universities (Professor, Researcher, Student, University)
	\end{itemize}
	-The reason for "Security" having the highest priority is that security service is often provided to people of other roles.
	-With our priority rule, the description "Security Officer of the Minister" would be assigned the role "Security"
	-and not "Government", which is the correct choice. On the other hand, we avoid the keyword "Security" for this role
	-to prevent a description like "Minister for Politics, Law, and Security Affairs" being assigned to the role "Security". \\
	+The reason for “Security” having the highest priority is that security service is often provided to people of other roles.
	+With our priority rule, the description “Security Officer of the Minister” would be assigned the role “Security”
	+and not “Government”, which is the correct choice. On the other hand, we avoid the keyword “Security” for this role
	+to prevent a description like “Minister for Politics, Law, and Security Affairs” being assigned to the role “Security”. \\
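The priority rule amounts to returning the first role in the list whose keywords match. The sketch below uses only the keywords listed above and assumes case-insensitive substring matching, which is our assumption; such matching would, for instance, also find “Press” inside “Expressway”:

```python
# Roles in decreasing priority, with the keywords listed above.
ROLE_KEYWORDS = [
    ("Security", ["Security Officer", "Security Service"]),
    ("Diplomacy", ["Ambassador", "Embassy", "Diplomatic"]),
    ("Government", ["Ministry", "Minister", "Government", "Parliament", "Agency",
                    "Department of", "European Commission", "Presidential Office"]),
    ("Press", ["Journalist", "Reporter", "Radio", "Press"]),
    ("Universities", ["Professor", "Researcher", "Student", "University"]),
]


def assign_role(description):
    """Return the highest-priority role whose keywords occur in the description."""
    if not description or not description.strip():
        return "no description"
    text = description.lower()
    for role, keywords in ROLE_KEYWORDS:  # the first match has the highest priority
        if any(keyword.lower() in text for keyword in keywords):
            return role
    return "no keyword found"


print(assign_role("Security Officer of the Minister"))  # Security
```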

	\subsubsection{Association to Fossil Fuel Industry}
	We also use keywords to determine whether a participant is associated with the fossil fuel industry.
	+Examples of our keywords are “Petroleum”, “Oil”, “Gas”, “BP” and “Total”.
	We separate this from the roles as we do not only use the description, but also the affiliation name to
	check for the keywords. For example, we want to detect all participants of the NGO
	-"Canadian Association of Petroleum Producers" as associated to the fossil fuel industry, even if
	+“Canadian Association of Petroleum Producers” as associated to the fossil fuel industry, even if
	they don't have any description. \\
	A further advantage of this separation is that a participant associated with the fossil fuel industry can still
	-have a role. For example, Saudi Arabia has a Ministry of Petroleum. The corresponding Minister is
	-assigned the role "Government" but still is associated with the fossil fuel industry.
	+have a role. For example, Saudi Arabia has a Ministry of Petroleum. The corresponding minister is
	+assigned the role “Government” but is still associated with the fossil fuel industry.
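Because the check covers both fields, it can be sketched as one pattern applied to the affiliation name and to the description. The keywords are only the examples given above, and whole-word, case-sensitive matching is our assumption:

```python
import re

# Hypothetical subset of the keyword list. Matching whole words and keeping
# the case avoids hits such as "total" in an ordinary description.
FOSSIL_KEYWORDS = ["Petroleum", "Oil", "Gas", "BP", "Total"]
FOSSIL_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FOSSIL_KEYWORDS)) + r")\b")


def is_fossil_fuel(affiliation, description):
    """Check both the affiliation name and the description for keywords."""
    return any(FOSSIL_PATTERN.search(text or "")
               for text in (affiliation, description))


print(is_fossil_fuel("Canadian Association of Petroleum Producers", ""))  # True
```

This sketch would still flag a description that begins with the word “Total”, so a real keyword list needs to be checked against the data.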

	\subsubsection{Experience} \label{experience}
	When bringing together the data of all the different meetings, we are interested in the
	experience of participants. We define experience by the number of earlier UNFCCC meetings
	that the participant has attended. We distinguish between experience in SB meetings and COP meetings,
	as they have quite different characteristics. Furthermore, we distinguish between experience
	-within a delegation of a Party to the Convention (i.e. category "parties") and experience
	-in a delegation of a non-governmental organization. \\
	+within a delegation of a Party to the Convention (i.e. affiliation category “parties”) and experience
	+in another affiliation category. \\
	To determine the experience, we have to compare names throughout different meetings.
	There are some situations where a plain text comparison would fail, even if it's the same
	person.
	\begin{itemize}
	\item Different spellings of the name, simplification of a special character (e instead of é)
	\item Long names that span over more than one column are not entirely detected in the newer PDFs
	- because there are three columns in the document. Hence, only a part of the name is detected. % TODO example
	- \item The order of names is swapped (e.g. "Obama Barack" instead of "Barack Obama")
	+ because there are three columns in the document. Hence, only a part of the name is detected.
	+ \item The order of names is swapped (e.g. “Obama Barack” instead of “Barack Obama”)
	\end{itemize}
	We decided to handle these cases in the following manner:
	\begin{itemize}
	\item Allow an edit distance of 1 (see below).
	- \item Consider two names as the same when one starts with the other ("Alexander Van der Bellen" and
	- "Alexander Van der" are considered to be the same person). We exclude names with
	+ \item Consider two names as the same when one starts with the other (“Alexander Van der Bellen” and
	+ “Alexander Van der” are considered to be the same person). We exclude names with
	fewer than 15 characters from this rule to make sure that a line break is involved.
	\item If the sets of words of two names are equal, the persons are considered to be the same.
	\end{itemize}

	We compute the \textbf{edit distance} between names. There exist several types of edit distances.
	All of them count the minimum number of operations to get from one string to the other.
	We need to keep the accepted distance very small to keep the error rate low. With over 130,000 distinct participants,
	very similar names are likely to occur.
	To get the property that we want, we need substitution to be allowed, such that a missed special character
	or a typo can simply be replaced by the correct character. We compare the performance of two edit distances. \\
	The \textbf{Hamming distance} only allows substitution, hence the compared strings need to have the same length.
	It is equal to the number of positions at which the symbols differ in the two strings.
	The \textbf{Levenshtein distance} allows substitution, insertion and deletion. It is equal to the minimum number
	of single-character edits required to change one string into the other. Mathematically,
	\begin{equation} \label{levenshtein}
	\operatorname{lev}(a,b) =
	\begin{cases}
	\lvert a \rvert & \text{if } \lvert b \rvert = 0 \\
	\lvert b \rvert & \text{if } \lvert a \rvert = 0 \\
	\operatorname{lev}(\operatorname{tail}(a), \operatorname{tail}(b)) & \text{if } a[0] = b[0] \\
	1 + \min \begin{cases}
	\operatorname{lev}(\operatorname{tail}(a), b) \\
	\operatorname{lev}(a, \operatorname{tail}(b)) \\
	\operatorname{lev}(\operatorname{tail}(a), \operatorname{tail}(b))
	\end{cases} & \text{otherwise}
	\end{cases}
	\end{equation}
	where for a string $x$, $\operatorname{tail}(x)$ is the string without its first character and $\lvert x \rvert$ is the length of the string. \\

	When comparing the results of the Levenshtein distance and the Hamming distance on our data, the samples that were additionally
	found to be the same person by the Levenshtein distance were mostly correct. One common case is, for example, a forgotten apostrophe (e.g.
	-"yaara peretz" and "ya'ara peretz") or a missing empty space (e.g. "yong chul cho" and "yongchul cho"). There are some
	-false positives that are inserted, but this is rather due to common names (e.g. "yan jia" and "yuan jia").
	+“yaara peretz” and “ya'ara peretz”) or a missing space (e.g. “yong chul cho” and “yongchul cho”). There are some
	+false positives, but these are mostly due to common names (e.g. “yan jia” and “yuan jia”).
	According to the results of this comparison, we choose the Levenshtein distance in the final implementation. \\
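The distances and the three matching rules can be sketched as follows. The dynamic-programming loop computes the same value as the recursion in Equation \ref{levenshtein}, working on prefixes instead of tails. Lower-casing the names and applying the 15-character limit to the shorter of the two names are our assumptions about details the text leaves open:

```python
def hamming(a, b):
    """Number of positions at which a and b differ; None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Levenshtein distance, computed row by row instead of recursively."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]


MIN_PREFIX_LENGTH = 15  # shorter names are excluded from the prefix rule


def same_person(name1, name2, max_distance=1):
    """Apply the three matching rules to two (lower-cased) names."""
    a, b = name1.lower(), name2.lower()
    if levenshtein(a, b) <= max_distance:
        return True
    shorter, longer = sorted((a, b), key=len)
    if len(shorter) >= MIN_PREFIX_LENGTH and longer.startswith(shorter):
        return True
    return set(a.split()) == set(b.split())


print(hamming("yan jia", "yuan jia"), levenshtein("yan jia", "yuan jia"))  # None 1
```

The example in the last line shows why the Hamming distance misses insertions: the two strings differ in length, so it is undefined, while the Levenshtein distance is 1.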
	-To mark false positives, we add another attribute to the dataset that is set to one if for a participant, there has
	-been detected his name twice in one of the earlier meetings. When this flag is set, the experience attributes
	-contain an error. \\
	+To mark false positives, we add another attribute to the dataset that is set to one if the name of a participant has been detected twice
	+in one of the earlier meetings.
	+When this flag is set, the experience attributes may be erroneous. \\

	In addition to the experience attributes that we add to our dataset, we obtain for each participant the list of meetings
	they attended and the affiliation they belonged to at each of them. \\
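A sketch of the experience attributes and of the false-positive flag. It assumes that the name-matching rules have already mapped every participation to one canonical name (compared here by exact equality) and that party delegations carry the category label “Parties”; the attribute names are hypothetical:

```python
from collections import defaultdict


def add_experience(rows):
    """Compute experience attributes for participations.

    rows: list of (meeting_index, meeting_type, category, name) tuples, where
    meeting_index orders meetings chronologically and meeting_type is "COP" or "SB".
    Returns one dict per row, in the same order as rows.
    """
    positions_by_meeting = defaultdict(list)
    for position, row in enumerate(rows):
        positions_by_meeting[row[0]].append(position)

    history = defaultdict(set)  # name -> {(meeting, type, is_party)}
    collided = set()            # names seen twice within one earlier meeting
    results = [None] * len(rows)
    for meeting in sorted(positions_by_meeting):
        positions = positions_by_meeting[meeting]
        # 1) attributes use earlier meetings only
        for position in positions:
            name = rows[position][3]
            past = history.get(name, set())
            attributes = {}
            for meeting_type in ("COP", "SB"):
                for is_party, label in ((True, "party"), (False, "other")):
                    attributes[f"{meeting_type.lower()}_{label}"] = sum(
                        1 for _, t, p in past if t == meeting_type and p == is_party)
            attributes["name_collision"] = int(name in collided)
            results[position] = attributes
        # 2) afterwards, add the current meeting to the history
        count_here = defaultdict(int)
        for position in positions:
            _, meeting_type, category, name = rows[position]
            history[name].add((meeting, meeting_type, category == "Parties"))
            count_here[name] += 1
        collided.update(name for name, count in count_here.items() if count > 1)
    return results


rows = [(1, "COP", "Parties", "anna muster"),
        (2, "SB", "Parties", "anna muster"),
        (2, "SB", "Parties", "john doe"),
        (2, "SB", "Non-governmental organizations", "john doe"),
        (3, "COP", "Non-governmental organizations", "anna muster"),
        (3, "COP", "Parties", "john doe")]
print(add_experience(rows)[5])
```

Computing all attributes of a meeting before adding that meeting to the history ensures that only earlier meetings count, even when a name occurs twice in the same list.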

	Note that a delegation is one instance of an affiliation. Each affiliation comes to a new meeting with a new delegation.
	To be able to compare delegations with respect to the experience of their participants, we need to define a metric
	for the experience of a delegation.
	We call this the \textbf{experience score} of a delegation and define it as follows:
	\begin{equation}
	\text{ExperienceScore}(\text{delegation}) = \operatorname{avg}(\text{total experience of the top 10 most experienced participants})
	\end{equation}
	The reason for only choosing the top 10 is that delegations are sometimes very big with only a few participants
	actively involved in the negotiation process.
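The score is a simple top-$k$ average. The sketch below assumes that a participant's total experience is the sum of their experience attributes and that a delegation with fewer than 10 participants is averaged over all its members; the report does not state how such delegations are handled:

```python
def experience_score(total_experiences, top_k=10):
    """Average total experience of the top_k most experienced delegates.

    total_experiences holds one number per delegate of a single delegation.
    Delegations with fewer than top_k delegates are averaged over all of them.
    """
    top = sorted(total_experiences, reverse=True)[:top_k]
    return sum(top) / len(top) if top else 0.0


print(experience_score([20] * 10 + [0] * 90))  # 20.0
```

For this hypothetical delegation of 100, in which only 10 veterans have 20 earlier participations each, a plain average over all members would be 2, which illustrates why only the top 10 are used.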




	\subsection{Results}
	We process the participant lists of 54 UNFCCC meetings: 26 COPs and 28 SBs.
	We find in total 271,434 participants, among which we identify 138,940 different persons.
	On average, we find 8353 participants per COP meeting and 1949 participants per SB meeting.
	We show in Figures \ref{fig:cop_overall} and \ref{fig:sb_overall} the total numbers of extracted participants
	of all the COP and SB meetings respectively.
	-The category with the most participants is "Parties", followed by "Non-governmental organizations".
	-The absence of the category "Non-governmental organizations" in COP 2 and SB 4 is due to a formatting error
	+The category with the most participants is “Parties”, followed by “Non-governmental organizations”.
	+The absence of the category “Non-governmental organizations” in COP 2 and SB 4 is due to a formatting error
	of the OCR process.
	In general, the number of participants has increased over time for all meetings.
	The number of participants peaked at COP 15 in Copenhagen in 2009, followed by
	COP 21 in Paris in 2015. COP 15 was originally intended to produce a major agreement, but this failed.
	-The actual major agreement was then done in COP 21, the "Paris Agreement", which explains the second peak.
	+The major agreement, the Paris Agreement, was then reached at COP 21, which explains the second peak.
	Also, the SB meetings that were held earlier in the years 2009 and 2015 had more participants than the other SB meetings. \\
	For the latest meetings, there is a gap between the number of detected participants and the number of participants
	written in the index of the list. This gap is sometimes large, the maximum being 7287 unlisted participants for COP 21.
	The UNFCCC explains this difference by the fact that only the participants involved in the negotiation process
	are included in the list; the participants complementing the delegations are only counted in the index, but not listed.
	% TODO speak about that with Victor

	\begin{figure}
	\centering
	\begin{minipage}[ht]{.5\textwidth}
	\captionsetup{width=.8\linewidth}
	\captionof{figure}{Overview of the extracted participants of COP meetings}
	\centering
	\includegraphics[width=0.9\textwidth]{participants_per_cop.png}
	\label{fig:cop_overall}
	\end{minipage}%
	\begin{minipage}[ht]{.5\textwidth}
	\captionsetup{width=.8\linewidth}
	\captionof{figure}{Overview of the extracted participants of SB meetings}
	\centering
	\includegraphics[width=0.9\textwidth]{participants_per_sb.png}
	\label{fig:sb_overall}
	\end{minipage}
	\end{figure}


	\subsubsection{Gender and Title}
	The proportion of women has steadily increased since the first meetings.
	Starting at 21.4\% at the first SB meeting in 1995, it reached its highest value so far, 47.3\%, at SB 50 in 2019.
	Figure \ref{fig:gender} shows the continuously increasing trend of this measure, with a slightly higher proportion
	of women at the SB meetings compared to the COP meetings.

	\begin{figure}[ht]
	\caption{Proportion of female participants per meeting}
	\centering
	\includegraphics[width=0.8\textwidth]{gender.png}
	\label{fig:gender}
	\end{figure}

	The UNFCCC secretariat publishes gender composition reports, as its goal is to achieve gender balance at its meetings,
	which may lead to more gender-sensitive climate policies.
	These reports show that even though the proportion of women almost reaches 50\% in the latest meetings, equality has not yet been reached.
	The proportion of women is lower when only looking at the Parties to the Convention,
	and it is also significantly lower when considering the heads of delegations.
	For example, for COP 24 we find an overall proportion of women of 42.8\%, while the gender composition report
	states a proportion of 38\% for party delegates and of 27\% for heads of delegation~\cite{UNFCCC_genderreport}. \\

	-The number of participants with a title is generally rather slow.
	-For COP meetings, the average rate of participants with a title is at 3.9\%, for SB meetings the average is at 1.8\%.
	+The number of participants with a title is generally rather low.
	+For COP meetings, the average proportion of participants with a title is 3.9\%; for SB meetings, it is 1.8\%.


	\subsubsection{Roles}
	The assigned roles are mainly of interest for parties, as the descriptions of their delegates are the most exhaustive and
	also contain more keywords. Figures \ref{fig:cop_roles} and \ref{fig:sb_roles} show the proportion of party delegates
	assigned to each role for COP and SB meetings respectively.
	-The main role is "Government" with a usual rate of 40-60\% of all party delegates being assigned this role.
	-The rate of governmental participants is higher at SB meetings, being almost always over 60\% after SB 16.
	-"Security" is the second most common role, with usually more than 10\% of the party participants at COPs having this role
	+The main role is “Government”, with typically 40--60\% of all party delegates being assigned this role.
	+The proportion of governmental participants is higher at SB meetings, being almost always over 60\% after SB 16.
	+“Security” is the second most common role, with usually more than 10\% of the party participants at COPs being assigned this role
	and about 5\% at SB meetings.
	-The role "Diplomacy" is represented more at COP meetings, which makes sense considering that SB meetings are mainly
	-negotiation focused.
	-The role "no keyword found" in the plots shows the participants that did not match any keyword,
	-the role "no description" contains the participants that didn't have a description in the source document.
	+The role “Diplomacy” is more present at COP meetings, which is plausible considering that SB meetings are mainly
	+focused on negotiation and that ambassadors often just represent their country officially.
	+The role “no keyword found” in the plots shows the participants that did not match any keyword,
	+the role “no description” contains the participants that didn't have a description in the source document.
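The percentages in these figures come from counting, for each meeting, how many party delegates received each role. The following Python sketch illustrates this computation. It is not our pipeline code. It assumes that every delegate carries exactly one role label, including “no keyword found” and “no description”. It also assumes the input is a list of (meeting, role) pairs, which is a hypothetical format chosen for illustration.

```python
from collections import Counter, defaultdict


def role_percentages(delegates):
    """Percentage of party delegates per role, computed separately for each meeting.

    delegates: iterable of (meeting, role) pairs, one pair per party delegate
    (assumed input format; each delegate is assumed to carry exactly one role label).
    """
    counts = defaultdict(Counter)
    for meeting, role in delegates:
        counts[meeting][role] += 1
    return {
        meeting: {role: 100.0 * n / sum(roles.values()) for role, n in roles.items()}
        for meeting, roles in counts.items()
    }


# Toy input, not taken from our dataset.
delegates = [
    ("COP 25", "Government"), ("COP 25", "Government"),
    ("COP 25", "Security"), ("COP 25", "no keyword found"),
    ("SB 50", "Government"), ("SB 50", "no description"),
]
print(role_percentages(delegates)["COP 25"])
# {'Government': 50.0, 'Security': 25.0, 'no keyword found': 25.0}
```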

	\begin{figure}
	\centering
	\begin{minipage}{.5\textwidth}
	\captionof{figure}{Assigned roles for COP meetings}
	\centering
	\includegraphics[width=1\linewidth]{roles_cop.png}
	\label{fig:cop_roles}
	\end{minipage}%
	\begin{minipage}{.5\textwidth}
	\captionof{figure}{Assigned roles for SB meetings}
	\centering
	\includegraphics[width=1\linewidth]{roles_sb.png}
	\label{fig:sb_roles}
	\end{minipage}
	\end{figure}


	\subsubsection{Association to Fossil Fuel Industry}
	The number of detected participants that openly represent the fossil fuel industry varies considerably from meeting to meeting.
	-Figures \ref{fig:cop_fossil} and \ref{fig:sb_fossil} show the rate and absolute numbers of detected fossil fuel industry representants
	+Figures \ref{fig:cop_fossil} and \ref{fig:sb_fossil} show the rate and absolute numbers of detected fossil fuel industry representatives
	for COP and SB meetings, respectively. The average rate of participants with a fossil fuel industry association is
	1.7\% for COP meetings and 2.7\% for SB meetings. These rates have decreased over the years as the number of participants has increased.

	\begin{figure}
	\centering
	\begin{minipage}[ht]{.5\textwidth}
	\captionsetup{width=.8\linewidth}
	\captionof{figure}{Participants with fossil fuel industry association (COP)}
	\centering
	\includegraphics[width=1\linewidth]{ff_cop.png}
	\label{fig:cop_fossil}
	\end{minipage}%
	\begin{minipage}[ht]{.5\textwidth}
	\captionsetup{width=.8\linewidth}
	\captionof{figure}{Participants with fossil fuel industry association (SB)}
	\centering
	\includegraphics[width=1\linewidth]{ff_sb.png}
	\label{fig:sb_fossil}
	\end{minipage}
	\end{figure}



	\subsubsection{Experience}
	-Over all meetings, we find 138,940 distinct participants. \\
	-
	+Over all meetings, we find 138,940 distinct participants.
	We identify 193 persons that have participated in at least half of the 54 processed meetings.
	The most experienced participants and their affiliations at COP 25 are the following:
	\begin{enumerate}
	\item Helmut Hojesky: Austria (26 COP, 27 SB) % TODO victor add more information?
	\item Norine Kennedy: United States Council for International Business (25 COP, 28 SB)
	\item Manfred Treber: Germanwatch (26 COP, 26 SB)
	\end{enumerate}

	We can also investigate how the affiliations of participants change over time.
	Figure \ref{fig:exp_flow} shows an undirected graph in which nodes are affiliations and edges are participants
	changing from one affiliation to another between two meetings, starting at COP 10. The weight of an edge increases by one
	for every detected change of a participant between the two affiliations, regardless of the direction.
	We only show the 40 affiliations with the highest degree, i.e., the highest sum of weights of their adjacent edges.
	Since the edge weights are generally small, a single error in the data extraction can noticeably distort this graph;
	before drawing conclusions about details of the plot, the raw data should therefore be rechecked. \\
	-The maximum weight edge is between the two NGO's "International Institute for Environment and Development" and
	-"International Institute for Sustainable Development" with a total of 51 participant exchanges.
	-Interesting to note is the strong exchange between the NGO's "International Emissions Trading Association",
	-"World Business Council for Sustainable Development" and "International Chamber of Commerce", which are all
	+The maximum weight edge is between the two NGOs “International Institute for Environment and Development” and
	+“International Institute for Sustainable Development”, with a total of 51 participant exchanges.
	+Notably, there is a strong exchange between the NGOs “International Emissions Trading Association”,
	+“World Business Council for Sustainable Development” and “International Chamber of Commerce”, which all
	represent business interests and are often linked to large companies operating in the fossil fuel industry.
	As expected, we detect many connections between the European Union and its member states; these are the strongest
	edges when only considering party-to-party edges.
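The construction of this graph can be sketched in a few lines of Python. This is an illustration under assumptions, not the code behind the figure. The sketch assumes that each participant's history is a list of (meeting index, affiliation) pairs, where the index orders all meetings chronologically. It also assumes that an edge is counted whenever two consecutive meetings attended by a participant show different affiliations. The function names are hypothetical.

```python
from collections import Counter


def affiliation_flow(histories, start=0):
    """Count affiliation changes of participants between the meetings they attend.

    histories: {participant: [(meeting_index, affiliation), ...]}, where
    meeting_index orders all meetings chronologically (assumed representation).
    Returns a Counter mapping undirected edges (a, b) with a < b to weights.
    """
    edges = Counter()
    for visits in histories.values():
        # Keep only meetings from `start` onwards, in chronological order.
        visits = sorted(v for v in visits if v[0] >= start)
        for (_, previous), (_, current) in zip(visits, visits[1:]):
            if previous != current:
                edges[tuple(sorted((previous, current)))] += 1  # direction is ignored
    return edges


def top_affiliations(edges, k):
    """Return the k affiliations with the highest weighted degree."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return [name for name, _ in degree.most_common(k)]


# Toy data, not taken from our dataset; meeting index 10 marks where tracking starts.
histories = {
    "p1": [(10, "IIED"), (11, "IISD"), (12, "IIED")],
    "p2": [(9, "Austria"), (10, "IISD"), (11, "IIED")],
    "p3": [(10, "Austria"), (11, "Austria")],
}
edges = affiliation_flow(histories, start=10)
print(edges)  # Counter({('IIED', 'IISD'): 3})
print(top_affiliations(edges, 2))
```

The `sorted` tuple serves as the edge key, so a move from A to B and a move from B to A increase the same weight. This matches the undirected graph described above.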

	\begin{figure}[ht]
	\caption{Flow of participants between the most connected affiliations across meetings}
	\centering
	\includegraphics[width=1\textwidth]{participant_flow_maxdegree_allafs.png}
	\label{fig:exp_flow}
	\end{figure}

	% TODO experience score graphs (include austria!!)
	We use our Experience Score (see \ref{experience}) to compare affiliations according to their experience.
	Figure \ref{fig:expscore_overview} shows the average Experience Score over all affiliations per meeting.
	Each bar is split to show how much of the experience was gained at COP and at SB meetings.
	Affiliations at SB meetings have a higher Experience Score than at COP meetings. A plausible explanation
	is that inexperienced participants attend SB meetings less often, as these meetings are more
	technical and receive less public attention.

	\begin{figure}[ht]
	\caption{Average Experience Score over time}
	\centering
	\includegraphics[width=1\textwidth]{experiencescore_overview.png}
	\label{fig:expscore_overview}
	\end{figure}
	-
	-TODO plot experience score for some affiliations (include Austria)
	\ No newline at end of file
	diff --git a/report/predictive_modelling.tex b/report/predictive_modelling.tex
	index 91210da..a3946da 100644
	--- a/report/predictive_modelling.tex
	+++ b/report/predictive_modelling.tex
	@@ -1,155 +1,155 @@
	\section{Predictive Modelling} \label{predictive_modelling}
	Having extracted and processed the data contained in the participant lists, we use them to build predictive models for other datasets.
	First, we build linear models for the data on interventions at UNFCCC meetings collected by Tatiana Cogne and Victor Kristof (see \ref{tatiana}).
	Due to time constraints, we could not go further on this topic, but our data offers more potential for predictive models,
	especially in combination with the interaction dataset also collected by Tatiana Cogne and Victor Kristof.

	\subsection{Predict Interventions}
	For several UNFCCC meetings, the intervention data lists how many times each party intervened during the meeting.
	We build a model that predicts the number of interventions of a given party at a given meeting.
	-Figure \ref{fig:interv_distr} plots the distribution of the interventions, i.e. the distribution of the labels of the complete dataset.
	+Figure \ref{fig:interv_distr} plots the distribution of the interventions, i.e., the distribution of the labels of the complete dataset.
	Most parties make no intervention or only one, while some parties intervene far more often.

	\begin{figure}[ht]
	\caption{Distribution of the intervention labels}
	\centering
	\includegraphics[width=0.7\textwidth]{distr_interventions.png}
	\label{fig:interv_distr}
	\end{figure}

	\subsubsection{Data samples}
	We define a data sample $x_i$ as the participation of a party at a meeting. We only consider parties and no other
	affiliations, as only parties are able to make interventions in the official negotiations. The features of
	a data sample are defined as follows:

	\begin{center}
	\begin{tabularx}{\textwidth}{|c|c|X|}
	\hline
	Name & Value Range & Description \\
	\hline\hline
	year & $0 - 24$ & The year the meeting took place minus 1995 \\
	\hline
	number\_of\_delegates & $1 - 1589$ & Number of participants of this delegation \\
	\hline
	meeting\_type & 0 or 1 & 0 if the meeting is a COP, 1 if it's an SB \\
	\hline
	government\_rate & $0 - 1$ & Proportion of delegates with role “Government” \\
	\hline
	diplomacy\_rate & $0 - 1$ & Proportion of delegates with role “Diplomacy” \\
	\hline
	security\_rate & $0 - 1$ & Proportion of delegates with role “Security” \\
	\hline
	press\_rate & $0 - 1$ & Proportion of delegates with role “Press” \\
	\hline
	university\_rate & $0 - 1$ & Proportion of delegates with role “Universities” \\
	\hline
	no\_description\_rate & $0 - 1$ & Proportion of delegates with no description \\
	\hline
	no\_keyword\_rate & $0 - 1$ & Proportion of delegates with no detected keyword \\
	\hline
	nb\_fossil\_fuel\_industry\_associations & $0 - 26$ & Absolute number of delegates with association to the fossil fuel industry \\
	\hline
	woman\_proportion & $0 - 1$ & The proportion of female participants in the delegation \\
	\hline
	experience\_score\_cop & $0 - 18$ & The experience score on previous COPs of the delegation \\
	\hline
	experience\_score\_sb & $0 - 17$ & The experience score on previous SB meetings of the delegation \\
	\hline
	experience\_score\_parties\_rate & $0 - 1$ & The proportion of the total experience score that has been acquired in the category “Parties” \\
	\hline
	is\_Afghanistan & 0 or 1 & 1 if the delegation is Afghanistan \\
	\hline
	is\_Albania & 0 or 1 & 1 if the delegation is Albania \\
	\hline
	$\vdots$ & $\vdots$ & $\vdots$ \\
	\hline
	is\_Zimbabwe & 0 or 1 & 1 if the delegation is Zimbabwe \\
	\hline
	is\_unrecognized\_country & 0 or 1 & 1 if no party has been detected \\
	\hline
	\end{tabularx}
	\end{center}

	There is a total of $213$ features. The attribute \textit{year} is the year in which the meeting took place minus 1995, the year of the first meeting (SB 1),
	so that the values are closer to zero.
	The attributes \textit{government\_rate} to \textit{no\_keyword\_rate} correspond to the proportion of each role that we assign
	(see \ref{roles}). For the experience score, we provide the COP and SB experience as absolute values; they sum up to the total
	experience score of an affiliation. The \textit{experience\_score\_parties\_rate} denotes the proportion of the total experience score
	that has been acquired in the category “Parties” (see \ref{experience}).
	The party information is converted into 198 binary attributes, one for each of the 197 Parties to the Convention
	and one for an invalid or unrecognized country. \\
	In total, we have 9218 data samples. We randomly pick about 80\% of these samples, i.e., 7400 samples, as our training set.
	The remaining samples form our test set.
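The following Python sketch shows how part of a feature vector and a random split could be built. It is a simplified illustration, not our implementation. It computes only a few of the attributes from the table above and uses a three-Party subset instead of all 197 Parties. It also assumes each delegate is represented by a single role string. The helper names `make_features` and `split_samples` are hypothetical. A fixed 80\% fraction yields 7374 training samples out of 9218, so the sketch does not reproduce our exact split of about 7400 samples.

```python
import random

# Illustrative subset of Parties; the real feature set has one attribute per Party (197 in total).
PARTIES = ["Afghanistan", "Albania", "Zimbabwe"]


def make_features(year, meeting_type, roles, party):
    """Build part of the feature dictionary of one (party, meeting) sample.

    roles: list with one assigned role per delegate (assumed input format).
    """
    n = len(roles)
    features = {
        "year": year - 1995,  # 1995 = year of the first meeting
        "number_of_delegates": n,
        "meeting_type": 0 if meeting_type == "COP" else 1,
        "government_rate": roles.count("Government") / n,
        "security_rate": roles.count("Security") / n,
    }
    for p in PARTIES:  # one binary attribute per Party
        features["is_" + p] = int(party == p)
    features["is_unrecognized_country"] = int(party not in PARTIES)
    return features


def split_samples(samples, train_fraction=0.8, seed=0):
    """Randomly split the samples into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


sample = make_features(2019, "COP", ["Government", "Government", "Security", "Press"], "Albania")
train, test = split_samples(range(9218))
print(sample["year"], sample["government_rate"], len(train), len(test))  # 24 0.5 7374 1844
```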


	\subsubsection{Models}
	% baseline models
	We first build two simple \textbf{baseline models} against which we can later compare our models.
	The first baseline model always predicts zero interventions, as this is the most common label.
	The second baseline model computes the average number of interventions of each party over all meetings in the training data
	and always predicts this average for that party.
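Both baselines can be written as short Python functions. This is a sketch, not our code. It assumes the training data is a list of (party, number of interventions) pairs. It also assumes that a party absent from the training data falls back to the overall training mean, because the text does not say how such cases are handled. The party names in the example are placeholders.

```python
from collections import defaultdict


def zero_baseline(test_parties):
    """First baseline: always predict zero interventions."""
    return [0.0] * len(test_parties)


def party_mean_baseline(train, test_parties):
    """Second baseline: predict the party's mean number of interventions in the training data.

    train: list of (party, number_of_interventions) pairs (assumed format).
    Parties absent from the training data fall back to the overall training mean
    (an assumption; the report does not specify this case).
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for party, interventions in train:
        totals[party] += interventions
        counts[party] += 1
    overall_mean = sum(totals.values()) / len(train)
    return [totals[p] / counts[p] if counts[p] > 0 else overall_mean for p in test_parties]


# Toy data with placeholder party names.
train = [("party_a", 40), ("party_a", 60), ("party_b", 0), ("party_b", 1)]
print(party_mean_baseline(train, ["party_a", "party_b", "party_c"]))  # [50.0, 0.5, 25.25]
```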

	% linear model
	TODO embed in linear model
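	Before training the model, we normalize the attributes. For an attribute $x_{i,j}$, i.e., attribute $j$ of sample $i$, we compute
	\begin{equation}
	x_{i,j}' = \frac{x_{i,j} - \mu_j}{\sigma_j},
	\end{equation}
	where $\mu_j$ and $\sigma_j$ are the mean and the standard deviation of attribute $j$.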
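The following Python sketch illustrates this normalization as z-score standardization. It assumes that the mean and the population standard deviation are computed on the training samples only and then reused for the test samples. It also assumes that constant attributes are only centered, to avoid dividing by zero. These are choices made for the illustration, and `fit_standardizer` and `standardize` are hypothetical names.

```python
import statistics


def fit_standardizer(train_columns):
    """Compute (mean, standard deviation) of every attribute on the training data.

    train_columns: {attribute_name: [value of that attribute for each training sample]}.
    """
    params = {}
    for name, values in train_columns.items():
        mu = statistics.fmean(values)
        sigma = statistics.pstdev(values, mu)
        # Constant attributes (sigma == 0) are only centered, to avoid dividing by zero.
        params[name] = (mu, sigma if sigma > 0 else 1.0)
    return params


def standardize(sample, params):
    """Apply x' = (x - mu_j) / sigma_j to every attribute j of one sample."""
    return {name: (x - params[name][0]) / params[name][1] for name, x in sample.items()}


params = fit_standardizer({"year": [0, 10, 20], "is_Albania": [1, 1, 1]})
print(standardize({"year": 20, "is_Albania": 1}, params))
```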

	% linear model with logarithmic transformation
	TODO log transformation \\

	% mixed model
	Our next approach aims to better handle the large number of zero interventions.
	The abundance of zero labels makes it hard for linear models to succeed.
	We therefore introduce a \textbf{two-step model} that works as follows: % TODO cite inspiration
	\begin{enumerate}
	\item Predict for each sample whether the number of interventions will be zero or non-zero.
	\item For the samples predicted as non-zero, apply a second model to predict the number of interventions.
	\end{enumerate}
	For the first step, we use a logistic regression model with regularization.
	For the second step, we use a Poisson regression model with regularization.


	\subsubsection{Results}
	We compare our models using the root-mean-square error (RMSE) between the predicted numbers of interventions $\hat{y}_i$ and the true values $y_i$.
	The RMSE is defined as the square root of the mean squared error (MSE), i.e., for $n$ samples,
	\begin{equation}
	\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}
	\end{equation}
	Figure \ref{fig:RMSE_results} shows the RMSE of the different models.
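For reference, the RMSE formula translates directly into Python. The function below is a minimal sketch; its name and error handling are our own choices and not part of the original evaluation code.

```python
import math


def rmse(y_pred, y_true):
    """Root-mean-square error between predicted and true numbers of interventions."""
    if len(y_pred) != len(y_true) or not y_true:
        raise ValueError("expected two non-empty sequences of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)
    return math.sqrt(mse)


# Toy example: always predicting zero on three samples.
print(rmse([0, 0, 0], [0, 3, 4]))  # sqrt(25 / 3), roughly 2.887
```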

	\begin{figure}[ht]
	\caption{Resulting RMSE of the different models}
	\centering
	\includegraphics[width=0.9\textwidth]{resultsRMSE.png}
	\label{fig:RMSE_results}
	\end{figure}


	First, we consider the \textbf{baseline models}. Always predicting zero interventions yields $\mathrm{RMSE} = 9.54$ on the test data.
	Always predicting the average number of interventions of the party in question over all samples in the training data
	yields $\mathrm{RMSE} = 5.02$ on the test data. This shows that knowing the party alone already gives a lot of information about its behavior
	during meetings. \\

	The \textbf{ridge regression} model with all features yields $\mathrm{RMSE} = 5.01$.
	The optimal regularization parameter, found with cross-validation, is $\lambda = 0.0101$.
	% TODO insert the same notation W
	We can analyze which attributes have the strongest influence on the prediction.
	The bias term is $w_0 = 3.281$. As expected from the good performance of the second baseline model, the party attributes
	have the strongest influence on the predictions. The European Union ($+74.7$), the United States ($+53.3$) and China ($+48.0$) show the highest tendency towards many interventions per meeting.
	Cote d'Ivoire ($-2.58$), San Marino ($-2.56$) and Greece ($-2.48$) are the parties that bias the predictions most towards few interventions.
	Among the non-party attributes, those with the strongest positive influence are \textit{press\_rate} ($+2.51$), \textit{university\_rate} ($+1.11$)
	and \textit{experience\_score\_parties\_rate} ($+0.70$).
	The non-party attributes that lower the predicted number of interventions the most are \textit{no\_description\_rate} ($-1.67$), \textit{diplomacy\_rate} ($-0.82$)
	and \textit{no\_keyword\_rate} ($-0.43$).
	Interestingly, the year and the number of delegates are the attributes with the weakest influence on the prediction. Apparently, time and delegation size
	have only a small influence on the activity of a party. \\
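The ranking above can be obtained by sorting the learned weights. The following Python sketch illustrates this under three assumptions. The weights are available as a dictionary from attribute names to coefficients, and party attributes follow the `is_<Party>` naming of the feature table. The example uses a subset of the coefficients reported in this section, with hypothetical attribute names for the two parties, and the helper names are also hypothetical.

```python
def strongest_attributes(weights, k=3, parties=False):
    """Rank learned coefficients of party or non-party attributes.

    weights: {attribute_name: coefficient}; party attributes are assumed to be
    named "is_<Party>" as in the feature table. Returns the k largest and the
    k smallest (most negative) coefficients of the selected group.
    """
    group = [(name, w) for name, w in weights.items() if name.startswith("is_") == parties]
    ranked = sorted(group, key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


def weakest_attributes(weights, k=2):
    """Return the k attributes whose coefficients are closest to zero."""
    return sorted(weights, key=lambda name: abs(weights[name]))[:k]


# Subset of the coefficients reported above (party attribute names are hypothetical).
weights = {
    "is_European_Union": 74.7, "is_China": 48.0,
    "press_rate": 2.51, "university_rate": 1.11, "experience_score_parties_rate": 0.70,
    "no_description_rate": -1.67, "diplomacy_rate": -0.82, "no_keyword_rate": -0.43,
}
top, bottom = strongest_attributes(weights, k=3)
print(top)     # [('press_rate', 2.51), ('university_rate', 1.11), ('experience_score_parties_rate', 0.7)]
print(bottom)  # [('no_description_rate', -1.67), ('diplomacy_rate', -0.82), ('no_keyword_rate', -0.43)]
```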

	% TODO log transf
	The \textbf{two-step model} does not improve the prediction; it yields $\mathrm{RMSE} = 5.01$.
	The first step correctly classifies 79.2\% of the test samples as zero or non-zero, with an optimal regularizer of $\lambda = 1.035$.
	The second step predicts the number of interventions for the samples that have been classified as non-zero by the logistic regressor.
	On these samples alone, it yields $\mathrm{RMSE} = 9.89$.
	Over all test samples, the final prediction of the two-step model yields $\mathrm{RMSE} = 4.94$.
	This is a slight improvement compared to the previous models.
	diff --git a/report/report.aux b/report/report.aux
	index 0131236..6f3d4a0 100644
	--- a/report/report.aux
	+++ b/report/report.aux
	@@ -1,117 +1,60 @@
	\relax
	\providecommand\hyper@newdestlabel[2]{}
	\providecommand\HyperFirstAtBeginDocument{\AtBeginDocument}
	\HyperFirstAtBeginDocument{\ifx\hyper@anchor\@undefined
	\global\let\oldcontentsline\contentsline
	\gdef\contentsline#1#2#3#4{\oldcontentsline{#1}{#2}{#3}}
	\global\let\oldnewlabel\newlabel
	\gdef\newlabel#1#2{\newlabelxx{#1}#2}
	\gdef\newlabelxx#1#2#3#4#5#6{\oldnewlabel{#1}{{#2}{#3}}}
	\AtEndDocument{\ifx\hyper@anchor\@undefined
	\let\contentsline\oldcontentsline
	\let\newlabel\oldnewlabel
	\fi}
	\fi}
	\global\let\hyper@last\relax
	\gdef\HyperFirstAtBeginDocument#1{#1}
	\providecommand\HyField@AuxAddToFields[1]{}
	\providecommand\HyField@AuxAddToCoFields[2]{}
	\providecommand\@newglossary[4]{}
	\@newglossary{main}{glg}{gls}{glo}
	\providecommand\@glsorder[1]{}
	\providecommand\@istfilename[1]{}
	\@istfilename{report.ist}
	\@glsorder{word}
	\citation{ipcc:2018}
	\citation{UNFCCC}
	\citation{UNFCCC_process}
	\citation{evolution_UNFCCC}
	\citation{UNFCCC_process}
	\citation{larger_project}
	\citation{proj_tatiana}
	\@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{2}{section.1}\protected@file@percent }
	\@writefile{toc}{\contentsline {subsection}{\numberline {1.1}International Climate Negotiations}{2}{subsection.1.1}\protected@file@percent }
	\@writefile{toc}{\contentsline {subsection}{\numberline {1.2}Project}{2}{subsection.1.2}\protected@file@percent }
	\@writefile{toc}{\contentsline {subsubsection}{\numberline {1.2.1}Larger Project}{2}{subsubsection.1.2.1}\protected@file@percent }
	\newlabel{tatiana}{{1.2.1}{2}{Larger Project}{subsubsection.1.2.1}{}}
	\@writefile{toc}{\contentsline {subsubsection}{\numberline {1.2.2}Our Project}{2}{subsubsection.1.2.2}\protected@file@percent }
	\citation{UNFCCC_docs}
	\citation{pytesseract}
	\citation{tesseract_expl}
	\@writefile{toc}{\contentsline {section}{\numberline {2}Data Extraction and Processing}{3}{section.2}\protected@file@percent }
	\@writefile{toc}{\contentsline {subsection}{\numberline {2.1}Data Extraction}{3}{subsection.2.1}\protected@file@percent }
	\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.1.1}Raw Dataset}{3}{subsubsection.2.1.1}\protected@file@percent }
	\newlabel{dataset}{{2.1.1}{3}{Raw Dataset}{subsubsection.2.1.1}{}}
	\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.1.2}Optical Character Recognition}{3}{subsubsection.2.1.2}\protected@file@percent }
	\citation{pdfminer.six}
	\@writefile{lof}{\contentsline {figure}{\numberline {1}{\ignorespaces Example page of participant list of COP 3\relax }}{4}{figure.caption.1}\protected@file@percent }
	\providecommand*\caption@xref[2]{\@setref\relax\@undefined{#1}}
	\newlabel{fig:raw_scan}{{1}{4}{Example page of participant list of COP 3\relax }{figure.caption.1}{}}
	\@writefile{lof}{\contentsline {figure}{\numberline {2}{\ignorespaces Example page of participant list of COP 25\relax }}{4}{figure.caption.1}\protected@file@percent }
	\newlabel{fig:raw_well}{{2}{4}{Example page of participant list of COP 25\relax }{figure.caption.1}{}}
	\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.1.3}Well-formatted PDF Extraction}{4}{subsubsection.2.1.3}\protected@file@percent }
	\citation{coco}
	\@writefile{lof}{\contentsline {figure}{\numberline {3}{\ignorespaces Page with an inserted half-transparent box before OCR\relax }}{5}{figure.caption.2}\protected@file@percent }
	\newlabel{fig:boxes}{{3}{5}{Page with an inserted half-transparent box before OCR\relax }{figure.caption.2}{}}
	\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.1.4}Extraction from Text Files}{5}{subsubsection.2.1.4}\protected@file@percent }
	\@writefile{toc}{\contentsline {subsection}{\numberline {2.2}Data Processing}{5}{subsection.2.2}\protected@file@percent }
	\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.2.1}Unification of Meetings}{5}{subsubsection.2.2.1}\protected@file@percent }
	\@writefile{lof}{\contentsline {figure}{\numberline {4}{\ignorespaces Output of the layout analysis of pdfminer.six\relax }}{6}{figure.caption.3}\protected@file@percent }
	-\newlabel{fig:pdfminer}{{4}{6}{Output of the layout analysis of pdfminer.six\relax }{figure.caption.3}{}}
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.2.2}Gender and Title}{6}{subsubsection.2.2.2}\protected@file@percent }
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.2.3}Roles}{6}{subsubsection.2.2.3}\protected@file@percent }
	-\newlabel{roles}{{2.2.3}{6}{Roles}{subsubsection.2.2.3}{}}
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.2.4}Association to Fossil Fuel Industry}{7}{subsubsection.2.2.4}\protected@file@percent }
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.2.5}Experience}{7}{subsubsection.2.2.5}\protected@file@percent }
	-\newlabel{experience}{{2.2.5}{7}{Experience}{subsubsection.2.2.5}{}}
	-\newlabel{levenshtein}{{1}{7}{Experience}{equation.2.1}{}}
	-\citation{UNFCCC_genderreport}
	-\@writefile{lof}{\contentsline {figure}{\numberline {5}{\ignorespaces Overview of the extracted participants of COP meetings\relax }}{8}{figure.caption.4}\protected@file@percent }
	-\newlabel{fig:cop_overall}{{5}{8}{Overview of the extracted participants of COP meetings\relax }{figure.caption.4}{}}
	-\@writefile{lof}{\contentsline {figure}{\numberline {6}{\ignorespaces Overview of the extracted participants of SB meetings\relax }}{8}{figure.caption.4}\protected@file@percent }
	-\newlabel{fig:sb_overall}{{6}{8}{Overview of the extracted participants of SB meetings\relax }{figure.caption.4}{}}
	-\@writefile{toc}{\contentsline {subsection}{\numberline {2.3}Results}{8}{subsection.2.3}\protected@file@percent }
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.3.1}Gender and Title}{8}{subsubsection.2.3.1}\protected@file@percent }
	-\@writefile{lof}{\contentsline {figure}{\numberline {7}{\ignorespaces Proportion of female participants per meeting\relax }}{9}{figure.caption.5}\protected@file@percent }
	-\newlabel{fig:gender}{{7}{9}{Proportion of female participants per meeting\relax }{figure.caption.5}{}}
	-\@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces Assigned roles for COP meetings\relax }}{9}{figure.caption.6}\protected@file@percent }
	-\newlabel{fig:cop_roles}{{8}{9}{Assigned roles for COP meetings\relax }{figure.caption.6}{}}
	-\@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Assigned roles for SB meetings\relax }}{9}{figure.caption.6}\protected@file@percent }
	-\newlabel{fig:sb_roles}{{9}{9}{Assigned roles for SB meetings\relax }{figure.caption.6}{}}
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.3.2}Roles}{9}{subsubsection.2.3.2}\protected@file@percent }
	-\@writefile{lof}{\contentsline {figure}{\numberline {10}{\ignorespaces Participants with fossil fuel industry association (COP)\relax }}{10}{figure.caption.7}\protected@file@percent }
	-\newlabel{fig:cop_fossil}{{10}{10}{Participants with fossil fuel industry association (COP)\relax }{figure.caption.7}{}}
	-\@writefile{lof}{\contentsline {figure}{\numberline {11}{\ignorespaces Participants with fossil fuel industry association (SB)\relax }}{10}{figure.caption.7}\protected@file@percent }
	-\newlabel{fig:sb_fossil}{{11}{10}{Participants with fossil fuel industry association (SB)\relax }{figure.caption.7}{}}
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.3.3}Association to Fossil Fuel Industry}{10}{subsubsection.2.3.3}\protected@file@percent }
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {2.3.4}Experience}{10}{subsubsection.2.3.4}\protected@file@percent }
	-\@writefile{lof}{\contentsline {figure}{\numberline {12}{\ignorespaces Flow of participants between meetings between the most connected affiliations\relax }}{11}{figure.caption.8}\protected@file@percent }
	-\newlabel{fig:exp_flow}{{12}{11}{Flow of participants between meetings between the most connected affiliations\relax }{figure.caption.8}{}}
	-\@writefile{lof}{\contentsline {figure}{\numberline {13}{\ignorespaces Average Experience Score over time\relax }}{11}{figure.caption.9}\protected@file@percent }
	-\newlabel{fig:expscore_overview}{{13}{11}{Average Experience Score over time\relax }{figure.caption.9}{}}
	-\@writefile{toc}{\contentsline {section}{\numberline {3}Predictive Modelling}{12}{section.3}\protected@file@percent }
	-\newlabel{predictive_modelling}{{3}{12}{Predictive Modelling}{section.3}{}}
	-\@writefile{toc}{\contentsline {subsection}{\numberline {3.1}Predict Interventions}{12}{subsection.3.1}\protected@file@percent }
	-\@writefile{lof}{\contentsline {figure}{\numberline {14}{\ignorespaces Distribution of the intervention labels\relax }}{12}{figure.caption.10}\protected@file@percent }
	-\newlabel{fig:interv_distr}{{14}{12}{Distribution of the intervention labels\relax }{figure.caption.10}{}}
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {3.1.1}Data samples}{12}{subsubsection.3.1.1}\protected@file@percent }
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {3.1.2}Models}{13}{subsubsection.3.1.2}\protected@file@percent }
	-\@writefile{toc}{\contentsline {subsubsection}{\numberline {3.1.3}Results}{14}{subsubsection.3.1.3}\protected@file@percent }
	-\@writefile{lof}{\contentsline {figure}{\numberline {15}{\ignorespaces Resulting RMSE of the different models\relax }}{14}{figure.caption.11}\protected@file@percent }
	-\newlabel{fig:RMSE_results}{{15}{14}{Resulting RMSE of the different models\relax }{figure.caption.11}{}}
	-\@writefile{toc}{\contentsline {section}{\numberline {4}Conclusion}{15}{section.4}\protected@file@percent }
	-\bibdata{reference}
	-\bibcite{pdfminer.six}{1}
	-\bibcite{pytesseract}{2}
	-\bibcite{UNFCCC}{3}
	-\bibcite{proj_tatiana}{4}
	-\bibcite{larger_project}{5}
	-\bibcite{ipcc:2018}{6}
	-\bibcite{evolution_UNFCCC}{7}
	-\bibcite{tesseract_expl}{8}
	-\bibcite{UNFCCC_docs}{9}
	-\bibcite{UNFCCC_process}{10}
	-\bibcite{UNFCCC_genderreport}{11}
	-\bibcite{coco}{12}
	-\bibstyle{plain}
	-\gdef \@abspage@last{19}
	+\newlabel{fig:pdfminer}{{4}{6}{Output of the layout analysis of pdfminer.six\
	\ No newline at end of file
