diff --git a/code/readme.md b/code/readme.md
index 07b9e87..5fe3128 100644
--- a/code/readme.md
+++ b/code/readme.md
@@ -1,56 +1,59 @@
 # Bachelor project, Jan Linder & Viktor Kristof
 
 This is a project from the INDY lab at EPFL that is part of a larger research project on international climate negotiations. (More information about the larger project can be found [here](https://snis.ch/projects/what-international-negotiators-promise-and-domestic-policymakers-adopt-policy-and-politics-in-the-multi-level-climate-change-regime/).)
 
 This repository contains the code to extract information from the participant lists of UNFCCC meetings. The repository is structured into the following directories:
 
 - `data`: contains the raw data, i.e., the PDF participant lists (either in the directory `data/COP` or `data/SB`, depending on whether the meeting is a Conference of the Parties (COP) or a Subsidiary Bodies (SB) meeting). Note that every new meeting to process must also be added to the `data/meetings_metadata.csv` file.
 - `lib`: contains the library that extracts the data from the PDF files to .txt and then to CSV files. To use it, go to the `scripts` folder.
 - `results`: contains all the results once generated.
 - `scripts`: contains all the important scripts that use the library code to extract the data. How to use the code is described in the next part.
 
 ## Important scripts & how to use them
 
 In the following, we explain how to use the most important scripts in this repository. They allow you to use the library and its functionality.
 
-### extract_participants_copX.py
+### extract_participants.py
 
 `extract_participants.py <meetingLabel> (<intermediateFilename> <outputFilename>)`
 
 - `@meetingLabel` (str): which meeting to handle. Example: "cop1" or "sb40"
 - `@intermediateFilename` (str), optional: the .txt filename where the extracted text is stored. If it already exists, the PDF-to-txt extraction is not performed again.
 - `@outputFilename` (str), optional: the output filename (.csv)
 
 Extracts the data from the raw data in two steps (the PDF participant list must be provided in the data folder, either COP or SB). The first step goes from PDF to a text file, the second from the text file to a CSV file. If a text file already exists for this meeting (in `results/participants-txt`), the PDF-to-txt extraction is skipped. Results in a CSV file for _one_ meeting.
 
 ### do_all.py
 
 Calls `extract_participants.py` for all the meetings listed in the metadata file. Ideally, you should call `generate_complete_dataset.py` after this to update the complete dataset.
 
 ### generate_complete_dataset.py
 
 Generates one complete dataset of all meetings specified in `data/meetings_metadata.csv`, with more features. The CSV files for all the meetings must be provided in the `results/participants-csv` directory. Generates a CSV file called `complete_dataset.csv`.
 
+### sort_final_dataset.py
+
+Takes the complete dataset and sorts the columns into a specified order.
+
 ### find_experience.py
 
 This script finds the experience features, i.e., it links the different instances of the same person across different lists. It requires the file `complete_dataset.csv` to exist. This script is long-running and can take about 10 hours to complete. More information about the different features can be found in the folder `results`.
 
 ### generate_plots.py
 
 Generates plots with matplotlib, using the code specified in `scripts/plots`.
 
 ### prepare_intervention_data.py
 
 To predict the intervention data collected by Tatiana Cogne and Victor Kristof, you need to prepare their output for our prediction model. This script looks for the data (`interventions.csv` and `list_meetings.csv`) in the folder `code/data/data_tatiana` and processes it to create the files `dataset_interventions.csv`, `interventions_prepared.csv`, and `interventions_aff.csv`. Note that this needs to be rerun every time the complete dataset or the intervention data changes. The prepared data is then used in the notebook `predict_interventions.ipynb`.
 
 ### predict_interventions.ipynb
 
 Jupyter notebook that contains models that try to predict the number of interventions based on our `complete_dataset.csv`.
 
 ## Information about the original lists
 
 The participant lists were taken from the following official website in September 2020: https://unfccc.int/process/bodies/supreme-bodies/conference-of-the-parties-cop
 
 Please note the following issues in the source:
 
 COPs 1-4, 7 & 8 are scans and are extracted with **Optical Character Recognition** using the package _**pytesseract**_, a wrapper for Google's tesseract-ocr engine (version 5.0.0). The results for those lists are expected to contain typos.
 
 - COP 2: The whole list is written in French.
 - COP 3: Officially stated 710 "overflow participants" that are not in the list.
 - COP 4: Very bad scan quality on p. 86 and p. 92. At least one participant is unreadable.
 - COP 8: Generally bad scan quality. Might contain more errors.
 
 The other lists are extracted using the package _**pdfminer.six**_.
 
 - COP 5: Problems with special characters in the source. For example, consider the names Sr. Ra™l CASTELLINI or Sra. MarÌa Fernanda CA‹¡S.
 - COP 6: Two COP 6 meetings were held, separated by half a year. This is why the meeting label "cop6b" exists.
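For orientation, the scripts described in the readme are typically chained in this order. A minimal sketch of one end-to-end run (an assumption about typical usage, run from `code/scripts/`, not a script that exists in the repo):

```python
# Hypothetical end-to-end run of the pipeline described in the readme.
# Assumes the working directory is code/scripts/ and that every meeting
# listed in data/meetings_metadata.csv has its PDF in place.
import subprocess

subprocess.run(["python", "do_all.py"], check=True)                     # PDFs -> per-meeting CSVs
subprocess.run(["python", "generate_complete_dataset.py"], check=True)  # merge + add features
subprocess.run(["python", "find_experience.py"], check=True)            # link persons (~10 hours)
subprocess.run(["python", "sort_final_dataset.py"], check=True)         # reorder columns
```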
diff --git a/code/scripts/compare_experience.py b/code/scripts/compare_experience.py
deleted file mode 100644
index 48132c8..0000000
--- a/code/scripts/compare_experience.py
+++ /dev/null
@@ -1,10 +0,0 @@
-import json
-
-f = open("experience_nothamming_dict.txt", "r", encoding="utf-8")
-experience_dict = json.loads(f.read())
-f.close()
-
-filtered = {k: v for (k, v) in experience_dict.items() if len(v) > 1 and v[0][1] != v[1][1]}
-
-for k, v in sorted(filtered.items()):
-    print(str(k) + " -> " + str(v))
\ No newline at end of file
diff --git a/code/scripts/extract_affiliations.py b/code/scripts/extract_affiliations.py
index 1785540..5a95a3f 100644
--- a/code/scripts/extract_affiliations.py
+++ b/code/scripts/extract_affiliations.py
@@ -1,24 +1,27 @@
+"""Extracts a list of all affiliations in all the COPs (except 7 and 8).
+This list is then used to extract COPs 7 and 8. Could be refactored to also
+take information from SB meetings, or optimized by eliminating errors."""
+
 import pandas as pd
 
 # blacklist coming from frequent errors
 # cancelled out: "national institute", "institute for"
 no_affiliation_start = ("ministry ", "bureau ", "executive director", "chairman", "ministerio ", "netherlands committee for", "prof.")
 no_affiliation = ("technology", "protection", "university of vienna")
 
 affiliations = set([])
 for i in range(1, 26):
     if i != 7 and i != 8:
         data = pd.read_csv("../results/participants-csv/participants_cop" + str(i) + ".csv", encoding="utf-8-sig")
         byaffiliation = data.groupby('affiliation')
         for aff, rest in byaffiliation:
             aff = aff.lower()
             if not aff.startswith(no_affiliation_start) and aff not in no_affiliation:
                 affiliations.add(aff)  # add temporarily
             else:
                 print(aff + " --- wrongly extracted from cop" + str(i))
 
 # write them to a csv file
 df = pd.DataFrame(affiliations)
 df.to_csv('../data/dictionaries/affiliations.csv', encoding="utf-8-sig")
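The refactor suggested in the new docstring could iterate over the metadata file instead of hard-coding COP numbers, which would pick up SB meetings too. A possible sketch of that idea (assuming, as elsewhere in the repo, a `label` column in `meetings_metadata.csv` and per-meeting files named `participants_<label>.csv`):

```python
import os

import pandas as pd

# Sketch of the refactor suggested in the docstring: take labels from
# the metadata file instead of hard-coding COP numbers 1-25.
metadata = pd.read_csv("../data/meetings_metadata.csv")
affiliations = set()
for label in metadata["label"]:
    if label in ("cop7", "cop8"):  # skip the OCR-only lists
        continue
    path = f"../results/participants-csv/participants_{label}.csv"
    if not os.path.isfile(path):
        continue
    data = pd.read_csv(path, encoding="utf-8-sig")
    affiliations.update(a.lower() for a in data["affiliation"].dropna().unique())
```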
diff --git a/code/scripts/extract_descriptions.py b/code/scripts/extract_descriptions.py
index a91b4bb..091a17f 100644
--- a/code/scripts/extract_descriptions.py
+++ b/code/scripts/extract_descriptions.py
@@ -1,36 +1,38 @@
+"""Analyzes the descriptions of all the participants in complete_dataset.csv"""
+
 import pandas as pd
 import numpy as np
 import re
 from collections import Counter
 from partlistproc.MeetingAnalyzer import MeetingAnalyzer
 
 descriptions = []
 
 participants = pd.read_csv("../results/complete_dataset.csv", encoding="utf-8-sig")
 for index, participant in participants.iterrows():
     description = str(participant["description"])
     description_list = re.split(MeetingAnalyzer.description_splitter, description)
     description_list = filter(None, description_list)
     #description_list = [str(line) for line in description_list]
     descriptions.extend(description_list)
     if index % 1000 == 0:
         print(index)
 
 #print(descriptions)
 print("Find the most common lines:")
 counter = Counter(descriptions)
 print("Found " + str(sum(counter.values())) + " lines in total, the 20 most common being")
 print(counter.most_common(20))
 
 # save the 200 most common lines
 most_common_lines = counter.most_common(200)
 output_file = open("most_common_descriptions.txt", "a")
 for line, count in most_common_lines:
     output_file.write(str(count) + " times the line: " + line)
     output_file.write("\n")
 output_file.close()
diff --git a/code/scripts/extract_participants.py b/code/scripts/extract_participants.py
index ed43420..95dc130 100644
--- a/code/scripts/extract_participants.py
+++ b/code/scripts/extract_participants.py
@@ -1,42 +1,42 @@
 """ The main script of the cop participants extraction.
-Takes as an argument the number of the cop to process.
+    Takes as an argument the label of the meeting to process.
 """
 
 import os
 import sys
 import partlistproc
 from partlistproc.MeetingAnalyzerFactory import MeetingAnalyzerFactory
 from partlistproc.PdfExtractorFactory import PdfExtractorFactory
 
 txt_prefix = "../results/participants-txt/"
 csv_prefix = "../results/participants-csv/"
 default_intermediate_name = txt_prefix + "raw_X.txt"
 default_output_name = csv_prefix + "participants_X.csv"
 valid_affiliation_names_path = "../data/dictionaries/valid_affiliation_names.csv"
 
 # format:
 # extract_participants.py <meetingLabel> <intermediateFilename> <outputFilename>
 # the last options are given if the OCR has already been done (for cop 1 - 4)
 
 # parse arguments
 arguments = sys.argv
 label = arguments[1]
 intermediate_name = default_intermediate_name.replace("X", label)
 output_name = default_output_name.replace("X", label)
 if len(arguments) > 2:
     intermediate_name = txt_prefix + arguments[2]
     output_name = csv_prefix + arguments[3]
 
 # First, extract the text from the pdf if not already done
 if not os.path.isfile(intermediate_name):
     extr_factory = PdfExtractorFactory(label, intermediate_name, valid_affiliation_names_path)
     extr = extr_factory.createPdfExtractor()
     extr.extract_text()
 
 # Second, extract the data from the text
 ana_factory = MeetingAnalyzerFactory(label, intermediate_name)
 ana = ana_factory.get_analyzer()
 ana.get_data(output_name)
diff --git a/code/scripts/find_average_participants.py b/code/scripts/find_average_participants.py
index c9d8734..ab12d7a 100644
--- a/code/scripts/find_average_participants.py
+++ b/code/scripts/find_average_participants.py
@@ -1,32 +1,36 @@
+"""Finds the average number of participants for COP meetings and for SB meetings.
+Could be improved by using the metadata file.
+"""
+
 import pandas as pd
 import os
 
 complete_data = pd.read_csv("../results/complete_dataset.csv", encoding="utf-8-sig")
 
 # COPs
 sum = 0
 for i in range(1, 26):
     data = pd.read_csv("../results/participants-csv/participants_cop" + str(i) + ".csv", encoding="utf-8-sig")
     sum += len(data)
 data = pd.read_csv("../results/participants-csv/participants_cop6b.csv", encoding="utf-8-sig")
 sum += len(data)
 print(f"COPs have on average {sum/26} participants")
 
 # SBs
 sum = 0
 count = 0
 for i in range(1, 51):
     path = "../results/participants-csv/participants_sb" + str(i) + ".csv"
     if os.path.isfile(path):
         count += 1
         data = pd.read_csv(path, encoding="utf-8-sig")
         sum += len(data)
 print(f"SBs have on average {sum/count} participants")
diff --git a/code/scripts/find_biggest_delegations.py b/code/scripts/find_biggest_delegations.py
index 1841c7f..a22b9bf 100644
--- a/code/scripts/find_biggest_delegations.py
+++ b/code/scripts/find_biggest_delegations.py
@@ -1,41 +1,41 @@
 import sys
 import os
 import pandas as pd
 
-""" Finds the top ten biggest party delegation and top ten biggest NGO
-    delegation of a meeting
+""" Finds the top ten biggest party delegations and the top ten biggest NGO
+    delegations of a meeting (whose label is given as an argument).
 """
 
 args = sys.argv
 if len(args) != 2:
     sys.exit("Please provide one argument that contains the label \
of the meeting to process")
 label = args[1]
 
 datafile_name = "../results/participants-csv/participants_" + label + ".csv"
 
 # exit if there is no data for this meeting
 if not os.path.isfile(datafile_name):
     sys.exit("There is no datafile for this meeting. Please check if the given \
label is a valid label")
 
 data = pd.read_csv(datafile_name, encoding="utf-8-sig")
 data_by_affiliation = data.groupby("affiliation")
 
 parties = []
 NGOs = []
 for affiliation, delegates in data_by_affiliation:
     df = pd.DataFrame(delegates)
     if df["affiliation_category"].iloc[0] == "parties":
         parties.append((affiliation, len(delegates)))
     if df["affiliation_category"].iloc[0] == "non-governmental organizations":
         NGOs.append((affiliation, len(delegates)))
 
 parties.sort(reverse=True, key=(lambda x: x[1]))
 NGOs.sort(reverse=True, key=(lambda x: x[1]))
 
 print("Biggest delegations of " + label)
 print("Parties")
 print(parties[:10])
 print("NGOs")
 print(NGOs[:10])
diff --git a/code/scripts/find_experience.py b/code/scripts/find_experience.py
index 831b707..839af53 100644
--- a/code/scripts/find_experience.py
+++ b/code/scripts/find_experience.py
@@ -1,120 +1,124 @@
+"""This script finds the experience features, i.e., it links the different
+instances of the same persons in different lists. It requires the file
+`complete_dataset.csv` to exist. This script is long-running and can take
+about 10 hours to complete.
+"""
+
 import sys
 import pandas as pd
 import editdistance
 import json
 
 # constants
 max_distance = 1
 min_length_for_linebreak = 15
 
 # contains all the unique names in the format (name, list[(meeting, name, affiliation, affiliation_category)])
 names = dict()
 
 def compare_names(name1, name2):
     # case: one starts with the other (because some words are on the next line)
     l1 = len(name1)
     l2 = len(name2)
     if (l1 >= min_length_for_linebreak and l2 >= min_length_for_linebreak
             and (name2.startswith(name1) or name1.startswith(name2))
             and (set(name1.split()) <= set(name2.split())
                  or set(name2.split()) <= set(name1.split()))):
         return True
     # case: first name and last name inverted -> same set of name parts
     if l1 == l2 and set(name2.split()) == set(name1.split()):
         return True
     # Levenshtein distance if the two names have a similar length (only a distance <= 1 counts as a match)
     if abs(l1 - l2) > max_distance:
         return False
     else:
         dist = editdistance.eval(name1, name2)
         return dist <= max_distance
 
 def get_experience(name, meeting, affiliation, affiliation_category):
     """Looks up the previous participations of this name and records the current one.
 
     Args:
         name (str): the participant's name
         meeting (str): the meeting label, e.g., "cop1" or "sb40"
         affiliation (str): the participant's affiliation
         affiliation_category (str): the category of the affiliation
 
     Returns:
         int, int, int, int, bool: cop_exp, sb_exp, party_exp, not_party_exp, exp_err_poss
     """
     for key_name, participation_list in names.items():
         if compare_names(name, key_name):
             prev_meetings = names[key_name]
             cops = [m for m in prev_meetings if m[0].startswith("cop")]
             sbs = [m for m in prev_meetings if m[0].startswith("sb")]
             in_party = [m for m in prev_meetings if m[3] == "parties"]
             not_party = [m for m in prev_meetings if m[3] != "parties"]
             names[key_name].append((meeting, name, affiliation, affiliation_category))
             # an error is possible when the same meeting occurs more than once
             err_poss = len(set([m[0] for m in prev_meetings])) != len(names[key_name])
             return len(cops), len(sbs), len(in_party), len(not_party), int(err_poss)
     names[name] = [(meeting, name, affiliation, affiliation_category)]
     return 0, 0, 0, 0, 0
 
 if __name__ == "__main__":
     complete_data = pd.read_csv("../results/complete_dataset.csv", encoding="utf-8-sig")
 
     complete_data_with_experience = pd.DataFrame(columns={
         "meeting", "name", "gender", "has_title", "affiliation",
         "affiliation_category", "role", "description",
         "experience cop", "experience sb", "experience party",
         "experience not_party", "experience possible error"})
 
     metadata = pd.read_csv("../data/meetings_metadata.csv")
     for label in metadata["label"]:
     #for label in ["cop24", "cop25"]:
         print(label)
         data = complete_data.loc[complete_data.meeting == label]
         # print(data.apply(lambda row: pd.Series(get_experience(row["name"], row["meeting"], row["affiliation"], row["affiliation_category"]), axis=1)))
         data[["experience cop", "experience sb", "experience party", "experience not_party", "experience possible error"]] = (
             data.apply(lambda row: pd.Series(get_experience(row["name"], row["meeting"], row["affiliation"], row["affiliation_category"])), axis=1))
         complete_data_with_experience = complete_data_with_experience.append(data, ignore_index=True)
 
     # generate the output file
     complete_data_with_experience.to_csv("../results/complete_dataset_experience.csv", encoding="utf-8-sig", index=False)
 
     print(len(names))
 
     # print the dictionary to a text file
     f = open("experience_dict.txt", "w", encoding="utf-8")
     f.write(json.dumps(names))
     f.close()
 
 def get_experience_score(delegates_experience):
     """Computes the experience score of an affiliation. This is the average
     experience of the top 10 most experienced delegates.
 
     Args:
         delegates_experience (list[int]): the experiences of all the delegates of a party
     """
     if len(delegates_experience) <= 10:
         return average(delegates_experience)
     else:
         copy = delegates_experience.copy()
         copy.sort(reverse=True)
         return average(copy[:10])
 
 def average(numbers):
     sum = 0
     for n in numbers:
         sum += n
     return sum / len(numbers)
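To illustrate the three matching rules in `compare_names`, here is a small demo with made-up names (only the `editdistance` calls are real API; the rule thresholds come from the constants above):

```python
import editdistance

# Made-up names illustrating the three matching rules in compare_names.

# Rule 1: line-break case -- one name is a prefix of the other and the
# word sets nest; only applied when both names have >= 15 characters.
a, b = "maria fernanda castro", "maria fernanda castro lopez"
print(b.startswith(a) and set(a.split()) <= set(b.split()))  # True

# Rule 2: first and last name swapped -> identical word sets.
print(set("lindner jan".split()) == set("jan lindner".split()))  # True

# Rule 3: Levenshtein distance <= 1, catching single OCR typos.
print(editdistance.eval("jan lindner", "jan lindener"))  # 1 -> match
print(editdistance.eval("jan lindner", "john lindner"))  # 2 -> no match
```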
diff --git a/code/scripts/find_most_common_word_nokeyword.py b/code/scripts/find_most_common_word_nokeyword.py
index 3a0f3aa..8520f6b 100644
--- a/code/scripts/find_most_common_word_nokeyword.py
+++ b/code/scripts/find_most_common_word_nokeyword.py
@@ -1,31 +1,34 @@
+"""Finds the most common words in the descriptions of participants for which we didn't find a role.
+"""
+
 import pandas as pd
 
 complete_data = pd.read_csv("../results/complete_dataset.csv", encoding="utf-8-sig")
 no_keyword_participants = complete_data.loc[complete_data["role"] == "no keyword found"]
 no_keyword_participants = no_keyword_participants.loc[no_keyword_participants["affiliation_category"] == "parties"]
 
 words_dict = dict()
 line_dict = dict()
 for description in no_keyword_participants["description"]:
     description = description.replace(";", " ")
     # descriptions
     if description in line_dict:
         line_dict[description] += 1
     else:
         line_dict[description] = 1
     # words
     for word in description.split(" "):
         if word in words_dict:
             words_dict[word] += 1
         else:
             words_dict[word] = 1
 
 sorted_word_dict = sorted(words_dict.items(), key=lambda x: x[1], reverse=True)
 sorted_line_dict = sorted(line_dict.items(), key=lambda x: x[1], reverse=True)
 
 print("Most common words:")
 print(sorted_word_dict[:100])
 print("Most common lines:")
 print(sorted_line_dict[:100])
\ No newline at end of file
diff --git a/code/scripts/find_number_of_participants_per_party.py b/code/scripts/find_number_of_participants_per_party.py
index c2f3baa..bf08e55 100644
--- a/code/scripts/find_number_of_participants_per_party.py
+++ b/code/scripts/find_number_of_participants_per_party.py
@@ -1,20 +1,24 @@
+"""Creates a file that contains all parties and the number of participants of each party at a meeting.
+To change the meeting, change the variable 'filename' below.
+"""
+
 import pandas as pd
 
 filename = "../results/participants-csv/participants_cop25.csv"
 
 participants = pd.read_csv(filename, encoding="utf-8-sig")
 grouped = participants.groupby("affiliation")
 
 country_and_nb_part = pd.DataFrame(columns={"country", "nb of participants"})
 for affiliation, people in grouped:
     df = pd.DataFrame(people)
     if df["affiliation_category"].iloc[0] == "parties":
         print(f"{affiliation} : {len(people)}")
         country_and_nb_part = country_and_nb_part.append({
             "country": affiliation,
             "nb of participants": len(people)
         }, ignore_index=True)
 
 country_and_nb_part.to_csv("cop25_per_party.csv", encoding="utf-8-sig", mode="w", index=False)
\ No newline at end of file
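Instead of editing `filename` by hand, the meeting label could be taken from the command line, as `find_biggest_delegations.py` already does. A possible sketch (not the repo's code; assumes the same column names):

```python
import sys

import pandas as pd

# Sketch: take the meeting label as an argument instead of editing the file.
label = sys.argv[1] if len(sys.argv) > 1 else "cop25"
participants = pd.read_csv(f"../results/participants-csv/participants_{label}.csv",
                           encoding="utf-8-sig")
parties = participants[participants["affiliation_category"] == "parties"]
counts = parties.groupby("affiliation").size().rename("nb of participants")
counts.to_csv(f"{label}_per_party.csv", encoding="utf-8-sig")
```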
diff --git a/code/scripts/generate_complete_dataset.py b/code/scripts/generate_complete_dataset.py
index 9fc430e..28bae75 100644
--- a/code/scripts/generate_complete_dataset.py
+++ b/code/scripts/generate_complete_dataset.py
@@ -1,207 +1,207 @@
+"""This script generates one csv file containing all the participants of all
+the available meetings (information taken from the metadata file).
+"""
+
 import pandas as pd
 import country_converter as coco
 import editdistance
 import os
 import re
 import partlistproc.MeetingAnalyzer as Ana
 import partlistproc.MeetingAnalyzerFactory as AnaFac
 
-""" This script generates one csv file containing all the participants of all
-    the available meetings (information taken from metadata file)
-"""
-
 def is_male(name):
     return any(title in name for title in Ana.MeetingAnalyzer.masculine_salutory_addresses)
 
 def is_female(name):
     return any(title in name for title in Ana.MeetingAnalyzer.feminine_salutory_addresses)
 
 def has_title(name):
     return any(title in name for title in Ana.MeetingAnalyzer.titles)
 
 def has_no_title(name):
     return not has_title(name)
 
 def get_role(description):
     # EDIT: redesigned to make keywords of several words possible (28.11.20)
     description = str(description)
     if description == "nan":
         return "no description"
     splitted = re.split('[; ]{1}', description)
     for key_line in roles_dict.keys():
         keywords = re.split(" ", key_line)
         if str(keywords[0]) in splitted or str(keywords[0]).lower() in splitted:
             if len(keywords) == 1:
                 return roles_dict[key_line]
             else:
                 # the keyword consists of several words: check that the others follow in sequence
                 found_word = str(keywords[0])
                 if found_word not in splitted:
                     found_word = found_word.lower()
                 index = splitted.index(found_word)
                 size = len(splitted)
                 for i in range(1, len(keywords)):
                     if index + i >= size or str(keywords[i]).lower() != str(splitted[index + i]).lower():
                         break
                 else:
                     return roles_dict[key_line]
     return "no keyword found"
 
 def clear_name(name):
     """Removes all salutory addresses and titles from a given name.
 
     Args:
         name (str): the name to be cleared
     """
     cleared_name = name
     while cleared_name.startswith(Ana.MeetingAnalyzer.salutory_addresses):
         startindex = cleared_name.find(" ")
         if startindex == -1:
             return cleared_name
         startindex += 1
         cleared_name = cleared_name[startindex:]
     return cleared_name.lower()
 
 short_country_names = dict()
 short_country_names["european union"] = "European Union"
 short_country_names["european community"] = "European Union"
 
 def simplify_country_name(affiliation):
     if affiliation in short_country_names:
         return short_country_names[affiliation]
     else:
         # not_found=None makes the converter return the input value when no match is found
         converted = coco.convert(names=[affiliation], to="name_short", not_found=None)
         if isinstance(converted, list):
             converted = converted[0]
         short_country_names[affiliation] = converted
         return converted
 
 def is_fossil_fuel_associated(words):
     """Checks if the given string contains a fossil fuel industry keyword.
 
     Args:
         words (str): the string to be tested for keywords
     """
     splitted = re.split('[; ]{1}', (str(words)).lower())
     for keyword in fossil_fuel_keywords:
         if keyword in splitted:
             return True
     return False
 
 # pre-processing
 # extract the list of roles
 roles_dict = {}
 roles_file = open("../data/dictionaries/role_keywords.txt", "r", encoding="utf-8")
 role_lines = roles_file.readlines()
 current_role = ""
 for line in role_lines:
     if "\n" in line:
         line = line[:line.index("\n")]
     if line.startswith("["):
         if not line.endswith("]"):
             raise KeyError("Format on line {} was incorrect".format(line))
         current_role = line[1:len(line) - 1]
     else:
         if line != "":
             roles_dict[line] = current_role
 
 # extract the list of fossil fuel industry keywords
 fossil_fuel_keywords = []
 ff_file = open("../data/dictionaries/fossil_fuel_industry_keywords.txt", "r", encoding="utf-8")
 ff_lines = ff_file.readlines()
 for line in ff_lines:
     if "\n" in line:
         line = line[:line.index("\n")]
     if line != "":
         fossil_fuel_keywords.append(line.lower())
 
 # begin with the real processing
 complete_data = pd.DataFrame(columns={
     "meeting", "name", "gender", "has_title", "affiliation",
     "affiliation_category", "role", "fossil_fuel_industry", "description"})
 
 metadata = pd.read_csv("../data/meetings_metadata.csv")
 for label in metadata["label"]:
     datafile_name = "../results/participants-csv/participants_" + label + ".csv"
     if label in AnaFac.MeetingAnalyzerFactory.french_meetings:
         datafile_name = "../results/participants-csv/participants_" + label + "-en.csv"
         if not os.path.isfile(datafile_name):
             os.system("python extract_participants.py " + label)
             os.system("python translate_list_fr_en.py " + label)
 
     # generate the data if not yet available
     if not os.path.isfile(datafile_name):
         os.system("python extract_participants.py " + label)
 
     # open the data from this meeting
     cop_data = pd.read_csv(datafile_name, encoding="utf-8-sig")
 
     # add its data to the complete dataframe
     cop_data["meeting"] = label
 
     # determine gender
     cop_data.loc[cop_data.name.apply(is_male), "gender"] = "m"
     cop_data.loc[cop_data.name.apply(is_female), "gender"] = "f"
 
     # determine title (if any)
     cop_data.loc[cop_data.name.apply(has_title), "has_title"] = 1
     cop_data.loc[cop_data.name.apply(has_no_title), "has_title"] = 0
 
     # define the role
     cop_data["role"] = cop_data["description"].apply(get_role)
 
     # define the association to the fossil fuel industry
     cop_data["fossil_fuel_industry"] = 0
     cop_data.loc[cop_data.description.apply(is_fossil_fuel_associated), "fossil_fuel_industry"] = 1
     cop_data.loc[cop_data.affiliation.apply(is_fossil_fuel_associated), "fossil_fuel_industry"] = 1
 
     # clear up the name
     cop_data["name"] = cop_data["name"].apply(clear_name)
 
     # unify the country names
     is_party = cop_data.affiliation_category.apply(lambda p: p == "parties")
     cop_data.loc[is_party, "affiliation"] = cop_data.loc[is_party, "affiliation"].apply(simplify_country_name)
 
     print(label)
     print(cop_data[:5])
     complete_data = complete_data.append(cop_data, ignore_index=True)
 
 # only for a short time
 grouped_by_role = complete_data.groupby("role")
 for role, rest in grouped_by_role:
     print(f"{role}: {len(rest)} participants found")
 
 print(f"Country names map of length {len(short_country_names)}")
 print(short_country_names)
 short_country_names_cleaned = {k: v for (k, v) in short_country_names.items() if k != v}
 country_set = set(short_country_names_cleaned.values())
 print(f"Set of length {len(country_set)}")
 print(country_set)
 
 f = open("../data/dictionaries/valid_countries.txt", "w")
 for country in country_set:
     if country != "Vatican":
         f.write(str(country) + "\n")
 f.write("european union")
 f.close()
 
 # generate the output file
 complete_data.to_csv("../results/complete_dataset.csv", encoding="utf-8-sig", index=False)
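The caching around `coco.convert` in `simplify_country_name` matters because each conversion is relatively slow. A brief illustration of the call itself (the inputs are examples; the exact short names depend on the `country_converter` version):

```python
import country_converter as coco

# Example of the conversion used in simplify_country_name.
# not_found=None makes the converter return the input unchanged
# when it cannot match the name.
print(coco.convert(names=["Republic of Korea"], to="name_short", not_found=None))
# -> "South Korea"
print(coco.convert(names=["Kingdom of Atlantis"], to="name_short", not_found=None))
# -> "Kingdom of Atlantis" (unmatched input is passed through)
```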
diff --git a/code/scripts/generate_plots.py b/code/scripts/generate_plots.py
index a79c869..00111d7 100644
--- a/code/scripts/generate_plots.py
+++ b/code/scripts/generate_plots.py
@@ -1,32 +1,35 @@
+"""Generates the plots whose exact specifications are given in the subdirectory 'plots'.
+"""
+
 import os
 import matplotlib.pyplot as plt
 import pandas as pd
 
 import plots.plot_categories as plot_categories
 import plots.plot_fossil_fuel_industry as plot_fossil_fuel_industry
 import plots.plot_government as plot_government
 import plots.plot_experience as plot_experience
 import plots.plot_missing_participants as plot_missing_participants
 import plots.plot_gender_rate as plot_gender_rate
 import plots.plot_delegation_sizes as plot_delegation_sizes
 import plots.plot_overall_experience_distr as plot_overall_experience_distr
 import plots.plot_delegation_exp as plot_delegation_exp
 import plots.plot_intervention_distr as plot_intervention_distr
 import plots.plot_detailed_experience as plot_detailed_experience
 import plots.plot_participant_graph as plot_participant_graph
 import plots.plot_RMSE_results as plot_RMSE_results
 
 plot_overall_experience_distr.plot("../results/complete_dataset_experience-def.csv")
 plot_RMSE_results.plot({"Baseline Zero": 9.54, "Baseline Average": 5.02, "Ridge Regression": 5.01, "Two-Step Model": 4.94})
 plot_participant_graph.plot("../results/experience_dict_def.txt")
 plot_intervention_distr.plot("../data/data_regression/dataset_interventions.csv")
 plot_fossil_fuel_industry.plot("../results/complete_dataset.csv")
 plot_gender_rate.plot("../results/complete_dataset.csv")
 plot_detailed_experience.plot("../results/complete_dataset_experience-def.csv")
 plot_experience.plot("../results/complete_dataset_experience-def.csv")
 plot_government.plot("../results/complete_dataset.csv")
 plot_categories.plot("../results/participants-csv/participants_")
 plot_missing_participants.plot("../results/participants-csv/participants_cop")
 plot_delegation_exp.plot("../results/complete_dataset_experience-def.csv")
 plot_delegation_sizes.plot("../results/complete_dataset.csv")
diff --git a/code/scripts/process_participant_experience.py b/code/scripts/process_participant_experience.py
index c655f0f..395e362 100644
--- a/code/scripts/process_participant_experience.py
+++ b/code/scripts/process_participant_experience.py
@@ -1,28 +1,30 @@
-import json
+"""Processes some information about the experience, e.g., the top ten most experienced participants.
+"""
+
+import json
 
 def contains_duplicate_meetings(meeting_list):
     meetings = set()
     label_list = [x[0] for x in meeting_list]
     for label in label_list:
         if label in meetings:
             return True
         meetings.add(label)
     return False
 
 f = open("../results/experience_dict_def.txt", "r", encoding="utf-8")
 text = f.read()
 names = json.loads(text)
 
 res = [(key, val) for key, val in sorted(names.items(), key=lambda ele: len(ele[1]), reverse=True) if not contains_duplicate_meetings(val)]
 
 print("The top ten most experienced participants:")
 for i in range(10):
     name, participations = res[i]
     cop = len([x for x in participations if x[0].startswith("cop")])
     sb = len([x for x in participations if x[0].startswith("sb")])
     aff = participations[len(participations) - 1][2]
     print(f"{name}: {aff} ({cop} COP, {sb} SB)")
 
 over_27_meetings = [(k, v) for k, v in res if len(v) >= 27]
 print(f"{len(over_27_meetings)} participants have participated in at least half of all the meetings")
\ No newline at end of file
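The `experience_dict_def.txt` consumed here is the JSON dump written by `find_experience.py`: a map from a name to its list of `(meeting, name, affiliation, affiliation_category)` entries (tuples become JSON arrays). A minimal sketch of querying it (the participant name is made up):

```python
import json

# Sketch: look up one (made-up) participant in the experience dictionary
# written by find_experience.py. Each value is a list of
# [meeting, name, affiliation, affiliation_category] entries.
with open("../results/experience_dict_def.txt", "r", encoding="utf-8") as f:
    names = json.load(f)

entries = names.get("jan lindner", [])
for meeting, name, affiliation, category in entries:
    print(f"{meeting}: {affiliation} ({category})")
```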
diff --git a/code/scripts/sort_final_dataset.py b/code/scripts/sort_final_dataset.py
index 32c3403..df0ae76 100644
--- a/code/scripts/sort_final_dataset.py
+++ b/code/scripts/sort_final_dataset.py
@@ -1,13 +1,16 @@
+"""Takes the complete dataset and sorts the columns into a specified order.
+"""
+
 import pandas as pd
 
 data = pd.read_csv("../results/complete_dataset_experience-def.csv", encoding="utf-8-sig")
 
 data = data[["meeting", "affiliation_category", "affiliation", "name",
              "description", "gender", "has_title", "role", "fossil_fuel_industry",
              "experience cop", "experience sb", "experience party",
              "experience not_party", "experience possible error"]]
 
 data.to_csv("participants.csv", encoding="utf-8-sig", index=False)
diff --git a/code/scripts/translate_list_fr_en.py b/code/scripts/translate_list_fr_en.py
index e6c9635..bb6ba2b 100644
--- a/code/scripts/translate_list_fr_en.py
+++ b/code/scripts/translate_list_fr_en.py
@@ -1,39 +1,40 @@
+"""Translates affiliations and affiliation categories of
+a meeting csv from French to English.
+"""
+
 import sys
 import pandas as pd
 
 def translate_affiliation_and_category(row):
     new_aff = country_translations.get(row["affiliation"], row["affiliation"])
     return pd.Series([row["name"], new_aff, row["affiliation_category"], row["description"]],
                      index=["name", "affiliation", "affiliation_category", "description"])
 
-""" translates affiliations and affiliation categories of
-    a cop csv from french to english
-"""
-
 args = sys.argv
 if len(args) != 2:
     sys.exit("Please provide one argument that contains the label \
of the participant list to translate")
 filename = "../results/participants-csv/participants_" + args[1] + ".csv"
 
 # get the translations for the countries
 country_translations = dict()
 country_translations_df = pd.read_csv("../data/dictionaries/countries_french.csv")
 for index, row in country_translations_df.iterrows():
     country_clean = row["fr"].lower()
     country_clean = country_clean.replace("le ", "").replace("la ", "").replace("les ", "").replace("l'", "")
     country_clean = country_clean.replace("é", "e").replace("è", "e").replace("ê", "e").replace("ï", "i").replace("ô", "o")
     if "(" in country_clean:
         country_clean = country_clean[:country_clean.index("(")-1]
     country_translations[country_clean] = row["en"].lower()
 print(country_translations)
 
 cop_data = pd.read_csv(filename, encoding="utf-8-sig")
 cop_data = cop_data.apply(translate_affiliation_and_category, axis=1)
 
 filename = filename[:filename.index(".csv")]
 filename += "-en.csv"
 cop_data.to_csv(filename, encoding="utf-8-sig", index=False)
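The chained `replace` calls above only cover the accents that happen to occur in `countries_french.csv`. A more general alternative is Unicode normalization from the standard library; a hedged sketch (not the repo's code):

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Remove all combining accents, e.g. 'Pérou' -> 'Perou'."""
    # NFD decomposes accented characters into base + combining marks,
    # which can then be filtered out.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_accents("république démocratique du congo"))
# -> "republique democratique du congo"
```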