Page MenuHomec4science

01_oacct_countries.md
No OneTemporary

File Metadata

Created
Mon, Jul 29, 04:18

01_oacct_countries.md

# Projet Open Access Compliance Check Tool (OACCT)
Projet P5 de la bibliothèque de l'EPFL en collaboration avec les bibliothèques des Universités de Genève, Lausanne et Berne : https://www.swissuniversities.ch/themen/digitalisierung/p-5-wissenschaftliche-information/projekte/swiss-mooc-service-1-1-1-1
Ce notebook permet d'extraire les données choisis parmis les sources obtenues par API et les traiter pour les rendre exploitables dans l'application OACCT.
Auteur : **Pablo Iriarte**, Université de Genève (pablo.iriarte@unige.ch)
Date de dernière mise à jour : 16.07.2021
```python
import pandas as pd
import csv
import json
import numpy as np
```
## Table Countries
```python
# La table a été corrigée pour ajouter la valeur manquante à la fin :
# International Agency International Agency OI INT 999
country = pd.read_csv('iso_3166.txt', encoding='utf-8', header=0, sep='\t', na_filter=False)
country
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>English short name</th>
<th>French short name</th>
<th>Alpha-2 code</th>
<th>Alpha-3 code</th>
<th>Numeric</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Afghanistan</td>
<td>Afghanistan (l')</td>
<td>AF</td>
<td>AFG</td>
<td>4</td>
</tr>
<tr>
<td>1</td>
<td>Albania</td>
<td>Albanie (l')</td>
<td>AL</td>
<td>ALB</td>
<td>8</td>
</tr>
<tr>
<td>2</td>
<td>Algeria</td>
<td>Algérie (l')</td>
<td>DZ</td>
<td>DZA</td>
<td>12</td>
</tr>
<tr>
<td>3</td>
<td>American Samoa</td>
<td>Samoa américaines (les)</td>
<td>AS</td>
<td>ASM</td>
<td>16</td>
</tr>
<tr>
<td>4</td>
<td>Andorra</td>
<td>Andorre (l')</td>
<td>AD</td>
<td>AND</td>
<td>20</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>245</td>
<td>Yemen</td>
<td>Yémen (le)</td>
<td>YE</td>
<td>YEM</td>
<td>887</td>
</tr>
<tr>
<td>246</td>
<td>Zambia</td>
<td>Zambie (la)</td>
<td>ZM</td>
<td>ZMB</td>
<td>894</td>
</tr>
<tr>
<td>247</td>
<td>Zimbabwe</td>
<td>Zimbabwe (le)</td>
<td>ZW</td>
<td>ZWE</td>
<td>716</td>
</tr>
<tr>
<td>248</td>
<td>Åland Islands</td>
<td>Åland(les Îles)</td>
<td>AX</td>
<td>ALA</td>
<td>248</td>
</tr>
<tr>
<td>249</td>
<td>International Agency</td>
<td>International Agency</td>
<td>OI</td>
<td>INT</td>
<td>999</td>
</tr>
</tbody>
</table>
<p>250 rows × 5 columns</p>
</div>
```python
country.loc[country['Alpha-2 code'].isnull()]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>English short name</th>
<th>French short name</th>
<th>Alpha-2 code</th>
<th>Alpha-3 code</th>
<th>Numeric</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
```python
# convertir l'index en id
country = country.reset_index()
country
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>index</th>
<th>English short name</th>
<th>French short name</th>
<th>Alpha-2 code</th>
<th>Alpha-3 code</th>
<th>Numeric</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>Afghanistan</td>
<td>Afghanistan (l')</td>
<td>AF</td>
<td>AFG</td>
<td>4</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>Albania</td>
<td>Albanie (l')</td>
<td>AL</td>
<td>ALB</td>
<td>8</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>Algeria</td>
<td>Algérie (l')</td>
<td>DZ</td>
<td>DZA</td>
<td>12</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
<td>American Samoa</td>
<td>Samoa américaines (les)</td>
<td>AS</td>
<td>ASM</td>
<td>16</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>Andorra</td>
<td>Andorre (l')</td>
<td>AD</td>
<td>AND</td>
<td>20</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>245</td>
<td>245</td>
<td>Yemen</td>
<td>Yémen (le)</td>
<td>YE</td>
<td>YEM</td>
<td>887</td>
</tr>
<tr>
<td>246</td>
<td>246</td>
<td>Zambia</td>
<td>Zambie (la)</td>
<td>ZM</td>
<td>ZMB</td>
<td>894</td>
</tr>
<tr>
<td>247</td>
<td>247</td>
<td>Zimbabwe</td>
<td>Zimbabwe (le)</td>
<td>ZW</td>
<td>ZWE</td>
<td>716</td>
</tr>
<tr>
<td>248</td>
<td>248</td>
<td>Åland Islands</td>
<td>Åland(les Îles)</td>
<td>AX</td>
<td>ALA</td>
<td>248</td>
</tr>
<tr>
<td>249</td>
<td>249</td>
<td>International Agency</td>
<td>International Agency</td>
<td>OI</td>
<td>INT</td>
<td>999</td>
</tr>
</tbody>
</table>
<p>250 rows × 6 columns</p>
</div>
```python
country['id'] = country['index'] + 1
del country['index']
del country['French short name']
del country['Alpha-3 code']
del country['Numeric']
country
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>English short name</th>
<th>Alpha-2 code</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Afghanistan</td>
<td>AF</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>Albania</td>
<td>AL</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>Algeria</td>
<td>DZ</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>American Samoa</td>
<td>AS</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>Andorra</td>
<td>AD</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>245</td>
<td>Yemen</td>
<td>YE</td>
<td>246</td>
</tr>
<tr>
<td>246</td>
<td>Zambia</td>
<td>ZM</td>
<td>247</td>
</tr>
<tr>
<td>247</td>
<td>Zimbabwe</td>
<td>ZW</td>
<td>248</td>
</tr>
<tr>
<td>248</td>
<td>Åland Islands</td>
<td>AX</td>
<td>249</td>
</tr>
<tr>
<td>249</td>
<td>International Agency</td>
<td>OI</td>
<td>250</td>
</tr>
</tbody>
</table>
<p>250 rows × 3 columns</p>
</div>
```python
# renommer les colonnes
country = country.rename(columns={'Alpha-2 code' : 'iso_code', 'English short name' : 'name'})
```
```python
# ajout de la valeur UNKNOWN
country = country.append({'id' : 999999, 'iso_code' : '__', 'name' : 'UNKNOWN'}, ignore_index=True)
```
```python
country
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>iso_code</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Afghanistan</td>
<td>AF</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>Albania</td>
<td>AL</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>Algeria</td>
<td>DZ</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>American Samoa</td>
<td>AS</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>Andorra</td>
<td>AD</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>246</td>
<td>Zambia</td>
<td>ZM</td>
<td>247</td>
</tr>
<tr>
<td>247</td>
<td>Zimbabwe</td>
<td>ZW</td>
<td>248</td>
</tr>
<tr>
<td>248</td>
<td>Åland Islands</td>
<td>AX</td>
<td>249</td>
</tr>
<tr>
<td>249</td>
<td>International Agency</td>
<td>OI</td>
<td>250</td>
</tr>
<tr>
<td>250</td>
<td>UNKNOWN</td>
<td>__</td>
<td>999999</td>
</tr>
</tbody>
</table>
<p>251 rows × 3 columns</p>
</div>
```python
# esport JSON
result = country.to_json(orient='records', force_ascii=False)
parsed = json.loads(result)
with open('sample/country.json', 'w', encoding='utf-8') as file:
json.dump(parsed, file, indent=2, ensure_ascii=False)
```
```python
# export csv
country.to_csv('sample/country.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export csv
country.to_csv('country.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export excel
country.to_excel('sample/country.xlsx', index=False)
```

Event Timeline