Page MenuHomec4science

07_oacct_sherpa_publishers.md
No OneTemporary

File Metadata

Created
Thu, Nov 7, 06:27

07_oacct_sherpa_publishers.md

# Projet Open Access Compliance Check Tool (OACCT)
Projet P5 de la bibliothèque de l'EPFL en collaboration avec les bibliothèques des Universités de Genève, Lausanne et Berne : https://www.swissuniversities.ch/themen/digitalisierung/p-5-wissenschaftliche-information/projekte/swiss-mooc-service-1-1-1-1
Ce notebook permet d'extraire les données choisis parmis les sources obtenues par API et les traiter pour les rendre exploitables dans l'application OACCT.
Auteur : **Pablo Iriarte**, Université de Genève (pablo.iriarte@unige.ch)
Date de dernière mise à jour : 16.07.2021
## Table Journals Publishers : ajout des informations de Sherpa
```python
import pandas as pd
import csv
import json
import numpy as np
```
```python
publishers_issn = pd.read_csv('sample/publishers_brut.tsv', encoding='utf-8', header=0, sep='\t')
publishers_issn
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher_id</th>
<th>name</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>The_Global_Studies_Institute_de_lUniversité_d...</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia_La_Commerciale</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
</tr>
<tr>
<td>377</td>
<td>Red.:_Prof._Dr._F._Cavalli,_Istituto_oncologic...</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta_Medica</td>
<td>Excerpta Medica</td>
<td>379</td>
</tr>
<tr>
<td>379</td>
<td>Generative_Grammar_Group_of_the_Department_of_...</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
</tr>
<tr>
<td>380</td>
<td>999999</td>
<td>UNKNOWN</td>
<td>999999</td>
</tr>
</tbody>
</table>
<p>381 rows × 3 columns</p>
</div>
```python
# import ids
publisher_ids = pd.read_csv('sample/journals_publishers_ids.tsv', encoding='utf-8', header=0, sep='\t')
publisher_ids
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>publisher</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>997</td>
<td>47</td>
</tr>
<tr>
<td>941</td>
<td>998</td>
<td>75</td>
</tr>
<tr>
<td>942</td>
<td>999</td>
<td>8</td>
</tr>
<tr>
<td>943</td>
<td>1000</td>
<td>119</td>
</tr>
<tr>
<td>944</td>
<td>1001</td>
<td>217</td>
</tr>
</tbody>
</table>
<p>945 rows × 2 columns</p>
</div>
```python
# renommage id
publisher_ids = publisher_ids.rename(columns = {'id': 'journal'})
publisher_ids = publisher_ids.rename(columns = {'publisher': 'id'})
```
```python
# dédoublonage par publisher id
publisher_ids_dedup = publisher_ids.drop_duplicates(subset='id')
publisher_ids_dedup
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>929</td>
<td>987</td>
<td>376</td>
</tr>
<tr>
<td>930</td>
<td>987</td>
<td>377</td>
</tr>
<tr>
<td>932</td>
<td>989</td>
<td>378</td>
</tr>
<tr>
<td>934</td>
<td>991</td>
<td>379</td>
</tr>
<tr>
<td>937</td>
<td>994</td>
<td>380</td>
</tr>
</tbody>
</table>
<p>380 rows × 2 columns</p>
</div>
```python
# merge avec journals
publisher = pd.merge(publishers_issn, publisher_ids_dedup, on='id', how='left')
publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher_id</th>
<th>name</th>
<th>id</th>
<th>journal</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
<td>1.0</td>
</tr>
<tr>
<td>1</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
<td>2</td>
<td>2.0</td>
</tr>
<tr>
<td>2</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
<td>3</td>
<td>3.0</td>
</tr>
<tr>
<td>3</td>
<td>The_Global_Studies_Institute_de_lUniversité_d...</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
<td>4.0</td>
</tr>
<tr>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
<td>4.0</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia_La_Commerciale</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
<td>987.0</td>
</tr>
<tr>
<td>377</td>
<td>Red.:_Prof._Dr._F._Cavalli,_Istituto_oncologic...</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
<td>989.0</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta_Medica</td>
<td>Excerpta Medica</td>
<td>379</td>
<td>991.0</td>
</tr>
<tr>
<td>379</td>
<td>Generative_Grammar_Group_of_the_Department_of_...</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
<td>994.0</td>
</tr>
<tr>
<td>380</td>
<td>999999</td>
<td>UNKNOWN</td>
<td>999999</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>381 rows × 4 columns</p>
</div>
```python
# ajout des valeurs de sherpa
publisher_sherpa = pd.read_csv('sample/publisher_sherpa.tsv', encoding='utf-8', header=0, sep='\t')
publisher_sherpa
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id</th>
<th>name</th>
<th>country</th>
<th>type</th>
<th>url</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>532</td>
<td>45</td>
<td>John Wiley and Sons</td>
<td>gb</td>
<td>former_publisher</td>
<td>http://www.wiley.com/</td>
</tr>
<tr>
<td>1</td>
<td>498</td>
<td>4</td>
<td>American Chemical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://pubs.acs.org/</td>
</tr>
<tr>
<td>2</td>
<td>789</td>
<td>126</td>
<td>Acoustical Society of America</td>
<td>us</td>
<td>society_publisher</td>
<td>http://acousticalsociety.org/</td>
</tr>
<tr>
<td>3</td>
<td>166</td>
<td>3291</td>
<td>Springer</td>
<td>gb</td>
<td>commercial_publisher</td>
<td>https://www.springernature.com/gp/products/jou...</td>
</tr>
<tr>
<td>4</td>
<td>807</td>
<td>3291</td>
<td>Springer</td>
<td>gb</td>
<td>commercial_publisher</td>
<td>https://www.springernature.com/gp/products/jou...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>803</td>
<td>870</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>804</td>
<td>41</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>805</td>
<td>80</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>806</td>
<td>533</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>807</td>
<td>608</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
</tbody>
</table>
<p>808 rows × 6 columns</p>
</div>
```python
# renommage ids
publisher_sherpa = publisher_sherpa.rename(columns = {'publisher_id': 'publisher_id_sherpa', 'url': 'website_sherpa', 'country': 'iso_code'})
```
```python
# merge avec ids journals
publisher = pd.merge(publisher, publisher_sherpa, on='journal', how='left')
publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher_id</th>
<th>name_x</th>
<th>id</th>
<th>journal</th>
<th>publisher_id_sherpa</th>
<th>name_y</th>
<th>iso_code</th>
<th>type</th>
<th>website_sherpa</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
<td>2</td>
<td>2.0</td>
<td>10.0</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>2</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
<td>3</td>
<td>3.0</td>
<td>112.0</td>
<td>Public Library of Science</td>
<td>us</td>
<td>commercial_publisher</td>
<td>http://www.plos.org/</td>
</tr>
<tr>
<td>3</td>
<td>The_Global_Studies_Institute_de_lUniversité_d...</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
<td>4.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
<td>4.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia_La_Commerciale</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
<td>987.0</td>
<td>3291.0</td>
<td>Springer</td>
<td>gb</td>
<td>commercial_publisher</td>
<td>https://www.springernature.com/gp/products/jou...</td>
</tr>
<tr>
<td>377</td>
<td>Red.:_Prof._Dr._F._Cavalli,_Istituto_oncologic...</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
<td>989.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta_Medica</td>
<td>Excerpta Medica</td>
<td>379</td>
<td>991.0</td>
<td>30.0</td>
<td>Elsevier</td>
<td>us</td>
<td>commercial_publisher</td>
<td>http://www.elsevier.com/</td>
</tr>
<tr>
<td>379</td>
<td>Generative_Grammar_Group_of_the_Department_of_...</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
<td>994.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>380</td>
<td>999999</td>
<td>UNKNOWN</td>
<td>999999</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>381 rows × 9 columns</p>
</div>
```python
# renommage names
publisher = publisher.rename(columns = {'name_x': 'name_issn', 'name_y': 'name_sherpa'})
```
```python
# ajout des informations à partir des revues
publisher_journals = pd.read_csv('sample/journals_publishers_brut.tsv', encoding='utf-8', header=0, sep='\t', usecols=['id', 'url'])
publisher_journals
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>url</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>http://prl.aps.org/</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>http://www.plosone.org/</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>NaN</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>http://ojps.aip.org/prbo/</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>NaN</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>http://www.jpedsurg.org</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>http://www.springerlink.com/content/100451</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>NaN</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>https://www.physiology.org/journal/jappl</td>
</tr>
</tbody>
</table>
<p>911 rows × 2 columns</p>
</div>
```python
# renommage id
publisher_journals = publisher_journals.rename(columns = {'id': 'journal'})
```
```python
# merge avec ids journals
publisher = pd.merge(publisher, publisher_journals, on='journal', how='left')
publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher_id</th>
<th>name_issn</th>
<th>id</th>
<th>journal</th>
<th>publisher_id_sherpa</th>
<th>name_sherpa</th>
<th>iso_code</th>
<th>type</th>
<th>website_sherpa</th>
<th>url</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
<td>2</td>
<td>2.0</td>
<td>10.0</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
<td>http://prl.aps.org/</td>
</tr>
<tr>
<td>2</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
<td>3</td>
<td>3.0</td>
<td>112.0</td>
<td>Public Library of Science</td>
<td>us</td>
<td>commercial_publisher</td>
<td>http://www.plos.org/</td>
<td>http://www.plosone.org/</td>
</tr>
<tr>
<td>3</td>
<td>The_Global_Studies_Institute_de_lUniversité_d...</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
<td>4.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
<td>4.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia_La_Commerciale</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
<td>987.0</td>
<td>3291.0</td>
<td>Springer</td>
<td>gb</td>
<td>commercial_publisher</td>
<td>https://www.springernature.com/gp/products/jou...</td>
<td>NaN</td>
</tr>
<tr>
<td>377</td>
<td>Red.:_Prof._Dr._F._Cavalli,_Istituto_oncologic...</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
<td>989.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta_Medica</td>
<td>Excerpta Medica</td>
<td>379</td>
<td>991.0</td>
<td>30.0</td>
<td>Elsevier</td>
<td>us</td>
<td>commercial_publisher</td>
<td>http://www.elsevier.com/</td>
<td>NaN</td>
</tr>
<tr>
<td>379</td>
<td>Generative_Grammar_Group_of_the_Department_of_...</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
<td>994.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>380</td>
<td>999999</td>
<td>UNKNOWN</td>
<td>999999</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>381 rows × 10 columns</p>
</div>
```python
# renommage names
del publisher['publisher_id']
del publisher['publisher_id_sherpa']
del publisher['type']
publisher = publisher.rename(columns = {'url' : 'website_issn_journal'})
publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name_issn</th>
<th>id</th>
<th>journal</th>
<th>name_sherpa</th>
<th>iso_code</th>
<th>website_sherpa</th>
<th>website_issn_journal</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>American Physical Society</td>
<td>2</td>
<td>2.0</td>
<td>American Physical Society</td>
<td>us</td>
<td>http://www.aps.org/</td>
<td>http://prl.aps.org/</td>
</tr>
<tr>
<td>2</td>
<td>Public Library of Science</td>
<td>3</td>
<td>3.0</td>
<td>Public Library of Science</td>
<td>us</td>
<td>http://www.plos.org/</td>
<td>http://www.plosone.org/</td>
</tr>
<tr>
<td>3</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
<td>4.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>4</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
<td>4.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
<td>987.0</td>
<td>Springer</td>
<td>gb</td>
<td>https://www.springernature.com/gp/products/jou...</td>
<td>NaN</td>
</tr>
<tr>
<td>377</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
<td>989.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta Medica</td>
<td>379</td>
<td>991.0</td>
<td>Elsevier</td>
<td>us</td>
<td>http://www.elsevier.com/</td>
<td>NaN</td>
</tr>
<tr>
<td>379</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
<td>994.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>380</td>
<td>UNKNOWN</td>
<td>999999</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>381 rows × 7 columns</p>
</div>
```python
# ajout des champs vides des vides et int
publisher['city'] = ''
publisher['state'] = ''
publisher['oa_policies'] = ''
publisher['starting_year'] = 0
publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name_issn</th>
<th>id</th>
<th>journal</th>
<th>name_sherpa</th>
<th>iso_code</th>
<th>website_sherpa</th>
<th>website_issn_journal</th>
<th>city</th>
<th>state</th>
<th>oa_policies</th>
<th>starting_year</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>American Physical Society</td>
<td>2</td>
<td>2.0</td>
<td>American Physical Society</td>
<td>us</td>
<td>http://www.aps.org/</td>
<td>http://prl.aps.org/</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>Public Library of Science</td>
<td>3</td>
<td>3.0</td>
<td>Public Library of Science</td>
<td>us</td>
<td>http://www.plos.org/</td>
<td>http://www.plosone.org/</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
<td>4.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
<td>4.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
<td>987.0</td>
<td>Springer</td>
<td>gb</td>
<td>https://www.springernature.com/gp/products/jou...</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>377</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
<td>989.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta Medica</td>
<td>379</td>
<td>991.0</td>
<td>Elsevier</td>
<td>us</td>
<td>http://www.elsevier.com/</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>379</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
<td>994.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>380</td>
<td>UNKNOWN</td>
<td>999999</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
</tbody>
</table>
<p>381 rows × 11 columns</p>
</div>
```python
# iso_code en majuscules
publisher['iso_code'] = publisher['iso_code'].str.upper()
# ajout de la valeur pour unknown
publisher['iso_code'] = publisher['iso_code'].fillna('__')
publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name_issn</th>
<th>id</th>
<th>journal</th>
<th>name_sherpa</th>
<th>iso_code</th>
<th>website_sherpa</th>
<th>website_issn_journal</th>
<th>city</th>
<th>state</th>
<th>oa_policies</th>
<th>starting_year</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
<td>1.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>American Physical Society</td>
<td>2</td>
<td>2.0</td>
<td>American Physical Society</td>
<td>US</td>
<td>http://www.aps.org/</td>
<td>http://prl.aps.org/</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>Public Library of Science</td>
<td>3</td>
<td>3.0</td>
<td>Public Library of Science</td>
<td>US</td>
<td>http://www.plos.org/</td>
<td>http://www.plosone.org/</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
<td>4.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
<td>4.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
<td>987.0</td>
<td>Springer</td>
<td>GB</td>
<td>https://www.springernature.com/gp/products/jou...</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>377</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
<td>989.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta Medica</td>
<td>379</td>
<td>991.0</td>
<td>Elsevier</td>
<td>US</td>
<td>http://www.elsevier.com/</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>379</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
<td>994.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>380</td>
<td>UNKNOWN</td>
<td>999999</td>
<td>NaN</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
</tbody>
</table>
<p>381 rows × 11 columns</p>
</div>
```python
# merge avec countries
country = pd.read_csv('sample/country.tsv', usecols=('iso_code', 'id'), encoding='utf-8', header=0, sep='\t')
country
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>iso_code</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>AF</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>AL</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>DZ</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>AS</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>AD</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>246</td>
<td>ZM</td>
<td>247</td>
</tr>
<tr>
<td>247</td>
<td>ZW</td>
<td>248</td>
</tr>
<tr>
<td>248</td>
<td>AX</td>
<td>249</td>
</tr>
<tr>
<td>249</td>
<td>OI</td>
<td>250</td>
</tr>
<tr>
<td>250</td>
<td>__</td>
<td>999999</td>
</tr>
</tbody>
</table>
<p>251 rows × 2 columns</p>
</div>
```python
country = country.rename(columns={'id': 'country'})
country
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>iso_code</th>
<th>country</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>AF</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>AL</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>DZ</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>AS</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>AD</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>246</td>
<td>ZM</td>
<td>247</td>
</tr>
<tr>
<td>247</td>
<td>ZW</td>
<td>248</td>
</tr>
<tr>
<td>248</td>
<td>AX</td>
<td>249</td>
</tr>
<tr>
<td>249</td>
<td>OI</td>
<td>250</td>
</tr>
<tr>
<td>250</td>
<td>__</td>
<td>999999</td>
</tr>
</tbody>
</table>
<p>251 rows × 2 columns</p>
</div>
```python
publisher = pd.merge(publisher, country, on='iso_code', how='left')
publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name_issn</th>
<th>id</th>
<th>journal</th>
<th>name_sherpa</th>
<th>iso_code</th>
<th>website_sherpa</th>
<th>website_issn_journal</th>
<th>city</th>
<th>state</th>
<th>oa_policies</th>
<th>starting_year</th>
<th>country</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
<td>1.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
</tr>
<tr>
<td>1</td>
<td>American Physical Society</td>
<td>2</td>
<td>2.0</td>
<td>American Physical Society</td>
<td>US</td>
<td>http://www.aps.org/</td>
<td>http://prl.aps.org/</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>236</td>
</tr>
<tr>
<td>2</td>
<td>Public Library of Science</td>
<td>3</td>
<td>3.0</td>
<td>Public Library of Science</td>
<td>US</td>
<td>http://www.plos.org/</td>
<td>http://www.plosone.org/</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>236</td>
</tr>
<tr>
<td>3</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
<td>4.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
</tr>
<tr>
<td>4</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
<td>4.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
<td>987.0</td>
<td>Springer</td>
<td>GB</td>
<td>https://www.springernature.com/gp/products/jou...</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>234</td>
</tr>
<tr>
<td>377</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
<td>989.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta Medica</td>
<td>379</td>
<td>991.0</td>
<td>Elsevier</td>
<td>US</td>
<td>http://www.elsevier.com/</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>236</td>
</tr>
<tr>
<td>379</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
<td>994.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
</tr>
<tr>
<td>380</td>
<td>UNKNOWN</td>
<td>999999</td>
<td>NaN</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
</tr>
</tbody>
</table>
<p>381 rows × 12 columns</p>
</div>
```python
# garder sherpa puis issn.org
publisher.loc[publisher['name_sherpa'].notna(), 'name'] = publisher['name_sherpa']
publisher.loc[publisher['name_sherpa'].isna(), 'name'] = publisher['name_issn']
publisher.loc[publisher['website_sherpa'].notna(), 'website'] = publisher['website_sherpa']
publisher.loc[publisher['website_sherpa'].isna(), 'website'] = publisher['website_issn_journal']
publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name_issn</th>
<th>id</th>
<th>journal</th>
<th>name_sherpa</th>
<th>iso_code</th>
<th>website_sherpa</th>
<th>website_issn_journal</th>
<th>city</th>
<th>state</th>
<th>oa_policies</th>
<th>starting_year</th>
<th>country</th>
<th>name</th>
<th>website</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
<td>1.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
<td>Revue Médicale Suisse</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>American Physical Society</td>
<td>2</td>
<td>2.0</td>
<td>American Physical Society</td>
<td>US</td>
<td>http://www.aps.org/</td>
<td>http://prl.aps.org/</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>236</td>
<td>American Physical Society</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>2</td>
<td>Public Library of Science</td>
<td>3</td>
<td>3.0</td>
<td>Public Library of Science</td>
<td>US</td>
<td>http://www.plos.org/</td>
<td>http://www.plosone.org/</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>236</td>
<td>Public Library of Science</td>
<td>http://www.plos.org/</td>
</tr>
<tr>
<td>3</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
<td>4.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>NaN</td>
</tr>
<tr>
<td>4</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
<td>4.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
<td>987.0</td>
<td>Springer</td>
<td>GB</td>
<td>https://www.springernature.com/gp/products/jou...</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>234</td>
<td>Springer</td>
<td>https://www.springernature.com/gp/products/jou...</td>
</tr>
<tr>
<td>377</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
<td>989.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>NaN</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta Medica</td>
<td>379</td>
<td>991.0</td>
<td>Elsevier</td>
<td>US</td>
<td>http://www.elsevier.com/</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>236</td>
<td>Elsevier</td>
<td>http://www.elsevier.com/</td>
</tr>
<tr>
<td>379</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
<td>994.0</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>NaN</td>
</tr>
<tr>
<td>380</td>
<td>UNKNOWN</td>
<td>999999</td>
<td>NaN</td>
<td>NaN</td>
<td>__</td>
<td>NaN</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>999999</td>
<td>UNKNOWN</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>381 rows × 14 columns</p>
</div>
```python
# garder les champs utiles pour lditeur
publisher_export = publisher[['id', 'name', 'country', 'city', 'state', 'starting_year', 'website', 'oa_policies']]
```
```python
# supprimer les doublons
publisher_export = publisher_export.drop_duplicates(subset='id')
publisher_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>country</th>
<th>city</th>
<th>state</th>
<th>starting_year</th>
<th>website</th>
<th>oa_policies</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue Médicale Suisse</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td>NaN</td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>American Physical Society</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.aps.org/</td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>Public Library of Science</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.plos.org/</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td>NaN</td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td>NaN</td>
<td></td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>377</td>
<td>Springer</td>
<td>234</td>
<td></td>
<td></td>
<td>0</td>
<td>https://www.springernature.com/gp/products/jou...</td>
<td></td>
</tr>
<tr>
<td>377</td>
<td>378</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td>NaN</td>
<td></td>
</tr>
<tr>
<td>378</td>
<td>379</td>
<td>Elsevier</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.elsevier.com/</td>
<td></td>
</tr>
<tr>
<td>379</td>
<td>380</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td>NaN</td>
<td></td>
</tr>
<tr>
<td>380</td>
<td>999999</td>
<td>UNKNOWN</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td>NaN</td>
<td></td>
</tr>
</tbody>
</table>
<p>381 rows × 8 columns</p>
</div>
```python
# remplacement des vides et id à int
publisher_export['website'] = publisher_export['website'].fillna('')
publisher_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>country</th>
<th>city</th>
<th>state</th>
<th>starting_year</th>
<th>website</th>
<th>oa_policies</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue Médicale Suisse</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>American Physical Society</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.aps.org/</td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>Public Library of Science</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.plos.org/</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>377</td>
<td>Springer</td>
<td>234</td>
<td></td>
<td></td>
<td>0</td>
<td>https://www.springernature.com/gp/products/jou...</td>
<td></td>
</tr>
<tr>
<td>377</td>
<td>378</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>378</td>
<td>379</td>
<td>Elsevier</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.elsevier.com/</td>
<td></td>
</tr>
<tr>
<td>379</td>
<td>380</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>380</td>
<td>999999</td>
<td>UNKNOWN</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p>381 rows × 8 columns</p>
</div>
```python
# merge pour avoir les titres
publisher_ids_dedup = pd.merge(publisher_ids_dedup, publisher_export[['id', 'name']], on='id', how='left')
publisher_ids_dedup
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>id</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>Revue Médicale Suisse</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>2</td>
<td>American Physical Society</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>3</td>
<td>Public Library of Science</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>4</td>
<td>The Global Studies Institute de lUniversité d...</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>5</td>
<td>Universitat de València, Departamento de Teorí...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>375</td>
<td>987</td>
<td>376</td>
<td>Springer</td>
</tr>
<tr>
<td>376</td>
<td>987</td>
<td>377</td>
<td>Springer</td>
</tr>
<tr>
<td>377</td>
<td>989</td>
<td>378</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
</tr>
<tr>
<td>378</td>
<td>991</td>
<td>379</td>
<td>Elsevier</td>
</tr>
<tr>
<td>379</td>
<td>994</td>
<td>380</td>
<td>Generative Grammar Group of the Department of ...</td>
</tr>
</tbody>
</table>
<p>380 rows × 3 columns</p>
</div>
```python
# garder les ids avant le dédoublonage pour la correction du publisher_ids_dedup
publisher_ids_dedup = publisher_ids_dedup.rename(columns = {'id': 'publisher_av_dedup'})
publisher_ids_dedup
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_av_dedup</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>Revue Médicale Suisse</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>2</td>
<td>American Physical Society</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>3</td>
<td>Public Library of Science</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>4</td>
<td>The Global Studies Institute de lUniversité d...</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>5</td>
<td>Universitat de València, Departamento de Teorí...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>375</td>
<td>987</td>
<td>376</td>
<td>Springer</td>
</tr>
<tr>
<td>376</td>
<td>987</td>
<td>377</td>
<td>Springer</td>
</tr>
<tr>
<td>377</td>
<td>989</td>
<td>378</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
</tr>
<tr>
<td>378</td>
<td>991</td>
<td>379</td>
<td>Elsevier</td>
</tr>
<tr>
<td>379</td>
<td>994</td>
<td>380</td>
<td>Generative Grammar Group of the Department of ...</td>
</tr>
</tbody>
</table>
<p>380 rows × 3 columns</p>
</div>
```python
publisher_export_dedup = publisher_export.drop_duplicates(subset='name')
publisher_export_dedup
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>country</th>
<th>city</th>
<th>state</th>
<th>starting_year</th>
<th>website</th>
<th>oa_policies</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue Médicale Suisse</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>American Physical Society</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.aps.org/</td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>Public Library of Science</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.plos.org/</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>371</td>
<td>372</td>
<td>[American Medical Association]</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td>http://archneur.jamanetwork.com/issues.aspx</td>
<td></td>
</tr>
<tr>
<td>374</td>
<td>375</td>
<td>Société botanique de Genève</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>377</td>
<td>378</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>379</td>
<td>380</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>380</td>
<td>999999</td>
<td>UNKNOWN</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p>196 rows × 8 columns</p>
</div>
```python
del publisher_export_dedup['id']
# convertir l'index en id
publisher_export_dedup = publisher_export_dedup.reset_index()
# ajout de l'id avec l'index + 1
publisher_export_dedup['id'] = publisher_export_dedup['index'] + 1
del publisher_export_dedup['index']
publisher_export_dedup
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>country</th>
<th>city</th>
<th>state</th>
<th>starting_year</th>
<th>website</th>
<th>oa_policies</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue Médicale Suisse</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>American Physical Society</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.aps.org/</td>
<td></td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>Public Library of Science</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.plos.org/</td>
<td></td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>191</td>
<td>[American Medical Association]</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td>http://archneur.jamanetwork.com/issues.aspx</td>
<td></td>
<td>372</td>
</tr>
<tr>
<td>192</td>
<td>Société botanique de Genève</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>375</td>
</tr>
<tr>
<td>193</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>378</td>
</tr>
<tr>
<td>194</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>380</td>
</tr>
<tr>
<td>195</td>
<td>UNKNOWN</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>381</td>
</tr>
</tbody>
</table>
<p>196 rows × 8 columns</p>
</div>
```python
del publisher_export_dedup['id']
# convertir l'index en id
publisher_export_dedup = publisher_export_dedup.reset_index()
# ajout de l'id avec l'index + 1
publisher_export_dedup['id'] = publisher_export_dedup['index'] + 1
del publisher_export_dedup['index']
publisher_export_dedup
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>country</th>
<th>city</th>
<th>state</th>
<th>starting_year</th>
<th>website</th>
<th>oa_policies</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue Médicale Suisse</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>American Physical Society</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.aps.org/</td>
<td></td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>Public Library of Science</td>
<td>236</td>
<td></td>
<td></td>
<td>0</td>
<td>http://www.plos.org/</td>
<td></td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>191</td>
<td>[American Medical Association]</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td>http://archneur.jamanetwork.com/issues.aspx</td>
<td></td>
<td>192</td>
</tr>
<tr>
<td>192</td>
<td>Société botanique de Genève</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>193</td>
</tr>
<tr>
<td>193</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>194</td>
</tr>
<tr>
<td>194</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>195</td>
</tr>
<tr>
<td>195</td>
<td>UNKNOWN</td>
<td>999999</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
<td>196</td>
</tr>
</tbody>
</table>
<p>196 rows × 8 columns</p>
</div>
```python
# merge avec les ids d'avant Sherpa
publisher_ids_dedup = pd.merge(publisher_ids_dedup, publisher_export_dedup[['id', 'name']], on='name', how='left')
publisher_ids_dedup = publisher_ids_dedup.rename(columns = {'id': 'publisher'})
publisher_ids_dedup = publisher_ids_dedup.rename(columns = {'journal': 'id'})
publisher_ids_dedup
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>publisher_av_dedup</th>
<th>name</th>
<th>publisher</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>2</td>
<td>American Physical Society</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>3</td>
<td>Public Library of Science</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>4</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>5</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>375</td>
<td>987</td>
<td>376</td>
<td>Springer</td>
<td>45</td>
</tr>
<tr>
<td>376</td>
<td>987</td>
<td>377</td>
<td>Springer</td>
<td>45</td>
</tr>
<tr>
<td>377</td>
<td>989</td>
<td>378</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>194</td>
</tr>
<tr>
<td>378</td>
<td>991</td>
<td>379</td>
<td>Elsevier</td>
<td>11</td>
</tr>
<tr>
<td>379</td>
<td>994</td>
<td>380</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>195</td>
</tr>
</tbody>
</table>
<p>380 rows × 4 columns</p>
</div>
```python
# concat valeurs avec même id
del publisher_ids_dedup['publisher_av_dedup']
del publisher_ids_dedup['name']
publisher_ids_dedup['publisher'] = publisher_ids_dedup['publisher'].astype(str)
publisher_ids_dedup_grouped = publisher_ids_dedup.groupby('id').agg({'publisher': lambda x: ', '.join(x)})
publisher_ids_dedup_grouped
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher</th>
</tr>
<tr>
<th>id</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>4</td>
<td>4, 5</td>
</tr>
<tr>
<td>5</td>
<td>2</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>986</td>
<td>193</td>
</tr>
<tr>
<td>987</td>
<td>45, 45</td>
</tr>
<tr>
<td>989</td>
<td>194</td>
</tr>
<tr>
<td>991</td>
<td>11</td>
</tr>
<tr>
<td>994</td>
<td>195</td>
</tr>
</tbody>
</table>
<p>366 rows × 1 columns</p>
</div>
```python
# modifs dans les journaux
journal = pd.read_csv('sample/journal_fin_sherpa.tsv', encoding='utf-8', header=0, sep='\t')
journal
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>oa_status</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td>NaN</td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>PLoS ONE</td>
<td>NaN</td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
<td>5</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td>NaN</td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>4, 5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>Physical review B: Condensed matter and materi...</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>1998</td>
<td>2015</td>
<td>http://journals.aps.org/prb/</td>
<td>236</td>
<td>124</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>Smart Materials and Structures</td>
<td>Smart mater. struct. (Print)</td>
<td>1992</td>
<td>9999</td>
<td>http://iopscience.iop.org/0964-1726</td>
<td>234</td>
<td>124</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>119</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>217</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
</tr>
</tbody>
</table>
<p>911 rows × 16 columns</p>
</div>
```python
# merge avec les journaux journal_fin_sherpa
journal = pd.merge(journal, publisher_ids_dedup_grouped, on='id', how='left')
journal
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>publisher_x</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>oa_status</th>
<th>publisher_y</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td>NaN</td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>PLoS ONE</td>
<td>NaN</td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
<td>5</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td>NaN</td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>4, 5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
<td>4, 5</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>Physical review B: Condensed matter and materi...</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>1998</td>
<td>2015</td>
<td>http://journals.aps.org/prb/</td>
<td>236</td>
<td>124</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>Smart Materials and Structures</td>
<td>Smart mater. struct. (Print)</td>
<td>1992</td>
<td>9999</td>
<td>http://iopscience.iop.org/0964-1726</td>
<td>234</td>
<td>124</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
<td>NaN</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
<td>NaN</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
<td>2</td>
<td>NaN</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>119</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
<td>NaN</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>217</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>911 rows × 17 columns</p>
</div>
```python
del journal['publisher_x']
journal = journal.rename(columns = {'publisher_y': 'publisher'})
journal
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>oa_status</th>
<th>publisher</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td>NaN</td>
<td>215</td>
<td>138</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>PLoS ONE</td>
<td>NaN</td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
<td>5</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td>NaN</td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
<td>4, 5</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>Physical review B: Condensed matter and materi...</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>1998</td>
<td>2015</td>
<td>http://journals.aps.org/prb/</td>
<td>236</td>
<td>124</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>Smart Materials and Structures</td>
<td>Smart mater. struct. (Print)</td>
<td>1992</td>
<td>9999</td>
<td>http://iopscience.iop.org/0964-1726</td>
<td>234</td>
<td>124</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
<td>NaN</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
<td>NaN</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
<td>2</td>
<td>NaN</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
<td>NaN</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>911 rows × 16 columns</p>
</div>
```python
# esport JSON publisher
result = journal.to_json(orient='records', force_ascii=False)
parsed = json.loads(result)
with open('sample/journal.json', 'w', encoding='utf-8') as file:
json.dump(parsed, file, indent=2, ensure_ascii=False)
```
```python
# export csv
journal.to_csv('sample/journal.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export excel
journal.to_excel('sample/journal.xlsx', index=False)
```
```python
# esport JSON publisher
result = publisher_export_dedup.to_json(orient='records', force_ascii=False)
parsed = json.loads(result)
with open('sample/publisher.json', 'w', encoding='utf-8') as file:
json.dump(parsed, file, indent=2, ensure_ascii=False)
```
```python
# export csv
publisher_export_dedup.to_csv('sample/publisher.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export excel
publisher_export_dedup.to_excel('sample/publisher.xlsx', index=False)
```
```python
```

Event Timeline