# Open Access Compliance Check Tool (OACCT) Project

Project P5 of the EPFL Library, in collaboration with the libraries of the Universities of Geneva, Lausanne and Bern: https://www.swissuniversities.ch/themen/digitalisierung/p-5-wissenschaftliche-information/projekte/swiss-mooc-service-1-1-1-1

This notebook extracts selected data from the sources obtained via API and processes them so that they can be used in the OACCT application.

Author: **Pablo Iriarte**, Université de Genève (pablo.iriarte@unige.ch)

Last updated: 16.07.2021

## Publisher data extraction

Sources:

1. ISSN.org data (JSON)

### Source data format
* Node: "@graph"
* spatial & publisher:
* "@id": "resource/ISSN/0140-6736",
* "spatial": [
"http://id.loc.gov/vocabulary/countries/ne",
"https://www.iso.org/obp/ui/#iso:code:3166:NL"
],
Example with several publishers over time:
"publisher": [
"resource/ISSN/0140-6736#Publisher-Elsevier",
"resource/ISSN/0140-6736#Publisher-J._Onwhyn"
],
{
"@id": "resource/ISSN/0140-6736#LatestPublicationEvent",
"@type": "http://schema.org/PublicationEvent",
"publishedBy": "resource/ISSN/0140-6736#Publisher-Elsevier",
"location": "resource/ISSN/0140-6736#PublicationPlace-Amsterdam"
},
{
"@id": "resource/ISSN/0140-6736#Publisher-Elsevier",
"@type": "http://schema.org/Organization",
"name": "Elsevier"
},
Example with a single publisher over time:
"publisher": "resource/ISSN/0899-8418#Publisher-Wiley",
{
"@id": "resource/ISSN/0899-8418#EarliestPublicationEvent",
"@type": "http://schema.org/PublicationEvent",
"publishedBy": "resource/ISSN/0899-8418#Publisher-Wiley",
"temporal": "c1989-",
"location": [
"resource/ISSN/0899-8418#PublicationPlace-New_York",
"resource/ISSN/0899-8418#PublicationPlace-Chichester"
]
},
{
"@id": "resource/ISSN/0899-8418#Publisher-Wiley",
"@type": "http://schema.org/Organization",
"name": "Wiley"
},
Example with a list of final publishers:
{
"@id": "resource/ISSN/2174-8454#LatestPublicationEvent",
"@type": "http://schema.org/PublicationEvent",
"publishedBy": [
"resource/ISSN/2174-8454#Publisher-The_Global_Studies_Institute_de_l’Université_de_Genève",
"resource/ISSN/2174-8454#Publisher-Universitat_de_València,_Departamento_de_Teoría_de_los_Lenguajes_y_Ciencias_de_la_Comunicación"
],
"location": "resource/ISSN/2174-8454#PublicationPlace-Valencia"
},
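
The lookup implied by this format can be sketched as a small function: find the latest publication event (or, failing that, the earliest one), normalise `publishedBy` to a list, then resolve each reference to the `name` of the matching organisation node. The function name `resolve_publishers` and the inline sample graph are illustrative assumptions, not part of the notebook.

```python
def resolve_publishers(graph, issn):
    """Return [(publisher_ref, name), ...] for the latest publication event,
    falling back to the earliest one (hypothetical helper)."""
    base = 'resource/ISSN/' + issn
    # index every node of the graph by its '@id' for direct lookups
    nodes = {node['@id']: node for node in graph if '@id' in node}
    refs = None
    for suffix in ('#LatestPublicationEvent', '#EarliestPublicationEvent'):
        event = nodes.get(base + suffix, {})
        if 'publishedBy' in event:
            refs = event['publishedBy']
            break
    if refs is None:
        return []
    # 'publishedBy' is either a single string or a list of strings
    if not isinstance(refs, list):
        refs = [refs]
    return [(ref, nodes.get(ref, {}).get('name', '')) for ref in refs]


# sample graph modelled on the 0140-6736 excerpt
sample_graph = [
    {'@id': 'resource/ISSN/0140-6736#LatestPublicationEvent',
     '@type': 'http://schema.org/PublicationEvent',
     'publishedBy': 'resource/ISSN/0140-6736#Publisher-Elsevier'},
    {'@id': 'resource/ISSN/0140-6736#Publisher-Elsevier',
     '@type': 'http://schema.org/Organization',
     'name': 'Elsevier'},
]
publishers = resolve_publishers(sample_graph, '0140-6736')
print(publishers)
```

Indexing the nodes by `@id` once avoids rescanning the whole graph for every publisher reference.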
```python
import pandas as pd
import csv
import json
import numpy as np
import os
```
## Publishers table
```python
# create the DataFrame
# 'country' removed so that it can be added to the journals instead
# 'oa_status' removed for now
col_names = ['id',
'name',
'publisher_id_issn',
]
publisher_issn = pd.DataFrame(columns = col_names)
publisher_issn
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>publisher_id_issn</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
## Journals table
```python
journal = pd.read_csv('sample/journals_brut.tsv', encoding='utf-8', header=0, sep='\t')
journal
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>issn</th>
<th>issnl</th>
<th>title</th>
<th>starting_year</th>
<th>end_year</th>
<th>url</th>
<th>name_short_iso_4</th>
<th>language</th>
<th>country</th>
<th>...</th>
<th>doaj_status</th>
<th>lockss_title</th>
<th>lockss</th>
<th>portico_status</th>
<th>portico</th>
<th>nlch_title</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>doublon_issnl</th>
<th>oa_status</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1660-9379</td>
<td>1660-9379</td>
<td>Revue médicale suisse</td>
<td>2005</td>
<td>9999</td>
<td>NaN</td>
<td>Rev. méd. suisse</td>
<td>138</td>
<td>215</td>
<td>...</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>0031-9007</td>
<td>0031-9007</td>
<td>Physical review letters (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>Phys. rev. lett. (Print)</td>
<td>124</td>
<td>236</td>
<td>...</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>1.0</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>1932-6203</td>
<td>1932-6203</td>
<td>PloS one</td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>NaN</td>
<td>124</td>
<td>236</td>
<td>...</td>
<td>1.0</td>
<td>PLoS One</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>4.035714</td>
<td>NaN</td>
<td>5</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>2174-8454</td>
<td>2174-8454</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td>NaN</td>
<td>EU-topías</td>
<td>124, 138, 402, 292</td>
<td>209</td>
<td>...</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>1098-0121</td>
<td>1098-0121</td>
<td>Physical review. B, Condensed matter and mater...</td>
<td>1998</td>
<td>2015</td>
<td>http://ojps.aip.org/prbo/</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>124</td>
<td>236</td>
<td>...</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>1.0</td>
<td>1</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>0964-1726</td>
<td>0964-1726</td>
<td>Smart materials and structures (Print)</td>
<td>1992</td>
<td>9999</td>
<td>NaN</td>
<td>Smart mater. struct. (Print)</td>
<td>124</td>
<td>234</td>
<td>...</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>0022-3468</td>
<td>0022-3468</td>
<td>Journal of pediatric surgery (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org</td>
<td>J. pediatr. surg. (Print)</td>
<td>124</td>
<td>236</td>
<td>...</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>1432-2064</td>
<td>0178-8051</td>
<td>Probability theory and related fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>124</td>
<td>83</td>
<td>...</td>
<td>0.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>preserved</td>
<td>1.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>0960-1481</td>
<td>0960-1481</td>
<td>Renewable energy</td>
<td>1991</td>
<td>9999</td>
<td>NaN</td>
<td>Renew. energy</td>
<td>124</td>
<td>234</td>
<td>...</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>0161-7567</td>
<td>0161-7567</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>124</td>
<td>236</td>
<td>...</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
</tr>
</tbody>
</table>
<p>911 rows × 23 columns</p>
</div>
## Journals-Publishers table
```python
# create the DataFrame
col_names = ['journal',
'publisher_id_issn'
]
journal_publisher = pd.DataFrame(columns = col_names)
journal_publisher
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id_issn</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
```python
# extract the publisher information from the ISSN.org data
# note: DataFrame.append was removed in pandas 2.0, so this cell needs pandas < 2.0
for index, row in journal.iterrows():
    journal_id = row['id']
    journal_issn = row['issn']
    # print a progress marker every 10 rows
    if index % 10 == 0:
        print(index)
    # initialise the variables to extract
    publisher_name = ''
    publisher_country = ''
    publisher_id = ''
    publisher_id_first = ''
    publisher_id_last = ''
    # load the JSON record for this ISSN if it exists
    if os.path.exists('issn/data/' + journal_issn + '.json'):
        with open('issn/data/' + journal_issn + '.json', 'r', encoding='utf-8') as f:
            data = json.load(f)
        # collect the publisher references of the latest and earliest publication events
        for x in data['@graph']:
            if ('@id' in x):
                if (x['@id'] == 'resource/ISSN/' + journal_issn + '#LatestPublicationEvent'):
                    if ('publishedBy' in x):
                        publisher_id_last = x['publishedBy']
                elif (x['@id'] == 'resource/ISSN/' + journal_issn + '#EarliestPublicationEvent'):
                    if ('publishedBy' in x):
                        publisher_id_first = x['publishedBy']
        # prefer the latest publisher, fall back to the earliest one
        if (publisher_id_last != ''):
            publisher_id = publisher_id_last
        else:
            publisher_id = publisher_id_first
        if type(publisher_id) is list:
            # several publishers: add one row per publisher
            for pid in publisher_id:
                if (pid != ''):
                    for x in data['@graph']:
                        if ('@id' in x):
                            if (x['@id'] == pid):
                                if ('name' in x):
                                    publisher_name = x['name']
                    publisher_issn = publisher_issn.append({'publisher_id_issn' : pid, 'name' : publisher_name}, ignore_index=True)
                    journal_publisher = journal_publisher.append({'journal' : journal_id, 'publisher_id_issn' : pid}, ignore_index=True)
        else:
            # single publisher (or none): look up its name if a reference was found
            if (publisher_id != ''):
                for x in data['@graph']:
                    if ('@id' in x):
                        if (x['@id'] == publisher_id):
                            if ('name' in x):
                                publisher_name = x['name']
            # a row is added even when no publisher was found (empty ID and name)
            publisher_issn = publisher_issn.append({'publisher_id_issn' : publisher_id, 'name' : publisher_name}, ignore_index=True)
            journal_publisher = journal_publisher.append({'journal' : journal_id, 'publisher_id_issn' : publisher_id}, ignore_index=True)
    else:
        print(row['issn'] + ' - not found')
```
Output: the row index, printed every 10 rows, from 0 to 910.
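
`DataFrame.append`, used in the loop above, was removed in pandas 2.0. A minimal sketch of an equivalent pattern, assuming a recent pandas: collect the rows in plain Python lists and build each DataFrame once at the end. The `records` list stands in for the values extracted from the JSON files and is made up for illustration, as are the `_demo` and `_rows` names.

```python
import pandas as pd

# hypothetical stand-in for the (journal id, publisher reference, name)
# triples extracted from the ISSN.org JSON files
records = [
    (1, 'resource/ISSN/1660-9379#Publisher-Revue_Médicale_Suisse', 'Revue Médicale Suisse'),
    (2, 'resource/ISSN/0031-9007#Publisher-American_Physical_Society', 'American Physical Society'),
]

publisher_rows = []
journal_publisher_rows = []
for journal_id, pid, name in records:
    publisher_rows.append({'publisher_id_issn': pid, 'name': name})
    journal_publisher_rows.append({'journal': journal_id, 'publisher_id_issn': pid})

# build each DataFrame in one go instead of appending row by row;
# columns missing from the dicts (here 'id') are filled with NaN
publisher_issn_demo = pd.DataFrame(publisher_rows, columns=['id', 'name', 'publisher_id_issn'])
journal_publisher_demo = pd.DataFrame(journal_publisher_rows, columns=['journal', 'publisher_id_issn'])
print(publisher_issn_demo.shape, journal_publisher_demo.shape)
```

Besides working on current pandas versions, this avoids copying the whole DataFrame on every iteration.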
```python
publisher_issn
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>publisher_id_issn</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>NaN</td>
<td>Revue Médicale Suisse</td>
<td>resource/ISSN/1660-9379#Publisher-Revue_Médica...</td>
</tr>
<tr>
<td>1</td>
<td>NaN</td>
<td>American Physical Society</td>
<td>resource/ISSN/0031-9007#Publisher-American_Phy...</td>
</tr>
<tr>
<td>2</td>
<td>NaN</td>
<td>Public Library of Science</td>
<td>resource/ISSN/1932-6203#Publisher-Public_Libra...</td>
</tr>
<tr>
<td>3</td>
<td>NaN</td>
<td>The Global Studies Institute de l’Université d...</td>
<td>resource/ISSN/2174-8454#Publisher-The_Global_S...</td>
</tr>
<tr>
<td>4</td>
<td>NaN</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>resource/ISSN/2174-8454#Publisher-Universitat_...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>NaN</td>
<td>IOP Publishing</td>
<td>resource/ISSN/0964-1726#Publisher-IOP_Publishing</td>
</tr>
<tr>
<td>941</td>
<td>NaN</td>
<td>Elsevier [etc.]</td>
<td>resource/ISSN/0022-3468#Publisher-Elsevier_[etc.]</td>
</tr>
<tr>
<td>942</td>
<td>NaN</td>
<td>Springer</td>
<td>resource/ISSN/1432-2064#Publisher-Springer</td>
</tr>
<tr>
<td>943</td>
<td>NaN</td>
<td>Pergamon</td>
<td>resource/ISSN/0960-1481#Publisher-Pergamon</td>
</tr>
<tr>
<td>944</td>
<td>NaN</td>
<td>American Physiological Society</td>
<td>resource/ISSN/0161-7567#Publisher-American_Phy...</td>
</tr>
</tbody>
</table>
<p>945 rows × 3 columns</p>
</div>
```python
# simplify the IDs: split each reference at '#Publisher-' into a root and a suffix
publisher_issn[['publisher_id_racine', 'publisher_id_fin']] = publisher_issn['publisher_id_issn'].str.split('#Publisher-', n=1, expand=True)
publisher_issn
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>publisher_id_issn</th>
<th>publisher_id_racine</th>
<th>publisher_id_fin</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>NaN</td>
<td>Revue Médicale Suisse</td>
<td>resource/ISSN/1660-9379#Publisher-Revue_Médica...</td>
<td>resource/ISSN/1660-9379</td>
<td>Revue_Médicale_Suisse</td>
</tr>
<tr>
<td>1</td>
<td>NaN</td>
<td>American Physical Society</td>
<td>resource/ISSN/0031-9007#Publisher-American_Phy...</td>
<td>resource/ISSN/0031-9007</td>
<td>American_Physical_Society</td>
</tr>
<tr>
<td>2</td>
<td>NaN</td>
<td>Public Library of Science</td>
<td>resource/ISSN/1932-6203#Publisher-Public_Libra...</td>
<td>resource/ISSN/1932-6203</td>
<td>Public_Library_of_Science</td>
</tr>
<tr>
<td>3</td>
<td>NaN</td>
<td>The Global Studies Institute de l’Université d...</td>
<td>resource/ISSN/2174-8454#Publisher-The_Global_S...</td>
<td>resource/ISSN/2174-8454</td>
<td>The_Global_Studies_Institute_de_l’Université_d...</td>
</tr>
<tr>
<td>4</td>
<td>NaN</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>resource/ISSN/2174-8454#Publisher-Universitat_...</td>
<td>resource/ISSN/2174-8454</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>NaN</td>
<td>IOP Publishing</td>
<td>resource/ISSN/0964-1726#Publisher-IOP_Publishing</td>
<td>resource/ISSN/0964-1726</td>
<td>IOP_Publishing</td>
</tr>
<tr>
<td>941</td>
<td>NaN</td>
<td>Elsevier [etc.]</td>
<td>resource/ISSN/0022-3468#Publisher-Elsevier_[etc.]</td>
<td>resource/ISSN/0022-3468</td>
<td>Elsevier_[etc.]</td>
</tr>
<tr>
<td>942</td>
<td>NaN</td>
<td>Springer</td>
<td>resource/ISSN/1432-2064#Publisher-Springer</td>
<td>resource/ISSN/1432-2064</td>
<td>Springer</td>
</tr>
<tr>
<td>943</td>
<td>NaN</td>
<td>Pergamon</td>
<td>resource/ISSN/0960-1481#Publisher-Pergamon</td>
<td>resource/ISSN/0960-1481</td>
<td>Pergamon</td>
</tr>
<tr>
<td>944</td>
<td>NaN</td>
<td>American Physiological Society</td>
<td>resource/ISSN/0161-7567#Publisher-American_Phy...</td>
<td>resource/ISSN/0161-7567</td>
<td>American_Physiological_Society</td>
</tr>
</tbody>
</table>
<p>945 rows × 5 columns</p>
</div>
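
To make the split step concrete, here is a small sketch of how `str.split` with `n=1` and `expand=True` behaves on one real-looking reference and on an empty one. The two-element `ids` Series is an illustrative assumption. An empty reference produces a missing value in the second column, which is consistent with the `None` ID that shows up in the unnamed-publisher check.

```python
import pandas as pd

ids = pd.Series([
    'resource/ISSN/1432-2064#Publisher-Springer',
    '',  # journal for which no publisher reference was found
])

# n=1 splits at the first '#Publisher-' only; expand=True returns one column per part
parts = ids.str.split('#Publisher-', n=1, expand=True)
parts.columns = ['publisher_id_racine', 'publisher_id_fin']
print(parts)
```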
```python
# keep only the suffix as the publisher ID and drop the helper columns
del publisher_issn['publisher_id_issn']
del publisher_issn['publisher_id_racine']
del publisher_issn['id']
publisher_issn = publisher_issn.rename(columns={'publisher_id_fin': 'publisher_id_issn'})
publisher_issn
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>publisher_id_issn</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue Médicale Suisse</td>
<td>Revue_Médicale_Suisse</td>
</tr>
<tr>
<td>1</td>
<td>American Physical Society</td>
<td>American_Physical_Society</td>
</tr>
<tr>
<td>2</td>
<td>Public Library of Science</td>
<td>Public_Library_of_Science</td>
</tr>
<tr>
<td>3</td>
<td>The Global Studies Institute de l’Université d...</td>
<td>The_Global_Studies_Institute_de_l’Université_d...</td>
</tr>
<tr>
<td>4</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>IOP Publishing</td>
<td>IOP_Publishing</td>
</tr>
<tr>
<td>941</td>
<td>Elsevier [etc.]</td>
<td>Elsevier_[etc.]</td>
</tr>
<tr>
<td>942</td>
<td>Springer</td>
<td>Springer</td>
</tr>
<tr>
<td>943</td>
<td>Pergamon</td>
<td>Pergamon</td>
</tr>
<tr>
<td>944</td>
<td>American Physiological Society</td>
<td>American_Physiological_Society</td>
</tr>
</tbody>
</table>
<p>945 rows × 2 columns</p>
</div>
```python
# remove the square brackets (currently disabled) and drop duplicates
# publisher['publisher_id'] = publisher['publisher_id'].str.replace('[', '')
# publisher['publisher_id'] = publisher['publisher_id'].str.replace(']', '')
# publisher['name'] = publisher['name'].str.replace('[', '')
# publisher['name'] = publisher['name'].str.replace(']', '')
publisher_issn = publisher_issn.drop_duplicates(subset=['publisher_id_issn'])
publisher_issn
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>publisher_id_issn</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue Médicale Suisse</td>
<td>Revue_Médicale_Suisse</td>
</tr>
<tr>
<td>1</td>
<td>American Physical Society</td>
<td>American_Physical_Society</td>
</tr>
<tr>
<td>2</td>
<td>Public Library of Science</td>
<td>Public_Library_of_Science</td>
</tr>
<tr>
<td>3</td>
<td>The Global Studies Institute de l’Université d...</td>
<td>The_Global_Studies_Institute_de_l’Université_d...</td>
</tr>
<tr>
<td>4</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>929</td>
<td>Fisher</td>
<td>Fisher</td>
</tr>
<tr>
<td>930</td>
<td>Tipografia La Commerciale</td>
<td>Tipografia_La_Commerciale</td>
</tr>
<tr>
<td>932</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>Red.:_Prof._Dr._F._Cavalli,_Istituto_oncologic...</td>
</tr>
<tr>
<td>934</td>
<td>Excerpta Medica</td>
<td>Excerpta_Medica</td>
</tr>
<tr>
<td>937</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>Generative_Grammar_Group_of_the_Department_of_...</td>
</tr>
</tbody>
</table>
<p>380 rows × 2 columns</p>
</div>
```python
# check for publishers without a name
publisher_issn.loc[publisher_issn['name'] == '']
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>publisher_id_issn</th>
</tr>
</thead>
<tbody>
<tr>
<td>241</td>
<td></td>
<td>None</td>
</tr>
</tbody>
</table>
</div>
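
The check above only tests for an empty name. A slightly broader sketch, assuming one also wants to catch rows whose ID is missing, combines both conditions. The two-row `demo` frame is made up for illustration.

```python
import pandas as pd

# hypothetical miniature publishers table with one incomplete row
demo = pd.DataFrame({
    'name': ['Springer', ''],
    'publisher_id_issn': ['Springer', None],
})

# flag rows with an empty name or a missing ID
unnamed = demo[(demo['name'] == '') | demo['publisher_id_issn'].isna()]
print(unnamed)
```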
```python
journal_publisher
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id_issn</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>resource/ISSN/1660-9379#Publisher-Revue_Médica...</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>resource/ISSN/0031-9007#Publisher-American_Phy...</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>resource/ISSN/1932-6203#Publisher-Public_Libra...</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>resource/ISSN/2174-8454#Publisher-The_Global_S...</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>resource/ISSN/2174-8454#Publisher-Universitat_...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>997</td>
<td>resource/ISSN/0964-1726#Publisher-IOP_Publishing</td>
</tr>
<tr>
<td>941</td>
<td>998</td>
<td>resource/ISSN/0022-3468#Publisher-Elsevier_[etc.]</td>
</tr>
<tr>
<td>942</td>
<td>999</td>
<td>resource/ISSN/1432-2064#Publisher-Springer</td>
</tr>
<tr>
<td>943</td>
<td>1000</td>
<td>resource/ISSN/0960-1481#Publisher-Pergamon</td>
</tr>
<tr>
<td>944</td>
<td>1001</td>
<td>resource/ISSN/0161-7567#Publisher-American_Phy...</td>
</tr>
</tbody>
</table>
<p>945 rows × 2 columns</p>
</div>
```python
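# simplify the IDs: split each reference at '#Publisher-' into a root and a suffix
journal_publisher[['publisher_id_racine', 'publisher_id_fin']] = journal_publisher['publisher_id_issn'].str.split('#Publisher-', n=1, expand=True)
# keep only the suffix as the publisher ID and drop the helper columns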
del journal_publisher['publisher_id_issn']
del journal_publisher['publisher_id_racine']
journal_publisher = journal_publisher.rename(columns={'publisher_id_fin': 'publisher_id_issn'})
journal_publisher
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id_issn</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue_Médicale_Suisse</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>American_Physical_Society</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>Public_Library_of_Science</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>The_Global_Studies_Institute_de_l’Université_d...</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>997</td>
<td>IOP_Publishing</td>
</tr>
<tr>
<td>941</td>
<td>998</td>
<td>Elsevier_[etc.]</td>
</tr>
<tr>
<td>942</td>
<td>999</td>
<td>Springer</td>
</tr>
<tr>
<td>943</td>
<td>1000</td>
<td>Pergamon</td>
</tr>
<tr>
<td>944</td>
<td>1001</td>
<td>American_Physiological_Society</td>
</tr>
</tbody>
</table>
<p>945 rows × 2 columns</p>
</div>
```python
# merge the journal links with the publisher names
journal_publisher = pd.merge(journal_publisher, publisher_issn, on='publisher_id_issn', how='left')
journal_publisher
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id_issn</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>The_Global_Studies_Institute_de_l’Université_d...</td>
<td>The Global Studies Institute de l’Université d...</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>997</td>
<td>IOP_Publishing</td>
<td>IOP Publishing</td>
</tr>
<tr>
<td>941</td>
<td>998</td>
<td>Elsevier_[etc.]</td>
<td>Elsevier [etc.]</td>
</tr>
<tr>
<td>942</td>
<td>999</td>
<td>Springer</td>
<td>Springer</td>
</tr>
<tr>
<td>943</td>
<td>1000</td>
<td>Pergamon</td>
<td>Pergamon</td>
</tr>
<tr>
<td>944</td>
<td>1001</td>
<td>American_Physiological_Society</td>
<td>American Physiological Society</td>
</tr>
</tbody>
</table>
<p>945 rows × 3 columns</p>
</div>
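
A left merge like this one keeps every journal link, provided the publisher IDs on the right-hand side are unique. As a hedged sketch, the `validate` and `indicator` parameters of `pd.merge` can make that assumption explicit. The miniature `links` and `names` tables below are invented for the example, including the journal id 1002.

```python
import pandas as pd

# hypothetical miniature versions of the two tables
links = pd.DataFrame({
    'journal': [999, 1000, 1002],
    'publisher_id_issn': ['Springer', 'Pergamon', 'Springer'],
})
names = pd.DataFrame({
    'publisher_id_issn': ['Springer', 'Pergamon'],
    'name': ['Springer', 'Pergamon'],
})

# validate='many_to_one' raises if a publisher ID appears twice in `names`;
# indicator=True adds a '_merge' column telling where each row was matched
merged = pd.merge(links, names, on='publisher_id_issn', how='left',
                  validate='many_to_one', indicator=True)
unmatched = merged[merged['_merge'] != 'both']
print(len(merged), len(unmatched))
```

With these checks, an unexpected change in the row count after the merge would surface as an error or as rows in `unmatched` rather than going unnoticed.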
```python
journal_publisher = journal_publisher.rename(columns={'publisher_id_issn': 'publisher_id'})
journal_publisher
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>The_Global_Studies_Institute_de_l’Université_d...</td>
<td>The Global Studies Institute de l’Université d...</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>997</td>
<td>IOP_Publishing</td>
<td>IOP Publishing</td>
</tr>
<tr>
<td>941</td>
<td>998</td>
<td>Elsevier_[etc.]</td>
<td>Elsevier [etc.]</td>
</tr>
<tr>
<td>942</td>
<td>999</td>
<td>Springer</td>
<td>Springer</td>
</tr>
<tr>
<td>943</td>
<td>1000</td>
<td>Pergamon</td>
<td>Pergamon</td>
</tr>
<tr>
<td>944</td>
<td>1001</td>
<td>American_Physiological_Society</td>
<td>American Physiological Society</td>
</tr>
</tbody>
</table>
<p>945 rows × 3 columns</p>
</div>
```python
publisher = journal_publisher[['publisher_id', 'name']]
publisher
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher_id</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
</tr>
<tr>
<td>1</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
</tr>
<tr>
<td>2</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
</tr>
<tr>
<td>3</td>
<td>The_Global_Studies_Institute_de_l’Université_d...</td>
<td>The Global Studies Institute de l’Université d...</td>
</tr>
<tr>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>IOP_Publishing</td>
<td>IOP Publishing</td>
</tr>
<tr>
<td>941</td>
<td>Elsevier_[etc.]</td>
<td>Elsevier [etc.]</td>
</tr>
<tr>
<td>942</td>
<td>Springer</td>
<td>Springer</td>
</tr>
<tr>
<td>943</td>
<td>Pergamon</td>
<td>Pergamon</td>
</tr>
<tr>
<td>944</td>
<td>American_Physiological_Society</td>
<td>American Physiological Society</td>
</tr>
</tbody>
</table>
<p>945 rows × 2 columns</p>
</div>
```python
# drop duplicates
publisher = publisher.drop_duplicates(subset='publisher_id')
publisher
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher_id</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
</tr>
<tr>
<td>1</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
</tr>
<tr>
<td>2</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
</tr>
<tr>
<td>3</td>
<td>The_Global_Studies_Institute_de_l’Université_d...</td>
<td>The Global Studies Institute de l’Université d...</td>
</tr>
<tr>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>929</td>
<td>Fisher</td>
<td>Fisher</td>
</tr>
<tr>
<td>930</td>
<td>Tipografia_La_Commerciale</td>
<td>Tipografia La Commerciale</td>
</tr>
<tr>
<td>932</td>
<td>Red.:_Prof._Dr._F._Cavalli,_Istituto_oncologic...</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
</tr>
<tr>
<td>934</td>
<td>Excerpta_Medica</td>
<td>Excerpta Medica</td>
</tr>
<tr>
<td>937</td>
<td>Generative_Grammar_Group_of_the_Department_of_...</td>
<td>Generative Grammar Group of the Department of ...</td>
</tr>
</tbody>
</table>
<p>380 rows × 2 columns</p>
</div>
```python
# move the index into a column (it still carries the gappy labels left by drop_duplicates)
publisher = publisher.reset_index()
# set the id to the index + 1
publisher['id'] = publisher['index'] + 1
del publisher['index']
publisher
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher_id</th>
<th>name</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>The_Global_Studies_Institute_de_l’Université_d...</td>
<td>The Global Studies Institute de l’Université d...</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>375</td>
<td>Fisher</td>
<td>Fisher</td>
<td>930</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia_La_Commerciale</td>
<td>Tipografia La Commerciale</td>
<td>931</td>
</tr>
<tr>
<td>377</td>
<td>Red.:_Prof._Dr._F._Cavalli,_Istituto_oncologic...</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>933</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta_Medica</td>
<td>Excerpta Medica</td>
<td>935</td>
</tr>
<tr>
<td>379</td>
<td>Generative_Grammar_Group_of_the_Department_of_...</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>938</td>
</tr>
</tbody>
</table>
<p>380 rows × 3 columns</p>
</div>
```python
# second pass: the index is now contiguous (0 to 379), so the ids become 1 to 380
publisher = publisher.reset_index()
# set the id to the index + 1
publisher['id'] = publisher['index'] + 1
del publisher['index']
publisher
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher_id</th>
<th>name</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>The_Global_Studies_Institute_de_l’Université_d...</td>
<td>The Global Studies Institute de l’Université d...</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>375</td>
<td>Fisher</td>
<td>Fisher</td>
<td>376</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia_La_Commerciale</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
</tr>
<tr>
<td>377</td>
<td>Red.:_Prof._Dr._F._Cavalli,_Istituto_oncologic...</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta_Medica</td>
<td>Excerpta Medica</td>
<td>379</td>
</tr>
<tr>
<td>379</td>
<td>Generative_Grammar_Group_of_the_Department_of_...</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
</tr>
</tbody>
</table>
<p>380 rows × 3 columns</p>
</div>
```python
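# add the UNKNOWN placeholder row
# 'country': 999999
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
unknown_row = pd.DataFrame([{'id': 999999, 'name': 'UNKNOWN', 'publisher_id': '999999'}])
publisher = pd.concat([publisher, unknown_row], ignore_index=True)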
publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher_id</th>
<th>name</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>The_Global_Studies_Institute_de_lUniversité_d...</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>376</td>
<td>Tipografia_La_Commerciale</td>
<td>Tipografia La Commerciale</td>
<td>377</td>
</tr>
<tr>
<td>377</td>
<td>Red.:_Prof._Dr._F._Cavalli,_Istituto_oncologic...</td>
<td>Red.: Prof. Dr. F. Cavalli, Istituto oncologic...</td>
<td>378</td>
</tr>
<tr>
<td>378</td>
<td>Excerpta_Medica</td>
<td>Excerpta Medica</td>
<td>379</td>
</tr>
<tr>
<td>379</td>
<td>Generative_Grammar_Group_of_the_Department_of_...</td>
<td>Generative Grammar Group of the Department of ...</td>
<td>380</td>
</tr>
<tr>
<td>380</td>
<td>999999</td>
<td>UNKNOWN</td>
<td>999999</td>
</tr>
</tbody>
</table>
<p>381 rows × 3 columns</p>
</div>
```python
# look up the numeric id of each publisher
journal_publisher = pd.merge(journal_publisher, publisher[['publisher_id', 'id']], on='publisher_id', how='left')
journal_publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id</th>
<th>name</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>The_Global_Studies_Institute_de_lUniversité_d...</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>997</td>
<td>IOP_Publishing</td>
<td>IOP Publishing</td>
<td>47</td>
</tr>
<tr>
<td>941</td>
<td>998</td>
<td>Elsevier_[etc.]</td>
<td>Elsevier [etc.]</td>
<td>75</td>
</tr>
<tr>
<td>942</td>
<td>999</td>
<td>Springer</td>
<td>Springer</td>
<td>8</td>
</tr>
<tr>
<td>943</td>
<td>1000</td>
<td>Pergamon</td>
<td>Pergamon</td>
<td>119</td>
</tr>
<tr>
<td>944</td>
<td>1001</td>
<td>American_Physiological_Society</td>
<td>American Physiological Society</td>
<td>217</td>
</tr>
</tbody>
</table>
<p>945 rows × 4 columns</p>
</div>
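A left merge silently keeps rows whose `publisher_id` has no match in the `publisher` table, leaving `id` empty. The sketch below shows one way to surface such rows on toy data, using the `validate` and `indicator` options of `pd.merge`. The sample values (`Not_In_Table`, the three journals) are invented for illustration and are not taken from the notebook's data.

```python
import pandas as pd

# Toy stand-ins for the notebook's tables (values are hypothetical)
journal_publisher = pd.DataFrame({
    "journal": [1, 2, 3],
    "publisher_id": ["Springer", "Wiley", "Not_In_Table"],
})
publisher = pd.DataFrame({
    "publisher_id": ["Springer", "Wiley"],
    "id": [1, 2],
})

checked = pd.merge(
    journal_publisher,
    publisher[["publisher_id", "id"]],
    on="publisher_id",
    how="left",
    validate="many_to_one",  # raises if publisher_id is duplicated on the right
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Rows that found no publisher in the lookup table
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["journal", "publisher_id"]])
```

If `unmatched` is empty, every journal row received a publisher id and the `_merge` column can simply be dropped.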
```python
# rename the merged 'id' column to 'publisher' (numeric publisher id)
journal_publisher = journal_publisher.rename(columns={'id': 'publisher'})
journal_publisher
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id</th>
<th>name</th>
<th>publisher</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue_Médicale_Suisse</td>
<td>Revue Médicale Suisse</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>American_Physical_Society</td>
<td>American Physical Society</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>Public_Library_of_Science</td>
<td>Public Library of Science</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>The_Global_Studies_Institute_de_lUniversité_d...</td>
<td>The Global Studies Institute de lUniversité d...</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>Universitat_de_València,_Departamento_de_Teorí...</td>
<td>Universitat de València, Departamento de Teorí...</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>997</td>
<td>IOP_Publishing</td>
<td>IOP Publishing</td>
<td>47</td>
</tr>
<tr>
<td>941</td>
<td>998</td>
<td>Elsevier_[etc.]</td>
<td>Elsevier [etc.]</td>
<td>75</td>
</tr>
<tr>
<td>942</td>
<td>999</td>
<td>Springer</td>
<td>Springer</td>
<td>8</td>
</tr>
<tr>
<td>943</td>
<td>1000</td>
<td>Pergamon</td>
<td>Pergamon</td>
<td>119</td>
</tr>
<tr>
<td>944</td>
<td>1001</td>
<td>American_Physiological_Society</td>
<td>American Physiological Society</td>
<td>217</td>
</tr>
</tbody>
</table>
<p>945 rows × 4 columns</p>
</div>
```python
# keep only the journal id / publisher id pairs, to be added to the raw journals table
journal_publisher_ids = journal_publisher[['journal', 'publisher']]
journal_publisher_ids = journal_publisher_ids.rename(columns={'journal': 'id'})
journal_publisher_ids['publisher'] = journal_publisher_ids['publisher'].astype(str)
journal_publisher_ids
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>publisher</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>940</td>
<td>997</td>
<td>47</td>
</tr>
<tr>
<td>941</td>
<td>998</td>
<td>75</td>
</tr>
<tr>
<td>942</td>
<td>999</td>
<td>8</td>
</tr>
<tr>
<td>943</td>
<td>1000</td>
<td>119</td>
</tr>
<tr>
<td>944</td>
<td>1001</td>
<td>217</td>
</tr>
</tbody>
</table>
<p>945 rows × 2 columns</p>
</div>
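The conversion to `str` above works cleanly because every row has a publisher id. If the preceding left merge had left some ids empty, the column would hold floats and `astype(str)` would produce values such as `'10.0'` and `'nan'`. A minimal sketch of one possible guard, assuming the `999999` id of the UNKNOWN row is meant to stand in for missing publishers (that purpose is an assumption, and the toy values are invented):

```python
import pandas as pd

UNKNOWN_ID = 999999  # id of the UNKNOWN row; using it as a fallback is an assumption

# Toy pairs where journal 2 has no publisher id (hypothetical data)
pairs = pd.DataFrame({"id": [1, 2, 3], "publisher": [10.0, None, 12.0]})

# Fill the gap with the sentinel, go back to integers, then to strings
pairs["publisher"] = (
    pairs["publisher"].fillna(UNKNOWN_ID).astype(int).astype(str)
)
print(pairs)
```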
```python
# concatenate the publisher ids of all rows sharing the same journal id
journal_publisher_grouped = journal_publisher_ids.groupby('id').agg({'publisher': lambda x: ', '.join(x)})
journal_publisher_grouped
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>publisher</th>
</tr>
<tr>
<th>id</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>4</td>
<td>4, 5</td>
</tr>
<tr>
<td>5</td>
<td>6</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>997</td>
<td>47</td>
</tr>
<tr>
<td>998</td>
<td>75</td>
</tr>
<tr>
<td>999</td>
<td>8</td>
</tr>
<tr>
<td>1000</td>
<td>119</td>
</tr>
<tr>
<td>1001</td>
<td>217</td>
</tr>
</tbody>
</table>
<p>911 rows × 1 columns</p>
</div>
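The grouping step collapses the 945 journal/publisher pairs into 911 rows, one per journal, with multiple publishers joined into a single string (journal 4 becomes `4, 5`). The same technique on a toy table, as a sketch; the four rows below mirror the first rows of the output but are typed in by hand:

```python
import pandas as pd

# Hand-typed pairs: journal 4 has two publishers
pairs = pd.DataFrame({
    "id": [1, 4, 4, 5],
    "publisher": ["1", "4", "5", "6"],  # must already be strings for join()
})

# One row per journal id, publisher ids joined with ', '
grouped = pairs.groupby("id")["publisher"].agg(", ".join).reset_index()
print(grouped)
```

Calling `reset_index()` here turns `id` back into an ordinary column. This is a stylistic choice rather than a requirement: the notebook's version keeps `id` as the index, and the subsequent merge on `'id'` accepts either form.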
```python
# attach the grouped publisher ids to the journals table
journals = pd.merge(journal, journal_publisher_grouped, on='id', how='left')
journals
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>issn</th>
<th>issnl</th>
<th>title</th>
<th>starting_year</th>
<th>end_year</th>
<th>url</th>
<th>name_short_iso_4</th>
<th>language</th>
<th>country</th>
<th>...</th>
<th>lockss_title</th>
<th>lockss</th>
<th>portico_status</th>
<th>portico</th>
<th>nlch_title</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>doublon_issnl</th>
<th>oa_status</th>
<th>publisher</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1660-9379</td>
<td>1660-9379</td>
<td>Revue médicale suisse</td>
<td>2005</td>
<td>9999</td>
<td>NaN</td>
<td>Rev. méd. suisse</td>
<td>138</td>
<td>215</td>
<td>...</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>0031-9007</td>
<td>0031-9007</td>
<td>Physical review letters (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>Phys. rev. lett. (Print)</td>
<td>124</td>
<td>236</td>
<td>...</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>1.0</td>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>1932-6203</td>
<td>1932-6203</td>
<td>PloS one</td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>NaN</td>
<td>124</td>
<td>236</td>
<td>...</td>
<td>PLoS One</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>4.035714</td>
<td>NaN</td>
<td>5</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>2174-8454</td>
<td>2174-8454</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td>NaN</td>
<td>EU-topías</td>
<td>124, 138, 402, 292</td>
<td>209</td>
<td>...</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>4, 5</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>1098-0121</td>
<td>1098-0121</td>
<td>Physical review. B, Condensed matter and mater...</td>
<td>1998</td>
<td>2015</td>
<td>http://ojps.aip.org/prbo/</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>124</td>
<td>236</td>
<td>...</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>1.0</td>
<td>1</td>
<td>6</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>0964-1726</td>
<td>0964-1726</td>
<td>Smart materials and structures (Print)</td>
<td>1992</td>
<td>9999</td>
<td>NaN</td>
<td>Smart mater. struct. (Print)</td>
<td>124</td>
<td>234</td>
<td>...</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>47</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>0022-3468</td>
<td>0022-3468</td>
<td>Journal of pediatric surgery (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org</td>
<td>J. pediatr. surg. (Print)</td>
<td>124</td>
<td>236</td>
<td>...</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>75</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>1432-2064</td>
<td>0178-8051</td>
<td>Probability theory and related fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>124</td>
<td>83</td>
<td>...</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>preserved</td>
<td>1.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>8</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>0960-1481</td>
<td>0960-1481</td>
<td>Renewable energy</td>
<td>1991</td>
<td>9999</td>
<td>NaN</td>
<td>Renew. energy</td>
<td>124</td>
<td>234</td>
<td>...</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>119</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>0161-7567</td>
<td>0161-7567</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>124</td>
<td>236</td>
<td>...</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>217</td>
</tr>
</tbody>
</table>
<p>911 rows × 24 columns</p>
</div>
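In the merged table the `publisher` column holds a comma-separated string such as `4, 5`. If a consumer needs one row per journal/publisher link instead, the string can be split back out. The following is only a sketch on toy data; whether the OACCT application expects the joined string or separate rows is not stated here, and the name `links` is hypothetical.

```python
import pandas as pd

# Toy extract of the merged journals table (only the two relevant columns)
journals = pd.DataFrame({"id": [3, 4], "publisher": ["3", "4, 5"]})

# Split on the ', ' separator used by the grouping step, then one row per value
links = journals.assign(
    publisher=journals["publisher"].str.split(", ")
).explode("publisher")
links["publisher"] = links["publisher"].astype(int)
print(links)
```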
```python
# export the publishers as TSV
publisher.to_csv('sample/publishers_brut.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export the publishers as Excel
publisher.to_excel('sample/publishers_brut.xlsx', index=False)
```
```python
# export the raw journals table as TSV
journals.to_csv('sample/journals_publishers_brut.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export the raw journals table as Excel
journals.to_excel('sample/journals_publishers_brut.xlsx', index=False)
```
```python
# export the raw journal/publisher id pairs as TSV
journal_publisher_ids.to_csv('sample/journals_publishers_ids.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export the raw journal/publisher id pairs as Excel
journal_publisher_ids.to_excel('sample/journals_publishers_ids.xlsx', index=False)
```
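When the TSV files are read back, pandas infers column types again, so a `publisher` column that happens to contain only single ids would come back as integers while one containing `4, 5` stays text. A sketch of a round trip that pins the type, using an in-memory buffer in place of the `sample/` files so that it runs on its own (the two-row table is invented):

```python
import io
import pandas as pd

# Toy table standing in for the exported journals data
journals = pd.DataFrame({"id": [3, 4], "publisher": ["3", "4, 5"]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
journals.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)

# Force 'publisher' to be read as text, whatever its content
reloaded = pd.read_csv(buffer, sep="\t", dtype={"publisher": str})
print(reloaded)
```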
