# Open Access Compliance Check Tool (OACCT) Project
P5 project of the EPFL Library, in collaboration with the libraries of the Universities of Geneva, Lausanne and Bern: https://www.swissuniversities.ch/themen/digitalisierung/p-5-wissenschaftliche-information/projekte/swiss-mooc-service-1-1-1-1
This notebook extracts the Sherpa/Romeo data obtained through the API and processes it for use in the OACCT application.
Author: **Pablo Iriarte**, University of Geneva (pablo.iriarte@unige.ch)
Last updated: 16.07.2021
## Sherpa/Romeo data
### Example
https://v2.sherpa.ac.uk/cgi/retrieve_by_id?item-type=publication&api-key=EEE6F146-678E-11EB-9C3A-202F3DE2659A&format=Json&identifier=17601
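The example URL above can be assembled from its query parameters. The following is a minimal sketch that relies only on the parameter names visible in that URL; the function name `build_retrieve_by_id_url` and the placeholder key `YOUR-API-KEY` are illustrative assumptions and do not come from the notebook.

```python
from urllib.parse import urlencode

SHERPA_BASE_URL = 'https://v2.sherpa.ac.uk/cgi/retrieve_by_id'

def build_retrieve_by_id_url(identifier, api_key):
    """Build a retrieve_by_id URL with the same parameters as the example URL."""
    params = {
        'item-type': 'publication',
        'api-key': api_key,
        'format': 'Json',
        'identifier': identifier,
    }
    return SHERPA_BASE_URL + '?' + urlencode(params)

# 'YOUR-API-KEY' is a placeholder, not a real key
example_url = build_retrieve_by_id_url(17601, 'YOUR-API-KEY')
print(example_url)
```

Keeping the key out of the URL string makes it easier to load it from a configuration file or an environment variable instead of hard-coding it.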
```python
import pandas as pd
import csv
import json
import numpy as np
import os
# show all columns
pd.set_option('display.max_columns', None)
```
## Table publisher_sherpa
```python
# create the DataFrame
col_names = ['journal',
             'publisher_id',
             'name',
             'country',
             'type',
             'url'
            ]
publisher_sherpa = pd.DataFrame(columns = col_names)
publisher_sherpa
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id</th>
<th>name</th>
<th>country</th>
<th>type</th>
<th>url</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
## Table sherpa match issn
```python
# create the DataFrame
col_names = ['issn',
             'sherpa_match',
            ]
sherpa_match_issn = pd.DataFrame(columns = col_names)
sherpa_match_issn
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>sherpa_match</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
## Table sherpa issns
```python
# create the DataFrame
col_names = ['issn',
             'type',
            ]
sherpa_issn = pd.DataFrame(columns = col_names)
sherpa_issn
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>type</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
## Table sherpa journals
```python
# create the DataFrame
col_names = ['journal',
             'title',
             'url',
            ]
sherpa_journal = pd.DataFrame(columns = col_names)
sherpa_journal
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>title</th>
<th>url</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
## Import the Journals and ISSN tables
```python
journal = pd.read_csv('sample/journals_publishers_brut.tsv', encoding='utf-8', header=0, sep='\t')
journal
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>issn</th>
<th>issnl</th>
<th>title</th>
<th>starting_year</th>
<th>end_year</th>
<th>url</th>
<th>name_short_iso_4</th>
<th>language</th>
<th>country</th>
<th>doaj_title</th>
<th>doaj_seal</th>
<th>APC</th>
<th>doaj_status</th>
<th>lockss_title</th>
<th>lockss</th>
<th>portico_status</th>
<th>portico</th>
<th>nlch_title</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>doublon_issnl</th>
<th>oa_status</th>
<th>publisher</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1660-9379</td>
<td>1660-9379</td>
<td>Revue médicale suisse</td>
<td>2005</td>
<td>9999</td>
<td>NaN</td>
<td>Rev. méd. suisse</td>
<td>138</td>
<td>215</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>0031-9007</td>
<td>0031-9007</td>
<td>Physical review letters (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>Phys. rev. lett. (Print)</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>1.0</td>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>1932-6203</td>
<td>1932-6203</td>
<td>PloS one</td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>NaN</td>
<td>124</td>
<td>236</td>
<td>PLoS ONE</td>
<td>1.0</td>
<td>Yes</td>
<td>1.0</td>
<td>PLoS One</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>4.035714</td>
<td>NaN</td>
<td>5</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>2174-8454</td>
<td>2174-8454</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td>NaN</td>
<td>EU-topías</td>
<td>124, 138, 402, 292</td>
<td>209</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>4, 5</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>1098-0121</td>
<td>1098-0121</td>
<td>Physical review. B, Condensed matter and mater...</td>
<td>1998</td>
<td>2015</td>
<td>http://ojps.aip.org/prbo/</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>1.0</td>
<td>1</td>
<td>6</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>0964-1726</td>
<td>0964-1726</td>
<td>Smart materials and structures (Print)</td>
<td>1992</td>
<td>9999</td>
<td>NaN</td>
<td>Smart mater. struct. (Print)</td>
<td>124</td>
<td>234</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>47</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>0022-3468</td>
<td>0022-3468</td>
<td>Journal of pediatric surgery (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org</td>
<td>J. pediatr. surg. (Print)</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>75</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>1432-2064</td>
<td>0178-8051</td>
<td>Probability theory and related fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>124</td>
<td>83</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>preserved</td>
<td>1.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>8</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>0960-1481</td>
<td>0960-1481</td>
<td>Renewable energy</td>
<td>1991</td>
<td>9999</td>
<td>NaN</td>
<td>Renew. energy</td>
<td>124</td>
<td>234</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>119</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>0161-7567</td>
<td>0161-7567</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>217</td>
</tr>
</tbody>
</table>
<p>911 rows × 24 columns</p>
</div>
```python
issn = pd.read_csv('sample/issn_brut.tsv', encoding='utf-8', header=0, sep='\t')
issn
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
<td>PRINT</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>0001-2815</td>
<td>532</td>
<td>NaN</td>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
<td>PRINT</td>
<td>1</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>0001-4842</td>
<td>498</td>
<td>NaN</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
<td>PRINT</td>
<td>1</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
<td>OTHER</td>
<td>3</td>
<td>1756</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>2470-0045</td>
<td>533</td>
<td>NaN</td>
<td>3</td>
<td>1757</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1758</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
<td>PRINT</td>
<td>1</td>
<td>1759</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>3</td>
<td>1760</td>
</tr>
</tbody>
</table>
<p>1760 rows × 6 columns</p>
</div>
```python
issn_ids = pd.read_csv('sample/issn_ids.tsv', encoding='utf-8', header=0, sep='\t')
issn_ids
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>1399-0039</td>
<td>0001-2815</td>
<td>532</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>1520-4898</td>
<td>0001-4842</td>
<td>498</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>1756</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
</tr>
<tr>
<td>1756</td>
<td>1757</td>
<td>2470-0053</td>
<td>2470-0045</td>
<td>533</td>
</tr>
<tr>
<td>1757</td>
<td>1758</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
</tr>
<tr>
<td>1758</td>
<td>1759</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
</tr>
<tr>
<td>1759</td>
<td>1760</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
</tr>
</tbody>
</table>
<p>1760 rows × 4 columns</p>
</div>
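Both `issn` and `issn_ids` list the same ISSNs with their ISSN-L and journal id, and the later merges rely on that. A quick consistency check before the extraction could look like the sketch below. It uses two small hand-made frames instead of the TSV files (values copied from the first rows shown above, with one row deliberately left out of the second frame), and the helper name `unmatched_issns` is hypothetical.

```python
import pandas as pd

def unmatched_issns(left, right, key='issn'):
    """Return the key values of `left` that have no counterpart in `right`."""
    merged = pd.merge(left[[key]], right[[key]].drop_duplicates(),
                      on=key, how='left', indicator=True)
    return merged.loc[merged['_merge'] == 'left_only', key].tolist()

# toy stand-ins for the `issn` and `issn_ids` tables
issn_demo = pd.DataFrame({'issn': ['0001-2815', '1399-0039', '0001-4842'],
                          'journal': [532, 532, 498]})
issn_ids_demo = pd.DataFrame({'id': [1, 2],
                              'issn': ['0001-2815', '1399-0039'],
                              'journal': [532, 532]})

# ISSNs present in the first table but absent from the second
missing = unmatched_issns(issn_demo, issn_ids_demo)
print(missing)
```

An empty list in both directions would indicate that the two tables cover the same set of ISSNs.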
## Extraction from Sherpa/Romeo
```python
# extract the information from the Sherpa/Romeo data
# note: DataFrame.append was removed in pandas 2.0, so this cell needs pandas < 2.0
for index, row in issn.iterrows():
    journal_id = row['journal']
    journal_issn = row['issn']
    # if (((index/10) - int(index/10)) == 0) :
    #     print(index)
    # initialise the variables to extract
    publisher_id = np.nan
    publisher_name = ''
    publisher_country = ''
    publisher_type = ''
    publisher_url = ''
    # one JSON file per ISSN: check that the file exists
    # print(row['issn'])
    json_path = 'sherpa/data/' + journal_issn + '.json'
    if os.path.exists(json_path):
        with open(json_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        if (len(data['items']) > 0):
            # only the first publisher of the first item is kept
            entry = data['items'][0]['publishers'][0]
            publisher = entry['publisher']
            publisher_id = publisher['id']
            if ('country' in publisher):
                publisher_country = publisher['country']
            if ('relationship_type' in entry):
                publisher_type = entry['relationship_type']
            if ('url' in publisher):
                publisher_url = publisher['url']
            if ('name' in publisher['name'][0]):
                publisher_name = publisher['name'][0]['name']
            sherpa_match = 'OK'
            publisher_sherpa = publisher_sherpa.append({'journal' : journal_id, 'publisher_id' : publisher_id,
                                                        'name' : publisher_name, 'country' : publisher_country,
                                                        'type' : publisher_type, 'url' : publisher_url}, ignore_index=True)
        else :
            # printed message means "found but empty"
            print(row['issn'] + ' - trouvé mais vide')
            sherpa_match = 'empty'
    else :
        # printed message means "not found"
        print(row['issn'] + ' - pas trouvé')
        sherpa_match = 'missing'
    sherpa_match_issn = sherpa_match_issn.append({'issn' : row['issn'], 'sherpa_match' : sherpa_match}, ignore_index=True)
```
1399-0039 - pas trouvé
1520-8524 - trouvé mais vide
1520-9024 - pas trouvé
1468-2834 - pas trouvé
1551-2916 - pas trouvé
1943-2984 - pas trouvé
1555-7162 - trouvé mais vide
2163-5773 - pas trouvé
1873-4324 - trouvé mais vide
1526-7598 - pas trouvé
1673-3134 - pas trouvé
1777-5884 - pas trouvé
1528-1140 - pas trouvé
1468-2060 - pas trouvé
1552-6259 - pas trouvé
0003-6935 - trouvé mais vide
1520-8842 - pas trouvé
0003-9926 - trouvé mais vide
1538-3679 - pas trouvé
0003-9942 - trouvé mais vide
1538-3687 - pas trouvé
1529-0131 - pas trouvé
1090-2104 - trouvé mais vide
1943-295X - pas trouvé
1878-2434 - pas trouvé
1873-2402 - trouvé mais vide
1872-6240 - trouvé mais vide
1365-2133 - pas trouvé
0007-4403 - trouvé mais vide
1968-3766 - pas trouvé
0008-042X - trouvé mais vide
2104-3329 - pas trouvé
2268-7963 - pas trouvé
1873-3948 - trouvé mais vide
1873-4405 - trouvé mais vide
1872-6836 - trouvé mais vide
1873-4448 - trouvé mais vide
1524-4571 - trouvé mais vide
1873-7838 - trouvé mais vide
1879-2944 - trouvé mais vide
1873-3840 - trouvé mais vide
1973-8102 - trouvé mais vide
0011-1600 - trouvé mais vide
1968-3901 - pas trouvé
1879-2235 - trouvé mais vide
1095-564X - trouvé mais vide
1931-3543 - pas trouvé
1385-013X - trouvé mais vide
1873-3859 - trouvé mais vide
1873-7315 - trouvé mais vide
0013-8584 - trouvé mais vide
2309-4672 - pas trouvé
0014-2239 - trouvé mais vide
2272-9011 - pas trouvé
0945-5795 - pas trouvé
1432-1033 - pas trouvé
1365-2362 - pas trouvé
1090-2422 - trouvé mais vide
1026-7484 - trouvé mais vide
1528-0012 - trouvé mais vide
1872-9533 - trouvé mais vide
0016-9161 - trouvé mais vide
2297-7953 - pas trouvé
1879-2189 - trouvé mais vide
0018-0238 - trouvé mais vide
2297-1971 - pas trouvé
2334-3303 - pas trouvé
1070-6313 - pas trouvé
1873-3255 - trouvé mais vide
1097-0215 - pas trouvé
1879-2146 - trouvé mais vide
0021-8170 - trouvé mais vide
2114-6292 - pas trouvé
1090-266X - trouvé mais vide
1520-8850 - trouvé mais vide
1879-1484 - trouvé mais vide
1067-8832 - pas trouvé
1067-8816 - pas trouvé
1873-2380 - trouvé mais vide
1090-2694 - trouvé mais vide
1520-9032 - pas trouvé
1873-3778 - trouvé mais vide
1945-7197 - pas trouvé
0021-9797 - trouvé mais vide
1090-2716 - trouvé mais vide
1873-5002 - pas trouvé
0022-0728 - trouvé mais vide
1879-2707 - trouvé mais vide
1872-7883 - trouvé mais vide
1527-2427 - trouvé mais vide
1089-8638 - trouvé mais vide
1873-4820 - trouvé mais vide
1872-8561 - trouvé mais vide
1531-5037 - trouvé mais vide
1085-8695 - pas trouvé
1097-6833 - pas trouvé
1879-2553 - trouvé mais vide
1097-6841 - pas trouvé
2050-5639 - pas trouvé
1873-4782 - trouvé mais vide
1878-5883 - trouvé mais vide
1085-8687 - pas trouvé
1097-685X - pas trouvé
1070-6321 - pas trouvé
1091-756X - pas trouvé
1939-5590 - trouvé mais vide
1939-5604 - pas trouvé
1873-1856 - trouvé mais vide
1872-6143 - pas trouvé
0025-6749 - trouvé mais vide
1423-0356 - pas trouvé
0026-4598 - pas trouvé
1432-1874 - pas trouvé
0027-4054 - trouvé mais vide
1873-3514 - trouvé mais vide
1873-0310 - trouvé mais vide
1872-616X - pas trouvé
1402-4896 - pas trouvé
0031-8965 - trouvé mais vide
1521-396X - pas trouvé
1092-0145 - trouvé mais vide
1873-3700 - pas trouvé
1532-2548 - pas trouvé
1527-2400 - trouvé mais vide
0035-1121 - trouvé mais vide
1760-7426 - pas trouvé
0035-1784 - trouvé mais vide
2297-1254 - pas trouvé
0035-3655 - trouvé mais vide
2104-385X - pas trouvé
0036-7486 - trouvé mais vide
1424-4004 - trouvé mais vide
0036-7672 - trouvé mais vide
0036-7699 - trouvé mais vide
0036-7893 - trouvé mais vide
2504-1452 - pas trouvé
1471-1257 - pas trouvé
1879-2766 - trouvé mais vide
1879-2405 - trouvé mais vide
1879-2758 - trouvé mais vide
1464-5416 - pas trouvé
1873-3581 - pas trouvé
1664-2864 - pas trouvé
1879-2731 - pas trouvé
1534-6080 - trouvé mais vide
1873-2623 - pas trouvé
1096-0341 - trouvé mais vide
1878-5646 - trouvé mais vide
1879-2448 - pas trouvé
1879-1298 - trouvé mais vide
1879-2138 - trouvé mais vide
0046-2497 - trouvé mais vide
1776-2936 - pas trouvé
1873-7625 - trouvé mais vide
1879-2472 - pas trouvé
2214-8019 - trouvé mais vide
0065-7727 - trouvé mais vide
1070-6283 - pas trouvé
0066-6653 - trouvé mais vide
0072-0585 - trouvé mais vide
1079-2376 - pas trouvé
1557-7988 - trouvé mais vide
0081-1254 - trouvé mais vide
1523-1755 - pas trouvé
1085-8725 - pas trouvé
1097-6825 - trouvé mais vide
1096-0260 - pas trouvé
1522-8541 - pas trouvé
1551-7616 - pas trouvé
1935-0465 - pas trouvé
1070-633X - pas trouvé
1873-4375 - trouvé mais vide
1070-6291 - pas trouvé
0108-2701 - trouvé mais vide
1600-5759 - pas trouvé
1879-0097 - pas trouvé
1879-2081 - pas trouvé
1873-7323 - trouvé mais vide
1879-3452 - trouvé mais vide
1878-5905 - trouvé mais vide
1532-1991 - pas trouvé
1071-2763 - pas trouvé
1071-8842 - pas trouvé
2156-2202 - pas trouvé
1081-1281 - pas trouvé
1873-7528 - trouvé mais vide
1773-0406 - trouvé mais vide
0151-0193 - trouvé mais vide
2101-0218 - trouvé mais vide
0161-7567 - trouvé mais vide
2160-9292 - trouvé mais vide
1095-3795 - trouvé mais vide
1872-678X - trouvé mais vide
1573-2517 - pas trouvé
1872-7557 - trouvé mais vide
1872-7123 - trouvé mais vide
1872-7441 - trouvé mais vide
1872-7999 - pas trouvé
1879-1514 - pas trouvé
1874-1754 - trouvé mais vide
1872-7697 - trouvé mais vide
1873-5568 - trouvé mais vide
1872-7352 - pas trouvé
1872-9584 - trouvé mais vide
1600-0641 - trouvé mais vide
1872-9576 - trouvé mais vide
1873-5460 - pas trouvé
1873-5584 - trouvé mais vide
1872-695X - pas trouvé
1432-0827 - pas trouvé
1432-1262 - pas trouvé
0181-5512 - trouvé mais vide
1773-0597 - pas trouvé
1879-2367 - trouvé mais vide
1532-2939 - trouvé mais vide
1527-3296 - pas trouvé
1558-1497 - trouvé mais vide
0221-5918 - trouvé mais vide
0248-8663 - trouvé mais vide
1768-3122 - trouvé mais vide
0252-1881 - trouvé mais vide
0252-2969 - trouvé mais vide
1661-5468 - pas trouvé
0254-945X - trouvé mais vide
1662-9760 - pas trouvé
0255-9005 - trouvé mais vide
0258-6800 - trouvé mais vide
1432-0819 - pas trouvé
0259-6199 - trouvé mais vide
1661-3171 - trouvé mais vide
1532-1983 - pas trouvé
1873-2518 - trouvé mais vide
1365-2346 - pas trouvé
1476-5365 - pas trouvé
1067-8824 - pas trouvé
0271-4302 - trouvé mais vide
2158-1525 - pas trouvé
1536-4801 - pas trouvé
1873-457X - pas trouvé
1531-5053 - pas trouvé
1470-8752 - pas trouvé
1879-176X - pas trouvé
1873-4421 - pas trouvé
1432-1998 - pas trouvé
1873-6246 - pas trouvé
1873-6777 - pas trouvé
1879-3533 - trouvé mais vide
1872-8057 - trouvé mais vide
1872-7972 - trouvé mais vide
1879-2723 - trouvé mais vide
1879-2774 - pas trouvé
1873-4766 - trouvé mais vide
1362-4954 - pas trouvé
1365-2842 - pas trouvé
1361-6447 - trouvé mais vide
1872-9118 - trouvé mais vide
1873-7544 - trouvé mais vide
1873-3360 - pas trouvé
1873-2100 - pas trouvé
1872-9657 - trouvé mais vide
1499-2752 - pas trouvé
2567-689X - trouvé mais vide
1432-1238 - pas trouvé
1873-684X - trouvé mais vide
1879-355X - trouvé mais vide
1879-3487 - trouvé mais vide
1873-6785 - trouvé mais vide
1546-3141 - pas trouvé
0362-1340 - trouvé mais vide
1523-2867 - pas trouvé
1558-1160 - trouvé mais vide
1432-2323 - pas trouvé
0365-7116 - trouvé mais vide
1873-2526 - pas trouvé
0368-4466 - trouvé mais vide
1588-2926 - pas trouvé
0369-3392 - trouvé mais vide
1873-2445 - trouvé mais vide
0373-2525 - trouvé mais vide
0373-2967 - trouvé mais vide
2235-3658 - pas trouvé
0373-6156 - trouvé mais vide
2391-1336 - pas trouvé
0374-4256 - trouvé mais vide
0375-1457 - trouvé mais vide
2419-8196 - pas trouvé
1873-2429 - trouvé mais vide
1872-6097 - pas trouvé
1872-6860 - trouvé mais vide
1574-6968 - pas trouvé
1879-0038 - trouvé mais vide
1873-3476 - trouvé mais vide
1873-2755 - trouvé mais vide
1872-6178 - trouvé mais vide
1873-2046 - trouvé mais vide
1872-6283 - trouvé mais vide
0398-3412 - trouvé mais vide
2297-5810 - pas trouvé
0409-8757 - trouvé mais vide
1461-7412 - pas trouvé
1873-1562 - trouvé mais vide
1089-4918 - trouvé mais vide
1538-4500 - pas trouvé
0570-0833 - trouvé mais vide
0583-8401 - trouvé mais vide
1872-7727 - trouvé mais vide
1873-264X - trouvé mais vide
1527-7755 - pas trouvé
1520-8559 - trouvé mais vide
1558-3597 - trouvé mais vide
1873-5134 - pas trouvé
1096-3677 - pas trouvé
2213-0276 - pas trouvé
1958-5381 - pas trouvé
1651-2227 - pas trouvé
0884-1616 - trouvé mais vide
1091-8876 - pas trouvé
1092-8928 - pas trouvé
1089-8646 - pas trouvé
0888-8809 - trouvé mais vide
1944-9917 - trouvé mais vide
1532-0987 - pas trouvé
0894-8275 - trouvé mais vide
1878-5921 - pas trouvé
1520-636X - pas trouvé
1399-3038 - pas trouvé
1873-7196 - trouvé mais vide
1873-4308 - trouvé mais vide
1573-2509 - trouvé mais vide
1879-0658 - trouvé mais vide
1873-2135 - pas trouvé
1873-2143 - pas trouvé
1873-4936 - trouvé mais vide
1873-4944 - pas trouvé
1872-793X - trouvé mais vide
1873-3069 - pas trouvé
1872-8286 - trouvé mais vide
1873-3077 - pas trouvé
1873-4669 - trouvé mais vide
1873-3883 - trouvé mais vide
0926-9630 - trouvé mais vide
1879-8365 - trouvé mais vide
1879-3398 - trouvé mais vide
1873-4359 - trouvé mais vide
1879-0720 - trouvé mais vide
1769-664X - pas trouvé
1432-2218 - pas trouvé
1866-6817 - pas trouvé
1432-2277 - pas trouvé
1435-4373 - pas trouvé
1433-2965 - pas trouvé
1873-3441 - pas trouvé
1362-3044 - pas trouvé
1879-0526 - trouvé mais vide
1879-0828 - pas trouvé
1879-0410 - trouvé mais vide
1873-619X - trouvé mais vide
1873-4235 - trouvé mais vide
1362-511X - pas trouvé
1879-0429 - trouvé mais vide
1879-1786 - trouvé mais vide
1879-0852 - pas trouvé
1879-0682 - pas trouvé
1873-2976 - trouvé mais vide
1464-3405 - trouvé mais vide
1466-1861 - pas trouvé
1555-3892 - pas trouvé
1360-0443 - pas trouvé
1464-3391 - trouvé mais vide
1879-2359 - pas trouvé
0992-986X - trouvé mais vide
2119-4130 - pas trouvé
0995-3817 - trouvé mais vide
2219-2840 - pas trouvé
1010-2248 - trouvé mais vide
1664-9885 - pas trouvé
1873-2666 - pas trouvé
1017-0588 - trouvé mais vide
1018-7987 - trouvé mais vide
1019-0406 - trouvé mais vide
1023-2044 - trouvé mais vide
1023-9332 - trouvé mais vide
2235-1884 - pas trouvé
1560-7917 - pas trouvé
1026-7530 - pas trouvé
1607-8489 - pas trouvé
1127-2236 - pas trouvé
1938-808X - pas trouvé
1095-8657 - trouvé mais vide
1536-3732 - pas trouvé
1049-5258 - trouvé mais vide
1538-4446 - pas trouvé
1095-9572 - trouvé mais vide
1532-6500 - trouvé mais vide
1059-1524 - trouvé mais vide
1095-3787 - trouvé mais vide
1538-4519 - trouvé mais vide
1063-6919 - trouvé mais vide
2332-564X - pas trouvé
2575-7075 - pas trouvé
1940-6029 - trouvé mais vide
1527-2435 - pas trouvé
1527-2419 - pas trouvé
1071-1023 - trouvé mais vide
1520-8567 - pas trouvé
1090-235X - trouvé mais vide
1532-2130 - pas trouvé
1096-0856 - trouvé mais vide
1538-4489 - pas trouvé
1155-4339 - trouvé mais vide
1764-7177 - pas trouvé
1460-9592 - pas trouvé
1878-3511 - pas trouvé
1778-7254 - pas trouvé
1873-4030 - pas trouvé
1873-2844 - trouvé mais vide
1873-5126 - trouvé mais vide
1873-5606 - pas trouvé
1873-2453 - trouvé mais vide
1872-8456 - pas trouvé
2040-2058 - pas trouvé
1878-5840 - trouvé mais vide
1473-6519 - pas trouvé
1879-0690 - trouvé mais vide
1466-609X - pas trouvé
1367-4811 - trouvé mais vide
1873-4286 - pas trouvé
1873-3212 - trouvé mais vide
1873-1759 - pas trouvé
1875-8908 - trouvé mais vide
1872-8952 - trouvé mais vide
1873-1902 - trouvé mais vide
1600-0854 - pas trouvé
1420-5556 - trouvé mais vide
1420-7192 - trouvé mais vide
1662-0879 - pas trouvé
1422-2019 - trouvé mais vide
1422-3449 - trouvé mais vide
1422-5778 - trouvé mais vide
2504-1436 - pas trouvé
1423-3967 - trouvé mais vide
1663-3997 - pas trouvé
1424-1811 - trouvé mais vide
2504-1460 - pas trouvé
1424-4020 - pas trouvé
1424-7410 - trouvé mais vide
1424-7755 - trouvé mais vide
1436-3771 - pas trouvé
1434-6028 - trouvé mais vide
1434-6036 - trouvé mais vide
1439-4456 - pas trouvé
1449-8979 - pas trouvé
1873-6416 - trouvé mais vide
1465-6914 - trouvé mais vide
1478-6362 - pas trouvé
1520-6149 - trouvé mais vide
2379-190X - trouvé mais vide
1522-1601 - pas trouvé
1708-8208 - pas trouvé
1944-7884 - pas trouvé
1527-6473 - pas trouvé
1947-3893 - pas trouvé
1530-1591 - trouvé mais vide
1558-1101 - pas trouvé
1860-2002 - pas trouvé
1552-5279 - pas trouvé
1557-170X - trouvé mais vide
1878-5530 - trouvé mais vide
1878-1519 - trouvé mais vide
1569-9293 - pas trouvé
1873-376X - pas trouvé
1720-8319 - pas trouvé
1610-0379 - trouvé mais vide
1610-0387 - pas trouvé
1778-3569 - trouvé mais vide
1660-3362 - trouvé mais vide
1660-9379 - trouvé mais vide
1660-9603 - trouvé mais vide
1661-1179 - trouvé mais vide
1661-2620 - trouvé mais vide
1661-464X - trouvé mais vide
1661-4941 - trouvé mais vide
1661-8165 - pas trouvé
1662-551X - pas trouvé
1662-5536 - trouvé mais vide
1662-6001 - trouvé mais vide
1662-601X - pas trouvé
1662-8705 - trouvé mais vide
1777-5477 - trouvé mais vide
1810-7621 - pas trouvé
1863-2300 - pas trouvé
1873-2763 - trouvé mais vide
1876-7737 - pas trouvé
1878-8769 - trouvé mais vide
1939-5175 - trouvé mais vide
1945-7928 - trouvé mais vide
1945-7936 - pas trouvé
1945-8452 - trouvé mais vide
1992-2655 - trouvé mais vide
2050-7534 - trouvé mais vide
2101-6275 - pas trouvé
2161-2129 - pas trouvé
2160-5033 - trouvé mais vide
2160-5041 - pas trouvé
2160-9020 - trouvé mais vide
2160-9047 - pas trouvé
2164-3342 - trouvé mais vide
2174-8454 - trouvé mais vide
2340-115X - pas trouvé
2211-3282 - trouvé mais vide
2264-7228 - trouvé mais vide
2297-0703 - trouvé mais vide
2297-6981 - trouvé mais vide
2297-7007 - pas trouvé
2352-1791 - trouvé mais vide
2504-4427 - trouvé mais vide
2504-4435 - trouvé mais vide
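
The loop above reads the first publisher of the first item from each JSON file and appends one row per ISSN. As a hedged alternative that also runs on pandas 2.x, the sketch below collects the rows in a list and builds the DataFrame once at the end. The helper name `extract_first_publisher` and the `sample` record are assumptions made for illustration: the record only mimics the keys accessed by the loop and is not real API output. Unlike the loop, the helper falls back to `NaN` when `id` is absent instead of raising an error.

```python
import numpy as np
import pandas as pd

def extract_first_publisher(data):
    """Return the fields of the first publisher of the first item, or None if `items` is empty."""
    items = data.get('items', [])
    if not items:
        return None
    entry = items[0]['publishers'][0]
    publisher = entry['publisher']
    names = publisher.get('name', [])
    return {
        'publisher_id': publisher.get('id', np.nan),
        'name': names[0].get('name', '') if names else '',
        'country': publisher.get('country', ''),
        'type': entry.get('relationship_type', ''),
        'url': publisher.get('url', ''),
    }

# hand-made record with the same nesting as the files read by the loop
sample = {'items': [{'publishers': [{
    'relationship_type': 'society_publisher',
    'publisher': {'id': 10, 'country': 'us', 'url': 'http://www.aps.org/',
                  'name': [{'name': 'American Physical Society'}]},
}]}]}

rows = []
for journal_id, data in [(80, sample), (994, {'items': []})]:
    fields = extract_first_publisher(data)
    if fields is not None:
        rows.append({'journal': journal_id, **fields})

# build the DataFrame once instead of appending row by row
publisher_demo = pd.DataFrame(rows, columns=['journal', 'publisher_id', 'name',
                                             'country', 'type', 'url'])
print(publisher_demo)
```

Building the frame from a list avoids copying the whole table on every iteration, which row-by-row appending does.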
```python
publisher_sherpa
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id</th>
<th>name</th>
<th>country</th>
<th>type</th>
<th>url</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>532</td>
<td>45</td>
<td>John Wiley and Sons</td>
<td>gb</td>
<td>former_publisher</td>
<td>http://www.wiley.com/</td>
</tr>
<tr>
<td>1</td>
<td>498</td>
<td>4</td>
<td>American Chemical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://pubs.acs.org/</td>
</tr>
<tr>
<td>2</td>
<td>498</td>
<td>4</td>
<td>American Chemical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://pubs.acs.org/</td>
</tr>
<tr>
<td>3</td>
<td>789</td>
<td>126</td>
<td>Acoustical Society of America</td>
<td>us</td>
<td>society_publisher</td>
<td>http://acousticalsociety.org/</td>
</tr>
<tr>
<td>4</td>
<td>166</td>
<td>3291</td>
<td>Springer</td>
<td>gb</td>
<td>commercial_publisher</td>
<td>https://www.springernature.com/gp/products/jou...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1238</td>
<td>80</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>1239</td>
<td>80</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>1240</td>
<td>533</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>1241</td>
<td>533</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>1242</td>
<td>608</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
</tbody>
</table>
<p>1243 rows × 6 columns</p>
</div>
```python
sherpa_match_issn
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>sherpa_match</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>OK</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>missing</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>OK</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>OK</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>OK</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>OK</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>OK</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>OK</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>empty</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>empty</td>
</tr>
</tbody>
</table>
<p>1760 rows × 2 columns</p>
</div>
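The `sherpa_match` column takes three values in the extraction loop: `OK`, `empty` (JSON file present but `items` is empty) and `missing` (no JSON file). A short summary of these statuses might be computed as in the sketch below, which uses a toy frame with the five ISSNs and statuses visible in the table above; the variable names are illustrative.

```python
import pandas as pd

# toy frame with rows copied from the table above
match_demo = pd.DataFrame({
    'issn': ['0001-2815', '1399-0039', '0001-4842', '2504-4427', '2504-4435'],
    'sherpa_match': ['OK', 'missing', 'OK', 'empty', 'empty'],
})

# number of ISSNs per match status
status_counts = match_demo['sherpa_match'].value_counts()
# share of ISSNs for which a usable record was found
ok_share = (match_demo['sherpa_match'] == 'OK').mean()

print(status_counts)
print(ok_share)
```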
```python
# drop duplicate rows
publisher_sherpa_dedup = publisher_sherpa.drop_duplicates()
publisher_sherpa_dedup
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>publisher_id</th>
<th>name</th>
<th>country</th>
<th>type</th>
<th>url</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>532</td>
<td>45</td>
<td>John Wiley and Sons</td>
<td>gb</td>
<td>former_publisher</td>
<td>http://www.wiley.com/</td>
</tr>
<tr>
<td>1</td>
<td>498</td>
<td>4</td>
<td>American Chemical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://pubs.acs.org/</td>
</tr>
<tr>
<td>3</td>
<td>789</td>
<td>126</td>
<td>Acoustical Society of America</td>
<td>us</td>
<td>society_publisher</td>
<td>http://acousticalsociety.org/</td>
</tr>
<tr>
<td>4</td>
<td>166</td>
<td>3291</td>
<td>Springer</td>
<td>gb</td>
<td>commercial_publisher</td>
<td>https://www.springernature.com/gp/products/jou...</td>
</tr>
<tr>
<td>6</td>
<td>807</td>
<td>3291</td>
<td>Springer</td>
<td>gb</td>
<td>commercial_publisher</td>
<td>https://www.springernature.com/gp/products/jou...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1235</td>
<td>870</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>1236</td>
<td>41</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>1238</td>
<td>80</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>1240</td>
<td>533</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
<tr>
<td>1242</td>
<td>608</td>
<td>10</td>
<td>American Physical Society</td>
<td>us</td>
<td>society_publisher</td>
<td>http://www.aps.org/</td>
</tr>
</tbody>
</table>
<p>808 rows × 6 columns</p>
</div>
```python
sherpa_match_issn
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>sherpa_match</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>OK</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>missing</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>OK</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>OK</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>OK</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>OK</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>OK</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>OK</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>empty</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>empty</td>
</tr>
</tbody>
</table>
<p>1760 rows × 2 columns</p>
</div>
```python
# add the ISSN-L (with the id and journal key from issn_ids) and the title
sherpa_match_issn = pd.merge(sherpa_match_issn, issn_ids, on='issn', how='left')
sherpa_match_issn = pd.merge(sherpa_match_issn, journal[['issnl', 'title']], on='issnl', how='left')
sherpa_match_issn
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>sherpa_match</th>
<th>id</th>
<th>issnl</th>
<th>journal</th>
<th>title</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>OK</td>
<td>1</td>
<td>0001-2815</td>
<td>532</td>
<td>Tissue antigens</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>missing</td>
<td>2</td>
<td>0001-2815</td>
<td>532</td>
<td>Tissue antigens</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>OK</td>
<td>3</td>
<td>0001-4842</td>
<td>498</td>
<td>Accounts of chemical research</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>OK</td>
<td>4</td>
<td>0001-4842</td>
<td>498</td>
<td>Accounts of chemical research</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>OK</td>
<td>5</td>
<td>0001-4966</td>
<td>789</td>
<td>The Journal of the Acoustical Society of America</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>OK</td>
<td>1756</td>
<td>2470-0045</td>
<td>533</td>
<td>Physical review. E (Print)</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>OK</td>
<td>1757</td>
<td>2470-0045</td>
<td>533</td>
<td>Physical review. E (Print)</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>OK</td>
<td>1758</td>
<td>2475-9953</td>
<td>608</td>
<td>Physical review materials</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>empty</td>
<td>1759</td>
<td>2504-4427</td>
<td>994</td>
<td>GG@G (Print)</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>empty</td>
<td>1760</td>
<td>2504-4427</td>
<td>994</td>
<td>GG@G (Print)</td>
</tr>
</tbody>
</table>
<p>1760 rows × 6 columns</p>
</div>
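
The two left joins above rely on each ISSN appearing once in `issn_ids` and each ISSN-L appearing once in `journal`. The following is a minimal sketch, not part of the original notebook, of how the `validate` and `indicator` parameters of `pd.merge` could check that assumption. The frames `match` and `ids` are small made-up stand-ins for the real tables, and `9999-9999` is an invented value.

```python
import pandas as pd

# Toy stand-ins for sherpa_match_issn and issn_ids (illustrative values only)
match = pd.DataFrame({'issn': ['0001-2815', '1399-0039', '9999-9999'],
                      'sherpa_match': ['OK', 'missing', 'empty']})
ids = pd.DataFrame({'issn': ['0001-2815', '1399-0039'],
                    'issnl': ['0001-2815', '0001-2815']})

# validate raises a MergeError if an ISSN appears more than once in ids;
# indicator adds a _merge column telling which rows found a partner
merged = pd.merge(match, ids, on='issn', how='left',
                  validate='many_to_one', indicator=True)

# ISSNs present on the left side only, i.e. without an ISSN-L
unmatched = merged.loc[merged['_merge'] == 'left_only', 'issn'].tolist()
print(unmatched)
```

With `validate` in place, a duplicated key fails loudly instead of silently multiplying rows.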
```python
# count the ISSNs per ISSN-L and match status
sherpa_match_results = sherpa_match_issn[['id', 'issnl', 'sherpa_match']].groupby(['issnl', 'sherpa_match']).count()
sherpa_match_results
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th></th>
<th>id</th>
</tr>
<tr>
<th>issnl</th>
<th>sherpa_match</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2" valign="top">0001-2815</td>
<td>OK</td>
<td>1</td>
</tr>
<tr>
<td>missing</td>
<td>1</td>
</tr>
<tr>
<td>0001-4842</td>
<td>OK</td>
<td>2</td>
</tr>
<tr>
<td rowspan="2" valign="top">0001-4966</td>
<td>OK</td>
<td>1</td>
</tr>
<tr>
<td>empty</td>
<td>1</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>2469-9950</td>
<td>OK</td>
<td>2</td>
</tr>
<tr>
<td>2470-0010</td>
<td>OK</td>
<td>2</td>
</tr>
<tr>
<td>2470-0045</td>
<td>OK</td>
<td>2</td>
</tr>
<tr>
<td>2475-9953</td>
<td>OK</td>
<td>1</td>
</tr>
<tr>
<td>2504-4427</td>
<td>empty</td>
<td>2</td>
</tr>
</tbody>
</table>
<p>1302 rows × 1 columns</p>
</div>
```python
sherpa_match_results = sherpa_match_results.reset_index()
sherpa_match_results
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issnl</th>
<th>sherpa_match</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>OK</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0001-2815</td>
<td>missing</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>OK</td>
<td>2</td>
</tr>
<tr>
<td>3</td>
<td>0001-4966</td>
<td>OK</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>empty</td>
<td>1</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1297</td>
<td>2469-9950</td>
<td>OK</td>
<td>2</td>
</tr>
<tr>
<td>1298</td>
<td>2470-0010</td>
<td>OK</td>
<td>2</td>
</tr>
<tr>
<td>1299</td>
<td>2470-0045</td>
<td>OK</td>
<td>2</td>
</tr>
<tr>
<td>1300</td>
<td>2475-9953</td>
<td>OK</td>
<td>1</td>
</tr>
<tr>
<td>1301</td>
<td>2504-4427</td>
<td>empty</td>
<td>2</td>
</tr>
</tbody>
</table>
<p>1302 rows × 3 columns</p>
</div>
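
The grouped count above gives one row per (ISSN-L, status) pair. As an alternative view, and only as a sketch on made-up data, `pd.crosstab` could put the statuses side by side so that journals without any `OK` match can be read off directly. The frame `df` below is a hypothetical excerpt shaped like `sherpa_match_issn`.

```python
import pandas as pd

# Toy data shaped like sherpa_match_issn (illustrative values only)
df = pd.DataFrame({
    'issnl': ['0001-2815', '0001-2815', '0001-4842', '0001-4842',
              '2504-4427', '2504-4427'],
    'sherpa_match': ['OK', 'missing', 'OK', 'OK', 'empty', 'empty'],
})

# One row per ISSN-L, one column per status; cells hold the number of ISSNs
status_counts = pd.crosstab(df['issnl'], df['sherpa_match'])

# ISSN-L values for which no ISSN obtained an OK match
# (assumes at least one OK exists in the data, otherwise the column is absent)
no_ok = status_counts.index[status_counts['OK'] == 0].tolist()
print(status_counts)
print(no_ok)
```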
```python
# keep the OK matches and attach them to the list of unique ISSN-L values
sherpa_match_results_ok = sherpa_match_results.loc[sherpa_match_results['sherpa_match'] == 'OK']
issn_ids_issnl = issn_ids[['issnl', 'journal']].drop_duplicates(subset='issnl')
issn_ids_issnl = pd.merge(issn_ids_issnl, sherpa_match_results_ok, on='issnl', how='left')
issn_ids_issnl = pd.merge(issn_ids_issnl, journal[['issnl', 'title']], on='issnl', how='left')
issn_ids_issnl
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issnl</th>
<th>journal</th>
<th>sherpa_match</th>
<th>id</th>
<th>title</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>532</td>
<td>OK</td>
<td>1.0</td>
<td>Tissue antigens</td>
</tr>
<tr>
<td>1</td>
<td>0001-4842</td>
<td>498</td>
<td>OK</td>
<td>2.0</td>
<td>Accounts of chemical research</td>
</tr>
<tr>
<td>2</td>
<td>0001-4966</td>
<td>789</td>
<td>OK</td>
<td>1.0</td>
<td>The Journal of the Acoustical Society of America</td>
</tr>
<tr>
<td>3</td>
<td>0001-6268</td>
<td>166</td>
<td>OK</td>
<td>2.0</td>
<td>Acta neurochirurgica</td>
</tr>
<tr>
<td>4</td>
<td>0001-6322</td>
<td>807</td>
<td>OK</td>
<td>2.0</td>
<td>Acta neuropathologica</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>904</td>
<td>2469-9950</td>
<td>41</td>
<td>OK</td>
<td>2.0</td>
<td>Physical review. B</td>
</tr>
<tr>
<td>905</td>
<td>2470-0010</td>
<td>80</td>
<td>OK</td>
<td>2.0</td>
<td>Physical review. D</td>
</tr>
<tr>
<td>906</td>
<td>2470-0045</td>
<td>533</td>
<td>OK</td>
<td>2.0</td>
<td>Physical review. E (Print)</td>
</tr>
<tr>
<td>907</td>
<td>2475-9953</td>
<td>608</td>
<td>OK</td>
<td>1.0</td>
<td>Physical review materials</td>
</tr>
<tr>
<td>908</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>NaN</td>
<td>GG@G (Print)</td>
</tr>
</tbody>
</table>
<p>909 rows × 5 columns</p>
</div>
```python
journals_not_sherpa = issn_ids_issnl.loc[issn_ids_issnl['sherpa_match'].isna()]
journals_not_sherpa
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issnl</th>
<th>journal</th>
<th>sherpa_match</th>
<th>id</th>
<th>title</th>
</tr>
</thead>
<tbody>
<tr>
<td>24</td>
<td>0003-6935</td>
<td>398</td>
<td>NaN</td>
<td>NaN</td>
<td>Applied optics</td>
</tr>
<tr>
<td>27</td>
<td>0003-9926</td>
<td>605</td>
<td>NaN</td>
<td>NaN</td>
<td>Archives of internal medicine (1960)</td>
</tr>
<tr>
<td>28</td>
<td>0003-9942</td>
<td>974</td>
<td>NaN</td>
<td>NaN</td>
<td>Archives of neurology (Chicago)</td>
</tr>
<tr>
<td>47</td>
<td>0007-4403</td>
<td>885</td>
<td>NaN</td>
<td>NaN</td>
<td>Bulletin de psychologie</td>
</tr>
<tr>
<td>48</td>
<td>0008-042X</td>
<td>180</td>
<td>NaN</td>
<td>NaN</td>
<td>Cahiers pédagogiques (Revue)</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>889</td>
<td>2264-7228</td>
<td>503</td>
<td>NaN</td>
<td>NaN</td>
<td>Distances et médiations des savoirs</td>
</tr>
<tr>
<td>892</td>
<td>2297-0703</td>
<td>989</td>
<td>NaN</td>
<td>NaN</td>
<td>Schweizer Krebs-Bulletin</td>
</tr>
<tr>
<td>893</td>
<td>2297-6981</td>
<td>618</td>
<td>NaN</td>
<td>NaN</td>
<td>Swiss archives of neurology, psychiatry and ps...</td>
</tr>
<tr>
<td>898</td>
<td>2352-1791</td>
<td>639</td>
<td>NaN</td>
<td>NaN</td>
<td>Nuclear materials and energy</td>
</tr>
<tr>
<td>908</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>NaN</td>
<td>GG@G (Print)</td>
</tr>
</tbody>
</table>
<p>101 rows × 5 columns</p>
</div>
```python
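# for journals without an OK match, show which of the other statuses their ISSNs received
sherpa_match_results_empty = sherpa_match_results.loc[sherpa_match_results['sherpa_match'] == 'empty']
sherpa_match_results_missing = sherpa_match_results.loc[sherpa_match_results['sherpa_match'] == 'missing']
# drop() returns a new frame, which avoids modifying a slice of issn_ids_issnl in place
journals_not_sherpa = journals_not_sherpa.drop(columns=['sherpa_match', 'id'])
journals_not_sherpa = pd.merge(journals_not_sherpa, sherpa_match_results_empty, on='issnl', how='left')
del journals_not_sherpa['id']
# after this merge, sherpa_match_x holds the 'empty' status and sherpa_match_y the 'missing' status
journals_not_sherpa = pd.merge(journals_not_sherpa, sherpa_match_results_missing, on='issnl', how='left')
del journals_not_sherpa['id']
journals_not_sherpa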
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issnl</th>
<th>journal</th>
<th>title</th>
<th>sherpa_match_x</th>
<th>sherpa_match_y</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0003-6935</td>
<td>398</td>
<td>Applied optics</td>
<td>empty</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>0003-9926</td>
<td>605</td>
<td>Archives of internal medicine (1960)</td>
<td>empty</td>
<td>missing</td>
</tr>
<tr>
<td>2</td>
<td>0003-9942</td>
<td>974</td>
<td>Archives of neurology (Chicago)</td>
<td>empty</td>
<td>missing</td>
</tr>
<tr>
<td>3</td>
<td>0007-4403</td>
<td>885</td>
<td>Bulletin de psychologie</td>
<td>empty</td>
<td>missing</td>
</tr>
<tr>
<td>4</td>
<td>0008-042X</td>
<td>180</td>
<td>Cahiers pédagogiques (Revue)</td>
<td>empty</td>
<td>missing</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>96</td>
<td>2264-7228</td>
<td>503</td>
<td>Distances et médiations des savoirs</td>
<td>empty</td>
<td>NaN</td>
</tr>
<tr>
<td>97</td>
<td>2297-0703</td>
<td>989</td>
<td>Schweizer Krebs-Bulletin</td>
<td>empty</td>
<td>NaN</td>
</tr>
<tr>
<td>98</td>
<td>2297-6981</td>
<td>618</td>
<td>Swiss archives of neurology, psychiatry and ps...</td>
<td>empty</td>
<td>missing</td>
</tr>
<tr>
<td>99</td>
<td>2352-1791</td>
<td>639</td>
<td>Nuclear materials and energy</td>
<td>empty</td>
<td>NaN</td>
</tr>
<tr>
<td>100</td>
<td>2504-4427</td>
<td>994</td>
<td>GG@G (Print)</td>
<td>empty</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>101 rows × 5 columns</p>
</div>
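
The `sherpa_match_x` and `sherpa_match_y` columns in the table above come from pandas' default merge suffixes. A minimal sketch, on made-up data, of how renaming the status column before each merge could give self-describing column names instead. The names `match_empty` and `match_missing` are hypothetical and not used elsewhere in the notebook.

```python
import pandas as pd

# Toy stand-ins for sherpa_match_results and journals_not_sherpa
results = pd.DataFrame({
    'issnl': ['0003-6935', '0003-9926', '0003-9926'],
    'sherpa_match': ['empty', 'empty', 'missing'],
    'id': [1, 1, 1],
})
journals = pd.DataFrame({'issnl': ['0003-6935', '0003-9926'],
                         'journal': [398, 605]})

for status in ['empty', 'missing']:
    # keep only the rows with this status and name the column after it
    subset = results.loc[results['sherpa_match'] == status, ['issnl', 'sherpa_match']]
    subset = subset.rename(columns={'sherpa_match': 'match_' + status})
    journals = pd.merge(journals, subset, on='issnl', how='left')

print(journals)
```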
```python
# extract the journal information (title and URL) from the Sherpa/Romeo data
for index, row in issn.iterrows():
    journal_id = row['journal']
    journal_issn = row['issn']
    # print progress every 10 rows
    if index % 10 == 0:
        print(index)
    # one JSON file per ISSN; skip the ISSNs without a downloaded file
    json_path = 'sherpa/data/' + journal_issn + '.json'
    if os.path.exists(json_path):
        with open(json_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        title = np.nan
        url = np.nan
        if len(data['items']) > 0:
            item = data['items'][0]
            if 'url' in item:
                url = item['url']
            # guard against records without a title list
            titles = item.get('title')
            if titles and 'title' in titles[0]:
                title = titles[0]['title']
            # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
            new_row = pd.DataFrame([{'journal': journal_id, 'title': title, 'url': url}])
            sherpa_journal = pd.concat([sherpa_journal, new_row], ignore_index=True)
```
0
10
20
...
1750
```python
sherpa_journal
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>title</th>
<th>url</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>532</td>
<td>Tissue Antigens</td>
<td>http://onlinelibrary.wiley.com/journal/10.1111...</td>
</tr>
<tr>
<td>1</td>
<td>498</td>
<td>Accounts of Chemical Research</td>
<td>http://pubs.acs.org/journal/achre4</td>
</tr>
<tr>
<td>2</td>
<td>498</td>
<td>Accounts of Chemical Research</td>
<td>http://pubs.acs.org/journal/achre4</td>
</tr>
<tr>
<td>3</td>
<td>789</td>
<td>The Journal of the Acoustical Society of America</td>
<td>http://asa.scitation.org/journal/jas</td>
</tr>
<tr>
<td>4</td>
<td>166</td>
<td>Acta Neurochirurgica</td>
<td>http://link.springer.com/journal/701</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1238</td>
<td>80</td>
<td>Physical Review D</td>
<td>http://prd.aps.org/</td>
</tr>
<tr>
<td>1239</td>
<td>80</td>
<td>Physical Review D</td>
<td>http://prd.aps.org/</td>
</tr>
<tr>
<td>1240</td>
<td>533</td>
<td>Physical Review E</td>
<td>http://journals.aps.org/pre/abstract/10.1103/P...</td>
</tr>
<tr>
<td>1241</td>
<td>533</td>
<td>Physical Review E</td>
<td>http://journals.aps.org/pre/abstract/10.1103/P...</td>
</tr>
<tr>
<td>1242</td>
<td>608</td>
<td>Physical Review Materials</td>
<td>http://journals.aps.org/prmaterials/</td>
</tr>
</tbody>
</table>
<p>1243 rows × 3 columns</p>
</div>
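
Growing a DataFrame one row at a time copies the whole table on every iteration. A common alternative is to collect plain dictionaries in a list and build the DataFrame once at the end. The sketch below illustrates that pattern under stated assumptions: the helper `extract_title_url`, the temporary directory and the two sample records are hypothetical, and the records only mirror the keys (`items`, `title`, `url`) read by the loop above, not the full Sherpa/Romeo response.

```python
import json
import os
import tempfile

import pandas as pd


def extract_title_url(data):
    """Return (title, url) from a Sherpa-style record, or None if it has no items."""
    items = data.get('items', [])
    if not items:
        return None
    item = items[0]
    titles = item.get('title') or [{}]
    return titles[0].get('title'), item.get('url')


# Hypothetical sample files written to a temporary directory
tmp_dir = tempfile.mkdtemp()
samples = {
    '0001-4842': {'items': [{'url': 'http://pubs.acs.org/journal/achre4',
                             'title': [{'title': 'Accounts of Chemical Research'}]}]},
    '2504-4427': {'items': []},
}
for issn_value, payload in samples.items():
    with open(os.path.join(tmp_dir, issn_value + '.json'), 'w', encoding='utf-8') as f:
        json.dump(payload, f)

rows = []
# the third ISSN has no file, so it is skipped
for journal_id, issn_value in [(498, '0001-4842'), (994, '2504-4427'), (1, '1660-9379')]:
    path = os.path.join(tmp_dir, issn_value + '.json')
    if not os.path.exists(path):
        continue
    with open(path, 'r', encoding='utf-8') as f:
        extracted = extract_title_url(json.load(f))
    if extracted is not None:
        rows.append({'journal': journal_id, 'title': extracted[0], 'url': extracted[1]})

# build the table once, after the loop
sherpa_journal_sketch = pd.DataFrame(rows, columns=['journal', 'title', 'url'])
print(sherpa_journal_sketch)
```

Unlike the notebook loop, this sketch leaves a missing title or URL as `None` rather than `np.nan`; pandas treats both as missing values.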
```python
# extract the ISSNs and their types from the Sherpa/Romeo data
for index, row in issn.iterrows():
    journal_id = row['journal']
    journal_issn = row['issn']
    # print progress every 10 rows
    if index % 10 == 0:
        print(index)
    # one JSON file per ISSN; skip the ISSNs without a downloaded file
    json_path = 'sherpa/data/' + journal_issn + '.json'
    if os.path.exists(json_path):
        with open(json_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        if len(data['items']) > 0 and 'issns' in data['items'][0]:
            for i in data['items'][0]['issns']:
                # read both keys per entry so a missing key never inherits the previous value
                myissn = i.get('issn', np.nan)
                mytype = i.get('type', np.nan)
                # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
                new_row = pd.DataFrame([{'issn': myissn, 'type': mytype}])
                sherpa_issn = pd.concat([sherpa_issn, new_row], ignore_index=True)
```
0
10
20
...
1750
```python
sherpa_issn
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>print</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>electronic</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>print</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>electronic</td>
</tr>
<tr>
<td>4</td>
<td>0001-4842</td>
<td>print</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>2196</td>
<td>2470-0045</td>
<td>print</td>
</tr>
<tr>
<td>2197</td>
<td>2470-0053</td>
<td>electronic</td>
</tr>
<tr>
<td>2198</td>
<td>2470-0045</td>
<td>print</td>
</tr>
<tr>
<td>2199</td>
<td>2470-0053</td>
<td>electronic</td>
</tr>
<tr>
<td>2200</td>
<td>2475-9953</td>
<td>electronic</td>
</tr>
</tbody>
</table>
<p>2201 rows × 2 columns</p>
</div>
```python
# dedup
sherpa_issn = sherpa_issn.drop_duplicates()
sherpa_issn
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>print</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>electronic</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>print</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>electronic</td>
</tr>
<tr>
<td>6</td>
<td>0001-4966</td>
<td>print</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>2192</td>
<td>2470-0010</td>
<td>print</td>
</tr>
<tr>
<td>2193</td>
<td>2470-0029</td>
<td>electronic</td>
</tr>
<tr>
<td>2196</td>
<td>2470-0045</td>
<td>print</td>
</tr>
<tr>
<td>2197</td>
<td>2470-0053</td>
<td>electronic</td>
</tr>
<tr>
<td>2200</td>
<td>2475-9953</td>
<td>electronic</td>
</tr>
</tbody>
</table>
<p>1333 rows × 2 columns</p>
</div>
```python
# complete the ISSN table with the types from Sherpa
issn2 = pd.merge(issn, sherpa_issn, on='issn', how='left')
issn2
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
<td>PRINT</td>
<td>1</td>
<td>1</td>
<td>print</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>0001-2815</td>
<td>532</td>
<td>NaN</td>
<td>3</td>
<td>2</td>
<td>electronic</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
<td>PRINT</td>
<td>1</td>
<td>3</td>
<td>print</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>0001-4842</td>
<td>498</td>
<td>NaN</td>
<td>3</td>
<td>4</td>
<td>electronic</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
<td>PRINT</td>
<td>1</td>
<td>5</td>
<td>print</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
<td>OTHER</td>
<td>3</td>
<td>1756</td>
<td>print</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>2470-0045</td>
<td>533</td>
<td>NaN</td>
<td>3</td>
<td>1757</td>
<td>electronic</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1758</td>
<td>electronic</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
<td>PRINT</td>
<td>1</td>
<td>1759</td>
<td>NaN</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>3</td>
<td>1760</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>1760 rows × 7 columns</p>
</div>
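
After this merge, each ISSN carries both the original `format` and the Sherpa `type`, and the two do not always agree (for example `OTHER` next to `print` for 2470-0045). A minimal sketch of how such disagreements could be flagged, using four rows copied from the table above. The column name `type_conflict` is hypothetical, and treating `OTHER` as a conflict is an assumption rather than a rule from the notebook.

```python
import pandas as pd

# Four rows taken from the issn2 output above
issn2_sketch = pd.DataFrame({
    'issn': ['0001-2815', '1399-0039', '2470-0045', '2504-4427'],
    'format': ['PRINT', None, 'OTHER', 'PRINT'],
    'type': ['print', 'electronic', 'print', None],
})

# compare case-insensitively, and only where both sources have a value
fmt = issn2_sketch['format'].str.lower()
both_known = fmt.notna() & issn2_sketch['type'].notna()
issn2_sketch['type_conflict'] = both_known & (fmt != issn2_sketch['type'])

conflicts = issn2_sketch.loc[issn2_sketch['type_conflict'], 'issn'].tolist()
print(conflicts)
```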
```python
# TSV exports (tab-separated)
publisher_sherpa_dedup.to_csv('sample/publisher_sherpa.tsv', sep='\t', encoding='utf-8', index=False)
sherpa_match_issn.to_csv('sample/sherpa_match_issn.tsv', sep='\t', encoding='utf-8', index=False)
sherpa_journal.to_csv('sample/sherpa_journal.tsv', sep='\t', encoding='utf-8', index=False)
issn2.to_csv('sample/issn_sherpa.tsv', sep='\t', encoding='utf-8', index=False)
journals_not_sherpa.to_csv('sample/journals_not_sherpa.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# Excel exports
publisher_sherpa_dedup.to_excel('sample/publisher_sherpa.xlsx', index=False)
sherpa_match_issn.to_excel('sample/sherpa_match_issn.xlsx', index=False)
sherpa_journal.to_excel('sample/sherpa_journal.xlsx', index=False)
issn2.to_excel('sample/issn_sherpa.xlsx', index=False)
journals_not_sherpa.to_excel('sample/journals_not_sherpa.xlsx', index=False)
```
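
The export calls above repeat the same options for every table. As a sketch only, the tables could be kept in a dictionary and written in a loop, creating the output folder first so the export does not fail when it is missing. The toy frames and the temporary directory are assumptions for illustration; the notebook itself writes to `sample/`.

```python
import os
import tempfile

import pandas as pd

# Toy tables standing in for the real DataFrames
tables = {
    'publisher_sherpa': pd.DataFrame({'publisher_id': [10], 'name': ['American Physical Society']}),
    'sherpa_journal': pd.DataFrame({'journal': [608], 'title': ['Physical Review Materials']}),
}

# temporary folder used here instead of 'sample/'
out_dir = os.path.join(tempfile.mkdtemp(), 'sample')
os.makedirs(out_dir, exist_ok=True)

for name, frame in tables.items():
    frame.to_csv(os.path.join(out_dir, name + '.tsv'),
                 sep='\t', encoding='utf-8', index=False)

written = sorted(os.listdir(out_dir))
print(written)
```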
```python
# add the Sherpa titles to the journal table
# rename the columns
sherpa_journal = sherpa_journal.rename(columns={'journal' : 'id'})
journal = pd.merge(journal, sherpa_journal, on='id', how='left')
journal
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>issn</th>
<th>issnl</th>
<th>title_x</th>
<th>starting_year</th>
<th>end_year</th>
<th>url_x</th>
<th>name_short_iso_4</th>
<th>language</th>
<th>country</th>
<th>doaj_title</th>
<th>doaj_seal</th>
<th>APC</th>
<th>doaj_status</th>
<th>lockss_title</th>
<th>lockss</th>
<th>portico_status</th>
<th>portico</th>
<th>nlch_title</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>doublon_issnl</th>
<th>oa_status</th>
<th>publisher</th>
<th>title_y</th>
<th>url_y</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1660-9379</td>
<td>1660-9379</td>
<td>Revue médicale suisse</td>
<td>2005</td>
<td>9999</td>
<td>NaN</td>
<td>Rev. méd. suisse</td>
<td>138</td>
<td>215</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>0031-9007</td>
<td>0031-9007</td>
<td>Physical review letters (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>Phys. rev. lett. (Print)</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>1.0</td>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>http://prl.aps.org/</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>0031-9007</td>
<td>0031-9007</td>
<td>Physical review letters (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>Phys. rev. lett. (Print)</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>1.0</td>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>http://prl.aps.org/</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
<td>1932-6203</td>
<td>1932-6203</td>
<td>PloS one</td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>NaN</td>
<td>124</td>
<td>236</td>
<td>PLoS ONE</td>
<td>1.0</td>
<td>Yes</td>
<td>1.0</td>
<td>PLoS One</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>4.035714</td>
<td>NaN</td>
<td>5</td>
<td>3</td>
<td>PLoS ONE</td>
<td>http://www.plosone.org/</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>2174-8454</td>
<td>2174-8454</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td>NaN</td>
<td>EU-topías</td>
<td>124, 138, 402, 292</td>
<td>209</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>4, 5</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1341</td>
<td>998</td>
<td>0022-3468</td>
<td>0022-3468</td>
<td>Journal of pediatric surgery (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org</td>
<td>J. pediatr. surg. (Print)</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>75</td>
<td>Journal of Pediatric Surgery</td>
<td>http://www.jpedsurg.org/</td>
</tr>
<tr>
<td>1342</td>
<td>999</td>
<td>1432-2064</td>
<td>0178-8051</td>
<td>Probability theory and related fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>124</td>
<td>83</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>preserved</td>
<td>1.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>8</td>
<td>Probability Theory and Related Fields</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
</tr>
<tr>
<td>1343</td>
<td>999</td>
<td>1432-2064</td>
<td>0178-8051</td>
<td>Probability theory and related fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>124</td>
<td>83</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>preserved</td>
<td>1.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>8</td>
<td>Probability Theory and Related Fields</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
</tr>
<tr>
<td>1344</td>
<td>1000</td>
<td>0960-1481</td>
<td>0960-1481</td>
<td>Renewable energy</td>
<td>1991</td>
<td>9999</td>
<td>NaN</td>
<td>Renew. energy</td>
<td>124</td>
<td>234</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>119</td>
<td>Renewable Energy</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
</tr>
<tr>
<td>1345</td>
<td>1001</td>
<td>0161-7567</td>
<td>0161-7567</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>217</td>
<td>NaN</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>1346 rows × 26 columns</p>
</div>
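
In the output above, some journals appear twice (for example `id` 2), which is consistent with `sherpa_journal` holding one row per ISSN file rather than one per journal. A minimal sketch, on made-up frames, of how deduplicating on the key before the merge could keep one row per journal. Whether the duplicates are wanted downstream is not stated in the notebook, so this is an assumption.

```python
import pandas as pd

# Toy stand-ins for journal and sherpa_journal (after renaming 'journal' to 'id')
journal_sketch = pd.DataFrame({
    'id': [1, 2, 3],
    'title': ['Revue médicale suisse', 'Physical review letters (Print)', 'PloS one'],
})
sherpa_sketch = pd.DataFrame({
    'id': [2, 2, 3],
    'title': ['Physical Review Letters', 'Physical Review Letters', 'PLoS ONE'],
})

# naive left join: the duplicated id 2 produces two output rows
naive = pd.merge(journal_sketch, sherpa_sketch, on='id', how='left')

# keep the first Sherpa row per journal, then let validate enforce uniqueness
deduped = sherpa_sketch.drop_duplicates(subset='id')
clean = pd.merge(journal_sketch, deduped, on='id', how='left', validate='one_to_one')

print(len(naive), len(clean))
```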
```python
# choose the title and URL: prefer the Sherpa value, fall back to the original one
journal['url'] = journal['url_y'].fillna(journal['url_x'])
journal['title'] = journal['title_y'].fillna(journal['title_x'])
journal
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>issn</th>
<th>issnl</th>
<th>title_x</th>
<th>starting_year</th>
<th>end_year</th>
<th>url_x</th>
<th>name_short_iso_4</th>
<th>language</th>
<th>country</th>
<th>doaj_title</th>
<th>doaj_seal</th>
<th>APC</th>
<th>doaj_status</th>
<th>lockss_title</th>
<th>lockss</th>
<th>portico_status</th>
<th>portico</th>
<th>nlch_title</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>doublon_issnl</th>
<th>oa_status</th>
<th>publisher</th>
<th>title_y</th>
<th>url_y</th>
<th>url</th>
<th>title</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1660-9379</td>
<td>1660-9379</td>
<td>Revue médicale suisse</td>
<td>2005</td>
<td>9999</td>
<td>NaN</td>
<td>Rev. méd. suisse</td>
<td>138</td>
<td>215</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>Revue médicale suisse</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>0031-9007</td>
<td>0031-9007</td>
<td>Physical review letters (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>Phys. rev. lett. (Print)</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>1.0</td>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>http://prl.aps.org/</td>
<td>http://prl.aps.org/</td>
<td>Physical Review Letters</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>0031-9007</td>
<td>0031-9007</td>
<td>Physical review letters (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>Phys. rev. lett. (Print)</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>1.0</td>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>http://prl.aps.org/</td>
<td>http://prl.aps.org/</td>
<td>Physical Review Letters</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
<td>1932-6203</td>
<td>1932-6203</td>
<td>PloS one</td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>NaN</td>
<td>124</td>
<td>236</td>
<td>PLoS ONE</td>
<td>1.0</td>
<td>Yes</td>
<td>1.0</td>
<td>PLoS One</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>4.035714</td>
<td>NaN</td>
<td>5</td>
<td>3</td>
<td>PLoS ONE</td>
<td>http://www.plosone.org/</td>
<td>http://www.plosone.org/</td>
<td>PLoS ONE</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>2174-8454</td>
<td>2174-8454</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td>NaN</td>
<td>EU-topías</td>
<td>124, 138, 402, 292</td>
<td>209</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>4, 5</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>EU-topías</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1341</td>
<td>998</td>
<td>0022-3468</td>
<td>0022-3468</td>
<td>Journal of pediatric surgery (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org</td>
<td>J. pediatr. surg. (Print)</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>75</td>
<td>Journal of Pediatric Surgery</td>
<td>http://www.jpedsurg.org/</td>
<td>http://www.jpedsurg.org/</td>
<td>Journal of Pediatric Surgery</td>
</tr>
<tr>
<td>1342</td>
<td>999</td>
<td>1432-2064</td>
<td>0178-8051</td>
<td>Probability theory and related fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>124</td>
<td>83</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>preserved</td>
<td>1.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>8</td>
<td>Probability Theory and Related Fields</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>Probability Theory and Related Fields</td>
</tr>
<tr>
<td>1343</td>
<td>999</td>
<td>1432-2064</td>
<td>0178-8051</td>
<td>Probability theory and related fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>124</td>
<td>83</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>preserved</td>
<td>1.0</td>
<td>Probability Theory and Related Fields</td>
<td>1.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>8</td>
<td>Probability Theory and Related Fields</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>Probability Theory and Related Fields</td>
</tr>
<tr>
<td>1344</td>
<td>1000</td>
<td>0960-1481</td>
<td>0960-1481</td>
<td>Renewable energy</td>
<td>1991</td>
<td>9999</td>
<td>NaN</td>
<td>Renew. energy</td>
<td>124</td>
<td>234</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>preserved</td>
<td>1.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>119</td>
<td>Renewable Energy</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>Renewable Energy</td>
</tr>
<tr>
<td>1345</td>
<td>1001</td>
<td>0161-7567</td>
<td>0161-7567</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>124</td>
<td>236</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>0.0</td>
<td>NaN</td>
<td>NaN</td>
<td>1</td>
<td>217</td>
<td>NaN</td>
<td>NaN</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>Journal of applied physiology: respiratory, en...</td>
</tr>
</tbody>
</table>
<p>1346 rows × 28 columns</p>
</div>
```python
# select the export columns; .copy() avoids SettingWithCopyWarning on the assignments made later
journals_export = journal[['id', 'title', 'name_short_iso_4', 'starting_year', 'end_year', 'url', 'country', 'language', 'oa_status', 'publisher', 'doaj_seal', 'doaj_status', 'lockss', 'portico', 'nlch', 'qoam_av_score']].copy()
journals_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>title</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>url</th>
<th>country</th>
<th>language</th>
<th>oa_status</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td>NaN</td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>2</td>
<td>NaN</td>
<td>0.0</td>
<td>0.0</td>
<td>1.0</td>
<td>0.0</td>
<td>NaN</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>2</td>
<td>NaN</td>
<td>0.0</td>
<td>0.0</td>
<td>1.0</td>
<td>0.0</td>
<td>NaN</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
<td>PLoS ONE</td>
<td>NaN</td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>5</td>
<td>3</td>
<td>1.0</td>
<td>1.0</td>
<td>1.0</td>
<td>0.0</td>
<td>0.0</td>
<td>4.035714</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td>NaN</td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>1</td>
<td>4, 5</td>
<td>NaN</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1341</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>75</td>
<td>NaN</td>
<td>0.0</td>
<td>0.0</td>
<td>1.0</td>
<td>0.0</td>
<td>NaN</td>
</tr>
<tr>
<td>1342</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>1</td>
<td>8</td>
<td>NaN</td>
<td>0.0</td>
<td>1.0</td>
<td>1.0</td>
<td>1.0</td>
<td>NaN</td>
</tr>
<tr>
<td>1343</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>1</td>
<td>8</td>
<td>NaN</td>
<td>0.0</td>
<td>1.0</td>
<td>1.0</td>
<td>1.0</td>
<td>NaN</td>
</tr>
<tr>
<td>1344</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>119</td>
<td>NaN</td>
<td>0.0</td>
<td>0.0</td>
<td>1.0</td>
<td>0.0</td>
<td>NaN</td>
</tr>
<tr>
<td>1345</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>217</td>
<td>NaN</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>1346 rows × 16 columns</p>
</div>
```python
# rename the final fields
journals_export = journals_export.rename(columns={'title' : 'name', 'url' : 'website'})
# fill missing values, then cast the flag columns to int
journals_export['starting_year'] = journals_export['starting_year'].fillna(0)
journals_export['end_year'] = journals_export['end_year'].fillna(9999)
journals_export['name_short_iso_4'] = journals_export['name_short_iso_4'].fillna('')
journals_export['website'] = journals_export['website'].fillna('')
journals_export['doaj_seal'] = journals_export['doaj_seal'].fillna('0')
journals_export['country'] = journals_export['country'].fillna('999999')
journals_export['language'] = journals_export['language'].fillna('999999')
journals_export['doaj_status'] = journals_export['doaj_status'].astype(int)
journals_export['doaj_seal'] = journals_export['doaj_seal'].astype(int)
journals_export['lockss'] = journals_export['lockss'].astype(int)
journals_export['portico'] = journals_export['portico'].astype(int)
journals_export['nlch'] = journals_export['nlch'].astype(int)
journals_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
<td>PLoS ONE</td>
<td></td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>5</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td></td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>1</td>
<td>4, 5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1341</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1342</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>1</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
</tr>
<tr>
<td>1343</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>1</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
</tr>
<tr>
<td>1344</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>119</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1345</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>217</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>1346 rows × 16 columns</p>
</div>
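
The order of operations in the cell above matters: a float column that still holds `NaN` cannot be cast with `astype(int)`, so missing values have to be filled first. The following is a minimal sketch of that behaviour on a made-up frame; the `flags` DataFrame and its values are hypothetical and are not taken from the Sherpa data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the flag columns; values are illustrative only
flags = pd.DataFrame({
    "doaj_seal": [np.nan, 1.0, np.nan],
    "lockss": [0.0, 1.0, 0.0],
})

# Casting a column that still contains NaN to int raises an error
try:
    flags["doaj_seal"].astype(int)
    cast_failed = False
except (ValueError, TypeError):
    cast_failed = True

# Filling the gaps first makes the cast safe
clean = flags.fillna(0).astype(int)
print(cast_failed)
print(clean)
```

If keeping "unknown" distinct from 0 matters for the application, pandas' nullable `Int64` dtype is one possible alternative, at the cost of handling `<NA>` downstream.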
```python
journals_export = journals_export.drop_duplicates(subset='id')
journals_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
<td>PLoS ONE</td>
<td></td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>5</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td></td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>1</td>
<td>4, 5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>5</td>
<td>5</td>
<td>Physical review B: Condensed matter and materi...</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>1998</td>
<td>2015</td>
<td>http://journals.aps.org/prb/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1339</td>
<td>997</td>
<td>Smart Materials and Structures</td>
<td>Smart mater. struct. (Print)</td>
<td>1992</td>
<td>9999</td>
<td>http://iopscience.iop.org/0964-1726</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1341</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1342</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>1</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
</tr>
<tr>
<td>1344</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>119</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1345</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>217</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>911 rows × 16 columns</p>
</div>
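
`drop_duplicates(subset='id')` keeps the first row for each `id` and silently discards the others, which is only safe when the discarded rows are true copies. The sketch below, on a hypothetical miniature frame (names and values are illustrative assumptions), shows one way to list the ids whose duplicated rows disagree before dropping them.

```python
import pandas as pd

# Hypothetical miniature of journals_export; values are illustrative only
df = pd.DataFrame({
    "id": [2, 2, 3, 4, 4],
    "name": ["Physical Review Letters", "Physical Review Letters",
             "PLoS ONE", "Title A", "Title B"],
})

# Rows that share an id with at least one other row
dupes = df[df.duplicated(subset="id", keep=False)]

# After removing exact copies, an id that still appears more than once
# has rows that differ in some other column
n_variants = dupes.drop_duplicates().groupby("id").size()
conflicting_ids = n_variants[n_variants > 1].index.tolist()

# Same call as in the notebook: keeps the first row for each id
deduped = df.drop_duplicates(subset="id")

print(conflicting_ids)
print(deduped)
```

An empty `conflicting_ids` list would indicate that the deduplication loses no information.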
```python
# check for journals without a title
journals_export.loc[journals_export['name'].isna()]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
```python
# export the journals without a title, then remove them
without_title = journals_export.loc[journals_export['name'].isna()]
# CSV export (tab-separated)
without_title.to_csv('sample/sherpa_journals_without_title.tsv', sep='\t', encoding='utf-8', index=False)
# Excel export
without_title.to_excel('sample/sherpa_journals_without_title.xlsx', index=False)
journals_export = journals_export.loc[journals_export['name'].notna()].copy()
journals_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
<td>PLoS ONE</td>
<td></td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>5</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td></td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>1</td>
<td>4, 5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>5</td>
<td>5</td>
<td>Physical review B: Condensed matter and materi...</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>1998</td>
<td>2015</td>
<td>http://journals.aps.org/prb/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1339</td>
<td>997</td>
<td>Smart Materials and Structures</td>
<td>Smart mater. struct. (Print)</td>
<td>1992</td>
<td>9999</td>
<td>http://iopscience.iop.org/0964-1726</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1341</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1342</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>1</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
</tr>
<tr>
<td>1344</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>119</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1345</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>217</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>911 rows × 16 columns</p>
</div>
```python
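# note: str.contains reads the pattern as a regex, so the parentheses form a group
# and this matches every title containing "Print", not only the literal "(Print)"
journals_export.loc[journals_export['name'].str.contains('(Print)')]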
```
C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\strings.py:1843: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
return func(self, *args, **kwargs)
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
</tr>
</thead>
<tbody>
<tr>
<td>86</td>
<td>54</td>
<td>Helvetica physica acta (Print)</td>
<td>Helv. phys. acta (Print)</td>
<td>1928</td>
<td>1999</td>
<td></td>
<td>215</td>
<td>124, 138, 151</td>
<td>1</td>
<td>41</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>239</td>
<td>155</td>
<td>Studies in health technology and informatics (...</td>
<td>Stud. health technol. inform. (Print)</td>
<td>1991</td>
<td>9999</td>
<td></td>
<td>156</td>
<td>124</td>
<td>1</td>
<td>90</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>441</td>
<td>306</td>
<td>Bioethica Forum (Basel. 2008. Print)</td>
<td>Bioeth. Forum (Basel, 2008, Print)</td>
<td>2008</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>138, 124, 151</td>
<td>1</td>
<td>143</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>534</td>
<td>373</td>
<td>Schweizerische Ärztezeitung (Print)</td>
<td>Schweiz. Ärzteztg. (Print)</td>
<td>1952</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>203, 151, 138</td>
<td>1</td>
<td>170</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>601</td>
<td>430</td>
<td>The European physical journal. B, Condensed ma...</td>
<td>Eur. phys. j., B Cond. matter phys. (Print)</td>
<td>1998</td>
<td>9999</td>
<td></td>
<td>76</td>
<td>124</td>
<td>1</td>
<td>195, 43</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1.25</td>
</tr>
<tr>
<td>650</td>
<td>467</td>
<td>Conference on Lasers and Electro-optics (Print)</td>
<td>Conf. Lasers Electro-opt. (Print)</td>
<td>2003</td>
<td>9999</td>
<td>http://www.cleoconference.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>39</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>850</td>
<td>618</td>
<td>Swiss archives of neurology, psychiatry and ps...</td>
<td>Swiss arch. neurol. psychiatry psychother. (Pr...</td>
<td>2016</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>151, 124, 138</td>
<td>6</td>
<td>20</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>901</td>
<td>660</td>
<td>Journal der Deutschen Dermatologischen Gesells...</td>
<td></td>
<td>2003</td>
<td>9999</td>
<td></td>
<td>234</td>
<td>151, 124</td>
<td>1</td>
<td>283</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>957</td>
<td>702</td>
<td>IEEE/LEOS International Conference on Optical ...</td>
<td>IEEE/LEOS Int. Conf. Opt. MEMS Nanophotonics (...</td>
<td>2007</td>
<td>20uu</td>
<td>http://ieeexplore.ieee.org/xpl/conhome.jsp?pun...</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>280</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1104</td>
<td>814</td>
<td>Forumpoenale (Print)</td>
<td>Forumpoenale (Print)</td>
<td>2008</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>151, 203, 138</td>
<td>1</td>
<td>204</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1182</td>
<td>877</td>
<td>Gesnerus (Print)</td>
<td>Gesnerus (Print)</td>
<td>1943</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>124, 138, 151, 203</td>
<td>1</td>
<td>143</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1336</td>
<td>994</td>
<td>GG@G (Print)</td>
<td>GG@G (Print)</td>
<td>2000</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>124</td>
<td>1</td>
<td>380</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
</tbody>
</table>
</div>
```python
# regex=False matches the literal text "(Online)"
journals_export.loc[journals_export['name'].str.contains('(Online)', regex=False)]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
</tr>
</thead>
<tbody>
<tr>
<td>1257</td>
<td>936</td>
<td>Plastic and reconstructive surgery (Online)</td>
<td>Plast. reconstr. surg. (Online)</td>
<td>1963</td>
<td>9999</td>
<td>http://gateway.ovid.com/ovidweb.cgi?T=JS&amp;MODE=...</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>363</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
</tbody>
</table>
</div>
```python
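# remove the " (Print)" and " (Online)" mentions from the titles
# regex=False treats the text literally; as a regex, '(Print)' only matches "Print" and leaves "()" behind
journals_export['name'] = journals_export['name'].str.replace(' (Print)', '', regex=False)
journals_export['name'] = journals_export['name'].str.replace(' (Online)', '', regex=False)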
journals_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
<td>PLoS ONE</td>
<td></td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>5</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td></td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>1</td>
<td>4, 5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>5</td>
<td>5</td>
<td>Physical review B: Condensed matter and materi...</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>1998</td>
<td>2015</td>
<td>http://journals.aps.org/prb/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1339</td>
<td>997</td>
<td>Smart Materials and Structures</td>
<td>Smart mater. struct. (Print)</td>
<td>1992</td>
<td>9999</td>
<td>http://iopscience.iop.org/0964-1726</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1341</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1342</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>1</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
</tr>
<tr>
<td>1344</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>119</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1345</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>217</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>911 rows × 16 columns</p>
</div>
```python
# check that no literal "(Print)" mention remains
journals_export.loc[journals_export['name'].str.contains('(Print)', regex=False)]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
```python
# check that no literal "(Online)" mention remains
journals_export.loc[journals_export['name'].str.contains('(Online)', regex=False)]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
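Both filters above return no rows. `Series.str.contains` treats its pattern as a regular expression by default, so parentheses form a group instead of matching literally. A minimal sketch with hypothetical journal titles (not taken from the data) shows the difference between the two modes:

```python
import pandas as pd

# hypothetical titles, for illustration only
names = pd.Series(['Renewable Energy (Print)', 'Print Quarterly', 'Acta Mathematica'])

# default regex=True: '(Print)' is a group, so any title containing "Print" matches
# (pandas may emit a UserWarning about match groups here)
as_regex = names.str.contains('(Print)')

# regex=False: only the literal text "(Print)" matches
as_literal = names.str.contains('(Print)', regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

With `regex=False` only the first title is selected, which is presumably the intent when looking for leftover "(Print)" or "(Online)" qualifiers.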
## Table sherpa_policies
```python
# create the DataFrame
col_names = ['journal',
'issn',
'sherpa_id',
'sherpa_uri',
'open_access_prohibited',
'additional_oa_fee',
'article_version',
'license',
'embargo',
'prerequisites',
'prerequisite_funders',
'prerequisite_funders_name',
'prerequisite_funders_fundref',
'prerequisite_funders_ror',
'prerequisite_funders_country',
'prerequisite_funders_url',
'prerequisite_funders_sherpa_id',
'prerequisite_subjects',
'location',
'locations_ir',
'locations_not_ir',
'named_repository',
'named_academic_social_network',
'copyright_owner',
'publisher_deposit',
'archiving',
'conditions',
'public_notes'
]
sherpa_policies = pd.DataFrame(columns = col_names)
sherpa_policies
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>issn</th>
<th>sherpa_id</th>
<th>sherpa_uri</th>
<th>open_access_prohibited</th>
<th>additional_oa_fee</th>
<th>article_version</th>
<th>license</th>
<th>embargo</th>
<th>prerequisites</th>
<th>prerequisite_funders</th>
<th>prerequisite_funders_name</th>
<th>prerequisite_funders_fundref</th>
<th>prerequisite_funders_ror</th>
<th>prerequisite_funders_country</th>
<th>prerequisite_funders_url</th>
<th>prerequisite_funders_sherpa_id</th>
<th>prerequisite_subjects</th>
<th>location</th>
<th>locations_ir</th>
<th>locations_not_ir</th>
<th>named_repository</th>
<th>named_academic_social_network</th>
<th>copyright_owner</th>
<th>publisher_deposit</th>
<th>archiving</th>
<th>conditions</th>
<th>public_notes</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
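The empty frame above fixes the column order before any row exists. One possible way to fill such a frame, sketched here on the assumption that each policy row is first gathered into a plain dict, is to collect the dicts in a list and build the DataFrame once. `DataFrame.append` was removed in pandas 2.0, and growing a frame row by row copies it on every call. The shortened column list and the name `sherpa_policies_demo` are illustrative only.

```python
import pandas as pd

# shortened column list, for the example only
col_names = ['journal', 'issn', 'sherpa_id', 'article_version', 'embargo', 'public_notes']

rows = []  # one dict per policy row
rows.append({'journal': 532, 'issn': '0001-2815', 'sherpa_id': 11905,
             'article_version': 'submitted', 'embargo': 0})
rows.append({'journal': 532, 'issn': '0001-2815', 'sherpa_id': 11905,
             'article_version': 'accepted', 'embargo': 12})

# build the frame once; keys missing from a dict (public_notes here) become NaN
sherpa_policies_demo = pd.DataFrame(rows, columns=col_names)
print(sherpa_policies_demo)
```

Passing `columns=col_names` keeps the declared column order even when some keys are absent from every row.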
```python
# deduplicate by journal id
issn_dedup = issn.drop_duplicates(subset='journal')
issn_dedup
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
<td>PRINT</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
<td>PRINT</td>
<td>1</td>
<td>3</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
<td>PRINT</td>
<td>1</td>
<td>5</td>
</tr>
<tr>
<td>7</td>
<td>0001-6268</td>
<td>0001-6268</td>
<td>166</td>
<td>PRINT</td>
<td>1</td>
<td>8</td>
</tr>
<tr>
<td>9</td>
<td>0001-6322</td>
<td>0001-6322</td>
<td>807</td>
<td>PRINT</td>
<td>1</td>
<td>10</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1751</td>
<td>2469-9950</td>
<td>2469-9950</td>
<td>41</td>
<td>PRINT</td>
<td>1</td>
<td>1752</td>
</tr>
<tr>
<td>1753</td>
<td>2470-0010</td>
<td>2470-0010</td>
<td>80</td>
<td>PRINT</td>
<td>1</td>
<td>1754</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
<td>OTHER</td>
<td>3</td>
<td>1756</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1758</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
<td>PRINT</td>
<td>1</td>
<td>1759</td>
</tr>
</tbody>
</table>
<p>909 rows × 6 columns</p>
</div>
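`drop_duplicates(subset='journal')` keeps the first row encountered for each journal id, which is why the index in the output above has gaps. A small sketch with made-up rows (the second ISSN is a placeholder, not a real identifier) illustrates which row survives:

```python
import pandas as pd

# made-up rows: journal 532 has two ISSNs, journal 498 has one
issn_demo = pd.DataFrame({
    'issn': ['0001-2815', '9999-0001', '0001-4842'],
    'journal': [532, 532, 498],
    'format': ['PRINT', 'ELECTRONIC', 'PRINT'],
})

# keep='first' is the default: the first row per journal id is retained
issn_demo_dedup = issn_demo.drop_duplicates(subset='journal')
print(issn_demo_dedup)
```

Only one ISSN per journal is therefore used to look up the Sherpa file in the next step.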
```python
# repository types that count toward archiving (any match sets archiving to True)
# all types, for reference: 'academic_social_network', 'any_repository', 'any_website', 'authors_homepage',
# 'funder_designated_location', 'institutional_repository', 'institutional_website', 'named_academic_social_network',
# 'named_repository', 'non_commercial_institutional_repository', 'non_commercial_repository',
# 'non_commercial_social_network', 'non_commercial_subject_repository', 'non_commercial_website',
# 'preprint_repository', 'subject_repository', 'this_journal'
repositories_archiving = ['any_repository',
'institutional_repository',
'institutional_website',
'non_commercial_institutional_repository',
'non_commercial_repository',
'any_website',
'non_commercial_website']
# extract the policy fields
for index, row in issn_dedup.iterrows():
journal_id = row['journal']
journal_issn = row['issn']
# loop over the JSON files
# print(row['format'])
# progress indicator: print every tenth index
if index % 10 == 0:
print(index)
# check that the file exists
if os.path.exists('sherpa/data/' + journal_issn + '.json'):
with open('sherpa/data/' + journal_issn + '.json', 'r', encoding='utf-8') as f:
data = json.load(f)
# initialize the variables to extract
sherpa_id = np.nan
sherpa_uri = np.nan
open_access_prohibited = np.nan
location = np.nan
locations_ir = ''
locations_not_ir = ''
additional_oa_fee = np.nan
article_versions = np.nan
article_version = np.nan
licenses = []
embargo = 0
prerequisites = np.nan
prerequisite_funders = np.nan
prerequisite_funders_name = np.nan
prerequisite_funders_fundref = np.nan
prerequisite_funders_ror = np.nan
prerequisite_funders_country = np.nan
prerequisite_funders_url = np.nan
prerequisite_funders_sherpa_id = np.nan
prerequisite_subjects = np.nan
named_repository = np.nan
named_academic_social_network = np.nan
copyright_owner = np.nan
publisher_deposit = np.nan
archiving = np.nan
conditions = np.nan
public_notes = np.nan
if (len(data['items']) > 0):
if ('id' in data['items'][0]):
sherpa_id = data['items'][0]['id']
# check whether the id is already present
# (`in` on a Series tests the index labels, so compare against the values)
if sherpa_id in sherpa_policies['sherpa_id'].values :
print('SKIP ' + str(sherpa_id))
else :
policies = data['items'][0]['publisher_policy']
for policy in policies:
# initialize the variables to extract
sherpa_uri = np.nan
open_access_prohibited = np.nan
if ('uri' in policy):
sherpa_uri = policy['uri']
if ('open_access_prohibited' in policy):
open_access_prohibited = policy['open_access_prohibited']
if ('permitted_oa' in policy):
poas = policy['permitted_oa']
for poa in poas:
additional_oa_fee = np.nan
article_versions = np.nan
article_version = np.nan
licenses = []
embargo = 0
prerequisites = np.nan
prerequisite_funders = np.nan
prerequisite_funders_name = np.nan
prerequisite_funders_fundref = np.nan
prerequisite_funders_ror = np.nan
prerequisite_funders_country = np.nan
prerequisite_funders_url = np.nan
prerequisite_funders_sherpa_id = np.nan
prerequisite_subjects = np.nan
named_repository = np.nan
named_academic_social_network = np.nan
locations_ir = ''
locations_not_ir = ''
copyright_owner = np.nan
conditions = np.nan
public_notes = np.nan
if ('additional_oa_fee' in poa):
additional_oa_fee = poa['additional_oa_fee']
if ('location' in poa):
archiving = 0
location = ''
mylocations = poa['location']['location']
mylocations_text = poa['location']['location_phrases']
if (type(mylocations) is not list):
mylocations = [mylocations]
location = ' ; '.join(mylocations)
for locationi in mylocations:
if locationi in repositories_archiving :
archiving = archiving + 1
for locationi_text in mylocations_text:
if locationi_text['value'] == locationi :
if locations_ir == '':
locations_ir = locations_ir + locationi_text['phrase']
else :
if locationi_text['phrase'] not in locations_ir :
locations_ir = locations_ir + ' ; ' + locationi_text['phrase']
else :
for locationi_text in mylocations_text:
if locationi_text['value'] == locationi :
if locations_not_ir == '':
locations_not_ir = locations_not_ir + locationi_text['phrase']
else :
if locationi_text['phrase'] not in locations_not_ir :
locations_not_ir = locations_not_ir + ' ; ' + locationi_text['phrase']
# print (archiving)
if archiving > 0:
archiving = True
else :
archiving = False
if ('named_repository' in poa['location']):
if (type(poa['location']['named_repository']) is list):
named_repository = ' ; '.join(poa['location']['named_repository'])
else :
named_repository = poa['location']['named_repository']
locations_not_ir = locations_not_ir.replace('Named Repository', named_repository)
locations_ir = locations_ir.replace('Named Repository', named_repository)
if ('named_academic_social_network' in poa['location']):
if (type(poa['location']['named_academic_social_network']) is list):
named_academic_social_network = ' ; '.join(poa['location']['named_academic_social_network'])
else :
named_academic_social_network = poa['location']['named_academic_social_network']
locations_not_ir = locations_not_ir.replace('Named Academic Social Network', named_academic_social_network)
locations_ir = locations_ir.replace('Named Academic Social Network', named_academic_social_network)
if ('embargo' in poa):
# print(poa['embargo'])
embargo_amount = 0
if ('amount' in poa['embargo']):
embargo_amount = poa['embargo']['amount']
if ('units' in poa['embargo']):
if (poa['embargo']['units'] == 'months') :
embargo = embargo_amount
elif (poa['embargo']['units'] == 'years') :
embargo = embargo_amount*12
elif (poa['embargo']['units'] == 'weeks') :
embargo = int(embargo_amount/4)
if (embargo == 0):
embargo = 1
elif (poa['embargo']['units'] == 'days') :
embargo = int(embargo_amount/30)
if (embargo == 0):
embargo = 1
else :
embargo = embargo_amount
if ('prerequisites' in poa):
if 'prerequisites' in poa['prerequisites'] :
if (type(poa['prerequisites']['prerequisites']) is list):
prerequisites = ' ; '.join(poa['prerequisites']['prerequisites'])
else:
prerequisites = poa['prerequisites']['prerequisites']
if ('prerequisite_funders' in poa['prerequisites']):
prerequisite_funders = True
# prerequisite_funders = poa['prerequisites']['prerequisite_funders']
# if (type(poa['prerequisites']['prerequisite_funders']) is list):
# prerequisite_funders = ' ; '.join(poa['prerequisites']['prerequisite_funders'])
# else:
# prerequisite_funders = poa['prerequisites']['prerequisite_funders']
if ('prerequisite_subjects' in poa['prerequisites']):
prerequisite_subjects = True
# prerequisite_subjects = poa['prerequisites']['prerequisite_subjects']
# if (type(poa['prerequisite_subjects']) is list):
# prerequisite_subjects = ' ; '.join(poa['prerequisite_subjects'])
# else:
# prerequisite_subjects = poa['prerequisite_subjects']
if ('copyright_owner' in poa):
copyright_owner = poa['copyright_owner']
if ('publisher_deposit' in poa):
publisher_deposit = ''
if (type(poa['publisher_deposit']) is list):
for deposit in poa['publisher_deposit']:
if 'type' in deposit['repository_metadata']:
publisher_deposit = publisher_deposit + deposit['repository_metadata']['type']
if 'name' in deposit['repository_metadata']:
publisher_deposit = publisher_deposit + ' (' + deposit['repository_metadata']['name'][0]['name'] + ')'
else :
if 'name' in deposit['repository_metadata']:
publisher_deposit = publisher_deposit + deposit['repository_metadata']['name'][0]['name']
publisher_deposit = publisher_deposit + ' ; '
else :
deposit = poa['publisher_deposit']
if 'type' in deposit['repository_metadata']:
publisher_deposit = publisher_deposit + deposit['repository_metadata']['type']
if 'name' in deposit['repository_metadata']:
publisher_deposit = publisher_deposit + ' (' + deposit['repository_metadata']['name'][0]['name'] + ')'
else :
if 'name' in deposit['repository_metadata']:
publisher_deposit = publisher_deposit + deposit['repository_metadata']['name'][0]['name']
publisher_deposit = publisher_deposit + ' ; '
# print (publisher_deposit)
if ('conditions' in poa):
if (type(poa['conditions']) is list):
conditions = ' ; '.join(poa['conditions'])
else:
conditions = poa['conditions']
if ('public_notes' in poa):
if (type(poa['public_notes']) is list):
public_notes = ' ; '.join(poa['public_notes'])
else:
public_notes = poa['public_notes']
if ('license' in poa):
licenses = poa['license']
if (type(licenses) is not list):
licenses = [licenses]
else :
licenses = ['']
# with article version
if ('article_version' in poa):
article_versions = poa['article_version']
for article_version in article_versions:
for license in licenses:
if ('license' in license):
mylicense = license['license']
else :
mylicense = ''
# with prerequisites
if ('prerequisites' in poa) :
# with prerequisite_funders
if ('prerequisite_funders' in poa['prerequisites']):
for prerequisite_fundersi in poa['prerequisites']['prerequisite_funders'] :
prerequisite_funders_name = prerequisite_fundersi['funder_metadata']['name'][0]['name']
if 'acronym' in prerequisite_fundersi['funder_metadata']['name'][0]:
prerequisite_funders_name = prerequisite_funders_name + ' (' + prerequisite_fundersi['funder_metadata']['name'][0]['acronym'] + ')'
if 'identifiers' in prerequisite_fundersi['funder_metadata'] :
for fund_identifier in prerequisite_fundersi['funder_metadata']['identifiers'] :
if fund_identifier['type'] == 'fundref':
prerequisite_funders_fundref = fund_identifier['identifier']
if fund_identifier['type'] == 'ror':
prerequisite_funders_ror = fund_identifier['identifier']
if 'country' in prerequisite_fundersi['funder_metadata']:
prerequisite_funders_country = prerequisite_fundersi['funder_metadata']['country']
if 'url' in prerequisite_fundersi['funder_metadata']:
prerequisite_funders_url = prerequisite_fundersi['funder_metadata']['url'][0]['url']
prerequisite_funders_sherpa_id = prerequisite_fundersi['funder_metadata']['id']
sherpa_policies = sherpa_policies.append({'journal' : journal_id,
'issn' : journal_issn,
'sherpa_id' : sherpa_id,
'sherpa_uri' : sherpa_uri,
'open_access_prohibited' : open_access_prohibited,
'additional_oa_fee' : additional_oa_fee,
'article_version' : article_version,
'license' : mylicense,
'embargo' : embargo,
'prerequisites' : prerequisites,
'prerequisite_funders' : prerequisite_funders,
'prerequisite_funders_name' : prerequisite_funders_name,
'prerequisite_funders_fundref' : prerequisite_funders_fundref,
'prerequisite_funders_ror' : prerequisite_funders_ror,
'prerequisite_funders_country' : prerequisite_funders_country,
'prerequisite_funders_url' : prerequisite_funders_url,
'prerequisite_funders_sherpa_id' : prerequisite_funders_sherpa_id,
'prerequisite_subjects' : prerequisite_subjects,
'location' : location,
'locations_ir' : locations_ir,
'locations_not_ir' : locations_not_ir,
'named_repository' : named_repository,
'named_academic_social_network' : named_academic_social_network,
'copyright_owner' : copyright_owner,
'publisher_deposit' : publisher_deposit,
'archiving' : archiving,
'conditions' : conditions,
'public_notes' : public_notes
}, ignore_index=True)
# without prerequisite_funders
else :
sherpa_policies = sherpa_policies.append({'journal' : journal_id,
'issn' : journal_issn,
'sherpa_id' : sherpa_id,
'sherpa_uri' : sherpa_uri,
'open_access_prohibited' : open_access_prohibited,
'additional_oa_fee' : additional_oa_fee,
'article_version' : article_version,
'license' : mylicense,
'embargo' : embargo,
'prerequisites' : prerequisites,
'prerequisite_funders' : prerequisite_funders,
'prerequisite_funders_name' : prerequisite_funders_name,
'prerequisite_funders_fundref' : prerequisite_funders_fundref,
'prerequisite_funders_ror' : prerequisite_funders_ror,
'prerequisite_funders_country' : prerequisite_funders_country,
'prerequisite_funders_url' : prerequisite_funders_url,
'prerequisite_funders_sherpa_id' : prerequisite_funders_sherpa_id,
'prerequisite_subjects' : prerequisite_subjects,
'location' : location,
'locations_ir' : locations_ir,
'locations_not_ir' : locations_not_ir,
'named_repository' : named_repository,
'named_academic_social_network' : named_academic_social_network,
'copyright_owner' : copyright_owner,
'publisher_deposit' : publisher_deposit,
'archiving' : archiving,
'conditions' : conditions,
'public_notes' : public_notes
}, ignore_index=True)
# without prerequisites
else :
sherpa_policies = sherpa_policies.append({'journal' : journal_id,
'issn' : journal_issn,
'sherpa_id' : sherpa_id,
'sherpa_uri' : sherpa_uri,
'open_access_prohibited' : open_access_prohibited,
'additional_oa_fee' : additional_oa_fee,
'article_version' : article_version,
'license' : mylicense,
'embargo' : embargo,
'prerequisites' : prerequisites,
'prerequisite_funders' : prerequisite_funders,
'prerequisite_funders_name' : prerequisite_funders_name,
'prerequisite_funders_fundref' : prerequisite_funders_fundref,
'prerequisite_funders_ror' : prerequisite_funders_ror,
'prerequisite_funders_country' : prerequisite_funders_country,
'prerequisite_funders_url' : prerequisite_funders_url,
'prerequisite_funders_sherpa_id' : prerequisite_funders_sherpa_id,
'prerequisite_subjects' : prerequisite_subjects,
'location' : location,
'locations_ir' : locations_ir,
'locations_not_ir' : locations_not_ir,
'named_repository' : named_repository,
'named_academic_social_network' : named_academic_social_network,
'copyright_owner' : copyright_owner,
'publisher_deposit' : publisher_deposit,
'archiving' : archiving,
'conditions' : conditions,
'public_notes' : public_notes
}, ignore_index=True)
# without article version
else :
if (type(licenses) is not list):
licenses = [licenses]
for license in licenses:
if ('license' in license):
mylicense = license['license']
else :
mylicense = ''
# with prerequisites
if ('prerequisites' in poa) :
# with prerequisite_funders
if ('prerequisite_funders' in poa['prerequisites']):
for prerequisite_fundersi in poa['prerequisites']['prerequisite_funders'] :
prerequisite_funders_name = prerequisite_fundersi['funder_metadata']['name'][0]['name']
if 'acronym' in prerequisite_fundersi['funder_metadata']['name'][0]:
prerequisite_funders_name = prerequisite_funders_name + ' (' + prerequisite_fundersi['funder_metadata']['name'][0]['acronym'] + ')'
if 'identifiers' in prerequisite_fundersi['funder_metadata'] :
for fund_identifier in prerequisite_fundersi['funder_metadata']['identifiers'] :
if fund_identifier['type'] == 'fundref':
prerequisite_funders_fundref = fund_identifier['identifier']
if fund_identifier['type'] == 'ror':
prerequisite_funders_ror = fund_identifier['identifier']
if 'country' in prerequisite_fundersi['funder_metadata']:
prerequisite_funders_country = prerequisite_fundersi['funder_metadata']['country']
if 'url' in prerequisite_fundersi['funder_metadata']:
prerequisite_funders_url = prerequisite_fundersi['funder_metadata']['url'][0]['url']
prerequisite_funders_sherpa_id = prerequisite_fundersi['funder_metadata']['id']
sherpa_policies = sherpa_policies.append({'journal' : journal_id,
'issn' : journal_issn,
'sherpa_id' : sherpa_id,
'sherpa_uri' : sherpa_uri,
'open_access_prohibited' : open_access_prohibited,
'additional_oa_fee' : additional_oa_fee,
'article_version' : article_version,
'license' : mylicense,
'embargo' : embargo,
'prerequisites' : prerequisites,
'prerequisite_funders' : prerequisite_funders,
'prerequisite_funders_name' : prerequisite_funders_name,
'prerequisite_funders_fundref' : prerequisite_funders_fundref,
'prerequisite_funders_ror' : prerequisite_funders_ror,
'prerequisite_funders_country' : prerequisite_funders_country,
'prerequisite_funders_url' : prerequisite_funders_url,
'prerequisite_funders_sherpa_id' : prerequisite_funders_sherpa_id,
'prerequisite_subjects' : prerequisite_subjects,
'location' : location,
'locations_ir' : locations_ir,
'locations_not_ir' : locations_not_ir,
'named_repository' : named_repository,
'named_academic_social_network' : named_academic_social_network,
'copyright_owner' : copyright_owner,
'publisher_deposit' : publisher_deposit,
'archiving' : archiving,
'conditions' : conditions,
'public_notes' : public_notes
}, ignore_index=True)
# without prerequisite_funders
else :
sherpa_policies = sherpa_policies.append({'journal' : journal_id,
'issn' : journal_issn,
'sherpa_id' : sherpa_id,
'sherpa_uri' : sherpa_uri,
'open_access_prohibited' : open_access_prohibited,
'additional_oa_fee' : additional_oa_fee,
'article_version' : article_version,
'license' : mylicense,
'embargo' : embargo,
'prerequisites' : prerequisites,
'prerequisite_funders' : prerequisite_funders,
'prerequisite_funders_name' : prerequisite_funders_name,
'prerequisite_funders_fundref' : prerequisite_funders_fundref,
'prerequisite_funders_ror' : prerequisite_funders_ror,
'prerequisite_funders_country' : prerequisite_funders_country,
'prerequisite_funders_url' : prerequisite_funders_url,
'prerequisite_funders_sherpa_id' : prerequisite_funders_sherpa_id,
'prerequisite_subjects' : prerequisite_subjects,
'location' : location,
'locations_ir' : locations_ir,
'locations_not_ir' : locations_not_ir,
'named_repository' : named_repository,
'named_academic_social_network' : named_academic_social_network,
'copyright_owner' : copyright_owner,
'publisher_deposit' : publisher_deposit,
'archiving' : archiving,
'conditions' : conditions,
'public_notes' : public_notes
}, ignore_index=True)
# without prerequisites
else :
sherpa_policies = sherpa_policies.append({'journal' : journal_id,
'issn' : journal_issn,
'sherpa_id' : sherpa_id,
'sherpa_uri' : sherpa_uri,
'open_access_prohibited' : open_access_prohibited,
'additional_oa_fee' : additional_oa_fee,
'article_version' : article_version,
'license' : mylicense,
'embargo' : embargo,
'prerequisites' : prerequisites,
'prerequisite_funders' : prerequisite_funders,
'prerequisite_funders_name' : prerequisite_funders_name,
'prerequisite_funders_fundref' : prerequisite_funders_fundref,
'prerequisite_funders_ror' : prerequisite_funders_ror,
'prerequisite_funders_country' : prerequisite_funders_country,
'prerequisite_funders_url' : prerequisite_funders_url,
'prerequisite_funders_sherpa_id' : prerequisite_funders_sherpa_id,
'prerequisite_subjects' : prerequisite_subjects,
'location' : location,
'locations_ir' : locations_ir,
'locations_not_ir' : locations_not_ir,
'named_repository' : named_repository,
'named_academic_social_network' : named_academic_social_network,
'copyright_owner' : copyright_owner,
'publisher_deposit' : publisher_deposit,
'archiving' : archiving,
'conditions' : conditions,
'public_notes' : public_notes
}, ignore_index=True)
# without permitted_oa
else :
print ('permitted_oa MISSING')
else :
print ('id MISSING')
```
0
20
40
50
60
SKIP 321
110
SKIP 475
SKIP 476
180
220
250
260
290
300
330
340
360
370
380
420
permitted_oa MISSING
430
permitted_oa MISSING
SKIP 1319
SKIP 880
permitted_oa MISSING
510
permitted_oa MISSING
530
540
550
560
SKIP 1342
570
590
SKIP 3082
SKIP 2465
SKIP 1682
SKIP 325
SKIP 3179
670
680
SKIP 1641
SKIP 1202
720
SKIP 3995
730
SKIP 3475
SKIP 3490
740
750
760
SKIP 1383
SKIP 1357
permitted_oa MISSING
830
840
SKIP 1868
850
SKIP 883
880
890
SKIP 1392
900
910
SKIP 1377
920
SKIP 3443
930
940
SKIP 1123
SKIP 3581
SKIP 3558
SKIP 745
980
990
SKIP 11
SKIP 2499
1000
SKIP 42
1010
1020
SKIP 314
1030
1040
SKIP 1380
SKIP 229
SKIP 1518
SKIP 5682
SKIP 4708
SKIP 1661
1130
SKIP 6585
1140
SKIP 3212
1150
SKIP 335
SKIP 6774
1160
SKIP 6590
1180
SKIP 1639
SKIP 5094
SKIP 1254
1200
SKIP 6325
SKIP 3539
SKIP 1444
SKIP 250
SKIP 1543
SKIP 3415
SKIP 3571
SKIP 3474
SKIP 3586
SKIP 3220
SKIP 3837
SKIP 1650
SKIP 1051
SKIP 3572
SKIP 612
SKIP 6587
SKIP 3567
SKIP 1654
SKIP 4070
SKIP 1643
SKIP 6588
SKIP 1657
SKIP 1687
SKIP 1692
SKIP 1341
1320
SKIP 7150
SKIP 876
1330
SKIP 7007
SKIP 7091
1340
1350
SKIP 173
SKIP 4703
1360
SKIP 2515
1370
SKIP 242
SKIP 3930
SKIP 2004
1400
1410
SKIP 2123
SKIP 1320
SKIP 1459
SKIP 1588
SKIP 7678
SKIP 1391
SKIP 878
SKIP 138
SKIP 7632
SKIP 1644
SKIP 1637
SKIP 2207
SKIP 2428
SKIP 2432
1460
SKIP 2477
SKIP 2430
SKIP 1653
SKIP 2397
SKIP 5935
SKIP 3527
SKIP 148
SKIP 7793
SKIP 4005
SKIP 7768
SKIP 3455
SKIP 1652
SKIP 3570
SKIP 7792
SKIP 3533
SKIP 6586
1520
SKIP 7787
SKIP 3355
1530
SKIP 226
SKIP 1655
SKIP 7783
1540
SKIP 6582
1550
SKIP 7762
SKIP 4691
SKIP 1911
SKIP 1447
SKIP 1778
SKIP 1888
SKIP 228
SKIP 7407
SKIP 7965
1590
1600
1610
SKIP 821
SKIP 823
SKIP 7714
1620
SKIP 172
SKIP 2624
SKIP 3654
SKIP 1659
SKIP 1656
SKIP 1658
SKIP 1393
1640
SKIP 6778
SKIP 8220
SKIP 7872
SKIP 1587
SKIP 822
SKIP 1460
SKIP 6581
SKIP 3568
1670
SKIP 7509
SKIP 7799
SKIP 7765
1680
SKIP 7761
SKIP 7800
1690
SKIP 1244
1710
SKIP 6222
1730
1740
1750
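The loop above normalizes two things for each permitted-OA entry: the embargo, expressed in months, and the locations, split into those that count as institutional archiving and the rest. The following is a minimal standalone sketch of those two rules. The helper names `embargo_in_months` and `split_locations` are hypothetical, the sketch works on the location codes instead of the display phrases, and it assumes that an amount with missing or unrecognized units is taken as months.

```python
# location types treated as institutional archiving (same list as in the loop)
REPOSITORIES_ARCHIVING = {
    'any_repository', 'institutional_repository', 'institutional_website',
    'non_commercial_institutional_repository', 'non_commercial_repository',
    'any_website', 'non_commercial_website',
}

def embargo_in_months(amount, units=None):
    """Convert an embargo to whole months (hypothetical helper)."""
    if units == 'years':
        return amount * 12
    if units == 'weeks':
        return max(int(amount / 4), 1)    # anything under 4 weeks counts as 1 month
    if units == 'days':
        return max(int(amount / 30), 1)   # anything under 30 days counts as 1 month
    # 'months', or (assumption) missing/unknown units: keep the amount as is
    return amount

def split_locations(locations):
    """Return (archiving_locations, other_locations, archiving_flag)."""
    if not isinstance(locations, list):
        locations = [locations]           # the API may return a single string
    ir = [loc for loc in locations if loc in REPOSITORIES_ARCHIVING]
    not_ir = [loc for loc in locations if loc not in REPOSITORIES_ARCHIVING]
    return ir, not_ir, len(ir) > 0

print(embargo_in_months(2, 'years'))
print(embargo_in_months(10, 'days'))
print(split_locations(['authors_homepage', 'institutional_repository']))
```

Isolating these rules in small functions makes them testable without reading any Sherpa file.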
```python
sherpa_policies
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>issn</th>
<th>sherpa_id</th>
<th>sherpa_uri</th>
<th>open_access_prohibited</th>
<th>additional_oa_fee</th>
<th>article_version</th>
<th>license</th>
<th>embargo</th>
<th>prerequisites</th>
<th>prerequisite_funders</th>
<th>prerequisite_funders_name</th>
<th>prerequisite_funders_fundref</th>
<th>prerequisite_funders_ror</th>
<th>prerequisite_funders_country</th>
<th>prerequisite_funders_url</th>
<th>prerequisite_funders_sherpa_id</th>
<th>prerequisite_subjects</th>
<th>location</th>
<th>locations_ir</th>
<th>locations_not_ir</th>
<th>named_repository</th>
<th>named_academic_social_network</th>
<th>copyright_owner</th>
<th>publisher_deposit</th>
<th>archiving</th>
<th>conditions</th>
<th>public_notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>532</td>
<td>0001-2815</td>
<td>11905</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/2050</td>
<td>no</td>
<td>no</td>
<td>submitted</td>
<td></td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>authors_homepage ; named_repository ; non_comm...</td>
<td>Non-Commercial Institutional Repository</td>
<td>Author's Homepage ; arXiv ; AgEcon ; PhilPaper...</td>
<td>arXiv ; AgEcon ; PhilPapers ; PubMed Central ;...</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>Must acknowledge acceptance for publication ; ...</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>532</td>
<td>0001-2815</td>
<td>11905</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/2050</td>
<td>no</td>
<td>no</td>
<td>accepted</td>
<td></td>
<td>12</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>authors_homepage ; named_repository ; non_comm...</td>
<td>Non-Commercial Institutional Repository</td>
<td>Author's Homepage ; arXiv ; AgEcon ; PhilPaper...</td>
<td>arXiv ; AgEcon ; PhilPapers ; PubMed Central ;...</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>Publisher source must be acknowledged with cit...</td>
<td>NaN</td>
</tr>
<tr>
<td>2</td>
<td>532</td>
<td>0001-2815</td>
<td>11905</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/3315</td>
<td>no</td>
<td>yes</td>
<td>published</td>
<td>cc_by</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>any_website ; institutional_repository ; named...</td>
<td>Any Website ; Institutional Repository</td>
<td>PubMed Central ; Subject Repository ; Journal ...</td>
<td>PubMed Central</td>
<td>NaN</td>
<td>authors</td>
<td>disciplinary (PubMed Central) ;</td>
<td>True</td>
<td>Published source must be acknowledged</td>
<td>NaN</td>
</tr>
<tr>
<td>3</td>
<td>532</td>
<td>0001-2815</td>
<td>11905</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/3315</td>
<td>no</td>
<td>yes</td>
<td>published</td>
<td>cc_by_nc_nd</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>any_website ; named_repository ; non_commercia...</td>
<td>Any Website ; Non-Commercial Institutional Rep...</td>
<td>PubMed Central ; Non-Commercial Subject Reposi...</td>
<td>PubMed Central</td>
<td>NaN</td>
<td>authors</td>
<td>disciplinary (PubMed Central) ;</td>
<td>True</td>
<td>Published source must be acknowledged</td>
<td>NaN</td>
</tr>
<tr>
<td>4</td>
<td>498</td>
<td>0001-4842</td>
<td>7760</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/4</td>
<td>no</td>
<td>no</td>
<td>submitted</td>
<td></td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>named_repository ; preprint_repository ; subje...</td>
<td></td>
<td>ChemRxiv ; bioRxiv ; arXiv ; Preprint Reposito...</td>
<td>ChemRxiv ; bioRxiv ; arXiv</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>False</td>
<td>Must not violate ACS ethical Guidelines ; Must...</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>8590</td>
<td>608</td>
<td>2475-9953</td>
<td>33503</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/10</td>
<td>no</td>
<td>no</td>
<td>submitted</td>
<td></td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>authors_homepage ; institutional_repository ; ...</td>
<td>Institutional Repository ; Institutional Website</td>
<td>Author's Homepage</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>Must link to published article ; Publisher cop...</td>
<td>NaN</td>
</tr>
<tr>
<td>8591</td>
<td>608</td>
<td>2475-9953</td>
<td>33503</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/10</td>
<td>no</td>
<td>no</td>
<td>accepted</td>
<td></td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>authors_homepage ; institutional_repository ; ...</td>
<td>Institutional Repository ; Institutional Website</td>
<td>Author's Homepage</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>Must link to published article ; Publisher cop...</td>
<td>NaN</td>
</tr>
<tr>
<td>8592</td>
<td>608</td>
<td>2475-9953</td>
<td>33503</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/10</td>
<td>no</td>
<td>no</td>
<td>published</td>
<td></td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>authors_homepage ; institutional_repository ; ...</td>
<td>Institutional Repository ; Institutional Website</td>
<td>Author's Homepage</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>Must link to published article ; Publisher cop...</td>
<td>NaN</td>
</tr>
<tr>
<td>8593</td>
<td>608</td>
<td>2475-9953</td>
<td>33503</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/10</td>
<td>no</td>
<td>yes</td>
<td>published</td>
<td>cc_by</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>any_repository ; this_journal</td>
<td>Any Repository</td>
<td>Journal Website</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>8594</td>
<td>608</td>
<td>2475-9953</td>
<td>33503</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/10</td>
<td>no</td>
<td>yes</td>
<td>published</td>
<td>cc_by</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>any_repository ; this_journal</td>
<td>Any Repository</td>
<td>Journal Website</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>NaN</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>8595 rows × 28 columns</p>
</div>
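Several rows share the same `sherpa_id` in the table above. When checking whether an id is already stored in such a column, note that Python's `in` applied to a pandas Series tests the index labels, not the values; testing the values requires `.values` or `.isin`. A minimal sketch using three ids from the output above:

```python
import pandas as pd

ids = pd.Series([11905, 11905, 7760], name='sherpa_id')  # index labels are 0, 1, 2

in_index = 1 in ids                   # True: 1 is an index label
in_values_wrong = 11905 in ids        # False: 11905 is not an index label
in_values = 11905 in ids.values       # True: compares against the stored values
also_values = ids.isin([7760]).any()  # vectorized alternative

print(in_index, in_values_wrong, in_values, also_values)
```

With a large frame whose index runs from 0 upward, an index test on a small id can succeed by coincidence, so the value-based forms are the safer choice.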
```python
# turn the index into a column
sherpa_policies = sherpa_policies.reset_index()
# add the id as index + 1
sherpa_policies['id'] = sherpa_policies['index'] + 1
del sherpa_policies['index']
sherpa_policies
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>issn</th>
<th>sherpa_id</th>
<th>sherpa_uri</th>
<th>open_access_prohibited</th>
<th>additional_oa_fee</th>
<th>article_version</th>
<th>license</th>
<th>embargo</th>
<th>prerequisites</th>
<th>prerequisite_funders</th>
<th>prerequisite_funders_name</th>
<th>prerequisite_funders_fundref</th>
<th>prerequisite_funders_ror</th>
<th>prerequisite_funders_country</th>
<th>prerequisite_funders_url</th>
<th>prerequisite_funders_sherpa_id</th>
<th>prerequisite_subjects</th>
<th>location</th>
<th>locations_ir</th>
<th>locations_not_ir</th>
<th>named_repository</th>
<th>named_academic_social_network</th>
<th>copyright_owner</th>
<th>publisher_deposit</th>
<th>archiving</th>
<th>conditions</th>
<th>public_notes</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>532</td>
<td>0001-2815</td>
<td>11905</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/2050</td>
<td>no</td>
<td>no</td>
<td>submitted</td>
<td></td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>authors_homepage ; named_repository ; non_comm...</td>
<td>Non-Commercial Institutional Repository</td>
<td>Author's Homepage ; arXiv ; AgEcon ; PhilPaper...</td>
<td>arXiv ; AgEcon ; PhilPapers ; PubMed Central ;...</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>Must acknowledge acceptance for publication ; ...</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>532</td>
<td>0001-2815</td>
<td>11905</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/2050</td>
<td>no</td>
<td>no</td>
<td>accepted</td>
<td></td>
<td>12</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>authors_homepage ; named_repository ; non_comm...</td>
<td>Non-Commercial Institutional Repository</td>
<td>Author's Homepage ; arXiv ; AgEcon ; PhilPaper...</td>
<td>arXiv ; AgEcon ; PhilPapers ; PubMed Central ;...</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>Publisher source must be acknowledged with cit...</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>532</td>
<td>0001-2815</td>
<td>11905</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/3315</td>
<td>no</td>
<td>yes</td>
<td>published</td>
<td>cc_by</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>any_website ; institutional_repository ; named...</td>
<td>Any Website ; Institutional Repository</td>
<td>PubMed Central ; Subject Repository ; Journal ...</td>
<td>PubMed Central</td>
<td>NaN</td>
<td>authors</td>
<td>disciplinary (PubMed Central) ;</td>
<td>True</td>
<td>Published source must be acknowledged</td>
<td>NaN</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>532</td>
<td>0001-2815</td>
<td>11905</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/3315</td>
<td>no</td>
<td>yes</td>
<td>published</td>
<td>cc_by_nc_nd</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>any_website ; named_repository ; non_commercia...</td>
<td>Any Website ; Non-Commercial Institutional Rep...</td>
<td>PubMed Central ; Non-Commercial Subject Reposi...</td>
<td>PubMed Central</td>
<td>NaN</td>
<td>authors</td>
<td>disciplinary (PubMed Central) ;</td>
<td>True</td>
<td>Published source must be acknowledged</td>
<td>NaN</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>498</td>
<td>0001-4842</td>
<td>7760</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/4</td>
<td>no</td>
<td>no</td>
<td>submitted</td>
<td></td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>named_repository ; preprint_repository ; subje...</td>
<td></td>
<td>ChemRxiv ; bioRxiv ; arXiv ; Preprint Reposito...</td>
<td>ChemRxiv ; bioRxiv ; arXiv</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>False</td>
<td>Must not violate ACS ethical Guidelines ; Must...</td>
<td>NaN</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>8590</td>
<td>608</td>
<td>2475-9953</td>
<td>33503</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/10</td>
<td>no</td>
<td>no</td>
<td>submitted</td>
<td></td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>authors_homepage ; institutional_repository ; ...</td>
<td>Institutional Repository ; Institutional Website</td>
<td>Author's Homepage</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>Must link to published article ; Publisher cop...</td>
<td>NaN</td>
<td>8591</td>
</tr>
<tr>
<td>8591</td>
<td>608</td>
<td>2475-9953</td>
<td>33503</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/10</td>
<td>no</td>
<td>no</td>
<td>accepted</td>
<td></td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>authors_homepage ; institutional_repository ; ...</td>
<td>Institutional Repository ; Institutional Website</td>
<td>Author's Homepage</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>Must link to published article ; Publisher cop...</td>
<td>NaN</td>
<td>8592</td>
</tr>
<tr>
<td>8592</td>
<td>608</td>
<td>2475-9953</td>
<td>33503</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/10</td>
<td>no</td>
<td>no</td>
<td>published</td>
<td></td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>authors_homepage ; institutional_repository ; ...</td>
<td>Institutional Repository ; Institutional Website</td>
<td>Author's Homepage</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>Must link to published article ; Publisher cop...</td>
<td>NaN</td>
<td>8593</td>
</tr>
<tr>
<td>8593</td>
<td>608</td>
<td>2475-9953</td>
<td>33503</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/10</td>
<td>no</td>
<td>yes</td>
<td>published</td>
<td>cc_by</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>any_repository ; this_journal</td>
<td>Any Repository</td>
<td>Journal Website</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>NaN</td>
<td>NaN</td>
<td>8594</td>
</tr>
<tr>
<td>8594</td>
<td>608</td>
<td>2475-9953</td>
<td>33503</td>
<td>https://v2.sherpa.ac.uk/id/publisher_policy/10</td>
<td>no</td>
<td>yes</td>
<td>published</td>
<td>cc_by</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>any_repository ; this_journal</td>
<td>Any Repository</td>
<td>Journal Website</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>True</td>
<td>NaN</td>
<td>NaN</td>
<td>8595</td>
</tr>
</tbody>
</table>
<p>8595 rows × 29 columns</p>
</div>
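The `id` column is a 1-based row number derived from the index. As a minimal sketch on a toy frame (the frame `toy` and its values are made up for illustration), the same numbering can be obtained without the temporary `index` column. Note one difference: this variant numbers rows by position, whereas the cell above adds 1 to whatever the previous index values were, so the two agree only when the index was already 0, 1, 2, ...

```python
import pandas as pd

# toy frame with hypothetical values, standing in for sherpa_policies
toy = pd.DataFrame({'journal': [532, 532, 498]}, index=[10, 11, 12])

# discard the old index, then number the rows 1..n by position
toy = toy.reset_index(drop=True)
toy['id'] = toy.index + 1

print(toy['id'].tolist())
```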
```python
# export as tab-separated values (TSV)
sherpa_policies.to_csv('sample/sherpa_policies_brut.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export to Excel
sherpa_policies.to_excel('sample/sherpa_policies_brut.xlsx', index=False)
```
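When the TSV export is read back, pandas by default parses empty fields as `NaN`, so the empty `license` cells would no longer be empty strings. The sketch below illustrates this on a toy frame with hypothetical values and an in-memory buffer instead of a file; `keep_default_na=False` and `dtype=str` are one possible choice for reloading, not necessarily what the OACCT application does.

```python
import io
import pandas as pd

# toy frame with hypothetical values; the empty license mimics the '' cells above
toy = pd.DataFrame({
    'issn': ['0001-2815', '0001-4842'],
    'license': ['cc_by', ''],
    'embargo': [0, 12],
})

# write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, sep='\t', index=False)
buffer.seek(0)

# dtype=str reads every column as text; keep_default_na=False keeps '' as ''
back = pd.read_csv(buffer, sep='\t', dtype=str, keep_default_na=False)
print(back)
```

With these options every column comes back as text, so numeric columns such as `embargo` would need an explicit cast afterwards.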
## Computing the "green" category and final export of the journals
```python
# keep policies that allow archiving of the published version with no funder prerequisite
sherpa_policies_ir = sherpa_policies.loc[(sherpa_policies['archiving'] == True) & (sherpa_policies['article_version'] == 'published') & (sherpa_policies['prerequisite_funders'].isna())][['journal', 'embargo', 'license', 'conditions']]
sherpa_policies_ir
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>embargo</th>
<th>license</th>
<th>conditions</th>
</tr>
</thead>
<tbody>
<tr>
<td>2</td>
<td>532</td>
<td>0</td>
<td>cc_by</td>
<td>Published source must be acknowledged</td>
</tr>
<tr>
<td>3</td>
<td>532</td>
<td>0</td>
<td>cc_by_nc_nd</td>
<td>Published source must be acknowledged</td>
</tr>
<tr>
<td>9</td>
<td>498</td>
<td>12</td>
<td>cc_by</td>
<td>NaN</td>
</tr>
<tr>
<td>10</td>
<td>498</td>
<td>12</td>
<td>cc_by_nc_nd</td>
<td>NaN</td>
</tr>
<tr>
<td>11</td>
<td>498</td>
<td>12</td>
<td>bespoke_license</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>8588</td>
<td>533</td>
<td>0</td>
<td>cc_by</td>
<td>NaN</td>
</tr>
<tr>
<td>8589</td>
<td>533</td>
<td>0</td>
<td>cc_by</td>
<td>NaN</td>
</tr>
<tr>
<td>8592</td>
<td>608</td>
<td>0</td>
<td></td>
<td>Must link to published article ; Publisher cop...</td>
</tr>
<tr>
<td>8593</td>
<td>608</td>
<td>0</td>
<td>cc_by</td>
<td>NaN</td>
</tr>
<tr>
<td>8594</td>
<td>608</td>
<td>0</td>
<td>cc_by</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>1118 rows × 4 columns</p>
</div>
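The selection above combines three conditions in a single expression. A minimal sketch on toy rows (all values are hypothetical; only the column names come from `sherpa_policies`) names each condition separately, which can make the filter easier to read and to adjust:

```python
import numpy as np
import pandas as pd

# toy rows with hypothetical values, using the column names of sherpa_policies
toy = pd.DataFrame({
    'journal': [1, 1, 2, 3],
    'archiving': [True, True, False, True],
    'article_version': ['accepted', 'published', 'published', 'published'],
    'prerequisite_funders': [np.nan, np.nan, np.nan, 'yes'],
    'embargo': [12, 0, 0, 0],
})

# one named boolean mask per condition
can_archive = toy['archiving'] == True
is_published = toy['article_version'] == 'published'
no_funder_prerequisite = toy['prerequisite_funders'].isna()

# rows that satisfy all three conditions, restricted to the columns of interest
selected = toy.loc[can_archive & is_published & no_funder_prerequisite,
                   ['journal', 'embargo']]
print(selected)
```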
```python
# deduplicate: after sorting, keep the row with the shortest embargo for each journal
sherpa_policies_ir_id = sherpa_policies_ir[['journal', 'embargo']].sort_values(by=['journal', 'embargo'])
sherpa_policies_ir_dedup = sherpa_policies_ir_id.drop_duplicates(subset='journal')
sherpa_policies_ir_dedup
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>embargo</th>
</tr>
</thead>
<tbody>
<tr>
<td>2367</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>8342</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>7366</td>
<td>5</td>
<td>0</td>
</tr>
<tr>
<td>261</td>
<td>6</td>
<td>12</td>
</tr>
<tr>
<td>7086</td>
<td>7</td>
<td>0</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>6479</td>
<td>996</td>
<td>0</td>
</tr>
<tr>
<td>6873</td>
<td>997</td>
<td>0</td>
</tr>
<tr>
<td>1823</td>
<td>998</td>
<td>0</td>
</tr>
<tr>
<td>3944</td>
<td>999</td>
<td>0</td>
</tr>
<tr>
<td>6750</td>
<td>1000</td>
<td>0</td>
</tr>
</tbody>
</table>
<p>579 rows × 2 columns</p>
</div>
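Sorting by `journal` and `embargo` before `drop_duplicates` means the row kept for each journal is the one with the smallest embargo. The sketch below checks this on toy data (hypothetical values) and compares it with an explicit `groupby(...).min()`. It assumes `embargo` is numeric; if it were stored as text, the sort would be lexical and `'12'` would come before `'6'`.

```python
import pandas as pd

# toy data with hypothetical values: two policies per journal
toy = pd.DataFrame({'journal': [6, 2, 6, 2], 'embargo': [24, 0, 12, 6]})

# same approach as above: sort, then keep the first row per journal
by_sort = (toy.sort_values(by=['journal', 'embargo'])
              .drop_duplicates(subset='journal')
              .reset_index(drop=True))

# alternative: ask for the minimum embargo per journal explicitly
by_group = toy.groupby('journal', as_index=False)['embargo'].min()

print(by_sort)
print(by_group)
```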
```python
# add the green category (2); work on an explicit copy to avoid SettingWithCopyWarning
sherpa_policies_ir_dedup = sherpa_policies_ir_dedup.copy()
sherpa_policies_ir_dedup['oa_status'] = 2
sherpa_policies_ir_dedup
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>journal</th>
<th>embargo</th>
<th>oa_status</th>
</tr>
</thead>
<tbody>
<tr>
<td>2367</td>
<td>2</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>8342</td>
<td>3</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>7366</td>
<td>5</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>261</td>
<td>6</td>
<td>12</td>
<td>2</td>
</tr>
<tr>
<td>7086</td>
<td>7</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>6479</td>
<td>996</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>6873</td>
<td>997</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>1823</td>
<td>998</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>3944</td>
<td>999</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>6750</td>
<td>1000</td>
<td>0</td>
<td>2</td>
</tr>
</tbody>
</table>
<p>579 rows × 3 columns</p>
</div>
```python
# merge with the journals
sherpa_policies_ir_dedup = sherpa_policies_ir_dedup.rename(columns={'journal' : 'id'})
journals_export = pd.merge(journals_export, sherpa_policies_ir_dedup, on='id', how='left')
journals_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status_x</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>embargo</th>
<th>oa_status_y</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>PLoS ONE</td>
<td></td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>5</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
<td>0</td>
<td>2.0</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td></td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>1</td>
<td>4, 5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>Physical review B: Condensed matter and materi...</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>1998</td>
<td>2015</td>
<td>http://journals.aps.org/prb/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>Smart Materials and Structures</td>
<td>Smart mater. struct. (Print)</td>
<td>1992</td>
<td>9999</td>
<td>http://iopscience.iop.org/0964-1726</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>1</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>119</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>217</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>911 rows × 18 columns</p>
</div>
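Because both frames carry an `oa_status` column, the left merge renames them `oa_status_x` (from the journals) and `oa_status_y` (from Sherpa), and journals without a match get `NaN` in the Sherpa columns. A minimal sketch with toy frames (hypothetical values) reproduces this; the `validate='one_to_one'` argument is an optional safeguard added here, not part of the original cell, and raises an error if either side has duplicate `id` values.

```python
import pandas as pd

# toy frames with hypothetical values
journals = pd.DataFrame({'id': [1, 2, 3], 'oa_status': [1, 1, 5]})
green = pd.DataFrame({'id': [2, 3], 'embargo': [0, 0], 'oa_status': [2, 2]})

# left merge: every journal is kept, unmatched ones get NaN
merged = pd.merge(journals, green, on='id', how='left', validate='one_to_one')

print(merged.columns.tolist())
print(merged)
```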
```python
# choose the OA category: keep the existing status unless it is 1 (unknown) and Sherpa gives a green status
journals_export['oa_status'] = journals_export['oa_status_x']
journals_export.loc[(journals_export['oa_status_x'] == 1) & (journals_export['oa_status_y'].notna()), 'oa_status'] = journals_export['oa_status_y']
journals_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>oa_status_x</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>embargo</th>
<th>oa_status_y</th>
<th>oa_status</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>1.0</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
<td>2.0</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>PLoS ONE</td>
<td></td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>5</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
<td>0</td>
<td>2.0</td>
<td>5.0</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td></td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>1</td>
<td>4, 5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>1.0</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>Physical review B: Condensed matter and materi...</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>1998</td>
<td>2015</td>
<td>http://journals.aps.org/prb/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
<td>2.0</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>Smart Materials and Structures</td>
<td>Smart mater. struct. (Print)</td>
<td>1992</td>
<td>9999</td>
<td>http://iopscience.iop.org/0964-1726</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
<td>2.0</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
<td>2.0</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>1</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
<td>2.0</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>1</td>
<td>119</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>0</td>
<td>2.0</td>
<td>2.0</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>1</td>
<td>217</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>1.0</td>
</tr>
</tbody>
</table>
<p>911 rows × 19 columns</p>
</div>
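The rule applied above is: the Sherpa-derived status replaces the existing one only when the existing status is 1 (unknown) and a Sherpa status is available. As a sketch on toy data (hypothetical values), the same rule can be written with `Series.mask`, which is an alternative formulation rather than the notebook's own code:

```python
import numpy as np
import pandas as pd

# toy data with hypothetical values, mimicking the merged columns
toy = pd.DataFrame({
    'oa_status_x': [1, 1, 5],
    'oa_status_y': [np.nan, 2.0, 2.0],
})

# replace the existing status only where it is 1 (unknown) and a green status exists
use_green = (toy['oa_status_x'] == 1) & toy['oa_status_y'].notna()
toy['oa_status'] = toy['oa_status_x'].mask(use_green, toy['oa_status_y']).astype(int)

print(toy['oa_status'].tolist())
```

The third row keeps its status 5 even though a green status exists, which matches the PLoS ONE row in the table above.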
```python
# 6 : Diamond
# 5 : Gold
# 4 : Full
# 3 : Hybrid
# 2 : Green
# 1 : UNKNOWN
journals_export['oa_status'].value_counts()
```
2.0 518
1.0 306
5.0 70
6.0 17
Name: oa_status, dtype: int64
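To make such counts easier to read, the numeric codes can be mapped to the labels listed in the comments above. This is a minimal sketch on a toy series (hypothetical values); the dictionary name `OA_LABELS` is introduced here for illustration only.

```python
import pandas as pd

# code-to-label mapping taken from the comments in the cell above
OA_LABELS = {6: 'Diamond', 5: 'Gold', 4: 'Full', 3: 'Hybrid', 2: 'Green', 1: 'UNKNOWN'}

# toy series with hypothetical values, stored as floats like the merged column
toy = pd.Series([2.0, 1.0, 2.0, 5.0], name='oa_status')

# cast to int, translate codes to labels, then count
labelled = toy.astype(int).map(OA_LABELS).value_counts()
print(labelled)
```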
```python
del journals_export['embargo']
del journals_export['oa_status_x']
del journals_export['oa_status_y']
journals_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>oa_status</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1.0</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2.0</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>PLoS ONE</td>
<td></td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
<td>5.0</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td></td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>4, 5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1.0</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>Physical review B: Condensed matter and materi...</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>1998</td>
<td>2015</td>
<td>http://journals.aps.org/prb/</td>
<td>236</td>
<td>124</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2.0</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>Smart Materials and Structures</td>
<td>Smart mater. struct. (Print)</td>
<td>1992</td>
<td>9999</td>
<td>http://iopscience.iop.org/0964-1726</td>
<td>234</td>
<td>124</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2.0</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2.0</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
<td>2.0</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>119</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2.0</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>217</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1.0</td>
</tr>
</tbody>
</table>
<p>911 rows × 16 columns</p>
</div>
```python
# cast oa_status from float to int (astype(int) assumes the column has no missing values)
journals_export['oa_status'] = journals_export['oa_status'].astype(int)
journals_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>name</th>
<th>name_short_iso_4</th>
<th>starting_year</th>
<th>end_year</th>
<th>website</th>
<th>country</th>
<th>language</th>
<th>publisher</th>
<th>doaj_seal</th>
<th>doaj_status</th>
<th>lockss</th>
<th>portico</th>
<th>nlch</th>
<th>qoam_av_score</th>
<th>oa_status</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>Revue médicale suisse</td>
<td>Rev. méd. suisse</td>
<td>2005</td>
<td>9999</td>
<td></td>
<td>215</td>
<td>138</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Physical Review Letters</td>
<td>Phys. rev. lett. (Print)</td>
<td>1958</td>
<td>9999</td>
<td>http://prl.aps.org/</td>
<td>236</td>
<td>124</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>PLoS ONE</td>
<td></td>
<td>2006</td>
<td>9999</td>
<td>http://www.plosone.org/</td>
<td>236</td>
<td>124</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4.035714</td>
<td>5</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>EU-topías</td>
<td>EU-topías</td>
<td>2011</td>
<td>9999</td>
<td></td>
<td>209</td>
<td>124, 138, 402, 292</td>
<td>4, 5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>Physical review B: Condensed matter and materi...</td>
<td>Phys. rev., B, Condens. matter mater. phys.</td>
<td>1998</td>
<td>2015</td>
<td>http://journals.aps.org/prb/</td>
<td>236</td>
<td>124</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>906</td>
<td>997</td>
<td>Smart Materials and Structures</td>
<td>Smart mater. struct. (Print)</td>
<td>1992</td>
<td>9999</td>
<td>http://iopscience.iop.org/0964-1726</td>
<td>234</td>
<td>124</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>907</td>
<td>998</td>
<td>Journal of Pediatric Surgery</td>
<td>J. pediatr. surg. (Print)</td>
<td>1966</td>
<td>9999</td>
<td>http://www.jpedsurg.org/</td>
<td>236</td>
<td>124</td>
<td>75</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>908</td>
<td>999</td>
<td>Probability Theory and Related Fields</td>
<td>Probab. theory relat. fields (Internet)</td>
<td>uuuu</td>
<td>9999</td>
<td>http://www.springerlink.com/content/100451/?p=...</td>
<td>83</td>
<td>124</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>909</td>
<td>1000</td>
<td>Renewable Energy</td>
<td>Renew. energy</td>
<td>1991</td>
<td>9999</td>
<td>http://www.elsevier.com/wps/product/cws_home/9...</td>
<td>234</td>
<td>124</td>
<td>119</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NaN</td>
<td>2</td>
</tr>
<tr>
<td>910</td>
<td>1001</td>
<td>Journal of applied physiology: respiratory, en...</td>
<td>J. appl. physiol.: respir., environ. exercise ...</td>
<td>1977</td>
<td>1984</td>
<td>https://www.physiology.org/journal/jappl</td>
<td>236</td>
<td>124</td>
<td>217</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
<td>1</td>
</tr>
</tbody>
</table>
<p>911 rows × 16 columns</p>
</div>
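The cast of `oa_status` works here because the column has no missing values; `astype(int)` raises an error as soon as a NaN is present. The following is a minimal sketch, not part of the original notebook, of a more defensive cast. The helper name `to_int_column` and the demo data are assumptions made for illustration; the nullable `Int64` dtype is a standard pandas feature.

```python
import pandas as pd

def to_int_column(series: pd.Series) -> pd.Series:
    """Cast a float column to integers, keeping missing values as <NA>."""
    if series.isna().any():
        # the nullable integer dtype stores missing values instead of raising
        return series.astype('Int64')
    return series.astype(int)

# hypothetical stand-in for journals_export
demo = pd.DataFrame({'oa_status': [1.0, 2.0, 5.0]})
demo['oa_status'] = to_int_column(demo['oa_status'])
print(demo.dtypes)
```

Whether a nullable dtype is acceptable depends on how the OACCT application imports the file; if it expects plain integers, filling or dropping the missing rows first may be the better choice.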
```python
# export as TSV (tab-separated values, UTF-8)
journals_export.to_csv('sample/journal_fin_sherpa.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export excel
journals_export.to_excel('sample/journal_fin_sherpa.xlsx', index=False)
```
```python
# export as TSV (tab-separated values, UTF-8)
sherpa_policies_ir_dedup.to_csv('sample/journal_ir.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export excel
sherpa_policies_ir_dedup.to_excel('sample/journal_ir.xlsx', index=False)
```
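Several exported columns hold text that looks numeric, such as `publisher` ("4, 5") and `language` ("124, 138, 402, 292"), and `website` can be an empty string. A quick way to check that a TSV export preserves these values is to read the file back as text. The sketch below is an illustration under assumptions: it uses a small stand-in DataFrame built from rows shown above and a temporary file instead of the real `sample/` paths.

```python
import os
import tempfile
import pandas as pd

# small stand-in for journals_export (values copied from the rows displayed above)
demo = pd.DataFrame({
    'id': [3, 4],
    'name': ['PLoS ONE', 'EU-topías'],
    'website': ['http://www.plosone.org/', ''],
    'language': ['124', '124, 138, 402, 292'],
    'publisher': ['3', '4, 5'],
    'oa_status': [5, 1],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'journal_demo.tsv')
    demo.to_csv(path, sep='\t', encoding='utf-8', index=False)
    # dtype=str and keep_default_na=False keep '' and '4, 5' exactly as written
    reread = pd.read_csv(path, sep='\t', encoding='utf-8',
                         dtype=str, keep_default_na=False)

# compare cell by cell, with the original values rendered as text
roundtrip_ok = reread.values.tolist() == demo.astype(str).values.tolist()
print(roundtrip_ok)
```

Reading with the default settings instead would turn the empty `website` cells into NaN, which is worth keeping in mind when the files are re-imported elsewhere.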