08_oacct_sherpa_issns.md
Created
Sat, Nov 2, 07:49


# Open Access Compliance Check Tool (OACCT) Project
P5 project of the EPFL Library, in collaboration with the libraries of the Universities of Geneva, Lausanne and Bern: https://www.swissuniversities.ch/themen/digitalisierung/p-5-wissenschaftliche-information/projekte/swiss-mooc-service-1-1-1-1
This notebook extracts selected data from the sources obtained via API and processes them so that they can be used in the OACCT application.
Author: **Pablo Iriarte**, University of Geneva (pablo.iriarte@unige.ch)
Last updated: 16.07.2021
## ISSN table
```python
import pandas as pd
import csv
import json
import numpy as np
```
```python
issns = pd.read_csv('sample/issn_brut.tsv', encoding='utf-8', sep='\t')
issns
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
<td>PRINT</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>0001-2815</td>
<td>532</td>
<td>NaN</td>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
<td>PRINT</td>
<td>1</td>
<td>3</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>0001-4842</td>
<td>498</td>
<td>NaN</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
<td>PRINT</td>
<td>1</td>
<td>5</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
<td>OTHER</td>
<td>3</td>
<td>1756</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>2470-0045</td>
<td>533</td>
<td>NaN</td>
<td>3</td>
<td>1757</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1758</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
<td>PRINT</td>
<td>1</td>
<td>1759</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>3</td>
<td>1760</td>
</tr>
</tbody>
</table>
<p>1760 rows × 6 columns</p>
</div>
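Before enriching the table, it can be worth checking that every value in the `issn` column is a well-formed ISSN. The sketch below is not part of the original notebook: the helper name `is_valid_issn` and the sample list are assumptions introduced for illustration. It applies the standard ISSN check-digit rule (weights 8 down to 2 on the first seven digits, modulo 11, with `X` standing for 10).

```python
import re

# Hypothetical helper, not part of the original notebook.
ISSN_PATTERN = re.compile(r'^[0-9]{4}-[0-9]{3}[0-9X]$')

def is_valid_issn(issn):
    """Return True if `issn` has the form NNNN-NNNC and a correct check digit."""
    if not isinstance(issn, str) or not ISSN_PATTERN.match(issn):
        return False
    digits = issn.replace('-', '')
    # Weights 8, 7, ..., 2 apply to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = 'X' if check == 10 else str(check)
    return digits[7] == expected

# Three ISSNs taken from the table above, plus one with a deliberately wrong check digit.
sample = ['0001-2815', '1399-0039', '2340-115X', '0001-2816']
results = {issn: is_valid_issn(issn) for issn in sample}
print(results)
```

On the real data, something like `issns.loc[~issns['issn'].map(is_valid_issn)]` would list the suspicious rows, assuming the column was read as strings.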
## Adding the format from Sherpa
```python
# add the format reported by Sherpa
issn_sherpa = pd.read_csv('sample/issn_sherpa.tsv', encoding='utf-8', sep='\t')
issn_sherpa
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
<td>PRINT</td>
<td>1</td>
<td>1</td>
<td>print</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>0001-2815</td>
<td>532</td>
<td>NaN</td>
<td>3</td>
<td>2</td>
<td>electronic</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
<td>PRINT</td>
<td>1</td>
<td>3</td>
<td>print</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>0001-4842</td>
<td>498</td>
<td>NaN</td>
<td>3</td>
<td>4</td>
<td>electronic</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
<td>PRINT</td>
<td>1</td>
<td>5</td>
<td>print</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
<td>OTHER</td>
<td>3</td>
<td>1756</td>
<td>print</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>2470-0045</td>
<td>533</td>
<td>NaN</td>
<td>3</td>
<td>1757</td>
<td>electronic</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1758</td>
<td>electronic</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
<td>PRINT</td>
<td>1</td>
<td>1759</td>
<td>NaN</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>3</td>
<td>1760</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>1760 rows × 7 columns</p>
</div>
```python
issn_sherpa['type'] = issn_sherpa['type'].str.upper()
issn_sherpa
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
<td>PRINT</td>
<td>1</td>
<td>1</td>
<td>PRINT</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>0001-2815</td>
<td>532</td>
<td>NaN</td>
<td>3</td>
<td>2</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
<td>PRINT</td>
<td>1</td>
<td>3</td>
<td>PRINT</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>0001-4842</td>
<td>498</td>
<td>NaN</td>
<td>3</td>
<td>4</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
<td>PRINT</td>
<td>1</td>
<td>5</td>
<td>PRINT</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
<td>OTHER</td>
<td>3</td>
<td>1756</td>
<td>PRINT</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>2470-0045</td>
<td>533</td>
<td>NaN</td>
<td>3</td>
<td>1757</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1758</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
<td>PRINT</td>
<td>1</td>
<td>1759</td>
<td>NaN</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>3</td>
<td>1760</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>1760 rows × 7 columns</p>
</div>
```python
issns = pd.merge(issns, issn_sherpa[['issn', 'type']], on='issn', how='outer')
issns
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
<td>PRINT</td>
<td>1</td>
<td>1</td>
<td>PRINT</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>0001-2815</td>
<td>532</td>
<td>NaN</td>
<td>3</td>
<td>2</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
<td>PRINT</td>
<td>1</td>
<td>3</td>
<td>PRINT</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>0001-4842</td>
<td>498</td>
<td>NaN</td>
<td>3</td>
<td>4</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
<td>PRINT</td>
<td>1</td>
<td>5</td>
<td>PRINT</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
<td>OTHER</td>
<td>3</td>
<td>1756</td>
<td>PRINT</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>2470-0045</td>
<td>533</td>
<td>NaN</td>
<td>3</td>
<td>1757</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1758</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
<td>PRINT</td>
<td>1</td>
<td>1759</td>
<td>NaN</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>3</td>
<td>1760</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>1760 rows × 7 columns</p>
</div>
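The outer merge above keeps 1760 rows, which suggests each ISSN appears once in both files. A minimal sketch of how that assumption could be checked explicitly uses the `validate` and `indicator` parameters of `pd.merge`. The two small frames below are toy data invented for the example, not the notebook's files.

```python
import pandas as pd

# Toy frames standing in for `issns` and `issn_sherpa` (illustrative values only).
left = pd.DataFrame({'issn': ['0001-2815', '1399-0039', '0001-4842'],
                     'format': ['PRINT', None, 'PRINT']})
right = pd.DataFrame({'issn': ['0001-2815', '1399-0039', '1520-4898'],
                      'type': ['PRINT', 'ELECTRONIC', 'ELECTRONIC']})

# validate='one_to_one' raises a MergeError if an ISSN is duplicated on either side;
# indicator=True adds a `_merge` column telling where each row came from.
merged = pd.merge(left, right[['issn', 'type']], on='issn', how='outer',
                  validate='one_to_one', indicator=True)
counts = merged['_merge'].value_counts().to_dict()
print(counts)
```

Rows flagged `left_only` or `right_only` are ISSNs present in only one source. With an outer merge these rows are kept and the missing columns are filled with `NaN`.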
```python
issns['format'].value_counts()
```
    PRINT         816
    ELECTRONIC     90
    OTHER           2
    Name: format, dtype: int64
```python
issns['type'].value_counts()
```
    PRINT         750
    ELECTRONIC    575
    Name: type, dtype: int64
```python
# check the rows that have no type in either source (ISSN.org format or Sherpa type)
issns.loc[issns['format'].isna() & issns['type'].isna()]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>5</td>
<td>1520-8524</td>
<td>0001-4966</td>
<td>789</td>
<td>NaN</td>
<td>3</td>
<td>6</td>
<td>NaN</td>
</tr>
<tr>
<td>6</td>
<td>1520-9024</td>
<td>0001-4966</td>
<td>789</td>
<td>NaN</td>
<td>3</td>
<td>7</td>
<td>NaN</td>
</tr>
<tr>
<td>17</td>
<td>1943-2984</td>
<td>0002-7863</td>
<td>8</td>
<td>NaN</td>
<td>3</td>
<td>18</td>
<td>NaN</td>
</tr>
<tr>
<td>23</td>
<td>1555-7162</td>
<td>0002-9343</td>
<td>985</td>
<td>NaN</td>
<td>3</td>
<td>24</td>
<td>NaN</td>
</tr>
<tr>
<td>27</td>
<td>2163-5773</td>
<td>0002-9513</td>
<td>787</td>
<td>NaN</td>
<td>3</td>
<td>28</td>
<td>NaN</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1722</td>
<td>2160-9047</td>
<td>2160-9020</td>
<td>467</td>
<td>NaN</td>
<td>3</td>
<td>1723</td>
<td>NaN</td>
</tr>
<tr>
<td>1729</td>
<td>2340-115X</td>
<td>2174-8454</td>
<td>4</td>
<td>NaN</td>
<td>3</td>
<td>1730</td>
<td>NaN</td>
</tr>
<tr>
<td>1732</td>
<td>2211-3282</td>
<td>2211-2855</td>
<td>990</td>
<td>NaN</td>
<td>3</td>
<td>1733</td>
<td>NaN</td>
</tr>
<tr>
<td>1739</td>
<td>2297-7007</td>
<td>2297-6981</td>
<td>618</td>
<td>NaN</td>
<td>3</td>
<td>1740</td>
<td>NaN</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>3</td>
<td>1760</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>326 rows × 7 columns</p>
</div>
```python
# check the rows where both sources give the same type
issns.loc[issns['format'] == issns['type']]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
<td>PRINT</td>
<td>1</td>
<td>1</td>
<td>PRINT</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
<td>PRINT</td>
<td>1</td>
<td>3</td>
<td>PRINT</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
<td>PRINT</td>
<td>1</td>
<td>5</td>
<td>PRINT</td>
</tr>
<tr>
<td>7</td>
<td>0001-6268</td>
<td>0001-6268</td>
<td>166</td>
<td>PRINT</td>
<td>1</td>
<td>8</td>
<td>PRINT</td>
</tr>
<tr>
<td>9</td>
<td>0001-6322</td>
<td>0001-6322</td>
<td>807</td>
<td>PRINT</td>
<td>1</td>
<td>10</td>
<td>PRINT</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1748</td>
<td>2380-8195</td>
<td>2380-8195</td>
<td>947</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1749</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1749</td>
<td>2469-990X</td>
<td>2469-990X</td>
<td>684</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1750</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1751</td>
<td>2469-9950</td>
<td>2469-9950</td>
<td>41</td>
<td>PRINT</td>
<td>1</td>
<td>1752</td>
<td>PRINT</td>
</tr>
<tr>
<td>1753</td>
<td>2470-0010</td>
<td>2470-0010</td>
<td>80</td>
<td>PRINT</td>
<td>1</td>
<td>1754</td>
<td>PRINT</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1758</td>
<td>ELECTRONIC</td>
</tr>
</tbody>
</table>
<p>774 rows × 7 columns</p>
</div>
```python
# check the rows where the sources differ
# (NaN never equals NaN, so rows missing in both sources are listed here too)
issns.loc[issns['format'] != issns['type']]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>0001-2815</td>
<td>532</td>
<td>NaN</td>
<td>3</td>
<td>2</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>0001-4842</td>
<td>498</td>
<td>NaN</td>
<td>3</td>
<td>4</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>5</td>
<td>1520-8524</td>
<td>0001-4966</td>
<td>789</td>
<td>NaN</td>
<td>3</td>
<td>6</td>
<td>NaN</td>
</tr>
<tr>
<td>6</td>
<td>1520-9024</td>
<td>0001-4966</td>
<td>789</td>
<td>NaN</td>
<td>3</td>
<td>7</td>
<td>NaN</td>
</tr>
<tr>
<td>8</td>
<td>0942-0940</td>
<td>0001-6268</td>
<td>166</td>
<td>NaN</td>
<td>3</td>
<td>9</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1754</td>
<td>2470-0029</td>
<td>2470-0010</td>
<td>80</td>
<td>NaN</td>
<td>3</td>
<td>1755</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
<td>OTHER</td>
<td>3</td>
<td>1756</td>
<td>PRINT</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>2470-0045</td>
<td>533</td>
<td>NaN</td>
<td>3</td>
<td>1757</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
<td>PRINT</td>
<td>1</td>
<td>1759</td>
<td>NaN</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>3</td>
<td>1760</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>986 rows × 7 columns</p>
</div>
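The three checks above (missing in both, equal, different) can also be summarised in a single table. The following is a sketch on invented toy data, assuming the same `format` and `type` column names. Missing values are replaced by the placeholder label `'MISSING'`, an arbitrary choice made here so that they show up in the cross-tabulation.

```python
import pandas as pd

# Toy data with the same column names as the merged table (values are illustrative).
df = pd.DataFrame({
    'format': ['PRINT', None, 'ELECTRONIC', None, 'OTHER'],
    'type':   ['PRINT', 'ELECTRONIC', 'PRINT', None, 'PRINT'],
})

# Rows = ISSN.org format, columns = Sherpa type; NaN is made visible as 'MISSING'.
table = pd.crosstab(df['format'].fillna('MISSING'), df['type'].fillna('MISSING'))
print(table)
```

The diagonal cells count the agreements, the `MISSING` row and column count the gaps in each source, and the remaining cells are the real disagreements to review.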
```python
# assign the type id, preferring ISSN.org (format) and falling back to Sherpa (type)
# PRINT = 1
# ELECTRONIC = 2
# OTHER = 3
issns['issn_type'] = issns['format']
issns.loc[issns['format'].isna(), 'issn_type'] = issns['type']
issns['issn_type'] = issns['issn_type'].str.replace('PRINT', '1')
issns['issn_type'] = issns['issn_type'].str.replace('ELECTRONIC', '2')
issns['issn_type'] = issns['issn_type'].str.replace('OTHER', '3')
# rows with no type in either source default to OTHER (3)
issns['issn_type'] = issns['issn_type'].fillna(3)
issns
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
<td>PRINT</td>
<td>1</td>
<td>1</td>
<td>PRINT</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>0001-2815</td>
<td>532</td>
<td>NaN</td>
<td>2</td>
<td>2</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
<td>PRINT</td>
<td>1</td>
<td>3</td>
<td>PRINT</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>0001-4842</td>
<td>498</td>
<td>NaN</td>
<td>2</td>
<td>4</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
<td>PRINT</td>
<td>1</td>
<td>5</td>
<td>PRINT</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
<td>OTHER</td>
<td>3</td>
<td>1756</td>
<td>PRINT</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>2470-0045</td>
<td>533</td>
<td>NaN</td>
<td>2</td>
<td>1757</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1758</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
<td>PRINT</td>
<td>1</td>
<td>1759</td>
<td>NaN</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>3</td>
<td>1760</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>1760 rows × 7 columns</p>
</div>
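The same precedence rule (ISSN.org first, then Sherpa, then OTHER by default) can be written more compactly with `fillna` and `map`. This is an alternative sketch on toy data, not the notebook's own code, and the dictionary name `TYPE_IDS` is an assumption introduced here.

```python
import pandas as pd

# Hypothetical lookup table mirroring the ids listed in the comments above.
TYPE_IDS = {'PRINT': 1, 'ELECTRONIC': 2, 'OTHER': 3}

# Toy data (illustrative values only).
df = pd.DataFrame({
    'format': ['PRINT', None, 'OTHER', None],
    'type':   ['PRINT', 'ELECTRONIC', 'PRINT', None],
})

# Take `format` when present, otherwise `type`; then translate labels to ids.
preferred = df['format'].fillna(df['type'])
df['issn_type'] = preferred.map(TYPE_IDS).fillna(TYPE_IDS['OTHER']).astype(int)
print(df)
```

One difference worth noting: with `map`, any label missing from the dictionary becomes `NaN` and then falls back to OTHER, whereas the string replacements would leave such a label in place and fail later at the integer conversion.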
```python
# disagreements: PRINT according to ISSN.org, ELECTRONIC according to Sherpa
issns.loc[(issns['format'] == 'PRINT') & (issns['type'] == 'ELECTRONIC')]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>1123</td>
<td>0959-8138</td>
<td>0959-8138</td>
<td>383</td>
<td>PRINT</td>
<td>1</td>
<td>1124</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1191</td>
<td>1025-496X</td>
<td>1025-496X</td>
<td>779</td>
<td>PRINT</td>
<td>1</td>
<td>1192</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1451</td>
<td>1465-6906</td>
<td>1465-6906</td>
<td>773</td>
<td>PRINT</td>
<td>1</td>
<td>1452</td>
<td>ELECTRONIC</td>
</tr>
</tbody>
</table>
</div>
```python
# disagreements: ELECTRONIC according to ISSN.org, PRINT according to Sherpa
issns.loc[(issns['format'] == 'ELECTRONIC') & (issns['type'] == 'PRINT')]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>121</td>
<td>0009-7330</td>
<td>0009-7330</td>
<td>948</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>122</td>
<td>PRINT</td>
</tr>
<tr>
<td>360</td>
<td>0024-3795</td>
<td>0024-3795</td>
<td>968</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>361</td>
<td>PRINT</td>
</tr>
<tr>
<td>595</td>
<td>0163-3864</td>
<td>0163-3864</td>
<td>701</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>596</td>
<td>PRINT</td>
</tr>
<tr>
<td>653</td>
<td>0194-911X</td>
<td>0194-911X</td>
<td>871</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>654</td>
<td>PRINT</td>
</tr>
<tr>
<td>665</td>
<td>0197-9337</td>
<td>0197-9337</td>
<td>672</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>666</td>
<td>PRINT</td>
</tr>
<tr>
<td>711</td>
<td>0270-6474</td>
<td>0270-6474</td>
<td>73</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>712</td>
<td>PRINT</td>
</tr>
<tr>
<td>734</td>
<td>0278-2391</td>
<td>0278-2391</td>
<td>521</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>735</td>
<td>PRINT</td>
</tr>
<tr>
<td>928</td>
<td>0743-7463</td>
<td>0743-7463</td>
<td>114</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>929</td>
<td>PRINT</td>
</tr>
<tr>
<td>1205</td>
<td>1040-4651</td>
<td>1040-4651</td>
<td>886</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1206</td>
<td>PRINT</td>
</tr>
<tr>
<td>1243</td>
<td>1059-7794</td>
<td>1059-7794</td>
<td>440</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1244</td>
<td>PRINT</td>
</tr>
<tr>
<td>1287</td>
<td>1079-5642</td>
<td>1079-5642</td>
<td>468</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1288</td>
<td>PRINT</td>
</tr>
<tr>
<td>1503</td>
<td>1528-3542</td>
<td>1528-3542</td>
<td>547</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1504</td>
<td>PRINT</td>
</tr>
<tr>
<td>1513</td>
<td>1530-6984</td>
<td>1530-6984</td>
<td>36</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1514</td>
<td>PRINT</td>
</tr>
<tr>
<td>1515</td>
<td>1534-4320</td>
<td>1534-4320</td>
<td>735</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1516</td>
<td>PRINT</td>
</tr>
<tr>
<td>1538</td>
<td>1549-9618</td>
<td>1549-9618</td>
<td>158</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1539</td>
<td>PRINT</td>
</tr>
<tr>
<td>1546</td>
<td>1553-734X</td>
<td>1553-734X</td>
<td>240</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1547</td>
<td>PRINT</td>
</tr>
<tr>
<td>1661</td>
<td>1876-6102</td>
<td>1876-6102</td>
<td>249</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1662</td>
<td>PRINT</td>
</tr>
<tr>
<td>1662</td>
<td>1877-0568</td>
<td>1877-0568</td>
<td>675</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1663</td>
<td>PRINT</td>
</tr>
<tr>
<td>1663</td>
<td>1877-7058</td>
<td>1877-7058</td>
<td>632</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1664</td>
<td>PRINT</td>
</tr>
<tr>
<td>1730</td>
<td>2211-1247</td>
<td>2211-1247</td>
<td>113</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1731</td>
<td>PRINT</td>
</tr>
</tbody>
</table>
</div>
```python
# rows with no ISSN.org format that Sherpa reports as PRINT
issns.loc[issns['format'].isna() & (issns['type'] == 'PRINT')]
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>31</td>
<td>0003-2670</td>
<td>0003-2670</td>
<td>415</td>
<td>NaN</td>
<td>1</td>
<td>32</td>
<td>PRINT</td>
</tr>
<tr>
<td>127</td>
<td>0010-3616</td>
<td>0010-3616</td>
<td>417</td>
<td>NaN</td>
<td>1</td>
<td>128</td>
<td>PRINT</td>
</tr>
<tr>
<td>151</td>
<td>0012-9402</td>
<td>0012-9402</td>
<td>237</td>
<td>NaN</td>
<td>1</td>
<td>152</td>
<td>PRINT</td>
</tr>
<tr>
<td>216</td>
<td>0018-9375</td>
<td>0018-9375</td>
<td>361</td>
<td>NaN</td>
<td>1</td>
<td>217</td>
<td>PRINT</td>
</tr>
<tr>
<td>376</td>
<td>0026-4598</td>
<td>0026-4598</td>
<td>496</td>
<td>NaN</td>
<td>1</td>
<td>377</td>
<td>PRINT</td>
</tr>
<tr>
<td>643</td>
<td>0178-8051</td>
<td>0178-8051</td>
<td>999</td>
<td>NaN</td>
<td>1</td>
<td>644</td>
<td>PRINT</td>
</tr>
<tr>
<td>838</td>
<td>1388-6150</td>
<td>0368-4466</td>
<td>499</td>
<td>NaN</td>
<td>1</td>
<td>839</td>
<td>PRINT</td>
</tr>
<tr>
<td>1192</td>
<td>1560-7917</td>
<td>1025-496X</td>
<td>779</td>
<td>NaN</td>
<td>1</td>
<td>1193</td>
<td>PRINT</td>
</tr>
<tr>
<td>1201</td>
<td>1126-6708</td>
<td>1029-8479</td>
<td>7</td>
<td>NaN</td>
<td>1</td>
<td>1202</td>
<td>PRINT</td>
</tr>
<tr>
<td>1249</td>
<td>1063-651X</td>
<td>1063-651X</td>
<td>588</td>
<td>NaN</td>
<td>1</td>
<td>1250</td>
<td>PRINT</td>
</tr>
<tr>
<td>1531</td>
<td>1538-7933</td>
<td>1538-7836</td>
<td>148</td>
<td>NaN</td>
<td>1</td>
<td>1532</td>
<td>PRINT</td>
</tr>
<tr>
<td>1560</td>
<td>1569-9293</td>
<td>1569-9285</td>
<td>822</td>
<td>NaN</td>
<td>1</td>
<td>1561</td>
<td>PRINT</td>
</tr>
<tr>
<td>1597</td>
<td>1662-4548</td>
<td>1662-453X</td>
<td>421</td>
<td>NaN</td>
<td>1</td>
<td>1598</td>
<td>PRINT</td>
</tr>
<tr>
<td>1658</td>
<td>8756-3282</td>
<td>1873-2763</td>
<td>488</td>
<td>NaN</td>
<td>1</td>
<td>1659</td>
<td>PRINT</td>
</tr>
</tbody>
</table>
</div>
```python
# convert journal to int
issns['journal'] = issns['journal'].astype(int)
```
```python
# turn the index into a 1-based id
issns = issns.reset_index()
issns['id'] = issns['index'] + 1
del issns['index']
issns
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>issn</th>
<th>issnl</th>
<th>journal</th>
<th>format</th>
<th>issn_type</th>
<th>id</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0001-2815</td>
<td>0001-2815</td>
<td>532</td>
<td>PRINT</td>
<td>1</td>
<td>1</td>
<td>PRINT</td>
</tr>
<tr>
<td>1</td>
<td>1399-0039</td>
<td>0001-2815</td>
<td>532</td>
<td>NaN</td>
<td>2</td>
<td>2</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>2</td>
<td>0001-4842</td>
<td>0001-4842</td>
<td>498</td>
<td>PRINT</td>
<td>1</td>
<td>3</td>
<td>PRINT</td>
</tr>
<tr>
<td>3</td>
<td>1520-4898</td>
<td>0001-4842</td>
<td>498</td>
<td>NaN</td>
<td>2</td>
<td>4</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>4</td>
<td>0001-4966</td>
<td>0001-4966</td>
<td>789</td>
<td>PRINT</td>
<td>1</td>
<td>5</td>
<td>PRINT</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>2470-0045</td>
<td>2470-0045</td>
<td>533</td>
<td>OTHER</td>
<td>3</td>
<td>1756</td>
<td>PRINT</td>
</tr>
<tr>
<td>1756</td>
<td>2470-0053</td>
<td>2470-0045</td>
<td>533</td>
<td>NaN</td>
<td>2</td>
<td>1757</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1757</td>
<td>2475-9953</td>
<td>2475-9953</td>
<td>608</td>
<td>ELECTRONIC</td>
<td>2</td>
<td>1758</td>
<td>ELECTRONIC</td>
</tr>
<tr>
<td>1758</td>
<td>2504-4427</td>
<td>2504-4427</td>
<td>994</td>
<td>PRINT</td>
<td>1</td>
<td>1759</td>
<td>NaN</td>
</tr>
<tr>
<td>1759</td>
<td>2504-4435</td>
<td>2504-4427</td>
<td>994</td>
<td>NaN</td>
<td>3</td>
<td>1760</td>
<td>NaN</td>
</tr>
</tbody>
</table>
<p>1760 rows × 7 columns</p>
</div>
```python
issns['issn_type'] = issns['issn_type'].astype(int)
```
```python
issns_export = issns[['id', 'issn', 'journal', 'issn_type']]
issns_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>issn</th>
<th>journal</th>
<th>issn_type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0001-2815</td>
<td>532</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>1399-0039</td>
<td>532</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>0001-4842</td>
<td>498</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>1520-4898</td>
<td>498</td>
<td>2</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>0001-4966</td>
<td>789</td>
<td>1</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>1756</td>
<td>2470-0045</td>
<td>533</td>
<td>3</td>
</tr>
<tr>
<td>1756</td>
<td>1757</td>
<td>2470-0053</td>
<td>533</td>
<td>2</td>
</tr>
<tr>
<td>1757</td>
<td>1758</td>
<td>2475-9953</td>
<td>608</td>
<td>2</td>
</tr>
<tr>
<td>1758</td>
<td>1759</td>
<td>2504-4427</td>
<td>994</td>
<td>1</td>
</tr>
<tr>
<td>1759</td>
<td>1760</td>
<td>2504-4435</td>
<td>994</td>
<td>3</td>
</tr>
</tbody>
</table>
<p>1760 rows × 4 columns</p>
</div>
```python
# drop duplicate ISSNs
issns_export = issns_export.drop_duplicates(subset='issn')
issns_export
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>issn</th>
<th>journal</th>
<th>issn_type</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>0001-2815</td>
<td>532</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>1399-0039</td>
<td>532</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>0001-4842</td>
<td>498</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>1520-4898</td>
<td>498</td>
<td>2</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>0001-4966</td>
<td>789</td>
<td>1</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1755</td>
<td>1756</td>
<td>2470-0045</td>
<td>533</td>
<td>3</td>
</tr>
<tr>
<td>1756</td>
<td>1757</td>
<td>2470-0053</td>
<td>533</td>
<td>2</td>
</tr>
<tr>
<td>1757</td>
<td>1758</td>
<td>2475-9953</td>
<td>608</td>
<td>2</td>
</tr>
<tr>
<td>1758</td>
<td>1759</td>
<td>2504-4427</td>
<td>994</td>
<td>1</td>
</tr>
<tr>
<td>1759</td>
<td>1760</td>
<td>2504-4435</td>
<td>994</td>
<td>3</td>
</tr>
</tbody>
</table>
<p>1760 rows × 4 columns</p>
</div>
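Here the row count is 1760 both before and after `drop_duplicates`, so nothing was removed. If the input files change, it may help to look at the duplicated ISSNs before dropping them, because `drop_duplicates` silently keeps the first occurrence. A minimal sketch on invented toy data:

```python
import pandas as pd

# Toy export table with one repeated ISSN (illustrative values only).
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'issn': ['0001-2815', '1399-0039', '0001-2815', '0001-4842'],
    'issn_type': [1, 2, 2, 1],
})

# keep=False flags every occurrence of a repeated ISSN, not only the later ones.
dupes = df[df.duplicated(subset='issn', keep=False)]
deduped = df.drop_duplicates(subset='issn')
print(dupes)
```

In this toy case the two rows for `0001-2815` carry different `issn_type` values, which is the kind of conflict that would otherwise be resolved silently in favour of the first row.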
```python
# export JSON
result = issns_export.to_json(orient='records', force_ascii=False)
parsed = json.loads(result)
with open('sample/issn.json', 'w', encoding='utf-8') as file:
    json.dump(parsed, file, indent=2, ensure_ascii=False)
```
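A quick way to gain confidence in the JSON export is to read the file back and compare it with the table. The sketch below reproduces the export step on a two-row toy frame and writes to a temporary directory instead of `sample/`. The frame contents and the variable names are assumptions made for the example.

```python
import json
import os
import tempfile

import pandas as pd

# Toy export table (illustrative values only).
export = pd.DataFrame({'id': [1, 2], 'issn': ['0001-2815', '1399-0039'],
                       'journal': [532, 532], 'issn_type': [1, 2]})

# Same steps as the export above: DataFrame -> JSON string -> Python objects -> file.
records = json.loads(export.to_json(orient='records', force_ascii=False))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'issn.json')
    with open(path, 'w', encoding='utf-8') as file:
        json.dump(records, file, indent=2, ensure_ascii=False)
    # Read the file back to check that nothing was lost on the way.
    with open(path, encoding='utf-8') as file:
        reloaded = json.load(file)

round_trip_ok = reloaded == export.to_dict(orient='records')
print(round_trip_ok)
```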
```python
# export TSV (tab-separated values)
issns_export.to_csv('sample/issn.tsv', sep='\t', encoding='utf-8', index=False)
```
```python
# export excel
issns_export.to_excel('sample/issn.xlsx', index=False)
```
