05_oacct_issns.md
Created: Tue, Oct 8, 13:48


	# Open Access Compliance Check Tool (OACCT) project

	P5 project of the EPFL Library, in collaboration with the libraries of the Universities of Geneva, Lausanne and Bern: https://www.swissuniversities.ch/themen/digitalisierung/p-5-wissenschaftliche-information/projekte/swiss-mooc-service-1-1-1-1

	This notebook extracts the selected data from the sources retrieved through APIs and processes it so that it can be used in the OACCT application.

	Author: Pablo Iriarte, University of Geneva (pablo.iriarte@unige.ch)
	Last updated: 16.07.2021

	## ISSN table


	```python
	import pandas as pd
	import csv
	import json
	import numpy as np
	import os
	```


	```python
	# load the ISSN to ISSN-L mapping
	issns = pd.read_csv('issn/20171102.ISSN-to-ISSN-L.txt', encoding='utf-8', header=0, sep='\t')
	issns
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>ISSN</th>
	<th>ISSN-L</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>0000-0019</td>
	<td>0000-0019</td>
	</tr>
	<tr>
	<td>1</td>
	<td>0000-0027</td>
	<td>0000-0027</td>
	</tr>
	<tr>
	<td>2</td>
	<td>0000-0043</td>
	<td>0000-0043</td>
	</tr>
	<tr>
	<td>3</td>
	<td>0000-0051</td>
	<td>0000-0051</td>
	</tr>
	<tr>
	<td>4</td>
	<td>0000-006X</td>
	<td>0000-006X</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1995913</td>
	<td>8756-9957</td>
	<td>8756-9957</td>
	</tr>
	<tr>
	<td>1995914</td>
	<td>8756-9965</td>
	<td>8756-9965</td>
	</tr>
	<tr>
	<td>1995915</td>
	<td>8756-9973</td>
	<td>8756-9973</td>
	</tr>
	<tr>
	<td>1995916</td>
	<td>8756-9981</td>
	<td>8756-9981</td>
	</tr>
	<tr>
	<td>1995917</td>
	<td>8756-999X</td>
	<td>8756-999X</td>
	</tr>
	</tbody>
	</table>
	<p>1995918 rows × 2 columns</p>
	</div>
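Every ISSN in this mapping ends with a check digit (the final `X` in `0000-006X` stands for 10). Before joining tables on these identifiers, one could validate them. The following is a minimal sketch of the standard ISSN modulus-11 check; the function name `issn_is_valid` is hypothetical and not part of the original notebook:

```python
import re

# Expected shape: four digits, a hyphen, three digits, then a digit or X.
ISSN_PATTERN = re.compile(r"^\d{4}-\d{3}[\dX]$")


def issn_is_valid(issn):
    """Return True if `issn` has the NNNN-NNNC shape and a correct check digit."""
    if not isinstance(issn, str) or not ISSN_PATTERN.match(issn):
        return False  # also rejects NaN and unhyphenated values
    digits = issn.replace("-", "")
    # Weights 8, 7, ..., 2 apply to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


for value in ["0000-0019", "0000-006X", "1932-6203", "1932-6204"]:
    print(value, issn_is_valid(value))
```

Applied to the full table, `issns['issn'].map(issn_is_valid)` would flag malformed rows before any merge.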




	```python
	# rename the columns
	issns = issns.rename(columns={'ISSN' : 'issn', 'ISSN-L' : 'issnl'})
	issns
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>0000-0019</td>
	<td>0000-0019</td>
	</tr>
	<tr>
	<td>1</td>
	<td>0000-0027</td>
	<td>0000-0027</td>
	</tr>
	<tr>
	<td>2</td>
	<td>0000-0043</td>
	<td>0000-0043</td>
	</tr>
	<tr>
	<td>3</td>
	<td>0000-0051</td>
	<td>0000-0051</td>
	</tr>
	<tr>
	<td>4</td>
	<td>0000-006X</td>
	<td>0000-006X</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1995913</td>
	<td>8756-9957</td>
	<td>8756-9957</td>
	</tr>
	<tr>
	<td>1995914</td>
	<td>8756-9965</td>
	<td>8756-9965</td>
	</tr>
	<tr>
	<td>1995915</td>
	<td>8756-9973</td>
	<td>8756-9973</td>
	</tr>
	<tr>
	<td>1995916</td>
	<td>8756-9981</td>
	<td>8756-9981</td>
	</tr>
	<tr>
	<td>1995917</td>
	<td>8756-999X</td>
	<td>8756-999X</td>
	</tr>
	</tbody>
	</table>
	<p>1995918 rows × 2 columns</p>
	</div>
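Once renamed, the table works as a dictionary from any ISSN to its linking ISSN (ISSN-L). A minimal sketch of turning it into a lookup Series with `set_index` and resolving ISSNs with `Series.map`. The rows are a small hypothetical sample modelled on the outputs in this notebook (for example `2150-4008 → 0000-0019` and `1432-2064 → 0178-8051`):

```python
import pandas as pd

# Hypothetical rows in the same shape as `issns` after renaming (issn, issnl).
mapping = pd.DataFrame({
    "issn": ["0000-0019", "2150-4008", "0178-8051", "1432-2064"],
    "issnl": ["0000-0019", "0000-0019", "0178-8051", "0178-8051"],
})

# Index by ISSN so that each ISSN points to its ISSN-L.
to_issnl = mapping.set_index("issn")["issnl"]

queries = pd.Series(["2150-4008", "1432-2064", "9999-9999"])
resolved = queries.map(to_issnl)  # ISSNs absent from the mapping become NaN
print(resolved.tolist())
```

`Series.map` with a Series argument requires a unique index, which holds here because each ISSN appears once in the mapping.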




	```python
	# load the journal ids with their ISSN and ISSN-L
	journals = pd.read_csv('sample/journals_brut.tsv', encoding='utf-8', sep='\t', usecols=['id', 'issn', 'issnl'])
	journals
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>id</th>
	<th>issn</th>
	<th>issnl</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>1</td>
	<td>1660-9379</td>
	<td>1660-9379</td>
	</tr>
	<tr>
	<td>1</td>
	<td>2</td>
	<td>0031-9007</td>
	<td>0031-9007</td>
	</tr>
	<tr>
	<td>2</td>
	<td>3</td>
	<td>1932-6203</td>
	<td>1932-6203</td>
	</tr>
	<tr>
	<td>3</td>
	<td>4</td>
	<td>2174-8454</td>
	<td>2174-8454</td>
	</tr>
	<tr>
	<td>4</td>
	<td>5</td>
	<td>1098-0121</td>
	<td>1098-0121</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>906</td>
	<td>997</td>
	<td>0964-1726</td>
	<td>0964-1726</td>
	</tr>
	<tr>
	<td>907</td>
	<td>998</td>
	<td>0022-3468</td>
	<td>0022-3468</td>
	</tr>
	<tr>
	<td>908</td>
	<td>999</td>
	<td>1432-2064</td>
	<td>0178-8051</td>
	</tr>
	<tr>
	<td>909</td>
	<td>1000</td>
	<td>0960-1481</td>
	<td>0960-1481</td>
	</tr>
	<tr>
	<td>910</td>
	<td>1001</td>
	<td>0161-7567</td>
	<td>0161-7567</td>
	</tr>
	</tbody>
	</table>
	<p>911 rows × 3 columns</p>
	</div>




	```python
	# rename the id column to journal
	journals = journals.rename(columns = {'id' : 'journal'})
	journals
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>journal</th>
	<th>issn</th>
	<th>issnl</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>1</td>
	<td>1660-9379</td>
	<td>1660-9379</td>
	</tr>
	<tr>
	<td>1</td>
	<td>2</td>
	<td>0031-9007</td>
	<td>0031-9007</td>
	</tr>
	<tr>
	<td>2</td>
	<td>3</td>
	<td>1932-6203</td>
	<td>1932-6203</td>
	</tr>
	<tr>
	<td>3</td>
	<td>4</td>
	<td>2174-8454</td>
	<td>2174-8454</td>
	</tr>
	<tr>
	<td>4</td>
	<td>5</td>
	<td>1098-0121</td>
	<td>1098-0121</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>906</td>
	<td>997</td>
	<td>0964-1726</td>
	<td>0964-1726</td>
	</tr>
	<tr>
	<td>907</td>
	<td>998</td>
	<td>0022-3468</td>
	<td>0022-3468</td>
	</tr>
	<tr>
	<td>908</td>
	<td>999</td>
	<td>1432-2064</td>
	<td>0178-8051</td>
	</tr>
	<tr>
	<td>909</td>
	<td>1000</td>
	<td>0960-1481</td>
	<td>0960-1481</td>
	</tr>
	<tr>
	<td>910</td>
	<td>1001</td>
	<td>0161-7567</td>
	<td>0161-7567</td>
	</tr>
	</tbody>
	</table>
	<p>911 rows × 3 columns</p>
	</div>




	```python
	# check for journals without an ISSN
	journals.loc[journals['issn'].isna()]
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>journal</th>
	<th>issn</th>
	<th>issnl</th>
	</tr>
	</thead>
	<tbody>
	</tbody>
	</table>
	</div>




	```python
	journals.loc[journals['journal'] == 5]
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>journal</th>
	<th>issn</th>
	<th>issnl</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>4</td>
	<td>5</td>
	<td>1098-0121</td>
	<td>1098-0121</td>
	</tr>
	</tbody>
	</table>
	</div>



	## Format extraction


	```python
	# create the empty DataFrame that will receive the formats
	col_names = ['issn', 'format']
	journals_format = pd.DataFrame(columns=col_names)
	journals_format
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>format</th>
	</tr>
	</thead>
	<tbody>
	</tbody>
	</table>
	</div>




	```python
	# extract the format of each journal ISSN from the ISSN.org JSON-LD records
	for index, row in journals.iterrows():
	    myissn = row['issn']
	    # print a progress counter every 10 rows
	    if index % 10 == 0:
	        print(index)
	    # initialise the value to extract
	    myformat = np.nan
	    # read the JSON record downloaded for this ISSN, if any
	    path = 'issn/data/' + myissn + '.json'
	    if os.path.exists(path):
	        with open(path, 'r', encoding='utf-8') as f:
	            data = json.load(f)
	        for x in data['@graph']:
	            # keep only the node describing this ISSN
	            if x.get('@id') == 'resource/ISSN/' + myissn and 'format' in x:
	                myformats = x['format']
	                # 'format' is either a single value or a list: keep the first value
	                if isinstance(myformats, list):
	                    myformat = myformats[0].replace('vocabularies/medium#', '')
	                else:
	                    myformat = myformats.replace('vocabularies/medium#', '')
	        journals_format.at[index, 'issn'] = myissn
	        journals_format.at[index, 'format'] = myformat
	    else:
	        print(row['issn'] + ' - not found')
	```

	0
	10
	20
	30
	40
	50
	60
	70
	80
	90
	100
	110
	120
	130
	140
	150
	160
	170
	180
	190
	200
	210
	220
	230
	240
	250
	260
	270
	280
	290
	300
	310
	320
	330
	340
	350
	360
	370
	380
	390
	400
	410
	420
	430
	440
	450
	460
	470
	480
	490
	500
	510
	520
	530
	540
	550
	560
	570
	580
	590
	600
	610
	620
	630
	640
	650
	660
	670
	680
	690
	700
	710
	720
	730
	740
	750
	760
	770
	780
	790
	800
	810
	820
	830
	840
	850
	860
	870
	880
	890
	900
	910
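The loop above reads one JSON-LD file per ISSN from `issn/data/` and looks, inside `@graph`, for the node whose `@id` is `resource/ISSN/<issn>`. The same logic can be isolated in a pure function that is easier to test. The following is a minimal sketch: the function name `extract_format` and the sample record are hypothetical, shaped only after the keys the loop reads (`@graph`, `@id`, `format`). Unlike the loop, it returns `None` rather than `np.nan` when nothing is found, and it stops at the first matching node.

```python
PREFIX = "vocabularies/medium#"


def extract_format(data, issn):
    """Return the first medium format declared for `issn`, or None if absent."""
    target = "resource/ISSN/" + issn
    for node in data.get("@graph", []):
        if node.get("@id") != target or "format" not in node:
            continue
        formats = node["format"]
        # `format` may be a single string or a list; keep the first value, as the loop does.
        first = formats[0] if isinstance(formats, list) else formats
        return first.replace(PREFIX, "")
    return None


# Hypothetical record: one node without @id, one node describing the ISSN.
record = {
    "@graph": [
        {"name": "node without @id"},
        {"@id": "resource/ISSN/1432-2064",
         "format": ["vocabularies/medium#Online", "vocabularies/medium#Other"]},
    ]
}
print(extract_format(record, "1432-2064"))
print(extract_format(record, "0178-8051"))
```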



	```python
	journals_format
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>format</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>1660-9379</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>1</td>
	<td>0031-9007</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>2</td>
	<td>1932-6203</td>
	<td>Online</td>
	</tr>
	<tr>
	<td>3</td>
	<td>2174-8454</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>4</td>
	<td>1098-0121</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>906</td>
	<td>0964-1726</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>907</td>
	<td>0022-3468</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>908</td>
	<td>1432-2064</td>
	<td>Online</td>
	</tr>
	<tr>
	<td>909</td>
	<td>0960-1481</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>910</td>
	<td>0161-7567</td>
	<td>Print</td>
	</tr>
	</tbody>
	</table>
	<p>911 rows × 2 columns</p>
	</div>
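Filling `journals_format` cell by cell with `.at[index, ...]` enlarges the DataFrame one row at a time. An alternative is to collect plain dicts in a list and build the DataFrame once. The following is a sketch that assumes the per-ISSN lookup is available as a function or mapping. Here the dict `lookup` stands in for the JSON files and is hypothetical; the two known values come from the output above.

```python
import pandas as pd

# Stand-in for the per-ISSN JSON lookups (hypothetical).
lookup = {"1660-9379": "Print", "1932-6203": "Online"}
journal_issns = ["1660-9379", "1932-6203", "0000-0000"]

rows = []
for issn in journal_issns:
    if issn not in lookup:  # mirrors the "file not found" branch of the loop
        print(issn + " - not found")
        continue
    rows.append({"issn": issn, "format": lookup[issn]})

# Build the DataFrame in one step; `columns` keeps the schema even if `rows` is empty.
journals_format_alt = pd.DataFrame(rows, columns=["issn", "format"])
print(journals_format_alt)
```

The resulting index is 0..n-1 rather than the index of `journals`. That does not matter here, because the table is later merged on `issn`, not on the index.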




	```python
	# check for ISSNs without a format
	journals_format.loc[journals_format['format'].isnull()]
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>format</th>
	</tr>
	</thead>
	<tbody>
	</tbody>
	</table>
	</div>




	```python
	journals_format['format'].value_counts()
	```




	Print 817
	Online 92
	Other 2
	Name: format, dtype: int64
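`value_counts()` drops missing values by default. The counts above show that no format is missing only because they add up to the row count (817 + 92 + 2 = 911). Passing `dropna=False` makes missing values visible directly. Here is a small sketch on made-up values:

```python
import numpy as np
import pandas as pd

formats = pd.Series(["Print", "Print", "Online", np.nan])
print(formats.value_counts())              # NaN is silently left out
print(formats.value_counts(dropna=False))  # NaN gets its own count
```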




	```python
	# drop the journal-level ISSN: the ISSNs are taken from the ISSN-L mapping instead
	del journals['issn']
	```


	```python
	# attach the journal id to every ISSN that shares its ISSN-L
	issns = pd.merge(issns, journals, on='issnl', how='outer')
	issns
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	<th>journal</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>0000-0019</td>
	<td>0000-0019</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1</td>
	<td>2150-4008</td>
	<td>0000-0019</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>2</td>
	<td>0000-0027</td>
	<td>0000-0027</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>3</td>
	<td>0000-0043</td>
	<td>0000-0043</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>4</td>
	<td>0000-0051</td>
	<td>0000-0051</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1995915</td>
	<td>8756-9973</td>
	<td>8756-9973</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1995916</td>
	<td>8756-9981</td>
	<td>8756-9981</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1995917</td>
	<td>8756-999X</td>
	<td>8756-999X</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1995918</td>
	<td>NaN</td>
	<td>2624-8557</td>
	<td>120.0</td>
	</tr>
	<tr>
	<td>1995919</td>
	<td>NaN</td>
	<td>0032-1052</td>
	<td>936.0</td>
	</tr>
	</tbody>
	</table>
	<p>1995920 rows × 3 columns</p>
	</div>
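With `how='outer'`, rows from `journals` whose ISSN-L does not appear in the mapping file (`20171102.ISSN-to-ISSN-L.txt`) end up with `issn` = NaN, as in the last two rows above. Passing `indicator=True` to `pd.merge` adds a `_merge` column (`both`, `left_only`, `right_only`) that makes these cases explicit. Here is a sketch on a tiny made-up sample:

```python
import pandas as pd

# Made-up mapping rows and journals, in the shapes used above.
mapping = pd.DataFrame({
    "issn": ["0000-0019", "2150-4008", "0000-0027"],
    "issnl": ["0000-0019", "0000-0019", "0000-0027"],
})
journals_sample = pd.DataFrame({"issnl": ["0000-0019", "2624-8557"], "journal": [1, 120]})

merged = pd.merge(mapping, journals_sample, on="issnl", how="outer", indicator=True)
print(merged)

# Journals whose ISSN-L is missing from the mapping file:
print(merged.loc[merged["_merge"] == "right_only", ["issnl", "journal"]])
```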




	```python
	# check for rows without an ISSN
	issns.loc[issns['issn'].isna()]
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	<th>journal</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>1995918</td>
	<td>NaN</td>
	<td>2624-8557</td>
	<td>120.0</td>
	</tr>
	<tr>
	<td>1995919</td>
	<td>NaN</td>
	<td>0032-1052</td>
	<td>936.0</td>
	</tr>
	</tbody>
	</table>
	</div>




	```python
	# keep only the rows with a non-null ISSN
	issns = issns.loc[issns['issn'].notna()]
	```


	```python
	# keep only the ISSNs that matched a journal in the merge
	issns2 = issns.loc[issns['journal'].notna()]
	issns2
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	<th>journal</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>334</td>
	<td>0001-2815</td>
	<td>0001-2815</td>
	<td>532.0</td>
	</tr>
	<tr>
	<td>335</td>
	<td>1399-0039</td>
	<td>0001-2815</td>
	<td>532.0</td>
	</tr>
	<tr>
	<td>493</td>
	<td>0001-4842</td>
	<td>0001-4842</td>
	<td>498.0</td>
	</tr>
	<tr>
	<td>494</td>
	<td>1520-4898</td>
	<td>0001-4842</td>
	<td>498.0</td>
	</tr>
	<tr>
	<td>505</td>
	<td>0001-4966</td>
	<td>0001-4966</td>
	<td>789.0</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1921352</td>
	<td>2470-0045</td>
	<td>2470-0045</td>
	<td>533.0</td>
	</tr>
	<tr>
	<td>1921353</td>
	<td>2470-0053</td>
	<td>2470-0045</td>
	<td>533.0</td>
	</tr>
	<tr>
	<td>1925740</td>
	<td>2475-9953</td>
	<td>2475-9953</td>
	<td>608.0</td>
	</tr>
	<tr>
	<td>1951854</td>
	<td>2504-4427</td>
	<td>2504-4427</td>
	<td>994.0</td>
	</tr>
	<tr>
	<td>1951855</td>
	<td>2504-4435</td>
	<td>2504-4427</td>
	<td>994.0</td>
	</tr>
	</tbody>
	</table>
	<p>1760 rows × 3 columns</p>
	</div>




	```python
	# add the format of each ISSN
	issns2 = pd.merge(issns2, journals_format, on='issn', how='outer')
	issns2
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	<th>journal</th>
	<th>format</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>0001-2815</td>
	<td>0001-2815</td>
	<td>532.0</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>1</td>
	<td>1399-0039</td>
	<td>0001-2815</td>
	<td>532.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>2</td>
	<td>0001-4842</td>
	<td>0001-4842</td>
	<td>498.0</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>3</td>
	<td>1520-4898</td>
	<td>0001-4842</td>
	<td>498.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>4</td>
	<td>0001-4966</td>
	<td>0001-4966</td>
	<td>789.0</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1758</td>
	<td>2504-4427</td>
	<td>2504-4427</td>
	<td>994.0</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>1759</td>
	<td>2504-4435</td>
	<td>2504-4427</td>
	<td>994.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1760</td>
	<td>2624-8557</td>
	<td>NaN</td>
	<td>NaN</td>
	<td>Online</td>
	</tr>
	<tr>
	<td>1761</td>
	<td>2469-9926</td>
	<td>NaN</td>
	<td>NaN</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>1762</td>
	<td>1529-4242</td>
	<td>NaN</td>
	<td>NaN</td>
	<td>Online</td>
	</tr>
	</tbody>
	</table>
	<p>1763 rows × 4 columns</p>
	</div>
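The outer merge on `issn` also brings in the three ISSNs from `journals_format` that have no row in `issns2` (the last rows, with `journal` = NaN). The next cell then removes them again. A `how='left'` merge keeps only the rows of `issns2` in the first place. Here is a sketch on made-up rows modelled on the output above:

```python
import pandas as pd

issns_sample = pd.DataFrame({
    "issn": ["0001-2815", "1399-0039"],
    "issnl": ["0001-2815", "0001-2815"],
    "journal": [532.0, 532.0],
})
formats_sample = pd.DataFrame({
    "issn": ["0001-2815", "2624-8557"],
    "format": ["Print", "Online"],
})

# Left merge: every row of issns_sample is kept, unmatched formats are ignored.
left = pd.merge(issns_sample, formats_sample, on="issn", how="left")
print(left)
```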




	```python
	# drop the rows that did not match a journal (ISSNs only present in journals_format)
	issns2 = issns2.loc[issns2['journal'].notna()]
	issns2
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	<th>journal</th>
	<th>format</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>0001-2815</td>
	<td>0001-2815</td>
	<td>532.0</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>1</td>
	<td>1399-0039</td>
	<td>0001-2815</td>
	<td>532.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>2</td>
	<td>0001-4842</td>
	<td>0001-4842</td>
	<td>498.0</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>3</td>
	<td>1520-4898</td>
	<td>0001-4842</td>
	<td>498.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>4</td>
	<td>0001-4966</td>
	<td>0001-4966</td>
	<td>789.0</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1755</td>
	<td>2470-0045</td>
	<td>2470-0045</td>
	<td>533.0</td>
	<td>Other</td>
	</tr>
	<tr>
	<td>1756</td>
	<td>2470-0053</td>
	<td>2470-0045</td>
	<td>533.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1757</td>
	<td>2475-9953</td>
	<td>2475-9953</td>
	<td>608.0</td>
	<td>Online</td>
	</tr>
	<tr>
	<td>1758</td>
	<td>2504-4427</td>
	<td>2504-4427</td>
	<td>994.0</td>
	<td>Print</td>
	</tr>
	<tr>
	<td>1759</td>
	<td>2504-4435</td>
	<td>2504-4427</td>
	<td>994.0</td>
	<td>NaN</td>
	</tr>
	</tbody>
	</table>
	<p>1760 rows × 4 columns</p>
	</div>




	```python
	issns2['format'] = issns2['format'].str.upper()
	issns2['format'] = issns2['format'].str.replace('ONLINE', 'ELECTRONIC')
	# DigitalCarrier is also mapped to ELECTRONIC
	issns2['format'] = issns2['format'].str.replace('DIGITALCARRIER', 'ELECTRONIC')
	issns2
	```

	C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:
	A value is trying to be set on a copy of a slice from a DataFrame.
	Try using .loc[row_indexer,col_indexer] = value instead

	See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
	"""Entry point for launching an IPython kernel.
	C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:2: SettingWithCopyWarning:
	A value is trying to be set on a copy of a slice from a DataFrame.
	Try using .loc[row_indexer,col_indexer] = value instead

	See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

	C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:4: SettingWithCopyWarning:
	A value is trying to be set on a copy of a slice from a DataFrame.
	Try using .loc[row_indexer,col_indexer] = value instead

	See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
	after removing the cwd from sys.path.





	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	<th>journal</th>
	<th>format</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>0001-2815</td>
	<td>0001-2815</td>
	<td>532.0</td>
	<td>PRINT</td>
	</tr>
	<tr>
	<td>1</td>
	<td>1399-0039</td>
	<td>0001-2815</td>
	<td>532.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>2</td>
	<td>0001-4842</td>
	<td>0001-4842</td>
	<td>498.0</td>
	<td>PRINT</td>
	</tr>
	<tr>
	<td>3</td>
	<td>1520-4898</td>
	<td>0001-4842</td>
	<td>498.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>4</td>
	<td>0001-4966</td>
	<td>0001-4966</td>
	<td>789.0</td>
	<td>PRINT</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1755</td>
	<td>2470-0045</td>
	<td>2470-0045</td>
	<td>533.0</td>
	<td>OTHER</td>
	</tr>
	<tr>
	<td>1756</td>
	<td>2470-0053</td>
	<td>2470-0045</td>
	<td>533.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1757</td>
	<td>2475-9953</td>
	<td>2475-9953</td>
	<td>608.0</td>
	<td>ELECTRONIC</td>
	</tr>
	<tr>
	<td>1758</td>
	<td>2504-4427</td>
	<td>2504-4427</td>
	<td>994.0</td>
	<td>PRINT</td>
	</tr>
	<tr>
	<td>1759</td>
	<td>2504-4435</td>
	<td>2504-4427</td>
	<td>994.0</td>
	<td>NaN</td>
	</tr>
	</tbody>
	</table>
	<p>1760 rows × 4 columns</p>
	</div>
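The `SettingWithCopyWarning` messages above appear because `issns2` was produced by boolean filtering (`issns2.loc[issns2['journal'].notna()]`). As a result, pandas flags the later column assignments as possibly writing to a copy of a slice. Taking an explicit `.copy()` right after filtering is the usual way to avoid this. The following sketch uses made-up data and also passes `regex=False`, so the replacement is treated as a plain string:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"journal": [532.0, None], "format": ["Online", "Print"]})

with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn any warning into an exception
    subset = df.loc[df["journal"].notna()].copy()  # explicit copy: safe to modify
    subset["format"] = subset["format"].str.upper().str.replace("ONLINE", "ELECTRONIC", regex=False)

print(subset)
```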




	```python
	issns2['format'].value_counts()
	```




	PRINT 816
	ELECTRONIC 90
	OTHER 2
	Name: format, dtype: int64




	```python
	# check for rows without a format
	issns2.loc[issns2['format'].isnull()]
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	<th>journal</th>
	<th>format</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>1</td>
	<td>1399-0039</td>
	<td>0001-2815</td>
	<td>532.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>3</td>
	<td>1520-4898</td>
	<td>0001-4842</td>
	<td>498.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>5</td>
	<td>1520-8524</td>
	<td>0001-4966</td>
	<td>789.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>6</td>
	<td>1520-9024</td>
	<td>0001-4966</td>
	<td>789.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>8</td>
	<td>0942-0940</td>
	<td>0001-6268</td>
	<td>166.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1750</td>
	<td>2469-9934</td>
	<td>2469-9926</td>
	<td>870.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1752</td>
	<td>2469-9969</td>
	<td>2469-9950</td>
	<td>41.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1754</td>
	<td>2470-0029</td>
	<td>2470-0010</td>
	<td>80.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1756</td>
	<td>2470-0053</td>
	<td>2470-0045</td>
	<td>533.0</td>
	<td>NaN</td>
	</tr>
	<tr>
	<td>1759</td>
	<td>2504-4435</td>
	<td>2504-4427</td>
	<td>994.0</td>
	<td>NaN</td>
	</tr>
	</tbody>
	</table>
	<p>852 rows × 4 columns</p>
	</div>




	```python
	# assign the ISSN type id
	# PRINT = 1
	# ELECTRONIC = 2
	# OTHER = 3
	issns2['issn_type'] = issns2['format']
	issns2['issn_type'] = issns2['issn_type'].str.replace('PRINT', '1')
	issns2['issn_type'] = issns2['issn_type'].str.replace('ELECTRONIC', '2')
	issns2['issn_type'] = issns2['issn_type'].str.replace('OTHER', '3')
	issns2['issn_type'] = issns2['issn_type'].fillna(3)  # ISSNs with no known format default to OTHER
	issns2
	```

	C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:5: SettingWithCopyWarning:
	A value is trying to be set on a copy of a slice from a DataFrame.
	Try using .loc[row_indexer,col_indexer] = value instead

	See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
	"""
	C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:6: SettingWithCopyWarning:
	A value is trying to be set on a copy of a slice from a DataFrame.
	Try using .loc[row_indexer,col_indexer] = value instead

	See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

	C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:7: SettingWithCopyWarning:
	A value is trying to be set on a copy of a slice from a DataFrame.
	Try using .loc[row_indexer,col_indexer] = value instead

	See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
	import sys
	C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:8: SettingWithCopyWarning:
	A value is trying to be set on a copy of a slice from a DataFrame.
	Try using .loc[row_indexer,col_indexer] = value instead

	See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

	C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:9: SettingWithCopyWarning:
	A value is trying to be set on a copy of a slice from a DataFrame.
	Try using .loc[row_indexer,col_indexer] = value instead

	See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
	if __name__ == '__main__':





	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	<th>journal</th>
	<th>format</th>
	<th>issn_type</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>0001-2815</td>
	<td>0001-2815</td>
	<td>532.0</td>
	<td>PRINT</td>
	<td>1</td>
	</tr>
	<tr>
	<td>1</td>
	<td>1399-0039</td>
	<td>0001-2815</td>
	<td>532.0</td>
	<td>NaN</td>
	<td>3</td>
	</tr>
	<tr>
	<td>2</td>
	<td>0001-4842</td>
	<td>0001-4842</td>
	<td>498.0</td>
	<td>PRINT</td>
	<td>1</td>
	</tr>
	<tr>
	<td>3</td>
	<td>1520-4898</td>
	<td>0001-4842</td>
	<td>498.0</td>
	<td>NaN</td>
	<td>3</td>
	</tr>
	<tr>
	<td>4</td>
	<td>0001-4966</td>
	<td>0001-4966</td>
	<td>789.0</td>
	<td>PRINT</td>
	<td>1</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1755</td>
	<td>2470-0045</td>
	<td>2470-0045</td>
	<td>533.0</td>
	<td>OTHER</td>
	<td>3</td>
	</tr>
	<tr>
	<td>1756</td>
	<td>2470-0053</td>
	<td>2470-0045</td>
	<td>533.0</td>
	<td>NaN</td>
	<td>3</td>
	</tr>
	<tr>
	<td>1757</td>
	<td>2475-9953</td>
	<td>2475-9953</td>
	<td>608.0</td>
	<td>ELECTRONIC</td>
	<td>2</td>
	</tr>
	<tr>
	<td>1758</td>
	<td>2504-4427</td>
	<td>2504-4427</td>
	<td>994.0</td>
	<td>PRINT</td>
	<td>1</td>
	</tr>
	<tr>
	<td>1759</td>
	<td>2504-4435</td>
	<td>2504-4427</td>
	<td>994.0</td>
	<td>NaN</td>
	<td>3</td>
	</tr>
	</tbody>
	</table>
	<p>1760 rows × 5 columns</p>
	</div>
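The chain of `str.replace` calls followed by `fillna(3)` leaves `issn_type` as a mix of strings (`'1'`, `'2'`, `'3'`) and the integer `3`. That is why a later cell converts the column with `astype(int)`. A dictionary passed to `Series.map` expresses the same coding (PRINT = 1, ELECTRONIC = 2, OTHER = 3, missing → 3) in one step and yields integers directly. Here is a sketch on made-up values, where the name `ISSN_TYPE_IDS` is hypothetical:

```python
import numpy as np
import pandas as pd

ISSN_TYPE_IDS = {"PRINT": 1, "ELECTRONIC": 2, "OTHER": 3}

formats = pd.Series(["PRINT", np.nan, "ELECTRONIC", "OTHER"])
# Unknown or missing formats fall back to 3 (OTHER), as fillna(3) does in the notebook.
issn_type = formats.map(ISSN_TYPE_IDS).fillna(3).astype(int)
print(issn_type.tolist())
```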




	```python
	# convert journal to int (no NaN left after filtering)
	issns2['journal'] = issns2['journal'].astype(int)
	```

	C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:2: SettingWithCopyWarning:
	A value is trying to be set on a copy of a slice from a DataFrame.
	Try using .loc[row_indexer,col_indexer] = value instead

	See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
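`journal` is stored as a float (532.0) because the outer merges introduced NaN into that column. `astype(int)` only succeeds here because the earlier filter removed every NaN. If missing values had to be kept, pandas' nullable integer dtype `Int64` is one option, as this sketch shows:

```python
import numpy as np
import pandas as pd

journal = pd.Series([532.0, 498.0, np.nan])

try:
    journal.astype(int)
except ValueError as err:  # NaN cannot be stored in a NumPy integer column
    print("astype(int) failed:", err)

journal_nullable = journal.astype("Int64")  # capital I: nullable integer dtype
print(journal_nullable.tolist())
```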




	```python
	# turn the index into a 1-based id
	issns2 = issns2.reset_index()
	issns2['id'] = issns2['index'] + 1
	del issns2['index']
	issns2
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	<th>journal</th>
	<th>format</th>
	<th>issn_type</th>
	<th>id</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>0001-2815</td>
	<td>0001-2815</td>
	<td>532</td>
	<td>PRINT</td>
	<td>1</td>
	<td>1</td>
	</tr>
	<tr>
	<td>1</td>
	<td>1399-0039</td>
	<td>0001-2815</td>
	<td>532</td>
	<td>NaN</td>
	<td>3</td>
	<td>2</td>
	</tr>
	<tr>
	<td>2</td>
	<td>0001-4842</td>
	<td>0001-4842</td>
	<td>498</td>
	<td>PRINT</td>
	<td>1</td>
	<td>3</td>
	</tr>
	<tr>
	<td>3</td>
	<td>1520-4898</td>
	<td>0001-4842</td>
	<td>498</td>
	<td>NaN</td>
	<td>3</td>
	<td>4</td>
	</tr>
	<tr>
	<td>4</td>
	<td>0001-4966</td>
	<td>0001-4966</td>
	<td>789</td>
	<td>PRINT</td>
	<td>1</td>
	<td>5</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1755</td>
	<td>2470-0045</td>
	<td>2470-0045</td>
	<td>533</td>
	<td>OTHER</td>
	<td>3</td>
	<td>1756</td>
	</tr>
	<tr>
	<td>1756</td>
	<td>2470-0053</td>
	<td>2470-0045</td>
	<td>533</td>
	<td>NaN</td>
	<td>3</td>
	<td>1757</td>
	</tr>
	<tr>
	<td>1757</td>
	<td>2475-9953</td>
	<td>2475-9953</td>
	<td>608</td>
	<td>ELECTRONIC</td>
	<td>2</td>
	<td>1758</td>
	</tr>
	<tr>
	<td>1758</td>
	<td>2504-4427</td>
	<td>2504-4427</td>
	<td>994</td>
	<td>PRINT</td>
	<td>1</td>
	<td>1759</td>
	</tr>
	<tr>
	<td>1759</td>
	<td>2504-4435</td>
	<td>2504-4427</td>
	<td>994</td>
	<td>NaN</td>
	<td>3</td>
	<td>1760</td>
	</tr>
	</tbody>
	</table>
	<p>1760 rows × 6 columns</p>
	</div>
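`reset_index()` followed by `index + 1` yields consecutive ids here only because the rows kept after filtering carry the labels 0 to 1759. Deriving the id from the row position does not rely on that. Here is a sketch on a made-up frame whose index has gaps:

```python
import pandas as pd

frame = pd.DataFrame({"issn": ["0001-2815", "1399-0039", "0001-4842"]}, index=[0, 5, 9])

frame = frame.reset_index(drop=True)    # discard old labels instead of keeping an 'index' column
frame["id"] = range(1, len(frame) + 1)  # 1-based, consecutive ids
print(frame)
```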




	```python
	# issn_type mixes strings ('1', '2', '3') and the fill value 3: cast everything to int
	issns2['issn_type'] = issns2['issn_type'].astype(int)
	```


	```python
	# remove duplicate ISSNs (none in this run: the table keeps its 1760 rows)
	issns2 = issns2.drop_duplicates(subset='issn')
	issns2
	```




	<div>
	<style scoped>
	.dataframe tbody tr th:only-of-type {
	vertical-align: middle;
	}

	.dataframe tbody tr th {
	vertical-align: top;
	}

	.dataframe thead th {
	text-align: right;
	}
	</style>
	<table border="1" class="dataframe">
	<thead>
	<tr style="text-align: right;">
	<th></th>
	<th>issn</th>
	<th>issnl</th>
	<th>journal</th>
	<th>format</th>
	<th>issn_type</th>
	<th>id</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>0</td>
	<td>0001-2815</td>
	<td>0001-2815</td>
	<td>532</td>
	<td>PRINT</td>
	<td>1</td>
	<td>1</td>
	</tr>
	<tr>
	<td>1</td>
	<td>1399-0039</td>
	<td>0001-2815</td>
	<td>532</td>
	<td>NaN</td>
	<td>3</td>
	<td>2</td>
	</tr>
	<tr>
	<td>2</td>
	<td>0001-4842</td>
	<td>0001-4842</td>
	<td>498</td>
	<td>PRINT</td>
	<td>1</td>
	<td>3</td>
	</tr>
	<tr>
	<td>3</td>
	<td>1520-4898</td>
	<td>0001-4842</td>
	<td>498</td>
	<td>NaN</td>
	<td>3</td>
	<td>4</td>
	</tr>
	<tr>
	<td>4</td>
	<td>0001-4966</td>
	<td>0001-4966</td>
	<td>789</td>
	<td>PRINT</td>
	<td>1</td>
	<td>5</td>
	</tr>
	<tr>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	<td>...</td>
	</tr>
	<tr>
	<td>1755</td>
	<td>2470-0045</td>
	<td>2470-0045</td>
	<td>533</td>
	<td>OTHER</td>
	<td>3</td>
	<td>1756</td>
	</tr>
	<tr>
	<td>1756</td>
	<td>2470-0053</td>
	<td>2470-0045</td>
	<td>533</td>
	<td>NaN</td>
	<td>3</td>
	<td>1757</td>
	</tr>
	<tr>
	<td>1757</td>
	<td>2475-9953</td>
	<td>2475-9953</td>
	<td>608</td>
	<td>ELECTRONIC</td>
	<td>2</td>
	<td>1758</td>
	</tr>
	<tr>
	<td>1758</td>
	<td>2504-4427</td>
	<td>2504-4427</td>
	<td>994</td>
	<td>PRINT</td>
	<td>1</td>
	<td>1759</td>
	</tr>
	<tr>
	<td>1759</td>
	<td>2504-4435</td>
	<td>2504-4427</td>
	<td>994</td>
	<td>NaN</td>
	<td>3</td>
	<td>1760</td>
	</tr>
	</tbody>
	</table>
	<p>1760 rows × 6 columns</p>
	</div>
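`drop_duplicates(subset='issn')` removes nothing in this run (1760 rows before and after). If duplicates did occur, dropping them after the ids were assigned would leave gaps in `id`; deduplicating first avoids that. Here is a sketch on made-up rows:

```python
import pandas as pd

rows = pd.DataFrame({
    "issn": ["0001-2815", "0001-2815", "1399-0039"],
    "journal": [532, 532, 532],
})

# Deduplicate first, then number the remaining rows consecutively.
deduped = rows.drop_duplicates(subset="issn").reset_index(drop=True)
deduped["id"] = range(1, len(deduped) + 1)
print(deduped)
```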




	```python
	# export to TSV
	issns2.to_csv('sample/issn_brut.tsv', sep='\t', encoding='utf-8', index=False)
	```


	```python
	# export to Excel
	issns2.to_excel('sample/issn_brut.xlsx', index=False)
	```


	```python
	# export the ID columns to TSV
	issns2[['id', 'issn', 'issnl', 'journal']].to_csv('sample/issn_ids.tsv', sep='\t', encoding='utf-8', index=False)
	```


	```python
	# export the ID columns to Excel
	issns2[['id', 'issn', 'issnl', 'journal']].to_excel('sample/issn_ids.xlsx', index=False)
	```
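Before handing the exported files to the application, a quick round trip can confirm two things: that the TSV reads back unchanged, and that `id` and `issn` are unique. The following sketch uses an in-memory buffer in place of `sample/issn_ids.tsv` and made-up rows in the same column layout. It reads the identifiers back with `dtype=str` as a safeguard, so they are never reinterpreted as numbers.

```python
import io

import pandas as pd

# Made-up rows in the shape of issns2[['id', 'issn', 'issnl', 'journal']].
export = pd.DataFrame({
    "id": [1, 2],
    "issn": ["0001-2815", "1399-0039"],
    "issnl": ["0001-2815", "0001-2815"],
    "journal": [532, 532],
})

buffer = io.StringIO()  # stands in for the exported .tsv file
export.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)

# Read identifiers back as text.
back = pd.read_csv(buffer, sep="\t", dtype={"issn": str, "issnl": str})
assert back["id"].is_unique and back["issn"].is_unique
print(back)
```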
