# Open Access Compliance Check Tool (OACCT) Project
P5 project of the EPFL Library in collaboration with the libraries of the Universities of Geneva, Lausanne and Bern: https://www.swissuniversities.ch/themen/digitalisierung/p-5-wissenschaftliche-information/projekte/swiss-mooc-service-1-1-1-1
This notebook extracts selected data from the sources obtained via API and processes them so that they can be used in the OACCT application.
Author: **Pablo Iriarte**, University of Geneva (pablo.iriarte@unige.ch)
Last updated: 16.07.2021
## Journal data extraction
## Initial corpus
ISSNs of the journals in which the publications archived in AoU (UNIGE) and Infoscience (EPFL) appeared
* AoU ISSN file, exported on 16.10.2020
* Infoscience ISSN file, exported on 28.01.2021
* Data extracted from the ISSN.org JSON
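Because the corpus combines ISSN lists from two repositories, the identifiers usually need to be normalised and deduplicated before they can be matched against ISSN.org. The following is a minimal sketch of that step, not the notebook's own code: the helper names `normalize_issn`, `is_valid_issn` and `merge_issn_lists` and the sample values are hypothetical. Only the ISSN check-digit rule (weights 8 down to 2, modulo 11, `X` standing for 10) is standard.

```python
import re

ISSN_PATTERN = re.compile(r"^\d{4}-\d{3}[\dX]$")


def normalize_issn(raw):
    """Return the ISSN as 'NNNN-NNNC' (upper case, hyphenated), or None if malformed."""
    compact = re.sub(r"[\s-]", "", str(raw)).upper()
    if len(compact) != 8:
        return None
    issn = compact[:4] + "-" + compact[4:]
    return issn if ISSN_PATTERN.match(issn) else None


def is_valid_issn(issn):
    """Verify the check digit of a normalised ISSN (weights 8..2, modulo 11)."""
    digits = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def merge_issn_lists(*sources):
    """Merge several ISSN lists into one sorted list of valid, unique ISSNs."""
    merged = set()
    for source in sources:
        for raw in source:
            issn = normalize_issn(raw)
            if issn is not None and is_valid_issn(issn):
                merged.add(issn)
    return sorted(merged)


# Hypothetical sample values standing in for the AoU and Infoscience exports
aou_issns = ["1074-0708", "0028-0836", "1074-0709"]  # the last one has a wrong check digit
infoscience_issns = ["10740708", " 2434-561x ", "not an issn"]

corpus = merge_issn_lists(aou_issns, infoscience_issns)
print(corpus)  # ['0028-0836', '1074-0708', '2434-561X']
```

Rejecting malformed or mistyped identifiers at this stage keeps them from silently producing empty lookups later on.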
```python
import pandas as pd
import csv
import json
import numpy as np
import os
# parameter for the number of journals in the sample (0 to take all)
oas = oas.append({'id' : 2, 'status' : 'Green', 'description' : 'Paywalled access journal, usually allows the archiving of the submitted or accepted version in institutional repositories (embargo periods may apply)', 'subscription' : 1, 'accepted_manuscript' : 1, 'apc' : 0, 'final_version' : 0}, ignore_index=True)
oas = oas.append({'id' : 3, 'status' : 'hybrid', 'description' : 'Paywalled access journal, offers several Open Access options upon payment of APCs. It often allows the archiving of the published version in institutional repositories (embargo periods may apply)', 'subscription' : 1, 'accepted_manuscript' : 1, 'apc' : 1, 'final_version' : 1}, ignore_index=True)
oas = oas.append({'id' : 5, 'status' : 'Gold', 'description' : 'Open Access journal (payment of APCs may apply). It often allows the archiving of the published version in institutional repositories (embargo periods may apply)', 'subscription' : 0, 'accepted_manuscript' : 1, 'apc' : 1, 'final_version' : 1}, ignore_index=True)
oas = oas.append({'id' : 6, 'status' : 'Diamond', 'description' : 'Open Access journal (without payment of APCs). It often allows the archiving of the published version in institutional repositories (embargo periods may apply)', 'subscription' : 0, 'accepted_manuscript' : 1, 'apc' : 0, 'final_version' : 1}, ignore_index=True)
```
```python
oas
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>id</th>
<th>status</th>
<th>description</th>
<th>subscription</th>
<th>accepted_manuscript</th>
<th>apc</th>
<th>final_version</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>UNKNOWN</td>
<td></td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>Green</td>
<td>Paywalled access journal, usually allows the a...</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>hybrid</td>
<td>Paywalled access journal, offers several Open ...</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>5</td>
<td>Gold</td>
<td>Open Access journal (payment of APCs may apply...</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>6</td>
<td>Diamond</td>
<td>Open Access journal (without payment of APCs)....</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>
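The `oas.append(...)` calls rely on `DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0. Below is a minimal sketch of how the same table could be built on a current pandas release. It assumes the column order shown in the output above, reconstructs the `UNKNOWN` row from that output, and abbreviates the descriptions; the names `columns`, `rows` and `oas_sketch` are illustrative only.

```python
import pandas as pd

# Column order as displayed in the notebook output
columns = ['id', 'status', 'description', 'subscription',
           'accepted_manuscript', 'apc', 'final_version']

# One tuple per Open Access status; descriptions are abbreviated for this sketch
rows = [
    (1, 'UNKNOWN', '', 0, 0, 0, 0),
    (2, 'Green', 'Paywalled access journal, ...', 1, 1, 0, 0),
    (3, 'hybrid', 'Paywalled access journal, ...', 1, 1, 1, 1),
    (5, 'Gold', 'Open Access journal (payment of APCs may apply) ...', 0, 1, 1, 1),
    (6, 'Diamond', 'Open Access journal (without payment of APCs) ...', 0, 1, 0, 1),
]

# Building the frame in a single call avoids row-by-row appends altogether
oas_sketch = pd.DataFrame(rows, columns=columns)

# The 'records' orientation yields one dict per status
records = oas_sketch.to_dict(orient='records')
print(records[1]['status'], records[1]['subscription'])
```

Collecting the rows first and constructing the DataFrame once is also cheaper than repeated appends, since each append copies the whole frame.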
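```python
# export JSON
result = oas.to_json(orient='records', force_ascii=False)
parsed = json.loads(result)
with open('sample/oa.json', 'w', encoding='utf-8') as file:
    # the source cell is cut off here; writing the parsed records is an assumed completion
    json.dump(parsed, file, indent=4, ensure_ascii=False)
```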
    C:\Users\iriarte\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
    of pandas will change to not sort by default.
    To accept the future behavior, pass 'sort=False'.
    To retain the current behavior and silence the warning, pass 'sort=True'.
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>access_type</th>
<th>coverage_depth</th>
<th>coverage_notes</th>
<th>date_first_issue_online</th>
<th>date_last_issue_online</th>
<th>date_monograph_published_online</th>
<th>date_monograph_published_print</th>
<th>embargo_info</th>
<th>first_author</th>
<th>first_editor</th>
<th>...</th>
<th>num_last_vol_online</th>
<th>online_identifier</th>
<th>parent_publication_title_id</th>
<th>preceding_publication_title_id</th>
<th>print_identifier</th>
<th>publication_title</th>
<th>publication_type</th>
<th>publisher_name</th>
<th>title_id</th>
<th>title_url</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>NaN</td>
<td>fulltext</td>
<td>NaN</td>
<td>1969</td>
<td>2015</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>...</td>
<td>47.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>1074-0708</td>
<td>Journal of Agricultural and Applied Economics</td>