diff --git a/SQLFusionEmboitement_v2.ipynb b/SQLFusionEmboitement_v2.ipynb
new file mode 100644
index 0000000..9607c7e
--- /dev/null
+++ b/SQLFusionEmboitement_v2.ipynb
@@ -0,0 +1,298 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "3f467ad7-4bdd-4ce9-86f6-738f90481072",
+ "metadata": {},
+ "source": [
+ "# Exercice SQL – fusion et emboitement de requêtes\n",
+ "\n",
+ "Cette série d'exercices a pour but de vous permettre de **pratiquer l'écriture de requêtes SQL avec les opérateurs ensemblistes**. \n",
+ "Nous allons utiliser :\n",
+ "* la base de données QSpatiaLite concernant les cigognes que nous avons utilisée précédemment\n",
+ "* SQL pour écrire les requêtes sur cette base de données\n",
+ "* le language de programmation Python afin d'interroger cette base de données (exécuter les requêtes)\n",
+ "\n",
+ "## Objectifs\n",
+ "\n",
+ "A la fin de cette série d'exercices vous devriez être en mesure de :\n",
+ "* distinguer les différents opérateurs ensemblistes et leurs effets\n",
+ "* écrire des requêtes réalisant des opérations logiques avec ces opérateurs\n",
+ "* interroger une base de données avec Python\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "301c618a-d9d4-47a5-bdb3-16a29f350a28",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "# 1. Accès à la base de données\n",
+ "Pour manipuler une base de donnée avec Python nous avons besoin des librairies `sqlite3` et `pandas`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1661fb65-c207-4d64-8163-d2e648744eeb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Importing libraries\n",
+ "import sqlite3\n",
+ "import pandas"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "83333b57-f511-4ccf-8c65-9819d4a29d5e",
+ "metadata": {},
+ "source": [
+ "D'abord on établit une connexion à la base de données :"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2ae2db69-8429-4dea-949f-8457ccf0b6e1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Etablir la connexion à la base de données\n",
+ "con = sqlite3.connect(\"stork-mooc.sqlite\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "217ca30f-8d8a-41f4-a4c0-1819d81c6f14",
+ "metadata": {},
+ "source": [
+ "Pour lister toutes les tables disponibles, on peut utiliser la table `sqlite_master` avec la requete suivante :\n",
+ "\n",
+ "```SELECT name FROM sqlite_master WHERE type='table';```\n",
+ "\n",
+ "Pour exécuter cette requête avec Python, on utilise la librairie `pandas` et sa fonction `read_sql_query` qui nous retourne directement une table.\n",
+ "Pour afficher cette table il suffit ensuite de faire un `display`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "949ae7ba-22e4-486d-9343-fd435d53eb8f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Donner la requête à executer\n",
+ "query = \"SELECT name FROM sqlite_master WHERE type='table';\"\n",
+ "\n",
+ "# Executer la requête et récupérer la table résultat\n",
+ "resulttable = pandas.read_sql_query(query, con)\n",
+ "\n",
+ "# Afficher la table résultat\n",
+ "display(resulttable)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9b3f5582-f7e5-4857-92b5-fb8cab5b26e5",
+ "metadata": {},
+ "source": [
+ "De la même manière on peut afficher le détail de la table \"Migration\". \n",
+ "Si on veut limiter l'affichage aux 5 premières lignes, il suffit d'ajouter `.head()` sur la table résultat."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ba3d8cff-de55-4bed-a726-a52d312a333c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Donner la requête à executer\n",
+ "query = \"select * from migration\"\n",
+ "\n",
+ "# Executer la requête et récupérer la table résultat\n",
+ "resulttable = pandas.read_sql_query(query, con)\n",
+ "\n",
+ "# Afficher les 5 premières lignes de la table résultat\n",
+ "display(resulttable.head())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ef5254fa-db75-4232-b380-8db6a581bebb",
+ "metadata": {},
+ "source": [
+ "# 2. Exercices avec SQL\n",
+ "\n",
+ "Pour tous les exercices suivants, **donnez la syntaxe SQL pour les requêtes demandées et testez vos requêtes.** \n",
+ "Afin que vos réponses puissent être corrigées adéquatement, veuillez suivre les quelques règles suivantes :\n",
+ "* Quand plusieurs attributs sont demandés, gardez l’ordre dans lequel ils apparaissent dans la question\n",
+ "* N’utilisez pas d’alias du nom de colonne ou table (sauf lorsque c’est demandé)\n",
+ "* Ne rajoutez pas de clause telle que WHERE, GROUP BY, ORDER BY, LIMIT\n",
+ "\n",
+ "## Exercice 1\n",
+ "\n",
+ "**Question**: Donnez le nom de toutes les cigognes qui ont migré à la fois en 2001 et en 2003 (en utilisant UNION, EXCEPT ou INTERSECT)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "08fc5f61-202a-4e10-a0f6-c06333274b9b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Donner la requête à executer\n",
+ "query = \"select * from migration\" # TODO: MODIFIER LA REQUETE ICI\n",
+ "\n",
+ "# Executer la requête et récupérer la table résultat\n",
+ "resulttable = pandas.read_sql_query(query, con)\n",
+ "\n",
+ "# Afficher la table résultat\n",
+ "display(resulttable)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "39f68f2a-4743-4d08-ae3e-3947318e4c52",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "
\n",
+ "\n",
+ "**Besoin d'un indice ?** Cliquez sur les trois petits points \"...\" ci-dessous.\n",
+ "\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2cbcdf75-f154-4f3c-86b2-dce797949ff9",
+ "metadata": {
+ "jupyter": {
+ "source_hidden": true
+ },
+ "tags": []
+ },
+ "source": [
+ "\n",
+ "\n",
+ "\n",
+ "Le résultat de votre requête doit avoir 4 lignes et 1 colonne. \n",
+ "Afin de déterminer le bon opérateur logique à utiliser, il peut être utile de faire un petit schéma sur le papier. \n",
+ "
\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "74657acc-86c2-4bb9-b194-564b9f5f58aa",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "**Self-checks**\n",
+ "\n",
+ "Vous pouvez executer la cellule suivante qui va vérifier automatiquement quelques caractéristiques du résultat de votre requête. \n",
+ "Ces tests ne sont pas suffisants pour assurer que vous avez la bonne réponse, mais ils vont vous permettre de savoir si vous êtes sur la bonne voie.\n",
+ " \n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "203d28d4-16da-4dea-a0d1-4cfb2d74aa10",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "assert len(resulttable.index) == 4, f'Le résultat de vote requête a {len(resulttable.index)} ligne, alors que la solution a 4 lignes.'\n",
+ "assert len(resulttable.columns) == 1, f'Le résultat de vote requête a {len(resulttable.columns)} colonnes, alors que la solution a 1 colonne (la colonne \"name\").'\n",
+ "print(\"Le résultat de vote requête a le bon nombre de lignes (4) et de colonnes (1), c'est un bon début !\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6ac9efb7-ea81-4663-80e2-ab78ac61b5b6",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "**Validation**\n",
+ "\n",
+ "Executez la cellule suivante afin de savoir si votre requête est correcte ou non. \n",
+ "Lisez attentivement les éventuels messages d'erreur afin de corriger votre requête.\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "839e19f7-bdc8-494d-9fae-b41e3035e4a1",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "from SQLFusionEmboitement import *\n",
+ "question1_validation(query)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dc9e0a45-ab04-44d6-94e5-5a38e02db8b1",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "\n",
+ "\n",
+ "**Solution** - Cliquez sur les trois petits points \"...\" ci-dessous.\n",
+ "\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3248b00b-42fc-4043-849d-ce206ef3d94e",
+ "metadata": {
+ "jupyter": {
+ "source_hidden": true
+ },
+ "tags": []
+ },
+ "source": [
+ "\n",
+ "\n",
+ "Voici la requête correcte :\n",
+ " \n",
+ "`select name from migration where departureyear=2001 INTERSECT select name from migration where departureyear=2003`\n",
+ " \n",
+ "
\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/SQLFusionEmboitement_v2.py b/SQLFusionEmboitement_v2.py
new file mode 100644
index 0000000..38b356e
--- /dev/null
+++ b/SQLFusionEmboitement_v2.py
@@ -0,0 +1,115 @@
+import sqlite3
+import pandas
+import hashlib
+import pickle
+from pandas._testing import assert_frame_equal
+
+"""
+Utility functions
+"""
+
+def run_query(q):
+ con = sqlite3.connect("stork-mooc.sqlite")
+ return pandas.read_sql_query(q, con)
+
+def generate_hash(dataframe):
+ return hashlib.md5(pickle.dumps(dataframe)).hexdigest()
+
+def check_hash(dataframe, dataframehash):
+ return generate_hash(dataframe) == dataframehash
+
+
+"""
+Validation function 1
+=> We compare with the correct answer:
+ select name from migration where departureyear=2001
+ intersect
+ select name from migration where departureyear=2003
+
+"""
+def question1_validation_correct(studentquery):
+
+ try:
+
+ # *** Option 1 ***
+ # Compare with result of the correct query - but gives the correct query to the student
+ correctquery = "select name from migration where departureyear=2001 intersect select name from migration where departureyear=2003"
+ assert_frame_equal(run_query(studentquery), run_query(correctquery))
+
+ # *** Option 2 ***
+ # Compare with result table - but gives the correct result table to the student (who still has to figure out the query)
+ # (can also be quite cumbersome to write when the correct result table becomes large)
+ correctresult = pandas.DataFrame(data=["Annamarie", "Felix", "Jonas", "Prinzesschen"], columns=["name"])
+ assert_frame_equal(run_query(studentquery), correctresult)
+
+ # *** Option 3 ***
+ # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table
+ # (this also means that the error message cannot help students correct their answer)
+ # NB: the hash of the correct result table has to be generated beforehand (using the utility function)
+ # generate_hash(run_query(correctquery)
+ response_hash = 'b27b60409c967bcb8a93dbbc75ba33cd'
+ assert check_hash(run_query(studentquery), response_hash), "Incorrect query."
+
+ except AssertionError as ae:
+ print("Your query didn't pass the validation tests.")
+ # We raise the exception here to stop execution and prevent the success message to be printed
+ raise ae
+
+ print("Your query has passed the validation tests and is evaluated as correct, well done!")
+ return True
+
+
+"""
+Validation function 2
+=> We compare with the possible mistakes made by students
+
+Therefore we check that the result obtained by the student is different from:
+* the union of 2001 and 2003
+* 2001 except 2003
+* 2003 except 2001
+* 2001 only
+* 2003 only
+"""
+def question1_validation(studentquery):
+
+ # Misconception: use of UNION
+ query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003"
+ if run_query(studentquery).equals(run_query(query)):
+ print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.'
+ +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.'
+ +'\nHave you drawn a diagram on paper to determine which logical operator to use?')
+
+ # Misconception: use of EXCEPT
+ query = "select name from migration where departureyear=2001 except select name from migration where departureyear=2003"
+ if run_query(studentquery).equals(run_query(query)):
+ print('You selected the storks who departed in 2001 and REMOVED the storks who also departed in 2003.'
+ +'\nYou should get those who departed BOTH in 2001 AND 2003.'
+ +'\nHave you drawn a diagram on paper to determine which logical operator to use?')
+
+ # Misconception: use of EXCEPT (opposite direction)
+ query = "select name from migration where departureyear=2003 except select name from migration where departureyear=2001"
+ if run_query(studentquery).equals(run_query(query)):
+ print('You selected the storks who departed in 2003 and REMOVED the storks who also departed in 2001.'
+ +'\nYou should get those who departed BOTH in 2001 AND 2003.'
+ +'\nHave you drawn a diagram on paper to determine which logical operator to use?')
+
+ # Misconception: partial query
+ query = "select name from migration where departureyear=2001"
+ if run_query(studentquery).equals(run_query(query)):
+ print('You selected only the storks who departed in 2001.'
+ +'\nThis is a good way to start writing this type of query but you are only halfway to the solution.'
+ + '\nHow could you get the storks who departed BOTH in 2001 AND 2003?'
+ + '\nMaybe draw a diagram to determine which logical operator to use?')
+
+ # Misconception: partial query (other side)
+ query = "select name from migration where departureyear=2003"
+ if run_query(studentquery).equals(run_query(query)):
+ print('You selected only the storks who departed in 2003.'
+ +'\nThis is a good way to start writing this type of query but you are only halfway to the solution.'
+ + '\nHow could you get the storks who departed BOTH in 2001 AND 2003?'
+ + '\nMaybe draw a diagram to determine which logical operator to use?')
+
+
+ # Default case for all other misconceptions
+ # => we fall back on the general validation function
+ assert question1_validation_correct(studentquery)
\ No newline at end of file