diff --git a/SQLFusionEmboitement_v2.ipynb b/SQLFusionEmboitement_v2.ipynb new file mode 100644 index 0000000..9607c7e --- /dev/null +++ b/SQLFusionEmboitement_v2.ipynb @@ -0,0 +1,298 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3f467ad7-4bdd-4ce9-86f6-738f90481072", + "metadata": {}, + "source": [ + "# Exercice SQL – fusion et emboitement de requêtes\n", + "\n", + "Cette série d'exercices a pour but de vous permettre de **pratiquer l'écriture de requêtes SQL avec les opérateurs ensemblistes**. \n", + "Nous allons utiliser :\n", + "* la base de données QSpatiaLite concernant les cigognes que nous avons utilisée précédemment\n", + "* SQL pour écrire les requêtes sur cette base de données\n", + "* le language de programmation Python afin d'interroger cette base de données (exécuter les requêtes)\n", + "\n", + "## Objectifs\n", + "\n", + "A la fin de cette série d'exercices vous devriez être en mesure de :\n", + "* distinguer les différents opérateurs ensemblistes et leurs effets\n", + "* écrire des requêtes réalisant des opérations logiques avec ces opérateurs\n", + "* interroger une base de données avec Python\n" + ] + }, + { + "cell_type": "markdown", + "id": "301c618a-d9d4-47a5-bdb3-16a29f350a28", + "metadata": { + "tags": [] + }, + "source": [ + "# 1. Accès à la base de données\n", + "Pour manipuler une base de donnée avec Python nous avons besoin des librairies `sqlite3` et `pandas`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1661fb65-c207-4d64-8163-d2e648744eeb", + "metadata": {}, + "outputs": [], + "source": [ + "# Importing libraries\n", + "import sqlite3\n", + "import pandas" + ] + }, + { + "cell_type": "markdown", + "id": "83333b57-f511-4ccf-8c65-9819d4a29d5e", + "metadata": {}, + "source": [ + "D'abord on établit une connexion à la base de données :" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2ae2db69-8429-4dea-949f-8457ccf0b6e1", + "metadata": {}, + "outputs": [], + "source": [ + "# Etablir la connexion à la base de données\n", + "con = sqlite3.connect(\"stork-mooc.sqlite\")" + ] + }, + { + "cell_type": "markdown", + "id": "217ca30f-8d8a-41f4-a4c0-1819d81c6f14", + "metadata": {}, + "source": [ + "Pour lister toutes les tables disponibles, on peut utiliser la table `sqlite_master` avec la requete suivante :\n", + "\n", + "```SELECT name FROM sqlite_master WHERE type='table';```\n", + "\n", + "Pour exécuter cette requête avec Python, on utilise la librairie `pandas` et sa fonction `read_sql_query` qui nous retourne directement une table.\n", + "Pour afficher cette table il suffit ensuite de faire un `display`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "949ae7ba-22e4-486d-9343-fd435d53eb8f", + "metadata": {}, + "outputs": [], + "source": [ + "# Donner la requête à executer\n", + "query = \"SELECT name FROM sqlite_master WHERE type='table';\"\n", + "\n", + "# Executer la requête et récupérer la table résultat\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Afficher la table résultat\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "9b3f5582-f7e5-4857-92b5-fb8cab5b26e5", + "metadata": {}, + "source": [ + "De la même manière on peut afficher le détail de la table \"Migration\". \n", + "Si on veut limiter l'affichage aux 5 premières lignes, il suffit d'ajouter `.head()` sur la table résultat." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba3d8cff-de55-4bed-a726-a52d312a333c", + "metadata": {}, + "outputs": [], + "source": [ + "# Donner la requête à executer\n", + "query = \"select * from migration\"\n", + "\n", + "# Executer la requête et récupérer la table résultat\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Afficher les 5 premières lignes de la table résultat\n", + "display(resulttable.head())" + ] + }, + { + "cell_type": "markdown", + "id": "ef5254fa-db75-4232-b380-8db6a581bebb", + "metadata": {}, + "source": [ + "# 2. Exercices avec SQL\n", + "\n", + "Pour tous les exercices suivants, **donnez la syntaxe SQL pour les requêtes demandées et testez vos requêtes.** \n", + "Afin que vos réponses puissent être corrigées adéquatement, veuillez suivre les quelques règles suivantes :\n", + "* Quand plusieurs attributs sont demandés, gardez l’ordre dans lequel ils apparaissent dans la question\n", + "* N’utilisez pas d’alias du nom de colonne ou table (sauf lorsque c’est demandé)\n", + "* Ne rajoutez pas de clause telle que WHERE, GROUP BY, ORDER BY, LIMIT\n", + "\n", + "## Exercice 1\n", + "\n", + "**Question**: Donnez le nom de toutes les cigognes qui ont migré à la fois en 2001 et en 2003 (en utilisant UNION, EXCEPT ou INTERSECT)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08fc5f61-202a-4e10-a0f6-c06333274b9b", + "metadata": {}, + "outputs": [], + "source": [ + "# Donner la requête à executer\n", + "query = \"select * from migration\" # TODO: MODIFIER LA REQUETE ICI\n", + "\n", + "# Executer la requête et récupérer la table résultat\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Afficher la table résultat\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "39f68f2a-4743-4d08-ae3e-3947318e4c52", + "metadata": { + "tags": [] + }, + "source": [ + "
\n", + "\n", + "**Besoin d'un indice ?** Cliquez sur les trois petits points \"...\" ci-dessous.\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "2cbcdf75-f154-4f3c-86b2-dce797949ff9", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "\n", + "Le résultat de votre requête doit avoir 4 lignes et 1 colonne. \n", + "Afin de déterminer le bon opérateur logique à utiliser, il peut être utile de faire un petit schéma sur le papier. \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "74657acc-86c2-4bb9-b194-564b9f5f58aa", + "metadata": {}, + "source": [ + "
\n", + "\n", + "**Self-checks**\n", + "\n", + "Vous pouvez executer la cellule suivante qui va vérifier automatiquement quelques caractéristiques du résultat de votre requête. \n", + "Ces tests ne sont pas suffisants pour assurer que vous avez la bonne réponse, mais ils vont vous permettre de savoir si vous êtes sur la bonne voie.\n", + " \n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "203d28d4-16da-4dea-a0d1-4cfb2d74aa10", + "metadata": {}, + "outputs": [], + "source": [ + "assert len(resulttable.index) == 4, f'Le résultat de vote requête a {len(resulttable.index)} ligne, alors que la solution a 4 lignes.'\n", + "assert len(resulttable.columns) == 1, f'Le résultat de vote requête a {len(resulttable.columns)} colonnes, alors que la solution a 1 colonne (la colonne \"name\").'\n", + "print(\"Le résultat de vote requête a le bon nombre de lignes (4) et de colonnes (1), c'est un bon début !\")" + ] + }, + { + "cell_type": "markdown", + "id": "6ac9efb7-ea81-4663-80e2-ab78ac61b5b6", + "metadata": {}, + "source": [ + "
\n", + "\n", + "**Validation**\n", + "\n", + "Executez la cellule suivante afin de savoir si votre requête est correcte ou non. \n", + "Lisez attentivement les éventuels messages d'erreur afin de corriger votre requête.\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "839e19f7-bdc8-494d-9fae-b41e3035e4a1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQLFusionEmboitement import *\n", + "question1_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "dc9e0a45-ab04-44d6-94e5-5a38e02db8b1", + "metadata": { + "tags": [] + }, + "source": [ + "
\n", + "\n", + "**Solution** - Cliquez sur les trois petits points \"...\" ci-dessous.\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "3248b00b-42fc-4043-849d-ce206ef3d94e", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Voici la requête correcte :\n", + " \n", + "`select name from migration where departureyear=2001 INTERSECT select name from migration where departureyear=2003`\n", + " \n", + "
\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/SQLFusionEmboitement_v2.py b/SQLFusionEmboitement_v2.py new file mode 100644 index 0000000..38b356e --- /dev/null +++ b/SQLFusionEmboitement_v2.py @@ -0,0 +1,115 @@ +import sqlite3 +import pandas +import hashlib +import pickle +from pandas._testing import assert_frame_equal + +""" +Utility functions +""" + +def run_query(q): + con = sqlite3.connect("stork-mooc.sqlite") + return pandas.read_sql_query(q, con) + +def generate_hash(dataframe): + return hashlib.md5(pickle.dumps(dataframe)).hexdigest() + +def check_hash(dataframe, dataframehash): + return generate_hash(dataframe) == dataframehash + + +""" +Validation function 1 +=> We compare with the correct answer: + select name from migration where departureyear=2001 + intersect + select name from migration where departureyear=2003 + +""" +def question1_validation_correct(studentquery): + + try: + + # *** Option 1 *** + # Compare with result of the correct query - but gives the correct query to the student + correctquery = "select name from migration where departureyear=2001 intersect select name from migration where departureyear=2003" + assert_frame_equal(run_query(studentquery), run_query(correctquery)) + + # *** Option 2 *** + # Compare with result table - but gives the correct result table to the student (who still has to figure out the query) + # (can also be quite cumbersome to write when the correct result table becomes large) + correctresult = pandas.DataFrame(data=["Annamarie", "Felix", "Jonas", "Prinzesschen"], columns=["name"]) + assert_frame_equal(run_query(studentquery), correctresult) + + # *** Option 3 *** + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery) + response_hash = 'b27b60409c967bcb8a93dbbc75ba33cd' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + return True + + +""" +Validation function 2 +=> We compare with the possible mistakes made by students + +Therefore we check that the result obtained by the student is different from: +* the union of 2001 and 2003 +* 2001 except 2003 +* 2003 except 2001 +* 2001 only +* 2003 only +""" +def question1_validation(studentquery): + + # Misconception: use of UNION + query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + if run_query(studentquery).equals(run_query(query)): + print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + # Misconception: use of EXCEPT + query = "select name from migration where departureyear=2001 except select name from migration where departureyear=2003" + if run_query(studentquery).equals(run_query(query)): + print('You selected the storks who departed in 2001 and REMOVED the storks who also departed in 2003.' + +'\nYou should get those who departed BOTH in 2001 AND 2003.' + +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + # Misconception: use of EXCEPT (opposite direction) + query = "select name from migration where departureyear=2003 except select name from migration where departureyear=2001" + if run_query(studentquery).equals(run_query(query)): + print('You selected the storks who departed in 2003 and REMOVED the storks who also departed in 2001.' + +'\nYou should get those who departed BOTH in 2001 AND 2003.' + +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + # Misconception: partial query + query = "select name from migration where departureyear=2001" + if run_query(studentquery).equals(run_query(query)): + print('You selected only the storks who departed in 2001.' + +'\nThis is a good way to start writing this type of query but you are only halfway to the solution.' + + '\nHow could you get the storks who departed BOTH in 2001 AND 2003?' + + '\nMaybe draw a diagram to determine which logical operator to use?') + + # Misconception: partial query (other side) + query = "select name from migration where departureyear=2003" + if run_query(studentquery).equals(run_query(query)): + print('You selected only the storks who departed in 2003.' + +'\nThis is a good way to start writing this type of query but you are only halfway to the solution.' + + '\nHow could you get the storks who departed BOTH in 2001 AND 2003?' + + '\nMaybe draw a diagram to determine which logical operator to use?') + + + # Default case for all other misconceptions + # => we fall back on the general validation function + assert question1_validation_correct(studentquery) \ No newline at end of file