diff --git a/1-SQL_basics/SQL_introduction.ipynb b/1-SQL_basics/SQL_introduction.ipynb new file mode 100644 index 0000000..f66e824 --- /dev/null +++ b/1-SQL_basics/SQL_introduction.ipynb @@ -0,0 +1,631 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3f467ad7-4bdd-4ce9-86f6-738f90481072", + "metadata": { + "tags": [] + }, + "source": [ + "# SQL exercise – Introduction\n", + "\n", + "The questions in this exercise are about the syntax of SQL queries.\n", + "\n", + "We will use:\n", + "* the QSpatiaLite stork database that we used earlier;\n", + "* SQL to write queries on this database;\n", + "* the programming language Python to query this database (execute the queries).\n", + "\n", + "
\n", + "
\n", + " How to use this notebook?
\n", + " This notebook consists of text cells and code cells (Python). The code cells must be executed in order to see the output of the program.
To execute a cell, select it and then simply click on the \"play\" button () in the toolbar at the top of the notebook, or type shift + enter.
It is important to run the code cells in the same order as they appear in the notebook.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "a08280d1-df23-4e8a-ae6f-64b55897c079", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "4c50fb80-2bb2-4b29-8ead-05766a824b17", + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, + "source": [ + "# Accessing the database with Python\n", + "\n", + "In this first part, we show you with an example how to write and execute your queries on the stork database with Python." + ] + }, + { + "cell_type": "markdown", + "id": "e3d8815d-b67b-4d99-90bb-455a66a8cbfa", + "metadata": { + "tags": [] + }, + "source": [ + "## Importing Python libraries\n", + "To manipulate a database with Python we need the `sqlite3` and `pandas` libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1661fb65-c207-4d64-8163-d2e648744eeb", + "metadata": {}, + "outputs": [], + "source": [ + "# Importing libraries\n", + "import sqlite3\n", + "import pandas" + ] + }, + { + "cell_type": "markdown", + "id": "83333b57-f511-4ccf-8c65-9819d4a29d5e", + "metadata": {}, + "source": [ + "## Connection to the database \n", + "First, a connection to the database is established:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2ae2db69-8429-4dea-949f-8457ccf0b6e1", + "metadata": {}, + "outputs": [], + "source": [ + "# Establish the connection to the database:\n", + "con = sqlite3.connect(\"stork-mooc.sqlite\")" + ] + }, + { + "cell_type": "markdown", + "id": "8006f1ac-8e69-48ac-84bb-879ae9ad2ffd", + "metadata": {}, + "source": [ + "## Executing SQL queries\n", + "\n", + "To execute a SQL query on our database with Python, we use the `read_sql_query` function from the `pandas` library which :\n", + "* executes the query\n", + "* returns the result table\n", + "To display this table, you just have to do a `display`.\n", + "\n", + "Here is an example, which executes the query `select * from migration` and displays the result:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba3d8cff-de55-4bed-a726-a52d312a333c", + "metadata": {}, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\"\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "810b6945-370b-4f7a-993a-1bb6ed83138e", + "metadata": {}, + "source": [ + "
\n", + "\n", + "**Question** : modify the above query to retrieve the contents of the `stork` table." + ] + }, + { + "cell_type": "markdown", + "id": "ecb47a7a-a6dd-44b4-9ce1-28e8d491b7e2", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "df308bbe-b320-4ce9-8724-4b6b27908bc6", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select * from stork`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce2931e6-8935-407c-9f03-2dbdbce7f33c", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "507ced8b-8ac0-4361-91f8-86d57a4b0659", + "metadata": {}, + "source": [ + "# Exercises with SQL\n", + "\n", + "For each of the following questions, **give the SQL syntax for the requested queries** and **test your queries.** \n", + "\n", + "As a reminder, here is the UML diagram of our stork database. \n", + "\n", + "\"UML" + ] + }, + { + "cell_type": "markdown", + "id": "cdef6666-e9c7-4614-8d89-95ea6629d98d", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 1\n", + "**Question** : Give the name and year of birth of all the storks.\n", + "\n", + "Here is how to test your query: \n", + "1. Edit the query in the cell below and then execute the cell to see the result. \n", + "You can edit and re-execute the cell as many times as necessary. \n", + "2. Once you are satisfied with your query, you can execute the validation cell below to check if your query is correct or not. \n", + "3. Finally, you can compare your query with the solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08fc5f61-202a-4e10-a0f6-c06333274b9b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "6ac9efb7-ea81-4663-80e2-ab78ac61b5b6", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "839e19f7-bdc8-494d-9fae-b41e3035e4a1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_introduction import *\n", + "question1_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "dc9e0a45-ab04-44d6-94e5-5a38e02db8b1", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "3248b00b-42fc-4043-849d-ce206ef3d94e", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select Stork.name, Stork.yearofbirth from Stork`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "5ded8691-5cc7-4ffd-8e7a-8b9e68ec9b28", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 2\n", + "\n", + "**Question** : Rename the \"name\" column to \"nom\" and the \"yearofbirth\" column to \"annee_naissance\", then give the name and year of birth for all storks in the database.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0eef1a44-4781-4011-8dc2-cea0c48b4ccb", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "e48b0894-c785-4b8e-a87e-e90baac5bdd7", + "metadata": { + "tags": [] + }, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b60cc09c-cbe5-4055-bdbf-dff6ef205faa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_introduction import *\n", + "question2_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "8764c3f4-57c1-4690-a841-ef558b20e36f", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "74758991-7c8a-49be-8c23-d99f3505b03b", + "metadata": { + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select Stork.name as nom, Stork.yearofbirth as annee_naissance from Stork`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "baf8ea9d-52c7-4f09-b2da-f7a2350edb84", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 3\n", + "\n", + "**Question** : List all of the years in which a stork is born so that each year only appears once." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6d53f99a-d108-446c-911a-23d6b7e26687", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "94ffa2a0-ac1b-4ba5-b7b1-6a59c22aa6f7", + "metadata": { + "tags": [] + }, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "edd89993-4524-4260-ae45-c03dcef23817", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_introduction import *\n", + "question3_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "9dbd7018-fe8c-47ef-b32f-e8ebce6fb84b", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e4b270be-9464-4b53-8dc9-ae0b6d45f30f", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select distinct Stork.yearofbirth from Stork`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "2181501d-9609-45db-add3-11fef5627712", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 4\n", + "\n", + "**Question** : Select all of the attributes in the Member table (use the symbol shortcut)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b0301e33-84c2-4519-ba42-315e274b4771", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "a6d3cbb4-ba05-4f42-a5cf-57c034a7031f", + "metadata": { + "tags": [] + }, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f6fa35d-2308-4d9f-91ad-3b849f122479", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_introduction import *\n", + "question4_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "b5acd255-bf2f-4eb4-b3ac-e0f1d3b9c3e7", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "4b0e5608-1b34-47ca-9e4e-eac6f9fb944b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query :\n", + " \n", + "`Select * from member`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "43863949-1e0d-4dee-9f08-f560700677a5", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 5\n", + "\n", + "**Question** : Select the names of all members as well as the names of all storks (do not join the tables)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "934f9bfa-dd22-456f-9137-7eed970324a4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "296dde86-dd6f-4d2f-8c17-d8ab577d9455", + "metadata": { + "tags": [] + }, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4aaf1326-f94a-4527-b334-266e5989238f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_introduction import *\n", + "question5_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "0ef2b7d1-2221-4833-b7f6-2741adb7e24c", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e3030f35-7b38-4be0-852b-b1dff7635117", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`Select member.name, stork.name from member, stork`\n", + " \n", + "
\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/1-SQL_basics/SQL_introduction.py b/1-SQL_basics/SQL_introduction.py new file mode 100644 index 0000000..40ac2d6 --- /dev/null +++ b/1-SQL_basics/SQL_introduction.py @@ -0,0 +1,199 @@ +import sqlite3 +import pandas +import hashlib +import pickle +from pandas._testing import assert_frame_equal + +""" +Utility functions +""" +def run_query(q): + con = sqlite3.connect("stork-mooc.sqlite") + return pandas.read_sql_query(q, con) + +def generate_hash(dataframe): + return hashlib.md5(pickle.dumps(dataframe)).hexdigest() + +def check_hash(dataframe, dataframehash): + return generate_hash(dataframe) == dataframehash + + +""" +Validation function for Question 1 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question1_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '42d655bac29b2d16b27e41a6c10ad713' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 2 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question2_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '76f2c51487cb966bd3c554c1a96957cb' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 3 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question3_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = 'd54eca32b2124f5a6583bf8048aa48e1' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 4 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question4_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '63f4af4d374da63e8f5293c7ae9027c0' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + """ +Validation function for Question 5 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question5_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '126e62188b4a0e56dc933a66637d6310' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + \ No newline at end of file diff --git a/1-SQL_basics/stork-mooc.sqlite b/1-SQL_basics/stork-mooc.sqlite new file mode 100644 index 0000000..74740eb Binary files /dev/null and b/1-SQL_basics/stork-mooc.sqlite differ diff --git a/1-SQL_basics/uml_diagram_stork_database.png b/1-SQL_basics/uml_diagram_stork_database.png new file mode 100644 index 0000000..6c2a969 Binary files /dev/null and b/1-SQL_basics/uml_diagram_stork_database.png differ diff --git a/2-WHERE_clause/SQL_conditions.ipynb b/2-WHERE_clause/SQL_conditions.ipynb new file mode 100644 index 0000000..e4e1462 --- /dev/null +++ b/2-WHERE_clause/SQL_conditions.ipynb @@ -0,0 +1,883 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "88f7253c-9f21-4600-916b-0cf95eb2bc9a", + "metadata": { + "tags": [] + }, + "source": [ + "# SQL exercise – Conditional queries\n", + "\n", + "The questions in this exercise are about the syntax of SQL queries.\n", + "\n", + "We will use:\n", + "* the QSpatiaLite stork database that we used earlier;\n", + "* SQL to write queries on this database;\n", + "* the programming language Python to query this database (execute the queries).\n", + "\n", + "
\n", + "
\n", + " How to use this notebook?
\n", + " This notebook consists of text cells and code cells (Python). The code cells must be executed in order to see the output of the program.
To execute a cell, select it and then simply click on the \"play\" button () in the toolbar at the top of the notebook, or type shift + enter.
It is important to run the code cells in the same order as they appear in the notebook.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "329266d4-d6a6-4288-afa5-f09f1b49b3b6", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "f729f7c4-8f0d-4257-b630-f2196d029404", + "metadata": { + "tags": [] + }, + "source": [ + "# Accessing the database with Python\n", + "\n", + "In this first part, we show you with an example how to write and execute your queries on the stork database with Python." + ] + }, + { + "cell_type": "markdown", + "id": "2618b301-9b25-424a-85be-b5926f4c1e59", + "metadata": { + "tags": [] + }, + "source": [ + "## Importing Python libraries\n", + "To manipulate a database with Python we need the `sqlite3` and `pandas` libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "471250f8-7def-4004-8b0f-87d62cd225f7", + "metadata": {}, + "outputs": [], + "source": [ + "# Importing libraries\n", + "import sqlite3\n", + "import pandas" + ] + }, + { + "cell_type": "markdown", + "id": "cd0f126e-7ee4-4679-af77-c6d7c1500731", + "metadata": {}, + "source": [ + "## Connection to the database \n", + "First, a connection to the database is established:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5fa025fa-c7ef-416c-8949-09c4c99166a7", + "metadata": {}, + "outputs": [], + "source": [ + "# Establish the connection to the database:\n", + "con = sqlite3.connect(\"stork-mooc.sqlite\")" + ] + }, + { + "cell_type": "markdown", + "id": "256ac196-c886-4fac-a85d-312a5e8cea7e", + "metadata": {}, + "source": [ + "## Executing SQL queries\n", + "\n", + "To execute a SQL query on our database with Python, we use the `read_sql_query` function from the `pandas` library which :\n", + "* executes the query\n", + "* returns the result table\n", + "To display this table, you just have to do a `display`.\n", + "\n", + "Here is an example, which executes the query `select * from migration` and displays the result:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3712f3c8-027f-4492-bffc-7d1d70eb8400", + "metadata": {}, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\"\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "a8110b94-caae-442e-9f72-6aca5fe084c7", + "metadata": {}, + "source": [ + "
\n", + "\n", + "**Question** : modify the above query to retrieve the contents of the `stork` table." + ] + }, + { + "cell_type": "markdown", + "id": "92a6c39d-b5a6-4510-8043-e6cea497bcae", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "b91caccd-0777-4dca-8dba-a73e9a11f807", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select * from stork`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "5403e1d0-7cf3-4362-8d9c-6947dbaf3f45", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "277ba881-3bc5-4f0c-87e3-95e9ad56e7db", + "metadata": {}, + "source": [ + "# Exercises with SQL\n", + "\n", + "For each of the following questions, **give the SQL syntax for the requested queries** and **test your queries.** \n", + "\n", + "As a reminder, here is the UML diagram of our stork database. \n", + "\n", + "\"UML" + ] + }, + { + "cell_type": "markdown", + "id": "4aa2d382-244c-456d-a829-ef66f5b31d29", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 1\n", + "**Question** : Give the names of all storks who were born in 2001.\n", + "\n", + "Here is how to test your query: \n", + "1. Edit the query in the cell below and then execute the cell to see the result. \n", + "You can edit and re-execute the cell as many times as necessary. \n", + "2. Once you are satisfied with your query, you can execute the validation cell below to check if your query is correct or not. \n", + "3. Finally, you can compare your query with the solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4ce7da44-0f13-431e-8e4a-0135783dec0c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "72a6b65e-0da5-4ed0-a5bd-54d3f9a9515d", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "839e19f7-bdc8-494d-9fae-b41e3035e4a1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_conditions import *\n", + "question1_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "32614c72-109d-4e7c-9b6c-0d4ea6e9edfd", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "3248b00b-42fc-4043-849d-ce206ef3d94e", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`Select stork.name from stork where Stork.yearofbirth =2001`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "5ded8691-5cc7-4ffd-8e7a-8b9e68ec9b28", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 2\n", + "\n", + "**Question** : Give the names of all storks who were born between 1999 and 2001.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a628b1f-3039-4e1a-b6b2-f7b1fe8f608b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "ce865e3f-7d86-448e-a4b4-d48ec87cc90d", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b60cc09c-cbe5-4055-bdbf-dff6ef205faa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_conditions import *\n", + "question2_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "9836d9ff-5de5-4c03-ade5-499dd095dc45", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "74758991-7c8a-49be-8c23-d99f3505b03b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`Select stork.name from stork where Stork.yearofbirth between 1999 and 2001`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "baf8ea9d-52c7-4f09-b2da-f7a2350edb84", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 3\n", + "\n", + "**Question** : Give the names of all storks whose names finish with an \"a\"." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a692c23c-2930-4382-8fcf-e9ddf20598b4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "4ea4d75e-15f8-41c5-82a0-bae894f20cee", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "edd89993-4524-4260-ae45-c03dcef23817", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_conditions import *\n", + "question3_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "8fe88b53-9c96-4d53-b3f0-b364d048255b", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e4b270be-9464-4b53-8dc9-ae0b6d45f30f", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`Select stork.name from stork where Stork.name like '%a'`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "2181501d-9609-45db-add3-11fef5627712", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 4\n", + "\n", + "**Question** : Give the names of all storks whose names contain either a \"p\" or an \"s\"." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21138ba1-2cec-447a-a76a-88806e636c22", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "0c092ac8-da93-434e-b3d3-9229f6a6f8bd", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f6fa35d-2308-4d9f-91ad-3b849f122479", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_conditions import *\n", + "question4_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "dd4c5fb5-69a0-4395-a9ad-c4fa7e5dc483", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "4b0e5608-1b34-47ca-9e4e-eac6f9fb944b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`Select stork.name from stork where Stork.name like '%p%' or Stork.name like '%s%'`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "43863949-1e0d-4dee-9f08-f560700677a5", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 5\n", + "\n", + "**Question** : Give the names of all storks whose names contain either an \"a\" or an \"i\", but do not contain an \"e\"." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f288bc46-692b-46b6-a424-19d90e3476bd", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "90ccae8d-2b3b-4e44-b443-6cc556df1179", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4aaf1326-f94a-4527-b334-266e5989238f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_conditions import *\n", + "question5_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "3a38f8f3-c3a7-4c17-9ee2-5bd61df4f3b6", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e3030f35-7b38-4be0-852b-b1dff7635117", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`Select stork.name from stork where (Stork.name like '%a%' or Stork.name like '%i%') and Stork.name not like '%e%'`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "dbe64f35-bfbc-46df-8ee3-439e94c17738", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 6\n", + "\n", + "**Question** : List the storks who have a sponsor along with the name of their sponsor (use only the WHERE clause, do not use JOIN yet!)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f510aa9b-259d-44a8-b7dc-1a3b0513f4d0", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "ae1bd125-f379-4720-9e72-e50e959da3cc", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "524ed0f6-8363-4491-9d12-929eb3bcf9e4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_conditions import *\n", + "question6_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "86da7b5c-9810-483e-962c-0b4cea076e8d", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "00228895-ea26-4dab-949a-409057b9a8e2", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`Select MemberStork.name , Member.name from MemberStork, Member where MemberStork.memberid = Member.id`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "3ec79f57-622d-4d8c-96cc-99d99b198656", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 7\n", + "\n", + "**Question** : List all of the storks and the name of their sponsor (this time, use the JOIN clause). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de57ea19-6d96-483e-b154-589c08e94ebf", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "f5b758ef-eab2-4f5b-bebf-5f6deca42053", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "92ed5890-138a-43c8-9a9e-d13d10de731c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_conditions import *\n", + "question7_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "d6f8edcd-befc-4bb5-b91e-7f98901a2fc5", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "8c21b02a-1318-4d62-9662-ce51158a0edf", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select MemberStork.name, Member.name from MemberStork JOIN Member ON MemberStork.memberid = Member.id`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "af3a4574-75c2-468d-9981-4f55321fc320", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 8\n", + "\n", + "**Question** : Provide the query that returns the name of the stork that does not have a sponsor. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66b863b8-4df6-4434-973d-444223160e93", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "7c119b3e-1c54-4510-b475-9a27ab9277f6", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ab671537-d049-46a0-9285-b1b5e6f2207c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_conditions import *\n", + "question8_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "2cc5e261-d094-4989-a3d9-b134ac7a9e40", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "c42af5c5-d028-48ba-bbf8-86445795da9b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`SELECT Stork.name FROM Stork LEFT JOIN MemberStork ON Stork.name = MemberStork.name where MemberStork.name is null`\n", + " \n", + "
\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/2-WHERE_clause/SQL_conditions.py b/2-WHERE_clause/SQL_conditions.py new file mode 100644 index 0000000..ddc6f9f --- /dev/null +++ b/2-WHERE_clause/SQL_conditions.py @@ -0,0 +1,306 @@ +import sqlite3 +import pandas +import hashlib +import pickle +from pandas._testing import assert_frame_equal + +""" +Utility functions +""" +def run_query(q): + con = sqlite3.connect("stork-mooc.sqlite") + return pandas.read_sql_query(q, con) + +def generate_hash(dataframe): + return hashlib.md5(pickle.dumps(dataframe)).hexdigest() + +def check_hash(dataframe, dataframehash): + return generate_hash(dataframe) == dataframehash + + +""" +Validation function for Question 1 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question1_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '8869f41a629d6097c2160161c908052d' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + +""" +Validation function for Question 2 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question2_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = 'd618a65ae888ae46407b46961ed5cb98' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + +""" +Validation function for Question 3 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question3_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '627c11065f3552ddcdb15d9d8c038106' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + +""" +Validation function for Question 4 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question4_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '72f6637a62c1d94a14f55cc988c85e31' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + +""" +Validation function for Question 5 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question5_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = 'cfd14c89d1c4c443b1525d6a0b5325b7' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + +""" +Validation function for Question 6 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question6_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '6627ab377e443da6509aaf65bd50c645' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + +""" +Validation function for Question 7 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question7_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '6627ab377e443da6509aaf65bd50c645' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + +""" +Validation function for Question 8 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question8_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = 'e7cc677dbf21b7f5728e423a1bf36ce3' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + \ No newline at end of file diff --git a/2-WHERE_clause/stork-mooc.sqlite b/2-WHERE_clause/stork-mooc.sqlite new file mode 100644 index 0000000..74740eb Binary files /dev/null and b/2-WHERE_clause/stork-mooc.sqlite differ diff --git a/2-WHERE_clause/uml_diagram_stork_database.png b/2-WHERE_clause/uml_diagram_stork_database.png new file mode 100644 index 0000000..6c2a969 Binary files /dev/null and b/2-WHERE_clause/uml_diagram_stork_database.png differ diff --git a/3-Aggregation_sorting/SQL_aggregation.ipynb b/3-Aggregation_sorting/SQL_aggregation.ipynb new file mode 100644 index 0000000..0545b80 --- /dev/null +++ b/3-Aggregation_sorting/SQL_aggregation.ipynb @@ -0,0 +1,713 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "fbb55442-3bb2-4349-a1cf-6a6e31223ac9", + "metadata": { + "tags": [] + }, + "source": [ + "# SQL exercise – Sorting and aggregating results from an SQL query \n", + "\n", + "The questions in this exercise are about the syntax of SQL queries.\n", + "\n", + "We will use:\n", + "* the QSpatiaLite stork database that we used earlier;\n", + "* SQL to write queries on this database;\n", + "* the programming language Python to query this database (execute the queries).\n", + "\n", + "
\n", + "
\n", + " How to use this notebook?
\n", + " This notebook consists of text cells and code cells (Python). The code cells must be executed in order to see the output of the program.
To execute a cell, select it and then simply click on the \"play\" button () in the toolbar at the top of the notebook, or type shift + enter.
It is important to run the code cells in the same order as they appear in the notebook.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "2910727c-4ef6-4b7b-9299-fdf13b6cb97b", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "584c8be5-e7f0-4958-b0c9-605e85cab93c", + "metadata": { + "tags": [] + }, + "source": [ + "# Accessing the database with Python\n", + "\n", + "In this first part, we show you with an example how to write and execute your queries on the stork database with Python." + ] + }, + { + "cell_type": "markdown", + "id": "973aa7ac-4e1e-4f22-98ff-dd7145757f0a", + "metadata": { + "tags": [] + }, + "source": [ + "## Importing Python libraries\n", + "To manipulate a database with Python we need the `sqlite3` and `pandas` libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "431bc147-6209-407c-9bf5-bffe7599604d", + "metadata": {}, + "outputs": [], + "source": [ + "# Importing libraries\n", + "import sqlite3\n", + "import pandas" + ] + }, + { + "cell_type": "markdown", + "id": "4d65a6f1-b050-4a3b-a3b1-9c7bf17c7f6f", + "metadata": {}, + "source": [ + "## Connection to the database \n", + "First, a connection to the database is established:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "75265ea6-4977-41bd-9498-043b9421b853", + "metadata": {}, + "outputs": [], + "source": [ + "# Establish the connection to the database:\n", + "con = sqlite3.connect(\"stork-mooc.sqlite\")" + ] + }, + { + "cell_type": "markdown", + "id": "2ae60c22-65df-413f-956f-b8009094576a", + "metadata": {}, + "source": [ + "## Executing SQL queries\n", + "\n", + "To execute a SQL query on our database with Python, we use the `read_sql_query` function from the `pandas` library which :\n", + "* executes the query\n", + "* returns the result table\n", + "To display this table, you just have to do a `display`.\n", + "\n", + "Here is an example, which executes the query `select * from migration` and displays the result:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d1b8e64-2b57-41fa-812d-bc8f046935f3", + "metadata": {}, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\"\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "dc7b9759-d7fc-46e9-87b2-3821155e37b6", + "metadata": {}, + "source": [ + "
\n", + "\n", + "**Question** : modify the above query to retrieve the contents of the `stork` table." + ] + }, + { + "cell_type": "markdown", + "id": "90bb2c4b-04a8-4756-90c3-def44f6b2c98", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "b5c5dfa1-02f1-4708-8db7-c9ec9610e9d4", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select * from stork`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "0c0ae1db-9414-4647-91bf-258224cac621", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "96c8744f-8111-4aaa-92d2-221208bdf724", + "metadata": {}, + "source": [ + "# Exercises with SQL\n", + "\n", + "For each of the following questions, **give the SQL syntax for the requested queries** and **test your queries.** \n", + "\n", + "As a reminder, here is the UML diagram of our stork database. \n", + "\n", + "\"UML" + ] + }, + { + "cell_type": "markdown", + "id": "cdc19960-8b8d-438e-93a2-af63369532ab", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 1\n", + "**Question** : Select the mean age of the storks in 2010; name the column “age_moyen”.\n", + "\n", + "Here is how to test your query: \n", + "1. Edit the query in the cell below and then execute the cell to see the result. \n", + "You can edit and re-execute the cell as many times as necessary. \n", + "2. Once you are satisfied with your query, you can execute the validation cell below to check if your query is correct or not. \n", + "3. Finally, you can compare your query with the solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "205d8ac7-cea8-4f67-959b-a6c9948d0571", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "3fccc96a-e289-4881-86c8-3d897e89fe0c", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "839e19f7-bdc8-494d-9fae-b41e3035e4a1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_aggregation import *\n", + "question1_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "37e139f8-003b-4033-85d0-6432fcce10d5", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "3248b00b-42fc-4043-849d-ce206ef3d94e", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`SELECT avg(2010-Stork.yearofbirth) AS age_moyen FROM Stork`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "5ded8691-5cc7-4ffd-8e7a-8b9e68ec9b28", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 2\n", + "\n", + "**Question** : For all years where at least one stork is born, list the number born; name the column “num_stork”." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5f6ce632-102f-481e-a2af-25a6aecdc889", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "edaa1f70-4d8a-4cae-931a-ada4760d354c", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b60cc09c-cbe5-4055-bdbf-dff6ef205faa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_aggregation import *\n", + "question2_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "413ba9bd-1a59-455d-a430-ce1e77962602", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "74758991-7c8a-49be-8c23-d99f3505b03b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`SELECT Stork.yearofbirth, count(Stork.yearofbirth) As num_stork FROM Stork GROUP BY Stork.yearofbirth`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "baf8ea9d-52c7-4f09-b2da-f7a2350edb84", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 3\n", + "\n", + "**Question** : Give the name and ID of all sponsors as well as how many storks they sponsor (use the JOIN clause)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6b091e2e-1af8-4cc0-bb1c-9ffa05708438", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "33b11759-fdd6-4f55-b03f-b3a2004ff564", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "edd89993-4524-4260-ae45-c03dcef23817", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_aggregation import *\n", + "question3_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "47f1675f-d9ab-4792-aca1-c3b61fd16f4f", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e4b270be-9464-4b53-8dc9-ae0b6d45f30f", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select member.id, member.name, count(*) from Member left join MemberStork on Member.id=MemberStork.memberid group by Member.id, Member.name`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "2181501d-9609-45db-add3-11fef5627712", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 4\n", + "\n", + "**Question** : Give the name and ID of all sponsors who sponsor at least two storks (use the JOIN and HAVING clauses)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1bbb2a1b-1d4e-4d1c-8ead-26f8499bc6b3", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "5bb0fffd-adab-4f4c-8baa-815f6e5f1db9", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f6fa35d-2308-4d9f-91ad-3b849f122479", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_aggregation import *\n", + "question4_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "dc8a2378-8a37-4e92-888a-73d433913b45", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "4b0e5608-1b34-47ca-9e4e-eac6f9fb944b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`SELECT Member.id, Member.name, count(MemberStork.name) FROM Member Left join MemberStork on Member.id = MemberStork.memberid Group by Member.id, Member.name Having count(MemberStork.name) >= 2`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "43863949-1e0d-4dee-9f08-f560700677a5", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 5\n", + "\n", + "**Question** : Select the names of all storks and order them from oldest to youngest (but do not return the year of birth column)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1fe8368-5515-4846-bb3f-7dec58827cdb", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "f5a2065f-e7f4-4577-aa04-8729604e3572", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4aaf1326-f94a-4527-b334-266e5989238f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_aggregation import *\n", + "question5_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "00956dcf-2a10-41c1-9aad-d08166d836ae", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e3030f35-7b38-4be0-852b-b1dff7635117", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`SELECT Stork.name FROM Stork ORDER by Stork.yearofbirth asc`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "4c55e7b3-2ab9-46c4-9f88-c9366b3dcd13", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 6\n", + "\n", + "**Question** : Return the names of the three oldest storks (use the LIMIT clause)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "689ba410-4c24-4e0c-ba5d-bf51b362a865", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "85879166-2e4c-4a6e-be7a-e819f6bae96a", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3551772d-e74e-4d39-ba20-c1712dac29e3", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_aggregation import *\n", + "question6_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "daeceac3-3124-492a-b984-682ce74224d1", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "720c7da3-26e5-415d-932e-9e0f808c6f51", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`SELECT Stork.name FROM Stork ORDER by Stork.yearofbirth asc LIMIT 3`\n", + " \n", + "
\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/3-Aggregation_sorting/SQL_aggregation.py b/3-Aggregation_sorting/SQL_aggregation.py new file mode 100644 index 0000000..ce532e2 --- /dev/null +++ b/3-Aggregation_sorting/SQL_aggregation.py @@ -0,0 +1,234 @@ +import sqlite3 +import pandas +import hashlib +import pickle +from pandas._testing import assert_frame_equal + +""" +Utility functions +""" +def run_query(q): + con = sqlite3.connect("stork-mooc.sqlite") + return pandas.read_sql_query(q, con) + +def generate_hash(dataframe): + return hashlib.md5(pickle.dumps(dataframe)).hexdigest() + +def check_hash(dataframe, dataframehash): + return generate_hash(dataframe) == dataframehash + + +""" +Validation function for Question 1 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question1_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '7ad8b2587b715b5a77ee027b1750a48b' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 2 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question2_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '8dc1ad6015d07cef58ab22328a96e3f8' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 3 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question3_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '82f3e4184eadcb8464904c72c6f6ce4f' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 4 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question4_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '1240afe0bff475eb73537854acaf0093' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + """ +Validation function for Question 5 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question5_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '1bd88d8d51206e4969048c9a8ce7dbae' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 5 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question6_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '7a63ba00c472fdc6142848ce885dac13' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + \ No newline at end of file diff --git a/3-Aggregation_sorting/stork-mooc.sqlite b/3-Aggregation_sorting/stork-mooc.sqlite new file mode 100644 index 0000000..74740eb Binary files /dev/null and b/3-Aggregation_sorting/stork-mooc.sqlite differ diff --git a/3-Aggregation_sorting/uml_diagram_stork_database.png b/3-Aggregation_sorting/uml_diagram_stork_database.png new file mode 100644 index 0000000..6c2a969 Binary files /dev/null and b/3-Aggregation_sorting/uml_diagram_stork_database.png differ diff --git a/4-Compound_nested_queries/SQL_fusion.ipynb b/4-Compound_nested_queries/SQL_fusion.ipynb new file mode 100644 index 0000000..ae31ed5 --- /dev/null +++ b/4-Compound_nested_queries/SQL_fusion.ipynb @@ -0,0 +1,626 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "29331406-bdbf-4ac8-b1a1-e71de069fb05", + "metadata": { + "tags": [] + }, + "source": [ + "# SQL exercise – Compound and nested SQL queries\n", + "\n", + "The questions in this exercise are about the syntax of SQL queries.\n", + "\n", + "We will use:\n", + "* the QSpatiaLite stork database that we used earlier;\n", + "* SQL to write queries on this database;\n", + "* the programming language Python to query this database (execute the queries).\n", + "\n", + "
\n", + "
\n", + " How to use this notebook?
\n", + " This notebook consists of text cells and code cells (Python). The code cells must be executed in order to see the output of the program.
To execute a cell, select it and then simply click on the \"play\" button () in the toolbar at the top of the notebook, or type shift + enter.
It is important to run the code cells in the same order as they appear in the notebook.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "cce1c593-9fe7-4b64-9ce0-b3e589c29488", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "ad6d3b11-2eb8-41c7-a8f9-32aad5d552a5", + "metadata": { + "tags": [] + }, + "source": [ + "# Accessing the database with Python\n", + "\n", + "In this first part, we show you with an example how to write and execute your queries on the stork database with Python." + ] + }, + { + "cell_type": "markdown", + "id": "f40617a7-951c-4ff6-b3d4-f19e710e8dee", + "metadata": { + "tags": [] + }, + "source": [ + "## Importing Python libraries\n", + "To manipulate a database with Python we need the `sqlite3` and `pandas` libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1728d534-3f64-4cb2-860e-7fd6247041e8", + "metadata": {}, + "outputs": [], + "source": [ + "# Importing libraries\n", + "import sqlite3\n", + "import pandas" + ] + }, + { + "cell_type": "markdown", + "id": "39ffe37e-403e-4a19-b228-669d245a9c0f", + "metadata": {}, + "source": [ + "## Connection to the database \n", + "First, a connection to the database is established:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fdfc5079-cc8d-4f4c-b3a1-6c8d1ac85ad6", + "metadata": {}, + "outputs": [], + "source": [ + "# Establish the connection to the database:\n", + "con = sqlite3.connect(\"stork-mooc.sqlite\")" + ] + }, + { + "cell_type": "markdown", + "id": "d2ed37e5-7ee7-4799-bcdb-2234589257e1", + "metadata": {}, + "source": [ + "## Executing SQL queries\n", + "\n", + "To execute a SQL query on our database with Python, we use the `read_sql_query` function from the `pandas` library which :\n", + "* executes the query\n", + "* returns the result table\n", + "To display this table, you just have to do a `display`.\n", + "\n", + "Here is an example, which executes the query `select * from migration` and displays the result:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e256f9b1-2eee-47a5-b506-0c4ef035520c", + "metadata": {}, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\"\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "dc751f1e-ef59-40dc-9a6f-eec4665c9c5e", + "metadata": {}, + "source": [ + "
\n", + "\n", + "**Question** : modify the above query to retrieve the contents of the `stork` table." + ] + }, + { + "cell_type": "markdown", + "id": "a48f4694-1c58-43cb-9245-99c0af063f29", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "b663bfc9-e3cb-4195-95c0-b34a4a8b1079", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select * from stork`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "c0ebf2e7-09e9-4290-9b6c-83a6f8c1502b", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "cbd82ad7-2d85-4287-a5b1-413e2bf1acde", + "metadata": { + "tags": [] + }, + "source": [ + "# Exercises with SQL\n", + "\n", + "For each of the following questions, **give the SQL syntax for the requested queries** and **test your queries.** \n", + "\n", + "As a reminder, here is the UML diagram of our stork database. \n", + "\n", + "\"UML" + ] + }, + { + "cell_type": "markdown", + "id": "3811c85a-dc52-4259-b538-16d198ad120b", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 1\n", + "**Question** : Return the names of all storks who migrated in both 2001 and 2003 (use the UNION, EXCEPT or INTERSECT operators).\n", + "\n", + "Here is how to test your query: \n", + "1. Edit the query in the cell below and then execute the cell to see the result. \n", + "You can edit and re-execute the cell as many times as necessary. \n", + "2. Once you are satisfied with your query, you can execute the validation cell below to check if your query is correct or not. \n", + "3. Finally, you can compare your query with the solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da5ceda4-2489-4d20-b128-55343cb6dcf3", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "caffe1a6-0617-4c3d-838c-e48a4342642d", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "839e19f7-bdc8-494d-9fae-b41e3035e4a1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_fusion import *\n", + "question1_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "b4efcf80-e101-47b5-9584-f2c8af299a8e", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "3248b00b-42fc-4043-849d-ce206ef3d94e", + "metadata": { + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select name from migration where departureyear=2001 intersect select name from migration where departureyear=2003`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "5ded8691-5cc7-4ffd-8e7a-8b9e68ec9b28", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 2\n", + "\n", + "**Question** : Return the names of all storks who migrated in 2001, but not in 2003 (use the UNION, EXCEPT or INSTERSECT operators).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8e7fa944-0dd6-410c-8b98-3ce7c2296028", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "a51feebe-bbfb-49ec-8217-61f08395d75f", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b60cc09c-cbe5-4055-bdbf-dff6ef205faa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_fusion import *\n", + "question2_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "b005725d-094e-4c88-a066-c6bb561240b0", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "74758991-7c8a-49be-8c23-d99f3505b03b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select name from migration where departureyear=2001 except select name from migration where departureyear=2003`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "baf8ea9d-52c7-4f09-b2da-f7a2350edb84", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 3\n", + "\n", + "**Question** : Return the names of all storks who migrated in both 2001 and 2003 (use the IN clause to create a nested query). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "effa9adc-db64-40e0-9fb7-d77c068f5b3a", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "f5fbdd80-6dc0-4748-8b80-5aa97367948e", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "edd89993-4524-4260-ae45-c03dcef23817", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_fusion import *\n", + "question3_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "9780abdf-f469-4682-b97d-dfae186fc85c", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e4b270be-9464-4b53-8dc9-ae0b6d45f30f", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select name from migration where departureyear=2001 and name in (select name from migration where departureyear=2003)`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "2181501d-9609-45db-add3-11fef5627712", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 4\n", + "\n", + "**Question** : Select the names of all storks who migrated prior to the birth of the youngest stork (use the Migration table in the primary query and the Stork table in a subquery in the WHERE clause)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3356bb0c-b1a7-43cd-9b3d-b67864694c2d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "cc4f7793-0cef-42c5-9b23-154ee16fa467", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f6fa35d-2308-4d9f-91ad-3b849f122479", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_fusion import *\n", + "question4_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "09eb69f0-908e-4e40-bcad-9853fca49826", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "4b0e5608-1b34-47ca-9e4e-eac6f9fb944b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select distinct name from Migration where departureyear<(select max(yearofbirth) from Stork)`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "43863949-1e0d-4dee-9f08-f560700677a5", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 5\n", + "\n", + "**Question** :Select the names of all storks who migrated prior to the birth of the youngest stork (use the Migration table in the primary query and the Stork table in a subquery in the FROM clause)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d6bb47c7-72ca-4264-a797-aab5e2f0b053", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "b5307431-89a3-4010-96ea-6dab15923bf1", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4aaf1326-f94a-4527-b334-266e5989238f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_fusion import *\n", + "question5_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "6f147f9e-647b-40f7-842a-b54ac6866ea7", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e3030f35-7b38-4be0-852b-b1dff7635117", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select distinct name from Migration, (select max(yearofbirth) as max from Stork) where departureyear\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/4-Compound_nested_queries/SQL_fusion.py b/4-Compound_nested_queries/SQL_fusion.py new file mode 100644 index 0000000..710970a --- /dev/null +++ b/4-Compound_nested_queries/SQL_fusion.py @@ -0,0 +1,199 @@ +import sqlite3 +import pandas +import hashlib +import pickle +from pandas._testing import assert_frame_equal + +""" +Utility functions +""" +def run_query(q): + con = sqlite3.connect("stork-mooc.sqlite") + return pandas.read_sql_query(q, con) + +def generate_hash(dataframe): + return hashlib.md5(pickle.dumps(dataframe)).hexdigest() + +def check_hash(dataframe, dataframehash): + return generate_hash(dataframe) == dataframehash + + +""" +Validation function for Question 1 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question1_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = 'b27b60409c967bcb8a93dbbc75ba33cd' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 2 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question2_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '41af0d89e1ead68b5ed15fa9013a3493' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 3 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question3_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = 'b27b60409c967bcb8a93dbbc75ba33cd' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 4 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question4_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '54caca6dd78ac6240be397455424bbda' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + """ +Validation function for Question 5 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question5_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '54caca6dd78ac6240be397455424bbda' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + \ No newline at end of file diff --git a/4-Compound_nested_queries/stork-mooc.sqlite b/4-Compound_nested_queries/stork-mooc.sqlite new file mode 100644 index 0000000..74740eb Binary files /dev/null and b/4-Compound_nested_queries/stork-mooc.sqlite differ diff --git a/4-Compound_nested_queries/uml_diagram_stork_database.png b/4-Compound_nested_queries/uml_diagram_stork_database.png new file mode 100644 index 0000000..6c2a969 Binary files /dev/null and b/4-Compound_nested_queries/uml_diagram_stork_database.png differ diff --git a/5-DDL_DML_views/SQL_DML.ipynb b/5-DDL_DML_views/SQL_DML.ipynb new file mode 100644 index 0000000..a085655 --- /dev/null +++ b/5-DDL_DML_views/SQL_DML.ipynb @@ -0,0 +1,625 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3f467ad7-4bdd-4ce9-86f6-738f90481072", + "metadata": {}, + "source": [ + "# Exercice SQL – Data definition and manipulation\n", + "\n", + "The questions in this exercise are about the syntax of SQL queries.\n", + "\n", + "We will use:\n", + "* the QSpatiaLite stork database that we used earlier;\n", + "* SQL to write queries on this database;\n", + "* the programming language Python to query this database (execute the queries).\n", + "\n", + "
\n", + "
\n", + " How to use this notebook?
\n", + " This notebook consists of text cells and code cells (Python). The code cells must be executed in order to see the output of the program.
To execute a cell, select it and then simply click on the \"play\" button () in the toolbar at the top of the notebook, or type shift + enter.
It is important to run the code cells in the same order as they appear in the notebook.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "b4e21f2d-b7bb-4e6e-bd05-a158f2524056", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "fa2e6b4a-e0dc-48b7-a032-4c8da2184fa4", + "metadata": { + "tags": [] + }, + "source": [ + "# Accessing the database with Python\n", + "\n", + "In this first part, we show you with an example how to write and execute your queries on the stork database with Python." + ] + }, + { + "cell_type": "markdown", + "id": "6ff3944a-2984-4854-91f7-1b566d3553bb", + "metadata": { + "tags": [] + }, + "source": [ + "## Importing Python libraries\n", + "To manipulate a database with Python we need the `sqlite3` and `pandas` libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "083f4802-f93c-472d-b68e-887cc7798a78", + "metadata": {}, + "outputs": [], + "source": [ + "# Importing libraries\n", + "import sqlite3\n", + "import pandas" + ] + }, + { + "cell_type": "markdown", + "id": "9a95b8ef-a5e5-42bc-a1ee-991c444ea2aa", + "metadata": {}, + "source": [ + "## Connection to the database \n", + "First, a connection to the database is established:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "affdc6db-df17-4a0a-989b-bc470aa2e8f7", + "metadata": {}, + "outputs": [], + "source": [ + "# Establish the connection to the database:\n", + "con = sqlite3.connect(\"stork-mooc.sqlite\")" + ] + }, + { + "cell_type": "markdown", + "id": "0c36f96f-9c67-441d-a8c7-d24cf10c9945", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "8116c37c-3db2-4d92-bf3a-faf066969321", + "metadata": { + "tags": [] + }, + "source": [ + "# Exercises with SQL\n", + "\n", + "For each of the following questions, **give the SQL syntax for the requested queries** and **test your queries.** \n", + "\n", + "As a reminder, here is the UML diagram of our stork database. \n", + "\n", + "\"UML" + ] + }, + { + "cell_type": "markdown", + "id": "9a49ab22-6ef6-4c0a-98e1-a089f7d1bb63", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "# Special feature of this exercise\n", + "\n", + "This exercise involves queries allowing to **modify the stork database**. Thus, the following questions will allow you to create a new table named \"observations\", to modify it and to delete it. \n", + "\n", + "Since your queries will allow you to modify the original database, a copy of the database named **\"stork-mooc-BACKUP.sqlite \"** is available in the folder of this exercise if a problem with the database occurs during the exercise. In this case, delete the current version of your database \"stork-mooc.sqlite\", create a copy of the file \"stork-mooc-BACKUP.sqlite\" and rename this copy \"stork-mooc.sqlite\". **Always keep an unmodified copy of \"stork-mooc-BACKUP.sqlite\"!" + ] + }, + { + "cell_type": "markdown", + "id": "03a31944-09b6-429c-86ea-0cb1307680bf", + "metadata": { + "tags": [] + }, + "source": [ + "## Let's look at the current state of the \"observations\" table\n", + "In principle, at the beginning of this exercise the \"observations\" table does not yet exist and the following command should return a message indicating it indeed does not exist.\n", + "\n", + "**If it does not the case, you need to start from scratch by deleting the existing table.** To do this, run the very last cell containing code at the bottom of this file (this executes a DROP TABLE query which allows you to delete a table) and then re-execute the cell below ('display_observations_table()') to ensure that the 'observations' table no longer exists. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9912fe28-bec7-447c-a37c-2ce2af9a122d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "display_observations_table()" + ] + }, + { + "cell_type": "markdown", + "id": "f46cdd5c-c09e-4adf-a88b-c4cf02ace948", + "metadata": { + "tags": [] + }, + "source": [ + "## Let's get started!\n", + "You can now start answering the following questions. The first question will allow you to create the 'observations' table. The other questions will allow you to modify it. Finally, you will be able to delete it. **If necessary, feel free to run the last cell of code to delete the 'observations' table and then start again from question 1.**" + ] + }, + { + "cell_type": "markdown", + "id": "cdef6666-e9c7-4614-8d89-95ea6629d98d", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 1\n", + "**Question** : Enter the query that will create a new \"observations\" table to store the observations on storks. In the table include the date (\"day\" should be \"date\" format) and the name (\"storkname\" should be \"text\" format)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08fc5f61-202a-4e10-a0f6-c06333274b9b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query\n", + "conn.cursor().execute(query)\n", + "conn.commit()\n", + "\n", + "# Display the result table\n", + "display_observations_table()" + ] + }, + { + "cell_type": "markdown", + "id": "b0d93855-76b2-403f-b945-b9a9a67f8451", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "839e19f7-bdc8-494d-9fae-b41e3035e4a1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_DML import *\n", + "question1_validation()" + ] + }, + { + "cell_type": "markdown", + "id": "1c40c206-4e86-4abb-9bd4-38b8bbb1584c", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "3248b00b-42fc-4043-849d-ce206ef3d94e", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`CREATE TABLE observations ('date' date, 'storkname' text)`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce69bcad-8b2c-43ed-bac2-f2d1feaa636a", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 2\n", + "**Question** : The new \"observations\" table is missing a field for the observer. Write the query to add a new \"member\" attribute to the table. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "96467df3-bb81-44d1-9d8c-c4bab83f0854", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query\n", + "conn.cursor().execute(query)\n", + "conn.commit()\n", + "\n", + "# Display the result table\n", + "display_observations_table()" + ] + }, + { + "cell_type": "markdown", + "id": "6af457ad-0a71-4147-a094-7521e4e7e98a", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fcf954a0-e697-4b20-bb98-764b66cbf127", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_DML import *\n", + "question2_validation()" + ] + }, + { + "cell_type": "markdown", + "id": "40c1cb56-0899-41da-939b-ccc40c2ef149", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "c0d3979d-5605-41bf-8d2c-8f04bce172c5", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`ALTER TABLE observations ADD member text`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "08372c86-215b-4aa6-81f1-74b86bf391fd", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 3\n", + "**Question** : Enter the query to enter the following tuples into the new \"observations\" table: 2016-08-12 / Annelise / Arthur, 2016-08-13 / Felix / Arthur, 2016-08-15 / Max / Jeanne" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "851477b4-bb70-4eaf-a8ad-719fa316f5e1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query\n", + "conn.cursor().execute(query)\n", + "conn.commit()\n", + "\n", + "# Display the result table\n", + "display_observations_table()" + ] + }, + { + "cell_type": "markdown", + "id": "42c6eec1-eca5-4c84-a72e-46ff301b6243", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "98845eb1-4056-4ab9-ba52-470f37044990", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_DML import *\n", + "question3_validation()" + ] + }, + { + "cell_type": "markdown", + "id": "105a1701-50ab-4963-9a5b-a6386dbe1b92", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e659c9aa-60d4-4bdf-b217-2ee096880499", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`insert into 'observations' values('2016-08-12' ,'Annelise','Arthur'),( '2016-08-13','Felix','Arthur'),('2016-08-15' ,'Max','Jeanne')`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "b7cd601b-409d-45a6-93ca-3cb44b69ce76", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 4\n", + "**Question** : Enter the query to correct the second observation by changing Felix to Johann. \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1b49f899-2985-42ec-a6ed-4a89878c8fc1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query\n", + "conn.cursor().execute(query)\n", + "conn.commit()\n", + "\n", + "# Display the result table\n", + "display_observations_table()" + ] + }, + { + "cell_type": "markdown", + "id": "2a18c2b5-cce3-4542-a285-e3081a77e21b", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6cc04231-b3f8-454e-a57d-5c765ca7a547", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_DML import *\n", + "question4_validation()" + ] + }, + { + "cell_type": "markdown", + "id": "f022181f-a1f6-4ef8-8990-89bc7edcd52f", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "a3450ff9-11d3-4a93-81a2-f7ce9b538897", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`update 'observations' set storkname ='Johann' where storkname ='Felix'`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "1623f395-b41c-4a20-8560-5fcb78c188aa", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 5\n", + "**Question** : Enter the query to delete the observation made by Jeanne from the \"observations\" table. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c81307a5-2eea-4e68-914f-d33ab6f3ae53", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query\n", + "conn.cursor().execute(query)\n", + "conn.commit()\n", + "\n", + "# Display the result table\n", + "display_observations_table()" + ] + }, + { + "cell_type": "markdown", + "id": "bfe3aa7d-f619-45e4-9de8-bde91446ae34", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6698085a-0690-46ed-a30b-0b106b3f1f5a", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_DML import *\n", + "question5_validation()" + ] + }, + { + "cell_type": "markdown", + "id": "0f8c0e39-3791-487a-9723-fa932cc483ad", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "3eea65e0-2bac-42de-856a-14180deec02f", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`delete from 'observations' where member='Jeanne'`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "9ac7dba0-2ed1-4a43-a7fd-47130b0e8a0c", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "## Deleting the table\n", + "Execute the following code cell to delete the 'observations' table. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2f977221-06fa-4098-a627-931b16d0b5cd", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "query = \"DROP TABLE 'observations'\"\n", + "\n", + "# Execute the query\n", + "conn.cursor().execute(query)\n", + "conn.commit()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/5-DDL_DML_views/SQL_DML.py b/5-DDL_DML_views/SQL_DML.py new file mode 100644 index 0000000..de378f5 --- /dev/null +++ b/5-DDL_DML_views/SQL_DML.py @@ -0,0 +1,141 @@ +import sqlite3 +import pandas +import hashlib +import pickle +from sqlalchemy import create_engine +from pandas.io.sql import DatabaseError + + +""" +Utility functions +""" +def generate_hash(dataframe): + return hashlib.md5(pickle.dumps(dataframe)).hexdigest() + +def run_query(q): + conn = sqlite3.connect("stork-mooc.sqlite") + return pandas.read_sql_query(q, conn) + +def display_observations_table(): + query = "select * from observations" + table = pandas.DataFrame() + try: + table = run_query(query) + except DatabaseError as e: + print(f"Looks like the 'observations' table does not exist...\n") + return table + +""" +HOW TO GENERATE THE CORRECT HASH (to be done in a separate notebook) +correctresult = pandas.DataFrame(data=[["2021-10-14", "stork1"],["2021-11-19", "stork2"]], columns=['date', 'storkname']) +display(correctresult) +correctresponse_hash = generate_hash(correctresult) +correctresponse_hash +""" + +""" +Validation function for Question 1 +""" +def question1_validation(): + + correctresponse_hash = 'ceb5fcf0c091dcbf3d0d29a8b1dac802' + + selectresult = display_observations_table() + "display(selectresult)" + studentresult_hash = generate_hash(selectresult) + "studentresult_hash" + + try: + assert studentresult_hash == correctresponse_hash, "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + +""" +Validation function for Question 2 +""" +def question2_validation(): + + correctresponse_hash = '035dfc2eebd673a8a5a2e492e6d7e7eb' + + selectresult = display_observations_table() + "display(selectresult)" + studentresult_hash = generate_hash(selectresult) + "studentresult_hash" + + try: + assert studentresult_hash == correctresponse_hash, "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + +""" +Validation function for Question 3 +""" +def question3_validation(): + + correctresponse_hash = '62dcf7b4d9643d755fdd9c68a8ddea9d' + + selectresult = display_observations_table() + "display(selectresult)" + studentresult_hash = generate_hash(selectresult) + "studentresult_hash" + + try: + assert studentresult_hash == correctresponse_hash, "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + +""" +Validation function for Question 4 +""" +def question4_validation(): + + correctresponse_hash = 'e5c660b69dfb8c640fbd9ce541f0df8c' + + selectresult = display_observations_table() + "display(selectresult)" + studentresult_hash = generate_hash(selectresult) + "studentresult_hash" + + try: + assert studentresult_hash == correctresponse_hash, "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + +""" +Validation function for Question 5 +""" +def question5_validation(): + + correctresponse_hash = 'b75771bbba787b972d33b95146c23a60' + + selectresult = display_observations_table() + "display(selectresult)" + studentresult_hash = generate_hash(selectresult) + "studentresult_hash" + + try: + assert studentresult_hash == correctresponse_hash, "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + \ No newline at end of file diff --git a/5-DDL_DML_views/stork-mooc-BACKUP.sqlite b/5-DDL_DML_views/stork-mooc-BACKUP.sqlite new file mode 100644 index 0000000..9a9bbbf Binary files /dev/null and b/5-DDL_DML_views/stork-mooc-BACKUP.sqlite differ diff --git a/5-DDL_DML_views/stork-mooc.sqlite b/5-DDL_DML_views/stork-mooc.sqlite new file mode 100644 index 0000000..29872f2 Binary files /dev/null and b/5-DDL_DML_views/stork-mooc.sqlite differ diff --git a/5-DDL_DML_views/uml_diagram_stork_database.png b/5-DDL_DML_views/uml_diagram_stork_database.png new file mode 100644 index 0000000..6c2a969 Binary files /dev/null and b/5-DDL_DML_views/uml_diagram_stork_database.png differ diff --git a/6-Geometrical_spatial_queries/SQL_geometrical_queries.ipynb b/6-Geometrical_spatial_queries/SQL_geometrical_queries.ipynb new file mode 100644 index 0000000..4d2df27 --- /dev/null +++ b/6-Geometrical_spatial_queries/SQL_geometrical_queries.ipynb @@ -0,0 +1,787 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3f467ad7-4bdd-4ce9-86f6-738f90481072", + "metadata": { + "tags": [] + }, + "source": [ + "# Exercice SQL – Geometrical spatial queries\n", + "\n", + "The questions in this exercise are about the syntax of SQL queries.\n", + "\n", + "We will use:\n", + "* a QSpatiaLite database with agricultural data for a region in India, including the layers \"Cultivated_Fields\", \"Roads\", \"Streams\", \"Village\" and \"Wells\";\n", + "* SQL to write queries on this database;\n", + "* the programming language Python to query this database (execute the queries).\n", + "\n", + "
\n", + "
\n", + " How to use this notebook?
\n", + " This notebook consists of text cells and code cells (Python). The code cells must be executed in order to see the output of the program.
To execute a cell, select it and then simply click on the \"play\" button () in the toolbar at the top of the notebook, or type shift + enter.
It is important to run the code cells in the same order as they appear in the notebook.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "a08280d1-df23-4e8a-ae6f-64b55897c079", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "70ad2bc7-4005-4396-a79b-e0b68a22c255", + "metadata": { + "tags": [] + }, + "source": [ + "# Accessing the database with Python\n", + "\n", + "In this first part, we show you with an example how to write and execute your queries on the stork database with Python." + ] + }, + { + "cell_type": "markdown", + "id": "796db845-18a1-4d53-960b-62cf7075c172", + "metadata": { + "tags": [] + }, + "source": [ + "## Importing Python libraries\n", + "To manipulate a database with Python we need the `sqlite3` and `pandas` libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b4afb6f6-cfdd-41d8-b0a8-7c38e17b9a1a", + "metadata": {}, + "outputs": [], + "source": [ + "# Importing libraries\n", + "import sqlite3\n", + "import pandas" + ] + }, + { + "cell_type": "markdown", + "id": "c3936e46-a816-435a-b46b-4064e77d1a81", + "metadata": {}, + "source": [ + "## Connection to the database \n", + "First, a connection to the database is established:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2ae2db69-8429-4dea-949f-8457ccf0b6e1", + "metadata": {}, + "outputs": [], + "source": [ + "# Establish the connection to the database\n", + "con = sqlite3.connect(\"india.sqlite\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3b7fba08-8930-4bd0-8ebd-767560d268ae", + "metadata": {}, + "outputs": [], + "source": [ + "# Load spatialite extension\n", + "con.enable_load_extension(True)\n", + "con.load_extension(\"mod_spatialite\")" + ] + }, + { + "cell_type": "markdown", + "id": "df01de98-e114-4b75-8ae6-5ecf76a7c688", + "metadata": {}, + "source": [ + "## Executing SQL queries\n", + "\n", + "To execute a SQL query on our database with Python, we use the `read_sql_query` function from the `pandas` library which :\n", + "* executes the query\n", + "* returns the result table\n", + "To display this table, you just have to do a `display`.\n", + "\n", + "Here is an example, which executes the query `select * from wells` and displays the result:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b307566-457f-4237-8985-9ebae37a1f9c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\"\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "5a44ce0f-6e14-43a3-876a-495e538b2978", + "metadata": {}, + "source": [ + "
\n", + "\n", + "**Question** : modify the above query to retrieve the contents of the `streams` table." + ] + }, + { + "cell_type": "markdown", + "id": "bb688437-6065-4cf6-91ea-18b26c8e684b", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "704b31ba-5571-4520-a087-300d905ae1b6", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select * from streams`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "cdef6666-e9c7-4614-8d89-95ea6629d98d", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 1\n", + "**Question** : Enter the query that returns the village's geometry in a column named \"geometry_wkt\"\n", + "\n", + "Here is how to test your query: \n", + "1. Edit the query in the cell below and then execute the cell to see the result. \n", + "You can edit and re-execute the cell as many times as necessary. \n", + "2. Once you are satisfied with your query, you can execute the validation cell below to check if your query is correct or not. \n", + "3. Finally, you can compare your query with the solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b6a88264-70ad-4c4d-aea5-0d8661f52a21", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "3f34149a-c207-4813-bfa5-a5127ab5c9f9", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "839e19f7-bdc8-494d-9fae-b41e3035e4a1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_geometrical_queries import *\n", + "question1_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "49f371da-2975-4c5a-b8c3-d62256303315", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "3248b00b-42fc-4043-849d-ce206ef3d94e", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select AsText(geometry) as geometry_wkt from Village`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "5ded8691-5cc7-4ffd-8e7a-8b9e68ec9b28", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 2\n", + "\n", + "**Question** : Enter the query that returns the village's geometry type (Point, Polyline, Polygon, etc.) in a column named geometry_type.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "93d90836-43e4-4a5d-ae9e-b0a32df31d14", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "939cb098-8bd4-4779-89ff-fa89c9b956c8", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b60cc09c-cbe5-4055-bdbf-dff6ef205faa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_geometrical_queries import *\n", + "question2_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "2e5faadb-458b-41d3-a926-e63b39f81748", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "74758991-7c8a-49be-8c23-d99f3505b03b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select GeometryType(geometry) as geometry_type from Village`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "baf8ea9d-52c7-4f09-b2da-f7a2350edb84", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 3\n", + "\n", + "**Question** : Enter the query that returns the list of agricultural fields larger than 4 hectares" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bf93c34f-30c6-492d-b464-3773c6a188fe", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "4db09dc0-9958-4c78-bddc-d8b30762bc3e", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "edd89993-4524-4260-ae45-c03dcef23817", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_geometrical_queries import *\n", + "question3_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "e10b54c4-4a7b-44d8-b307-ef4aaa14750f", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e4b270be-9464-4b53-8dc9-ae0b6d45f30f", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select * from Cultivated_Fields where Area(geometry) > 40000`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "2181501d-9609-45db-add3-11fef5627712", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 4\n", + "\n", + "**Question** : Enter the query that returns the viilage's centroid (in a column named geom_centr) and display this point in QGIS" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2b7497b2-d1ba-4040-b939-2d118330025f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "49c3f7a9-e080-4944-bf63-a589fd8cddbc", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f6fa35d-2308-4d9f-91ad-3b849f122479", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_geometrical_queries import *\n", + "question4_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "fcb72363-5b18-4cd1-a2d1-4d659e85511c", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "4b0e5608-1b34-47ca-9e4e-eac6f9fb944b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select Centroid(geometry) as geom_centr from Village`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "43863949-1e0d-4dee-9f08-f560700677a5", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 5\n", + "\n", + "**Question** : Enter the query that returns the total length of the secondary roads (of type \"simple\") in a column named \"length\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87d476b6-d562-425b-9e16-6e9cea306092", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "0dff4adf-dff5-4458-9e9f-9a303fe6dc62", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4aaf1326-f94a-4527-b334-266e5989238f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_geometrical_queries import *\n", + "question5_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "1fc6941d-ec9f-4f95-88cb-feb3cbe33957", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e3030f35-7b38-4be0-852b-b1dff7635117", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`Select sum(GLength(Geometry)) as longueur From Roads Where Type like 'simple'`\n", + "\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "8db43a73-7383-4f1c-a875-434742cb7001", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 6\n", + "\n", + "**Question** : Enter the query that returns the number of wells located south-west of the village's centroid (use the metacharacter in the aggregation function)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8e90b3e-0970-4486-9c80-b042d194ca45", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "65ef6110-3ea2-4467-ba16-5ddcc00c788c", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf911d59-4dfd-45e4-a780-27ef0d62c0ba", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_geometrical_queries import *\n", + "question6_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "07f02dc6-36ec-48f2-ba39-eaf5d1667148", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "ea940598-7455-4e12-9164-25393a14c08f", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`SELECT COUNT(*) AS num_SW FROM Wells, Village WHERE X(Wells.Geometry)\n" + ] + }, + { + "cell_type": "markdown", + "id": "a09fd9b4-7dc2-440a-8114-fe4113ecc51b", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 7\n", + "\n", + "**Question** : Enter the query that allows to check, in a column named \"EPSG\", that all features in the \"Wells\" layer do share the same projection system\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3302661-2578-4978-8c42-d55d84713b64", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "8162aa85-1b2b-43e6-8a23-564fb8144ce9", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5562a720-c7f2-43af-b711-666afbe530c9", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_geometrical_queries import *\n", + "question7_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "93513f04-fac1-4ab6-92e1-72505d7f4e3f", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "941f4861-f978-4538-a9e4-828730f58bce", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`Select SRID(Geometry) as EPSG From Wells`\n", + "\n", + " \n", + "
\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/6-Geometrical_spatial_queries/SQL_geometrical_queries.py b/6-Geometrical_spatial_queries/SQL_geometrical_queries.py new file mode 100644 index 0000000..b4d8822 --- /dev/null +++ b/6-Geometrical_spatial_queries/SQL_geometrical_queries.py @@ -0,0 +1,271 @@ +import sqlite3 +import pandas +import hashlib +import pickle +from pandas._testing import assert_frame_equal + +""" +Utility functions +""" +def run_query(q): + con = sqlite3.connect("india.sqlite") + con.enable_load_extension(True) + con.load_extension("mod_spatialite") + return pandas.read_sql_query(q, con) + +def generate_hash(dataframe): + return hashlib.md5(pickle.dumps(dataframe)).hexdigest() + +def check_hash(dataframe, dataframehash): + return generate_hash(dataframe) == dataframehash + + +""" +Validation function for Question 1 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question1_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = 'e9615a5915bb7d1b32f4028c5e4cd977' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 2 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question2_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = 'db93725980882289f5860ab67d8e834f' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 3 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question3_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = 'ec83fa5ea746982b4527ef311471e862' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 4 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question4_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '78b62222924bd032a5336a349e1db8a6' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + """ +Validation function for Question 5 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question5_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '265f265a18c2cb6bfd9c1c81c1dac805' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + +""" +Validation function for Question 6 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question6_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '63a6dc8c30c7e8213f63b240ef9b49b2' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + +""" +Validation function for Question 7 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question7_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '1fabe794369f17e66458156b5f71e12b' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + \ No newline at end of file diff --git a/6-Geometrical_spatial_queries/india.sqlite b/6-Geometrical_spatial_queries/india.sqlite new file mode 100644 index 0000000..f6939f0 Binary files /dev/null and b/6-Geometrical_spatial_queries/india.sqlite differ diff --git a/7-Topological_spatial_queries/SQL_topological_queries.ipynb b/7-Topological_spatial_queries/SQL_topological_queries.ipynb new file mode 100644 index 0000000..ceb0838 --- /dev/null +++ b/7-Topological_spatial_queries/SQL_topological_queries.ipynb @@ -0,0 +1,532 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3f467ad7-4bdd-4ce9-86f6-738f90481072", + "metadata": {}, + "source": [ + "# Exercice SQL – Topological spatial queries\n", + "\n", + "The questions in this exercise are about the syntax of SQL queries.\n", + "\n", + "We will use:\n", + "* a QSpatiaLite database with agricultural data for a region in India, including the layers \"Cultivated_Fields\", \"Roads\", \"Streams\", \"Village\" and \"Wells\";\n", + "* SQL to write queries on this database;\n", + "* the programming language Python to query this database (execute the queries).\n", + "\n", + "
\n", + "
\n", + " How to use this notebook?
\n", + " This notebook consists of text cells and code cells (Python). The code cells must be executed in order to see the output of the program.
To execute a cell, select it and then simply click on the \"play\" button () in the toolbar at the top of the notebook, or type shift + enter.
It is important to run the code cells in the same order as they appear in the notebook.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "a08280d1-df23-4e8a-ae6f-64b55897c079", + "metadata": {}, + "source": [ + "---\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "87e805db-56a7-4961-88ac-188f654c49bb", + "metadata": { + "tags": [] + }, + "source": [ + "# Accessing the database with Python\n", + "\n", + "In this first part, we show you with an example how to write and execute your queries on the stork database with Python." + ] + }, + { + "cell_type": "markdown", + "id": "ebd61a3d-fadf-4b0a-aaea-749f36a4f65b", + "metadata": { + "tags": [] + }, + "source": [ + "## Importing Python libraries\n", + "To manipulate a database with Python we need the `sqlite3` and `pandas` libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7a9341b1-f624-4d3e-9c28-eaf5e1dc1cdd", + "metadata": {}, + "outputs": [], + "source": [ + "# Importing libraries\n", + "import sqlite3\n", + "import pandas" + ] + }, + { + "cell_type": "markdown", + "id": "207f63fb-4a3c-448c-b8f4-d576ee14f934", + "metadata": {}, + "source": [ + "## Connection to the database \n", + "First, a connection to the database is established:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2ae2db69-8429-4dea-949f-8457ccf0b6e1", + "metadata": {}, + "outputs": [], + "source": [ + "# Establish the connection to the database:\n", + "con = sqlite3.connect(\"india.sqlite\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3b7fba08-8930-4bd0-8ebd-767560d268ae", + "metadata": {}, + "outputs": [], + "source": [ + "# Load spatialite extension\n", + "con.enable_load_extension(True)\n", + "con.load_extension(\"mod_spatialite\")" + ] + }, + { + "cell_type": "markdown", + "id": "61d93076-5f19-4368-b9bb-857509c4a1eb", + "metadata": {}, + "source": [ + "## Executing SQL queries\n", + "\n", + "To execute a SQL query on our database with Python, we use the `read_sql_query` function from the `pandas` library which :\n", + "* executes the query\n", + "* returns the result table\n", + "To display this table, you just have to do a `display`.\n", + "\n", + "Here is an example, which executes the query `select * from wells` and displays the result:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0349a05d-05a0-4475-95e5-29b6c40187b1", + "metadata": {}, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\"\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "1e67177d-5d7c-4c7c-87dc-411c813d283a", + "metadata": {}, + "source": [ + "
\n", + "\n", + "**Question** : modify the above query to retrieve the contents of the `streams` table." + ] + }, + { + "cell_type": "markdown", + "id": "ccc0e712-9c7b-436a-8c39-30a203e7bda9", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "de659433-88aa-4037-8e22-5733cfcd889d", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select * from streams`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "cdef6666-e9c7-4614-8d89-95ea6629d98d", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 1\n", + "**Question** : Enter the query that returns the list of the id's of the wells that completely inside Mr. Vajubhai Madha's fields\n", + "\n", + "Here is how to test your query: \n", + "1. Edit the query in the cell below and then execute the cell to see the result. \n", + "You can edit and re-execute the cell as many times as necessary. \n", + "2. Once you are satisfied with your query, you can execute the validation cell below to check if your query is correct or not. \n", + "3. Finally, you can compare your query with the solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a3e9aa00-348b-4318-b189-6198e1d76828", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "cd727d93-26be-4065-8240-b602a58feb2f", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "839e19f7-bdc8-494d-9fae-b41e3035e4a1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_topological_queries import *\n", + "question1_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "b5484b5a-a09d-40d3-b6e9-9b291f6de612", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "3248b00b-42fc-4043-849d-ce206ef3d94e", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`SELECT DISTINCT Cultivated_Fields.ID FROM Wells, Cultivated_Fields WHERE intersects(Wells.Geometry, Cultivated_Fields.Geometry) AND Cultivated_Fields.owner like 'Bhikhabhai Lakhman'`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "5ded8691-5cc7-4ffd-8e7a-8b9e68ec9b28", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Question 2\n", + "\n", + "**Question** : Enter the query that allows to create the QGIS layer hosting the agricultural fields that are in contact with a river while making certain that each selected features only appears once (use a grouping clause).\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2abb3246-aa76-4bea-bca9-d7210aa84f01", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "fd52bda4-1f3a-44c3-bc39-d2b4abfcd7c6", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b60cc09c-cbe5-4055-bdbf-dff6ef205faa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_topological_queries import *\n", + "question2_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "7308e5d5-dd56-4544-acf7-adb12ac238d5", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "74758991-7c8a-49be-8c23-d99f3505b03b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select Cultivated_Fields.Geometry from Cultivated_Fields , Streams where intersects(Streams.Geometry, Cultivated_Fields.Geometry) group by Cultivated_Fields.Geometry`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "baf8ea9d-52c7-4f09-b2da-f7a2350edb84", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 3\n", + "\n", + "**Question** : Enter the query that returns the list of features located at less than 10 meters from a primary road (of type \"double\") without using a GROUP BY clause." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "72269faf-b502-4ded-b36d-c2f1a8b238a2", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "8a4b8988-8304-45cd-9d33-5a121da3db8e", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "edd89993-4524-4260-ae45-c03dcef23817", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_topological_queries import *\n", + "question3_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "b3a41747-a321-40dd-b6c2-5501eeb0d7f4", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "e4b270be-9464-4b53-8dc9-ae0b6d45f30f", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`SELECT DISTINCT Cultivated_Fields.ID FROM Cultivated_Fields, Roads WHERE Distance(Cultivated_Fields.Geometry, Roads.Geometry) <10 and Roads.Type='double'`\n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "2181501d-9609-45db-add3-11fef5627712", + "metadata": { + "tags": [] + }, + "source": [ + "---\n", + "\n", + "## Question 4\n", + "\n", + "**Question** : Enter the query that creates a 20m wide buffer around roads (name this column geom_buffer) and display it in QGIS." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7586c84-a4a6-45b4-bc46-dc47edf04e08", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Give the query to execute\n", + "query = \"select * from wells\" # TODO: MODIFY THE QUERY HERE\n", + "\n", + "# Execute the query and get the result table\n", + "resulttable = pandas.read_sql_query(query, con)\n", + "\n", + "# Display the result table\n", + "display(resulttable)" + ] + }, + { + "cell_type": "markdown", + "id": "bbcda836-4f22-4401-b2e4-eae1b2f94289", + "metadata": {}, + "source": [ + "**Validation** : Execute the following cell to see if your query is correct or not. \n", + "Read carefully the error messages to correct your query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f6fa35d-2308-4d9f-91ad-3b849f122479", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from SQL_topological_queries import *\n", + "question4_validation(query)" + ] + }, + { + "cell_type": "markdown", + "id": "1141a355-935e-4b08-9728-f999e0a587e2", + "metadata": { + "tags": [] + }, + "source": [ + "**Solution** : Click on the three small dots \"...\" below." + ] + }, + { + "cell_type": "markdown", + "id": "4b0e5608-1b34-47ca-9e4e-eac6f9fb944b", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "source": [ + "
\n", + "\n", + "Here is the correct query:\n", + " \n", + "`select Buffer(geometry,20) as geom_buffer from Streams`\n", + " \n", + "
\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/7-Topological_spatial_queries/SQL_topological_queries.py b/7-Topological_spatial_queries/SQL_topological_queries.py new file mode 100644 index 0000000..bf1fb2e --- /dev/null +++ b/7-Topological_spatial_queries/SQL_topological_queries.py @@ -0,0 +1,165 @@ +import sqlite3 +import pandas +import hashlib +import pickle +from pandas._testing import assert_frame_equal + +""" +Utility functions +""" +def run_query(q): + con = sqlite3.connect("india.sqlite") + con.enable_load_extension(True) + con.load_extension("mod_spatialite") + return pandas.read_sql_query(q, con) + +def generate_hash(dataframe): + return hashlib.md5(pickle.dumps(dataframe)).hexdigest() + +def check_hash(dataframe, dataframehash): + return generate_hash(dataframe) == dataframehash + + +""" +Validation function for Question 1 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question1_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = 'd166c8873454a2de98a40adf81b53d76' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 2 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question2_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '8530058d9604c8cedc237700a53b6bd9' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 3 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question3_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '093be6b2f31ffe3bae30e5abea2959a7' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + + + """ +Validation function for Question 4 +=> We compare with the possible mistakes made by students +=> Then we finally compare with the correct answer + +# Possible mistakes: we check that the result obtained by the student is different from: +# * the union of 2001 and 2003 +""" +def question4_validation(studentquery): + + # Misconception: use of UNION + #query = "select name from migration where departureyear=2001 union select name from migration where departureyear=2003" + #if run_query(studentquery).equals(run_query(query)): + # print('You selected the storks who departed in 2001 AS WELL AS the storks who departed in 2003.' + # +'\nYou should get only the storks who departed BOTH in 2001 and in 2003.' + # +'\nHave you drawn a diagram on paper to determine which logical operator to use?') + + + # Default case that catches all misconceptions (including the mistakes flagged above) + # => we compare with the correct answer + try: + # Compare with the MD5 hash of the result table, which means the student can neither know the correct query nor the correct result table + # (this also means that the error message cannot help students correct their answer) + # NB: the hash of the correct result table has to be generated beforehand (using the utility function) + # generate_hash(run_query(correctquery)) + response_hash = '6ef0fa33660f3f1c0b9446931c9a2d80' + assert check_hash(run_query(studentquery), response_hash), "Incorrect query." + + except AssertionError as ae: + print("Your query didn't pass the validation tests.") + # We raise the exception here to stop execution and prevent the success message to be printed + raise ae + + print("Your query has passed the validation tests and is evaluated as correct, well done!") + diff --git a/7-Topological_spatial_queries/india.sqlite b/7-Topological_spatial_queries/india.sqlite new file mode 100644 index 0000000..f6939f0 Binary files /dev/null and b/7-Topological_spatial_queries/india.sqlite differ