{ "cells": [ { "cell_type": "markdown", "id": "fbb55442-3bb2-4349-a1cf-6a6e31223ac9", "metadata": { "tags": [] }, "source": [ "# SQL exercise – Sorting and aggregating results from an SQL query \n", "\n", "The questions in this exercise are about the syntax of SQL queries.\n", "\n", "We will use:\n", "* the QSpatiaLite stork database that we used earlier;\n", "* SQL to write queries on this database;\n", "* the programming language Python to query this database (execute the queries).\n", "\n", "
\n", "
\n", " How to use this notebook?
\n", " This notebook consists of text cells and code cells (Python). The code cells must be executed in order to see the output of the program.
To execute a cell, select it and then simply click on the \"play\" button () in the toolbar at the top of the notebook, or type shift + enter.
It is important to run the code cells in the same order as they appear in the notebook.\n", "
\n" ] }, { "cell_type": "markdown", "id": "2910727c-4ef6-4b7b-9299-fdf13b6cb97b", "metadata": {}, "source": [ "---\n", "---" ] }, { "cell_type": "markdown", "id": "584c8be5-e7f0-4958-b0c9-605e85cab93c", "metadata": { "tags": [] }, "source": [ "# Accessing the database with Python\n", "\n", "In this first part, we show you with an example how to write and execute your queries on the stork database with Python." ] }, { "cell_type": "markdown", "id": "973aa7ac-4e1e-4f22-98ff-dd7145757f0a", "metadata": { "tags": [] }, "source": [ "## Importing Python libraries\n", "To manipulate a database with Python we need the `sqlite3` and `pandas` libraries:" ] }, { "cell_type": "code", "execution_count": null, "id": "431bc147-6209-407c-9bf5-bffe7599604d", "metadata": {}, "outputs": [], "source": [ "# Importing libraries\n", "import sqlite3\n", "import pandas" ] }, { "cell_type": "markdown", "id": "4d65a6f1-b050-4a3b-a3b1-9c7bf17c7f6f", "metadata": {}, "source": [ "## Connection to the database \n", "First, a connection to the database is established:" ] }, { "cell_type": "code", "execution_count": null, "id": "75265ea6-4977-41bd-9498-043b9421b853", "metadata": {}, "outputs": [], "source": [ "# Establish the connection to the database:\n", "con = sqlite3.connect(\"stork-mooc.sqlite\")" ] }, { "cell_type": "markdown", "id": "2ae60c22-65df-413f-956f-b8009094576a", "metadata": {}, "source": [ "## Executing SQL queries\n", "\n", "To execute a SQL query on our database with Python, we use the `read_sql_query` function from the `pandas` library which :\n", "* executes the query\n", "* returns the result table\n", "To display this table, you just have to do a `display`.\n", "\n", "Here is an example, which executes the query `select * from migration` and displays the result:" ] }, { "cell_type": "code", "execution_count": null, "id": "3d1b8e64-2b57-41fa-812d-bc8f046935f3", "metadata": {}, "outputs": [], "source": [ "# Give the query to execute\n", "query = \"select * from migration\"\n", "\n", "# Execute the query and get the result table\n", "resulttable = pandas.read_sql_query(query, con)\n", "\n", "# Display the result table\n", "display(resulttable)" ] }, { "cell_type": "markdown", "id": "dc7b9759-d7fc-46e9-87b2-3821155e37b6", "metadata": {}, "source": [ "
\n", "\n", "**Question** : modify the above query to retrieve the contents of the `stork` table." ] }, { "cell_type": "markdown", "id": "90bb2c4b-04a8-4756-90c3-def44f6b2c98", "metadata": { "tags": [] }, "source": [ "**Solution** : Click on the three small dots \"...\" below." ] }, { "cell_type": "markdown", "id": "b5c5dfa1-02f1-4708-8db7-c9ec9610e9d4", "metadata": { "jupyter": { "source_hidden": true }, "tags": [] }, "source": [ "
\n", "\n", "Here is the correct query:\n", " \n", "`select * from stork`\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "0c0ae1db-9414-4647-91bf-258224cac621", "metadata": {}, "source": [ "---\n", "---" ] }, { "cell_type": "markdown", "id": "96c8744f-8111-4aaa-92d2-221208bdf724", "metadata": {}, "source": [ "# Exercises with SQL\n", "\n", "For each of the following questions, **give the SQL syntax for the requested queries** and **test your queries.** \n", "\n", "As a reminder, here is the UML diagram of our stork database. \n", "\n", "\"UML" ] }, { "cell_type": "markdown", "id": "cdc19960-8b8d-438e-93a2-af63369532ab", "metadata": {}, "source": [ "---\n", "\n", "## Question 1\n", "**Question** : Select the mean age of the storks in 2010; name the column “age_moyen”.\n", "\n", "Here is how to test your query: \n", "1. Edit the query in the cell below and then execute the cell to see the result. \n", "You can edit and re-execute the cell as many times as necessary. \n", "2. Once you are satisfied with your query, you can execute the validation cell below to check if your query is correct or not. \n", "3. Finally, you can compare your query with the solution." ] }, { "cell_type": "code", "execution_count": null, "id": "205d8ac7-cea8-4f67-959b-a6c9948d0571", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Give the query to execute\n", "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", "\n", "# Execute the query and get the result table\n", "resulttable = pandas.read_sql_query(query, con)\n", "\n", "# Display the result table\n", "display(resulttable)" ] }, { "cell_type": "markdown", "id": "3fccc96a-e289-4881-86c8-3d897e89fe0c", "metadata": {}, "source": [ "**Validation** : Execute the following cell to see if your query is correct or not. \n", "Read carefully the error messages to correct your query." ] }, { "cell_type": "code", "execution_count": null, "id": "839e19f7-bdc8-494d-9fae-b41e3035e4a1", "metadata": { "tags": [] }, "outputs": [], "source": [ "from SQL_aggregation import *\n", "question1_validation(query)" ] }, { "cell_type": "markdown", "id": "37e139f8-003b-4033-85d0-6432fcce10d5", "metadata": { "tags": [] }, "source": [ "**Solution** : Click on the three small dots \"...\" below." ] }, { "cell_type": "markdown", "id": "3248b00b-42fc-4043-849d-ce206ef3d94e", "metadata": { "jupyter": { "source_hidden": true }, "tags": [] }, "source": [ "
\n", "\n", "Here is the correct query:\n", " \n", "`SELECT avg(2010-Stork.yearofbirth) AS age_moyen FROM Stork`\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "5ded8691-5cc7-4ffd-8e7a-8b9e68ec9b28", "metadata": { "tags": [] }, "source": [ "---\n", "\n", "## Question 2\n", "\n", "**Question** : For all years where at least one stork is born, list the number born; name the column “num_stork”." ] }, { "cell_type": "code", "execution_count": null, "id": "5f6ce632-102f-481e-a2af-25a6aecdc889", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Give the query to execute\n", "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", "\n", "# Execute the query and get the result table\n", "resulttable = pandas.read_sql_query(query, con)\n", "\n", "# Display the result table\n", "display(resulttable)" ] }, { "cell_type": "markdown", "id": "edaa1f70-4d8a-4cae-931a-ada4760d354c", "metadata": {}, "source": [ "**Validation** : Execute the following cell to see if your query is correct or not. \n", "Read carefully the error messages to correct your query." ] }, { "cell_type": "code", "execution_count": null, "id": "b60cc09c-cbe5-4055-bdbf-dff6ef205faa", "metadata": { "tags": [] }, "outputs": [], "source": [ "from SQL_aggregation import *\n", "question2_validation(query)" ] }, { "cell_type": "markdown", "id": "413ba9bd-1a59-455d-a430-ce1e77962602", "metadata": { "tags": [] }, "source": [ "**Solution** : Click on the three small dots \"...\" below." ] }, { "cell_type": "markdown", "id": "74758991-7c8a-49be-8c23-d99f3505b03b", "metadata": { "jupyter": { "source_hidden": true }, "tags": [] }, "source": [ "
\n", "\n", "Here is the correct query:\n", " \n", "`SELECT Stork.yearofbirth, count(Stork.yearofbirth) As num_stork FROM Stork GROUP BY Stork.yearofbirth`\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "baf8ea9d-52c7-4f09-b2da-f7a2350edb84", "metadata": { "tags": [] }, "source": [ "---\n", "\n", "## Question 3\n", "\n", "**Question** : Give the name and ID of all sponsors as well as how many storks they sponsor (use the JOIN clause)." ] }, { "cell_type": "code", "execution_count": null, "id": "6b091e2e-1af8-4cc0-bb1c-9ffa05708438", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Give the query to execute\n", "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", "\n", "# Execute the query and get the result table\n", "resulttable = pandas.read_sql_query(query, con)\n", "\n", "# Display the result table\n", "display(resulttable)" ] }, { "cell_type": "markdown", "id": "33b11759-fdd6-4f55-b03f-b3a2004ff564", "metadata": {}, "source": [ "**Validation** : Execute the following cell to see if your query is correct or not. \n", "Read carefully the error messages to correct your query." ] }, { "cell_type": "code", "execution_count": null, "id": "edd89993-4524-4260-ae45-c03dcef23817", "metadata": { "tags": [] }, "outputs": [], "source": [ "from SQL_aggregation import *\n", "question3_validation(query)" ] }, { "cell_type": "markdown", "id": "47f1675f-d9ab-4792-aca1-c3b61fd16f4f", "metadata": { "tags": [] }, "source": [ "**Solution** : Click on the three small dots \"...\" below." ] }, { "cell_type": "markdown", "id": "e4b270be-9464-4b53-8dc9-ae0b6d45f30f", "metadata": { "jupyter": { "source_hidden": true }, "tags": [] }, "source": [ "
\n", "\n", "Here is the correct query:\n", " \n", "`select member.id, member.name, count(*) from Member left join MemberStork on Member.id=MemberStork.memberid group by Member.id, Member.name`\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "2181501d-9609-45db-add3-11fef5627712", "metadata": { "tags": [] }, "source": [ "---\n", "\n", "## Question 4\n", "\n", "**Question** : Give the name and ID of all sponsors who sponsor at least two storks (use the JOIN and HAVING clauses)." ] }, { "cell_type": "code", "execution_count": null, "id": "1bbb2a1b-1d4e-4d1c-8ead-26f8499bc6b3", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Give the query to execute\n", "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", "\n", "# Execute the query and get the result table\n", "resulttable = pandas.read_sql_query(query, con)\n", "\n", "# Display the result table\n", "display(resulttable)" ] }, { "cell_type": "markdown", "id": "5bb0fffd-adab-4f4c-8baa-815f6e5f1db9", "metadata": {}, "source": [ "**Validation** : Execute the following cell to see if your query is correct or not. \n", "Read carefully the error messages to correct your query." ] }, { "cell_type": "code", "execution_count": null, "id": "6f6fa35d-2308-4d9f-91ad-3b849f122479", "metadata": { "tags": [] }, "outputs": [], "source": [ "from SQL_aggregation import *\n", "question4_validation(query)" ] }, { "cell_type": "markdown", "id": "dc8a2378-8a37-4e92-888a-73d433913b45", "metadata": { "tags": [] }, "source": [ "**Solution** : Click on the three small dots \"...\" below." ] }, { "cell_type": "markdown", "id": "4b0e5608-1b34-47ca-9e4e-eac6f9fb944b", "metadata": { "jupyter": { "source_hidden": true }, "tags": [] }, "source": [ "
\n", "\n", "Here is the correct query:\n", " \n", "`SELECT Member.id, Member.name, count(MemberStork.name) FROM Member Left join MemberStork on Member.id = MemberStork.memberid Group by Member.id, Member.name Having count(MemberStork.name) >= 2`\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "43863949-1e0d-4dee-9f08-f560700677a5", "metadata": { "tags": [] }, "source": [ "---\n", "\n", "## Question 5\n", "\n", "**Question** : Select the names of all storks and order them from oldest to youngest (but do not return the year of birth column)." ] }, { "cell_type": "code", "execution_count": null, "id": "c1fe8368-5515-4846-bb3f-7dec58827cdb", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Give the query to execute\n", "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", "\n", "# Execute the query and get the result table\n", "resulttable = pandas.read_sql_query(query, con)\n", "\n", "# Display the result table\n", "display(resulttable)" ] }, { "cell_type": "markdown", "id": "f5a2065f-e7f4-4577-aa04-8729604e3572", "metadata": {}, "source": [ "**Validation** : Execute the following cell to see if your query is correct or not. \n", "Read carefully the error messages to correct your query." ] }, { "cell_type": "code", "execution_count": null, "id": "4aaf1326-f94a-4527-b334-266e5989238f", "metadata": { "tags": [] }, "outputs": [], "source": [ "from SQL_aggregation import *\n", "question5_validation(query)" ] }, { "cell_type": "markdown", "id": "00956dcf-2a10-41c1-9aad-d08166d836ae", "metadata": { "tags": [] }, "source": [ "**Solution** : Click on the three small dots \"...\" below." ] }, { "cell_type": "markdown", "id": "e3030f35-7b38-4be0-852b-b1dff7635117", "metadata": { "jupyter": { "source_hidden": true }, "tags": [] }, "source": [ "
\n", "\n", "Here is the correct query:\n", " \n", "`SELECT Stork.name FROM Stork ORDER by Stork.yearofbirth asc`\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "4c55e7b3-2ab9-46c4-9f88-c9366b3dcd13", "metadata": { "tags": [] }, "source": [ "---\n", "\n", "## Question 6\n", "\n", "**Question** : Return the names of the three oldest storks (use the LIMIT clause)." ] }, { "cell_type": "code", "execution_count": null, "id": "689ba410-4c24-4e0c-ba5d-bf51b362a865", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Give the query to execute\n", "query = \"select * from migration\" # TODO: MODIFY THE QUERY HERE\n", "\n", "# Execute the query and get the result table\n", "resulttable = pandas.read_sql_query(query, con)\n", "\n", "# Display the result table\n", "display(resulttable)" ] }, { "cell_type": "markdown", "id": "85879166-2e4c-4a6e-be7a-e819f6bae96a", "metadata": {}, "source": [ "**Validation** : Execute the following cell to see if your query is correct or not. \n", "Read carefully the error messages to correct your query." ] }, { "cell_type": "code", "execution_count": null, "id": "3551772d-e74e-4d39-ba20-c1712dac29e3", "metadata": { "tags": [] }, "outputs": [], "source": [ "from SQL_aggregation import *\n", "question6_validation(query)" ] }, { "cell_type": "markdown", "id": "daeceac3-3124-492a-b984-682ce74224d1", "metadata": { "tags": [] }, "source": [ "**Solution** : Click on the three small dots \"...\" below." ] }, { "cell_type": "markdown", "id": "720c7da3-26e5-415d-932e-9e0f808c6f51", "metadata": { "jupyter": { "source_hidden": true }, "tags": [] }, "source": [ "
\n", "\n", "Here is the correct query:\n", " \n", "`SELECT Stork.name FROM Stork ORDER by Stork.yearofbirth asc LIMIT 3`\n", " \n", "
\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }