{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"border:1px solid black; padding:10px 10px;\">\n",
    "    <strong>Jupyter Notebooks for Teaching and Learning</strong><br/>\n",
    "    C. Hardebolle, P. Jermann, R. Tormey, CC BY-NC-SA 4.0 Int.<br/>\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-size:xx-large;\">Introduction to hypothesis testing</p>\n",
    "\n",
    "An important part of the scientific process is to make hypotheses about the world or about the results of experiments. These hypotheses need then to be checked by collecting evidence and making comparisons. Hypothesis testing is a step in this process where statistical tools are used to test hypotheses using data.\n",
    "\n",
    "**This notebook is designed for you to learn**:\n",
    "* How to distinguish between \"population\" datasets and \"sample\" datasets when dealing with experimental data\n",
    "* How to compare a sample to a population, test a hypothesis using a statistical test called the \"t-test\" and interpret its results\n",
    "* How to use Python scripts to make statistical analyses on a dataset\n",
    "\n",
    "In the following, we will use an example dataset representing series of measurements on a type of flower called Iris."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "\n",
    "<div  style=\"width:300px;float:right;margin-left:15px;\">\n",
    "    <img src=\"figs/iris-virginica.jpg\" alt=\"iris virginica\"/>\n",
    "\n",
    "###### Iris Virginica (Credit: Frank Mayfield CC BY-SA 2.0)\n",
    "\n",
    "</div>\n",
    "\n",
    "In 1935, an american botanist called Edgar Anderson worked on quantifying the morphologic variation of Iris flowers of three related species, Iris Setosa, Iris Virginica and Iris Versicolor [[1]](#Bibliography). He realized a series of measures of the petal length, petal width, sepal length, sepal width and species.\n",
    "Based on the combination of these four features, a British statistician and biologist named Ronald Fisher developed a model to distinguish the species from each other [[2]](#Bibliography).\n",
    "\n",
    "## Question\n",
    "\n",
    "A recent series of measurements has been carried out at the [Iris Garden of the Vullierens Castle](https://chateauvullierens.ch/en/) near Lausanne, on a sample of 50 flowers of the Iris Virginica species. \n",
    "**How similar (or different) is the Iris sample from the Vullierens Castle compared to the Iris Virginica population documented by Edgar Anderson?**\n",
    "\n",
    "## Instructions\n",
    "\n",
    "This notebook will guide you in the use of Python tools for analyzing this experimental dataset and perform statistical tests which are widely used in hypothesis testing. \n",
    "It includes:\n",
    "* **explanations to read** about how to analyze experimental data to answer a research question,\n",
    "* **code to execute** to illustrate how to perform data analysis using Python.\n",
    "* **questions** to help you think about what you learn along the way.\n",
    "\n",
    "\n",
    "**Solutions** of all the questions are available [in this file](./solution/StatisticsNotebook-solution.ipynb), we recommend you to **check your answer** after each question, before moving to the next piece of content."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"border:1px solid red; padding:10px 10px;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">How to use this notebook?</span><br/>\n",
    "    <ul>\n",
    "        <li>To <strong>execute</strong> the code in this notebook, simply click on the cell containing the code and then click on the \"play\" button (<span style=\"font: bold 12px/30px Arial, serif;\">&#9658;</span>) in the tool bar just above the notebook, or type <code>shift + enter</code>.<br/>It is important to execute the code cells in their order of appearance in the notebook.</li>\n",
    "        <li>You can <strong>change the content</strong> of all the code cells of this notebook, and also <strong>add new cells</strong> to the notebook by clicking on the \"plus\" button (<span style=\"font: bold 12px/30px Arial, serif;\">+</span>) in the tool bar just above the notebook.<br/>\n",
    "        By default, cells you add to the notebook are made to contain code.<br/>\n",
    "        If you want a new cell to contain text, select \"Markdown\" in the drop down menu in the same tool bar.</li>\n",
    "    </ul>\n",
    "</div>\n",
    "<br/>\n",
    "\n",
    "While using the notebook, you can also **take notes on a piece of paper** if you feel this is helpful.\n",
    "\n",
    "&nbsp;\n",
    "\n",
    "\n",
    "--- "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting started"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python tools for stats\n",
    "Python comes with a number of libraries for processing data and computing statistics.\n",
    "To use these tool you first have to load them using the `import` keyword.  \n",
    "The role of the code cell just below is to load the tools that we use in the rest of the notebook. It is important to execute this cell *prior to executing any other cell in the notebook*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plotting and display tools\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "plt.style.use('seaborn-whitegrid') # global style for plotting\n",
    "\n",
    "from IPython.display import display, set_matplotlib_formats\n",
    "set_matplotlib_formats('svg') # vector format for graphs\n",
    "\n",
    "# data computation tools\n",
    "import numpy as np \n",
    "import pandas as pan\n",
    "import math\n",
    "\n",
    "# statistics tools\n",
    "import scipy.stats as stats\n",
    "from lib.dataanalysis import *  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data available on the Anderson population\n",
    "\n",
    "Anderson has published summary statistics of his dataset.  \n",
    "You have the **mean petal length of the Iris Virginica species** documented by Anderson: $\\mu = 5.552$ cm, which we define in the code below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5.552"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define mu as mean petal length of Iris Virginica species from Anderson\n",
    "mu = 5.552\n",
    "\n",
    "# Display mu\n",
    "mu"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    What does the first line of code above do? And what is the role of the second line of code?<br/>\n",
    "    How would you do to define another value in the code, for instance the mean petal length of Iris Versicolor $\\mu_{versicolor}= 4.26$ cm?<br/>\n",
    "    Type your code using the cell below and execute it to test the result. \n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can check your answer by clicking on the \"...\" below.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define mu_versicolor here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    The first line of code defines a variable called <code>mu</code> and sets its value to <code>5.552</code>.<br/>\n",
    "    The role of the second line of code is to display the value of <code>mu</code><br/>\n",
    "    Based on the same model, below is the code to define <code>mu_versicolor</code> with a value of <code>4.26</code> and display it. \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4.26"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define mu_versicolor here\n",
    "mu_versicolor = 4.26\n",
    "\n",
    "# Display beta\n",
    "mu_versicolor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data available on the Vullierens sample\n",
    "\n",
    "You have the raw data collected on the petal length and petal width of the Vullierens sample, which is stored in the file `iris-sample-vullierens.csv` that you can see in the file explorer in the left pane.  \n",
    "If you double click on the file it will open in a new tab and you can look at what is inside.\n",
    "\n",
    "Now to analyze the data using Python you have to read the file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>petal_length</th>\n",
       "      <th>petal_width</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5.090981</td>\n",
       "      <td>1.787443</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5.224431</td>\n",
       "      <td>2.259538</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7.251620</td>\n",
       "      <td>2.055940</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5.607932</td>\n",
       "      <td>2.311074</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>6.118801</td>\n",
       "      <td>1.997534</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   petal_length  petal_width\n",
       "0      5.090981     1.787443\n",
       "1      5.224431     2.259538\n",
       "2      7.251620     2.055940\n",
       "3      5.607932     2.311074\n",
       "4      6.118801     1.997534"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read the Vullierens sample data from the CSV file\n",
    "sample_data = pan.read_csv('iris-sample-vullierens.csv')\n",
    "\n",
    "# Display the first few lines of the dataset\n",
    "sample_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After reading the file, its content is stored in the variable `sample_data`, which is a kind of table. The output above shows us an extract of the table, limited to the first 5 lines. We see above that each line of the table is given an index number to identify it. We also see that, appart from the index, the table contains two columns, called `\"petal_length\"` and `\"petal_width\"`, which contains all the measurements made on the Vullierens Irises.\n",
    "\n",
    "To get the complete list of all the values stored in one specific column such as `\"petal_length\"`, you can use the following syntax: `sample_data[\"petal_length\"]`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     5.090981\n",
       "1     5.224431\n",
       "2     7.251620\n",
       "3     5.607932\n",
       "4     6.118801\n",
       "5     6.352507\n",
       "6     4.896926\n",
       "7     5.220964\n",
       "8     6.235352\n",
       "9     6.200244\n",
       "10    5.422812\n",
       "11    5.296983\n",
       "12    4.694441\n",
       "13    5.911687\n",
       "14    5.958683\n",
       "15    5.764169\n",
       "16    6.035653\n",
       "17    6.848299\n",
       "18    6.286982\n",
       "19    5.117292\n",
       "20    4.918408\n",
       "21    5.663514\n",
       "22    6.056574\n",
       "23    6.075641\n",
       "24    5.619982\n",
       "25    6.091000\n",
       "26    5.621478\n",
       "27    5.207927\n",
       "28    5.410302\n",
       "29    5.714093\n",
       "30    5.601681\n",
       "31    5.706329\n",
       "32    5.536061\n",
       "33    5.742188\n",
       "34    5.496693\n",
       "35    5.520262\n",
       "36    4.736357\n",
       "37    5.445666\n",
       "38    5.818557\n",
       "39    6.115245\n",
       "40    6.010444\n",
       "41    5.692231\n",
       "42    5.477746\n",
       "43    5.620406\n",
       "44    5.936960\n",
       "45    6.194876\n",
       "46    6.349760\n",
       "47    4.781601\n",
       "48    5.692977\n",
       "49    6.260550\n",
       "Name: petal_length, dtype: float64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# All values stored in the \"petal_length\" column of the \"sample_data\" table\n",
    "sample_data[\"petal_length\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    How would you access the data stored in the other column of this table, named <code>\"petal_width\"</code>?<br/>\n",
    "    Type and test your code using the cell below.\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can check your answer by clicking on the \"...\" below.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Access the values stored in the \"petal_width\" column of the \"sample_data\" table\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    Below is the code to access the data stored in the <code>\"petal_width\"</code> column of the table: we simply change the name of the column we want to access. \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     1.787443\n",
       "1     2.259538\n",
       "2     2.055940\n",
       "3     2.311074\n",
       "4     1.997534\n",
       "5     2.086726\n",
       "6     2.173142\n",
       "7     2.633884\n",
       "8     1.991561\n",
       "9     2.046800\n",
       "10    2.241896\n",
       "11    1.792723\n",
       "12    1.664687\n",
       "13    2.337676\n",
       "14    2.060855\n",
       "15    2.323527\n",
       "16    1.792844\n",
       "17    1.704919\n",
       "18    2.105049\n",
       "19    1.822861\n",
       "20    2.160682\n",
       "21    1.870989\n",
       "22    1.932335\n",
       "23    2.265946\n",
       "24    2.180749\n",
       "25    1.613862\n",
       "26    2.010236\n",
       "27    2.251614\n",
       "28    2.079616\n",
       "29    2.264294\n",
       "30    1.624387\n",
       "31    2.113045\n",
       "32    2.461176\n",
       "33    2.025708\n",
       "34    2.001345\n",
       "35    1.716347\n",
       "36    1.731154\n",
       "37    1.897109\n",
       "38    2.023749\n",
       "39    2.093593\n",
       "40    1.494173\n",
       "41    2.264155\n",
       "42    1.936023\n",
       "43    1.779750\n",
       "44    2.212744\n",
       "45    2.294535\n",
       "46    2.147301\n",
       "47    1.484636\n",
       "48    1.961032\n",
       "49    1.979474\n",
       "Name: petal_width, dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access the values stored in the \"petal_width\" column of the \"sample_data\" table\n",
    "sample_data[\"petal_width\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&nbsp;\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# First look at the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Descriptive statistics\n",
    "\n",
    "A first important step in analyzing data is to get an idea of its basic characteristics using **descriptive statistics** such as the **mean** (i.e. the average value or \"moyenne\" in French) and the **standard deviation** (\"écart-type\" in French, generally abreviated <em>std</em> in English). \n",
    "So let's compute some simple descriptive statistics on the Vullierens sample data. The `describe()` function gives us right away a number of useful descriptive statistics for all the columns in our data table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>petal_length</th>\n",
       "      <th>petal_width</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>50.000000</td>\n",
       "      <td>50.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>5.713045</td>\n",
       "      <td>2.021249</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>0.518940</td>\n",
       "      <td>0.248203</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>4.694441</td>\n",
       "      <td>1.484636</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>5.428526</td>\n",
       "      <td>1.834893</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>5.692604</td>\n",
       "      <td>2.036254</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>6.070874</td>\n",
       "      <td>2.204745</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>7.251620</td>\n",
       "      <td>2.633884</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       petal_length  petal_width\n",
       "count     50.000000    50.000000\n",
       "mean       5.713045     2.021249\n",
       "std        0.518940     0.248203\n",
       "min        4.694441     1.484636\n",
       "25%        5.428526     1.834893\n",
       "50%        5.692604     2.036254\n",
       "75%        6.070874     2.204745\n",
       "max        7.251620     2.633884"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Compute the descriptive stats\n",
    "sample_stats = sample_data.describe()\n",
    "\n",
    "# Display the result\n",
    "sample_stats"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    From the table above, what is the mean value of the petal length in the Vullierens sample?<br/>\n",
    "    And the standard deviation (<em>std</em>) of the petal length in the Vullierens sample?\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can check your answer by clicking on the \"...\" below.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    From the table above, we can read in the first column, second line that the mean value of the petal length of the Vullierens sample is <code>5.713045 cm</code>.<br/>\n",
    "    We can read in the first column, third line that the standard deviation of the petal length is <code>0.518940 cm</code>.\n",
    "</div>"
   ]
  },
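  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `mean` and `std` rows of the descriptive statistics can also be reproduced by hand, which helps to see what these numbers represent. The cell below is a minimal sketch, not part of the original analysis: it assumes that `describe()` reports the *sample* standard deviation, i.e. the one that divides by $n-1$ (the default of the pandas `std()` method), and it uses only the first five petal lengths displayed earlier, stored in a variable called `first_values` (a name introduced here for illustration).\n",
    "\n",
    "$$\\bar{x} = \\frac{1}{n}\\sum_{i=1}^{n} x_i \\qquad s = \\sqrt{\\frac{1}{n-1}\\sum_{i=1}^{n}(x_i - \\bar{x})^2}$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustration only: first five petal lengths (cm) copied from the output of sample_data.head()\n",
    "import numpy as np\n",
    "import pandas as pan\n",
    "\n",
    "first_values = pan.Series([5.090981, 5.224431, 7.251620, 5.607932, 6.118801])\n",
    "n = len(first_values)\n",
    "\n",
    "# Mean: sum of the values divided by their number\n",
    "mean_by_hand = first_values.sum() / n\n",
    "\n",
    "# Sample standard deviation: squared deviations from the mean, divided by n - 1\n",
    "std_by_hand = np.sqrt(((first_values - mean_by_hand) ** 2).sum() / (n - 1))\n",
    "\n",
    "# Compare with the values computed by pandas (std() uses n - 1 by default)\n",
    "print(\"mean:\", mean_by_hand, \"vs pandas:\", first_values.mean())\n",
    "print(\"std: \", std_by_hand, \"vs pandas:\", first_values.std())"
   ]
  },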
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can access individual elements of the `sample_stats` table using the corresponding names for the line and column of the value.  \n",
    "The following cell illustrates how to get the **sample size** (named `count` in the table above):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50.0"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Extract the sample mean from the descriptive stats\n",
    "sample_size = sample_stats.loc[\"count\",\"petal_length\"]\n",
    "\n",
    "# Display the result\n",
    "sample_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another interesting information to extract from these descriptive statistics is the **mean value of the petal length** in the sample:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5.713045387181936"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Extract the sample mean of the petal length from the descriptive stats\n",
    "sample_mean = sample_stats.loc[\"mean\",\"petal_length\"]\n",
    "\n",
    "# Display the result\n",
    "sample_mean"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    How could you access the value of the standard deviation of the petal length in the <code>sample_stats</code> table?<br/>\n",
    "    Type and test your code using the cell below.\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can check your answer by clicking on the \"...\" below.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract the sample standard deviation of the petal length from the descriptive stats\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    Below is the code to access the value of the standard deviation of the petal length in the <code>sample_stats</code> table: we use the name of the line containing the value, <code>std</code>, and we store the result in a variable called <code>sample_std</code>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "# Extract the sample standard deviation of the petal length from the descriptive stats\n",
    "sample_std = sample_stats.loc[\"std\",\"petal_length\"]\n",
    "\n",
    "# Display the result\n",
    "sample_std"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualization\n",
    "\n",
    "After having looked at simple descriptive statistics, another important step is to **visualize the data**, to better identify its characteristics.  \n",
    "Histograms are useful to visualize the [frequency distribution](https://en.wikipedia.org/wiki/Frequency_distribution) of the sample values: the horizontal axis displays intervals of the variable we are looking at, in our case the petal length, and the vertical axis indicates the number of samples in each interval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       "  \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Created with matplotlib (https://matplotlib.org/) -->\n",
       "<svg height=\"245.018125pt\" version=\"1.1\" viewBox=\"0 0 365.425 245.018125\" width=\"365.425pt\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       " <defs>\n",
       "  <style type=\"text/css\">\n",
       "*{stroke-linecap:butt;stroke-linejoin:round;}\n",
       "  </style>\n",
       " </defs>\n",
       " <g id=\"figure_1\">\n",
       "  <g id=\"patch_1\">\n",
       "   <path d=\"M -0 245.018125 \n",
       "L 365.425 245.018125 \n",
       "L 365.425 0 \n",
       "L -0 0 \n",
       "z\n",
       "\" style=\"fill:#ffffff;\"/>\n",
       "  </g>\n",
       "  <g id=\"axes_1\">\n",
       "   <g id=\"patch_2\">\n",
       "    <path d=\"M 23.425 224.64 \n",
       "L 358.225 224.64 \n",
       "L 358.225 7.2 \n",
       "L 23.425 7.2 \n",
       "z\n",
       "\" style=\"fill:#ffffff;\"/>\n",
       "   </g>\n",
       "   <g id=\"matplotlib.axis_1\">\n",
       "    <g id=\"xtick_1\">\n",
       "     <g id=\"line2d_1\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 27.402451 224.64 \n",
       "L 27.402451 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_2\"/>\n",
       "     <g id=\"text_1\">\n",
       "      <!-- 4.6 -->\n",
       "      <defs>\n",
       "       <path d=\"M 37.796875 64.3125 \n",
       "L 12.890625 25.390625 \n",
       "L 37.796875 25.390625 \n",
       "z\n",
       "M 35.203125 72.90625 \n",
       "L 47.609375 72.90625 \n",
       "L 47.609375 25.390625 \n",
       "L 58.015625 25.390625 \n",
       "L 58.015625 17.1875 \n",
       "L 47.609375 17.1875 \n",
       "L 47.609375 0 \n",
       "L 37.796875 0 \n",
       "L 37.796875 17.1875 \n",
       "L 4.890625 17.1875 \n",
       "L 4.890625 26.703125 \n",
       "z\n",
       "\" id=\"DejaVuSans-52\"/>\n",
       "       <path d=\"M 10.6875 12.40625 \n",
       "L 21 12.40625 \n",
       "L 21 0 \n",
       "L 10.6875 0 \n",
       "z\n",
       "\" id=\"DejaVuSans-46\"/>\n",
       "       <path d=\"M 33.015625 40.375 \n",
       "Q 26.375 40.375 22.484375 35.828125 \n",
       "Q 18.609375 31.296875 18.609375 23.390625 \n",
       "Q 18.609375 15.53125 22.484375 10.953125 \n",
       "Q 26.375 6.390625 33.015625 6.390625 \n",
       "Q 39.65625 6.390625 43.53125 10.953125 \n",
       "Q 47.40625 15.53125 47.40625 23.390625 \n",
       "Q 47.40625 31.296875 43.53125 35.828125 \n",
       "Q 39.65625 40.375 33.015625 40.375 \n",
       "z\n",
       "M 52.59375 71.296875 \n",
       "L 52.59375 62.3125 \n",
       "Q 48.875 64.0625 45.09375 64.984375 \n",
       "Q 41.3125 65.921875 37.59375 65.921875 \n",
       "Q 27.828125 65.921875 22.671875 59.328125 \n",
       "Q 17.53125 52.734375 16.796875 39.40625 \n",
       "Q 19.671875 43.65625 24.015625 45.921875 \n",
       "Q 28.375 48.1875 33.59375 48.1875 \n",
       "Q 44.578125 48.1875 50.953125 41.515625 \n",
       "Q 57.328125 34.859375 57.328125 23.390625 \n",
       "Q 57.328125 12.15625 50.6875 5.359375 \n",
       "Q 44.046875 -1.421875 33.015625 -1.421875 \n",
       "Q 20.359375 -1.421875 13.671875 8.265625 \n",
       "Q 6.984375 17.96875 6.984375 36.375 \n",
       "Q 6.984375 53.65625 15.1875 63.9375 \n",
       "Q 23.390625 74.21875 37.203125 74.21875 \n",
       "Q 40.921875 74.21875 44.703125 73.484375 \n",
       "Q 48.484375 72.75 52.59375 71.296875 \n",
       "z\n",
       "\" id=\"DejaVuSans-54\"/>\n",
       "      </defs>\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(19.450888 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-52\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-54\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_2\">\n",
       "     <g id=\"line2d_3\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 51.207099 224.64 \n",
       "L 51.207099 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_4\"/>\n",
       "     <g id=\"text_2\">\n",
       "      <!-- 4.8 -->\n",
       "      <defs>\n",
       "       <path d=\"M 31.78125 34.625 \n",
       "Q 24.75 34.625 20.71875 30.859375 \n",
       "Q 16.703125 27.09375 16.703125 20.515625 \n",
       "Q 16.703125 13.921875 20.71875 10.15625 \n",
       "Q 24.75 6.390625 31.78125 6.390625 \n",
       "Q 38.8125 6.390625 42.859375 10.171875 \n",
       "Q 46.921875 13.96875 46.921875 20.515625 \n",
       "Q 46.921875 27.09375 42.890625 30.859375 \n",
       "Q 38.875 34.625 31.78125 34.625 \n",
       "z\n",
       "M 21.921875 38.8125 \n",
       "Q 15.578125 40.375 12.03125 44.71875 \n",
       "Q 8.5 49.078125 8.5 55.328125 \n",
       "Q 8.5 64.0625 14.71875 69.140625 \n",
       "Q 20.953125 74.21875 31.78125 74.21875 \n",
       "Q 42.671875 74.21875 48.875 69.140625 \n",
       "Q 55.078125 64.0625 55.078125 55.328125 \n",
       "Q 55.078125 49.078125 51.53125 44.71875 \n",
       "Q 48 40.375 41.703125 38.8125 \n",
       "Q 48.828125 37.15625 52.796875 32.3125 \n",
       "Q 56.78125 27.484375 56.78125 20.515625 \n",
       "Q 56.78125 9.90625 50.3125 4.234375 \n",
       "Q 43.84375 -1.421875 31.78125 -1.421875 \n",
       "Q 19.734375 -1.421875 13.25 4.234375 \n",
       "Q 6.78125 9.90625 6.78125 20.515625 \n",
       "Q 6.78125 27.484375 10.78125 32.3125 \n",
       "Q 14.796875 37.15625 21.921875 38.8125 \n",
       "z\n",
       "M 18.3125 54.390625 \n",
       "Q 18.3125 48.734375 21.84375 45.5625 \n",
       "Q 25.390625 42.390625 31.78125 42.390625 \n",
       "Q 38.140625 42.390625 41.71875 45.5625 \n",
       "Q 45.3125 48.734375 45.3125 54.390625 \n",
       "Q 45.3125 60.0625 41.71875 63.234375 \n",
       "Q 38.140625 66.40625 31.78125 66.40625 \n",
       "Q 25.390625 66.40625 21.84375 63.234375 \n",
       "Q 18.3125 60.0625 18.3125 54.390625 \n",
       "z\n",
       "\" id=\"DejaVuSans-56\"/>\n",
       "      </defs>\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(43.255537 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-52\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-56\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_3\">\n",
       "     <g id=\"line2d_5\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 75.011747 224.64 \n",
       "L 75.011747 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_6\"/>\n",
       "     <g id=\"text_3\">\n",
       "      <!-- 5.0 -->\n",
       "      <defs>\n",
       "       <path d=\"M 10.796875 72.90625 \n",
       "L 49.515625 72.90625 \n",
       "L 49.515625 64.59375 \n",
       "L 19.828125 64.59375 \n",
       "L 19.828125 46.734375 \n",
       "Q 21.96875 47.46875 24.109375 47.828125 \n",
       "Q 26.265625 48.1875 28.421875 48.1875 \n",
       "Q 40.625 48.1875 47.75 41.5 \n",
       "Q 54.890625 34.8125 54.890625 23.390625 \n",
       "Q 54.890625 11.625 47.5625 5.09375 \n",
       "Q 40.234375 -1.421875 26.90625 -1.421875 \n",
       "Q 22.3125 -1.421875 17.546875 -0.640625 \n",
       "Q 12.796875 0.140625 7.71875 1.703125 \n",
       "L 7.71875 11.625 \n",
       "Q 12.109375 9.234375 16.796875 8.0625 \n",
       "Q 21.484375 6.890625 26.703125 6.890625 \n",
       "Q 35.15625 6.890625 40.078125 11.328125 \n",
       "Q 45.015625 15.765625 45.015625 23.390625 \n",
       "Q 45.015625 31 40.078125 35.4375 \n",
       "Q 35.15625 39.890625 26.703125 39.890625 \n",
       "Q 22.75 39.890625 18.8125 39.015625 \n",
       "Q 14.890625 38.140625 10.796875 36.28125 \n",
       "z\n",
       "\" id=\"DejaVuSans-53\"/>\n",
       "       <path d=\"M 31.78125 66.40625 \n",
       "Q 24.171875 66.40625 20.328125 58.90625 \n",
       "Q 16.5 51.421875 16.5 36.375 \n",
       "Q 16.5 21.390625 20.328125 13.890625 \n",
       "Q 24.171875 6.390625 31.78125 6.390625 \n",
       "Q 39.453125 6.390625 43.28125 13.890625 \n",
       "Q 47.125 21.390625 47.125 36.375 \n",
       "Q 47.125 51.421875 43.28125 58.90625 \n",
       "Q 39.453125 66.40625 31.78125 66.40625 \n",
       "z\n",
       "M 31.78125 74.21875 \n",
       "Q 44.046875 74.21875 50.515625 64.515625 \n",
       "Q 56.984375 54.828125 56.984375 36.375 \n",
       "Q 56.984375 17.96875 50.515625 8.265625 \n",
       "Q 44.046875 -1.421875 31.78125 -1.421875 \n",
       "Q 19.53125 -1.421875 13.0625 8.265625 \n",
       "Q 6.59375 17.96875 6.59375 36.375 \n",
       "Q 6.59375 54.828125 13.0625 64.515625 \n",
       "Q 19.53125 74.21875 31.78125 74.21875 \n",
       "z\n",
       "\" id=\"DejaVuSans-48\"/>\n",
       "      </defs>\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(67.060185 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-53\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-48\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_4\">\n",
       "     <g id=\"line2d_7\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 98.816396 224.64 \n",
       "L 98.816396 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_8\"/>\n",
       "     <g id=\"text_4\">\n",
       "      <!-- 5.2 -->\n",
       "      <defs>\n",
       "       <path d=\"M 19.1875 8.296875 \n",
       "L 53.609375 8.296875 \n",
       "L 53.609375 0 \n",
       "L 7.328125 0 \n",
       "L 7.328125 8.296875 \n",
       "Q 12.9375 14.109375 22.625 23.890625 \n",
       "Q 32.328125 33.6875 34.8125 36.53125 \n",
       "Q 39.546875 41.84375 41.421875 45.53125 \n",
       "Q 43.3125 49.21875 43.3125 52.78125 \n",
       "Q 43.3125 58.59375 39.234375 62.25 \n",
       "Q 35.15625 65.921875 28.609375 65.921875 \n",
       "Q 23.96875 65.921875 18.8125 64.3125 \n",
       "Q 13.671875 62.703125 7.8125 59.421875 \n",
       "L 7.8125 69.390625 \n",
       "Q 13.765625 71.78125 18.9375 73 \n",
       "Q 24.125 74.21875 28.421875 74.21875 \n",
       "Q 39.75 74.21875 46.484375 68.546875 \n",
       "Q 53.21875 62.890625 53.21875 53.421875 \n",
       "Q 53.21875 48.921875 51.53125 44.890625 \n",
       "Q 49.859375 40.875 45.40625 35.40625 \n",
       "Q 44.1875 33.984375 37.640625 27.21875 \n",
       "Q 31.109375 20.453125 19.1875 8.296875 \n",
       "z\n",
       "\" id=\"DejaVuSans-50\"/>\n",
       "      </defs>\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(90.864833 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-53\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-50\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_5\">\n",
       "     <g id=\"line2d_9\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 122.621044 224.64 \n",
       "L 122.621044 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_10\"/>\n",
       "     <g id=\"text_5\">\n",
       "      <!-- 5.4 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(114.669481 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-53\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-52\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_6\">\n",
       "     <g id=\"line2d_11\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 146.425692 224.64 \n",
       "L 146.425692 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_12\"/>\n",
       "     <g id=\"text_6\">\n",
       "      <!-- 5.6 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(138.474129 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-53\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-54\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_7\">\n",
       "     <g id=\"line2d_13\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 170.23034 224.64 \n",
       "L 170.23034 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_14\"/>\n",
       "     <g id=\"text_7\">\n",
       "      <!-- 5.8 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(162.278778 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-53\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-56\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_8\">\n",
       "     <g id=\"line2d_15\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 194.034988 224.64 \n",
       "L 194.034988 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_16\"/>\n",
       "     <g id=\"text_8\">\n",
       "      <!-- 6.0 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(186.083426 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-54\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-48\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_9\">\n",
       "     <g id=\"line2d_17\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 217.839637 224.64 \n",
       "L 217.839637 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_18\"/>\n",
       "     <g id=\"text_9\">\n",
       "      <!-- 6.2 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(209.888074 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-54\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-50\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_10\">\n",
       "     <g id=\"line2d_19\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 241.644285 224.64 \n",
       "L 241.644285 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_20\"/>\n",
       "     <g id=\"text_10\">\n",
       "      <!-- 6.4 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(233.692722 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-54\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-52\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_11\">\n",
       "     <g id=\"line2d_21\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 265.448933 224.64 \n",
       "L 265.448933 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_22\"/>\n",
       "     <g id=\"text_11\">\n",
       "      <!-- 6.6 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(257.497371 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-54\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-54\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_12\">\n",
       "     <g id=\"line2d_23\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 289.253581 224.64 \n",
       "L 289.253581 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_24\"/>\n",
       "     <g id=\"text_12\">\n",
       "      <!-- 6.8 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(281.302019 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-54\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-56\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_13\">\n",
       "     <g id=\"line2d_25\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 313.05823 224.64 \n",
       "L 313.05823 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_26\"/>\n",
       "     <g id=\"text_13\">\n",
       "      <!-- 7.0 -->\n",
       "      <defs>\n",
       "       <path d=\"M 8.203125 72.90625 \n",
       "L 55.078125 72.90625 \n",
       "L 55.078125 68.703125 \n",
       "L 28.609375 0 \n",
       "L 18.3125 0 \n",
       "L 43.21875 64.59375 \n",
       "L 8.203125 64.59375 \n",
       "z\n",
       "\" id=\"DejaVuSans-55\"/>\n",
       "      </defs>\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(305.106667 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-55\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-48\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"xtick_14\">\n",
       "     <g id=\"line2d_27\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 336.862878 224.64 \n",
       "L 336.862878 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_28\"/>\n",
       "     <g id=\"text_14\">\n",
       "      <!-- 7.2 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(328.911315 235.738437)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-55\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-46\"/>\n",
       "       <use x=\"95.410156\" xlink:href=\"#DejaVuSans-50\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "   </g>\n",
       "   <g id=\"matplotlib.axis_2\">\n",
       "    <g id=\"ytick_1\">\n",
       "     <g id=\"line2d_29\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 23.425 224.64 \n",
       "L 358.225 224.64 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_30\"/>\n",
       "     <g id=\"text_15\">\n",
       "      <!-- 0 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(13.5625 228.439219)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-48\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_2\">\n",
       "     <g id=\"line2d_31\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 23.425 195.056327 \n",
       "L 358.225 195.056327 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_32\"/>\n",
       "     <g id=\"text_16\">\n",
       "      <!-- 2 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(13.5625 198.855545)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-50\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_3\">\n",
       "     <g id=\"line2d_33\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 23.425 165.472653 \n",
       "L 358.225 165.472653 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_34\"/>\n",
       "     <g id=\"text_17\">\n",
       "      <!-- 4 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(13.5625 169.271872)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-52\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_4\">\n",
       "     <g id=\"line2d_35\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 23.425 135.88898 \n",
       "L 358.225 135.88898 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_36\"/>\n",
       "     <g id=\"text_18\">\n",
       "      <!-- 6 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(13.5625 139.688198)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-54\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_5\">\n",
       "     <g id=\"line2d_37\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 23.425 106.305306 \n",
       "L 358.225 106.305306 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_38\"/>\n",
       "     <g id=\"text_19\">\n",
       "      <!-- 8 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(13.5625 110.104525)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-56\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_6\">\n",
       "     <g id=\"line2d_39\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 23.425 76.721633 \n",
       "L 358.225 76.721633 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_40\"/>\n",
       "     <g id=\"text_20\">\n",
       "      <!-- 10 -->\n",
       "      <defs>\n",
       "       <path d=\"M 12.40625 8.296875 \n",
       "L 28.515625 8.296875 \n",
       "L 28.515625 63.921875 \n",
       "L 10.984375 60.40625 \n",
       "L 10.984375 69.390625 \n",
       "L 28.421875 72.90625 \n",
       "L 38.28125 72.90625 \n",
       "L 38.28125 8.296875 \n",
       "L 54.390625 8.296875 \n",
       "L 54.390625 0 \n",
       "L 12.40625 0 \n",
       "z\n",
       "\" id=\"DejaVuSans-49\"/>\n",
       "      </defs>\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(7.2 80.520851)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-49\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-48\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_7\">\n",
       "     <g id=\"line2d_41\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 23.425 47.137959 \n",
       "L 358.225 47.137959 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_42\"/>\n",
       "     <g id=\"text_21\">\n",
       "      <!-- 12 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(7.2 50.937178)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-49\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-50\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"ytick_8\">\n",
       "     <g id=\"line2d_43\">\n",
       "      <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 23.425 17.554286 \n",
       "L 358.225 17.554286 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:round;stroke-width:0.8;\"/>\n",
       "     </g>\n",
       "     <g id=\"line2d_44\"/>\n",
       "     <g id=\"text_22\">\n",
       "      <!-- 14 -->\n",
       "      <g style=\"fill:#262626;\" transform=\"translate(7.2 21.353504)scale(0.1 -0.1)\">\n",
       "       <use xlink:href=\"#DejaVuSans-49\"/>\n",
       "       <use x=\"63.623047\" xlink:href=\"#DejaVuSans-52\"/>\n",
       "      </g>\n",
       "     </g>\n",
       "    </g>\n",
       "   </g>\n",
       "   <g id=\"patch_3\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 38.643182 224.64 \n",
       "L 69.079545 224.64 \n",
       "L 69.079545 150.680816 \n",
       "L 38.643182 150.680816 \n",
       "z\n",
       "\" style=\"fill:#008000;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_4\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 69.079545 224.64 \n",
       "L 99.515909 224.64 \n",
       "L 99.515909 195.056327 \n",
       "L 69.079545 195.056327 \n",
       "z\n",
       "\" style=\"fill:#008000;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_5\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 99.515909 224.64 \n",
       "L 129.952273 224.64 \n",
       "L 129.952273 121.097143 \n",
       "L 99.515909 121.097143 \n",
       "z\n",
       "\" style=\"fill:#008000;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_6\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 129.952273 224.64 \n",
       "L 160.388636 224.64 \n",
       "L 160.388636 17.554286 \n",
       "L 129.952273 17.554286 \n",
       "z\n",
       "\" style=\"fill:#008000;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_7\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 160.388636 224.64 \n",
       "L 190.825 224.64 \n",
       "L 190.825 135.88898 \n",
       "L 160.388636 135.88898 \n",
       "z\n",
       "\" style=\"fill:#008000;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_8\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 190.825 224.64 \n",
       "L 221.261364 224.64 \n",
       "L 221.261364 91.513469 \n",
       "L 190.825 91.513469 \n",
       "z\n",
       "\" style=\"fill:#008000;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_9\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 221.261364 224.64 \n",
       "L 251.697727 224.64 \n",
       "L 251.697727 150.680816 \n",
       "L 221.261364 150.680816 \n",
       "z\n",
       "\" style=\"fill:#008000;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_10\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 251.697727 224.64 \n",
       "L 282.134091 224.64 \n",
       "L 282.134091 224.64 \n",
       "L 251.697727 224.64 \n",
       "z\n",
       "\" style=\"fill:#008000;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_11\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 282.134091 224.64 \n",
       "L 312.570455 224.64 \n",
       "L 312.570455 209.848163 \n",
       "L 282.134091 209.848163 \n",
       "z\n",
       "\" style=\"fill:#008000;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_12\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 312.570455 224.64 \n",
       "L 343.006818 224.64 \n",
       "L 343.006818 209.848163 \n",
       "L 312.570455 209.848163 \n",
       "z\n",
       "\" style=\"fill:#008000;\"/>\n",
       "   </g>\n",
       "   <g id=\"line2d_45\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 159.88072 224.64 \n",
       "L 159.88072 7.2 \n",
       "\" style=\"fill:none;stroke:#000000;stroke-dasharray:6.4,1.6,1,1.6;stroke-dashoffset:0;\"/>\n",
       "   </g>\n",
       "   <g id=\"line2d_46\">\n",
       "    <path clip-path=\"url(#pefbb7f0eee)\" d=\"M 140.712576 224.64 \n",
       "L 140.712576 7.2 \n",
       "\" style=\"fill:none;stroke:#000000;stroke-dasharray:1,1.65;stroke-dashoffset:0;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_13\">\n",
       "    <path d=\"M 23.425 224.64 \n",
       "L 23.425 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:square;stroke-linejoin:miter;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_14\">\n",
       "    <path d=\"M 358.225 224.64 \n",
       "L 358.225 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:square;stroke-linejoin:miter;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_15\">\n",
       "    <path d=\"M 23.425 224.64 \n",
       "L 358.225 224.64 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:square;stroke-linejoin:miter;\"/>\n",
       "   </g>\n",
       "   <g id=\"patch_16\">\n",
       "    <path d=\"M 23.425 7.2 \n",
       "L 358.225 7.2 \n",
       "\" style=\"fill:none;stroke:#cccccc;stroke-linecap:square;stroke-linejoin:miter;\"/>\n",
       "   </g>\n",
       "   <g id=\"legend_1\">\n",
       "    <g id=\"line2d_47\">\n",
       "     <path d=\"M 226.925 20.3 \n",
       "L 246.925 20.3 \n",
       "\" style=\"fill:none;stroke:#000000;stroke-dasharray:6.4,1.6,1,1.6;stroke-dashoffset:0;\"/>\n",
       "    </g>\n",
       "    <g id=\"line2d_48\"/>\n",
       "    <g id=\"text_23\">\n",
       "     <!-- sample mean $m$ -->\n",
       "     <defs>\n",
       "      <path d=\"M 44.28125 53.078125 \n",
       "L 44.28125 44.578125 \n",
       "Q 40.484375 46.53125 36.375 47.5 \n",
       "Q 32.28125 48.484375 27.875 48.484375 \n",
       "Q 21.1875 48.484375 17.84375 46.4375 \n",
       "Q 14.5 44.390625 14.5 40.28125 \n",
       "Q 14.5 37.15625 16.890625 35.375 \n",
       "Q 19.28125 33.59375 26.515625 31.984375 \n",
       "L 29.59375 31.296875 \n",
       "Q 39.15625 29.25 43.1875 25.515625 \n",
       "Q 47.21875 21.78125 47.21875 15.09375 \n",
       "Q 47.21875 7.46875 41.1875 3.015625 \n",
       "Q 35.15625 -1.421875 24.609375 -1.421875 \n",
       "Q 20.21875 -1.421875 15.453125 -0.5625 \n",
       "Q 10.6875 0.296875 5.421875 2 \n",
       "L 5.421875 11.28125 \n",
       "Q 10.40625 8.6875 15.234375 7.390625 \n",
       "Q 20.0625 6.109375 24.8125 6.109375 \n",
       "Q 31.15625 6.109375 34.5625 8.28125 \n",
       "Q 37.984375 10.453125 37.984375 14.40625 \n",
       "Q 37.984375 18.0625 35.515625 20.015625 \n",
       "Q 33.0625 21.96875 24.703125 23.78125 \n",
       "L 21.578125 24.515625 \n",
       "Q 13.234375 26.265625 9.515625 29.90625 \n",
       "Q 5.8125 33.546875 5.8125 39.890625 \n",
       "Q 5.8125 47.609375 11.28125 51.796875 \n",
       "Q 16.75 56 26.8125 56 \n",
       "Q 31.78125 56 36.171875 55.265625 \n",
       "Q 40.578125 54.546875 44.28125 53.078125 \n",
       "z\n",
       "\" id=\"DejaVuSans-115\"/>\n",
       "      <path d=\"M 34.28125 27.484375 \n",
       "Q 23.390625 27.484375 19.1875 25 \n",
       "Q 14.984375 22.515625 14.984375 16.5 \n",
       "Q 14.984375 11.71875 18.140625 8.90625 \n",
       "Q 21.296875 6.109375 26.703125 6.109375 \n",
       "Q 34.1875 6.109375 38.703125 11.40625 \n",
       "Q 43.21875 16.703125 43.21875 25.484375 \n",
       "L 43.21875 27.484375 \n",
       "z\n",
       "M 52.203125 31.203125 \n",
       "L 52.203125 0 \n",
       "L 43.21875 0 \n",
       "L 43.21875 8.296875 \n",
       "Q 40.140625 3.328125 35.546875 0.953125 \n",
       "Q 30.953125 -1.421875 24.3125 -1.421875 \n",
       "Q 15.921875 -1.421875 10.953125 3.296875 \n",
       "Q 6 8.015625 6 15.921875 \n",
       "Q 6 25.140625 12.171875 29.828125 \n",
       "Q 18.359375 34.515625 30.609375 34.515625 \n",
       "L 43.21875 34.515625 \n",
       "L 43.21875 35.40625 \n",
       "Q 43.21875 41.609375 39.140625 45 \n",
       "Q 35.0625 48.390625 27.6875 48.390625 \n",
       "Q 23 48.390625 18.546875 47.265625 \n",
       "Q 14.109375 46.140625 10.015625 43.890625 \n",
       "L 10.015625 52.203125 \n",
       "Q 14.9375 54.109375 19.578125 55.046875 \n",
       "Q 24.21875 56 28.609375 56 \n",
       "Q 40.484375 56 46.34375 49.84375 \n",
       "Q 52.203125 43.703125 52.203125 31.203125 \n",
       "z\n",
       "\" id=\"DejaVuSans-97\"/>\n",
       "      <path d=\"M 52 44.1875 \n",
       "Q 55.375 50.25 60.0625 53.125 \n",
       "Q 64.75 56 71.09375 56 \n",
       "Q 79.640625 56 84.28125 50.015625 \n",
       "Q 88.921875 44.046875 88.921875 33.015625 \n",
       "L 88.921875 0 \n",
       "L 79.890625 0 \n",
       "L 79.890625 32.71875 \n",
       "Q 79.890625 40.578125 77.09375 44.375 \n",
       "Q 74.3125 48.1875 68.609375 48.1875 \n",
       "Q 61.625 48.1875 57.5625 43.546875 \n",
       "Q 53.515625 38.921875 53.515625 30.90625 \n",
       "L 53.515625 0 \n",
       "L 44.484375 0 \n",
       "L 44.484375 32.71875 \n",
       "Q 44.484375 40.625 41.703125 44.40625 \n",
       "Q 38.921875 48.1875 33.109375 48.1875 \n",
       "Q 26.21875 48.1875 22.15625 43.53125 \n",
       "Q 18.109375 38.875 18.109375 30.90625 \n",
       "L 18.109375 0 \n",
       "L 9.078125 0 \n",
       "L 9.078125 54.6875 \n",
       "L 18.109375 54.6875 \n",
       "L 18.109375 46.1875 \n",
       "Q 21.1875 51.21875 25.484375 53.609375 \n",
       "Q 29.78125 56 35.6875 56 \n",
       "Q 41.65625 56 45.828125 52.96875 \n",
       "Q 50 49.953125 52 44.1875 \n",
       "z\n",
       "\" id=\"DejaVuSans-109\"/>\n",
       "      <path d=\"M 18.109375 8.203125 \n",
       "L 18.109375 -20.796875 \n",
       "L 9.078125 -20.796875 \n",
       "L 9.078125 54.6875 \n",
       "L 18.109375 54.6875 \n",
       "L 18.109375 46.390625 \n",
       "Q 20.953125 51.265625 25.265625 53.625 \n",
       "Q 29.59375 56 35.59375 56 \n",
       "Q 45.5625 56 51.78125 48.09375 \n",
       "Q 58.015625 40.1875 58.015625 27.296875 \n",
       "Q 58.015625 14.40625 51.78125 6.484375 \n",
       "Q 45.5625 -1.421875 35.59375 -1.421875 \n",
       "Q 29.59375 -1.421875 25.265625 0.953125 \n",
       "Q 20.953125 3.328125 18.109375 8.203125 \n",
       "z\n",
       "M 48.6875 27.296875 \n",
       "Q 48.6875 37.203125 44.609375 42.84375 \n",
       "Q 40.53125 48.484375 33.40625 48.484375 \n",
       "Q 26.265625 48.484375 22.1875 42.84375 \n",
       "Q 18.109375 37.203125 18.109375 27.296875 \n",
       "Q 18.109375 17.390625 22.1875 11.75 \n",
       "Q 26.265625 6.109375 33.40625 6.109375 \n",
       "Q 40.53125 6.109375 44.609375 11.75 \n",
       "Q 48.6875 17.390625 48.6875 27.296875 \n",
       "z\n",
       "\" id=\"DejaVuSans-112\"/>\n",
       "      <path d=\"M 9.421875 75.984375 \n",
       "L 18.40625 75.984375 \n",
       "L 18.40625 0 \n",
       "L 9.421875 0 \n",
       "z\n",
       "\" id=\"DejaVuSans-108\"/>\n",
       "      <path d=\"M 56.203125 29.59375 \n",
       "L 56.203125 25.203125 \n",
       "L 14.890625 25.203125 \n",
       "Q 15.484375 15.921875 20.484375 11.0625 \n",
       "Q 25.484375 6.203125 34.421875 6.203125 \n",
       "Q 39.59375 6.203125 44.453125 7.46875 \n",
       "Q 49.3125 8.734375 54.109375 11.28125 \n",
       "L 54.109375 2.78125 \n",
       "Q 49.265625 0.734375 44.1875 -0.34375 \n",
       "Q 39.109375 -1.421875 33.890625 -1.421875 \n",
       "Q 20.796875 -1.421875 13.15625 6.1875 \n",
       "Q 5.515625 13.8125 5.515625 26.8125 \n",
       "Q 5.515625 40.234375 12.765625 48.109375 \n",
       "Q 20.015625 56 32.328125 56 \n",
       "Q 43.359375 56 49.78125 48.890625 \n",
       "Q 56.203125 41.796875 56.203125 29.59375 \n",
       "z\n",
       "M 47.21875 32.234375 \n",
       "Q 47.125 39.59375 43.09375 43.984375 \n",
       "Q 39.0625 48.390625 32.421875 48.390625 \n",
       "Q 24.90625 48.390625 20.390625 44.140625 \n",
       "Q 15.875 39.890625 15.1875 32.171875 \n",
       "z\n",
       "\" id=\"DejaVuSans-101\"/>\n",
       "      <path id=\"DejaVuSans-32\"/>\n",
       "      <path d=\"M 54.890625 33.015625 \n",
       "L 54.890625 0 \n",
       "L 45.90625 0 \n",
       "L 45.90625 32.71875 \n",
       "Q 45.90625 40.484375 42.875 44.328125 \n",
       "Q 39.84375 48.1875 33.796875 48.1875 \n",
       "Q 26.515625 48.1875 22.3125 43.546875 \n",
       "Q 18.109375 38.921875 18.109375 30.90625 \n",
       "L 18.109375 0 \n",
       "L 9.078125 0 \n",
       "L 9.078125 54.6875 \n",
       "L 18.109375 54.6875 \n",
       "L 18.109375 46.1875 \n",
       "Q 21.34375 51.125 25.703125 53.5625 \n",
       "Q 30.078125 56 35.796875 56 \n",
       "Q 45.21875 56 50.046875 50.171875 \n",
       "Q 54.890625 44.34375 54.890625 33.015625 \n",
       "z\n",
       "\" id=\"DejaVuSans-110\"/>\n",
       "      <path d=\"M 89.796875 33.015625 \n",
       "L 83.40625 0 \n",
       "L 74.421875 0 \n",
       "L 80.71875 32.71875 \n",
       "Q 81.109375 34.8125 81.296875 36.328125 \n",
       "Q 81.5 37.84375 81.5 38.921875 \n",
       "Q 81.5 43.3125 79.046875 45.75 \n",
       "Q 76.609375 48.1875 72.21875 48.1875 \n",
       "Q 65.671875 48.1875 60.546875 43.28125 \n",
       "Q 55.421875 38.375 53.90625 30.515625 \n",
       "L 47.90625 0 \n",
       "L 38.921875 0 \n",
       "L 45.3125 32.71875 \n",
       "Q 45.703125 34.515625 45.890625 36.046875 \n",
       "Q 46.09375 37.59375 46.09375 38.8125 \n",
       "Q 46.09375 43.265625 43.65625 45.71875 \n",
       "Q 41.21875 48.1875 36.921875 48.1875 \n",
       "Q 30.28125 48.1875 25.140625 43.28125 \n",
       "Q 20.015625 38.375 18.5 30.515625 \n",
       "L 12.5 0 \n",
       "L 3.515625 0 \n",
       "L 14.203125 54.6875 \n",
       "L 23.1875 54.6875 \n",
       "L 21.484375 46.1875 \n",
       "Q 25.140625 50.984375 30.046875 53.484375 \n",
       "Q 34.96875 56 40.578125 56 \n",
       "Q 46.53125 56 50.359375 52.875 \n",
       "Q 54.203125 49.75 54.984375 44.1875 \n",
       "Q 59.078125 49.953125 64.46875 52.96875 \n",
       "Q 69.875 56 75.875 56 \n",
       "Q 82.90625 56 86.734375 51.953125 \n",
       "Q 90.578125 47.90625 90.578125 40.484375 \n",
       "Q 90.578125 38.875 90.375 36.9375 \n",
       "Q 90.1875 35.015625 89.796875 33.015625 \n",
       "z\n",
       "\" id=\"DejaVuSans-Oblique-109\"/>\n",
       "     </defs>\n",
       "     <g style=\"fill:#262626;\" transform=\"translate(254.925 23.8)scale(0.1 -0.1)\">\n",
       "      <use transform=\"translate(0 0.015625)\" xlink:href=\"#DejaVuSans-115\"/>\n",
       "      <use transform=\"translate(52.099609 0.015625)\" xlink:href=\"#DejaVuSans-97\"/>\n",
       "      <use transform=\"translate(113.378906 0.015625)\" xlink:href=\"#DejaVuSans-109\"/>\n",
       "      <use transform=\"translate(210.791016 0.015625)\" xlink:href=\"#DejaVuSans-112\"/>\n",
       "      <use transform=\"translate(274.267578 0.015625)\" xlink:href=\"#DejaVuSans-108\"/>\n",
       "      <use transform=\"translate(302.050781 0.015625)\" xlink:href=\"#DejaVuSans-101\"/>\n",
       "      <use transform=\"translate(363.574219 0.015625)\" xlink:href=\"#DejaVuSans-32\"/>\n",
       "      <use transform=\"translate(395.361328 0.015625)\" xlink:href=\"#DejaVuSans-109\"/>\n",
       "      <use transform=\"translate(492.773438 0.015625)\" xlink:href=\"#DejaVuSans-101\"/>\n",
       "      <use transform=\"translate(554.296875 0.015625)\" xlink:href=\"#DejaVuSans-97\"/>\n",
       "      <use transform=\"translate(615.576172 0.015625)\" xlink:href=\"#DejaVuSans-110\"/>\n",
       "      <use transform=\"translate(678.955078 0.015625)\" xlink:href=\"#DejaVuSans-32\"/>\n",
       "      <use transform=\"translate(710.742188 0.015625)\" xlink:href=\"#DejaVuSans-Oblique-109\"/>\n",
       "     </g>\n",
       "    </g>\n",
       "    <g id=\"line2d_49\">\n",
       "     <path d=\"M 226.925 35 \n",
       "L 246.925 35 \n",
       "\" style=\"fill:none;stroke:#000000;stroke-dasharray:1,1.65;stroke-dashoffset:0;\"/>\n",
       "    </g>\n",
       "    <g id=\"line2d_50\"/>\n",
       "    <g id=\"text_24\">\n",
       "     <!-- population mean $\\mu$ -->\n",
       "     <defs>\n",
       "      <path d=\"M 30.609375 48.390625 \n",
       "Q 23.390625 48.390625 19.1875 42.75 \n",
       "Q 14.984375 37.109375 14.984375 27.296875 \n",
       "Q 14.984375 17.484375 19.15625 11.84375 \n",
       "Q 23.34375 6.203125 30.609375 6.203125 \n",
       "Q 37.796875 6.203125 41.984375 11.859375 \n",
       "Q 46.1875 17.53125 46.1875 27.296875 \n",
       "Q 46.1875 37.015625 41.984375 42.703125 \n",
       "Q 37.796875 48.390625 30.609375 48.390625 \n",
       "z\n",
       "M 30.609375 56 \n",
       "Q 42.328125 56 49.015625 48.375 \n",
       "Q 55.71875 40.765625 55.71875 27.296875 \n",
       "Q 55.71875 13.875 49.015625 6.21875 \n",
       "Q 42.328125 -1.421875 30.609375 -1.421875 \n",
       "Q 18.84375 -1.421875 12.171875 6.21875 \n",
       "Q 5.515625 13.875 5.515625 27.296875 \n",
       "Q 5.515625 40.765625 12.171875 48.375 \n",
       "Q 18.84375 56 30.609375 56 \n",
       "z\n",
       "\" id=\"DejaVuSans-111\"/>\n",
       "      <path d=\"M 8.5 21.578125 \n",
       "L 8.5 54.6875 \n",
       "L 17.484375 54.6875 \n",
       "L 17.484375 21.921875 \n",
       "Q 17.484375 14.15625 20.5 10.265625 \n",
       "Q 23.53125 6.390625 29.59375 6.390625 \n",
       "Q 36.859375 6.390625 41.078125 11.03125 \n",
       "Q 45.3125 15.671875 45.3125 23.6875 \n",
       "L 45.3125 54.6875 \n",
       "L 54.296875 54.6875 \n",
       "L 54.296875 0 \n",
       "L 45.3125 0 \n",
       "L 45.3125 8.40625 \n",
       "Q 42.046875 3.421875 37.71875 1 \n",
       "Q 33.40625 -1.421875 27.6875 -1.421875 \n",
       "Q 18.265625 -1.421875 13.375 4.4375 \n",
       "Q 8.5 10.296875 8.5 21.578125 \n",
       "z\n",
       "M 31.109375 56 \n",
       "z\n",
       "\" id=\"DejaVuSans-117\"/>\n",
       "      <path d=\"M 18.3125 70.21875 \n",
       "L 18.3125 54.6875 \n",
       "L 36.8125 54.6875 \n",
       "L 36.8125 47.703125 \n",
       "L 18.3125 47.703125 \n",
       "L 18.3125 18.015625 \n",
       "Q 18.3125 11.328125 20.140625 9.421875 \n",
       "Q 21.96875 7.515625 27.59375 7.515625 \n",
       "L 36.8125 7.515625 \n",
       "L 36.8125 0 \n",
       "L 27.59375 0 \n",
       "Q 17.1875 0 13.234375 3.875 \n",
       "Q 9.28125 7.765625 9.28125 18.015625 \n",
       "L 9.28125 47.703125 \n",
       "L 2.6875 47.703125 \n",
       "L 2.6875 54.6875 \n",
       "L 9.28125 54.6875 \n",
       "L 9.28125 70.21875 \n",
       "z\n",
       "\" id=\"DejaVuSans-116\"/>\n",
       "      <path d=\"M 9.421875 54.6875 \n",
       "L 18.40625 54.6875 \n",
       "L 18.40625 0 \n",
       "L 9.421875 0 \n",
       "z\n",
       "M 9.421875 75.984375 \n",
       "L 18.40625 75.984375 \n",
       "L 18.40625 64.59375 \n",
       "L 9.421875 64.59375 \n",
       "z\n",
       "\" id=\"DejaVuSans-105\"/>\n",
       "      <path d=\"M -1.3125 -20.796875 \n",
       "L 13.375 54.6875 \n",
       "L 22.40625 54.6875 \n",
       "L 15.765625 20.65625 \n",
       "Q 15.578125 19.625 15.421875 18.359375 \n",
       "Q 15.28125 17.09375 15.28125 15.828125 \n",
       "Q 15.28125 11.28125 18.140625 8.828125 \n",
       "Q 21 6.390625 26.3125 6.390625 \n",
       "Q 33.546875 6.390625 37.984375 10.484375 \n",
       "Q 42.4375 14.59375 44 22.796875 \n",
       "L 50.203125 54.6875 \n",
       "L 59.1875 54.6875 \n",
       "L 51.03125 12.640625 \n",
       "Q 50.828125 11.71875 50.75 11.03125 \n",
       "Q 50.6875 10.359375 50.6875 9.8125 \n",
       "Q 50.6875 8.296875 51.296875 7.59375 \n",
       "Q 51.90625 6.890625 53.21875 6.890625 \n",
       "Q 53.71875 6.890625 54.5625 7.125 \n",
       "Q 55.421875 7.375 56.984375 8.015625 \n",
       "L 55.609375 0.78125 \n",
       "Q 53.46875 -0.296875 51.515625 -0.859375 \n",
       "Q 49.5625 -1.421875 47.703125 -1.421875 \n",
       "Q 44.484375 -1.421875 42.65625 0.625 \n",
       "Q 40.828125 2.6875 40.828125 6.296875 \n",
       "Q 38.09375 2.390625 34.296875 0.484375 \n",
       "Q 30.515625 -1.421875 25.390625 -1.421875 \n",
       "Q 20.84375 -1.421875 17.453125 0.671875 \n",
       "Q 14.0625 2.78125 12.984375 6.203125 \n",
       "L 7.71875 -20.796875 \n",
       "z\n",
       "\" id=\"DejaVuSans-Oblique-956\"/>\n",
       "     </defs>\n",
       "     <g style=\"fill:#262626;\" transform=\"translate(254.925 38.5)scale(0.1 -0.1)\">\n",
       "      <use transform=\"translate(0 0.015625)\" xlink:href=\"#DejaVuSans-112\"/>\n",
       "      <use transform=\"translate(63.476562 0.015625)\" xlink:href=\"#DejaVuSans-111\"/>\n",
       "      <use transform=\"translate(124.658203 0.015625)\" xlink:href=\"#DejaVuSans-112\"/>\n",
       "      <use transform=\"translate(188.134766 0.015625)\" xlink:href=\"#DejaVuSans-117\"/>\n",
       "      <use transform=\"translate(251.513672 0.015625)\" xlink:href=\"#DejaVuSans-108\"/>\n",
       "      <use transform=\"translate(279.296875 0.015625)\" xlink:href=\"#DejaVuSans-97\"/>\n",
       "      <use transform=\"translate(340.576172 0.015625)\" xlink:href=\"#DejaVuSans-116\"/>\n",
       "      <use transform=\"translate(379.785156 0.015625)\" xlink:href=\"#DejaVuSans-105\"/>\n",
       "      <use transform=\"translate(407.568359 0.015625)\" xlink:href=\"#DejaVuSans-111\"/>\n",
       "      <use transform=\"translate(468.75 0.015625)\" xlink:href=\"#DejaVuSans-110\"/>\n",
       "      <use transform=\"translate(532.128906 0.015625)\" xlink:href=\"#DejaVuSans-32\"/>\n",
       "      <use transform=\"translate(563.916016 0.015625)\" xlink:href=\"#DejaVuSans-109\"/>\n",
       "      <use transform=\"translate(661.328125 0.015625)\" xlink:href=\"#DejaVuSans-101\"/>\n",
       "      <use transform=\"translate(722.851562 0.015625)\" xlink:href=\"#DejaVuSans-97\"/>\n",
       "      <use transform=\"translate(784.130859 0.015625)\" xlink:href=\"#DejaVuSans-110\"/>\n",
       "      <use transform=\"translate(847.509766 0.015625)\" xlink:href=\"#DejaVuSans-32\"/>\n",
       "      <use transform=\"translate(879.296875 0.015625)\" xlink:href=\"#DejaVuSans-Oblique-956\"/>\n",
       "     </g>\n",
       "    </g>\n",
       "   </g>\n",
       "  </g>\n",
       " </g>\n",
       " <defs>\n",
       "  <clipPath id=\"pefbb7f0eee\">\n",
       "   <rect height=\"217.44\" width=\"334.8\" x=\"23.425\" y=\"7.2\"/>\n",
       "  </clipPath>\n",
       " </defs>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot the histogram representing the distribution of the samples\n",
    "plt.hist(sample_data[\"petal_length\"], color=\"green\")\n",
    "plt.xticks(np.arange(4.6, 7.2, 0.2))\n",
    "\n",
    "# Add a vertical line for the sample mean\n",
    "plt.axvline(x=sample_mean, color='black', linestyle='-.', linewidth=1, label=\"sample mean $m$\")\n",
    "\n",
    "# Add a vertical line for the population mean\n",
    "plt.axvline(x=mu, color='black', linestyle=':', linewidth=1, label=r\"population mean $\\mu$\")\n",
    "\n",
    "# Add a legend\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    From the graph above, how many irises from the Vullierens sample have a petal length between 4.7 and 4.95 cm?<br/>\n",
    "    How is the mean petal length of the Vullierens sample represented? And the mean of the Anderson population?<br/>\n",
    "    How close are they to each other?\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can check your answer by clicking on the \"...\" below.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    The irises with a petal length between 4.7 and 4.95 cm are represented by the first bar of the histogram (counting from the left) and we can read on the vertical axis that there are 5 irises represented in this bar.<br/>\n",
    "    According to the legend, the mean petal length of the Vullierens sample is represented by a vertical dash-dotted line (-&centerdot;-&centerdot;-) and the mean of the Anderson population by a vertical dotted line (&centerdot;&centerdot;&centerdot;&centerdot;&centerdot;).<br/>\n",
    "    These two means seem to be quite close to each other, with a difference of around 0.15 cm.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Interpretation and hypothesis\n",
    "\n",
    "The simple analyses we have made so far allow us to have a preliminary idea about how the Irises from Vullierens compare to those observed by Anderson. One feature to look at for the comparison is their respective mean petal length. We see above that the mean petal length $m$ of the Vullierens sample is quite close to the mean $\\mu$ reported by Anderson. However, we also see that there is some variability in our sample, meaning that some irises in our sample actually have a petal length quite far from that of the Anderson population. So are the two means really that close to each other?\n",
    "\n",
    "Let's formulate this as a **hypothesis**, which we state as: the sample mean $m$ is similar to the mean of the reference population $\\mu$, which we will write as $m = \\mu$ (in this notation, the equal symbol should not be interpreted literally). This hypothesis is denoted $H_0$ and called the \"null\" hypothesis because it states that there is no difference between the sample and the population. \n",
    "The \"alternate\" hypothesis $H_a$ is that the sample mean is not similar to the mean of the reference population, $m \\neq \\mu$.\n",
    "\n",
    "How can we test our hypothesis? In the following, we use a **statistical test** to answer this question.\n",
    "\n",
    "&nbsp;\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Testing our hypothesis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In our hypothesis we compare the mean of one sample to a reference value. To test this hypothesis we can use a statistical test called a **one-sample t-test**.  \n",
    "\n",
    "But what does it mean when we test the hypothesis that a sample mean is potentially equal to a given value?  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sample versus population\n",
    "\n",
    "<div style=\"width:500px;float:right;margin-left:15px;align:center;\">\n",
    "<img src=\"figs/diagram-samples.png\" style=\"width:300px;margin-left:auto;margin-right:auto;display:block;\"/>\n",
    "<span style=\"width:500px;display:block;text-align:center;\">Figure 1. Population and samples.</span>\n",
    "<img src=\"figs/diagram-normalcurveAB.png\" style=\"display:block;margin-top:20px;\"/>\n",
    "<span style=\"width:500px;display:block;text-align:center;\">Figure 2. Distribution of the means of all possible samples coming from a given population (Anderson's population in this case).</span>\n",
    "</div>\n",
    "\n",
    "To understand this, it is useful to start by thinking about a population, in this case our population of Irises which has a mean petal length of $\\mu = 5.552$ cm, illustrated by the big black circle on Figure 1 on the right.\n",
    "\n",
    "Now imagine you take a sample of (i.e. a subset of), say, 50 flowers from this population, represented by the green circle on Figure 1. The mean petal length of this sample is $m_1 = 6.234$ cm. You then take a second sample of 50 flowers (another subset, in blue on Figure 1), which ends up having a mean petal length of $m_2 = 5.874$ cm.  You then take a third sample of 50 which gives you a mean petal length of $m_3 = 5.349$ cm, in yellow on Figure 1.\n",
    "\n",
    "If you keep taking samples from this population, you will start to notice a pattern: while some of the samples will give a mean petal length which is not at all close to the population mean, most of the mean petal lengths are reasonably close to the population mean of 5.552 cm. Furthermore, the mean of the mean petal length of the samples will be the same as that of the population as a whole i.e. 5.552 cm.  \n",
    "\n",
    "In fact, if we keep taking samples from this population, the distribution of the means of these samples takes a very particular shape that looks like a normal curve, as illustrated by Figure 2 on the right. With bigger sample sizes (say 130 instead of 50), the distribution gets closer and closer to a normal curve whose mean is equal to the mean of the population. For smaller samples such as ours, the distribution is called **[Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution)** (actually it is a family of distributions, which depend on the sample size).\n",
    "\n",
    "\n",
    "This is useful because it allows us to rephrase our question as to how similar or different our sample from Vullierens Castle is to the population of Irises as described by Edgar Anderson. \n",
    "**What we have from the Vullierens Castle is a sample**. We want to know if it is a sample that might have come from a population like that described by Edgar Anderson. We now know the shape (more or less a normal distribution) and the mean (5.552 cm) of all of the samples that could be taken from the population described by Edgar Anderson. **So our question becomes \"where does our sample fall on the distribution of all such sample means?\"**.  \n",
    "If our mean is in position A on the figure on the right, then it is plausible that our sample came from a population like that of Edgar Anderson.  If our mean is in position B, then it is less plausible to believe that our sample came from a population like Anderson’s.\n",
    "\n",
    "<div style=\"clear:both;\"></div>"
   ]
  },
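  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pattern described above can be explored with a small simulation. The sketch below is only an illustration: it assumes a hypothetical, normally distributed population with a mean of 5.552 cm and a made-up standard deviation of 0.55 cm (it does not use Anderson's measurements), and the names `hypothetical_mu`, `hypothetical_sigma` and `sample_means` are ours.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical population: the mean comes from the text above,\n",
    "# the standard deviation is an assumed value chosen for illustration only\n",
    "hypothetical_mu = 5.552\n",
    "hypothetical_sigma = 0.55\n",
    "\n",
    "# Draw 10000 samples of 50 flowers each from this hypothetical population\n",
    "rng = np.random.default_rng(seed=0)\n",
    "samples = rng.normal(loc=hypothetical_mu, scale=hypothetical_sigma, size=(10000, 50))\n",
    "\n",
    "# Compute the mean petal length of each sample\n",
    "sample_means = samples.mean(axis=1)\n",
    "\n",
    "# Compare the mean of the sample means with the population mean\n",
    "print('Mean of the sample means:', sample_means.mean())\n",
    "\n",
    "# Compare the spread of the sample means with sigma / sqrt(n)\n",
    "print('Standard deviation of the sample means:', sample_means.std(ddof=1))\n",
    "print('sigma / sqrt(n):', hypothetical_sigma / np.sqrt(50))\n",
    "```\n",
    "\n",
    "With these assumptions, the mean of the sample means should land very close to the population mean, and their spread should be close to the population standard deviation divided by $\\sqrt{n}$."
   ]
  },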
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Significance level and cutoff point\n",
    "\n",
    "<div style=\"width:500px;float:right;margin-left:15px;align:center;\">\n",
    "<img src=\"figs/diagram-normalcurveAlpha.png\" style=\"display:block;margin-top:25px;\"/>\n",
    "<span style=\"width:500px;display:block;text-align:center;\">Figure 3. Distribution of the means of all possible samples coming from Anderson's population with zones defined by the significance level $\\alpha=0.05$.</span><br/>\n",
    "</div>\n",
    "\n",
    "\n",
    "You might be wondering how far away is far enough for us to think it is implausible that our sample comes from a population like Anderson’s. The answer is that it depends on how sure you want to be.  \n",
    "\n",
    "One common answer to this question is to be 95% sure, meaning that a sample mean would need to be in the most extreme 5% of cases before we would think it is implausible that our sample comes from a population like Anderson’s. This value of 5% is called the **significance level** and it is denoted $\\alpha$, with $\\alpha=0.05$. These most extreme 5% of cases are represented by the zones in light blue on Figure 3. If the sample mean falls into these most extreme zones, we say that *the difference is \"statistically significant\"*.\n",
    "\n",
    "A second common answer is to be 99% sure, meaning that a sample mean would need to be in the most extreme 1% of cases before we would think it is implausible that our sample comes from a population like Anderson’s ($\\alpha=0.01$).   \n",
    "\n",
    "In the following, **we will work on the basis of being 95% sure**.<br/>\n",
    "Let's define our significance level $\\alpha=0.05$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define alpha at 0.05\n",
    "alpha05 = 0.05\n",
    "\n",
    "# Display alpha\n",
    "alpha05"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    In the code cell below, create another variable called <code>alpha01</code> to define a significance level of $\\alpha = 0.01$.\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can check your answer by clicking on the \"...\" below.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define alpha at 0.01\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    The cell below defines the variable <code>alpha01</code> and displays it.\n",
    "</div>\n",
    "\n",
    "<pre>\n",
    "# Define alpha at 0.01\n",
    "alpha01 = 0.01\n",
    "\n",
    "# Display alpha\n",
    "alpha01\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If our distribution of sample means is a normal curve then we know that the most extreme 5% of sample means are found more than 1.96 standard deviations above or below the mean. In our case, because our sample size is less than 130 (it is 50), our distribution is close to normal but not quite normal. \n",
    "In this case, it is possible to find the relevant cutoff point by [looking it up in statistical tables](https://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values): for a sample size of 50, the most extreme 5% of cases are found more than approximately 2.01 standard deviations above or below the mean. \n",
    "\n",
    "The good news is that **Python can give us the value of the cutoff point automatically** based on the chosen significance level $\\alpha$ and the sample size, thanks to the `stats` module (from the SciPy library), which offers useful functions related to many statistical distributions such as Student's t:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the cutoff point for alpha at 0.05\n",
    "cutoff05 = stats.t.isf(alpha05 / 2, sample_size)\n",
    "\n",
    "# Display cutoff\n",
    "cutoff05"
   ]
  },
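  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how the cutoff point depends on the sample size, the sketch below compares the cutoff of the normal distribution with the cutoffs of Student's t-distribution for a few sample sizes. It assumes that SciPy is installed and it uses $n - 1$ degrees of freedom, which is the usual convention for a one-sample t-test; for $n = 50$, using 49 or 50 degrees of freedom gives approximately 2.01 in both cases. The names `normal_cutoff` and `t_cutoffs` are ours.\n",
    "\n",
    "```python\n",
    "from scipy import stats\n",
    "\n",
    "alpha = 0.05\n",
    "\n",
    "# Two-sided cutoff for the normal distribution (about 1.96)\n",
    "normal_cutoff = stats.norm.isf(alpha / 2)\n",
    "print('normal:', round(normal_cutoff, 3))\n",
    "\n",
    "# Two-sided cutoffs for Student's t-distribution with n - 1 degrees of freedom\n",
    "t_cutoffs = {n: stats.t.isf(alpha / 2, n - 1) for n in (10, 50, 130, 1000)}\n",
    "for n, cutoff in t_cutoffs.items():\n",
    "    print('t with n =', n, ':', round(cutoff, 3))\n",
    "```\n",
    "\n",
    "The smaller the sample, the further from the mean the cutoff point lies; as the sample size grows, the cutoff approaches the value for the normal distribution."
   ]
  },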
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    How would you get the value of the cutoff point for the significance level $\\alpha = 0.01$?<br/>\n",
    "    Type and test your code using the cell below.\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can check your answer by clicking on the \"...\" below.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the cutoff point for alpha at 0.01\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    To get the value of the cutoff point for the significance level of $\\alpha = 0.01$, we can copy the first line of code <code>stats.t.isf(alpha05 / 2, sample_size)</code> and replace <code>alpha05</code> with the variable <code>alpha01</code> that we have previously defined.<br/>\n",
    "    To save the value of this new cutoff point for later, it is good to store it in a new variable <code>cutoff01</code>.<br/>\n",
    "    See the solution code below.\n",
    "</div>\n",
    "\n",
    "<pre>\n",
    "# Get the cutoff point for alpha at 0.01\n",
    "cutoff01 = stats.t.isf(alpha01 / 2, sample_size)\n",
    "\n",
    "# Display cutoff\n",
    "cutoff01\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Error in the distribution of means\n",
    "\n",
    "So far we know a lot that will help us to test the hypothesis that our sample mean is similar to Anderson’s population mean. We know:\n",
    "* Our sample mean $m$\n",
    "* The population mean $\\mu$\n",
    "* The shape of the distribution of the mean of all samples that would come from this population (a normal curve, centred on the population mean)\n",
    "* Our cutoff point defined by $\\alpha$ (the most extreme 5% of cases, more than 2.01 standard deviations above or below the mean)\n",
    "\n",
    "The last piece of information missing that would enable us to test this hypothesis is the size of the standard deviation of the distribution of sample means from Anderson’s population. \n",
    "It turns out that a good guess for the size of this standard deviation can be obtained from knowing the standard deviation of our sample.\n",
    "If $s$ is the sample standard deviation of our sample and $n$ is the sample size, then the standard deviation of the distribution of sample means is:\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "\\sigma_{\\overline{X}} = \\frac{s}{\\sqrt{n}}\n",
    "\\end{align}\n",
    "$ \n",
    "\n",
    "This standard deviation of the distribution of sample means is called the **\"standard error of the mean\" (often abbreviated SEM)**.  \n",
    "We can compute it by using the sample size and the standard deviation from the descriptive stats we have computed earlier:  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract the sample standard deviation from the descriptive stats\n",
    "sample_std = sample_stats.loc[\"std\",\"petal_length\"]\n",
    "\n",
    "# Compute the estimation of the standard deviation of sample means from Anderson's population (standard error)\n",
    "sem = sample_std / math.sqrt(sample_size)\n",
    "\n",
    "# Display the standard error\n",
    "sem"
   ]
  },
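  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a cross-check, the formula $s / \\sqrt{n}$ can be compared with the `sem` function of `scipy.stats`. The sketch below uses a small made-up list of petal lengths (`hypothetical_lengths`), not the Vullierens data, and assumes that SciPy is installed.\n",
    "\n",
    "```python\n",
    "import math\n",
    "import statistics\n",
    "from scipy import stats\n",
    "\n",
    "# Made-up petal lengths in cm, for illustration only\n",
    "hypothetical_lengths = [5.1, 5.9, 5.6, 6.0, 5.4, 5.8, 5.5, 6.1]\n",
    "\n",
    "# Sample standard deviation s (with n - 1 in the denominator)\n",
    "s = statistics.stdev(hypothetical_lengths)\n",
    "\n",
    "# Standard error of the mean: s / sqrt(n)\n",
    "manual_sem = s / math.sqrt(len(hypothetical_lengths))\n",
    "\n",
    "# The same quantity computed by scipy (its default also uses n - 1)\n",
    "scipy_sem = stats.sem(hypothetical_lengths)\n",
    "\n",
    "print(manual_sem, scipy_sem)\n",
    "```\n",
    "\n",
    "Both values should agree up to floating-point rounding."
   ]
  },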
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    In the code above, what function is used to compute the square root of the sample size, $\\sqrt{n}$?<br/>\n",
    "    How would you compute the square root of 2?<br/>\n",
    "    Type and test your code using the cell below.\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can check your answer by clicking on the \"...\" below.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute the square root of 2 and display the result\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    We see in the code making the calculation of the standard error of the mean (<code>sem</code>) above that the way to get $\\sqrt{n}$ in Python is <code>math.sqrt(sample_size)</code>.<br/>\n",
    "    Therefore we can replace <code>sample_size</code> by <code>2</code> to get $\\sqrt{2}$.<br/>\n",
    "    See the solution code below.\n",
    "</div>\n",
    "\n",
    "<pre>\n",
    "# Compute the square root of 2\n",
    "sqrt2 = math.sqrt(2)\n",
    "\n",
    "# Display the result\n",
    "sqrt2\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Comparison, and definition of the *t* statistics\n",
    "\n",
    "We can now restate our question in more precise terms: **\"is our sample mean in the most extreme 5% of samples that would be drawn from a population with the same mean as Anderson’s population?\"**.  \n",
    "Or to be even more precise, **\"is the gap between our sample mean and Anderson’s population mean greater than 2.01 times the standard error of the mean?\"**. \n",
    "\n",
    "This is equivalent to comparing\n",
    "$\n",
    "\\begin{align}\n",
    "\\frac{m - \\mu}{\\sigma_{\\overline{X}}}\n",
    "\\end{align}\n",
    "$\n",
    "to our cutoff point of 2.01. \n",
    "\n",
    "That is the **definition of the *t* statistic**: the value $t = $\n",
    "$\n",
    "\\begin{align}\n",
    "\\frac{m - \\mu}{\\sigma_{\\overline{X}}}\n",
    "\\end{align}\n",
    "$ \n",
    " has to be compared to the cutoff point we have chosen to determine if the sample mean falls into the most extreme zones and to be able to say whether the difference is statistically significant or not.<br/>\n",
    "Let's compute $t$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute the t statistic:\n",
    "t = (sample_mean - mu) / sem\n",
    "\n",
    "# Display t\n",
    "t"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can compare $t$ to our cutoff point.  \n",
    "\n",
    "One issue here is that **when $m$ is smaller than $\\mu$, the value of $t$ is negative**. This is because, just like the normal distribution, Student's t-distribution is symmetrical and centred on zero, zero meaning there is no difference between the mean of the sample and the mean of the population. So when comparing $t$ to the cutoff point, either we take its absolute value, which is what we do below, or if $t$ is negative we compare it to the negative value of the cutoff point (i.e. -2.01 for a significance level of 0.05)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare t with our cutoff point\n",
    "if abs(t) > cutoff05: \n",
    "    print(\"The difference IS statistically significant.\")\n",
    "else: \n",
    "    print(\"The difference is NOT statistically significant.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see in the results above that for our Vullierens sample $|t| < 2.01$, therefore the difference between the two means is not greater than 2.01 times the standard error. In other words, **our sample mean is NOT in the most extreme 5%** of samples that would be drawn from a population with the same mean as Anderson's population.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    How would you compare $|t|$ to the cutoff point corresponding to a significance level of $\\alpha = 0.01$?<br/>\n",
    "    Type and test your code using the cell below.\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can check your answer by clicking on the \"...\" below.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare t to the cutoff point for alpha=0.01\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    To compare the absolute value of $t$ to the cutoff point corresponding to $\\alpha = 0.01$, we can simply replace <code>cutoff05</code> in the code above by the variable <code>cutoff01</code> we have defined earlier with the appropriate value for the cutoff point. See the solution code below.<br/>\n",
    "    In this case, the comparison would tell us if our sample mean is in the most extreme 1% of samples that would be drawn from a population with the same mean as Anderson's population. Since we already know that our sample mean is NOT in the most extreme 5%, it is no surprise that it is not in the most extreme 1% either.  \n",
    "</div>\n",
    "\n",
    "<pre>\n",
    "# Compare t to the cutoff point for alpha=0.01\n",
    "if abs(t) > cutoff01: \n",
    "    print(\"The difference IS statistically significant.\")\n",
    "else: \n",
    "    print(\"The difference is NOT statistically significant.\")\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The statistical test we have just performed here, where we compare our sample mean to the mean of a population, is called a **one-sample t-test**: *one-sample* because we compare a sample to the mean of a population, and *t-test* because the distribution of all the possible sample means of the population follows a distribution called *Student's t-distribution*. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualization of *t*\n",
    "\n",
    "Using Python, we can visualize what the t-test means graphically by plotting the t-distribution of all the possible sample means that would be drawn from a population with the same mean as Anderson's population, and by showing where `t` falls in the distribution compared to the zone defined by our $\\alpha$ of 5%.\n",
    "\n",
    "If the *t* statistic falls outside the rejection zone defined by $\\alpha$, the difference between our sample mean and the population mean is not statistically significant. If it falls inside the rejection zone, the difference is statistically significant, and the sample should not be considered as coming from the Anderson population at the significance level we have chosen.\n",
    "\n",
    "The cell below uses an external library to generate a graphical visualization of the result of the t-test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize graphically the result of the t-test with alpha at 0.05\n",
    "visualize_ttest(sample_size, alpha05, t)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    What happens to the rejection zone in red on the figure when we choose an $\\alpha$ of 1%?<br/>\n",
    "    Type and test your code using the cell below.\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can <a href=\"./solution/StatisticsNotebook-solution.ipynb\">check your answer here</a>.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize graphically the result of the t-test with alpha at 0.01\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    To visualize the rejection zone for an $\\alpha$ of 1%, we can simply replace <code>alpha05</code> in the code above by the variable <code>alpha01</code> we have defined earlier. See the solution code below.<br/>\n",
    "    By comparing the two visualizations, we see that the rejection zone for $\\alpha=0.01$ is much smaller than for $\\alpha=0.05$, which means we want to reject only samples that have a mean extremely different from the mean of the Anderson population.\n",
    "</div>\n",
    "\n",
    "<pre>\n",
    "# Visualize graphically the result of the t-test with alpha at 0.01\n",
    "visualize_ttest(sample_size, alpha01, t)\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "What can we conclude from this? The one-sample t-test tells us that we have no evidence that the sample does not come from an Anderson-like population. Therefore we **cannot reject our hypothesis $H_0$**. However, this is not the same as saying that the sample *does* come from the Anderson population. This is one of the **important limits of the t-test**: like many other statistical tests, **it can be used only to reject a hypothesis** (the null hypothesis), not to confirm it.\n",
    "\n",
    "Now there are other limitations to keep in mind when using the one sample t-test, that we will explore in the section below.\n",
    "\n",
    "&nbsp;\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Influence of the sample size\n",
    "\n",
    "Above, we have seen that $t = $\n",
    "$\n",
    "\\begin{align}\n",
    "\\frac{m - \\mu}{\\sigma_{\\overline{X}}}\n",
    "\\end{align}\n",
    "$ and that $\\sigma_{\\overline{X}} = $\n",
    "$\n",
    "\\begin{align}\n",
    "\\frac{s}{\\sqrt{n}}\n",
    "\\end{align}\n",
    "$.\n",
    "\n",
    "Therefore we can rewrite the *t* statistic as:\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "t = \\frac{m - \\mu}{\\frac{s}{\\sqrt{n}}}\n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "This means that *t* is actually:\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "t = \\frac{m - \\mu}{s}\\sqrt{n}\n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "From there, we see that the **sample size $n$ influences the value of $t$**: all else being equal (i.e. sample mean, sample standard deviation and population mean), **a larger sample results in a larger value of $|t|$** and therefore a greater chance of finding a significant result for the t-test.\n",
    "\n"
   ]
  },
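  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make this dependence concrete, the sketch below evaluates the formula above for several sample sizes. It is only an illustration: the values used for the sample mean, the population mean and the sample standard deviation are made-up numbers, not those of the Vullierens sample, and `t_statistic` is a hypothetical helper name.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def t_statistic(m, mu, s, n):\n",
    "    # t = (m - mu) / s * sqrt(n), as in the formula above\n",
    "    return (m - mu) / s * math.sqrt(n)\n",
    "\n",
    "# Illustration values only (assumptions, not the notebook's data)\n",
    "m_demo, mu_demo, s_demo = 5.7, 5.5, 0.8\n",
    "\n",
    "# Same mean and standard deviation, increasing sample size\n",
    "t_by_n = {n: t_statistic(m_demo, mu_demo, s_demo, n) for n in (25, 50, 100, 200)}\n",
    "for n, t_value in t_by_n.items():\n",
    "    print('n = {:3d}  ->  t = {:.3f}'.format(n, t_value))\n",
    "```\n",
    "\n",
    "Since $t$ grows with $\\sqrt{n}$, multiplying the sample size by 4 doubles the value of $t$ in this sketch."
   ]
  },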
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    For our irises from the Vullierens Castle, which sample size would make the value of $t$ reach our cutoff point of 2.01, all else being equal (i.e. with identical sample mean, sample standard deviation and population mean)?<br/>\n",
    "    Use the code cell below to write your answer in Python and test it.\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can <a href=\"./solution/StatisticsNotebook-solution.ipynb\">check your answer here</a>.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make your calculation in Python here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "We want to find the value of $n$ that would make $t$ at least equal to 2.01.<br/>\n",
    "We know that $t = $\n",
    "$\n",
    "\\begin{align}\n",
    "\\frac{m - \\mu}{s}\\sqrt{n}\n",
    "\\end{align}\n",
    "$.\n",
    "<br/>In other words, we are looking for the value of $n$ such that:\n",
    "$\n",
    "\\begin{align}\n",
    "\\frac{m - \\mu}{s}\\sqrt{n} = 2.01\n",
    "\\end{align}\n",
    "$.<br/>\n",
    "We can solve this expression for $n$, which gives: \n",
    "$\n",
    "\\begin{align}\n",
    "n = \\left(\\frac{2.01 s}{m - \\mu}\\right)^2\n",
    "\\end{align}\n",
    "$ with $s$ the sample standard deviation, $m$ the sample mean and $\\mu$ the population mean.<br/>\n",
    "<br/>\n",
    "Then we have to write this in Python using the variables we have defined earlier and display the result. To square the expression, we can either multiply it by itself, which is what we have done below, or use the Python power operator <code>**</code> ($x^2$ is then written <code>x ** 2</code>). See the solution code below, in which we have used the variable <code>cutoff05</code>; using the raw number <code>2.01</code> instead would work too.<br/>\n",
    "We obtain <code>n = 143.53</code>, which means that a sample size of 144 flowers or more with the same mean and standard deviation for the petal length would make the t-statistic above our cutoff point.\n",
    "</div>\n",
    "\n",
    "<pre>\n",
    "# Make your calculation in Python here\n",
    "n = ((cutoff05 * sample_std) / (sample_mean - mu)) * ((cutoff05 * sample_std) / (sample_mean - mu))\n",
    "\n",
    "# Display the result\n",
    "n\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So for instance, for our irises from the Vullierens Castle, **a sample of 144 flowers instead of 50** with exactly the same mean and standard deviation for the petal length would be considered significantly different from the Anderson population. \n",
    "\n",
    "This is why when doing experiments, researchers generally try to get samples as large as possible - but of course this has a cost and is not always possible!\n",
    "\n",
    "&nbsp;\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using the *p-value*\n",
    "\n",
    "In scientific studies, researchers frequently use the t-test, but they generally report not only the t-statistic but also **another result of the t-test, called the p-value**. In the following, we explore what the p-value is, how it relates to the t-statistic and how it can be used."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Testing our hypothesis using a predefined Python function\n",
    "\n",
    "So far we have made the computations by hand, but Python comes with a number of libraries that provide useful statistical tools. \n",
    "In particular, the `stats` library includes a function, `ttest_1samp`, for doing a **one-sample t-test** as we have done above.  \n",
    "\n",
    "Let's now use it and then look at what information it gives us."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute the t-test\n",
    "t, p = stats.ttest_1samp(sample_data[\"petal_length\"], mu)\n",
    "\n",
    "# Display the result\n",
    "print(\"t = {:.3f}\".format(t))\n",
    "print(\"p = {:.3f}\".format(p))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that the predefined Python function for doing the one-sample t-test gives us the same value for the $t$ statistic as the calculations we have made by hand: $t = 1.185$.  \n",
    "In addition, we see that it also returns another value, $p = 0.242$. \n",
    "\n",
    "Actually, the two values `t` and `p` returned by the function say the same thing but in two different ways:\n",
    "* `t` tells us where our sample mean falls on the distribution of all the possible sample means for the Anderson population;<br/>\n",
    "    `t` has to be compared to the cutoff value (2.01) to know if our sample mean is in the most extreme 5%.\n",
    "* `p` is **called the \"p-value\"** and is the **probability of getting a sample mean at least as extreme** as the one we observe, if the sample really comes from the Anderson population;<br/>\n",
    "    `p` has to be compared to $\\alpha$ (0.05) to know if our sample mean is in the most extreme 5%.\n"
   ]
  },
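  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The link between the two values can be sketched in code: for a two-sided test, the p-value is the area in both tails of the t-distribution beyond $|t|$. The sketch below rests on a few assumptions: it imports `stats` from SciPy, it uses $n - 1 = 49$ degrees of freedom (a sample of 50 flowers), and it plugs in the rounded value $t = 1.185$ reported above rather than the notebook's variables. The names `t_value`, `df` and `p_value` are chosen for illustration only.\n",
    "\n",
    "```python\n",
    "from scipy import stats\n",
    "\n",
    "t_value = 1.185   # rounded t statistic reported above\n",
    "df = 50 - 1       # degrees of freedom for a sample of 50 flowers\n",
    "\n",
    "# Two-sided p-value: area in both tails beyond |t|\n",
    "p_value = 2 * stats.t.sf(abs(t_value), df)\n",
    "print('p = {:.3f}'.format(p_value))\n",
    "```\n",
    "\n",
    "The result should be close to the p-value returned by the predefined function; small differences can come from the rounding of $t$."
   ]
  },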
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    How does <code>t</code> compare to the cutoff value (2.01)?<br/>\n",
    "    And how does <code>p</code> compare to $\\alpha$ (0.05)?<br/>\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can <a href=\"./solution/StatisticsNotebook-solution.ipynb\">check your answer here</a>.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    \n",
    "We see above that:\n",
    "* $t = 1.185$ therefore $|t| < 2.01$, which means that the difference between the two means is smaller than 2.01 times the standard error \n",
    "* and $p = 0.242$ therefore $p > 0.05$, which means that the probability of getting a more extreme sample mean than the one we observe is higher than 5%, so our sample mean cannot be considered as one of the 5% most extreme possible values. \n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "As expected from the calculations we have made by hand above, the test using the predefined Python function confirms that the difference between the mean petal length of the Vullierens sample and the mean petal length of Anderson's population is **not statistically significant**.\n",
    "\n",
    "As we have just seen, **you can use either `t` or `p` to interpret the result of the t-test.** In practice, most people use the p-value because it can be directly compared to $\\alpha$ without having to look up the cutoff value in tables. However, as we will see in more detail below, **`t` and `p` do not provide exactly the same information about the result of the test**, and it is important to understand how they differ."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualization of the p-value\n",
    "\n",
    "Using Python, we can visualize what the t-test means graphically by plotting the t-distribution of all the possible sample means that would be drawn from a population with the same mean as Anderson's population, and by showing where `t` falls in the distribution compared to the zone defined by our $\\alpha$ of 5%.\n",
    "\n",
    "In addition to displaying the value of *t*, the visualization below also **shows the *p-value*** (represented by the hatched zone), which is the **area under the curve of the t-distribution** representing the probability of getting a more extreme sample mean than the one we observe. When this area is larger than the rejection zone defined by the $\\alpha$ we have chosen, then that means the difference between the sample mean and the population mean is not statistically significant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize graphically the result of the t-test and the p-value with alpha at 0.05\n",
    "visualize_ttest_pvalue(sample_size, alpha05, t, p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    What values does the function <code>visualize_ttest_pvalue</code> need to generate a visualization of the result of a t-test?<br/>\n",
    "    In the code cell below, use this function to generate the visualization of a value of $t=-1.702$ and $p=0.095$ with the same sample size and same value for $\\alpha$ as in the example above.<br/>\n",
    "    What can you observe with this negative value for $t$?<br/>\n",
    "    How would you say that the p-value (the hatched zone) evolves when $|t|$ gets bigger?<br/>\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can <a href=\"./solution/StatisticsNotebook-solution.ipynb\">check your answer here</a>.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize graphically the result of a t-test of t=-1.702 and p=0.095\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    The <code>visualize_ttest_pvalue</code> function needs 4 different values to generate the visualization: the sample size, the significance level $\\alpha$, the value of $t$ and the value of $p$.<br/>\n",
    "    To generate a new visualization, we can simply copy-paste the call to the <code>visualize_ttest_pvalue</code> function from the code cell above and replace <code>t</code> with the value <code>-1.702</code> and <code>p</code> with the value <code>0.095</code>; see the solution code below.<br/>\n",
    "    Because the value of <code>t</code> is negative, the bar representing $t$ now appears on the left side of the curve representing the t-distribution.<br/>\n",
    "    When comparing with the previous graph, we see that when $|t|$ is bigger, the hatched zone representing $p$ is smaller.\n",
    "    More generally, using the visualization, we can see that <strong>the bigger $|t|$ we get</strong>, the smaller the hatched zone we get and therefore <strong>the smaller the p-value we get</strong>. We therefore see that $|t|$ and $p$ evolve in opposite directions.    \n",
    "</div>\n",
    "\n",
    "<pre>\n",
    "# Visualize graphically the result of a t-test of t=-1.702 and p=0.095\n",
    "visualize_ttest_pvalue(sample_size, alpha05, -1.702, 0.095)\n",
    "</pre>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Thanks to the visualization above, we see that one important difference between the t-statistic and the p-value is that $|t|$ and $p$ evolve in opposite directions: <strong>the bigger $|t|$ is, the smaller $p$ is.</strong>\n",
    "\n",
    "Another important difference is that **the t-statistic tells us whether the sample mean $m$ is greater or smaller than the population mean $\\mu$**, whereas this is impossible to know from the p-value alone: since the p-value corresponds to an area under the curve of the t-distribution, it is always positive. \n",
    "As we have seen earlier, the t-distribution is centred on zero, with zero meaning $m = \\mu$ and:\n",
    "* when $t > 0$ (i.e. $t$ is on the *right* side of the distribution on the visualization above) it means that $m > \\mu$ ;\n",
    "* when $t < 0$ (i.e. $t$ is on the *left* side of the distribution on the visualization above) it means that  $m < \\mu$.\n",
    "\n",
    "\n",
    "\n",
    "&nbsp;\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importance of the choice of $\\alpha$\n",
    "\n",
    "So far we have seen two important points to keep in mind when using the t-test to compare a sample to a population: first the size of the sample matters and second the t-test provides us with two pieces of information, the t-statistic and the p-value, which are both useful but in different ways. In this section, we look at a third important point to keep in mind when doing statistical testing: the **influence of the choice of $\\alpha$**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Let's compare our Vullierens sample to another population\n",
    "\n",
    "<div  style=\"width:300px;float:right;margin-left:15px;\">\n",
    "    <img src=\"figs/iris-ensata.jpg\" alt=\"Iris Ensata\"/>\n",
    "\n",
    "###### Iris Ensata (Credit: Laitche CC BY-SA 3.0)\n",
    "\n",
    "</div>\n",
    "\n",
    "Let's imagine we want to know how our Vullierens sample compares to another iris population, for instance a Japanese iris species called Iris Ensata with a mean petal length of $\\mu_{ensata} = 5.832$ cm. We can apply the hypothesis testing approach that we have just learned and use a one-sample t-test to do the comparison, which we do in 4 steps:\n",
    "\n",
    "1. Get an idea about how the sample compares to the population:  \n",
    "    At first sight, the mean petal length of our Vullierens sample, $m= 5.646$ cm, is again quite close to the mean petal length of the Ensata population, $\\mu_{ensata} = 5.832$ cm.  \n",
    "\n",
    "2. Formulate the hypotheses we want to test:\n",
    "    * First let's state our null hypothesis, which is that the Vullierens sample is similar to the Iris Ensata population, $H_0: m = \\mu_{ensata}$.  \n",
    "        This is the hypothesis we want to know whether we can reject or not.\n",
    "    * And then state the alternate hypothesis $H_a: m \\neq \\mu_{ensata}$.\n",
    "\n",
    "\n",
    "3. Choose a significance level:  \n",
    "    Let's choose $\\alpha=0.05$ as previously, which means we want to be 95% sure.\n",
    "\n",
    "4. Compute the result of the t-test by using the predefined Python function <code>stats.ttest_1samp</code> :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the mean petal length of the Ensata population\n",
    "mu_ensata = 5.832\n",
    "\n",
    "# Compute the t-test comparing the Vullierens sample petal length to the Ensata population mean\n",
    "t, p = stats.ttest_1samp(sample_data[\"petal_length\"], mu_ensata)\n",
    "\n",
    "# Display the result\n",
    "print(\"t = {:.3f}\".format(t))\n",
    "print(\"p = {:.3f}\".format(p))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result of the t-test gives $t = -2.346$ and  $p = 0.023$.\n",
    "\n",
    "With $\\alpha=0.05$, the cutoff value is 2.01. We see that $|t| > 2.01$ and $p < 0.05$. Therefore, the test tells us that the difference between the mean petal length of the Vullierens sample and the mean petal length of the Ensata population <strong>IS statistically significant</strong>. In other words, we <strong>can reject</strong> the hypothesis that the Vullierens sample is similar to the Ensata population."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is the role of $\\alpha$ in this result?\n",
    "\n",
    "Now let's ask ourselves **what would have been the conclusion of the test if we had chosen a significance level of $\\alpha=0.01$**, i.e. if we wanted to be 99% sure?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"padding: 10px;border:1px solid red;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Question</span><br/>\n",
    "    In the code cell below, use the function <code>visualize_ttest_pvalue</code> to generate two visualizations of the result of this t-test: one with $\\alpha=0.05$ and then another with $\\alpha=0.01$.<br/>\n",
    "    How do you interpret the results?<br/>\n",
    "    <p style=\"text-align:right;margin-bottom:0px;font-style:italic;\">You can <a href=\"./solution/StatisticsNotebook-solution.ipynb\">check your answer here</a>.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize graphically the result of the t-test with alpha05\n",
    "\n",
    "# Visualize graphically the result of the t-test with alpha01\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- SolutionToRemove -->\n",
    "<div style=\"padding: 10px;border:1px solid blue;\">\n",
    "    <span style=\"text-decoration:underline;font-weight:bold;\">Solution</span><br/>\n",
    "    We copy-paste the code of the cell above with the <code>visualize_ttest_pvalue</code> function twice: in one we use <code>alpha05</code> and in the other we use <code>alpha01</code>, the other parameters remaining the same; see the solution code below.<br/>\n",
    "    When comparing the two visualizations we see that, while $t$ is at the same place in both, the rejection zone is much smaller in the second visualization, i.e. when $\\alpha=0.01$, reflecting the fact that we want to be sure that the sample is <em>extremely different</em> from the population (i.e. is part of the 1% most extreme possible samples for a population with this mean petal length) before rejecting our null hypothesis $H_0$.<br/>\n",
    "    As a consequence, $t$ falls into the rejection zone in the first visualization but not in the second, which means that when choosing $\\alpha = 0.01$, we <strong>can no longer reject our null hypothesis $H_0$</strong>. The conclusion in this case is that, with $\\alpha=0.01$, the difference between the mean petal length of the Vullierens sample and the mean petal length of the Ensata population cannot be considered as statistically significant.\n",
    "</div>\n",
    "\n",
    "<pre>\n",
    "# Visualize graphically the result of the t-test with alpha05\n",
    "visualize_ttest_pvalue(sample_size, alpha05, t, p)\n",
    "\n",
    "# Visualize graphically the result of the t-test with alpha01\n",
    "visualize_ttest_pvalue(sample_size, alpha01, t, p)\n",
    "</pre>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For $\\alpha = 0.01$, the cutoff value which we get from the tables is 2.67. With this choice of $\\alpha$, we see that $|t| < 2.67$ and $p > 0.01$. This means that when choosing $\\alpha = 0.01$, the test tells us that the difference between the mean petal length of the Vullierens sample and the mean petal length of the Ensata population <strong>is NO LONGER statistically significant</strong>. In other words, if we want to be 99% sure, we <strong>can no longer reject</strong> the hypothesis that the Vullierens sample is similar to the Ensata population.\n",
    "\n",
    "This illustrates the <strong>importance of the choice of $\\alpha$</strong> when testing a hypothesis."
   ]
  },
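  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cutoff values do not have to be read from tables: they can also be computed from the t-distribution. The sketch below is one possible way to do it, under stated assumptions: it imports `stats` from SciPy, uses 49 degrees of freedom (a sample of 50 flowers) and plugs in the rounded value $t = -2.346$ reported above. `critical_value` is a hypothetical helper name, and values computed this way can differ in the last digit from the values read in a table.\n",
    "\n",
    "```python\n",
    "from scipy import stats\n",
    "\n",
    "def critical_value(alpha, df):\n",
    "    # Two-sided test: alpha / 2 in each tail of the t-distribution\n",
    "    return stats.t.ppf(1 - alpha / 2, df)\n",
    "\n",
    "t_value = -2.346  # rounded t statistic for the Ensata comparison\n",
    "df = 50 - 1       # degrees of freedom for a sample of 50 flowers\n",
    "\n",
    "# Same t, two different significance levels\n",
    "decisions = {}\n",
    "for alpha in (0.05, 0.01):\n",
    "    cutoff = critical_value(alpha, df)\n",
    "    decisions[alpha] = bool(abs(t_value) > cutoff)\n",
    "    print('alpha = {}: cutoff = {:.2f}, reject H0: {}'.format(alpha, cutoff, decisions[alpha]))\n",
    "```\n",
    "\n",
    "The same value of $t$ leads to rejecting $H_0$ with $\\alpha = 0.05$ but not with $\\alpha = 0.01$, which is the situation described above."
   ]
  },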
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&nbsp;\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Summary\n",
    "\n",
    "In this notebook, you have seen how to compare a sample to a population using an approach called **hypothesis testing** and using a statistical test called a **one-sample t-test**.\n",
    "\n",
    "To summarize, to compare the mean of a sample to a reference value from a population, you have to proceed in four main steps:\n",
    "1. Look at descriptive statistics and visualizations of the sample you have to get an idea about how it compares to the population\n",
    "1. Formulate the hypotheses you want to test: the null hypothesis $H_0: m = \\mu$ and its alternate $H_a: m \\neq \\mu$ \n",
    "1. Choose a significance level, usually $\\alpha = 0.05$ or $\\alpha = 0.01$, or even $\\alpha = 0.001$ \n",
    "1. Compute the result of the t-test and interpret the result: in particular, if the p-value is *below* the significance level you have chosen, $p \\lt \\alpha$, then $H_0$ can be rejected at that significance level"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&nbsp;\n",
    "\n",
    "---\n",
    "\n",
    "<h1 id=\"Bibliography\">Bibliography</h1>\n",
    "\n",
    "[1] E. Anderson (1935). \"The Irises of the Gaspe Peninsula.\" Bulletin of the American Iris Society 59: 2–5.\n",
    "\n",
    "[2] R. A. Fisher (1936). \"The use of multiple measurements in taxonomic problems\". Annals of Eugenics. 7 (2): 179–188. doi:10.1111/j.1469-1809.1936.tb02137.x\n",
    "\n",
    "More about the Iris Dataset on Wikipedia: https://en.wikipedia.org/wiki/Iris_flower_data_set\n",
    "\n",
    "*Please note that the datasets used in this notebook have been generated using a random generator; they do not come from real measurements and cannot be used for any research purpose.*"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "toc-autonumbering": true
 },
 "nbformat": 4,
 "nbformat_minor": 4
}