{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Solution of the exercise\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**A. Getting familiar with the code.**\n",
    "1. In the code cells above, where is the t-test computed using the predefined Python function?  \n",
    "&nbsp;&#129094;&nbsp;The predefined Python function for computing a one-sample t-test is `stats.ttest_1samp`. You can search for it in the notebook by typing `ctrl + F` or using the top menu `Edit` > `Find`.\n",
    "\n",
    "1. What are the two parameters that the t-test function takes as input?  \n",
    "&nbsp;&#129094;&nbsp;The two input parameters of the `stats.ttest_1samp` function are: a) the list of all the sample values, in our case `sample_data[\"petal_length\"]`, and b) the population reference value for comparison, in our case `mu`.\n",
    "\n",
    "1. If you wanted to change the population mean to a different value, like $\\mu = 5.4$ cm for instance, in which cell would you change it?  \n",
    "Then what would you need to do to update the result of the analysis in the notebook?  \n",
    "&nbsp;&#129094;&nbsp;You could change it in two different places: in the cell where `mu` is defined, at the top of the notebook, or you could replace `mu` with the value `5.4` in the cell where the function `stats.ttest_1samp` is called.  \n",
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Then you would have to *re-execute the cells* of the notebook which do the analysis to update the results.\n",
    "\n",
    "1. What is the result of the t-test if you compare the Vullierens sample to a population mean of $\\mu = 5.4$ cm?  \n",
    "&nbsp;&#129094;&nbsp;The difference becomes statistically significant with $t=3.025$ and $p=0.004 \\lt \\alpha$, in which case we would reject our null hypothesis and conclude that our sample is probably from a different population.  \n",
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;However, the effect size remains below the medium threshold, $d=0.428$.\n",
    "\n",
    "*Change the value of $\\mu$ back to 5.552 before working on the following questions.*\n",
    "\n",
    "**B. Analyzing another dataset.**\n",
    "\n",
    "A researcher from Tokyo sends you the results of a series of measurements she has done on the Irises of the [Meiji Jingu Imperial Gardens](http://www.meijijingu.or.jp/english/nature/2.html). The dataset can be found in the `iris-sample2-meiji.csv` file.  \n",
    "How similar (or different) is the Meiji sample compared to the Iris virginica population documented by Edgar Anderson?  \n",
    "The following questions are designed to guide you in analyzing this new dataset using this notebook.\n",
    "\n",
    "1. Which of the code cells above loads the data from the file containing the Vullierens dataset? Modify it to load the Meiji dataset.  \n",
    "&nbsp;&#129094;&nbsp;The function to read a CSV file in Python is `pan.read_csv`. To load the Meiji dataset, you have to replace its input parameter with the name of the file containing the second dataset, `'iris-sample2-meiji.csv'`.\n",
    "\n",
    "1. Do you need to modify anything else in the code to analyze this new dataset?   \n",
    "&nbsp;&#129094;&nbsp;There is nothing else to change in the code to analyse this new dataset since the new sample values are simply stored in the `sample_data` variable, in replacement of the previous values.  \n",
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;However, you have to *re-execute all the code cells* which compute the analysis so that all the calculations are made with the new values.  \n",
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Note that you can use the top menu `Run` > `Run All Cells` to execute all the code cells of the notebook all at once.\n",
    "\n",
    "1. What can you conclude about the Meiji sample from this analysis?  \n",
    "&nbsp;&#129094;&nbsp;The difference in petal length mean between the Meiji sample and the Anderson population is statistically significant at the 5% level with $t=2.352$ and $p=0.023$, but a relatively small effect size $d=0.333$.  \n",
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;In this case we could reject our null hypothesis and conclude that the Meiji sample is probably from a different population.\n",
    "\n",
    "**C. Going a bit further in the interpretation of the t-test.**\n",
    "1. In the code cells above, where is the cut-off point $\\alpha$ defined? Change its value to 0.01 and re-execute the notebook.  \n",
    "&nbsp;&#129094;&nbsp;Search for the variable `alpha` in the notebook and modify its assignment in `alpha = 0.01`. Then re-execute the notebook.\n",
    "\n",
    "1. How does this affect the result of the t-test for the Meiji sample?  \n",
    "&nbsp;&#129094;&nbsp;When choosing a significance level of 1% ($\\alpha = .01$), the difference in petal length mean between the Meiji sample and the Anderson population cannot be considered statistically significant.   \n",
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;This means that the evidence we have is not strong enough if we want to be 99% sure. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}