{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Introduction to hypothesis testing\n",
"\n",
"An important part of the scientific process is to make hypotheses about the world or about the results of experiments. These hypotheses then need to be checked by collecting evidence and making comparisons. Hypothesis testing is the step in this process where statistical tools are used to test hypotheses against data.\n",
"\n",
"**This notebook is designed for you to learn**:\n",
"* How to distinguish between \"population\" datasets and \"sample\" datasets when dealing with experimental data\n",
"* How to compare a sample to a population, test a hypothesis using a statistical test called the \"t-test\" and interpret its results\n",
"* How to use Python scripts to perform statistical analyses on a dataset\n",
"\n",
"In the following, we will use an example dataset representing a series of measurements on a type of flower called Iris."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"shift + enter.\n",
"\n",
"* How does $t$ compare to the cutoff value (2.01)?\n",
"* How does $p$ compare to $\\alpha$ (0.05)?\n",
"\n",
"The t-test can be computed with `stats.ttest_1samp`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define the mean petal length of the Ensata population\n",
"mu_ensata = 5.832\n",
"\n",
"# Compute the t-test comparing the Vullierens sample petal length to the Ensata population mean\n",
"t, p = stats.ttest_1samp(sample_data[\"petal_length\"], mu_ensata)\n",
"\n",
"# Display the result\n",
"print(\"t = {:.3f}\".format(t))\n",
"print(\"p = {:.3f}\".format(p))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of the t-test gives $t = -2.346$ and $p = 0.023$.\n",
"\n",
"With $\\alpha=0.05$, the cutoff value is 2.01. We see that $|t| > 2.01$ and $p < 0.05$. Therefore, the test tells us that the difference between the mean petal length of the Vullierens sample and the mean petal length of the Ensata population IS statistically significant. In other words, at the 5% significance level, we can reject the hypothesis that the Vullierens sample is similar to the Ensata population."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is the role of $\\alpha$ in this result?\n",
"\n",
"Now let's ask ourselves **what would have been the conclusion of the test if we had chosen a significance level of $\\alpha=0.01$**, i.e. if we had accepted only a 1% risk of rejecting a hypothesis that is actually true?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For $\\alpha = 0.01$, the cutoff value which we get from the tables is 2.67. With this choice of $\\alpha$, we now get $|t| < 2.67$ and $p > 0.01$. This means that when choosing $\\alpha = 0.01$, the test tells us that the difference between the mean petal length of the Vullierens sample and the mean petal length of the Ensata population is NOT statistically significant anymore. In other words, at the 1% significance level, we can no longer reject the hypothesis that the Vullierens sample is similar to the Ensata population.\n",
"\n",
"This illustrates the importance of the choice of $\\alpha$ when testing a hypothesis."
]
},
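{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two conclusions can be placed side by side. The short worked comparison below uses only the values already reported in this notebook ($t = -2.346$, $p = 0.023$, and the cutoff values 2.01 and 2.67); it is a recap of the reasoning, not a new result:\n",
"\n",
"$$\\alpha = 0.05: \\quad |t| = 2.346 > 2.01 \\quad \\text{and} \\quad p = 0.023 < 0.05 \\quad \\Rightarrow \\quad \\text{reject } H_0$$\n",
"\n",
"$$\\alpha = 0.01: \\quad |t| = 2.346 < 2.67 \\quad \\text{and} \\quad p = 0.023 > 0.01 \\quad \\Rightarrow \\quad \\text{do not reject } H_0$$\n",
"\n",
"The data and the test statistic are identical in both cases; only the threshold against which they are compared changes."
]
},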
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summary\n",
"\n",
"In this notebook, you have seen how to compare a sample to a population using an approach called **hypothesis testing** and using a statistical test called a **one-sample t-test**.\n",
"\n",
"To summarize, to compare the mean of a sample to a reference value from a population, you have to proceed in four main steps:\n",
"1. Look at descriptive statistics and visualizations of the sample you have to get an idea about how it compares to the population\n",
"1. Formulate the hypotheses you want to test: the null hypothesis $H_0: m = \\mu$ and its alternative $H_a: m \\neq \\mu$ \n",
"1. Choose a significance level, usually $\\alpha = 0.05$ or $\\alpha = 0.01$, or even $\\alpha = 0.001$ \n",
"1. Compute the t-test and interpret the result: in particular, if the p-value is *below* the significance level you have chosen, $p \\lt \\alpha$, then $H_0$ can be rejected at that level"
]
},
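{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder of what the t-test in the last step computes, the one-sample t statistic can be sketched as follows. Here $m$ and $\\mu$ are the sample mean and the population mean from the hypotheses above, while $s$ (the sample standard deviation) and $n$ (the sample size) are symbols introduced only for this sketch:\n",
"\n",
"$$t = \\frac{m - \\mu}{s / \\sqrt{n}}$$\n",
"\n",
"With this convention, a negative value such as the $t = -2.346$ obtained earlier indicates that the sample mean lies below the population mean. Only the magnitude $|t|$ is compared to the cutoff value in a two-sided test."
]
},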
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
"\n",
"---\n",
"\n",
"