{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
Introduction to hypothesis testing
\n", "\n", "An important part of the scientific process is to make hypotheses about the world or about the results of experiments. These hypotheses then need to be checked by collecting evidence and making comparisons. Hypothesis testing is a step in this process where statistical tools are used to test hypotheses using data.\n", "\n", "**This notebook is designed for you to learn**:\n", "* How to distinguish between \"population\" datasets and \"sample\" datasets when dealing with experimental data\n", "* How to compare a sample to a population, test a hypothesis using a statistical test called the \"t-test\" and interpret its results\n", "* How to use Python scripts to run statistical analyses on a dataset\n", "\n", "In the following, we will use an example dataset containing a series of measurements on a type of flower called Iris." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "shift + enter
.You can check your answer by clicking on the \"...\" below.
\n", "5.713045 cm
.0.518940 cm
.\n",
"You can check your answer by clicking on the \"...\" below.
\n", "t
compare to the cutoff value (2.01)?p
compare to $\\alpha$ (0.05)?You can check your answer by clicking on the \"...\" below.
\n", "stats.ttest_1samp
:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define the mean petal length of the Ensata population\n",
"mu_ensata = 5.832\n",
"\n",
"# Compute the t-test comparing the Vullierens sample petal length to the Ensata population mean\n",
"t, p = stats.ttest_1samp(sample_data[\"petal_length\"], mu_ensata)\n",
"\n",
"# Display the result\n",
"print(\"t = {:.3f}\".format(t))\n",
"print(\"p = {:.3f}\".format(p))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of the t-test gives $t = -1.621$ and $p = 0.111$.\n",
"\n",
"With $\\alpha=0.05$, the cutoff value is 2.01. We see that $|t| < 2.01$ and $p > 0.05$. Therefore, the test tells us that the difference between the mean petal length of the Vullierens sample and the mean petal length of the Ensata population IS NOT statistically significant. In other words, we cannot reject the hypothesis that the Vullierens sample is similar to the Ensata population."
]
},
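{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check, the $t$ statistic can be recomputed by hand from summary statistics using $t = (m - \\mu) / (s / \\sqrt{n})$. The sketch below is only illustrative: it assumes that the values 5.713045 cm and 0.518940 cm reported earlier are the mean and standard deviation of the Vullierens petal length, and that the sample contains $n = 50$ flowers (an assumption, since the sample size is not stated here). The variable names are hypothetical.\n",
"\n",
"```python\n",
"import math\n",
"from scipy import stats\n",
"\n",
"# Assumed summary statistics of the sample (in cm); see the caveats above\n",
"sample_mean = 5.713045\n",
"sample_std = 0.518940\n",
"n = 50  # hypothetical sample size\n",
"mu_ensata = 5.832\n",
"\n",
"# Standard error of the mean, then the t statistic\n",
"standard_error = sample_std / math.sqrt(n)\n",
"t_manual = (sample_mean - mu_ensata) / standard_error\n",
"\n",
"# Two-sided p-value from the t distribution with n - 1 degrees of freedom\n",
"p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)\n",
"\n",
"print('t = {:.3f}'.format(t_manual))\n",
"print('p = {:.3f}'.format(p_manual))\n",
"```\n",
"\n",
"If these assumptions hold, the printed values should be close to the output of `stats.ttest_1samp` above."
]
},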
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is the role of $\\alpha$ in this result?\n",
"\n",
"Now let's ask ourselves **what would have been the conclusion of the test if we had chosen a significance level of $\\alpha=0.01$**, i.e. if we wanted to be 99% sure?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For $\\alpha = 0.01$, the cutoff value which we get from the tables is 2.67. With this choice of $\\alpha$, we see that $|t| < 2.67$ and $p > 0.01$. This means that when choosing $\\alpha = 0.01$, the test tells us that the difference between the mean petal length of the Vullierens sample and the mean petal length the Ensata population is NOT statistically significant either. This is quite obvious, since the $t$ we have to \"beat\" is event larger with $\\alpha = 0.01$ than for $\\alpha = 0.05$."
]
},
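{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of reading the cutoff from a table, it can also be computed from the t distribution. The following is a minimal sketch, assuming a two-sided test and a hypothetical sample size of 50 (so 49 degrees of freedom); `stats.t.ppf` is SciPy's inverse cumulative distribution function for the t distribution.\n",
"\n",
"```python\n",
"from scipy import stats\n",
"\n",
"n = 50  # hypothetical sample size, not stated in this part of the notebook\n",
"df = n - 1  # degrees of freedom of the one-sample t-test\n",
"\n",
"# Two-sided test: alpha is split equally between the two tails\n",
"cutoffs = {alpha: stats.t.ppf(1 - alpha / 2, df) for alpha in (0.05, 0.01)}\n",
"\n",
"for alpha, cutoff in cutoffs.items():\n",
"    print('alpha = {}: cutoff = {:.3f}'.format(alpha, cutoff))\n",
"```\n",
"\n",
"The computed values may differ from a printed table in the last digit, depending on rounding and on which degrees of freedom the table lists."
]
},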
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summary\n",
"\n",
"In this notebook, you have seen how to compare a sample to a population using an approach called **hypothesis testing** and using a statistical test called a **one-sample t-test**.\n",
"\n",
"To summarize, to compare the mean of a sample to a reference value from a population, you have to proceed in four main steps:\n",
"1. Look at descriptive statistics and visualizations of the sample you have to get an idea about how it compares to the population\n",
"1. Formulate the hypothese you want to test: the null hypothesis $H_0: m = \\mu$ and its alternate $H_a: m \\neq \\mu$ \n",
"1. Choose a significance level for being sure, usually $\\alpha = 0.05$ or $\\alpha = 0.01$, or even $\\alpha = 0.001$ \n",
"1. Compute the result of the t-test and interpret the result - in particular if the p-value is *below* the significance level you have chosen, $p \\lt \\alpha$, then it means $H_0$ should probably be rejected"
]
},
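{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four steps above can be sketched as one small script. The data here is randomly generated and purely hypothetical (it is not the Vullierens sample), and the helper `one_sample_t_test` is a name introduced only for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"def one_sample_t_test(sample, mu, alpha=0.05):\n",
"    '''Return t, p and whether H0 (sample mean equals mu) is rejected.'''\n",
"    t, p = stats.ttest_1samp(sample, mu)\n",
"    return t, p, bool(p < alpha)\n",
"\n",
"# Hypothetical data, generated only to make the example runnable\n",
"rng = np.random.default_rng(0)\n",
"fake_sample = rng.normal(loc=5.7, scale=0.5, size=50)\n",
"\n",
"# Step 1: descriptive statistics of the sample\n",
"print('mean = {:.3f}'.format(fake_sample.mean()))\n",
"print('std = {:.3f}'.format(fake_sample.std(ddof=1)))\n",
"\n",
"# Steps 2 to 4: H0 is m = mu, alpha is chosen, then the test is computed\n",
"t, p, reject = one_sample_t_test(fake_sample, mu=5.832, alpha=0.05)\n",
"print('t = {:.3f}, p = {:.3f}, reject H0: {}'.format(t, p, reject))\n",
"```\n",
"\n",
"For a two-sided test, comparing $p$ to $\\alpha$ and comparing $|t|$ to the cutoff value lead to the same decision."
]
},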
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
"\n",
"---\n",
"\n",
"