{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to hypothesis testing
\n", "\n", "An important part of the scientific process is to make hypotheses about the world or about the results of experiments. These hypotheses then need to be checked by collecting evidence and making comparisons. Hypothesis testing is the step in this process where statistical tools are used to test hypotheses against data.\n", "\n", "**This notebook is designed for you to learn**:\n", "* How to distinguish between \"population\" datasets and \"sample\" datasets when dealing with experimental data\n", "* How to compare a sample to a population, test a hypothesis using a statistical test called the \"t-test\", and interpret its results\n", "* How to use Python scripts to perform statistical analyses on a dataset\n", "\n", "In the following, we will use an example dataset containing a series of measurements on a type of flower called Iris." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "shift + enter
.
\n", "\"petal_width\"
?
\n", "sample_stats
table?
\n", "alpha01
to define a significance level of $\\alpha = 0.01$.
\n", "How does $t$ compare to the cutoff value (2.01)?\n", "How does $p$ compare to $\\alpha$ (0.05)?
\n", "visualize_ttest_pvalue
need to generate a visualization of the result of a t-test?
\n", "stats.ttest_1samp
:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define the mean petal length of the Ensata population\n",
"mu_ensata = 5.832\n",
"\n",
"# Compute the t-test comparing the Vullierens sample petal length to the Ensata population mean\n",
"t, p = stats.ttest_1samp(sample_data[\"petal_length\"], mu_ensata)\n",
"\n",
"# Display the result\n",
"print(\"t = {:.3f}\".format(t))\n",
"print(\"p = {:.3f}\".format(p))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of the t-test gives $t = -2.346$ and $p = 0.023$.\n",
"\n",
"With $\\alpha=0.05$, the cutoff value is 2.01. We see that $|t| > 2.01$ and $p < 0.05$. Therefore, the test tells us that the difference between the mean petal length of the Vullierens sample and the mean petal length of the Ensata population IS statistically significant. In other words, we can reject the hypothesis that the Vullierens sample has the same mean petal length as the Ensata population."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is the role of $\\alpha$ in this result?\n",
"\n",
"Now let's ask ourselves **what the conclusion of the test would have been if we had chosen a significance level of $\\alpha=0.01$**, i.e. if we had accepted only a 1% risk of wrongly rejecting the hypothesis instead of 5%?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"visualize_ttest_pvalue
to generate two visualizations of the result of this t-test: one with $\\alpha=0.05$ and then another with $\\alpha=0.01$.
\n", "