diff --git a/src/ridge_regression_and_lasso.ipynb b/src/ridge_regression_and_lasso.ipynb
new file mode 100644
index 0000000..bf079e7
--- /dev/null
+++ b/src/ridge_regression_and_lasso.ipynb
@@ -0,0 +1,450 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Ridge Regression and the Lasso"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "New R commands:\n",
+ "* `glmnet(x, y, alpha = , lambda = )`\n",
+ "* `model.matrix(formula, data)`\n",
+ "\n",
+ "Useful R commands to remember:\n",
+ "* `seq(from, to, length = )` create a list of a given `length` of linearly spaced numbers between `from` and `to`.\n",
+ " \n",
+ "\n",
+ "This section follows the Lab 2 in chapter 6 from the textbook.\n",
+ "\n",
+ "We will use the `glmnet` package in order to perform ridge regression and\n",
+ "the lasso. The main function in this package is `glmnet()`, which can be used\n",
+ "to fit ridge regression models, lasso models, and more. This function has\n",
+ "slightly different syntax from other model-fitting functions that we have\n",
+ "encountered thus far in this book. In particular, we must pass in an x\n",
+ "matrix as well as a y vector, and we do not use the y ∼ x syntax. We will\n",
+ "now perform ridge regression and the lasso in order to predict Salary on\n",
+ "the Hitters data."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Ridge regression"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "library(ISLR)\n",
+ "names(Hitters)\n",
+ "Hitters = na.omit(Hitters)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `model.matrix()` function is particularly useful for creating x; not only\n",
+ "does it produce a matrix corresponding to the 19 predictors but it also\n",
+ "automatically transforms any qualitative variables into dummy variables.\n",
+ "The latter property is important because `glmnet()` can only take numerical,\n",
+ "quantitative inputs. We remove the first column returned by `model.matrix()`,\n",
+ "since it only contains ones that are only needed when fitting an intercept with other methods. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "x = model.matrix(Salary~., Hitters)[,-1]\n",
+ "y = Hitters$Salary"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `glmnet()` function has an `alpha` argument that determines what type\n",
+ "of model is fit. If `alpha=0` then a ridge regression model is fit, and if `alpha=1`\n",
+ "then a lasso model is fit. We first fit a ridge regression model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "library(glmnet)\n",
+ "grid = 10^seq(10, -2, length = 100)\n",
+ "ridge.mod = glmnet(x, y, alpha = 0, lambda = grid)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "By default the `glmnet()` function performs ridge regression for an automatically selected range of λ values. However, here we have chosen to implement\n",
+ "the function over a grid of values ranging from $\\lambda = 10^{10}$ to $\\lambda = 10^{-2}$.\n",
+ "Note that by default, the `glmnet()` function standardizes the\n",
+ "variables so that they are on the same scale. To turn off this default setting,\n",
+ "use the argument `standardize=FALSE`.\n",
+ "\n",
+ "Associated with each value of λ is a vector of ridge regression coefficients,\n",
+ "stored in a matrix that can be accessed by `coef()`. In this case, it is a 20×100\n",
+ "matrix, with 20 rows (one for each predictor, plus an intercept) and 100\n",
+ "columns (one for each value of λ).\n",
+ "We expect the coefficient estimates to be much smaller, in terms of $\\ell_2$ norm,\n",
+ "when a large value of λ is used, as compared to when a small value of λ is\n",
+ "used. \n",
+ "\n",
+ "Check that this is the case by repeating the following steps for another index than 50."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ridge.mod$lambda[50]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "coef(ridge.mod)[,50]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Next we exclude the intercept and compute the $\\ell_2$ norm of the coefficients."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sqrt(sum(coef(ridge.mod)[-1,50]^2))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can use the `predict()` function for a number of purposes. For instance,\n",
+ "we can obtain the ridge regression coefficients for a new value of λ, say 50:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "predict(ridge.mod, s = 50, type = \"coefficients\")[1:20,]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We now split the samples into a training set and a test set in order\n",
+ "to estimate the test error of ridge regression and the lasso.\n",
+ "There are two\n",
+ "common ways to randomly split a data set. The first is to produce a random\n",
+ "vector of TRUE, FALSE elements and select the observations corresponding to\n",
+ "TRUE for the training data. The second is to randomly choose a subset of\n",
+ "numbers between 1 and n; these can then be used as the indices for the\n",
+ "training observations. The two approaches work equally well."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "set.seed(1)\n",
+ "train = sample(1:nrow(x), nrow(x)/2)\n",
+ "test = (-train)\n",
+ "y.test = y[test]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Next we fit a ridge regression model on the training set, and evaluate\n",
+ "its MSE on the test set, using λ = 4. Note the use of the `predict()`\n",
+ "function again. This time we get predictions for a test set, by replacing\n",
+ "`type=\"coefficients\"` with the `newx` argument."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ridge.mod = glmnet(x[train,], y[train], alpha = 0, lambda = grid)\n",
+ "ridge.pred = predict(ridge.mod, s = 4, newx = x[test,])\n",
+ "mean((ridge.pred - y.test)^2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Remember that in the extreme case of λ approaching infinite, all coefficients are shrunk to zero,\n",
+ "which corresponds to fitting a model with only an intercept.\n",
+ "Let us check that this is the case for our fit."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ridge.pred = predict(ridge.mod, s = 10^200, newx = x[test,])\n",
+ "mean((ridge.pred - y.test)^2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mean((mean(y[train]) - y.test)^2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "So fitting a ridge regression model with λ = 4 leads to a much lower test\n",
+ "MSE than fitting a model with just an intercept.\n",
+ "\n",
+ "We now check whether\n",
+ "there is any benefit to performing ridge regression with λ = 4 instead of\n",
+ "just performing least squares regression. Recall that least squares is simply\n",
+ "ridge regression with λ = 0. While the two methods yield very similar results,\n",
+ "it is unclear to me why there are still small differences."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ridge.pred = predict(ridge.mod, s=0, newx=x[test,], exact=T, x=x[train,], y=y[train])\n",
+ "mean((ridge.pred-y.test)^2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "lm.pred = predict(lm(Salary ~ ., data = Hitters, subset=train), Hitters[test,])\n",
+ "mean((lm.pred-y.test)^2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In general, instead of arbitrarily choosing λ = 4, it would be better to\n",
+ "use cross-validation to choose the tuning parameter λ. We can do this using\n",
+ "the built-in cross-validation function, `cv.glmnet()`. By default, the function\n",
+ "performs ten-fold cross-validation, though this can be changed using the\n",
+ "argument `nfolds`. Note that we set a random seed first so our results will\n",
+ "be reproducible, since the choice of the cross-validation folds is random."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "set.seed(1)\n",
+ "cv.out=cv.glmnet(x[train,],y[train],alpha=0)\n",
+ "plot(cv.out)\n",
+ "bestlam=cv.out$lambda.min\n",
+ "bestlam"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Take the best lambda `bestlam` and compute the mean squared error on the test set.\n",
+ "\n",
+ "Is there a further improvement in test error over the arbitrary choice of λ = 4?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Finally, we refit our ridge regression model on the full data set,\n",
+ "using the value of λ chosen by cross-validation, and examine the coefficient\n",
+ "estimates."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "out = glmnet(x, y, alpha=0)\n",
+ "predict(out, type=\"coefficients\", s=bestlam)[1:20,]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## The Lasso\n",
+ "\n",
+ "We saw that ridge regression with a wise choice of λ can outperform least\n",
+ "squares as well as the null model on the Hitters data set. We now ask\n",
+ "whether the lasso can yield either a more accurate or a more interpretable\n",
+ "model than ridge regression. In order to fit a lasso model, we once again\n",
+ "use the `glmnet()` function; however, this time we use the argument `alpha=1`.\n",
+ "Other than that change, we proceed just as we did in fitting a ridge model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)\n",
+ "plot(lasso.mod)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Perform now cross-validation and compute the associated test error."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Finally, refit the lasso on the full data set,\n",
+ "using the value of λ chosen by cross-validation, examine the coefficient\n",
+ "estimates and compare them to the coefficients found by ridge regression.\n",
+ "Use again the seed 1 to get reproducible results for the cross-validation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now you should be ready to do the following quiz."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 88,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "IRdisplay::display_html('')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Bayesian Interpretation for Ridge Regression and the Lasso (Optional)\n",
+ "\n",
+ "Read in the textbook on page 226 about the nice Bayesian interpretation of ridge regression and the lasso and have a look at exercises 7 on page 262."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.6.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}