diff --git a/src/lecture11.ipynb b/src/lecture11.ipynb
new file mode 100644
index 0000000..9e8932e
--- /dev/null
+++ b/src/lecture11.ipynb
@@ -0,0 +1,787 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Lecture 11\n",
+ "\n",
+ "## Principal Component Analysis: Directions of Highest Variance"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "IRdisplay::display_html('')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "Let us first load the wine data and look at it with the `pairs` function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "wine <- read.csv(\"http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data\", sep=\",\")\n",
+ "colnames(wine) <- c(\"Cultivar\",\"Alcohol\",\"Malic acid\",\"Ash\",\"Alcalinity of ash\", \"Magnesium\", \"Total phenols\", \"Flavanoids\", \"Nonflavanoid phenols\", \"Proanthocyanins\", \"Color intensity\", \"Hue\", \"OD280/OD315 of diluted wines\", \"Proline\")\n",
+ "pairs(wine[,-1], cex = .5) # the first column contains the label of the cultivar; we pretend here that we don't know it."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "We will scale the data first with `scale` and perform a principal component\n",
+ "analysis on this data set with the function `prcomp` and plot the scores of the\n",
+ "data onto the first two components."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "outputs": [],
+ "source": [
+ "wine.sc <- scale(wine[,-1])\n",
+ "pca <- prcomp(wine.sc, scale = F)\n",
+ "# pca <- prcomp(wine[,-1], scale = T) # is an alternative, where we don't scale manually\n",
+ "plot(pca$x[,1:2])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The function `prcomp` returns the scores in the variable `x` and the loadings in\n",
+ "variable `rotation`. We can verify that there are $p$ loading vectors of length\n",
+ "$p$ that have norm 1 and are orthogonal."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dim(pca$rotation)\n",
+ "norm(pca$rotation[,1], \"2\") # use the standard Euclidean 2-norm\n",
+ "sum(pca$rotation[,1] * pca$rotation[,2]) # the scalar product is approximately 0"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "Let us now produce the `biplot` from the slides"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "outputs": [],
+ "source": [
+ "biplot(pca, col = c('gray', 'red'), scale = 0)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "You can verify by looking at `pca$x[,1:2]` and `pca$rotation[,1:2]` that the\n",
+ "texts are placed at the right position.\n",
+ "\n",
+ "To produce a similar figure to the one in the slides without scaling we can use the code"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "biplot(prcomp(wine[,-1], scale = F), col = c('gray', 'red'), scale = 0)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "Let us check the relation between Singular Value Decomposition and PCA."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "outputs": [],
+ "source": [
+ "s <- svd(wine.sc)\n",
+ "sum((s$v - pca$rotation)^2)\n",
+ "sum((s$u %*% diag(s$d) - pca$x)^2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can see in the output above that the loadings matrix `pca$rotation` is the\n",
+ "same as the $V$ matrix of SVD (up to numerical errors) and the scores `pca$x`\n",
+ "are the same as $U\\Sigma$ where $\\Sigma$ is given by `diag(s$d)`.\n",
+ "\n",
+ "You can now answer the questions on the first page of the\n",
+ "[quiz](https://moodle.epfl.ch/mod/quiz/view.php?id=1117364).\n",
+ "\n",
+ "## Principal Component Analysis: Closest Low-Dimensional Subspace"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "IRdisplay::display_html('')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "In the next cell you can see that we can perfectly reconstruct the original data\n",
+ "$X$ by computing either $Z\\Phi^T$ or $X\\Phi\\Phi^T$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "wine.sc.rec1 <- pca$x %*% t(pca$rotation)\n",
+ "wine.sc.rec2 <- wine.sc %*% pca$rotation %*% t(pca$rotation)\n",
+ "sum((wine.sc - wine.sc.rec1)^2)\n",
+ "sum((wine.sc - wine.sc.rec2)^2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "Now we reconstruct the data using only the first two principal components and\n",
+ "compare the reconstruction error to the one we obtain by using the first and the\n",
+ "third principal component. You can see that the reconstruction error is higher\n",
+ "in the latter case. No matter which pair of principal components we use to\n",
+ "reconstruct the original data: the reconstruction error will always be higher\n",
+ "than the one with the first two principal components. The first two principal\n",
+ "components find indeed the two-dimensional plane in this 13-dimensional space\n",
+ "that is closest to the original data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "wine.sc.rec.2 <- wine.sc %*% pca$rotation[,1:2] %*% t(pca$rotation[,1:2])\n",
+ "sum((wine.sc - wine.sc.rec.2)^2)\n",
+ "wine.sc.rec.2b <- wine.sc %*% pca$rotation[,c(1, 3)] %*% t(pca$rotation[,c(1, 3)])\n",
+ "sum((wine.sc - wine.sc.rec.2b)^2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can now answer the questions on the second page of the\n",
+ "[quiz](https://moodle.epfl.ch/mod/quiz/view.php?id=1117364).\n",
+ "\n",
+ "## Proportion Variance Explained"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "IRdisplay::display_html('')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "The standard deviation of the scores of each component are in `pca$sdev`; we can\n",
+ "get the variance by squaring these numbers and the proportion of variance\n",
+ "explained by normalizing with the total variance."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pca.var <- pca$sdev^2\n",
+ "pca.vare <- pca.var / sum(pca.var)\n",
+ "plot(pca.vare, xlab = \"Principal Component\", ylab = \"Prop. Variance Explained\", col = \"blue\", type = \"b\")\n",
+ "plot(cumsum(pca.vare), xlab = \"Principal Component\", ylab = \"Cumulative Prop. Variance Explained\", col = \"blue\", type = \"b\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "In the following cell we download and plot the image of the boat you saw in the\n",
+ "video."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "library(tiff)\n",
+ "fname <- tempfile()\n",
+ "download.file(\"http://sipi.usc.edu/database/download.php?vol=misc&img=boat.512\", fname)\n",
+ "img <- readTIFF(fname)\n",
+ "image(t(img)[,512:1], col = gray.colors(100, 0, 1, 1), axes = F, asp = 1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "Now, let us scale the data, perform PCA and plot the variance explained, the\n",
+ "original image and the reconstructed image with the first 25 components."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img.s <- scale(img)\n",
+ "pZ <- prcomp(img.s)\n",
+ "library(DMwR) # used for the unscale function\n",
+ "newimg <- unscale(pZ$x[,1:25] %*% t(pZ$rotation[,1:25]), img.s)\n",
+ "par(mfrow = c(1, 3), oma = rep(1, 4), cex = 1)\n",
+ "plot(pZ$sdev^2/sum(pZ$sdev^2), ylab = 'Prop. Variance Explained')\n",
+ "image(t(img)[,512:1], col = gray.colors(100, 0, 1, 1), axes = F, asp = 1)\n",
+ "image(t(newimg)[,512:1], col = gray.colors(100, 0, 1, 1), axes = F, asp = 1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can now answer the questions on the third page of the\n",
+ "[quiz](https://moodle.epfl.ch/mod/quiz/view.php?id=1117364).\n",
+ "\n",
+ "## Limitations of PCA"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "IRdisplay::display_html('')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "Let us generate below an artificial dataset with 8 point clouds."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "set.seed(517)\n",
+ "library(MASS)\n",
+ "x1 <- mvrnorm(100, c(1, 1, 1), .1*diag(3))\n",
+ "x2 <- mvrnorm(100, c(-1, 1, 1), .1*diag(3))\n",
+ "x3 <- mvrnorm(100, c(1, -1, 1), .1*diag(3))\n",
+ "x4 <- mvrnorm(100, c(-1, -1, 1), .1*diag(3))\n",
+ "x5 <- mvrnorm(100, c(1, 1, -.8), .1*diag(3))\n",
+ "x6 <- mvrnorm(100, c(-1, 1, -.8), .1*diag(3))\n",
+ "x7 <- mvrnorm(100, c(1, -1, -.8), .1*diag(3))\n",
+ "x8 <- mvrnorm(100, c(-1, -1, -.8), .1*diag(3))\n",
+ "x <- rbind(x1, x2, x3, x4, x5, x6, x7, x8)\n",
+ "cols <- c(rep(1, 400), rep(2, 400))\n",
+ "library(plotly)\n",
+ "plot_ly(x = x[,1], y = x[,2], z = x[,3], type = 'scatter3d', mode = 'markers', marker = list(size = 2), color = cols)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "Without scaling the data you get a very similar image as the one shown in the\n",
+ "slides."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pca <- prcomp(x)\n",
+ "order <- sample(1:800)\n",
+ "plot(pca$x[order, 1:2], col = cols[order], xlab = \"PC1\", ylab = \"PC2\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "With scaling before applying PCA the result looks a bit better, but there are\n",
+ "still overlaps of the point clouds after projection to the lower-dimensional\n",
+ "space."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "x <- scale(x)\n",
+ "pca <- prcomp(x)\n",
+ "order <- sample(1:800)\n",
+ "plot(pca$x[order, 1:2], col = cols[order], xlab = \"PC1\", ylab = \"PC2\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "The result is very different when we apply t-SNE instead."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "library(Rtsne)\n",
+ "tsne <- Rtsne(x)\n",
+ "plot(tsne$Y, col = cols)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Principal Component Regression"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "IRdisplay::display_html('')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "The code in this section may look complicated. Don't worry, if you don't\n",
+ "understand the details. I included the code for transparency: you can see,\n",
+ "how I produced the figures in the slides. It is totally fine if you do not want\n",
+ "to spend time on reading it. The only thing that you may want to remember is\n",
+ "that there is the `library(pls)` that contains the function `pcr` to do\n",
+ "principal component regression; however, the Lasso or Ridge Regression is\n",
+ "usually preferable.\n",
+ "\n",
+ "In the following cell we define a data generator function that samples\n",
+ "potentially low-dimensional, hidden predictors `z` and linearly transforms them\n",
+ "to a higher dimensional representation `x` using transformation matrix `P`.\n",
+ "For the first dataset we leave the transformation matrix `P` the identity\n",
+ "matrix, i.e. `z` is as high-dimensional as `x`, but we will use a sparse `beta`,\n",
+ "i.e. as `beta` with some zero entries. For the second artificial dataset we will\n",
+ "use a low-dimensional `z` and a dense `beta`.\n",
+ "The response `y` is then sampled following the ordinary assumption of multiple\n",
+ "linear regression.\n",
+ "The functions `bias.and.variance`, `glm.fit` and `pcr.fit` will be used to\n",
+ "compute the bias and variance of the different methods and fit the data with the\n",
+ "Lasso or PCR, respectively."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data.generator <- function(beta, n = 50, p = 45, P = diag(rep(1, p))) {\n",
+ " z <- matrix(rnorm(n * nrow(P)), nrow = n)\n",
+ " x <- z %*% P\n",
+ " y = x %*% beta + rnorm(n)\n",
+ " list(x = x, y = y, data = data.frame(X = x, Y = y))\n",
+ "}\n",
+ "bias.and.variance <- function(beta, beta.hats, x0) {\n",
+ " m.beta.hats <- rowMeans(beta.hats)\n",
+ " list(test.error = mean(colMeans((x0 %*% as.matrix(sweep(beta.hats, 1, beta)))^2)),\n",
+ " bias = mean(rowSums(x0 %*% (beta - m.beta.hats))^2),\n",
+ " variance = mean(colMeans(((x0 %*% as.matrix(sweep(beta.hats, 1, m.beta.hats))))^2)))\n",
+ "}\n",
+ "library(glmnet)\n",
+ "glm.fit <- function(data, ...) as.matrix(coef(glmnet(data$x, data$y, intercept = F, ...)))[seq(2, ncol(data$x)+1)]\n",
+ "library(pls)\n",
+ "pcr.fit <- function(data, ...) as.matrix(coef(pcr(Y ~ ., data = data$data, ...)))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "In this cell we fit multiple training sets with the Lasso and different values\n",
+ "for the regularization parameter `lambda` and we repeat the same for PCR and\n",
+ "different number of components. Here we use data with 45-dimensional\n",
+ "predictors and sparse `beta`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "outputs": [],
+ "source": [
+ "set.seed(123)\n",
+ "beta <- rnorm(45)\n",
+ "beta[sample(45, 10)] <- 0 # like this we obtain a sparse beta, where 10 entries are 0\n",
+ "x0 <- data.generator(beta, n = 500)$x # test data\n",
+ "lambdas <- 10^seq(-5, 0, length = 20) # lambdas to try\n",
+ "res1 <- sapply(lambdas, function(lambda) {\n",
+ " bias.and.variance(beta, data.frame(replicate(500, glm.fit(data.generator(beta), alpha = 1, lambda = lambda))), x0) })\n",
+ "res1 <- data.frame(res1)\n",
+ "ncomps <- seq(1, 45, 2)\n",
+ "res2 <- sapply(ncomps, function(ncomp) {\n",
+ " bias.and.variance(beta, data.frame(replicate(500, pcr.fit(data.generator(beta), ncomp = ncomp))), x0) })\n",
+ "res2 <- data.frame(res2)\n",
+ "\n",
+ "par(mfrow = c(1, 2))\n",
+ "plot(log10(lambdas), unlist(res1[1,]), main = \"Lasso\", type = 'l', col = 'red', ylim = c(0, 20), ylab = 'MSE')\n",
+ "lines(log10(lambdas), unlist(res1[2,]), col = 'blue')\n",
+ "lines(log10(lambdas), unlist(res1[3,]))\n",
+ "legend(\"topleft\", c(\"bias^2\", \"variance\", \"test\"),\n",
+ " col = c(\"blue\", \"black\", \"red\"), lty = 1)\n",
+ "\n",
+ "plot(ncomps, unlist(res2[1,]), main = \"PCR\", type = 'l', col = 'red', ylim = c(0, 20), ylab = \"MSE\")\n",
+ "lines(ncomps, unlist(res2[2,]), col = 'blue')\n",
+ "lines(ncomps, unlist(res2[3,]))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "The plots above show typical curves for bias, variance, test error as a function\n",
+ "of the flexibility of the method. The Lasso is most flexible for `lambda = 1` (=\n",
+ "unregularised linear regression) and we see that this is the case where the bias\n",
+ "is the smallest but the variance is highest. PCR is most flexible\n",
+ "when all components are used (= unregularised linear regression) and decreases\n",
+ "in flexibility with decreasing number of components. The lowest test error is\n",
+ "lower for the Lasso than for PCR.\n",
+ "\n",
+ "Next, we run the same analysis for data with 10-dimensional predictors embedded\n",
+ "in a 45-dimensional space (if you want to visualize this situation think of data\n",
+ "points that lie in a plane that is arbitrarily oriented in 3D space)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "outputs": [],
+ "source": [
+ "set.seed(123)\n",
+ "beta <- rnorm(45)\n",
+ "P <- matrix(rnorm(10*45), nrow = 10) # this transforms the predictors from 10 to 45 dimensions.\n",
+ "x0 <- data.generator(beta, n = 500, P = P)$x\n",
+ "lambdas <- 10^seq(-5, 0, length = 20)\n",
+ "res1 <- sapply(lambdas, function(lambda) {\n",
+ " bias.and.variance(beta, data.frame(replicate(500, glm.fit(data.generator(beta, P = P), alpha = 1, lambda = lambda))), x0) })\n",
+ "res1 <- data.frame(res1)\n",
+ "ncomps <- seq(1, 45, 3)\n",
+ "res2 <- sapply(ncomps, function(ncomp) {\n",
+ " bias.and.variance(beta, data.frame(replicate(500, pcr.fit(data.generator(beta, P = P), ncomp = ncomp))), x0) })\n",
+ "res2 <- data.frame(res2)\n",
+ "\n",
+ "par(mfrow = c(1, 2))\n",
+ "plot(log10(lambdas), unlist(res1[1,]), main = \"Lasso\", type = 'l', col = 'red', ylim = c(0, 5), ylab = 'MSE')\n",
+ "lines(log10(lambdas), unlist(res1[2,]), col = 'blue')\n",
+ "lines(log10(lambdas), unlist(res1[3,]))\n",
+ "legend(\"topleft\", c(\"bias^2\", \"variance\", \"test\"),\n",
+ " col = c(\"blue\", \"black\", \"red\"), lty = 1)\n",
+ "\n",
+ "plot(ncomps, unlist(res2[1,]), main = \"PCR\", type = 'l', col = 'red', ylim = c(0, 5), ylab = \"MSE\")\n",
+ "lines(ncomps, unlist(res2[2,]), col = 'blue')\n",
+ "lines(ncomps, unlist(res2[3,]))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As expected, PCR has zero bias as soon as at least 10 components are included.\n",
+ "Including more than 10 components leads to higher variance.\n",
+ "The Lasso has the same minimal test error for many values of lambda.\n",
+ "\n",
+ "## Exercises\n",
+ "\n",
+ "### Conceptual\n",
+ "\n",
+ "**Q1.**\n",
+ "\n",
+ "(a) Estimate from the figure below the principal component loadings $\\phi_{11}, \\phi_{21}, \\phi_{12}, \\phi_{22}$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "set.seed(1)\n",
+ "data = data.frame(as.matrix(data.frame(X1 = rnorm(400, sd = 6), X2 = rnorm(400, sd = 4))) %*% matrix(rnorm(4), 2, 2))\n",
+ "data = scale(data, scale = F)\n",
+ "plot(data, xlim = c(-30, 30), ylim = c(-15, 15))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "(b) In the figure below you see the scores of $n = 40$ measuments and the loadings of a principal component analysis. How many features $p$ were measured in this data set?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "set.seed(913)\n",
+ "data = data.frame(as.matrix(data.frame(X1 = rnorm(40, sd = 6), X2 = rnorm(40, sd = 4), X3 = rnorm(40, sd = 2), X4 = rnorm(40, sd = 2))) %*% matrix(rnorm(16), 4, 4))\n",
+ "pca = prcomp(data)\n",
+ "biplot(pca, scale = 0)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "(c) Estimate from the figure in (b) the loadings of the first two principal components and the scores of data point 28.\n",
+ "\n",
+ "(d) In which direction does the data in the figure in (b) vary the most?\n",
+ "\n",
+ "**Q2.** In this exercise you will explore the connection between PCA and SVD.\n",
+ "\n",
+ "(a) In the lecture we said that the first principal component is the eigenvector $\\phi_1$ with the largest eigenvalue $\\lambda_1$ of the matrix $X^T X$. From linear algebra you know that a real, symmetric matrix $A$ can be diagonalized such that $A = W \\Lambda W^T$ holds, where $\\Lambda$ is a diagonal matrix that contains the eigenvalues and $W$ is an orthogonal matrix that satisfies $W^T W = I$, where $I$ is the identity matrix. The columns of $W$ are the eigenvectors of $A$. Let us assume we have found the diagonalization $X^T X = W \\Lambda W^T$. Multiply this equation from the right with $W$ and show that the columns of $W$ contain the eigenvectors of $X^T X$.\n",
+ "\n",
+ "(b) The singular value decomposition (SVD) of $X$ is $X = U\\Sigma V^T$ where $U$ and $V$ are orthogonal matrices and $\\Sigma$ is a diagonal matrix. Use the singular value decomposition to express the right-hand-side of $X^TX = ...$ in terms of $U$, $V$ and $\\Sigma$, show how $U, V, \\Sigma, W$ and $\\Lambda$ relate to each other and prove the statements on slide 11 of the lecture. *Hint*: the product of two diagonal matrices is again a diagonal matrix."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Applied\n",
+ "\n",
+ "**Q3.** Look at the following three artificial data sets."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data1 <- data.frame(matrix(rnorm(50*2), nrow = 50) %*% matrix(rnorm(2*3), nrow = 2))\n",
+ "data2 <- data.frame(matrix(rnorm(50*2), nrow = 50) %*% matrix(rnorm(2*3), nrow = 2) + 0.1*rnorm(50*3))\n",
+ "data3 <- data.frame(matrix(rnorm(50*2), nrow = 50) %*% matrix(rnorm(2*10), nrow = 2))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "(a) Plot the datasets `data1` and `data2`. Hint: you can find an example in\n",
+ "section \"Limitations of PCA\" of this notebook.\n",
+ "\n",
+ "(b) Write down your expectations on the curve of proportion of variance\n",
+ "explained for the tree datasets.\n",
+ "\n",
+ "(c) Perform PCA and plot the proportion of variance explained.\n",
+ "\n",
+ "**Q4.** In this exercise we look at a food consumption dataset with the relative\n",
+ "consumption of certain food items in European and Scandinavian countries. The\n",
+ "numbers represent the percentage of the population consuming that food type.\n",
+ "You can load the data with"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data <- read.csv(url(\"https://openmv.net/file/food-consumption.csv\"))\n",
+ "data.sc1 <- scale(na.omit(data[,2:21]))\n",
+ "data.sc2 <- scale(data[,2:21])\n",
+ "data.sc2[is.na(data.sc2)] <- 0"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "(a) Perform PCA on this dataset and produce a two biplots: one with PC1 versus\n",
+ "PC2 and another with PC1 versus PC3. Hint: see `?biplot`.\n",
+ "\n",
+ "(b) Which are the important variables in the first three components?\n",
+ "\n",
+ "(c) What does it mean for a country to have a high score in the first component?\n",
+ "Pick one country that has a high score in the first component and verify that\n",
+ "this country does indeed have this interpretation.\n",
+ "\n",
+ "**Q5.** (optional) PCA has some nice history in human face recognition research\n",
+ "(see for example [here](https://en.wikipedia.org/wiki/Eigenface). In this\n",
+ "exercise you will look at a small dataset of face images and perform a similar\n",
+ "analysis as we did it with the image of the ship in section \"Proportion\n",
+ "Variance Explained\". However, instead of treating the rows of a single image\n",
+ "as individual data points, each face image is treated as a single data point by\n",
+ "transforming the two-dimensional images to one vector (concatenating the rows to\n",
+ "one long vector). You can load the dataset with the following code."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "download.file('https://lcnwww.epfl.ch/bio322/att_faces.zip', 'faces.zip')\n",
+ "unzip('faces.zip')\n",
+ "library(png)\n",
+ "faces <- sapply(list.files('att_faces', full.names = T), readPNG)\n",
+ "data <- t(faces)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 0
+ },
+ "source": [
+ "To look at a single image you can use the following function"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "show.image <- function(x) image(t(matrix(x, nrow = 112))[,112:1], col = gray.colors(256), axes = F, asp = 1)\n",
+ "show.image(data[101,])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "(a) Perform PCA on this dataset and plot the first few principal components by\n",
+ "using the `show.image` function above.\n",
+ "\n",
+ "(b) Plot the curve of the proportion of variance explained.\n",
+ "\n",
+ "(c) Plot some of the original images together with their low-dimensional\n",
+ "reconstruction based on a reasonable number of principal components."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}