diff --git a/ME-390-Homework1/Solutions.ipynb b/ME-390-Homework1/Solutions.ipynb new file mode 100644 index 0000000..076e079 --- /dev/null +++ b/ME-390-Homework1/Solutions.ipynb @@ -0,0 +1,1180 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Homework 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this homework, we apply a linear regression to learn the dynamics of a system. \n", + "\n", + "
\n", + "\n", + "This homework is part of a series of homeworks for the ME-390-EPFL Foundations of Artificial Intelligence course at EPFL. Copyright (c) 2022 [Sycamore](https://www.epfl.ch/labs/sycamore/) lab at EPFL.\n", + "Use of this source code is governed by an MIT-style license that can be found in the LICENSE file or at https://www.opensource.org/licenses/MIT\n", + "\n", + "**Author(s):** Tony Wood, Andreas Schlaginhaufen, Loris di Natale" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Name and SCIPER" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Name:** YOUR NAME HERE\n", + "\n", + "**SCIPER:** YOUR SCIPER HERE" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Homework info \n", + "**Released:** Monday, October 10, 2022 \n", + "**Submission**: Wednesday October 19, 2022 (before 11:59PM) on Moodle \n", + "**Grade weight:** 10% of the overall grade \n", + "**This is an individual assignment, the exchange of code between students is forbidden.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This homework is composed of 4 parts: \n", + "- Part 1: Data loading & pre-processing (2 pts)\n", + "- Part 2: Linear regression (3 pts)\n", + "- Part 3: Regularization (3 pts)\n", + "- Part 4: Analysis and conclusions (2 pts)\n", + "\n", + "**Total:** 10 pts\n", + "\n", + "**Work through this exercise in order, as functions implemented early on can be used for later parts.**\n", + "\n", + "**Instructions:**\n", + "- Work on your own file: create a copy of this notebook. Rename the copy by putting your SCIPER (name it **SCIPER_homework1.ipynb**). To do so, right click on the notebook in the left sidebar and select **\"Rename\"**. For example, if your SCIPER is 123456, then name your notebook *123456_homework1.ipynb* .\n", + "- Use the Table of Contents extension to quickly navigate to each part.\n", + "- In order to ensure that there are no hidden states in your notebook, frequently restart the kernel using: **Kernel -> Restart Kernel and Run all Selected Cells...**\n", + "\n", + "Good luck!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "## Task description\n", + "\n", + "In this homework, you are asked to learn the model of a controlled ground robot from measurement data. The state of the robot is represented by its pose, $\\mathbf{x}(t) := \\begin{bmatrix}p_0(t),p_1(t),\\varphi(t)\\end{bmatrix}^\\top \\in \\mathbb{R}^3$, where $p_0$ and $p_1$ are position coordinates and $\\varphi$ is the orientation of the robot. The model of the robot is given by the following ordinary differential equation, \n", + "\n", + "$$ \\dot{\\mathbf{x}}(t) = \\begin{bmatrix} \\dot{p}_0(t) \\\\ \\dot{p}_1(t) \\\\ \\dot{\\varphi}(t) \\end{bmatrix} = f(\\mathbf{x}(t)) = \\begin{bmatrix} \n", + "f_{p_0}(\\mathbf{x}(t)) \\\\\n", + "f_{p_1}(\\mathbf{x}(t)) \\\\\n", + "f_{\\varphi}(\\mathbf{x}(t)) \n", + "\\end{bmatrix}, $$ \n", + "\n", + "for some unknown $f:\\mathbb{R}^3\\to\\mathbb{R}^3$ that you want to identify. You can assume that each component of $f$ can be represented by a linear feature model $f_i(\\mathbf{x}; {\\mathbf{w}}_i)=\\mathbf{w}_i^\\top\\mathbf{\\Phi}(\\mathbf{x})$ for $i=1,2,3$, where $\\mathbf{w}_i^\\top$ denotes the transpose of the vector $\\mathbf{w}_i$. 
Moreover, $\\mathbf{\\Phi}(\\mathbf{x})=\\begin{bmatrix}1, p_0,p_1,\\varphi, \\cos(\\varphi), \\sin(\\varphi)\\end{bmatrix}^\\top\\in\\mathbb{R}^{6}$ is the feature vector and $\\mathbf{w}_i\\in\\mathbb{R}^{6}$ for $i=1,2,3$ are the parameters for each component. \n", + "\n", + "Your task is to use linear regression to learn the optimal model parameters $\\mathbf{w}^*_i$ for $i=1,2,3$." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "## Part 0: Imports" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "### 0.1 Imports and helpers for plotting\n", + "Before starting the exercise, we need to generate the data to train our models.\n", + "\n", + "Run this code to import packages and customize the settings of matplotlib to create beautiful plots." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib\n", + "import numpy as np\n", + "import pandas as pd\n", + "from typing import Tuple\n", + "\n", + "# this next line is only needed in IPython notebooks\n", + "%matplotlib inline \n", + "import math\n", + "import matplotlib.font_manager as fm\n", + "font = fm.FontProperties(size = 12)\n", + "matplotlib.rcParams['pdf.fonttype'] = 42\n", + "matplotlib.rcParams['ps.fonttype'] = 42" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 0.2 Import helper functions\n", + "\n", + "To help you throughout this homework, we prepared a few helper functions. You can find them in `functions.py`. \n", + "Feel free to browse this file - you shouldn't need to edit anything.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from functions import _fix_seed, generate_trajectories, plot_trajectories, generate_X_Y, separate_trajectories, plot_data, predict_and_plot" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 1: Data generation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let us define the state of the robot as $\\mathbf{x} := \\begin{bmatrix}p_0,p_1,\\varphi\\end{bmatrix}^\\top$, where $p_0$ and $p_1$ are position coordinates and $\\varphi$ is the orientation of the robot.\n", + "\n", + "The true behaviour of the differential wheeled robot is given by the following equations, \n", + "$$ \\dot{\\mathbf{x}} = \\begin{bmatrix} \\dot{p}_0 \\\\ \\dot{p}_1 \\\\ \\dot{\\varphi} \\end{bmatrix} = f(\\mathbf{x}) = \\begin{bmatrix} \n", + "f_{p_0}(\\mathbf{x}) \\\\\n", + "f_{p_1}(\\mathbf{x}) \\\\\n", + "f_{\\varphi}(\\mathbf{x}) \n", + "\\end{bmatrix} = \\begin{bmatrix} v \\cdot \\cos(\\varphi) \\\\ v \\cdot \\sin(\\varphi) \\\\ \\omega \\end{bmatrix}$$\n", + "where the speed, $ v$, and the turn rate, $\\omega$, are kept constant. The training data will consist of noisy state measurements sampled along different trajectories of the system.\n", + "\n", + "Note: A trajectory of the robot refers to the sequence of states $p_0(t), p_1(t), \\varphi(t)$ generated from a given initial condition $p_0(0), p_1(0), \\varphi(0)$ over a time interval $t \\in [0, T]$. \n", + "\n", + "The following simulation data is generated with parameter values, \n", + "$$ v = 0.5 \\textrm{ m/s}, $$\n", + "$$ \\omega = - \\pi / 3.$$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Question:** What would the trajectories look like in the 2-dimensional space, $p_0$, $p_1$?
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Answer:** They would be a circle." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.1 Simulating the System: train and test data\n", + "\n", + "Using the equations above and the helper functions `generate_trajectories` and `plot_trajectories`, you can generate a train and a test dataset as follows. \n", + "\n", + "Note that we use `_fix_seed` to fix a random seed. This ensures reproducibility, i.e. all random operations can be repeated. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Train data\n", + "We generate trajectories of the robot from different initial conditions. The state variables are measured with some noise. Hence, we have noisy measurements of $p_0(t), p_1(t), \\varphi(t)$ starting from a random initial condition $p_0(0), p_1(0), \\varphi(0)$, over a time interval $t \\in [0,T]$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#####\n", + "_fix_seed(7) # DO NOT CHANGE FOR SUBMISSIONS _fix_seed(7)\n", + "#####\n", + "\n", + "number_trajectories_train = 10\n", + "noise = 0.001\n", + "\n", + "\n", + "# Generate the train data \n", + "# This returns a set of trajectories (it is in the pandas dataframe format and can be converted to numpy)\n", + "dfs_train = generate_trajectories(number_trajectories_train, noise)\n", + "\n", + "print(f'\\nNumber of dataframes:\\t{len(dfs_train)}')\n", + "print(f'Shape of one dataframe:\\t{dfs_train[0].shape}\\n')\n", + "\n", + "display(dfs_train[1].head())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Test data\n", + "\n", + "We generate the trajectories of the robot the same way as we generated the training set but from a different initial condition and different noise sequence. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#####\n", + "_fix_seed(13434252) #DO NOT CHANGE FOR SUBMISSIONS _fix_seed(13434252)\n", + "#####\n", + "\n", + "number_trajectories_test = 10\n", + "noise = 0.001\n", + "\n", + "# Generate test data\n", + "dfs_test = generate_trajectories(number_trajectories_test, noise)\n", + "\n", + "# Plot the last train and test trajectory - just one trajectory for each\n", + "plot_trajectories([dfs_train[-1], dfs_test[-1]], labels=['Train', 'Test'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.2 Generate the targets\n", + "\n", + "As you can see, each trajectory contains information about the time step and state, i.e. $\\lbrace(t^i, \\mathbf{x}^i)\\rbrace_{i=1}^N$ with $ \\mathbf{x}^i:= \\mathbf{x}(t^i)$ sampled at an equal rate $\\Delta t=t_{i+1} - t_{i}$. \n", + "\n", + "However, recall that we want to approximate $f$, i.e. the derivative $\\dot{x}$, which we don't have in the data. Hence, you first need to generate the targets $\\mathbf{y}^i := f(\\mathbf{x}^i) = \\dot{\\mathbf{x}}(t^i) $. \n", + "Since the true $\\dot{\\mathbf{x}}(t^i)$ is unknown, you can approximate it via the finite difference:\n", + "$$\\dot{\\mathbf{x}}(t^i)\\approx \\dfrac{\\mathbf{x}(t^{i+1})-\\mathbf{x}(t^{i})}{t^{i+1}-t^i}.$$\n", + "\n", + "Your first exercise is to complete the function below to generate these targets (you can drop the last data point of each trajectory for simplicity, because you don't know $x^{N+1}$)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def generate_target(df: pd.DataFrame) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:\n", + " \"\"\"\n", + " Generate targets via finite differences.\n", + " \n", + " Args:\n", + " df (pd.DataFrame): Dataframe containing trajectory data\n", + " \n", + " Returns:\n", + " X (np.ndarray): State measurements, Array of shape (trajectory_length-1, 3)\n", + " Y (np.ndarray): Finite differences, Array of shape (trajectory_length-1, 3)\n", + " t (np.ndarray): Time vector, Array of shape (trajectory_length-1,)\n", + " \"\"\"\n", + " # here, we convert the dataframe to numpy format\n", + " X = df[['p_0', 'p_1', 'phi']].to_numpy()\n", + " t = df['t'].to_numpy()\n", + " Y = np.zeros((len(t)-1, 3))\n", + " \n", + " \n", + " ### START CODE HERE ### (≈ 3 lines of code)\n", + " # Hint: review numpy array indexing\n", + " # X = ...\n", + " # Y = ...\n", + " # t = ...\n", + " ### END CODE HERE ###\n", + " \n", + " # Solution:\n", + " X_plus = X[1:, :]\n", + " X = X[:-1, :]\n", + " t = t[:-1]\n", + " Y = (X_plus - X)/(t[1]-t[0])\n", + " \n", + " return X, Y, t" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can now use the provided `generate_X_Y` function to generate the X and Y that we will use later, both for the training and test data. Note that this function merges the 10 training trajectories into a single dataframe. It does the same for the test trajectories. For visualization later on, we separate the test trajectories into 10 sets." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate the data and have a look at the evaluation data we created\n", + "df_train, df_test = generate_X_Y(dfs_train, dfs_test, generate_target)\n", + "\n", + "# We now have the time, the 3 states (X), and the derivatives (Y) together\n", + "print('\\nShape of the test data:', df_test.shape, '\\n')\n", + "df_test.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As expected, this added the needed (approximate) derivatives to the data. This allows us to define X and Y below. \n", + "Note that we use a helper function to separate the trajectories of the test set to provide nicer plots in the rest of the notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Use the entire data for training\n", + "X_train = df_train[['p0', 'p1', 'phi']].to_numpy()\n", + "Y_train = df_train[['dp0dt', 'dp1dt', 'dphidt']].to_numpy()\n", + "X_test = df_test[['p0', 'p1', 'phi']].to_numpy()\n", + "Y_test = df_test[['dp0dt', 'dp1dt', 'dphidt']].to_numpy()\n", + "\n", + "# For plotting, we only consider one trajectory at a time\n", + "# This function separates the test data into separate trajectories\n", + "t_test_trajs, X_test_trajs, Y_test_trajs = separate_trajectories(df_test, number_trajectories_test)\n", + "\n", + "print(f'We have {len(X_test_trajs)} test trajectories')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.3 Data Preview\n", + "We can now check what X and Y look like, for example on the third test trajectory (index 2)."
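+    "\n",
+    "Since the true dynamics from Part 1 are known in this synthetic setting, a quick optional sanity check (a sketch, not required for the homework) is to compare the finite-difference targets against $[v\\cos(\\varphi),\\ v\\sin(\\varphi),\\ \\omega]$ with $v=0.5$ and $\\omega=-\\pi/3$:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "# Known simulation parameters from Part 1\n",
+    "v, omega = 0.5, -np.pi / 3\n",
+    "\n",
+    "phi = X_test[:, 2]\n",
+    "Y_true = np.stack([v * np.cos(phi), v * np.sin(phi), omega * np.ones_like(phi)], axis=1)\n",
+    "\n",
+    "# Mean absolute gap per dimension: should be small (noise + discretization error only)\n",
+    "print(np.abs(Y_test - Y_true).mean(axis=0))\n",
+    "```"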
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "trajectory = 2\n", + "\n", + "plot_data(t_test_trajs[trajectory], X_test_trajs[trajectory], Y_test_trajs[trajectory])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "## Part 2: Linear Regression: preliminaries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this part, you are required to learn the model parameters via linear regression." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.1 Pre-compute the features \n", + "First, you need to implement the feature map:\n", + "$$\\mathbf{X}:=\\begin{bmatrix} {\\mathbf{x}^1}^\\top \\\\ \\vdots \\\\ {\\mathbf{x}^N}^\\top \\end{bmatrix} \\mapsto \\mathbf{\\Phi}(\\mathbf{X}):=\\begin{bmatrix} \\mathbf{\\Phi}(\\mathbf{x}^1)^\\top \\\\ \\vdots \\\\ \\mathbf{\\Phi}(\\mathbf{x}^N)^\\top \\end{bmatrix}.$$\n", + "Recall, $\\mathbf{x}^\\top$ denotes the transpose of the vector $\\mathbf{x}$, $\\mathbf{X}\\in \\mathbb{R}^{N\\times 3}$ is the input data matrix, and $\\mathbf{\\Phi}(\\mathbf{x})=\\begin{bmatrix}1,p_0,p_1,\\varphi, \\cos(\\varphi), \\sin(\\varphi)\\end{bmatrix}^\\top\\in\\mathbb{R}^{6}$." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Complete the code below (element-wise numpy operations may be useful):\n", + "def feature_map(X: np.ndarray) -> np.ndarray:\n", + " \"\"\" \n", + " Generates feature map from input data matrix.\n", + " \n", + " Args: \n", + " X (np.ndarray): Input data matrix, shape (N,3)\n", + " \n", + " Returns:\n", + " Phi(X) (np.ndarray): Feature matrix, shape (N,6)\n", + " \"\"\"\n", + " \n", + " \n", + " ### START CODE HERE ### (≈ 3 lines of code)\n", + " # ...\n", + " # feature_matrix = ...\n", + " ### END CODE HERE ###\n", + " \n", + " # solution:\n", + " p_0 = X[:,0]\n", + " p_1 = X[:,1]\n", + " phi = X[:,2]\n", + " cos_phi = np.cos(phi)\n", + " sin_phi = np.sin(phi)\n", + " \n", + " return np.stack([np.ones_like(p_0), p_0, p_1, phi, cos_phi, sin_phi], axis=1)\n", + "\n", + "nfeatures = feature_map(X_train).shape[1]\n", + "\n", + "print(f'\\nOriginal shape of X: {X_train.shape}')\n", + "print(f'Shape after the feature map has been applied: {feature_map(X_train).shape}\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "### 2.2 Define the linear model\n", + "\n", + "Now that the feature map is defined, you can use it to define your linear model $f_i(\\mathbf{x}; {\\mathbf{w}}_i)=\\mathbf{w}_i^\\top\\mathbf{\\Phi}(\\mathbf{x})$ for $i=1,2,3$. \n", + "Note that the model is the same for each component, only the parameter $w_i$ differs, so you only need to define one $f$ below.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Question:** Expand the term: $\\mathbf{w}_1^\\top\\mathbf{\\Phi}(\\mathbf{x})$ into its 6 components. You may use $\\mathbf{w}_{1,j}$ to refer to $j$-th element of $\\mathbf{w}_1$." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Answer:** $\\mathbf{w}_1^\\top\\mathbf{\\Phi}(\\mathbf{x}) = \\mathbf{w}_{1,0} \\cdot 1 + \\mathbf{w}_{1,1}p_0 + \\mathbf{w}_{1,2}p_1 + \\mathbf{w}_{1,3} \\varphi + \\mathbf{w}_{1,4} \\cos(\\varphi) + \\mathbf{w}_{1,5} \\sin(\\varphi)$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# define model and parameters (including bias)\n", + "w_1 = np.zeros(nfeatures)\n", + "w_2 = np.zeros(nfeatures)\n", + "w_3 = np.zeros(nfeatures)\n", + "\n", + "# implement the linear model\n", + "def f(X: np.ndarray, w: np.ndarray) -> np.ndarray:\n", + " \"\"\"\n", + " Linear regression model.\n", + " \n", + " Args:\n", + " X (np.ndarray): Input data matrix, shape (N,3)\n", + " w (np.ndarray): Weight vector, shape (6,)\n", + " \n", + " Returns:\n", + " f(X, w) (np.ndarray): Predicted targets, shape (N,)\n", + " \"\"\"\n", + " \n", + " ### START CODE HERE ### (≈ 1 line of code)\n", + " # ...\n", + " ### END CODE HERE ###\n", + "\n", + " # solution:\n", + " return feature_map(X) @ w\n", + "\n", + "print(f(X_train, w_1).shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You will learn one vector of weights for each dimension; hence, we separate the targets below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Split target into the three components to predict separately\n", + "y_1_train = Y_train[:,0]\n", + "y_2_train = Y_train[:,1]\n", + "y_3_train = Y_train[:,2]\n", + "\n", + "y_1_test = Y_test[:,0]\n", + "y_2_test = Y_test[:,1]\n", + "y_3_test = Y_test[:,2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.3 Mean Square Error\n", + "You are now asked to implement a method to compute the Mean Squared Error (MSE) $J$, which is defined as:\n", + "$$\n", + "\\begin{align}\n", + "J(\\mathbf{w}_i) = \\frac{1}{N} (\\mathbf{\\Phi}(\\mathbf{X}) \\mathbf{w}_i-\\mathbf{y})^{T} (\\mathbf{\\Phi}(\\mathbf{X}) \\mathbf{w}_i-\\mathbf{y}).\n", + "\\end{align}$$\n", + "Above, $\\mathbf{\\Phi}(\\mathbf{X}) \\in \\mathbb{R}^{N \\times 6}$ is the feature matrix." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def mse_loss(X: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:\n", + " \"\"\"Computes the Mean Square Error (MSE)\n", + " \n", + " Args:\n", + " X (np.ndarray): Dataset of shape (N, 3)\n", + " y (np.ndarray): Labels of shape (N, )\n", + " w (np.ndarray): Weights of shape (6, )\n", + "\n", + " Returns:\n", + " float: the MSE loss\n", + " \"\"\"\n", + " ### START CODE HERE ### (≈ 3 lines of code)\n", + " # ...\n", + " ### END CODE HERE ###\n", + " \n", + " # Solution\n", + " N = y.shape[0]\n", + " y_hat = f(X, w)\n", + " loss = (1/ N) * ((y_hat - y).T @ (y_hat - y))\n", + " \n", + " return loss\n", + "\n", + "# Test it on the first training dimension with the weight initialized as 0:\n", + "print(f'Loss on dimension 1 with w1=0: {mse_loss(X_train, y_1_train, w_1): .3f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.4 Least squares method\n", + "Estimate the optimal parameters $\\mathbf{w}_i^*$ via the least squares method seen in the exercises."
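+    "\n",
+    "As a reminder from the exercises, the least squares estimate solves the normal equations $\\mathbf{\\Phi}(\\mathbf{X})^\\top \\mathbf{\\Phi}(\\mathbf{X}) \\, \\mathbf{w}_i = \\mathbf{\\Phi}(\\mathbf{X})^\\top \\mathbf{y}$. If you want to double-check your implementation, one optional sketch is to compare it against NumPy's built-in solver (shown here for the first output dimension only):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "# Optional cross-check: np.linalg.lstsq minimizes ||Phi(X) w - y||^2 directly\n",
+    "Phi = feature_map(X_train)\n",
+    "w_check, *_ = np.linalg.lstsq(Phi, y_1_train, rcond=None)\n",
+    "\n",
+    "# Should closely match the solution of the normal equations implemented below\n",
+    "print(w_check)\n",
+    "```"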
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def least_squares(X: np.ndarray, y: np.ndarray) -> np.ndarray:\n", + " \"\"\"Solves linear regression using least squares\n", + "\n", + " Args:\n", + " X: Data of shape (N, 3)\n", + " y: Labels of shape (N, )\n", + "\n", + " Returns:\n", + " Weight parameters of shape (6, )\n", + " \"\"\"\n", + "\n", + " ### START CODE HERE ### (≈ 1 line of code)\n", + " # ...\n", + " ### END CODE HERE ###\n", + " \n", + " # Solution\n", + " w = np.linalg.solve(feature_map(X).T @ feature_map(X), feature_map(X).T @ y)\n", + " \n", + " return w" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use your function to train w1, w2, and w3. You can then use the provided `predict_and_plot` function to check your implementations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "w_1 = least_squares(X_train, y_1_train)\n", + "w_2 = least_squares(X_train, y_2_train)\n", + "w_3 = least_squares(X_train, y_3_train)\n", + "\n", + "# print the train error for each dimension\n", + "print(f'Train error on dimension 1: {mse_loss(X_train, y_1_train, w_1): .5f}')\n", + "print(f'Test error dimension 1: {mse_loss(X_test, y_1_test, w_1): .5f}')\n", + "print(f'Train error on dimension 2: {mse_loss(X_train, y_2_train, w_2): .5f}')\n", + "print(f'Test error dimension 2: {mse_loss(X_test, y_2_test, w_2): .5f}')\n", + "print(f'Train error on dimension 3 : {mse_loss(X_train, y_3_train, w_3): .5f}')\n", + "print(f'Test error dimension 3: {mse_loss(X_test, y_3_test, w_3): .5f}')\n", + "\n", + "# Plot the prediction along the first test trajectory as an example\n", + "print('Test prediction:')\n", + "for i in range(1):\n", + " predict_and_plot(X_test_trajs[i], Y_test_trajs[i], t_test_trajs[i], w_1, w_2, w_3, feature_map)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "**Question:** Is the model overfitting, underfitting or performing well? " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "**Answer:** \n", + "We can make the following two observations:\n", + "- Firstly, our linear model fits the data well, as evident in the plots above and below. Since it can recover the true trajectories accurately, we know the model is not underfitting.\n", + "- On the other hand, we can also observe that the training and testing errors are roughly equivalent. This indicates that our model is not overfitting the training data and still generalizes well enough. \n", + "\n", + "We can hence conclude that the model is performing well, it neither over- nor underfits the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# You can also plot several trajectories together, but it becomes hard to see the difference\n", + "predict_and_plot(X_test_trajs[:5], Y_test_trajs[:5], t_test_trajs[:5], w_1, w_2, w_3, feature_map)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "## Part 3: Ridge regression\n", + "\n", + "Similarly to the classical least square regression above, you are now tasked to implement the Ridge regression" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "**Question:** Explain the motivation for regularization in a supervised learning problem. 
How would you expect the regularization to influence the generalization for this specific problem?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Answer**: \n", + "In supervised learning problems, regularization helps to avoid overfitting. \n", + "The main idea is to force the learned model to be as simple as possible - simpler explanations are indeed better in general. When one implements regularization, the hope is hence to increase the generalization capability of the model, not letting it specialize too much to the training data. \n", + "\n", + "In our case, since the model is performing well - in particular, it is not overfitting - we don't expect a lot of improvement from the use of regularization. On the contrary, if we try to regularize it too much, we expect the model performance to decrease: if we force the model to be too simple (i.e. the weights to be very small in the case of Ridge regression), it might not be able to fit the data well anymore!\n", + "\n", + "__Note__: Generally, regularization does **not** help avoid underfitting! \n", + "Indeed, if your model is underfitting the data, it means it is not able to capture the true behavior of the system. Since regularization would add additional constraints on the model, forcing it to be even simpler, it can only worsen its performance in general and lead to even more underfitting. In that case, the only solution is usually to use more complex models to fit the data better. In this exercise, we could for example add additional terms to the feature map to complexify the model and hope to capture the behavior of the robot better. \n", + "This is however not needed since the fit is already good!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "### 3.1 Ridge regularized MSE loss function\n", + "\n", + "**Question:** For a component $i$, write the MSE loss function with ridge regularization.\n", + "\n", + "**Answer:**\n", + "$$\n", + "\\begin{align}\n", + "J_i(\\mathbf{w}_i) = \\frac{1}{N} (\\mathbf{\\Phi}(\\mathbf{X}) \\mathbf{w}_i-\\mathbf{y})^{T} (\\mathbf{\\Phi}(\\mathbf{X}) \\mathbf{w}_i-\\mathbf{y}) + \\lambda \\begin{bmatrix}\\mathbf{w}_{i,1}& \\dots &\\mathbf{w}_{i,5} \\end{bmatrix}\\begin{bmatrix}\\mathbf{w}_{i,1}\\\\ \\vdots \\\\ \\mathbf{w}_{i,5} \\end{bmatrix}.\n", + "\\end{align}$$\n", + "Note that the bias weight $\\mathbf{w}_{i,0}$ is not regularized, so the penalty equals $\\lambda \\sum_{j=1}^{5} \\mathbf{w}_{i,j}^2$. Setting the gradient of $J_i$ to zero gives the closed-form solution $\\mathbf{w}_i^* = \\left(\\mathbf{\\Phi}(\\mathbf{X})^\\top \\mathbf{\\Phi}(\\mathbf{X}) + N \\lambda \\mathbf{V}\\right)^{-1} \\mathbf{\\Phi}(\\mathbf{X})^\\top \\mathbf{y}$, where $\\mathbf{V}$ is the identity matrix with its first diagonal entry set to $0$ so that the bias is not penalized.\n", + "\n", + "Implement the ridge regression in the code below.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def ridge_regression(X: np.ndarray, y: np.ndarray, lambda_: float) -> np.ndarray:\n", + " \"\"\"Solves linear regression using least squares\n", + "\n", + " Args:\n", + " X: Data of shape (N, 3)\n", + " y: Labels of shape (N, )\n", + " lambda_: regularization parameter\n", + "\n", + " Returns:\n", + " Weight parameters of shape (6, )\n", + " \"\"\"\n", + "\n", + " ### START CODE HERE ### (≈ 4 line of code)\n", + " # ...\n", + " ### END CODE HERE ###\n", + " \n", + " # Solution\n", + " N = len(y)\n", + " V = np.eye(nfeatures)\n", + " V[0,0] = 0\n", + " w = np.linalg.solve(feature_map(X).T @ feature_map(X) + N*lambda_*V, feature_map(X).T @ y)\n", + " \n", + " return w\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And train for a set of ridge coefficients $\\lambda$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "lambdas = [1e-4, 1e-2, 1.0, 1e2]\n", + "w_1_ridge = []\n", + "w_2_ridge = []\n", + "w_3_ridge = []\n", + "for i, lambda_ in enumerate(lambdas):\n", + " print('Lambda: ', lambda_)\n", + " \n", + " w_1_ridge.append(ridge_regression(X_train, y_1_train, lambda_))\n", + " w_2_ridge.append(ridge_regression(X_train, y_2_train, lambda_))\n", + " w_3_ridge.append(ridge_regression(X_train, y_3_train, lambda_))\n", + "\n", + " # print the train error for each dimension\n", + " print(f'Train error on dimension 1: {mse_loss(X_train, y_1_train, w_1_ridge[i]): .5f}')\n", + " print(f'Test error dimension 1: {mse_loss(X_test, y_1_test, w_1_ridge[i]): .5f}')\n", + " print(f'Train error on dimension 2: {mse_loss(X_train, y_2_train, w_2_ridge[i]): .5f}')\n", + " print(f'Test error dimension 2: {mse_loss(X_test, y_2_test, w_2_ridge[i]): .5f}')\n", + " print(f'Train error on dimension 3 : {mse_loss(X_train, y_3_train, w_3_ridge[i]): .5f}')\n", + " print(f'Test error dimension 3: {mse_loss(X_test, y_3_test, w_3_ridge[i]): .5f}')\n", + "\n", + " print('Prediction:')\n", + " # Plot prediction along first trajectory\n", + " for j in range(1):\n", + " predict_and_plot(X_test_trajs[j], Y_test_trajs[j], t_test_trajs[j], w_1_ridge[i], w_2_ridge[i], w_3_ridge[i], feature_map)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2 Print learned weight vectors\n", + "\n", + "Compare the learned weight vectors $\\mathbf{w}_i$ for various values of $\\lambda$." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "### Weigt vectors\n", + "\n", + "print('w_1:')\n", + "# w_1\n", + "print('lambda = 0.00: ', w_1)\n", + "for (i, lambda_) in enumerate(lambdas):\n", + " print(f'lambda = {lambda_:.2e}: ', w_1_ridge[i])\n", + "\n", + "print('\\nw_2:')\n", + "# w_2\n", + "print('lambda = 0.00: ', w_2)\n", + "for (i, lambda_) in enumerate(lambdas):\n", + " print(f'lambda = {lambda_:.2e}: ', w_2_ridge[i])\n", + "\n", + "print('\\nw_3:')\n", + "# w_1\n", + "print('lambda = 0.00: ', w_3)\n", + "for (i, lambda_) in enumerate(lambdas):\n", + " print(f'lambda = {lambda_:.2e}: ', w_3_ridge[i])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.3 Norm of parameters\n", + "\n", + "Compare $||\\mathbf{w}_i||_2$ for the different choices of the regularization parameter $\\lambda$." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "### Two norm of weights\n", + "\n", + "print('norm of w_1:')\n", + "# w_1\n", + "print('lambda = 0.00: ', f'{math.sqrt(w_1.T @ w_1):.5f}')\n", + "for (i, lambda_) in enumerate(lambdas):\n", + " print(f'lambda = {lambda_:.2e}: ', f'{math.sqrt(w_1_ridge[i].T @ w_1_ridge[i]):.5f}')\n", + "\n", + "print('\\nnorm of w_2:')\n", + "# w_2\n", + "print('lambda = 0.00: ', f'{math.sqrt(w_2.T @ w_2):.5f}')\n", + "for (i, lambda_) in enumerate(lambdas):\n", + " print(f'lambda = {lambda_:.2e}: ', f'{math.sqrt(w_2_ridge[i].T @ w_2_ridge[i]):.5f}')\n", + " \n", + "print('\\nnorm of w_3:')\n", + "# w_3\n", + "print('lambda = 0.00: ', f'{math.sqrt(w_3.T @ w_3):.5f}')\n", + "for (i, lambda_) in enumerate(lambdas):\n", + " print(f'lambda = {lambda_:.2e}: ', f'{math.sqrt(w_3_ridge[i].T @ w_3_ridge[i]):.5f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 4: Analysis and remarks" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Feel free to add a few cells below to illustrate your answers." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Question**: What is the impact of $\\lambda$ in the Ridge regression?\n", + "\n", + "**Solution**: \n", + "From the results above, we can draw two main conclusions:\n", + "1. The larger $\\lambda$, the lower the norm of the corresponding $w_1$ and $w_2$ (Section 3.3).\n", + "2. If $\\lambda$ is chosen too large, it constrains the learning too much, leading to poor solutions (Section 3.1).\n", + "\n", + "Both are expected in our case:\n", + "1. The additional penalty in Ridge regression is meant to reduce the norm of the model parameters. With a larger $\\lambda$, the best way for the model to decrease the loss is to decrease the values of $w_1$, $w_2$. This leads to a simpler model with small coefficients.\n", + "2. Since our model was not overfitting in the first place, as mentioned in Section 2.4, regularization cannot help much. This is indeed confirmed by the results above, where we can see the loss (both for the training and testing data!) increase with larger $\\lambda$s. Since we are trying to force the model to be simpler when it was already performing well, this leads to underfitting, and our linear regression is less and less able to capture the true behavior of the data with increasing $\\lambda$s!\n", + "\n", + "In general, $\\lambda$ is a tuning parameter, and the best practice is to try different values and then select the best one. \n", + "In this exercise, it seems that Least Squares performs very well, so we can set $\\lambda=0$, or keep it very small.\n", + "\n", + "__Note__: The third dimension behaves differently here: \n", + "As can be seen in Section 3.2, only $w_{3,0}$ has a value significantly different from $0$, even for $\\lambda=0$. This makes sense if you go back to the system equations presented in Part 1: the evolution of $\\dot\\varphi$ is a constant! \n", + "Because we decided to discard $w_{3,0}$ from the regularization, this means the Ridge regression has essentially no impact on $w_3$ since all the regularized coefficients are already almost $0$! We can indeed check that neither the norm of $w_3$ nor the loss in dimension $3$ changes significantly for different $\\lambda$s." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Bonus Question**: What is the impact of the number of trajectories in the training data?
Would a single trajectory suffice to identify the system for the given feature vector? Try to justify your answer experimentally and theoretically.\n", + "\n", + "**Solution**: As plotted below, a single trajectory is not sufficient to identify the system well. \n", + "In fact, this leads to clear overfitting for the classical Least Squares regression (first results and plot)! Indeed, we can see the testing loss skyrocket and be orders of magnitude larger than the training loss. If we look closely at what happens on the plotted test trajectories, we can for example see all the trajectories converging to the same circle in dimension $1$ and $2$ - the one the model was trained on - clearly illustrating overfitting. The model can only predict this training trajectory!\n", + "\n", + "Since we now have a clear case of overfitting, we hope regularization can help - and you can see it indeed does! \n", + "Increasing $\\lambda$ not only decreases the norm of the learned parameters (even in the third dimension), it now also decreases the testing loss significantly. Forcing the model to be simple helps us in this case. Even if it still overfits the training trajectory (compare the losses with the ones we obtained with 10 training trajectories), the model performs much better with regularization, clearly decreasing the testing loss. As can be seen from the plots, it converges to the training trajectory less aggressively, predicting changes of smaller magnitude ($\\dot{p_0}$ and $\\dot{p_1}$).\n", + "\n", + "If $\\lambda$ is chosen too large, however, we again see the testing loss increase - the model starts to have trouble fitting the data and understanding that the true trajectories are circles. At some point, for a very large regularization coefficient, the model predicts almost no movement of the robot, which is a very simple solution and fits the data fairly well - but is of course not what we expected!\n", + "\n", + "In conclusion, $\\lambda = 0.01$ seems to be a wise choice if only one training trajectory is provided. Note that the fit is not too bad (the model fits the derivatives $\\dot{p_0}$, $\\dot{p_1}$, and $\\dot\\varphi$) even if the trajectories are a bit all over the place. This happens because we need to integrate the approximated derivatives to predict trajectories, which can amplify small errors.\n", + "\n", + "__Note__: One can theoretically show that the problem is ill-posed if only a single trajectory is used to fit the linear model. Indeed, one can prove that multiple parameters $w$ could fit the data perfectly in that case, which then leads to poor generalization over new initial conditions."
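+    "\n",
+    "One way to see this numerically (a rough check, not a proof) is to inspect the conditioning of the feature matrix built from a single trajectory: if $\\mathbf{\\Phi}(\\mathbf{X})$ is rank deficient or very ill-conditioned, many different weight vectors fit the training data almost equally well. A sketch of such a check, to be run after the single-trajectory experiment cell below:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "# Rough numerical check of ill-conditioning for the single-trajectory training set\n",
+    "Phi = feature_map(X_train)\n",
+    "print('rank of Phi(X):', np.linalg.matrix_rank(Phi))       # < 6 means exactly rank deficient\n",
+    "print('condition number of Phi(X):', np.linalg.cond(Phi))  # very large means nearly rank deficient\n",
+    "```"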
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "_fix_seed(23855)\n", + "\n", + "# Generate train and test data\n", + "dfs_train = generate_trajectories(number_trajectories=1, noise=0.001)\n", + "dfs_test = generate_trajectories(number_trajectories=10, noise=0.001)\n", + "\n", + "# Put it in the right form\n", + "df_train, df_test = generate_X_Y(dfs_train, dfs_test, generate_target)\n", + "\n", + "X_train = df_train[['p0', 'p1', 'phi']].to_numpy()\n", + "Y_train = df_train[['dp0dt', 'dp1dt', 'dphidt']].to_numpy()\n", + "X_test = df_test[['p0', 'p1', 'phi']].to_numpy()\n", + "Y_test = df_test[['dp0dt', 'dp1dt', 'dphidt']].to_numpy()\n", + "\n", + "# Separate the test data into separate trajectories for plots\n", + "t_train_trajs, X_train_trajs, Y_train_trajs = separate_trajectories(df_train, number_trajectories=1)\n", + "t_test_trajs, X_test_trajs, Y_test_trajs = separate_trajectories(df_test, number_trajectories=10)\n", + "\n", + "# Split target into the three components to predict separately\n", + "y_1_train = Y_train[:,0]\n", + "y_2_train = Y_train[:,1]\n", + "y_3_train = Y_train[:,2]\n", + "y_1_test = Y_test[:,0]\n", + "y_2_test = Y_test[:,1]\n", + "y_3_test = Y_test[:,2]\n", + "\n", + "# Least squares prediction\n", + "w_1 = least_squares(X_train, y_1_train)\n", + "w_2 = least_squares(X_train, y_2_train)\n", + "w_3 = least_squares(X_train, y_3_train)\n", + "\n", + "# Plot the prediction along the first test trajectory as an example\n", + "print('\\nLeast Squares\\n')\n", + "print(f'Norms of the parameters:\\nw_1: {np.linalg.norm(w_1)}\\nw_2: {np.linalg.norm(w_2)}\\nw_3: {np.linalg.norm(w_3)}\\n')\n", + "\n", + "# print the train error for each dimension\n", + "print(f'Train error on dimension 1: \\t{mse_loss(X_train, y_1_train, w_1): .5f}')\n", + "print(f'Test error dimension 1: \\t{mse_loss(X_test, y_1_test, w_1): .5f}')\n", + "print(f'Train error on dimension 2: \\t{mse_loss(X_train, y_2_train, w_2): .5f}')\n", + "print(f'Test error dimension 2: \\t{mse_loss(X_test, y_2_test, w_2): .5f}')\n", + "print(f'Train error on dimension 3 : \\t{mse_loss(X_train, y_3_train, w_3): .5f}')\n", + "print(f'Test error dimension 3: \\t{mse_loss(X_test, y_3_test, w_3): .5f}\\n')\n", + "predict_and_plot(X_test_trajs, Y_test_trajs, t_test_trajs, w_1, w_2, w_3, feature_map)\n", + " \n", + "lambdas = [1e-4, 1e-2, 1.0, 1e2]\n", + "for i, lambda_ in enumerate(lambdas):\n", + " \n", + " w_1_ridge = ridge_regression(X_train, y_1_train, lambda_)\n", + " w_2_ridge = ridge_regression(X_train, y_2_train, lambda_)\n", + " w_3_ridge = ridge_regression(X_train, y_3_train, lambda_)\n", + "\n", + " print('\\nRidge regression with lambda: ', lambda_)\n", + " print(f'\\nNorms of the parameters:\\nw_1: {np.linalg.norm(w_1_ridge)}\\nw_2: {np.linalg.norm(w_2_ridge)}\\nw_3: {np.linalg.norm(w_3_ridge)}\\n')\n", + " print(f'Train error on dimension 1: \\t{mse_loss(X_train, y_1_train, w_1_ridge): .5f}')\n", + " print(f'Test error dimension 1: \\t{mse_loss(X_test, y_1_test, w_1_ridge): .5f}')\n", + " print(f'Train error on dimension 2: \\t{mse_loss(X_train, y_2_train, w_2_ridge): .5f}')\n", + " print(f'Test error dimension 2: \\t{mse_loss(X_test, y_2_test, w_2_ridge): .5f}')\n", + " print(f'Train error on dimension 3 : \\t{mse_loss(X_train, y_3_train, w_3_ridge): .5f}')\n", + " print(f'Test error dimension 3: \\t{mse_loss(X_test, y_3_test, w_3_ridge): .5f}\\n')\n", + "\n", + " predict_and_plot(X_test_trajs, Y_test_trajs, t_test_trajs, w_1_ridge, w_2_ridge, 
w_3_ridge, feature_map)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "2fb559447c0254c11e787c4153a05d78", + "grade": false, + "grade_id": "cell-5ecf4fbe4a25b341", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + }, + "tags": [] + }, + "source": [ + "# Submitting your homework" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "49bcca144b5441119af224d342c87015", + "grade": false, + "grade_id": "cell-f78c02ebbbc1e034", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + }, + "tags": [] + }, + "source": [ + "### Restarting the kernel" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "002df2dced08d4d6980890744e828de5", + "grade": false, + "grade_id": "cell-28682783150761b1", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "Before submitting, make sure that there are no [hidden states](https://github.com/vita-epfl/introML-2021/blob/main/exercises/00-setup/jupyter.md#the-kernel) by restarting your kernel and running all cells. Check that the code runs without errors." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Restart your kernel and run all cells, make sure that you can reach this cell\n", + "print(\"Ran all cells :)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "9b8dd99dbf07782e20907739dadb30e2", + "grade": false, + "grade_id": "cell-1bc8ed4282108cd2", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "### Renaming this notebook" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Rename this notebook by preprending your SCIPER (name it **SCIPER_homework1.ipynb**). To do so, right click on the notebook in the left sidebar and select **\"Rename\"**.\n", + "\n", + "For example, if your SCIPER is 123456, then name your notebook *123456_homework1.ipynb* ." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "679323ee8edb108338cc57ec7d0ae93a", + "grade": false, + "grade_id": "cell-3e14163d48b58d49", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + }, + "tags": [] + }, + "source": [ + "### Submitting on Moodle" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "00dbec5763f0b91a8ee009db50502e72", + "grade": false, + "grade_id": "cell-14941fad0cba56b2", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + }, + "tags": [] + }, + "source": [ + "Finally, make sure to save your notebook (File -> Save Notebook), so that it has your most recent changes. Then:\n", + "- If you are working on a **local** environment, you can directly upload this notebook (i.e. the file you are currently working on) to Moodle.\n", + "- If you are using a **cloud-based** environment such as **EPFL Noto**, you'll first need to get a local copy by right clicking on the notebook in the sidebar and clicking on **\"Download\"**, then upload the downloaded notebook to Moodle." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}