diff --git a/Exercises/03-linear-regression/linear_regression.ipynb b/Exercises/03-linear-regression/linear_regression.ipynb index 5980034..6b8dd8f 100644 --- a/Exercises/03-linear-regression/linear_regression.ipynb +++ b/Exercises/03-linear-regression/linear_regression.ipynb @@ -1,1319 +1,1163 @@ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This notebook is part of a series of exercises for the CIVIL-226 Introduction to Machine Learning for Engineers course at EPFL, and adapted for the ME-390. Copyright (c) 2021 [VITA](https://www.epfl.ch/labs/vita/) lab at EPFL. Use of this source code is governed by an MIT-style license that can be found in the LICENSE file or at https://www.opensource.org/licenses/MIT\n", "\n", "**Author(s):** Tom Winandy and David Mizrahi\n", "
\n" ] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Function to align all tables to the left (useful for later on)" ] }, { "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "%%html\n", "" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from typing import Any, Callable\n", "\n", "# Helper file with functions for pre-processing and visualization\n", "import helpers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Intro \n", "\n", "In the first part of the exercise, you're tasked with implementing linear regression with only one variable to predict profits for a restaurant. This is known as **[simple linear regression](https://en.wikipedia.org/wiki/Simple_linear_regression)**, as opposed to **multiple linear regression** (where multiple variables are taken into account for the prediction). You'll see later on that the code implemented here will work just as well for multiple linear regression.\n", "\n", "**Question:** How does a regression problem differ from a classification problem?\n", "\n", "**Answer:** YOUR ANSWER HERE\n", "\n", "*Background: Suppose you're the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has restaurants in various cities and you have data for profits and populations from the cities. You would like to use this data to predict the profit of a restaurant based on where it opens.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Data loading & pre-processing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we'll use a dataset containing 97 restaurants, with the population of the city (in 10'000's of inhabitants) they operate in and their respective profit (in 10'000's of USD). Take a look at the file `restaurant_data.csv` and see how it's loaded by running the cell below." ] }, { "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "There are 97 rows and 2 columns.\n" - ] - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
populationprofit
06.110117.5920
15.52779.1302
28.518613.6620
37.003211.8540
45.85986.8233
\n", - "
" - ], - "text/plain": [ - " population profit\n", - "0 6.1101 17.5920\n", - "1 5.5277 9.1302\n", - "2 8.5186 13.6620\n", - "3 7.0032 11.8540\n", - "4 5.8598 6.8233" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "restaurant_df = pd.read_csv('data/restaurant_data.csv')\n", "\n", "print(f\"There are {restaurant_df.shape[0]} rows and {restaurant_df.shape[1]} columns.\")\n", "# Show the first 5 rows of the data\n", "restaurant_df.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the cell below to get a plot of the data. " ] }, { "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjMAAAGwCAYAAABcnuQpAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA6f0lEQVR4nO3de3xU9Z3/8fcAIRBIhksCSSSGIAEvWDflTmwEu6J0taA+WoytXErVbiEUqQ+17bJi7Qpaxa5x1dpShFJTtytQfquPWlqBKBe5mKiopcEkSBfYXICEJBpCOL8/2Ewzycxk7uecmdfz8cjjYc6cGb5zMs685/v9fL9fh2EYhgAAAGyql9kNAAAACAVhBgAA2BphBgAA2BphBgAA2BphBgAA2BphBgAA2BphBgAA2FofsxsQaRcuXNDx48eVnJwsh8NhdnMAAIAfDMPQ2bNnlZmZqV69fPe9xHyYOX78uLKyssxuBgAACMKxY8c0YsQIn+fEfJhJTk6WdPFipKSkmNwaAADgj8bGRmVlZbk+x32J+TDTMbSUkpJCmAEAwGb8KRExtQB41apVmjhxopKTkzVs2DDNmTNHhw8fdjtnwYIFcjgcbj9TpkwxqcUAAMBqTA0zO3fu1OLFi7V3715t27ZN58+f18yZM9Xc3Ox23k033aQTJ064fl5//XWTWgwAAKzG1GGmP/zhD26/r1u3TsOGDdPBgwdVUFDgOp6YmKj09HS/HrO1tVWtra2u3xsbG8PTWAAAYEmWWmemoaFBkjRkyBC34zt27NCwYcM0ZswY3X333aqpqfH6GKtWrZLT6XT9MJMJAIDY5jAMwzC7EdLF+eSzZ8/W6dOn9dZbb7mOv/LKKxo4cKCys7NVVVWlFStW6Pz58zp48KASExO7PY6nnpmsrCw1NDRQAAwAgE00NjbK6XT69fltmdlMS5Ys0fvvv6+3337b7fjcuXNd/z1u3DhNmDBB2dnZeu2113Tbbbd1e5zExESPIQcAAMQmS4SZoqIibd26VaWlpT0ujJORkaHs7GxVVFREqXUAAMDKTA0zhmGoqKhImzdv1o4dO5STk9Pjferr63Xs2DFlZGREoYUAAMDqTC0AXrx4sTZu3KiXX35ZycnJOnnypE6ePKnPPvtMktTU1KT7779fe/bsUXV1tXbs2KFbbrlFqampuvXWW81sOgAAsAhTC4C9req3bt06LViwQJ999pnmzJmjsrIynTlzRhkZGZoxY4YeffRRv2cpBVJABAAArME2BcA95aj+/fvrjTfeiFJrAABAICprm3T0VItGDh2gnNQBprXDEgXAAADAPs60nNPSknKVVtS6jhXkpqm4ME/OpISot8dSi+YBAADrW1pSrl1H6tyO7TpSp6KSMlPaQ5gBAAB+q6xtUmlFrdq7lIq0G4ZKK2pVVdfs5Z6RQ5gBAAB+O3qqxeft1fWEGQAAYGHZQ5J83j5yaPQLgQkzAADAb6PSBqogN029uyyv0tvhUEFumimzmggzAAAgIMWFecofnep2LH90qooL80xpD1OzAQBAQJxJCdqwaJKq6ppVXd/MOjMAAMCeclLNDTEdGGYCAAC2RpgBAAC2RpgBAAC2RpgBAAC2RpgBAAC2RpgBAAC2RpgBAAC2RpgBAAC2RpgBAAC2xgrAAIC4UFnbpKOnWkxfeh/hR5gBAMS0My3ntLSkXKUVta5jBblpKi7MkzMpwcSWIVwYZgIAxLSlJeXadaTO7diuI3UqKikzqUUIN8IMACBmVdY2qbSiVu2G4Xa83TBUWlGrqrpmk1qGcCLMAABi1tFTLT5vr64nzMQCwgwAIGZlD0nyefvIoRQCxwLCDAAgZo1KG6iC3DT1djjcjvd2OFSQm8asphhBmAEAxLTiwjzlj051O5Y/OlXFhXkmtQjhxtRsAEBMcyYlaMOiSaqqa1Z1fTPrzMQgwgwAIC7kpBJiYhXDTAAAwNbomQEAIMrYWiG8CDMAAEQJWytEBsNMAABECVsrRAZhBgCAKGBrhcghzAAAEAVsrRA5hBkAAKKArRUihzADAEAUsLVC5BBmAACIErZWiAymZgMAECVsrRAZ9MwAABBlOakDNGPsMBmGoe2Ha5jJFCJ6ZgAAiDIWzwsvemYAAIgyFs8LL8IMAABRxOJ54UeYAQAgilg8L/wIMwAARBGL54UfYQYAgChi8bzwI8wAABBlLJ4XXkzNBgAgylg8L7wIMwAAmCQnlRATDgwzAQAAWyPMAAAAWyPMAAAAWzM1zKxatUoTJ05UcnKyhg0bpjlz5ujw4cNu5xiGoZUrVyozM1P9+/fX9OnT9eGHH5rUYgAAYDWmhpmdO3dq8eLF2rt3r7Zt26bz589r5syZam7+++qHTzzxhNasWaNnn31W+/fvV3p6um644QadPXvWxJYDAACrcBhGl80hTFRbW6thw4Zp586dKigokGEYyszM1LJly/Tggw9KklpbWzV8+HA9/vjjuvfee3t8zMbGRjmdTjU0NCglJSXSTwEAAIRBIJ/flqqZaWhokCQNGTJEklRVVaWTJ09q5syZrnMSExN13XXXaffu3R4fo7W1VY2NjW4/AAAgdlkmzBiGoeXLl+vaa6/VuHHjJEknT56UJA0fPtzt3OHDh7tu62rVqlV
yOp2un6ysrMg2HAAAmMoyYWbJkiV6//33VVJS0u02R5f9KwzD6Hasww9+8AM1NDS4fo4dOxaR9gIAAGuwxArARUVF2rp1q0pLSzVixAjX8fT0dEkXe2gyMjJcx2tqarr11nRITExUYmJiZBsMAAAsw9SeGcMwtGTJEm3atElvvvmmcnJy3G7PyclRenq6tm3b5jp27tw57dy5U9OmTYt2cwEAgAWZ2jOzePFivfzyy/r973+v5ORkVx2M0+lU//795XA4tGzZMj322GPKzc1Vbm6uHnvsMSUlJenOO+80s+kAAMAiTA0zzz//vCRp+vTpbsfXrVunBQsWSJIeeOABffbZZ/rud7+r06dPa/LkyfrjH/+o5OTkKLcWAABYkaXWmYkE1pkBAMB+bLvODAAAQKAIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNYIMwAAwNb6mN0AAAAQGZW1TTp6qkUjhw5QTuoAs5sTMYQZAABizJmWc1paUq7SilrXsYLcNBUX5smZlGBiyyKDYSYAAGLM0pJy7TpS53Zs15E6FZWUmdSiyCLMAAAQQyprm1RaUat2w3A73m4YKq2oVVVds0ktixzCjI1U1jZp++GamHwhAgDC4+ipFp+3V9fH3mcINTM2EG9jnwCA4GUPSfJ5+8ihsVcITM+MDcTb2CcAIHij0gaqIDdNvR0Ot+O9HQ4V5KbF5KwmwozFxePYJwAgNMWFecofnep2LH90qooL80xqUWQxzGRx/ox9xmLKBgAEz5mUoA2LJqmqrlnV9c2sMwNzxePYJwAgPHJSYzvEdGCYyeLicewTAIBAEGZsIN7GPgEgEljeInYxzGQD8Tb2CQDhxPIWsY+eGRvJSR2gGWOHEWQAIAAsbxH7CDMAgJjF8hbxgTADAIhZ8bi0fzwizAAAYhbLW8QHwgwAIGaxvEV8IMwAAGIay1vEPqZmAwBiGstbxD7CDAAgLsTL0v7xiGEmAABga4QZAABga4QZAABga4QZAABga4QZAABga4QZAABga4QZAABga4QZAABga4QZAABga6wADCDuVNY26eipFpa1B2IEYQZA3DjTck5LS8pVWlHrOlaQm6biwjw5kxJMbBmAUDDMBCBuLC0p164jdW7Hdh2pU1FJmUktAhAOhBkAcaGytkmlFbVqNwy34+2GodKKWlXVNZvUMgChIswAiAtHT7X4vL26njAD2BVhBkBcyB6S5PP2kUMpBAbsijADIC6MShuogtw09XY43I73djhUkJvGrCbAxkwNM6WlpbrllluUmZkph8OhLVu2uN2+YMECORwOt58pU6aY01gAtldcmKf80alux/JHp6q4MM+kFgEIB1OnZjc3N+uaa67RwoULdfvtt3s856abbtK6detcv/ft2zdazQMQY5xJCdqwaJKq6ppVXd/MOjNAjDA1zMyaNUuzZs3yeU5iYqLS09P9fszW1la1tra6fm9sbAy6fQBiU04qIQaIJZavmdmxY4eGDRumMWPG6O6771ZNTY3P81etWiWn0+n6ycrKilJLAYRDZW2Tth+uYao0AL85DKPLogsmcTgc2rx5s+bMmeM69sorr2jgwIHKzs5WVVWVVqxYofPnz+vgwYNKTEz0+DieemaysrLU0NCglJSUSD8NAEFidV4AnTU2NsrpdPr1+W3p7Qzmzp3r+u9x48ZpwoQJys7O1muvvabbbrvN430SExO9Bh0A1uVrdd4NiyaZ1CoAdmD5YabOMjIylJ2drYqKCrObAiCMWJ0XQChsFWbq6+t17NgxZWRkmN0UAGHE6rwAQmHqMFNTU5OOHDni+r2qqkrl5eUaMmSIhgwZopUrV+r2229XRkaGqqur9cMf/lCpqam69dZbTWw1gHBjdV4AoTC1Z+bAgQPKy8tTXt7FBauWL1+uvLw8/eu//qt69+6tDz74QLNnz9aYMWM0f/58jRkzRnv27FFycrKZzQYQZqzOCyAUlpnNFCmBVEMDME9DS5uKSsqYzQRAUgzNZgIQP1idF0CwCDMALIXVeQEEylazmQAAALoizAAAAFsjzAAAAFujZga2UFnbpKOnWigKBQB0Q5iBpbH5IACgJwwzwdJ8bT4Ie6msbdL2wzXsswQg7OiZgWV1bD7YVefNBxlysj561wBEGj0zsCw2H4wN9K4BiDTCDCyLzQftr6N3rb3Lrimde9cAIFSEGVgWmw/aH71rAKKBMANLKy7MU/7oVLdj+aNTVVyYZ1KLEAh61wBEAwXAsDQ2H7S3jt61XUfq3Iaaejscyh+dyt8SQFjQMwNbyEkdoBljh/HhZ0P0rgGINHpmAEQUvWsAIo0wAyAqclIJMQAig2EmAABga4QZAABga4QZAABga4QZAABga4QZAABga4QZAABga4QZAABga6wzg7hTWduko6daWLwNAGIEYQZx40zLOS0tKVdpRa3rWEFumooL8+RMSjCxZbA7AjJgLsIMbCWUD42lJeXadaTO7diuI3UqKinThkWTwtlMxAkCMmANhBlYVufgMjgpIaQPjcraJrf7dmg3DJVW1Kqqrplv1AgYARmwBsIMJFmrm9zTt93BSQlq/KzN7bxAPjSOnmrxeXt1PWEGgSEgA9ZBmIlzVuwm9/Rt93RLW7fzAvnQyB6S5PP2kUP50EFgCMiAdTA1O8756iY3Q8e33XbD8Ps+1fXNPZ4zKm2gCnLT1NvhcDve2+FQQW4aHzoIGAEZsI6gwsyoUaNUX1/f7fiZM2c0atSokBuF6PAWHDr3eERbT992PfH3Q6O4ME/5o1PdjuWPTlVxYV7A/yZAQAasI6hhpurqarW3t3c73traqv/5n/8JuVGIDit2k/f0bbez3g6H8ken+t1GZ1KCNiyapKq6ZlXXN1uiPgj2VlyYp6KSMrdhWgIyEH0BhZmtW7e6/vuNN96Q0+l0/d7e3q4///nPGjlyZNgah8iyYjd5x7fdXUfq3HqMeuliGOlcOxPsh0ZOKiEG4UFABqzBYRj+Fyf06nVxVMrhcKjr3RISEjRy5Eg99dRTuvnmm8PbyhA0NjbK6XSqoaFBKSkpZjfHcuat3dctOHT0eJg1tbShpa3bt92OouRTLef40ACAOBDI53dAYaZDTk6O9u/fr9TU1J5PNhlhxjdfwcHsRb/4tgsA8SviYcZOCDP+ITgAAKwkkM9vv2tmnnnmGd1zzz3q16+fnnnmGZ/nLl261N+HhUVQRwIAsCu/e2ZycnJ04MABDR06VDk5Od4f0OFQZWVl2BoYKnpmAACwn4j0zJSXl7tmL1VVVYXWQgAAgDDxe9G8IUOGqKamRpJ0/fXX68yZM5FqEwAAgN/8DjMDBw50rfq7Y8cOtbV13ysHAAAg2vweZvrHf/xHzZgxQ1dccYUk6dZbb1Xfvn09nvvmm2+Gp3UAAAA98DvMbNy4UevXr9cnn3yinTt36qqrrlJSkv9LzwMAAERCUOvMzJgxQ5s3b9agQYMi0KTwYjYTAAD2E5
HZTJ1t377d9d8dWcjRZedYAACAaPC7ALirDRs26Oqrr1b//v3Vv39/feELX9Cvf/3rcLYNAACgR0H1zKxZs0YrVqzQkiVLlJ+fL8MwtGvXLn3nO99RXV2d7rvvvnC3EwAAwKOgN5p85JFHNG/ePLfj69ev18qVKy21qB41MwAA2E8gn99BDTOdOHFC06ZN63Z82rRpOnHiRDAPCQAAEJSgwszo0aP1n//5n92Ov/LKK8rNzQ25UQBiU2Vtk7YfrlFVXbPZTQEQQ4KqmXnkkUc0d+5clZaWKj8/Xw6HQ2+//bb+/Oc/eww53pSWluqnP/2pDh48qBMnTmjz5s2aM2eO63bDMPTII4/oxRdf1OnTpzV58mT9x3/8h6666qpgmg3AJGdazmlpSblKK2pdxwpy01RcmCdnUoKJLQMQC4Lqmbn99tu1b98+paamasuWLdq0aZNSU1O1b98+3XrrrX4/TnNzs6655ho9++yzHm9/4okntGbNGj377LPav3+/0tPTdcMNN+js2bPBNBuASZaWlGvXkTq3Y7uO1KmopMykFgGIJQH3zLS1temee+7RihUrtHHjxpD+8VmzZmnWrFkebzMMQz/72c/0ox/9SLfddpukiwXGw4cP18svv6x77703pH8bQHRU1ja59ch0aDcMlVbUqqquWTmpA0xoGYBYEXDPTEJCgjZv3hyJtripqqrSyZMnNXPmTNexxMREXXfdddq9e7fX+7W2tqqxsdHtB4B5jp5q8Xl7dT31MwBCE9Qw06233qotW7aEuSnuTp48KUkaPny42/Hhw4e7bvNk1apVcjqdrp+srKyIthOAb9lDfO/hNnIovTIAQhNUAfDo0aP16KOPavfu3Ro/frwGDHB/M1q6dGlYGid13ybBMAyfWyf84Ac/0PLly12/NzY2EmgAE41KG6iC3DTtOlKn9k7LWvV2OJQ/OpUhJgAhCyrM/PKXv9SgQYN08OBBHTx40O02h8MRljCTnp4u6WIPTUZGhut4TU1Nt96azhITE5WYmBjyvw8gfIoL81RUUuZWO5M/OlXFhXkmtgpArAgqzHRe4TdSG03m5OQoPT1d27ZtU17exTe8c+fOaefOnXr88cfD+m8BiCxnUoI2LJqkqrpmVdc3a+TQAfTIAAiboDeaXLt2rcaNG6d+/fqpX79+GjdunH75y18G9BhNTU0qLy9XeXm5pIshqby8XJ9++qkcDoeWLVumxx57TJs3b9ahQ4e0YMECJSUl6c477wy22QBMlJM6QDPGDiPIAAiroHpmVqxYoaefflpFRUWaOnWqJGnPnj267777VF1drZ/85Cd+Pc6BAwc0Y8YM1+8dtS7z58/XSy+9pAceeECfffaZvvvd77oWzfvjH/+o5OTkYJoNAABiUFAbTaampqq4uFiFhYVux0tKSlRUVKS6ujov94w+NpoEAMB+Avn8Dqpnpr29XRMmTOh2fPz48Tp//nwwDxk3KmubdPRUCzUDAACESVBh5pvf/Kaef/55rVmzxu34iy++qG984xthaVisYW8aIHh8CQDgS1DDTEVFRdqwYYOysrI0ZcoUSdLevXt17NgxzZs3TwkJf/9w7hp4os0qw0zz1u7zus7GhkWTTGsXYGV8CQDiV8SHmQ4dOqQvfvGLkqRPPvlEkpSWlqa0tDQdOnTIdV64p2vbFXvTAMHxtUElXwIAdAgqzGzfvj3c7Yhp/uxNQ5gB3PElAIC/gl5nBv5jbxogcGxQCcBfhJko6NibpneXYbfeDocKctP4dgl4wJcAAP4izERJcWGe8kenuh1jbxrAO74EAPBXULOZ7MQqs5k6sDcN4L+GlrZuG1QymwmID4F8fhNmAFgeXwKA+BPxqdkAEE05qYQYAN5RMwMAAGyNMAMAAGyNMAMAAGyNmhkgDrFxI4BYQpgB4ggbNwKIRQwzAXHE18aNAGBXhBkgQiprm7T9cI2q6qyxh1DHxo3tXZaW6rxxIwDYEcNMQJhZdSiH3dsBxCp6ZoAws+pQDhs3AohVhBnAT/4MG1l5KIeNGwHEKoaZgB4EMmxk9aGc4sK8bhs3sns7ALsjzAA98DVstGHRJLfjVh/KcSYlaMOiSWzcCCCmMMwE+BDosJFdhnJyUgdoxthhlmkPAISCMGNBVpvSaxYrXAd/ho26Ki7MU/7oVLdjDOUAQOQwzGQhVp3SG21Wug7BDBsxlAMA0UXPjIVYdUpvtFnpOoQybBTOoRwr9FIBgFXRM2MRHbUZXXWuzYiHb/dWvA5mzgCyUi8VAFgVYcYirD6lN1qseB3MHDYKZCYVAMQrwoxFWH1Kb7RE+jpU1jbp6KmWoAJJTmp0a1+s2EsFAFZEmLGIIQP6anBSgk63tLkd7+2Q8kdbZ0qvJ74CQqDhoaNGZdeROrfp0L0dDuWPTg36Opg1XBNKeLJiLxUAWBFhxiKWlpSroUuQkaSU/gmWndLrKyAYMoIOD5GoUYnmcE1lbZM+PNGoDburtb/6tOt4oOGJ3joA8A9hxgK8DSdI0umWNp1qORdQ70EovQGB6GnWUSDhoWubw1mjEq3hGk/hrrNAw1OkeqkAINYQZiwgXMMJ0RxK6SkgeOIpPPhqc7hqVKI1XOMp3HUWTHhiLyUA6BlhxgLCNZwQzaGUngKCL53DQzTaHI3hGl+9a10FEp5YgA8AesaieRYQjv18At1DKFQ9BQRfOsJDtNocjf2SAgl3wYQn9lICAO8IMxYR6n4+wewhFIqeAoI/4SGabY70fkn+hDurbTYJALGCYSaLCHU4wYyZLz3Vc/RU6xHNNkd6uMZbsW5n1LoAQGQ4DMPLO2+MaGxslNPpVENDg1JSUsxuTkTNW7vP68yXSK4W6ysg9BQezGpzJDS0tHULcBOzB2vBtJG68hInPTIAEIBAPr8JMzHE04fphOzBWmjhD1NPbbb73kMU6wJA6AgzncRTmOlQVdesQ8cbQl60LZoIAACAzgL5/KYAOAblpA7Q7/b/Te8ePeN2vPOCdlbDbB0AQLAIMzEo2tO0w6mytknbD9dYuo0AAGthNlMIorVtQKDsuEGhWRtBAgDsjzATBKt/8Npxg8Jorl4MAIgtDDMFoacNFqOt69BMNFa8DSc7D4sBAMxHz0yAorUDsz989RDZaYNCOw6LAQCsgzATICt98PrqIVr51Su18NqRursgR+cvGH7X9ZhRB2SVYTGr1kBFQjw9VwCxjzATICt98PrqIbr+qZ2uYx29Nb6YWQfkbSuAjpWAI/1ha/UaqHCKp+cKIH5QMxMgq9SjBLJLsz/1PGbXAUV6I0hfzH7u0RRPzxVA/KBnJgie6lG+mD0oqvUo/uzS3KGneh4r1AGFuhFksMMmVnju0RJPzxVAfKFnJgjOpAQ9U/gPmpg92HVsf/VpFZWUqaGlLSptGJU2UCn9Asui1fWeZwX5UwcULYGuBHym5Zzmrd2n65/aqYXr9mvGkzs0b+0+v/8OVnrukRZPzxVAfLF0mFm5cqUcDofbT3p6utnNknSxu/7dT8+4HYtmd31lbZMaPz8f0H281fOEow7IrJV7Qx02sUoNV
DTE03MFEF8sP8x01VVX6U9/+pPr9969e5vYmous0F0fSM1MT4W0vgpw8y4d5PrG7un+ZhaUhuPvYHbxcTTF03MFEF8s3TMjSX369FF6errrJy0tzewmWaK7PpCaGX8KaT0V4Kb076MDR0/7HL4xs6A0XH8HM4uPoy2eniuA+GH5npmKigplZmYqMTFRkydP1mOPPaZRo0Z5Pb+1tVWtra2u3xsbG8PeJit01//9W3at2t0XztXgpAStXzhJ9S3n/C6I7VqA+9z2I1533e7YXsDsHqpw/R1CLT62k3h6rgDih6V7ZiZPnqwNGzbojTfe0C9+8QudPHlS06ZNU319vdf7rFq1Sk6n0/WTlZUV9nZZZXr2xW/Z7j1VE0cO1o77Z+gLWYMCKqTtkJM6QNlDkrS/+nSP2wuY3UMV7r9DoMXHdhZPzxVA7HMYRpdPLAtrbm7WZZddpgceeEDLly/3eI6nnpmsrCw1NDQoJSUlbG1paGnrNj070FqRcK3CGu5v2dsP12jhuv1eb1+3cKJmjB2mytomt8X5uj3O/dPd2hOJVWfD8XcAAFhPY2OjnE6nX5/flh9m6mzAgAG6+uqrVVFR4fWcxMREJSYmRrwtoXTXh7toNic1vEMF/g7f+FtQGskiYYZNAACWHmbqqrW1VR9//LEyMjLMbopLMN31Vl+FNZDhG38KSqPxfBk2AYD4Zememfvvv1+33HKLLr30UtXU1OgnP/mJGhsbNX/+fLObFjSzi2b95c+u2x3DRo/MvkqSPPaM2OX5AgDsy9Jh5m9/+5sKCwtVV1entLQ0TZkyRXv37lV2drbZTQtKZW2T/t/7x32eE81dt33xNXzz3rHT+tHmQzp0/O8zxQpy0/T9mWO6rUkTrl3G2eUZAOCNpcPMb3/7W7ObEBaeaka8sdoqrJ3rcXw9j9KKWo81MaFOn2aXZwBAT2xVM2NXnmpGuor2tO5gLC0p19tHeg5k0t9rYnzV30zIHqzq+mafWyBYvb4IAGA+wkyEddSMdF2zpSurr8La8Twu+DmRv3NNTLCrC3u7dl3XuwEAxDdLDzPFgp5qRu67IVdfveYSU3pk/K1DOdNyTkt/G1xPSEdNTKCrC0vhq7fxhjocAIgNhJkI66lmxIwgE2gdytKScn10PLhtITrXxOSkDpBhGNpffbrbeZ5mN0Vq2wjqcAAgtjDMFILK2iZtP1zjc7jDKlsfdBZIHUqgw0sdvD2/QLZAiNS1ow4HAGILYSYIZ1rOad7afbr+qZ0+az46WGmn4kDrUHoKH954e36B9raE+9pRhwMAsYdhpiD4+mbfueajQ7SW3PenBiTQOpSewoc3j8y+yuOQjb9bIHQI97WLdB0OACD6CDMBCnRF264BIxIflIHUgATaM+ItfPSSdMHH4/gKBf6sLtxVuK5dpOpwAADmIcwEyN9v9tEsMg2kpyjQnhHJc/j4YvZgHTjavZC3g69QYObmkME8fwCAtVEzEyB/v9n3VGTqT/GwP4KpAQm0DqUjfGy/f7rWLZyo7fdP13/987SQi3PN2hzSSjVMAIDQ0TMTIH++2fc0FPW1F3a7TU8OpccmmBqQQHpGfA2TBTNcZAVm9gwBAMKPMBOEnj7EewoYB7sMz7x9pFbfWLtXxYVfDPhDNZQaEF91KP4Mk9k9FESqhgkAEF0Ow+hhnX2ba2xslNPpVENDg1JSUsL62N4+xCtrm3T9UzuDekx/emm69pbMW7vPa0+Rp9lV/ojEYwIA4K9APr/pmQmBt2/2wc4Aki7uPv2djQdVcs+Ubrd56y35tznj9KMth8I23BPojC0AAMxEmIkQT0NRXxgxSOV/O9PjffdU1nsMDN6Kin+05RBrsQAA4hZhJkI81ZM8/PsP/b7/3sr6bkNX/vSWsBYLACDeMDU7BP5Mr+6Yfmz8X+jwl6PL74HsaRQqK+4nBQCAN/TMBCGYBfEC3eNo8qihbr9Hu7fErtOuAQDxhzAThED3ZpIC2+No2mVDu/V+RHvlWrtPuwYAxA+GmQIU7K7L3oZuuirITdPz3xjv8TYzVq41a5VeAAD8Rc9MgEKZ6eNp6KYgN0333zhG9c3neuz96Npb0tvhULth6FTLubDv9wQAgF0QZgIUSu2Kr6GbytomVxFvT70gg5MS9PDvq6OyiSUAAFZHmAlQOGpXOk+hDqaYOJiaHQAAYhU1M0EIZ+1KT7trdxVszQ4AALGKnpkghGumTzDbBrA6LwAA7ggzIQh1xd1gggmr8wIA4I5hpjDwtBKwP6sDBxNMWJ0XAAB39MyEwFPx7tRRQ+VwSLs/qXcd81bQG2wxMavzAgDwdw7D6FJJGmMaGxvldDrV0NCglJSUsD72vLX7ugURTzrCiaeZRu8dO6Mfbf5Ah443uo5dfUmK7r3uMl2V6fTZ08LqvACAWBXI5zc9M0HyVrzriaeCXk+9Oh0++J9GLXn54mwmX9O0w7VLNgAAdkbNTJAC3ThSct/Z2tOUbE98TdMGAACEmaAFsnFkhz69LhbtelsrxhPWjwEAwDfCTJD83Tiys/MXLoaXUHt1AADA3xFmQuBpJWBfOqZaB9Orw/oxAAB4RgFwCDytBPzw7z/scaq1tynZnviz51NlbZOOnmphVhMAIC4xNTvMGlrauq0B42lGkqfzPJmQPVgLp43UlZd0n6YdzCaVAADYQSCf34SZCPF3DZjO50kXa2P69HLozGdt2rC7WvurT7vO7RpUPK1z42tNGwAA7IIw04lZYSZUPQWVytomXf/UTq/3337/dIacAAC2FcjnNwXAFuRt6nbnadr+bFIZyr/f075SAABYBQXAUVRZ26R3quolOTRl1FCvPSf+BJVI7J5NDQ4AwI4IM1FwpuWcvvubd902n5Qubkr5wjfHdwsK/gSVnNQBQW1S6YunVYk7ViCmBgcAYFUMM0XB0pLybkFGkvZU1nvcqmBU2kBNHTXU42NN7dSj42mdm2B3z/ZnaAsAACuiZybCetqQsrSiVm9V1OpLuWlux70tLNz5uKd1boIt+vVnaIuCYgCAFdEzE2H+bF1w19p9mrd2nxpa2iRdDECeenIkafcn9d16ScIxIS0SNTgAAEQDPTNh4GsFXn+3Luhcm+JvL0k4C3a9rUocSg0OAADRQM9MCM60nNO8tft0/VM7tXDdfs14codbD8vR+mbd/vxuvx6rc22Kv70kvgp2gxHOGhwAAKKFnpkQ9DT7Z85/7NLp/ws2/ioqeVe/WTSlx14Sb7U4nUNRoL0p4azBAQAgWuiZCVJPs39e2f9pwEFGkj463qiikrIee0kiuWheTuoAzRg7jCADALAFemaC1FOY2FPpuYC3JxeMizOcTrWc89lL4s9QFLtpAwDiAWEmSD2FiamjhmpL2XGvt2c6++l4w+deb+8o8u346cpXwe7kUUP08O8/ZCVfAEBcYJgpSB1horeXBWFee/+knP29BwdfQUbybyq0t6Eow1BYC4OjiX2hAACBYtfsEDS0tKmopMxjIW5vh0PjswepoqYpoNqZzjtj+6vzUJRhGLbcTZt9oQAAncXcrtnPPfec
cnJy1K9fP40fP15vvfWW2U2SdHH2z8qvXunxtnbD0L7q09r03Xw9cfsX/H7MYKZCdy7YjWRhcCSFe5o5ACB+WL5m5pVXXtGyZcv03HPPKT8/Xz//+c81a9YsffTRR7r00kvNbp5f4SEtJdHnOatvu1rDnf3CUqhrx5V8IzHNHAAQPyzfM7NmzRotWrRI3/72t3XFFVfoZz/7mbKysvT88897PL+1tVWNjY1uP5HkT3jo6ZzJo4aGbSq0t1qe3g6HCnLTLBkK7NqbBACwBkuHmXPnzungwYOaOXOm2/GZM2dq927PK+uuWrVKTqfT9ZOVlRXRNvoTHqIdMOy2kq8de5MAANZh6TBTV1en9vZ2DR8+3O348OHDdfLkSY/3+cEPfqCGhgbXz7FjxyLeTn/CQzQDRsdKvtvvn651Cydq+/3TtWHRJMsW0tqxNwkAYB2Wr5mRJEeXDznDMLod65CYmKjERN81KuHmzzYAZmwV4G2NGisqLszrNjPMyr1JAADrsHSYSU1NVe/evbv1wtTU1HTrrbECf8KDnQJGNLEvFAAgWJYeZurbt6/Gjx+vbdu2uR3ftm2bpk2bZlKrEEnsCwUACJSle2Ykafny5brrrrs0YcIETZ06VS+++KI+/fRTfec73zG7aQAAwAIsH2bmzp2r+vp6/fjHP9aJEyc0btw4vf7668rOzja7ad2wsSMAANHHdgZhwFL8AACEV8xtZ2B1LMUPAIB5CDMh6liKv71LB1fnpfgBAEDkEGZCxFL8AACYizATIpbiBwDAXISZEHlbir+XQyzFDwBAFBBmwqC4ME+TRw1xO3bBkNraL6ihpc2kVgEAEB8IM2HgTEpQn169ul3MfVWnojajqbK2SdsP11BwDACIO5ZfNM8OOmY0ddV5RlOkhptY4wYAEO/omQkDM2c0scYNACDeEWbCwKwZTaxxAwAAYSYsvM1o6u1wRHRGE2vcAABAmAmb4sI85Y9OdTuWPzpVxYV5Efs3WeMGAAAKgMPGmZSgDYsmqaquWdX1zVHZObujR2jXkTq3oabeDofyR6eyxg0AIC7QMxNmOakDNGPssKgFCTN6hAAAsBJ6ZmzOjB4hAACshDATI3JSCTEAgPjEMBMAALA1wgwAALA1wgwAALA1wgwAALA1wgwAALA1wgwAALA1pmaHoLK2SUdPtbC2CwAAJiLMBOFMyzktLSlXaUWt61hBbpqKC/PkTEowsWUAAMQfhpmCsLSkXLuO1Lkd23WkTkUlZSa1CACA+EWYCVBlbZNKK2rdNnaUpHbDUGlFrarqmk1qGQAA8YkwE6Cjp1p83l5dT5gBACCaCDMByh6S5PP2kUMpBAYAIJoIMwEalTZQBblp6u1wuB3v7XCoIDeNWU0AAEQZYSYIxYV5yh+d6nYsf3SqigvzTGoRAADxi6nZQXAmJWjDokmqqmtWdX0z68wAAGAiwkwIclIJMQAAmI1hJgAAYGuEGQAAYGuEGQAAYGuEGQAAYGuEGQAAYGuEGQAAYGuEGQAAYGuEGQAAYGuEGQAAYGuEGQAAYGsxv52BYRiSpMbGRpNbAgAA/NXxud3xOe5LzIeZs2fPSpKysrJMbgkAAAjU2bNn5XQ6fZ7jMPyJPDZ24cIFHT9+XMnJyXI4HGF73MbGRmVlZenYsWNKSUkJ2+PaCdfgIq7DRVwHrkEHrsNFXIfQroFhGDp79qwyMzPVq5fvqpiY75np1auXRowYEbHHT0lJidsXaQeuwUVch4u4DlyDDlyHi7gOwV+DnnpkOlAADAAAbI0wAwAAbI0wE6TExEQ9/PDDSkxMNLsppuEaXMR1uIjrwDXowHW4iOsQvWsQ8wXAAAAgttEzAwAAbI0wAwAAbI0wAwAAbI0wAwAAbI0w48HKlSvlcDjcftLT033eZ+fOnRo/frz69eunUaNG6YUXXohSayNn5MiR3a6Dw+HQ4sWLPZ6/Y8cOj+f/5S9/iXLLg1daWqpbbrlFmZmZcjgc2rJli9vthmFo5cqVyszMVP/+/TV9+nR9+OGHPT7uq6++qiuvvFKJiYm68sortXnz5gg9g/DwdR3a2tr04IMP6uqrr9aAAQOUmZmpefPm6fjx4z4f86WXXvL4+vj8888j/GyC09NrYcGCBd2ey5QpU3p83Fh6LUjy+Dd1OBz66U9/6vUx7fZaWLVqlSZOnKjk5GQNGzZMc+bM0eHDh93OiYf3hp6ug5nvDYQZL6666iqdOHHC9fPBBx94Pbeqqkpf+cpX9KUvfUllZWX64Q9/qKVLl+rVV1+NYovDb//+/W7XYNu2bZKkr33taz7vd/jwYbf75ebmRqO5YdHc3KxrrrlGzz77rMfbn3jiCa1Zs0bPPvus9u/fr/T0dN1www2uPcA82bNnj+bOnau77rpL7733nu666y59/etf1zvvvBOppxEyX9ehpaVF7777rlasWKF3331XmzZt0l//+ld99atf7fFxU1JS3F4bJ06cUL9+/SLxFELW02tBkm666Sa35/L666/7fMxYey1I6vb3/NWvfiWHw6Hbb7/d5+Pa6bWwc+dOLV68WHv37tW2bdt0/vx5zZw5U83Nza5z4uG9oafrYOp7g4FuHn74YeOaa67x+/wHHnjAuPzyy92O3XvvvcaUKVPC3DJzfe973zMuu+wy48KFCx5v3759uyHJOH36dHQbFiGSjM2bN7t+v3DhgpGenm6sXr3adezzzz83nE6n8cILL3h9nK9//evGTTfd5HbsxhtvNO64446wtzkSul4HT/bt22dIMo4ePer1nHXr1hlOpzO8jYsST9dg/vz5xuzZswN6nHh4LcyePdu4/vrrfZ5j59eCYRhGTU2NIcnYuXOnYRjx+97Q9Tp4Eq33BnpmvKioqFBmZqZycnJ0xx13qLKy0uu5e/bs0cyZM92O3XjjjTpw4IDa2toi3dSoOHfunDZu3KhvfetbPW7YmZeXp4yMDH35y1/W9u3bo9TCyKuqqtLJkyfd/taJiYm67rrrtHv3bq/38/b68HUfu2loaJDD4dCgQYN8ntfU1KTs7GyNGDFCN998s8rKyqLTwAjZsWOHhg0bpjFjxujuu+9WTU2Nz/Nj/bXwv//7v3rttde0aNGiHs+182uhoaFBkjRkyBBJ8fve0PU6eDsnGu8NhBkPJk+erA0bNuiNN97QL37xC508eVLTpk1TfX29x/NPnjyp4cOHux0bPny4zp8/r7q6umg0OeK2bNmiM2fOaMGCBV7PycjI0IsvvqhXX31VmzZt0tixY/XlL39ZpaWl0WtoBJ08eVKSPP6tO27zdr9A72Mnn3/+uR566CHdeeedPjeSu/zyy/XSSy9p69atKikpUb9+/ZSfn6+KioootjZ8Zs2apd/85jd688039dRTT2n//v26/vrr1dra6vU+sf5aWL9+vZKTk3Xbbbf5PM/OrwXDMLR8+XJde+21GjdunKT4fG/wdB26iuZ7Q8zvmh2MWbNmuf776quv1tSpU3XZZZdp/fr1Wr58ucf7dO2tMP5vYeWeejHsYu3atZo1a5YyMzO9njN27FiNHTvW9fv
UqVN17NgxPfnkkyooKIhGM6PC09+6p79zMPexg7a2Nt1xxx26cOGCnnvuOZ/nTpkyxa1ANj8/X1/84hdVXFysZ555JtJNDbu5c+e6/nvcuHGaMGGCsrOz9dprr/n8MI/V14Ik/epXv9I3vvGNHmsd7PxaWLJkid5//329/fbb3W6Lp/cGX9dBiv57Az0zfhgwYICuvvpqrykxPT29W5KuqalRnz59NHTo0Gg0MaKOHj2qP/3pT/r2t78d8H2nTJlii29b/uiY0ebpb93121XX+wV6Hztoa2vT17/+dVVVVWnbtm0+v3l50qtXL02cODFmXh8ZGRnKzs72+Xxi9bUgSW+99ZYOHz4c1PuEXV4LRUVF2rp1q7Zv364RI0a4jsfbe4O369DBjPcGwowfWltb9fHHHysjI8Pj7VOnTnXN9Onwxz/+URMmTFBCQkI0mhhR69at07Bhw/RP//RPAd+3rKzM63Wzm5ycHKWnp7v9rc+dO6edO3dq2rRpXu/n7fXh6z5W1/FmVVFRoT/96U9BhXbDMFReXh4zr4/6+nodO3bM5/OJxddCh7Vr12r8+PG65pprAr6v1V8LhmFoyZIl2rRpk958803l5OS43R4v7w09XQfJxPeGkMqHY9T3v/99Y8eOHUZlZaWxd+9e4+abbzaSk5ON6upqwzAM46GHHjLuuusu1/mVlZVGUlKScd999xkfffSRsXbtWiMhIcH4r//6L7OeQti0t7cbl156qfHggw92u63rdXj66aeNzZs3G3/961+NQ4cOGQ899JAhyXj11Vej2eSQnD171igrKzPKysoMScaaNWuMsrIyVyX+6tWrDafTaWzatMn44IMPjMLCQiMjI8NobGx0PcZdd91lPPTQQ67fd+3aZfTu3dtYvXq18fHHHxurV682+vTpY+zduzfqz89fvq5DW1ub8dWvftUYMWKEUV5ebpw4ccL109ra6nqMrtdh5cqVxh/+8Afjk08+McrKyoyFCxcaffr0Md555x0znmKPfF2Ds2fPGt///veN3bt3G1VVVcb27duNqVOnGpdccklcvRY6NDQ0GElJScbzzz/v8THs/lr453/+Z8PpdBo7duxwe723tLS4zomH94aeroOZ7w2EGQ/mzp1rZGRkGAkJCUZmZqZx2223GR9++KHr9vnz5xvXXXed23127Nhh5OXlGX379jVGjhzp9X9qu3njjTcMScbhw4e73db1Ojz++OPGZZddZvTr188YPHiwce211xqvvfZaFFsbuo7p5V1/5s+fbxjGxSmYDz/8sJGenm4kJiYaBQUFxgcffOD2GNddd53r/A6/+93vjLFjxxoJCQnG5ZdfbvmA5+s6VFVVebxNkrF9+3bXY3S9DsuWLTMuvfRSo2/fvkZaWpoxc+ZMY/fu3dF/cn7ydQ1aWlqMmTNnGmlpaUZCQoJx6aWXGvPnzzc+/fRTt8eI9ddCh5///OdG//79jTNnznh8DLu/Fry93tetW+c6Jx7eG3q6Dma+Nzj+r4EAAAC2RM0MAACwNcIMAACwNcIMAACwNcIMAACwNcIMAACwNcIMAACwNcIMAACwNcIMAACwNcIMAFuaPn26li1bZpnHAWCePmY3AACiYceOHZoxY4ZOnz6tQYMGuY5v2rQpJjaEBeIZYQZAXBsyZIjZTQAQIoaZAARk+vTpWrJkiZYsWaJBgwZp6NCh+pd/+Rd1bPN2+vRpzZs3T4MHD1ZSUpJmzZqliooK1/1feuklDRo0SFu2bNGYMWPUr18/3XDDDTp27JjrnAULFmjOnDlu/+6yZcs0ffp0r+3auHGjJkyYoOTkZKWnp+vOO+9UTU2NJKm6ulozZsyQJA0ePFgOh0MLFixwPZ/Ow0z+tv+NN97QFVdcoYEDB+qmm27SiRMngrmcAMKAMAMgYOvXr1efPn30zjvv6JlnntHTTz+tX/7yl5IuBpEDBw5o69at2rNnjwzD0Fe+8hW1tbW57t/S0qJ/+7d/0/r167Vr1y41NjbqjjvuCKlN586d06OPPqr33ntPW7ZsUVVVlSuwZGVl6dVXX5UkHT58WCdOnNC///u/e3wcf9v/5JNP6te//rVKS0v16aef6v777w+p/QCCxzATgIBlZWXp6aeflsPh0NixY/XBBx/o6aef1vTp07V161bt2rVL06ZNkyT95je/UVZWlrZs2aKvfe1rkqS2tjY9++yzmjx5sqSL4eiKK67Qvn37NGnSpKDa9K1vfcv136NGjdIzzzyjSZMmqampSQMHDnQNJw0bNsytZqaziooKv9v/wgsv6LLLLpMkLVmyRD/+8Y+DajeA0NEzAyBgU6ZMkcPhcP0+depUVVRU6KOPPlKfPn1cIUWShg4dqrFjx+rjjz92HevTp48mTJjg+v3yyy/XoEGD3M4JVFlZmWbPnq3s7GwlJye7hqQ+/fRTvx/j448/9qv9SUlJriAjSRkZGa4hLQDRR5gBEHGGYbiFH0ndfu98rFevXq4anA6dh3m6am5u1syZMzVw4EBt3LhR+/fv1+bNmyVdHH4KpJ3+tL/r7CeHw+H1vgAijzADIGB79+7t9ntubq6uvPJKnT9/Xu+8847rtvr6ev31r3/VFVdc4Tp2/vx5HThwwPX74cOHdebMGV1++eWSpLS0tG4FteXl5V7b85e//EV1dXVavXq1vvSlL+nyyy/v1lPSt29fSVJ7e7vXx/G3/QCshTADIGDHjh3T8uXLdfjwYZWUlKi4uFjf+973lJubq9mzZ+vuu+/W22+/rffee0/f/OY3dckll2j27Nmu+yckJKioqEjvvPOO3n33XS1cuFBTpkxx1ctcf/31OnDggDZs2KCKigo9/PDDOnTokNf2XHrpperbt6+Ki4tVWVmprVu36tFHH3U7Jzs7Ww6HQ//93/+t2tpaNTU1dXscf9sPwFoIMwACNm/ePH322WeaNGmSFi9erKKiIt1zzz2SpHXr1mn8+PG6+eabNXXqVBmGoddff91taCYpKUkPPvig7rzzTk2dOlX9+/fXb3/7W9ftN954o1asWKEHHnhAEydO1NmzZzVv3jyv7UlLS9NLL72k3/3ud7ryyiu1evVqPfnkk27nXHLJJXrkkUf00EMPafjw4VqyZInHx/Kn/QCsxWEw0AsgANOnT9c//MM/6Gc/+1lQ93/ppZe0bNkynTlzJqztAhC/6JkBAAC2RpgBAAC2xjATAACwNXpmAACArRFmAACArRFmAACArRFmAACArRFmAACArRFmAACArRFmAACArRFmAACArf1/YHvcI+OqMSEAAAAASUVORK5CYII=\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "restaurant_df.plot(kind='scatter', x='population', y='profit')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To simplify things this time around, we'll omit the validation set. Given that there is no validation set, and that we won't implement cross-validation, we won't be able to perform any hyper-parameter search.\n", "\n", "Here, the target label is the `profit`, and the (only) feature is the `population`." ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We'll use 80% of our data as training data and the remaining 20% as test data\n", "# Here, we use a random seed to ensure that the data shuffling and splitting can be reproduced\n", "X_train, y_train, X_test, y_test, feature_names = helpers.preprocess_data(restaurant_df, label=\"profit\", train_size=0.8, seed=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding the intercept" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of linear regression is to fit a line of slope $w_1$ and of intercept $b$ such that for any data $x^{(i)}$, the prediction $\\hat{y}^{(i)}$ is:\n", "$$\\hat{y}^{(i)} = w_{1}x^{(i)} + b$$\n", "\n", "Note that this can also be written as:\n", "$$\\hat{y}^{(i)} = \\begin{bmatrix} b & w_1 \\end{bmatrix} \\cdot \\begin{bmatrix} 1 \\\\ x^{(i)} \\end{bmatrix}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Therefore, in order to take into account the offset term ($b$) directly in our matrix, we add an additional first column to `X` and set it to all ones. Then, we treat the intercept as another feature (`b` will be treated as `w_0`), which will make our matrix computation easier.\n", "\n", "__Note__: The same principle applies if the data has multiple features:\n", "$$\\hat{y}^{(i)} = w_{n}x^{(i)}_{n} + \\ ... \\ + w_{2}x^{(i)}_{2} + w_{1}x^{(i)}_{1} + b$$\n", "is equivalent to\n", "$$\\hat{y}^{(i)} = \\begin{bmatrix} b & w_1 & w_2 & ... & w_D \\end{bmatrix} \\cdot \\begin{bmatrix} 1 \\\\ x^{(i)}_{1} \\\\ x^{(i)}_{2} \\\\ ... \\\\ x^{(i)}_{D} \\end{bmatrix}$$\n", "\n", "Let's add a column of ones (known as the offset term / constant term) to the feature matrix `X`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "As a rule, for linear regression, the constant is always included in the feature matrix $\\mathbf{X}$, and the intercept / bias term will be part to the weight vector $\\mathbf{w}$. \n", "\n", "However, this **will not be the case** in future exercises, where the bias term will be separate.\n", " \n", "
" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def add_constant(X: np.ndarray) -> np.ndarray:\n", " \"\"\" Adds an constant term to the dataset (as the first column)\n", "\n", " Args:\n", " X (np.ndarray): Dataset of shape (N, D-1)\n", "\n", " Returns: \n", " Dataset with offset term added, of shape (N, D)\n", "\n", " \"\"\"\n", " X_with_offset = np.insert(X, 0, 1, axis=1)\n", "\n", " return X_with_offset\n", "\n", "X_train = add_constant(X_train)\n", "X_test = add_constant(X_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** In simple linear regression, what happens if no intercept is added?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer:** YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data preview" ] }, { "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Features: ['population']\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "print(f\"Features: {feature_names}\")" ] }, { "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Training set features:\n", - "X_train: \n", - " [[ 1. 8.2951]\n", - " [ 1. 9.3102]\n", - " [ 1. 20.341 ]\n", - " [ 1. 6.0062]\n", - " [ 1. 7.0032]\n", - " [ 1. 8.5781]\n", - " [ 1. 8.2111]\n", - " [ 1. 8.0959]\n", - " [ 1. 5.1301]\n", - " [ 1. 5.0269]]\n", - "\n", - "Training set labels:\n", - "y_train: \n", - " [ 5.7442 3.9624 20.992 1.2784 11.854 12. 6.5426 4.1164\n", - " 0.56077 -2.6807 ]\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Visualisation of X_train and y_train (separation of the features and the labels)\n", "print('Training set features:')\n", "print(f'X_train: \\n {X_train[:10]}')\n", "\n", "print('\\nTraining set labels:')\n", "print(f'y_train: \\n {y_train[:10]}')" ] }, { "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Training set shape:\n", - "X: (77, 2), y: (77,)\n", - "\n", - "Test set shape:\n", - "X: (20, 2), y: (20,)\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Show shapes\n", "print('Training set shape:')\n", "print(f'X: {X_train.shape}, y: {y_train.shape}')\n", "\n", "print('\\nTest set shape:')\n", "print(f'X: {X_test.shape}, y: {y_test.shape}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Notation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have added the constant term, here's how our data looks like:\n", "\n", "- features: $\\boldsymbol{X} \\in \\mathbb{R}^{N \\times (d+1)}$, $\\forall \\ \\boldsymbol{x}^{(i)} \\in \\boldsymbol{X}: \\boldsymbol{x}^{(i)} \\in \\mathbb{R}^{(d+1)}$.\n", "- labels: $\\boldsymbol{y} \\in \\mathbb{R}^{N}$, $\\forall \\ y^{(i)} \\in \\boldsymbol{y}: y^{(i)} \\in \\mathbb{R}$ \n", " \n", " where $N$ is the number of examples in our dataset, and $d$ is the number of features per example. In other words, $d$ is the dimension of the independent variables. \n", " \n", "\n", "For the weights, we have:\n", " \n", " \n", " - weights: $\\mathbf{w} \\in \\mathbb{R}^{d+1}$, where $w_0$ (or $b$) is known as the intercept." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ " **Note:**\n", " $\\boldsymbol{X}$ is sometimes called the design matrix, where $\\boldsymbol{X}_{i, :}$ denotes $\\boldsymbol{x}^{(i)}$. \n", " Note that a single example $\\boldsymbol{x}^{(i)}$ is a column vector of dimension (shape in python language) $((d+1) \\times 1)$, while the design matrix $\\boldsymbol{X}$ is of dimension (shape) $(N \\times (d+1))$, where each row represents an example and each column represents a feature. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Loss function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the first step when working on a machine learning problem is to pick a loss / cost function. Here, we will use the Mean Squared Error (MSE), defined as: \n", "\n", "$$\n", "\\begin{align}\n", "J(\\mathbf{w}) = \\frac{1}{N} \\sum_{i=1}^{N} (\\hat{y}^{(i)} - y^{(i)})^{2} \\\\\n", "= \\frac{1}{N} \\sum_{i=1}^{N} (\\mathbf{w}^T{\\boldsymbol{x}}^{(i)} - y^{(i)})^{2} \\\\\n", "= \\frac{1}{N} (\\mathbf{X} \\mathbf{w}-\\mathbf{y})^{T} (\\mathbf{X} \\mathbf{w}-\\mathbf{y})\n", "\\end{align}$$\n", "\n", "where $N$ is the number of examples, $\\hat{y}^{(i)}$ is the prediction for the $i^{th}$ example, and ${y}^{(i)}$ is the ground-truth for the $i^{th}$ example.\n", "\n", "Implement the function `mse_loss()`\n", "\n", "**Note about loss / cost:** The function we want to minimize or maximize is called the cost function, loss function, or error function. In this exercise, we use these terms interchangeably, though some machine learning publications assign special meaning to some of these terms.\n", "\n", "**Hint**: Use the matrix form shown above and make use of NumPy operations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mse_loss(X: np.ndarray, y: np.ndarray, w: np.ndarray) -> np.ndarray:\n", " \"\"\"Compute the Mean Square Error (MSE)\n", " \n", " Args:\n", " X (np.ndarray): Dataset of shape (N, D)\n", " y (np.ndarray): Labels of shape (N, )\n", " w (np.ndarray): Weights of shape (D, )\n", "\n", " Returns:\n", " Distances of shape (N,)\n", " \"\"\"\n", " ### START CODE HERE ### (≈ 3 lines of code)\n", "\n", " loss = ...\n", " ### END CODE HERE ###\n", " return loss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's initialize the weights to 0 and look at the current loss." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "zero_weights = np.zeros(X_train.shape[1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_loss = mse_loss(X_train, y_train, zero_weights)\n", "test_loss = mse_loss(X_test, y_test, zero_weights)\n", "print(f\"Train loss: {train_loss:.5f}\")\n", "print(f\"Test loss: {test_loss:.5f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected output:** \n", "\n", "| | |\n", "|---|--------------------------------------------------|\n", "| **Train loss** | 62.15811 |\n", "| **Test loss** | 71.79680 |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "helpers.plot_linear_regression_2d(X=X_train, y=y_train, w=zero_weights, feature_name=\"population\", label_name=\"profit\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not great, right? 
We'll see in the next sections how to fit our model in order to get a much better predictor.\n", "\n", " **Question:** Before proceeding, based on the lecture, what are some ways you could think of to determine a better set of weights? \n", "\n", "**Answer:** YOUR ANSWER HERE\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Gradient Descent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we need to define a function to perform gradient descent on the weights $\\mathbf{w}$ using the update rules. First, write a function that computes the gradient of the loss function (`mse_gradient()`) and then use it in the `gradient_descent()` function to update the weights at every iteration.\n", "\n", "As seen in the previous section, our loss is:\n", "$$\n", "J(\\mathbf{w}) = \\frac{1}{N} (\\mathbf{X} \\mathbf{w}-\\mathbf{y})^{T} (\\mathbf{X} \\mathbf{w}-\\mathbf{y})\n", "$$\n", "\n", "Therefore, the derivative w.r.t to ${\\mathbf{w}}$ is:\n", "\n", "$$ \\nabla_{\\mathbf{w}} J(\\mathbf{w}) = \\frac{2}{N} \\mathbf{X}^{T} (\\mathbf{X} \\mathbf{w} - \\mathbf{y}) \n", "$$\n", "\n", "**Note:** You can use http://www.matrixcalculus.org/ to compute the gradient.\n", "\n", "\n", "The gradient descent formula is:\n", "$$\\mathbf{w} := \\mathbf{w} - \\alpha \\nabla_{\\mathbf{w}} J(\\mathbf{w})$$\n", "\n", "where $\\nabla_{\\mathbf{w}} J(\\mathbf{w})$ is the gradient of the loss function at the current iteration, $\\mathbf{w}$ is the weights vector, and $\\alpha$ is the learning rate.\n", "\n", "**Hint**: Use the matrix form of the gradient and make use of NumPy operations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mse_gradient(X: np.ndarray, y: np.ndarray, w: np.ndarray) -> np.ndarray:\n", " \"\"\"Compute the gradient of the MSE\n", " \n", " Args:\n", " X (np.ndarray): Dataset of shape (N, D)\n", " y (np.ndarray): Labels of shape (N, )\n", " w (np.ndarray): Weights of shape (D, )\n", "\n", " Returns:\n", " Gradient of shape (D, )\n", " \"\"\"\n", " ### START CODE HERE ### (≈ 2 lines of code)\n", " \n", " grad = ...\n", " ### END CODE HERE ###\n", " return grad\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def gradient_descent(X: np.ndarray, y: np.ndarray, w: np.ndarray, alpha: float, max_iters: int) -> (np.ndarray, np.ndarray):\n", " \"\"\"Gradient descent for linear regression.\n", " \n", " Args:\n", " X (np.ndarray): Dataset of shape (N, D)\n", " y (np.ndarray): Labels of shape (N, )\n", " w (np.ndarray): Weights of shape (D, )\n", " alpha (float): Learning rate\n", " max_iters (int): Maximum number of gradient descent iteration\n", "\n", " Returns:\n", " w (np.ndarray): Optimum weights of shape (D, )\n", " losses (np.ndarray): Loss at every iteration of gradient descent. 
Shape is (max_iters, )\n", "    \"\"\"\n", "    # Define an array to store the evolution of the loss\n", "    losses = np.zeros(max_iters)\n", "    \n", "    for n_iter in range(max_iters):\n", "        ### START CODE HERE ### (≈ 2 lines of code)\n", "        # Update w using the gradient descent formula\n", "        w = ...\n", "        # Compute the loss with the updated w\n", "        loss = ...\n", "        ### END CODE HERE ###\n", "        \n", "        # Track losses\n", "        losses[n_iter] = loss\n", "        \n", "        # Print loss at some iterations\n", "        if n_iter % (max_iters / 20) == 0:\n", "            if w.shape[0] == 2: \n", "                print(f\"Iteration {n_iter}: loss={loss:.5f}, w0={w[0]:.3f}, w1={w[1]:.3f}\")\n", "            else:\n", "                print(f\"Iteration {n_iter}: loss={loss:.5f}\")\n", "\n", "    return w, losses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's initialize some additional variables: the learning rate `alpha` and the number of iterations to perform." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alpha = 0.01\n", "iters = 2000\n", "w = np.zeros((X_train.shape[1], ))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's run the gradient descent algorithm to fit our parameters `w` to the training set." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "w, loss = gradient_descent(X_train, y_train, w, alpha, iters)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `gradient_descent` prints the loss and the values of the weight vector `w`. The reason is that `w` is at the core of our algorithm. Make sure to understand that the whole point of the learning algorithm is to update this `w` so that the linear regression model (described by `w`) fits the data as well as possible. As `X` and `y` are fixed, the only parameter that can be changed is `w`. This is why we use the gradient of the loss w.r.t. `w` in gradient descent. It enables us to get closer to the best value of `w` at every iteration." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, play with the learning rate, `alpha`, and the number of iterations, `iters`, to see how the convergence changes. Document your findings.\n", "\n", "__Hint__: \n", "- Try `alpha = 0.05`. What's happening? Try to guess why.\n", "- Try `alpha = 0.001`. Why is the final loss bigger than when `alpha = 0.01`?\n", "- Try `alpha = 0.001` with `iters = 20000`. Is the problem of the loss solved?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alpha = 0.01 # Try changing this\n", "iters = 2000 # Try changing this\n", "w = np.zeros((X_train.shape[1], ))\n", "w, loss = gradient_descent(X_train, y_train, w, alpha, iters)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can compute the loss (error) of the trained model using our fitted parameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_loss = mse_loss(X_train, y_train, w)\n", "test_loss = mse_loss(X_test, y_test, w)\n", "print(f\"Train loss: {train_loss:.5f}\")\n", "print(f\"Test loss: {test_loss:.5f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected output:** with `alpha = 0.01` and `iters = 2000`.\n", "\n", "| | |\n", "|---|--------------------------------------------------|\n", "| **Train loss** | 9.34136 |\n", "| **Test loss** | 7.57149 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also see what the regression line looks like."
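, "\n", "The helper below draws it for us, but keep in mind that the red line is nothing more than `y_hat = w[0] + w[1] * x` evaluated over the population range. A minimal hand-made version (illustrative only, assuming `w` holds the weights found above) would be:\n", "\n", "```python\n", "# Illustrative manual version of the plot produced by the helper below\n", "xs = np.linspace(X_train[:, 1].min(), X_train[:, 1].max(), 100)\n", "plt.scatter(X_train[:, 1], y_train)          # training data\n", "plt.plot(xs, w[0] + w[1] * xs, color='red')  # fitted regression line\n", "plt.xlabel('population')\n", "plt.ylabel('profit')\n", "```"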
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "helpers.plot_linear_regression_2d(X=X_train, y=y_train, w=w, feature_name=\"population\", label_name=\"profit\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks pretty good! The red line is our trained model, it represents the estimated profit of our new restaurant for every population size possible. Remember that the model is 100% described by our parameters $\\mathbf{w}$ (in this case $\\mathbf{w} = [b, w_1]$). If we had chosen a $\\mathbf{w}$ that doesn't fit the model well, we would have gotten a red line that doesn't fit the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the gradient descent function also outputs a vector with the loss at each training iteration, we can plot that as well. The goal of gradient descent is to get a model that fits the data well, so we hope that the loss decreases throughout the iterations of gradient descent. Minimizing the MSE in linear regression is a convex optimization problem, so if everything goes well, it should reach a global minima." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "helpers.plot_loss(loss)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the plot had shown a non-decreasing function, it would have raised questions about the validity of our implementation of gradient descent. In any case, it's always good practice to plot this graph to see if our algorithm works as expected. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Least squares method" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It turns out that linear regression with MSE is one of these rare cases where we can compute the optimum of the loss function analytically. Let's see how:\n", "\n", "\n", "Let's start from the loss function: \n", "$$\n", "\\begin{align}\n", "J(\\mathbf{w}) = \\frac{1}{N} \\sum_{i=1}^{N} (\\hat{y}^{(i)} -y^{(i)})^{2} \\\\\n", " = \\frac{1}{N} \\sum_{i=1}^{N} (\\mathbf{w}^T \\boldsymbol{x}^{(i)} -y^{(i)})^{2} \\\\\n", "= \\frac{1}{N} (\\mathbf{X}\\mathbf{w}-\\mathbf{y})^{T}(\\mathbf{X}\\mathbf{w}-\\mathbf{y})\n", "\\end{align}\n", "$$\n", "\n", "This function is convex in $\\mathbf{w}$, so let's try to find its minimum.\n", "\n", "Take the derivative with respect to $\\mathbf{w}$: (Use http://www.matrixcalculus.org/ if necessary)\n", "$$\n", "\\frac{\\partial J(\\mathbf{w})}{\\partial \\mathbf{w}}=\\frac{2}{N} \\mathbf{X}^{\\top}(\\mathbf{X} \\mathbf{w} - \\mathbf{y})\n", "$$\n", "Set to 0 and solve:\n", "$$\n", "\\begin{align}\n", "\\frac{2}{N} \\mathbf{X}^{\\top}(\\mathbf{X} \\mathbf{w} - \\mathbf{y}) = 0 \\\\\n", "\\Leftrightarrow \\mathbf{X}^{T} \\mathbf{X} \\mathbf{w} = \\mathbf{X}^{T} \\mathbf{y}\n", "\\end{align}\n", "$$\n", "\n", "\n", "Therefore, the linear regression model has an analytical solution in the form of the normal equations:\n", "$$\\hat{\\mathbf{w}} = (\\mathbf{X}^{T}\\mathbf{X)}^{-1} \\ \\mathbf{X}^{T} \\ \\mathbf{y}$$\n", "This is known as the **least squares** method. The advantage of this method is that you can directly get the optimal weights $\\mathbf{w}$ from this short matrix expression.\n", "\n", "Please use this solution to complete the function `least_squares` and to obtain the weight parameters $\\mathbf{w}$. \n", "\n", "**Note:** Use `np.linalg.solve` to solve a linear matrix equation, as it is more stable and more accurate than computing the inverse. 
You can find the documentation for this method [here](https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def least_squares(X: np.ndarray, y: np.ndarray) -> np.ndarray:\n", "    \"\"\"Solves linear regression using least squares\n", "\n", "    Args:\n", "        X: Data of shape (N, D)\n", "        y: Labels of shape (N, )\n", "\n", "    Returns:\n", "        Weight parameters of shape (D, )\n", "    \"\"\"\n", "\n", "    ### START CODE HERE ### (≈ 1 line of code)\n", "    w = ...\n", "    ### END CODE HERE ###\n", "    return w\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls_w = least_squares(X_train, y_train)\n", "print(ls_w)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_loss = mse_loss(X_train, y_train, ls_w)\n", "test_loss = mse_loss(X_test, y_test, ls_w)\n", "print(f\"Train loss: {train_loss:.5f}\")\n", "print(f\"Test loss: {test_loss:.5f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected output:** \n", "\n", "| | |\n", "|---|--------------------------------------------------|\n", "| **Train loss** | 9.34135 |\n", "| **Test loss** | 7.57472 |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "helpers.plot_linear_regression_2d(X=X_train, y=y_train, w=ls_w, feature_name=\"population\", label_name=\"profit\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** Compare the loss and plot obtained with least-squares to the loss and plot obtained with gradient descent. What can you say about these two methods? Is the end result similar?\n", "\n", "**Answer:** YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Prediction\n", "Based on the weights ($\mathbf{w}$) we just computed and the linear model, let's define a function `predict`, which we'll use to predict the expected restaurant profit ($\hat{y}$) based on the population ($x$)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def predict(X, w):\n", "    \"\"\"Predicts values using linear regression weights\n", "\n", "    Args:\n", "        X: Dataset (including the constant term) of shape (M, D)\n", "        w: Weights (with bias term) of shape (D,)\n", "\n", "    Returns:\n", "        Predictions of shape (M, )\n", "    \"\"\"\n", "    ### START CODE HERE (≈ 1 line of code)\n", "    y_hat = ...\n", "    ### END CODE HERE\n", "    return y_hat" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# What's the predicted profit in a city of 100'000 inhabitants (population feature = 10)?\n", "expected_profit = predict([1, 10], w)\n", "print(f\"A new restaurant in a city of 100'000 inhabitants has an expected profit of {expected_profit*10000:.2f} $.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Multiple Linear Regression\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we're tasked with implementing linear regression with multiple features to predict the price of a house. We'll see that the code implemented in the previous parts works just as well for multiple features.\n", "\n", "*Background: Suppose you want to buy a new house and you want to figure out if its price is too low or too high based on the current housing market. 
You know the number of rooms and the size of the house you want to buy, and you are going to predict the market price of the house based on these two features.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6.1. House Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we'll use a dataset containing information on houses in Portland, Oregon. This dataset consists of 47 houses with their respective size (in thousands of sqft), number of bedrooms and their respective price (in USD). Take a look at the file `house_data.csv` and see how it's loaded by running the cell below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "house_df = pd.read_csv('data/house_data.csv')\n", "\n", "print(f\"There are {house_df.shape[0]} rows and {house_df.shape[1]} columns.\")\n", "# Show the first 5 rows of the data\n", "house_df.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to predict the price of a house using its size and number of bedrooms. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We'll use 80% of our data as training data and the remaining 20% as test data\n", "# Here, we use a random seed to ensure that the data shuffling and splitting can be reproduced\n", "X_train_mult, y_train_mult, X_test_mult, y_test_mult, feature_names_mult = helpers.preprocess_data(house_df, label=\"price\", train_size=0.8, seed=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As in the simple linear regression case, we'll first add a constant term to our training data for the intercept." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train_mult = add_constant(X_train_mult)\n", "X_test_mult = add_constant(X_test_mult)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f\"Features: {feature_names_mult}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time, there are several features. To be exact, we have 2 features and we can plot according to one feature at a time, to see how each feature correlates to the target variable `y`.\n", "\n", "Run the following cell with `feature_num = 1` and then with `feature_num = 2` (`0` is the constant term)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "feature_num = 1\n", "plt.scatter(X_train_mult[:,feature_num], y_train_mult)\n", "\n", "plt.ylabel(\"price\")\n", "if feature_num == 0:\n", " plt.xlabel(\"constant\")\n", " plt.title(\"constant vs price\")\n", "else:\n", " plt.xlabel(f\"{feature_names_mult[feature_num - 1]}\")\n", " plt.title(f\"{feature_names_mult[feature_num - 1]} vs price\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also possible to plot the target variable according to both features. \n", "\n", "Using `plot_data_3d` from `helpers.py`, we can generate 3D plot that shows the training and test set according to both features. \n", "- You can toggle each dataset on or off by clicking on the legend (upper left). \n", "- You can also interact with the plot, zoom in and out, and see it through different angles. Try to carefully choose the view angle in order to get the equivalent of the 2 plots above (cancel one dimension)." 
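, "\n", "If the interactive 3D view does not render in your environment, a small loop over the two features (a sketch that reuses the variables above) produces the same information as two 2D plots:\n", "\n", "```python\n", "# Sketch: scatter each feature against the price (column 0 of X_train_mult is the constant)\n", "for feature_num in [1, 2]:\n", "    plt.figure()\n", "    plt.scatter(X_train_mult[:, feature_num], y_train_mult)\n", "    plt.xlabel(feature_names_mult[feature_num - 1])\n", "    plt.ylabel('price')\n", "```"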
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "helpers.plot_data_3d(X_train=X_train_mult, y_train=y_train_mult, X_test=X_test_mult, y_test=y_test_mult, feature_names=feature_names_mult, label_name=\"price\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6.2 Training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 6.2.1. Gradient Descent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have implemented the function `gradient_descent` correctly in section 1, you should be able to use it for multiple features.\n", "\n", "Try to call `gradient_descent(X_train_mult, y_train_mult, np.zeros((X_train_mult.shape[1], )), 0.02, 5000)`. \n", "If it doesn't work, go back to your `gradient_descent` function in section 1, write in matrix form, rerun the function cell and try to call it again with the above parameters. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "w_mult, loss = gradient_descent(X_train_mult, y_train_mult, np.zeros((X_train_mult.shape[1], )), 0.02, 5000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, as usual, we can compute the loss of our newly trained model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_loss = mse_loss(X_train_mult, y_train_mult, w_mult)\n", "test_loss = mse_loss(X_test_mult, y_test_mult, w_mult)\n", "print(f\"Train loss: {train_loss:.1f}\")\n", "print(f\"Test loss: {test_loss:.1f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected output:** \n", "\n", "| | |\n", "|---|--------------------------------------------------|\n", "| **Train loss** | 4320753675.3 |\n", "| **Test loss** | 3673378116.2 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 6.2.2. Least squares" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If `least_squares` is implemented correctly, it should also be able to work without any modification for multiple features." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls_w_mult = least_squares(X_train_mult, y_train_mult)\n", "print(ls_w_mult)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_loss = mse_loss(X_train_mult, y_train_mult, ls_w_mult)\n", "test_loss = mse_loss(X_test_mult, y_test_mult, ls_w_mult)\n", "print(f\"Train loss: {train_loss:.1f}\")\n", "print(f\"Test loss: {test_loss:.1f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected output:** \n", "\n", "| | |\n", "|---|--------------------------------------------------|\n", "| **Train loss** | 4320753667.6 |\n", "| **Test loss** | 3673281817.0 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This output is a realistic one. Taking the square root of the test loss we get an approximation of the average difference between our model prediction and the reality (the Root Mean Square Error, or RMSE)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "average_difference = np.sqrt(test_loss)\n", "print(\"The average difference between the predicted price and the actual price of a house (on the test set) is\",average_difference,\"$.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using MSE, this is a good practice to make sense of the result loss in a tangible way in order to evaluate if the model performs well or not. A big loss doesn't necessarily translates to a poor model. For example, here, our model has a ~18% relative error (because the average house price is 340,000$). If we had obtain the same loss for a model predicting a variable of average value 65 000, the same loss would have translated to a ~92% relative error." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6.3. Plotting the regression surface" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that our model is trained, we can plot the regression surface using `plot_surface_3d` from `helpers`. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "helpers.plot_surface_3d(w=w_mult,\n", " X_train=X_train_mult, \n", " y_train=y_train_mult, \n", " X_test=X_test_mult, \n", " y_test=y_test_mult,\n", " feature_names=feature_names_mult, \n", " label_name=\"price\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:**\n", "What are your thoughts on this regression fit?\n", "\n", "**Answer:**\n", "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Congratulations on finishing this exercise! In the next exercise, we'll take a look at classification with logistic regression." ] } ], "metadata": { "kernelspec": { "display_name": "Python", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }