{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "EE-311\n", "======\n", "\n", "Lab 5: Dimensionality Reduction\n", "----------------------------------------\n", "\n", "created by Zahra Farsijani and François Marelli on 25.03.2020\n", "\n", "# Homework\n", "\n", "The file `homework.py` contains this week's homework: empty functions that must be completed according to the instructions.\n", "\n", "When the homework is completed, it must be submitted on Moodle for grading.\n", "\n", "**Do not change the function definitions in the file!**\n", "\n", "## Data generation\n", "\n", "PCA does not take the labels into account when computing the principal components of a dataset. In some cases, this can produce a reduced-dimensionality dataset with poor classification performance (fortunately, this is not frequent).\n", "\n", "To illustrate this effect, you are given a 2D dataset that is exactly linearly separable, but that becomes non-separable when reduced to 1D using PCA. In that situation, using PCA actually decreases the performance of our model! Have a look at the data in the next cell, and at how the distribution of the classes can fool PCA. Where is the axis of maximum variance? And which axis should we use for classification?" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "import homework\n", "import importlib\n", "# Reload the module so that edits to homework.py are picked up without restarting the kernel\n", "importlib.reload(homework)\n", "\n", "N = 100\n", "data_X, data_y = homework.generate_data(N)\n", "\n", "plt.figure(figsize=(10, 7))\n", "plt.grid()\n", "\n", "# Plot each of the two classes as a separate scatter series\n", "for cls in range(2):\n", "    plt.scatter(data_X[data_y==cls,0], data_X[data_y==cls,1], label='y={}'.format(cls))\n", "\n", "plt.legend(loc=2, framealpha=1)\n", "# Use the same scale on both axes so that the spread along each axis can be compared visually\n", "plt.axis('equal')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## PCA and correlation coefficient\n", "\n", "One possible way to avoid the previous situation is to combine PCA with a filtering method (such as a Pearson correlation coefficient filter).\n", "\n", "PCA is then computed as usual, but instead of ranking the principal components according to the magnitude of the corresponding eigenvalues, they are ranked according to the chosen scoring criterion.\n", "\n", "This is what you are asked to implement in the homework.\n", "\n", "*You can follow these steps:*\n", "\n", "1. Perform PCA on the data\n", "\n", "2. Project the data onto the obtained principal vectors without reducing the dimension\n", "\n", "3. Compute the correlation filter on the projected data\n", "\n", "4. Select the important principal components according to this correlation criterion\n", "\n", "5. Build the projection matrix (for dimensionality reduction) using the selected vectors\n", "\n", "## Test cell\n", "\n", "Run this cell to check whether your homework passes the test." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import unittest\n", "import testing\n", "\n", "import importlib\n", "importlib.reload(testing)\n", "\n", "unittest.main(module=testing, argv=['first-arg-is-ignored'], exit=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 4 }