"+ `data_raw`: contains all rows and columns of the .csv file (headers removed in the .csv)\n",
"+ `data`: the dataset obtained by selecting only the energy consumption columns, cleaning the rows (measurements taken at wrong times are deleted), handling NaNs, discarding the columns that represent small buildings, and renumbering the columns while keeping track of the original ones."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of raw dataset: \n",
" (68555, 835)\n"
]
}
],
"source": [
"# Load the raw dataset\n",
"data_path = \"data/data.csv\"\n",
"data_raw = load_data(data_path)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"FORMATTING DATA...\n",
"Index (datetime) unique: \n",
" True\n",
"Shape of formatted dataset: \n",
" (65856, 145)\n",
"First measurement: \n",
" 2018-01-01 00:00:00\n",
"Last measurement: \n",
" 2019-11-17 23:45:00\n",
"\n",
"REMOVING NANS...\n",
"Number of dropped columns: \n",
" 32\n",
"Shape of dataset after NaN handling: \n",
" (65856, 113)\n",
"\n",
"REMOVING SMALL BUILDINGS...\n",
"Number of dropped columns: \n",
" 8\n",
"Shape of dataset after removing small buildings: \n",