{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Correlation and Scatterplots\n", "In this tutorial we use the \"concrete strength\" data set to explore relationships between two continuous variables." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preliminaries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " No Cement Slag Fly ash Water SP Coarse Aggr. Fine Aggr. \\\n", "0 1 273.0 82.0 105.0 210.0 9.0 904.0 680.0 \n", "1 2 163.0 149.0 191.0 180.0 12.0 843.0 746.0 \n", "2 3 162.0 148.0 191.0 179.0 16.0 840.0 743.0 \n", "3 4 162.0 148.0 190.0 179.0 19.0 838.0 741.0 \n", "4 5 154.0 112.0 144.0 220.0 10.0 923.0 658.0 \n", ".. ... ... ... ... ... ... ... ... \n", "98 99 248.3 101.0 239.1 168.9 7.7 954.2 640.6 \n", "99 100 248.0 101.0 239.9 169.1 7.7 949.9 644.1 \n", "100 101 258.8 88.0 239.6 175.3 7.6 938.9 646.0 \n", "101 102 297.1 40.9 239.9 194.0 7.5 908.9 651.8 \n", "102 103 348.7 0.1 223.1 208.5 9.6 786.2 758.1 \n", "\n", " Air Entrainment Compressive Strength (28-day)(Mpa) \n", "0 No 34.990 \n", "1 Yes 32.272 \n", "2 Yes 35.450 \n", "3 No 42.080 \n", "4 No 26.820 \n", ".. ... ... \n", "98 No 49.970 \n", "99 No 50.230 \n", "100 Yes 43.461 \n", "101 Yes 44.639 \n", "102 No 48.770 \n", "\n", "[103 rows x 10 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "con = pd.read_csv('Data/ConcreteStrength.csv')\n", "con" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Renaming columns\n", "Recall the the column names in the \"ConcreteStrength\" file are problematic: they are too long to type repeatedly, have spaces, and include special characters like \".\". Although we could change the name of the columns in the underlying spreadsheet before importing, it is generally more practical/less work/less risk to leave the organization's spreadsheets and files as they are and write some code to fix things prior to analysis. In this way, you do not have to start over when an updated version of the data is handed to you.\n", "\n", "Let's start by listing the column names. A Pandas DataFrame object exposes a list of columns through the `columns` property. 
Here I use the `list()` type conversion method to convert the results to a simple list (which prints nicer):" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['No',\n", " 'Cement',\n", " 'Slag',\n", " 'Fly ash',\n", " 'Water',\n", " 'SP',\n", " 'Coarse Aggr.',\n", " 'Fine Aggr.',\n", " 'Air Entrainment',\n", " 'Compressive Strength (28-day)(Mpa)']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(con.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `rename()` method for data frames is straightforward. Here I define a standard Python dictionary (of the form {key1: value1, key2: value2, ... }) and assign it to the \"columns\" axis. Remember that the `inplace=True` argument is required if we want to make changes to the underlying data frame." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " No Cement Slag FlyAsh Water SP CoarseAgg FineAgg AirEntrain \\\n", "0 1 273.0 82.0 105.0 210.0 9.0 904.0 680.0 No \n", "1 2 163.0 149.0 191.0 180.0 12.0 843.0 746.0 Yes \n", "2 3 162.0 148.0 191.0 179.0 16.0 840.0 743.0 Yes \n", "3 4 162.0 148.0 190.0 179.0 19.0 838.0 741.0 No \n", "4 5 154.0 112.0 144.0 220.0 10.0 923.0 658.0 No \n", "\n", " Strength \n", "0 34.990 \n", "1 32.272 \n", "2 35.450 \n", "3 42.080 \n", "4 26.820 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "con.rename(columns={'Fly ash': 'FlyAsh', 'Coarse Aggr.': \"CoarseAgg\",\n", " 'Fine Aggr.': 'FineAgg', 'Air Entrainment': 'AirEntrain', \n", " 'Compressive Strength (28-day)(Mpa)': 'Strength'}, inplace=True)\n", "con.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As before, we should convert any obvious categorical variables to categories:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " AirEntrain\n", "count 103\n", "unique 2\n", "top No\n", "freq 56" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "con['AirEntrain'] = con['AirEntrain'].astype('category')\n", "con.describe(include='category')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['No',\n", " 'Cement',\n", " 'Slag',\n", " 'FlyAsh',\n", " 'Water',\n", " 'SP',\n", " 'CoarseAgg',\n", " 'FineAgg',\n", " 'AirEntrain',\n", " 'Strength']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(con.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scatterplots\n", "Scatterplots are a fundamental graph type—much less complicated than histograms and boxplots. As such, we might use the Mathplotlib library instead of the Seaborn library. But since we have already used Seaborn, I will stick with it here. Just know that there are many ways to create scatterplots and other basic graphs in Python.\n", "\n", "To create a bare-bones scatterplot, we must do four things:\n", "1. Load the seaborn library\n", "2. Specify the source data frame\n", "3. Set the _x_ axis, which is generally the name of a predictor/independent variable\n", "4. Set the _y_ axis, which is generally the name of a response/dependent variable" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "sns.scatterplot(x=\"FlyAsh\", y=\"Strength\", data=con);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding labels\n", "To this point, we have not said much about decorating Seaborn charts with labels and other details. This is because Seaborn does a pretty good job by default. But if we do need to clean up our charts a bit, here is the thing to know: the Seaborn chart methods return an object (of type AxesSubplot, whatever that is) for which properties can be set.\n", "\n", "Here I assign the results of the `scatterplot()` call to a variable called `ax` and then set various properties of `ax`. I end the last line of the code block with a semicolon to suppress return values:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEWCAYAAABhffzLAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAomUlEQVR4nO3deZhcdZ3v8fe3s5J0EsjeLEkTaAEDgjGCIOHBxHFidIaMyjYuqGjGuWIyjzNXcJkZx2VE75WRKHOvQR3jhqAsMooM3qAShs0EwhKjAiEJCZ0VskJn6f7eP86ppNKpU1XdVafqLJ/X8/TT1aeqTv1+VdXf8zvf33LM3RERkfxoaXYBRESksRT4RURyRoFfRCRnFPhFRHJGgV9EJGcU+EVEckaBX6RJzOy7ZvaFZpejFDO70MzWN/g13cxObuRr5pUCf06Y2V+b2TIz221mnWb2SzM7v9nlKlbrP76ZXWRmK8xsp5ltNbMlZtYe3vdZM/tB3Qrb97K938zub9brRwnf8z3h92K3mW1vdpkkfgObXQCJn5l9HLgG+AjwX8A+YDZwEdCQYGRmA939QIz7Pxn4HvAO4F6gFXgL0FPl8w0wd6/q8Rlzprs/0+xCSAO5u34y/AOMAnYDF5d5zBDga8AL4c/XgCHhfRcC64G/BzYDncAHip57FPBVYC2wg+BAchTQDjhwJbAOuC98/AeBVcBLBAehyeH2+8LH7wnLe2m4/e3ACmA78ADwmog6vAtYEXHfbIKD3f5w34+H238DfBH4b+AV4GTgVOBXwIvAH4FLivbzXeAG4BfALuBh4KSi+98SPmcH8O/Ab4EPAacBXUB3+Prbq9lfrzrcDVzVa9vjBAc6A/4t/Hx2AE8Ap1f5/XDg5BLbLwTWh7f/J3Brr/u/DnwtYp/XAM+Gdfo98FdF950cvi87gK3Azb3K8hHg6fD7cQPBwbjp/0dZ+2l6AfQT8wccBL0DwMAyj/kc8BAwHhgXBtjPh/ddGD7/c8AgYA7wMnBMeP8NYQA9DhgAnEdwIGkP/5G/BwwnOBjMBZ4JA+FA4DPAA0
XlOCwIAdPCYHZOuO8rgDWEB6VedZgSBtd/A94EtPa6/7PAD3pt+w3BQWlqWJ5RwPPAB8K/p4XBaWr4+O8SHBDODu//IfDj8L6xwM4wEA8EFhAcaD4U3v9+4P5erx+5vxL1ex/w30V/v5rgYDgE+HNgOXA0wUHgNKCtyu9HNYG/jeCAfHT498Dwc3ldxD4vBo4lSCVfGj63LbzvJuDT4X1DgfN7leXnYT0mAVuA2c3+H8rij3L82TcG2Orl0yzvBj7n7pvdfQvwL8B7i+7fH96/393vImi1nmJmLQQt+AXuvsHdu939AXffW/Tcz7r7Hnd/Bfgb4Evuviosz78CZ5nZ5IhyfRj4prs/HO57MbAXeEPvB7r7aoJgdRxwC7A17DxtrfD+fNfdV4blmQ2scff/cPcD7v4ocCvB2UTBbe7+SPj4HwJnhdvnACvd/bbwvoXAxgqvXW5/vd3O4e/Vu8Pn7iX4fEYQnK1Y+P52VvHaBY+a2fbwZ2HvO8N93UcQ0CF4n7a6+/JSO3P3n7j7C+7e4+43E7Tgzw7v3g9MBo519y53751qvNbdt7v7OuDXRL8fUgMF/uzbBow1s3L9OccSpGoK1obbDu6j14HjZYIc+liCVtuzZfb9fNHtycD1hSBD0No1gmBdymTg74uC0nbghF5lO8jdH3L3S9x9HDADuICgdVlO7/Kd0+v13g1MLHpMcTAvvA+EZTq4L3d3ghRZJVH7O4y77yJICV0WbrqM4ECBu98LfIPg7GuTmS0ys5FVvHbBNHc/OvyZH/GYxcB7wtvvAb4ftTMze1/YyV54D08n+K4AfILgM3/EzFaa2Qd7Pb2q90Nqo8CffQ8SpEDmlnnMCwRBr2BSuK2SreG+TyrzmOLlX58H/qYoyBzt7ke5+wMRz30e+GKvxw9z95sqFczdfwfcRhB0epejXPl+2+v1Wt39byu9HkHfx/GFP8LO4uOL7q/HMrg3AZeb2bkEqbNfH9y5+0J3fx1B2upVBHn5eroDeI2ZnU7Q7/LDUg8Kz0huBK4Cxrj70cBTBMEed9/o7h9292MJzgD/XUM4G0+BP+PcfQfwT8ANZjbXzIaZ2SAze6uZfSV82E3AZ8xsnJmNDR9fceijByNgvgNcZ2bHmtkAMzvXzIZEPOX/Ap80s6kAZjbKzC4uun8TQa6+4EbgI2Z2jgWGm9nbzGxE7x2b2flm9mEzGx/+fSrwlwR9F4V9t4fpqSg/B15lZu8N36NBZvZ6Mzut0ntB0Bo/I3yPBwIf5fAzhU3A8WY2uIp9RbmL4AD9OYJO0R6AsIznmNkggnx6oSO5bty9C/gp8CPgkTAVU8pwgoPclrBsH+DQwRczu9jMCgfEl8LH1rWsUpkCfw64+3XAxwk6U7cQtGyvImjFAXwBWEYwGuRJ4NFwWzX+IXzO7whSN18m4nvl7reH9//YzHYStATfWvSQzwKLwxTBJe6+jCDP/w2CIPEMQSdpKdsJAv2TZrabYBTM7UDh4PaT8Pc2M3s0ony7CEbmXEZwxrMxLG/Ugaz4uVsJcuBfIUivvZrgPS30d9wLrAQ2mtnWSvuLeI29BGcxbyYIwAUjCQ6SLxGk6bYB/xvAzD5lZr/sz+uVsBg4gzJpHnf/PcEorwcJDnZnEIyaKng98HD4Gd1J0D/0XJ3KJ1WyIBUpIvUUnlmsB97t7r+u9Pg0MLNJwB+Aie6+s9nlkf5Ti1+kTszsz83s6DDV9SmCvPZDFZ6WCuGB7OMEw00V9FNOM3dF6udcghTMYIKJS3PDYaypZmbDCdI2awmGckrKKdUjIpIzSvWIiORMKlI9Y8eO9fb29mYXQ0QkVZYvX741nNB4mFQE/vb2dpYtW9bsYoiIpIqZrS21XakeEZGcUeAXEckZBX4RkZxR4BcRyRkFfhGRnEnFqB4RqU1Pj7Nm2x427exiwsihtI8ZTkuLJX7fEg8FfpGM6+lx7l65kY/fsoKu/T0MHdTCdZecxeypE2sO0HHuW+KjVI
9Ixq3ZtudgYAbo2t/Dx29Zwe/WvMjqLbvp6en/si1R+16zbU9dyi7xUOAXybhNO7sOBuaCrv09LH1mK3MWLuXulRv7Hfyj9r15V1e/yyvxU+AXybgJI4cydNDh/+pDB7XgXnsLPWrf40cM7Xd5JX4K/CIZ1z5mONddctbBAD10UAvzZ3Zw26PBteBraaGX2vd1l5xF+5jh9Sm8xEKduyIZ19JizJ46kVPnz2Dttj089vx2vv/QWjp3BMG+lhZ68b437+pi/AiN6kkDBX6RHGhpMaaMa6V9zHBe2d/DSy/vA+rTQi/se8q41noVV2KmwC+SI2qhCyjwi+SOWuiizl0RkZxR4BcRyRmlekRE+iHNaxTFGvjNbA2wC+gGDrj7dDMbDdwMtANrgEvc/aU4yyEiUk8HDvTwi6c6ufrWJ1K5RlEjUj1vcvez3H16+Pc1wBJ37wCWhH+LiKRCT4/zwOptB4M+pG+Nombk+C8CFoe3FwNzm1AGEZF+WbNtD8vWvpjqNYriDvwO3GNmy81sXrhtgrt3AoS/x5d6opnNM7NlZrZsy5YtMRdTRKQ6m3Z20eOkeo2iuAP/G919GvBW4KNmdkG1T3T3Re4+3d2njxs3Lr4Sioj0wYSRQ/nPxzcwf2bHYWsUffmdr0nNGkWxdu66+wvh781mdjtwNrDJzNrcvdPM2oDNcZZBRKSe2scM5+rZp/Hlu1dx5flTGNAC0yeP5rwpY1LRsQsxBn4zGw60uPuu8PZbgM8BdwJXANeGv38WVxlERArqNfzy4LIXE0ekdtmLOFv8E4DbzazwOj9y97vN7HfALWZ2JbAOuDjGMoiI1P0SkWlf9iK2wO/uq4EzS2zfBsyK63VFRHqLukTkqfNnpDZ410JLNohI5ukSkYdT4BeRzNMlIg+nwC8imadLRB5Oi7SJSObpAjSHU+AXkVxI+0icelKqR0QkZxT4RURyRoFfRCRnFPhFRHJGgV9EJGcU+EVEckbDOUUyJs0XAZfGUOAXSZhaAne9V6GUbFKqRyRBCoF7zsKlXH7jw8xZuJS7V26kp8eren7UKpRpuQi4NIYCv0iC1Bq4tQqlVEOBXyRBag3cWoVSqqHAL5IgtQZurULZOD09zuotu3nw2a2s3rK76nRcEqhzVyRBCoG7d+dstYFbq1A2Rto70c09+Uep6dOn+7Jly5pdDJGGKIzqUeBOrmc37+ZtX196WFpu6KAWfvGxGZw0Pjmrf5rZcnef3nu7WvySS0ke656U5YOT/B4129oX95Tsi1n34p5EBf4oCvySO2k/TW8EvUfltQ4ZyNBBLUe0+FuHpCOkqnNXcqe/QybT3JnXV5oPUN6wwQNYMKvjsE70BbM6OGrwgCaXrDrpODyJ1FG5IZNR6ZW8tYD78x7lya6uA/zyyU6+8q4zeWXfAYYNHsiN9z3LWScc3eyiVUWBX3KnMGSy92l6uSGTUS3gU+fPyGQg7M97lCdto4by1jPa+MRPHz/YEFgwq4O2Uel4f5TqkdwpN9Y9Kp2Ttxmxmg9QXncPXL/k6cMaAtcveZrungpPTAi1+CV3osa6A5HpnLy1gDUfoLzNu0o3BLbs7krFqB61+CWXCkMm3zBlLFPGtdLSYmU7NPPYAi71Hkkg7UtjqMUvEqrUoakWsBTUOsO62RT4RUKV0jn1mFiVh0lReahj2lNhCvwiobhbcaWGhP7rX53BtElHM2l0eoJGOXkb9gqQglVvjqC1ekSKxLlOzuotu5mz8Mj1XeZdMIVTJ47MRHCMquNdGRv2mpYDXNRaPercFSkSZ4dmVB9Cj5OZWbF5GPba0+M8uWF7qmc2K9UjUmfFOe5hgweyr7ubMcOH0DaqdB+Ce31mxSYht571Ya+Flv4fNu5M9cxmtfhF6qj3NXMvXfQgv3vuJT7w3Uf4fecuvvHXrz1sSOj8mR3c9uj6moNjrdfqrZesD3stDPntcVI9nFM5fpE6ispxX3n+FL59/2p+8bEZAKzauJM/bd
rFT5at56WX99WcH05Sbj3L1xN48NmtXH7jw7SNGsp73zCZhfc+ncocv1I9InUUleM2OzSz8w1TxnLi2OG8um0k5500pi7BMUmLqiXlegJxKKSyOnd08f2H1nLl+VMY0AKzTh3PGccdnaigX45SPSJ1FDWj0730nIB6dSKnfSZpWhSnsjp3dPHt+1dz6sSRsQT9OJcBV6pHpI5KDfObP7ODm5et4+rZp8WWCuj9upPHHMXnLzqDQQMss5OomqURqax6DReNSvXEHvjNbACwDNjg7m83s9HAzUA7sAa4xN1fKrcPBX5plHqMjDl8VM8A9nf3MHr4kNiDb+F1X9yzlw3bu7j61icSnX+WaPXqs2lmjn8BsAoYGf59DbDE3a81s2vCv69uQDlEyqpXK6tZOe7C6wK859uP5ObaAVkUd59NrDl+MzseeBvwraLNFwGLw9uLgblxlkGkWlm53GAeJlFlXdx9NnF37n4N+ARQ/C2c4O6dAOHv8aWeaGbzzGyZmS3bsmVLzMUUqV/AbPa1edXRm35xz4eILdVjZm8HNrv7cjO7sK/Pd/dFwCIIcvz1LZ3Ikeox6zQJa7ikfclgiX/1z9g6d83sS8B7gQPAUIIc/23A64EL3b3TzNqA37j7KeX2pc5daYR6BO2kTKSqx8iTJCwBIbVpeOeuu38S+GT44hcC/+Du7zGz/wVcAVwb/v5ZXGUQ6Yt6tLIqdco1Opj2t12XhDMXiU8zZu5eC9xiZlcC64CLm1AGkZJqHZETlS4a1zqUNVt38+i67Xzq9idjDab1CNpRHd0aGZQNDZm56+6/cfe3h7e3ufssd+8If7/YiDKINEJUp9xz23Zz22MbDgZ9iG/UUD1GJ2lkUHnN7sCvldbqEamjUumiFoPZ1y/lQzOmNGQ9nXqMAc/68sq1yEIaTGv1iNRZ73V4OnccCsSNGGZZj+GcWV9euRZZmO+hwC8Ss0IgvnX5eubP7Ig9mNYjaBfOXO6aP4MfzzuHu+bPSFWLNk5ZSIMp1SOZVmkUTSNG2RSPq//+Q2uZd8EUXjVhBKdNHMmJY+v/evUaA57l5ZVrkYU0mFbnlMyqlIttZK42yxcnyZs05fibtjpnPSjwS39UmkyVlMlWkj5pOZDrClySO5VGtyTpqlVx0yzc+kp7GkyBXzKrUi42C7naaqQpNSGNoVE9klmVRrfkZchiFoYfSn2pxS+ZVWl0S9wrIJbTyNRLnlJajZTm9JkCv2RapVxsM3K1jU695CWl1UhpT58p1SPSYI1OveQlpdVIaU+fqcUvudPsU/RGp16akdJq9nsct007uzhm2GDeMe14LKzWrcvXpyZ9psAvuZKEU/RmpF4amdJKwnsct7ZRQ3nfuZO5fsnTB+u4YFYHE0ce/hkm9QBYdarHzI4zs/PM7ILCT5wFE6lWX5bITcIpetZTL0l4j+PW3cPBoA9BHa9f8jTdRSdyhQPgnIVLufzGh5mzcCl3r9yYiCWcq2rxm9mXgUuB3wPd4WYH7oupXCJV6WvrMgkjXJo5mqgRkvAex23zrtJ13LK7i5PGB3VM8sVsqk31zAVOcfe9MZZFpM/6+s+VlBEuaZ/5WU5S3uM4VVPHJB8Aq031rAYGxVkQkf7o6xK5WU+zJEEe3uNq6liP6yLEpWyL38y+TpDSeRlYYWZLgIOtfnefH2/xRA4p1VHW19Zl1tMsSZCH97iaOhYvx12chkzCAbDs6pxmdkWZ57q7f6/+RTqSVueUqFz+W06bwD2rNmV6BImkV7NX8axpWWYzW+Du11faFhcFfim3hHL7mOGpWCJX0impQzKrUeuyzFcAvYP8+0tsE4lFpY6yrHaUpkWag2M5UWeabz5lPKs27aRzRxdto45iattIBg5Mz0IIlXL8lwN/DZxoZncW3TUC2BZnwUSK5WGkSD01MhBnecJWqVFj37n/WfbsPcA//uypg/X9wtzTmXvmcakJ/pVa/A8AncBY4KtF23cBT8RVKJHektxRlj
SNDsRJHq9eq1Jnmu87bwqf+Onjh9X3M3c8Rcf4Vs484ZhmFLPPygZ+d18LrAXObUxxRErLw0iReml0IE7yePValTrTfGXfgZL13bijizNPaHQJ+6eq8xIz22VmO3v9PG9mt5vZlLgLKQKHJj29YcpYpoxrrTno92WphzTp69yGWiV5vHqtSo3XnzRmWMn6ThyVnvpW27l7HfAC8CPAgMuAicAfge8AF8ZRuFpktbNJ6iPLeelG94dkOQ1XfKa5aWdw4Nzw0sv8819M5V/+c+VhOf6pbaOaXNrqVTuc82F3P6fXtofc/Q1m9ri7nxlbCen7cM4s/1NLfZQbHpr29EQzvv+9x6tPOmYY6156OVMNr9VbdnPHig0sum81rxrfyocuOIlX9h1g+OCBnDR+GKdMGJW4OtY6nLPHzC4Bfhr+/a6i+xJ3fpzlziapjyznpZvRH1K89lBWG16bdnbR48H35IkNO5l/02MH75s/62Se2/pKaupY7dijdwPvBTYDm8Lb7zGzo4CrYipbvxUukvDRN53MVTODn2OGDY4txynpk+W8NNS/P6Qvsros84SRQxlglPzedPeQqjpWFfjdfbW7/4W7j3X3ceHtZ9z9FXe/P+5C9lXhIgnfvn8137j3Gb61dDXvO3fyERdJkPzKw0JizdLozuVGaR8znDOOH8WCWR2HfW/mz+zgtkfXp6qO1a7HPw74MNBe/Bx3/2A8xapN1EUS3vLqiU0umSSFhofGJ6uT7VpajJmnTODkca28bvIx3P/MVrp74PsPraVzR1eq6lhtjv9nwFLg/3HoQiyJFXWRhM27Dl0kQSTLa+I3U9ZH+bSPbWXS6OFs3b2vT3VM0kjDagP/MHe/OtaS1NGwwQNLtjiGDR7QxFKJ5ENezqamHjuC733gbLbu2ctxo45i6rHRo3qS1uFdbefuz81sTqwlqaN93d3Mn3lkHm5fd0+FZ4rULqsTw/qimZ3Lcevpce794ybuenIj7/uPR/gfP3yMS298iHtWbYr8rJPW4V1ti38B8Ckz2wfsI5jE5e4+MraS1WDwgAHcvGwdV54/BTNwh5uXreONJ49pdtEk45LWspP6W7NtD0+s38Gi+1ZXPWQ8acOHqx3VM8LdW9x9qLuPDP9OZNCHoMX/wfNOZEBYu4Et8MHzTmS/WvwSs6S17KT+isfzFys3qidpw4erXavHzOw9ZvaP4d8nmNnZ8Rat/8a1DmFvdw+L7guGc37zvtXs7e5hbOuQZhdNMi6rQxnlkHLj+aMCedKGD1eb6vl3oAeYCXwe2A3cALw+6glmNhS4DxgSvs5P3f2fzWw0cDPB0NA1wCXu/lI/y19Sdw9c96s/Hdbquu5Xf2LWqRPq+TIiR8jqUEY5pHg8f2HYeKVAnrQO72oD/znuPs3MHgNw95fMbHCF5+wFZrr7bjMbBNxvZr8E3gEscfdrzewa4BqgriOGNu8KZu6+Y9rxWPi+3rp8PVt2azinxCvLQxklUDyef9qkY3h53wEmjR7OiWPLB/IkDR+uNvDvN7MBhOvyhBO6yibMPVj9bXf456Dwx4GLOLSa52LgN9Q58Bdm7hYfjRfM6tDMXYld0lp2Eo/CeP72saWDeJLG7JdSbeBfCNwOjDezLxIs0vaZSk8KDxbLgZOBG9z9YTOb4O6dAO7eaWbjI547D5gHMGnSpCqLGdDMXWmmJLXspLw4AnQaRnZV7Nw1sxbgOeATwJcILsU4191/Uum57t7t7mcBxwNnm9np1RbM3Re5+3R3nz5u3LhqnwZEz9zdslsdbCISKAToOQuXcvmNDzNn4VLuXrmx5nkXaRjZVTHwu3sP8FV3/4O73+Du33D3VX15EXffTpDSmQ1sMrM2gPD35j6XuoKkDZ0Skfj0d8JcXAE6DSO7qp25e4+ZvdPMqj5PMbNxZnZ0ePso4M3AH4A7gSvCh11BsA5QXSVt6JSIxKOWVntcAToNDc9qc/wfB4YDB8ysi+pm7rYBi8M8fwtwi7
v/3MweBG4xsyuBdcDF/S9+aepgE8mHWi66FNfQ2zSM7Koq8Lv7iL7u2N2fAF5bYvs2YFZf99dX6mATyb5alkKoJUCX6xROQ8Oz2vX4l7j7rErbkiTpw6lEpHa1tNr7G6CrGbWT9IZn2Ry/mQ0NZ9qONbNjzGx0+NMOHNuQEvZDXL31IpIstfbn9WcV0TSM2qmkUov/b4C/Iwjyy4u27yJYsiGRdLF1kXxoRlolaStt9kelwP8AcAvwLnf/upldAbyTYI2dH8Vctn6L+mA27UzPByMi1aklrdKflHAW1mOqNJzzm8DeMOhfQDCBazGwA1gUd+H6q3AFrmJDB7UwbJCuwCXJpwu5NEZ/U8JZGC5eqcU/wN1fDG9fCixy91uBW81sRawlq8HOrn3Mn9nBwnsPrdUzf2YHu/bub3bRRMpKw3T/rOhvSjgNo3YqqRj4zWygux8gGII5rw/PbZqoK3C9bvKZzS6ayEGl0gzqn2qcWnL1SR+1U0ml4H0T8Fsz2wq8AiwFMLOTCdI9iTRh5BAue/2kI1bnnDBSF2KRZIhq2R8zbFAiOw6zODw6C7n6/iob+N39i2a2hGAW7j3hUssQ9A18LO7C9dek0cPpmNDKvAum0OPQYtAxoZVJo9OTg5Nsi2rZ3zzv3NiCUX+Dd1rST32tXxpm2MalYrrG3R8qse1P8RSnPgoXSpgytjW1OTjJtqg0w/7u7liCUS3BOw3pp/7ULwu5+v5KbJ6+VmnPwUl6VdPyjEozjB4+hGmTRtc9GNUSvNMwbr2Wjto8xonMBn6RZqi25VkuzRBHMKoleKchF97og1OpgzuQmn4QBX6ROqq25dnoNEMtwTsNufBGHpyiDu6DBxpX/eixRPeDFFS7Hr+IVKEva7z3Z52Y/qpl0lHhIHXX/Bn8eN453DV/xmEBLQkTzho5qSrq4P7E+h2pWb9HLX6ROkpqWqTWM4yo9FNSRvw08gwq6uDe+3iXtH6QYmrxi9RRkqfzx3GGkaSVKht1BhV1ha3eL5eEA34Utfgl0dI2cSipQwTjeh/TMOKn3qL6PAYPtINne0k64JeiwC+JlZQ0Ql8lbYhgnO9jUlNbcYo6uAPclbADfhSleiSxkpRGSLM438ckp7biVCqt1MjO+lqpxS+Jlcc0QhzifB+TmtqS8hT4JbHymEaIQ9zvY9JSW1JZZlM9SRhbLLXJaxqh3rLyPup/un7s0IKbyTV9+nRftmxZ1Y9Pa6egHKkwGkVphNqk/X3U/3T/mNlyd59+xPYsBv7VW3YzZ+HSI05t70rQaoIiedbX4aX6n+6fqMCfyRz/xh262LrkW5LnP/Sn9a6O/vrKZI5/yMCWkjPrBg3IZHVFDtPfi4g3Sn+Gl0bNllVHf/9kMhLuCC+2XtyZNX9mB7u69jW5ZCLxS/r8h74sZFeQlQ7qpMhkqmfM8KHc+4dn+Mq7zuSVvQcYNmQgix9YzRtPntrsoonELulpkf4ML9V8gfrKZOA/bcIILnn9ZD7x08cP5hA/d9HpnDZhZLOLJhK7pM9/6O/6/povUD+ZHNXz7ObdvO3rR44A+MXHZnDSeH1pJNvSMPQx7cNL0yJXo3rWvrin5Knuuhf3KPBL5qUhLaLWe3NlMvAPHzyw5KnusMGZrK7IERRYpZxMjuqZMHIIC2YdPqpnwawOJowc0uSSiYg0XyabwJNGD6djQivzLphCj0OLQceEViaN1tAvEamPJE+SqySTgb+lxZh5ygSmjG1NbI5T8iPNAUJKS0MHejmZTPVA8MHs6trP9pf3s6vrQGJmLUq+JH0WrfRP0ifJVZLJwH/gQA93PL6BSxc9xEd+8CiXLnqQOx7fwIEDPZWfLFJHaQ8QWVbLMs/9mX2cJJlM9azs3MFn7njqsH+2z9zxFB3jWznzhGOaXDrJk6TPos2rWlM1SZ8kV0lsLX4zO8HMfm1mq8xspZktCLePNrNfmdnT4e+6R+LOiNU5N+5Ix9FYskOLiyVTrWdiaV87KM
4W/wHg7939UTMbASw3s18B7weWuPu1ZnYNcA1wdT1fuG3UUSWPxhNH6Z9NGqu/yxNIvGo9E0vDJLlyYgv87t4JdIa3d5nZKuA44CLgwvBhi4HfUOfAP7VtJF+Ye/rBdM/QQS18Ye7pTG0bVc+XEako7QEiq+qRqknzJLmGrNVjZu3AfcDpwDp3P7rovpfc/Yh0j5nNA+YBTJo06XVr167t02seONDDys4dbNzRxcRRQ5naNoqBAzPZly0ifZT24ZjVatqlF82sFfgt8EV3v83MtlcT+Iv1dZE2EZFK8rBQXFMWaTOzQcCtwA/d/bZw8yYza3P3TjNrAzbHWQYRkVLSnKqpVZyjegz4NrDK3a8ruutO4Irw9hXAz+Iqg4iIHCnOFv8bgfcCT5rZinDbp4BrgVvM7EpgHXBxjGUQEZFe4hzVcz8QlTCbFdfrFmh9FGm2wndw2569DB7Qwsv7uvVdlETI5MzdvPTYS/80olFQ+A5++e5VXDp9EgvvfVrfRUmMTI5v1PooEqVRi6YVvoNvf81xB4M+6LsoyZDJwJ/2BZQkPo1qFBS+g2bouyiJk8nAr/VRJEqjGgXF30F9FyVpMhn4076AksSnUY2CwnfwPx/fwPyZHfouSqI0ZMmGWvVn5m4eZuVJ3zWy47/wHXxxz14GaVSPNEHTlmyoBy3ZIPWkRoHkRVOWbBBJojxP1ReBjOb4RUQkmgK/iEjOKPCLiOSMAr+ISM4o8IuI5IwCv4hIzijwi4jkTGbH8Ws9fhGR0jIZ+LUev4hItEymerQev4hItEwGfq3HLyISLZOBX+vxi4hEy2Tg13r8IiLRMtm529JizJ46kVPnz9DSuyIivWQy8IOW3hURiZLJVI+IiERT4BcRyRkFfhGRnFHgFxHJGQV+EZGcyeyoHi3SJiJSWiYDvxZpExGJlslUjxZpExGJlsnAr0XaRESiZTLwa5E2EZFomQz8WqRNRCRaJjt3tUibiEi0TLb4i7k3uwQiIsmSyRa/hnOKiETLZItfwzlFRKLFFvjN7DtmttnMniraNtrMfmVmT4e/j4njtTWcU0QkWpwt/u8Cs3ttuwZY4u4dwJLw77rTcE4RkWixBX53vw94sdfmi4DF4e3FwNw4XlvDOUVEojW6c3eCu3cCuHunmY2PeqCZzQPmAUyaNKlPL6LhnCIi0RI7qsfdFwGLAKZPn97nQZm65q6ISGmNHtWzyczaAMLfmxv8+iIiudfowH8ncEV4+wrgZw1+fRGR3ItzOOdNwIPAKWa23syuBK4F/szMngb+LPxbREQaKLYcv7tfHnHXrLheU0REKsvkzF0REYlmnoJVzMxsC7C2n08fC2ytY3GSLC91zUs9IT91zUs9obF1nezu43pvTEXgr4WZLXP36c0uRyPkpa55qSfkp655qScko65K9YiI5IwCv4hIzuQh8C9qdgEaKC91zUs9IT91zUs9IQF1zXyOX0REDpeHFr+IiBRR4BcRyZlMB34zm21mfzSzZ8wslou+NIuZrTGzJ81shZktC7c15Apncevr1dvM7JPhZ/xHM/vz5pS67yLq+Vkz2xB+rivMbE7RfWmt5wlm9mszW2VmK81sQbg9i59pVF2T9bm6eyZ/gAHAs8AUYDDwOPDqZperjvVbA4ztte0rwDXh7WuALze7nP2s2wXANOCpSnUDXh1+tkOAE8PPfECz61BDPT8L/EOJx6a5nm3AtPD2COBPYX2y+JlG1TVRn2uWW/xnA8+4+2p33wf8mOAKYFnWkCucxc37dvW2i4Afu/ted38OeIbgs0+8iHpGSXM9O9390fD2LmAVcBzZ/Eyj6hqlKXXNcuA/Dni+6O/1lP8A0saBe8xseXi1Muh1hTMg8gpnKRRVtyx+zleZ2RNhKqiQ/shEPc2sHXgt8DAZ/0x71RUS9LlmOfCXus5ilsauvtHdpwFvBT5qZhc0u0BNkrXP+f8AJwFnAZ3AV8Ptqa+nmbUCtwJ/5+47yz20xLa01z
VRn2uWA/964ISiv48HXmhSWerO3V8If28Gbic4PczyFc6i6papz9ndN7l7t7v3ADdy6LQ/1fU0s0EEgfCH7n5buDmTn2mpuibtc81y4P8d0GFmJ5rZYOAygiuApZ6ZDTezEYXbwFuAp8j2Fc6i6nYncJmZDTGzE4EO4JEmlK8uCoEw9FcEnyukuJ5mZsC3gVXufl3RXZn7TKPqmrjPtdm94DH3sM8h6FV/Fvh0s8tTx3pNIRgJ8DiwslA3YAywBHg6/D262WXtZ/1uIjgd3k/QIrqyXN2AT4ef8R+Btza7/DXW8/vAk8ATBEGhLQP1PJ8gffEEsCL8mZPRzzSqron6XLVkg4hIzmQ51SMiIiUo8IuI5IwCv4hIzijwi4jkjAK/iEjOKPBLrplZd9GKiSvMrN3MLjSzn8f0eu3Fq3GKNMPAZhdApMlecfezijeEa6yIZJZa/CIRzKwlXCt+XNHfz5jZ2F6PO9vMHjCzx8Lfp4Tbp5rZI+GZxBNm1hE+ZYCZ3Riu136PmR3V4KpJzinwS94dVZTmub34Dg/WVfkB8O5w05uBx919a699/AG4wN1fC/wT8K/h9o8A14dnFNMJZudCMC3/BnefCmwH3lnfKomUp1SP5N0RqZ5evkOwhszXgA8C/1HiMaOAxWGL3oFB4fYHgU+b2fHAbe7+dLCUC8+5+4rwMcuB9tqqINI3avGLlOHuzxOsIjkTOAf4ZYmHfR74tbufDvwFMDR87o+AvwReAf4r3AfA3qLndqMGmDSYAr9IZd8iSPnc4u7dJe4fBWwIb7+/sNHMpgCr3X0hwcJcr4m5nCJVUeAXqexOoJXSaR4Irh37JTP7b4JrPRdcCjxlZiuAU4HvxVlIkWppdU6RCsxsOvBv7j6j2WURqQflFkXKMLNrgL/l0MgekdRTi19EJGeU4xcRyRkFfhGRnFHgFxHJGQV+EZGcUeAXEcmZ/w/01ISuf5uaGwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = sns.scatterplot(x=\"FlyAsh\", y=\"Strength\", data=con)\n", "ax.set_title(\"Concrete Strength vs. Fly ash\")\n", "ax.set_xlabel(\"Fly ash\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding a best fit line\n", "As we saw with SAS Enterprise Guide and R, it is sometimes useful to add a best fit line (with confidence intervals around the slope) to a scatterplot. But let's be clear: this is not one of these situations. It is obvious from the scatterplot above that the relationship between concrete strength and fly ash is only weakly linear.\n", "\n", "The easiest way to \"add\" a best-fit line to a scatterplot is to use a different plotting method. Seaborn's `lmplot()` method (where \"lm\" stands for \"linear model\") is one possibility:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.lmplot(x=\"FlyAsh\", y=\"Strength\", data=con);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding color as a third dimension\n", "A graphics \"party trick\" made fashionable by tools like Tableau is to use color, size, or some other visual cue to add a third dimension to a two-dimensional scatterplot. In the case of color (or \"hue\" in Seaborn terminology), this third dimension need to be a non-continuous variable. This is because the palette of colors available has a finite number of options." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.lmplot(x=\"FlyAsh\", y=\"Strength\", hue=\"AirEntrain\", data=con);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Coefficient of correlation\n", "A correlation coefficient (typically denoted _r_) is a single number that describes the extent of the **linear** relationship between two variables. A value of +1 indicates perfect linearity (the two variables move together, like \"height in inches\" and \"height in centimeters\"). A value of _r_ = 0 indicates no correlation (the variables are independent) and _r_ = -1 indicates the variables are inversely correlated (an increase in one variable is associated with a decrease in the other).\n", "\n", "Like many other statistics (measures derived from raw data), there are slightly different ways to calculate the correlation coefficient that are more or less sensitive to outliers and other characteristics of the data. The most common measure is the Pearson correlation coefficient. The Scipy library provides a method called `pearsonr()` (Pearson's _r_)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.4063870105954507, 2.0500713273946373e-05)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from scipy import stats\n", "stats.pearsonr(con['Strength'], con['FlyAsh'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, it is not the prettiest result. But, if we were so inclined, we could write the results to a data frame and apply whatever formatting in Python we wanted to. In this form, however, we get two numbers:\n", "1. Pearson's _r_ (0,4063---same as we got in Excel, R, etc.)\n", "2. A _p_-value. 
This is the probability that the true value of _r_ is zero (no correlation).\n", "\n", "We conclude based on this that there is weak linear relationship between concrete strength and fly ash but not so weak that we should conclude the variables are uncorrelated. In other words, it seems that fly ash does have _some_ influence on concrete strength.\n", "\n", "Of course, correlation does not imply causality. It is equally correct, based on the value of _r_, to say that concrete strength has some influence on the amount of fly ash in the mix. But hopefully we are worldly enough to know _something_ about mixing up a batch of concrete and can generally _infer_ causality, or at least directionality. That is, we use our domain knowledge to help interpret statistical results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Corrleation matrix\n", "A correlation matrix is a handy way to calculate the pairwise correlation coefficients between two or more (numeric) variables. The Pandas data frame has this functionality built-in to its `corr()` method, which I have wrapped inside the `round()` method to keep things tidy. Notice that every correlation matrix is symmetrical: the correlation of \"Cement\" with \"Slag\" is the same as the correlation of \"Slag\" with \"Cement\" (-0.24). Thus, the top (or bottom, depending on your preferences) of every correlation matrix is redundant. The correlation between each variable and itself is 1.0, hence the diagonal." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " No Cement Slag FlyAsh Water SP CoarseAgg FineAgg \\\n", "No 1.00 -0.03 -0.08 0.34 -0.14 -0.33 0.22 -0.31 \n", "Cement -0.03 1.00 -0.24 -0.49 0.22 -0.11 -0.31 0.06 \n", "Slag -0.08 -0.24 1.00 -0.32 -0.03 0.31 -0.22 -0.18 \n", "FlyAsh 0.34 -0.49 -0.32 1.00 -0.24 -0.14 0.17 -0.28 \n", "Water -0.14 0.22 -0.03 -0.24 1.00 -0.16 -0.60 0.11 \n", "SP -0.33 -0.11 0.31 -0.14 -0.16 1.00 -0.10 0.06 \n", "CoarseAgg 0.22 -0.31 -0.22 0.17 -0.60 -0.10 1.00 -0.49 \n", "FineAgg -0.31 0.06 -0.18 -0.28 0.11 0.06 -0.49 1.00 \n", "Strength 0.19 0.46 -0.33 0.41 -0.22 -0.02 -0.15 -0.17 \n", "\n", " Strength \n", "No 0.19 \n", "Cement 0.46 \n", "Slag -0.33 \n", "FlyAsh 0.41 \n", "Water -0.22 \n", "SP -0.02 \n", "CoarseAgg -0.15 \n", "FineAgg -0.17 \n", "Strength 1.00 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cormat = con.corr()\n", "round(cormat,2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Correlation matrix to heat map\n", "Python, and its libraries, make lots of things easy. For example, once the correlation matrix is defined (I assigned to the variable `cormat` above), it can be passed to Seaborn's `heatmap()` method to create a heatmap (or headgrid). The basic idea of heatmaps is that they replace numbers with colors of varying shades, as indicated by the scale on the right. Cells that are lighter have higher values of _r_. This type of visualization can make it much easier to spot linear relationships between variables than a table of numbers. For example, if I focus on the \"Strength\" column, I immediately see that \"Cement\" and \"FlyAsh\" have the largest positive correlations whereas \"Slag\" has the large negative correlation." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.heatmap(cormat);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 2 }