{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Basic Visualization\n", "In this tutorial we show how Python and its graphics libraries can be used to create the two most common types of distributional plots: histograms and boxplots.\n", "\n", "## Preliminaries\n", "I include the data import and library import commands at the start of each lesson so that the lessons are self-contained.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "bank = pd.read_csv('Data/Bank.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Basic descriptive statistics\n", "Pandas provides basic descriptive statistics functions as methods of the Series object. Recall that each DataFrame object consists of multiple Series (columns). Thus, the mean salary of the bank's employees can be computed as:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "39.921923076923086" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bank['Salary'].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, assigning the column to a variable saves some typing when computing several statistics:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(26.7, 39.921923076923086, 37.0, 97.0)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sal = bank['Salary']\n", "sal.min(), sal.mean(), sal.median(), sal.max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, recall that we can get a statistical summary of all numeric columns using the `describe()` method:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| | Employee | EducLev | JobGrade | YrHired | YrBorn | YrsPrior | Salary |\n",
"|---|---|---|---|---|---|---|---|\n",
"| count | 208.000000 | 208.000000 | 208.000000 | 208.000000 | 208.000000 | 208.000000 | 208.000000 |\n",
"| mean | 104.500000 | 3.158654 | 2.759615 | 85.326923 | 54.605769 | 2.375000 | 39.921923 |\n",
"| std | 60.188592 | 1.467464 | 1.566529 | 6.987832 | 10.318988 | 3.135237 | 11.256154 |\n",
"| min | 1.000000 | 1.000000 | 1.000000 | 56.000000 | 30.000000 | 0.000000 | 26.700000 |\n",
"| 25% | 52.750000 | 2.000000 | 1.000000 | 82.000000 | 47.750000 | 0.000000 | 33.000000 |\n",
"| 50% | 104.500000 | 3.000000 | 3.000000 | 87.000000 | 56.500000 | 1.000000 | 37.000000 |\n",
"| 75% | 156.250000 | 5.000000 | 4.000000 | 90.000000 | 63.000000 | 4.000000 | 44.000000 |\n",
"| max | 208.000000 | 5.000000 | 6.000000 | 93.000000 | 73.000000 | 18.000000 | 97.000000 |\n", "