Pandas - Data Visualization

For a quick look at the data in a pandas DataFrame, we can use the plotting methods built into pandas objects.

import numpy as np
import pandas as pd
%matplotlib inline

The Data

We'll use some fake data stored in CSV files, which we can read in as DataFrames.

Downloads: df1.csv, df2.csv

# File names match the downloads listed above
df1 = pd.read_csv('df1.csv', index_col=0)  # first column becomes the index
df2 = pd.read_csv('df2.csv')
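
If the CSV files are not to hand, a stand-in with a similar shape can be generated. This is only a sketch: the column names A, B and C for df1 and a for df2 are taken from the calls used later on this page, while column D, columns b, c and d, the row counts, and the random distributions are all assumptions. Assign the results to df1 and df2 to follow along with them.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the stand-in data is reproducible

# Hypothetical stand-in for df1: normally distributed values.
df1_demo = pd.DataFrame(rng.standard_normal((1000, 4)),
                        columns=['A', 'B', 'C', 'D'])

# Hypothetical stand-in for df2: a few rows of values in [0, 1).
df2_demo = pd.DataFrame(rng.random((10, 4)),
                        columns=['a', 'b', 'c', 'd'])

print(df1_demo.shape, df2_demo.shape)
```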

Style Sheets

Matplotlib has style sheets you can use to make your plots look a little nicer. These style sheets include bmh, fivethirtyeight, ggplot and more. Each one defines a set of style rules that your plots follow.

Here is how to use them.

Before calling plt.style.use(), your plots look like this:

df1['A'].hist()

Graph Graph

We can change the style as follows:

import matplotlib.pyplot as plt
plt.style.use('ggplot')

Now the plot looks like this (rwidth=0.9 makes each bar slightly narrower than its bin, leaving a small gap between bars):

df1['A'].hist(rwidth=0.9)

Graph Graph

Other options include bmh, dark_background and fivethirtyeight. Let's stick with the ggplot style and look at pandas' built-in plotting capabilities.
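
To see which style names are available on your installation, and to try a style on a single figure without changing the global setting, something like the following sketch may help. The data here is made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Style names that this matplotlib installation knows about.
style_names = sorted(plt.style.available)
print(style_names)

values = np.random.default_rng(0).standard_normal(500)  # made-up data

# Apply 'ggplot' to this one figure only; the global style is left unchanged.
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.hist(values, bins=30, rwidth=0.9)
```

plt.style.use() changes the style for every plot that follows, whereas plt.style.context() only applies inside the with block.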

Plot Types

There are several plot types built into pandas, most of them statistical plots by nature:

  • df.plot.area
  • df.plot.barh
  • df.plot.density
  • df.plot.hist
  • df.plot.line
  • df.plot.scatter
  • df.plot.bar
  • df.plot.box
  • df.plot.hexbin
  • df.plot.kde
  • df.plot.pie

You can also call df.plot(kind='hist'), replacing the kind argument with any of the key terms shown in the list above (e.g. 'box', 'bar', etc.).
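
As a quick check that the two spellings are interchangeable, the sketch below draws the same histogram both ways, side by side. The DataFrame and its column names are made up for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data for illustration only.
demo = pd.DataFrame(np.random.default_rng(1).standard_normal((300, 2)),
                    columns=['A', 'B'])

fig, (ax_accessor, ax_kind) = plt.subplots(1, 2, figsize=(10, 4))

# Accessor form: the plot type is chosen by the method name.
demo['A'].plot.hist(bins=20, ax=ax_accessor, title='plot.hist()')

# Keyword form: the plot type is chosen by the kind argument.
demo['A'].plot(kind='hist', bins=20, ax=ax_kind, title="plot(kind='hist')")
```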

df2.plot.area(alpha=0.4)

Graph Graph

df2.plot.bar()

Graph Graph

df2.plot.bar(stacked=True)

Graph Graph

df1['A'].plot.hist(bins=50)

Graph Graph

df1.plot.scatter(x='A',y='B')

Graph Graph

You can use c to colour the points based on another column's values, and cmap to choose the colormap.

df1.plot.scatter(x='A',y='B',c='C',cmap='coolwarm')

Graph Graph

Or use s to set the marker size based on another column. Here s is passed as an array of values rather than a column name (older pandas versions do not accept a column name for s):

df1.plot.scatter(x='A',y='B',s=df1['C']*100)
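
Marker sizes are areas, so negative values in the source column are not meaningful as sizes. One possible way to handle a column that contains negative numbers is to rescale it to a positive range first. The sketch below uses made-up data and an arbitrary size range of 10 to 200; both are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data; column 'C' contains both negative and positive values.
demo = pd.DataFrame(np.random.default_rng(2).standard_normal((200, 3)),
                    columns=['A', 'B', 'C'])

# Min-max rescale 'C' into the range [10, 200] so every size is positive.
c_min, c_max = demo['C'].min(), demo['C'].max()
sizes = (demo['C'] - c_min) / (c_max - c_min) * 190 + 10

# Size and colour both encode column 'C'.
ax = demo.plot.scatter(x='A', y='B', s=sizes, c='C', cmap='coolwarm')
```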

Graph Graph

df2.plot.box() # Can also pass a by= argument for groupby
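
A minimal sketch of the by= argument mentioned in the comment above, using a small made-up DataFrame (the gender and age columns are hypothetical). Grouping through by= in plot.box is assumed to need a reasonably recent pandas release (1.4 or later).

```python
import pandas as pd

# Made-up data: one grouping column and one numeric column.
ages = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
people = pd.DataFrame({'gender': list('MMMMMFFFFFFFFF'), 'age': ages})

# Draw one box per value of 'gender' for the 'age' column.
result = people.plot.box(column='age', by='gender')

# The medians the boxes are built around, computed directly for comparison.
medians = people.groupby('gender')['age'].median()
print(medians)
```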

Graph Graph

A hexagonal bin plot is useful for bivariate data and is an alternative to a scatter plot.

df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
df.plot.hexbin(x='a',y='b',gridsize=25,cmap='Oranges')

Graph Graph

df2['a'].plot.kde()

Graph Graph

df2.plot.density()

Graph Graph


Exercises

Use the df3 data set to replicate the following plots.

Download: df3.csv

import pandas as pd
import matplotlib.pyplot as plt
df3 = pd.read_csv('df3.csv')  # file name matches the download listed above
%matplotlib inline
df3.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 4 columns):
a    500 non-null float64
b    500 non-null float64
c    500 non-null float64
d    500 non-null float64
dtypes: float64(4)
memory usage: 15.7 KB
df3.head()

Graph Graph

  1. Recreate this scatter plot of b vs a. Note the colour and size of the points. Also note the figure size and see if you can figure out how to stretch it in a similar fashion. Think back to the matplotlib lecture.
    Graph Graph

  2. Create a histogram of the 'a' column.
    Graph Graph

  3. These plots are okay, but they don't look very polished. Use style sheets to set the style to 'ggplot' and redo the histogram from above. Also figure out how to add more bins to it.
    Graph Graph

  4. Create a boxplot comparing the a and b columns.
    Graph Graph

  5. Create a kde plot of the 'd' column.
    Graph Graph

  6. Figure out how to increase the linewidth and make the linestyle dashed. (Note: You would usually not dash a kde plot line).
    Graph Graph

  7. Create an area plot of all the columns for just the rows up to 30. (hint: use .loc)
    Graph Graph

  8. Note: you may find this really hard, so reference the solutions if you can't figure it out! Notice how the legend in our previous figure overlapped some of the actual diagram. Can you figure out how to display the legend outside of the plot as shown below?
    Try searching Google for a good Stack Overflow answer on this topic.
    Graph Graph