It’s December 2020 and it’s time to check out different python plotting packages. During data exploration I primarily use pandas dataframe plotting, with either seaborn’s sns.set() or plotly set as the plotting backend, but there’s nothing in python that matches R’s ggplot2. I’ve known about plotnine as the python equivalent, but the last time I tried it, it was still lacking in the feature department. I’m happy to say that it’s totally viable now.
One chart that I often used in R was a ‘tie fighter’ (error bar) chart. It’s handy for comparing accuracy across various predictive models. Here’s a motivating example from ‘thecodeforest’ where he compares various time series forecasting models in R. I’m also motivated to stick with ggplot because then I’d be able to quickly make plots in R, python, and Julia (given how seamless RCall is in Julia and its native support of dataframes).
Plotnine
Here’s the syntax for a quick tie fighter chart in plotnine, using the included mtcars dataset as a toy example:
from plotnine import *
from plotnine.data import mtcars
import pandas as pd
import numpy as np
from scipy.stats import sem

# Calc mean, std error of mpg by number of cylinders and whether they're automatic (0) or manual (1)
df = mtcars.groupby(['cyl', 'am'])['mpg'].agg([np.mean, sem]).reset_index()

p = (ggplot(df, aes('factor(cyl)', 'mean', color='factor(am)'))
     + geom_point()
     + geom_errorbar(aes(ymin = 'mean - sem', ymax = 'mean + sem'))
     + coord_flip()
     + labs(title="A Tie Fighter Chart",
            x = 'Cylinders',
            y = 'Mean MPG +/-1 SE')
)
p
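For anyone unsure what the error bars represent: each one spans the group mean plus or minus one standard error, where the standard error is the sample standard deviation divided by the square root of the group size. Here is a minimal sketch of that arithmetic in plain pandas. The `toy` dataframe and its values are made up for illustration (it is not mtcars), and the column names `sd`, `n`, `ymin`, and `ymax` are my own choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for mtcars; the values are illustrative only
toy = pd.DataFrame({
    "cyl": [4, 4, 4, 6, 6, 6, 8, 8, 8],
    "mpg": [22.8, 24.4, 30.4, 21.0, 19.2, 17.8, 18.7, 14.3, 15.2],
})

# Per-group mean, sample standard deviation (ddof=1 is the pandas default), and count
summary = toy.groupby("cyl")["mpg"].agg(mean="mean", sd="std", n="count").reset_index()

# Standard error of the mean: sd / sqrt(n)
summary["sem"] = summary["sd"] / np.sqrt(summary["n"])

# The two ends of each "tie fighter" bar
summary["ymin"] = summary["mean"] - summary["sem"]
summary["ymax"] = summary["mean"] + summary["sem"]

print(summary)
```

The `sem` column here should agree with what scipy.stats.sem returns with its default settings, since both use the sample standard deviation.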
And what I love about ggplot2 (and plotnine) is the consistent syntax and how easy it is to add or remove customization and complexity compared to the other plotting libraries in python:
# Now it's trivial to add or remove stuff to our plots, even in python!
p + theme_xkcd()
One thing we’re still missing is the ability to do subtitles and captions, but maybe someday.
Altair
Altair is another opinionated plotting package, based on Vega-Lite, with a very consistent syntax, and it’s worth a look.
Here’s the same example as above, but in Altair:
import altair as alt
from plotnine.data import mtcars

points = alt.Chart(mtcars).mark_point().encode(
    x='average(mpg):Q',
    y='cyl:O',
    color='am:N',
).properties(height=200)

error_bars = alt.Chart(mtcars).mark_errorbar(extent='stderr', ticks=True).encode(
    x=alt.X('average(mpg):Q', scale=alt.Scale(zero=False)),
    y=alt.Y('cyl:O'),
    color='am:N',
)

points + error_bars
Notice that Altair has baked-in aggregation and transformations, so I didn’t have to do my groupby and agg in pandas. And similar to what I like in ggplot2/plotnine, there’s the ability to quickly add and remove complexity. But as you can see, the result was quite small on my MacBook Pro, and I wish there was an easy way to scale the whole thing up to a reasonable aspect ratio (it looks like I’d have to scale up each component separately or save as SVG).