Improving matplotlib plots
In this tutorial, we will learn how to make default matplotlib plots look more appealing with just a few extra commands.
Let’s create some dummy data:
import numpy as np
import matplotlib.pyplot as plt
import pingouin as pg
# Generate some random dummy data:
Group_A = np.random.randn(10)*10+15
Group_B = np.random.randn(10)*10+2
# bar-plot:
fig=plt.figure(1, figsize=(4,6))
fig.clf()
# draw one bar per group at x=1 and x=2, with the group mean as bar height:
plt.bar([1, 2], [Group_A.mean(), Group_B.mean()],
        color=["blue", "orange"])
plt.xticks([1,2], labels=["A", "B"])
plt.title("A bar-plot")
plt.xlim([0.5, 2.5])
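The dummy data above are drawn without a seed, so the plots and any statistics computed from them will differ from run to run. If reproducible numbers are wanted, a minimal sketch could look like the following. It assumes NumPy's `default_rng` generator; the helper name `make_dummy_data` and the seed value 42 are arbitrary choices for illustration, and the values it produces will not match the ones shown in this tutorial.

```python
import numpy as np


def make_dummy_data(seed=42, n=10):
    """Return two reproducible dummy samples (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # seeded generator: same seed, same numbers
    group_a = rng.normal(loc=15, scale=10, size=n)  # mean 15, SD 10
    group_b = rng.normal(loc=2, scale=10, size=n)   # mean 2, SD 10
    return group_a, group_b


Group_A, Group_B = make_dummy_data()
```

Calling `make_dummy_data()` again with the same seed returns exactly the same two arrays.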
First, we change the bar plot to a dot plot, which gives a better visual impression of the data distributions. We will also adjust the font sizes:
fig=plt.figure(1, figsize=(4,6))
xVals = np.ones(Group_A.shape[0])
# Group A data:
plt.plot(xVals, Group_A, 'o', markeredgecolor="blue",
markerfacecolor="blue", markersize=20, alpha=0.5)
plt.plot(1, Group_A.mean(), 'o', markeredgecolor="k",
markerfacecolor="white", markersize=20)
# Group B data:
plt.plot(xVals+1, Group_B, 'o', markeredgecolor="orange",
markerfacecolor="orange", markersize=20, alpha=0.5)
plt.plot(2, Group_B.mean(), 'o', markeredgecolor="k",
markerfacecolor="white", markersize=20)
plt.xticks([1,2], labels=["A", "B"], fontsize=16)
plt.xlabel("Groups", fontsize=16)
plt.ylabel("measurements", fontsize=16)
plt.title("A dot-plot", fontsize=22, fontweight="normal")
plt.xlim([0.5, 2.5])
We also made the individual data points easier to tell apart via the alpha value, which controls the transparency. The transparency also affected the plot colors, which became a bit muted and look less like matplotlib’s default color definitions.
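To see why the colors look muted, it can help to compute the color that a single semi-transparent marker produces on a white background. The sketch below assumes simple alpha blending of one marker over a plain white canvas (overlapping markers blend further); the helper name `blend_with_background` is made up for this illustration.

```python
import numpy as np
from matplotlib.colors import to_rgb


def blend_with_background(color, alpha, background="white"):
    """Return the RGB seen when `color` is drawn with opacity `alpha` over `background`."""
    fg = np.array(to_rgb(color))
    bg = np.array(to_rgb(background))
    # standard alpha blending: weighted average of foreground and background
    return alpha * fg + (1 - alpha) * bg


print(blend_with_background("blue", 0.5))    # pure blue (0, 0, 1) becomes (0.5, 0.5, 1.0)
print(blend_with_background("orange", 0.5))
```

With `alpha=0.5`, every color channel moves halfway towards white, which is the washed-out look seen in the dot plot.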
Next, let’s
- remove parts of the black bounding box
- change the thickness of the remaining bounds
- increase the size of the ticks
fig=plt.figure(1, figsize=(4,6))
# Group A data:
plt.plot(xVals, Group_A, 'o', markeredgecolor="blue",
markerfacecolor="blue", markersize=20, alpha=0.5)
plt.plot(1, Group_A.mean(), 'o', markeredgecolor="k",
markerfacecolor="white", markersize=20)
# Group B data:
plt.plot(xVals+1, Group_B, 'o', markeredgecolor="orange",
markerfacecolor="orange", markersize=20, alpha=0.5)
plt.plot(2, Group_B.mean(), 'o', markeredgecolor="k",
markerfacecolor="white", markersize=20)
plt.xticks([1,2], labels=["A", "B"], fontsize=16)
plt.xlabel("Groups", fontsize=16)
plt.ylabel("measurements", fontsize=16)
plt.title("A dot-plot", fontsize=22, fontweight="normal")
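# control the black bounding box and tick sizes:
ax = plt.gca() # get the current axes
# hide two of the four bounds (top and right are assumed here):
for side in ["top", "right"]:
    ax.spines[side].set_visible(False)
# thicken the remaining bounds (a line width of 2 is an assumed value):
for side in ["bottom", "left"]:
    ax.spines[side].set_linewidth(2)
ax.tick_params(width=2, length=10)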
plt.xlim([0.5, 2.5])
While changing the transparency works best when you want to visualize multiple data points (e.g., in dot plots, scatter plots, or multiple line plots), removing parts of the black bounding box and increasing the font sizes work well for almost any matplotlib plot.
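Since these tweaks suit almost any plot, one option is to set them once as defaults instead of repeating them per figure. The following is a rough sketch using matplotlib's `rcParams`; hiding the top and right bounds and the specific numbers are assumptions that mirror the values used in this tutorial, not a recommended standard.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumed style defaults; adjust the numbers to taste.
plt.rcParams.update({
    "axes.spines.top": False,    # hide the top bound
    "axes.spines.right": False,  # hide the right bound
    "axes.linewidth": 2,         # thickness of the remaining bounds
    "xtick.major.width": 2,
    "ytick.major.width": 2,
    "xtick.major.size": 10,      # tick length
    "ytick.major.size": 10,
    "axes.labelsize": 16,
    "axes.titlesize": 22,
})

# Any figure created from now on picks up these defaults, e.g. a line plot:
x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots(figsize=(4, 6))
ax.plot(x, np.sin(x), lw=3)
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A line plot")
```

Calling `plt.rcdefaults()` restores matplotlib's built-in defaults.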
Add statistical annotations to your plot
Let’s perform a simple statistical test:
# let's check the statistics:
stats_results = pg.ttest(Group_A, Group_B, paired=False)
p_val = stats_results["p-val"].values[0].round(4)
print(f"p-value: {p_val}")
p-value: 0.0163
We now annotate our plot with the result from the statistical test:
fig=plt.figure(1, figsize=(4,6))
# Group A data:
plt.plot(xVals, Group_A, 'o', markeredgecolor="blue",
markerfacecolor="blue", markersize=20, alpha=0.5)
plt.plot(1, Group_A.mean(), 'o', markeredgecolor="k",
markerfacecolor="white", markersize=20)
# Group B data:
plt.plot(xVals+1, Group_B, 'o', markeredgecolor="orange",
markerfacecolor="orange", markersize=20, alpha=0.5)
plt.plot(2, Group_B.mean(), 'o', markeredgecolor="k",
markerfacecolor="white", markersize=20)
# statistical annotations:
h = 36 # height of the horizontal bar
annotation_offset = 0.5 # offset of the stats-annotation
plt.plot([1, 2], [h, h], '-k', lw=3)
# write the p-value centered above the horizontal bar:
plt.text(1.5, h+annotation_offset, f"p = {p_val}",
         ha='center', va='bottom', fontsize=14)
plt.xticks([1,2], labels=["A", "B"], fontsize=16)
plt.xlabel("Groups", fontsize=16)
plt.ylabel("measurements", fontsize=16)
plt.title("A dot-plot", fontsize=22, fontweight="normal")
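# control the black bounding box and tick sizes:
ax = plt.gca() # get the current axes
# hide two of the four bounds (top and right are assumed here):
for side in ["top", "right"]:
    ax.spines[side].set_visible(False)
# thicken the remaining bounds (a line width of 2 is an assumed value):
for side in ["bottom", "left"]:
    ax.spines[side].set_linewidth(2)
ax.tick_params(width=2, length=10)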
plt.xlim([0.5, 2.5])
plt.ylim([-22, 40])
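The bar height `h = 36` and the y-limits above are hard-coded for one particular random draw. With different data, the bar may collide with the points. A possible way around this is to derive the height from the data, as in the sketch below; the helper name `annotation_height` and the 10 % margin are illustrative assumptions.

```python
import numpy as np


def annotation_height(*groups, margin=0.1):
    """Return a y-position slightly above the largest value across all groups."""
    all_values = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    data_range = all_values.max() - all_values.min()
    # place the bar a fraction of the data range above the highest point
    return all_values.max() + margin * data_range


h = annotation_height([1.0, 5.0], [2.0, 11.0])
print(h)  # 12.0 (maximum 11, plus 10 % of the range 10)
```

In the plot, `h = annotation_height(Group_A, Group_B)` would replace the fixed value, and the upper y-limit would then need to sit above `h` plus the text height.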
Exercise 1
- Implement an if-statement to check for normality and let your script choose the correct significance test (Student’s t-test or Mann-Whitney U test). Hint: Pingouin has a normality-check function as well as the corresponding test functions.
- Complete the decision tree for the stars annotation. You can use the notation convention from GraphPad:

| Symbol | Meaning |
| --- | --- |
| n.s. | $p\gt0.05$ |
| $\mbox{*}$ | $p\le0.05$ |
| $\mbox{**}$ | $p\le0.01$ |
| $\mbox{***}$ | $p\le0.001$ |
| $\mbox{****}$ | $p\le0.0001$ |

- Replace the p-value by the stars annotation in the plot.
- Display the p-value below the stars annotation.
# Your solution 1 here:
significance test applied: ttest
p-value: 0.0163
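For comparison after attempting the exercise, here is one possible way to turn a p-value into the stars notation from the table. It is only a sketch of the mapping step (the normality check and the test selection are left out), and the function name `p_to_stars` is made up for this example.

```python
def p_to_stars(p_val):
    """Map a p-value to the GraphPad-style stars notation."""
    # check the strictest threshold first, then relax step by step
    if p_val <= 0.0001:
        return "****"
    if p_val <= 0.001:
        return "***"
    if p_val <= 0.01:
        return "**"
    if p_val <= 0.05:
        return "*"
    return "n.s."


print(p_to_stars(0.0163))  # *
```

The returned string can be passed to `plt.text` in place of the formatted p-value.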
Further readings
You can find a bunch of other visualization hacks in the free online book Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures by Claus O. Wilke (O’Reilly, 2019).