Motivation
If you’re new to Python visualization, the vast number of libraries and examples available might seem overwhelming. Some popular libraries for visualization include Matplotlib, seaborn, Plotly, Bokeh, Altair, and Pygal.
When visualizing a DataFrame, choosing the right library can be challenging as different libraries excel in specific cases.
This article will show the pros and cons of each library. By the end, you will gain a better understanding of their distinct features, making it easier for you to select the optimal library.
💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!
Key Takeaways
Here’s what you’ll learn:
- Master the strengths and limitations of 6 essential Python visualization libraries
- Choose the optimal library based on your project requirements and complexity
- Create interactive dashboards with Plotly and Bokeh for business intelligence
- Leverage seaborn’s statistical plots to analyze data relationships with minimal code
- Deploy lightweight SVG visualizations using Pygal for responsive web applications
Quick Reference
Before diving into detailed examples, here’s a comprehensive comparison to help you choose the right visualization library for your project:
| Feature | Matplotlib | seaborn | Pygal | Plotly | Altair | Bokeh |
|---|---|---|---|---|---|---|
| Code Complexity | High | Low | Low | Medium | Medium | Medium-High |
| Interactivity | None (static) | None (static) | Basic hover | Advanced | Grammar-based | Advanced |
| Chart Types | Extensive (50+) | Common plots | Basic (14 types) | Extensive (50+) | Statistical focus | Extensive |
| Web Integration | Poor | Poor | Good (SVG) | Excellent | Good | Excellent |
| Customization | High | Limited | Medium | High | Medium | High |
| Dependencies | Moderate | Moderate | Minimal | Heavy | Moderate | Moderate |
Quick Decision Guide
Here is a quick decision guide to help you choose the right visualization library for your project:
Choose Matplotlib when: Creating publication-quality static plots, need unlimited customization control, working on academic papers or research
Choose seaborn when: Making statistical visualizations quickly, want beautiful plots with minimal code, working with pandas DataFrames
Choose Pygal when: Building lightweight web applications, need SVG vector graphics that scale perfectly, want minimal dependencies and fast loading
Choose Plotly when: You need interactive visualizations with hover tooltips, zooming, and clickable legends, or when building web dashboards and applications that require user engagement with the data
Choose Altair when: Doing statistical data exploration, want grammar-of-graphics approach, need linked/coordinated visualizations, working primarily in Jupyter notebooks
Choose Bokeh when: Building complex interactive web applications, need linked plots and advanced interactions, want fine-grained control over web deployment, creating custom visualization tools
Matplotlib
Matplotlib is probably the most common Python library for visualizing data. Almost everyone interested in data science has likely utilized Matplotlib at least once.
Pros
Easy to interpret data properties
When analyzing data, it’s often helpful to get a quick overview of its distribution.
For example, if you want to examine the distribution of the top 100 users with the most followers, Matplotlib is typically sufficient.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
new_profile = pd.read_csv(
"https://gist.githubusercontent.com/khuyentran1401/98658198f0ef0cb12abb34b4f2361fd8/raw/ece16eb32e1b41f5f20c894fb72a4c198e86a5ea/github_users.csv"
)
top_followers = new_profile.sort_values(by="followers", axis=0, ascending=False)[:100]
fig = plt.figure()
plt.bar(top_followers.user_name, top_followers.followers)
plt.show()
Although the 100 overlapping user names make the x-axis labels unreadable, the graph still gives a clear picture of how followers are distributed.
Versatility
Matplotlib is very versatile and capable of generating a wide range of graph types. The Matplotlib website offers comprehensive documentation and a gallery of various graphs, making it easy to find tutorials for virtually any type of plot.
fig = plt.figure()
plt.text(
0.6,
0.7,
"learning",
size=40,
rotation=20.0,
ha="center",
va="center",
bbox=dict(
boxstyle="round",
ec=(1.0, 0.5, 0.5),
fc=(1.0, 0.8, 0.8),
),
)
plt.text(
0.55,
0.6,
"machine",
size=40,
rotation=-25.0,
ha="right",
va="top",
bbox=dict(
boxstyle="square",
ec=(1.0, 0.5, 0.5),
fc=(1.0, 0.8, 0.8),
),
)
plt.show()
Animation capabilities
Matplotlib provides powerful animation features through the matplotlib.animation module, enabling dynamic visualizations that evolve over time. Here are two examples that showcase the animation capabilities:
Animated Line Plot with Real-time Data
import matplotlib.animation as animation
import numpy as np
from IPython.display import Image
fig, ax = plt.subplots()
x = np.arange(0, 2 * np.pi, 0.01)
(line,) = ax.plot(x, np.sin(x))
ax.set_ylim(-1.5, 1.5)
ax.set_title("Animated Sine Wave")
def animate(frame):
    line.set_ydata(np.sin(x + frame / 10.0))
    return (line,)
ani = animation.FuncAnimation(fig, animate, frames=10, interval=50, blit=True)
ani.save("sine_wave_animation.gif", writer="pillow", fps=10)
Image("sine_wave_animation.gif")
Animated Bar Chart Race
# Create sample data for animation
categories = ["Product A", "Product B", "Product C", "Product D"]
fig, ax = plt.subplots()
def animate_bars(frame):
    ax.clear()
    # Simulate changing data over time
    values = [np.sin(frame / 10 + i) * 50 + 60 for i in range(4)]
    colors = plt.cm.viridis(np.linspace(0, 1, 4))
    bars = ax.bar(categories, values, color=colors)
    ax.set_ylim(0, 120)
    ax.set_title(f"Sales Performance - Month {frame + 1}")
    # Add value labels on bars
    for bar, value in zip(bars, values):
        height = bar.get_height()
        ax.text(
            bar.get_x() + bar.get_width() / 2.0,
            height + 2,
            f"{value:.0f}",
            ha="center",
            va="bottom",
        )
    return bars
ani = animation.FuncAnimation(fig, animate_bars, frames=50, interval=100)
ani.save("bar_race_animation.gif", writer="pillow", fps=5)
Image("bar_race_animation.gif")
These animations demonstrate Matplotlib’s versatility for creating engaging dynamic visualizations, from scientific data trends to business dashboards.
Publication-quality output
Matplotlib excels at creating high-resolution, publication-ready visualizations suitable for academic papers, research reports, and professional presentations. The library provides precise control over figure size, DPI, and output formats (PNG, PDF, SVG, EPS), ensuring your plots meet the strict requirements of scientific journals and publications.
# Set publication-quality parameters
plt.rcParams.update({
'font.size': 10, # Standard academic paper font size
'font.family': 'serif', # Traditional serif fonts for publications
'axes.linewidth': 1.2, # Thicker axes for better print visibility
'figure.dpi': 300, # High resolution for sharp display
'savefig.dpi': 300, # High resolution for saved files
'savefig.bbox': 'tight' # Remove extra whitespace when saving
})
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.suptitle('GitHub User Analysis: Publication Ready', fontsize=14, fontweight='bold')
# Subplot 1: Distribution histogram
top_users = new_profile.sort_values('followers', ascending=False)[:50]
ax1.hist(top_users['followers'], bins=15, alpha=0.7, color='steelblue', edgecolor='black')
ax1.set_xlabel('Followers Count')
ax1.set_ylabel('Frequency')
ax1.set_title('A) Follower Distribution')
ax1.grid(True, alpha=0.3)
# Subplot 2: Correlation scatter
ax2.scatter(top_users['followers'], top_users['total_stars'], alpha=0.6, s=30)
ax2.set_xlabel('Followers')
ax2.set_ylabel('Total Stars')
ax2.set_title('B) Followers vs Stars Correlation')
ax2.grid(True, alpha=0.3)
# Save as publication-ready formats
plt.tight_layout()
plt.savefig('github_analysis.pdf', format='pdf', bbox_inches='tight')
plt.savefig('github_analysis.eps', format='eps', bbox_inches='tight')
plt.show()
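The relationship between figure size, DPI, and pixel dimensions is simple arithmetic: pixels = inches × DPI. Here is a minimal sketch of that calculation, assuming a hypothetical single-column width of 85 mm (check your target journal's guidelines for the real value). The helper names are illustrative and not part of Matplotlib:

```python
MM_PER_INCH = 25.4


def figsize_for_width(width_mm, aspect=0.75):
    """Return (width, height) in inches for a target width in millimetres.

    `aspect` is height / width; 0.75 gives a 4:3 figure.
    """
    width_in = width_mm / MM_PER_INCH
    return (width_in, width_in * aspect)


def pixel_dimensions(figsize, dpi):
    """Return the (width, height) in pixels of a raster export."""
    return (round(figsize[0] * dpi), round(figsize[1] * dpi))


# Hypothetical single-column width of 85 mm, exported at 300 DPI
size = figsize_for_width(85)
pixels = pixel_dimensions(size, dpi=300)
print(f"figsize = {size[0]:.2f} x {size[1]:.2f} in, raster = {pixels[0]} x {pixels[1]} px")
```

The resulting tuple can be passed to Matplotlib as plt.subplots(1, 2, figsize=size), so the saved figure matches the target width without rescaling.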
Cons
Steep learning curve
Matplotlib’s extensive functionality comes with complexity. New users often find the syntax overwhelming, especially when transitioning from point-and-click visualization tools. Understanding the figure-axes hierarchy and object-oriented vs. pyplot interfaces requires significant time investment.
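To make the two interfaces concrete, here is a small sketch that draws the same line twice, once through pyplot's implicit state and once through explicit Figure and Axes objects. The data values are made up for illustration, and the Agg backend is selected only so the snippet also runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# pyplot interface: functions act on an implicit "current" figure and axes
plt.figure()
plt.plot(x, y)
plt.title("pyplot interface")
implicit_ax = plt.gca()

# Object-oriented interface: hold explicit Figure and Axes objects
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("object-oriented interface")

# A Figure contains Axes, and each Axes contains the plotted artists
print(len(fig.axes), len(ax.lines))
```

Both styles produce the same plot. The object-oriented style tends to be easier to follow once a figure has several subplots, because every call names the Axes it modifies.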
Extensive styling needed for publication-ready common plots
While Matplotlib supports virtually any chart type, producing visually polished versions of standard plots like histograms, scatter plots, or bar charts requires substantial customization work.
To make common visualizations suitable for presentations or sharing, you must manually style numerous elements: axis formatting, color schemes, legends, annotations, and layout spacing. Matplotlib’s low-level interface provides complete control but assumes users will configure all visual aspects from scratch.
For example, a heatmap drawn with imshow has no axis labels or cell annotations by default.
num_features = new_profile.select_dtypes("int64")
correlation = num_features.corr()
fig, ax = plt.subplots()
im = plt.imshow(correlation, cmap="coolwarm")
Creating a readable heatmap in Matplotlib requires several manual steps:
- Define the visualization layout and color scheme
- Manually position axis ticks and labels
- Loop through each cell to add number annotations with appropriate formatting
num_features = new_profile.select_dtypes("int64")
correlation = num_features.corr()
fig, ax = plt.subplots()
im = plt.imshow(correlation, cmap="coolwarm")
ax.set_xticks(np.arange(len(correlation.columns)))
ax.set_yticks(np.arange(len(correlation.columns)))
ax.set_xticklabels(correlation.columns)
ax.set_yticklabels(correlation.columns)
# Add number annotations manually
for i in range(len(correlation.columns)):
    for j in range(len(correlation.columns)):
        # Write the correlation value in the center of each cell
        ax.text(
            j,
            i,
            f"{correlation.iloc[i, j]:.2f}",
            ha="center",
            va="center",
            color="black",
        )
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
plt.tight_layout()
plt.show()
Creating a basic annotated heatmap shouldn’t require this many lines of manual setup and configuration.
Manual statistical processing required
Unlike higher-level libraries like seaborn, Matplotlib doesn’t automatically handle statistical computations or data preprocessing. You must manually perform all data grouping, filtering, aggregations, and statistical calculations before plotting.
from scipy import stats
from seaborn import load_dataset
penguins = load_dataset("penguins")
# Matplotlib requires manual statistical processing for regression analysis
# Filter out missing values manually
penguins_clean = penguins.dropna(subset=["bill_length_mm", "flipper_length_mm"])
# Manual regression calculation (no automatic trend estimation)
x = penguins_clean["bill_length_mm"].values
y = penguins_clean["flipper_length_mm"].values
# Compute regression statistics manually
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
# Manual confidence interval calculation with proper statistical formula
n = len(x)
x_mean = np.mean(x)
sxx = np.sum((x - x_mean) ** 2)
y_pred = slope * x + intercept
residuals = y - y_pred
mse = np.sum(residuals**2) / (n - 2)
std_error_regression = np.sqrt(mse)
# Create confidence bands with proper varying width
x_smooth = np.linspace(x.min(), x.max(), 100)
y_smooth = slope * x_smooth + intercept
# Calculate standard error for each point on the smooth line
se_y = std_error_regression * np.sqrt(1 / n + (x_smooth - x_mean) ** 2 / sxx)
# Use t-distribution for 95% confidence interval
t_val = stats.t.ppf(0.975, df=n - 2) # 95% confidence
ci = t_val * se_y
# Plot with manually computed statistics
plt.figure()
plt.scatter(x, y, alpha=0.6, s=20, color="#1f77b4", label="Data points")
plt.plot(x_smooth, y_smooth, color="#1f77b4", linewidth=2, label="Regression line")
plt.fill_between(
x_smooth,
y_smooth - ci,
y_smooth + ci,
alpha=0.2,
color="#1f77b4",
label="95% Confidence interval",
)
plt.xlabel("bill_length_mm")
plt.ylabel("flipper_length_mm")
plt.title("Penguins Regression (Manual statistical processing required)")
plt.legend()
plt.tight_layout()
plt.show()
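For reference, the confidence band computed in the code above corresponds to the standard interval for the mean response in simple linear regression. In the notation below, s is std_error_regression, S_xx is sxx, x̄ is x_mean, and the t quantile is t_val:

```latex
s^2 = \frac{1}{n-2}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2,
\qquad
S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2

\hat{y}(x_0) \;\pm\; t_{0.975,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The second term under the square root grows as x_0 moves away from the mean, which is why the band is narrowest near the center of the data and widens toward the edges.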
Key Takeaways
Matplotlib can produce almost any plot, but complex plots often require more code than they would in other libraries.
seaborn
seaborn is a Python data visualization library built on top of Matplotlib. It offers a higher-level interface, simplifying the process of creating visually appealing plots.
Pros
Reduced code
seaborn offers a higher-level interface that simplifies plot creation compared to Matplotlib. Since it’s designed specifically for pandas DataFrames, you can create attractive visualizations with minimal code.
For instance, using the same data as before, we can create a nice heatmap without explicitly setting the x and y labels:
import seaborn as sns
# Load penguins dataset for seaborn examples
penguins = sns.load_dataset("penguins")
# Use numeric features only for correlation
correlation = num_features.corr()
sns.heatmap(correlation, annot=True)
This results in a more visually appealing heatmap without the need for additional configuration.
Statistical plots with automatic processing
seaborn excels at automatically performing statistical computations and aggregations, eliminating the need for manual data processing. It handles statistical estimation, uncertainty visualization, and data transformations behind the scenes.
Automatic statistical estimation with confidence intervals:
Unlike the extensive manual statistical processing required in Matplotlib (as shown above), seaborn can create a scatter plot with trend line and confidence interval using just one function call:
# Load penguins dataset and automatically estimate trend lines with confidence intervals
sns.lmplot(
data=penguins,
x="bill_length_mm",
y="flipper_length_mm",
)
Other examples of seaborn’s statistical plots:
Automatic distribution fitting and visualization:
# Automatically computes and overlays histogram, KDE, and rug plots
sns.displot(data=penguins, x="flipper_length_mm", kde=True, rug=True)
Automatic regression fit with marginal distributions:
# Automatically draws a scatter plot with a fitted regression line and marginal histograms
sns.jointplot(data=penguins, x="flipper_length_mm", y="body_mass_g", kind="reg")
Automatic pairwise relationships:
# Automatically creates scatter plots for all numeric pairs, with per-species distributions on the diagonal
sns.pairplot(data=penguins, hue="species", palette="deep")
Distribution visualization with kernel density:
# Automatically fits KDE and shows distribution shape
sns.violinplot(data=penguins, x="species", y="body_mass_g", palette="coolwarm")
Statistical summary with quartiles:
# Automatically computes median, quartiles, and outliers
sns.boxplot(data=penguins, x="species", y="body_mass_g", palette="coolwarm")
These examples demonstrate how seaborn handles complex statistical processing automatically, saving significant manual computation and code complexity.
Cons
Limited customization compared to Matplotlib
seaborn excels at creating attractive plots quickly but sacrifices fine-grained control for simplicity. Advanced customizations like precise positioning, custom annotations, or non-standard modifications require dropping down to Matplotlib’s lower-level API, which undercuts seaborn’s simplicity.
In the following example, seaborn creates a clean line plot effortlessly, but adding highlighted periods and custom annotations requires Matplotlib’s lower-level interface.
# Load flights dataset - contains smooth passenger trends
flights = sns.load_dataset("flights")
# seaborn creates smooth line plot
sns.lineplot(data=flights, x="year", y="passengers")
# But adding business annotations requires matplotlib
ax = plt.gca()
# Highlight jet age introduction period
ax.axvspan(1955, 1958, alpha=0.2, color='green', label='Jet Age Introduction')
# Add annotation for technical achievement
ax.annotate('Jet Engine Impact',
xy=(1957, 400),
xytext=(1952, 450),
arrowprops=dict(arrowstyle='->', color='red', lw=2),
fontsize=12, color='red', weight='bold')
# Custom legend with mixed elements
ax.legend(['Passenger Trend', 'Jet Age'], loc='upper left')
plt.title('Air Travel Growth with Historical Context')
plt.show()
Limited plot type collection
While seaborn excels at statistical plots, it lacks the breadth of Matplotlib’s visualization options for specialized scientific or custom chart types.
No interactive or animated features
seaborn is designed exclusively for static statistical visualizations and lacks any built-in support for interactivity or animations.
Key Takeaways
seaborn is a higher-level interface to Matplotlib. Although its collection of plot types is not as wide as Matplotlib’s, seaborn makes popular plots such as bar plots, box plots, and heatmaps look polished with less code.
Pygal
Pygal is a lightweight Python library that generates scalable vector graphics (SVG) charts. Built specifically for web applications, Pygal creates interactive visualizations with minimal dependencies and extremely fast rendering.
Pros
SVG vector graphics with perfect scaling
Pygal generates pure SVG output that scales perfectly across all devices and screen sizes. Unlike raster images such as PNG, SVG charts maintain crisp quality at any zoom level.
This makes Pygal ideal for responsive web applications where charts need to look sharp on both mobile devices and large desktop monitors:
import pygal
# Create a bar chart showing top GitHub users by followers
top_followers = new_profile.sort_values(by="followers", ascending=False)[:10]
bar_chart = pygal.Bar(
title='Top 10 GitHub Users by Followers',
x_title='Users',
y_title='Followers'
)
bar_chart.x_labels = top_followers['user_name'].tolist()
bar_chart.add('Followers', top_followers['followers'].tolist())
# Save chart as SVG file
bar_chart.render_to_file('github_top_users.svg')
The resulting SVG can be embedded directly into HTML without additional dependencies or image files.
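As a rough illustration of inline embedding, the sketch below wraps SVG markup in a minimal HTML page. The embed_svg helper and the placeholder SVG are hypothetical and stand in for a real chart. With Pygal you would pass the string returned by bar_chart.render(is_unicode=True) instead:

```python
def embed_svg(svg_markup, page_title):
    """Return a minimal HTML page with the SVG markup inlined in the body."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{page_title}</title></head>\n"
        f"<body>\n<figure>{svg_markup}</figure>\n</body>\n"
        "</html>\n"
    )


# Placeholder SVG standing in for a rendered chart (not real Pygal output)
placeholder_svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 50">'
    '<rect x="10" y="10" width="30" height="30" fill="steelblue"/>'
    "</svg>"
)

html_page = embed_svg(placeholder_svg, "Top 10 GitHub Users by Followers")
print(html_page)
```

Pygal's documentation also describes a disable_xml_declaration option intended for inline embedding. It is worth checking against your installed version, since an XML prolog in the middle of an HTML page is not valid markup.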
Built-in interactivity with hover tooltips
Every Pygal chart supports hover tooltips by default. The rendered SVG references Pygal’s small tooltip script, so users can explore data points in a browser without you adding JavaScript frameworks or extra configuration.
The bar chart above automatically displays precise follower counts when you hover over each bar, providing instant data exploration without any setup required.
To see the chart in a browser, we can use the following code:
bar_chart.render_in_browser()
To see Pygal’s interactivity in Jupyter notebooks, include the following JavaScript dependencies:
# For interactive display in notebooks, we need to wrap the SVG with JavaScript
from IPython.display import display, HTML
def display_pygal_chart(chart):
    """Display a Pygal chart with full interactivity in Jupyter notebooks"""
    html_template = """
    <!DOCTYPE html>
    <html>
    <head>
    <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
    <script type="text/javascript" src="https://kozea.github.io/pygal.js/2.0.x/pygal-tooltips.min.js"></script>
    </head>
    <body>
    <figure>{chart}</figure>
    </body>
    </html>
    """
    rendered = chart.render(is_unicode=True)
    display(HTML(html_template.format(chart=rendered)))
# Display the chart with full interactivity (hover tooltips, etc.)
display_pygal_chart(bar_chart)
Professional styling for common chart types
One of Pygal’s key advantages is how it automatically enhances common chart types with professional aesthetics and built-in interactivity. Here are some examples:
Radar Chart – Multi-Dimensional User Comparison
Compare multiple users across different metrics simultaneously:
# Create radar chart for multi-dimensional comparison
top_5_users = new_profile.sort_values(by="followers", ascending=False)[:5]
radar_chart = pygal.Radar(title="Top 5 Users: Multi-Metric Comparison", fill=True)
# Normalize metrics to 0-100 scale for better comparison
max_followers = top_5_users["followers"].max()
max_stars = top_5_users["total_stars"].max()
max_forks = top_5_users["forks"].max()
max_contrib = top_5_users["contribution"].max()
for _, user in top_5_users.iterrows():
    radar_chart.add(
        user["user_name"],
        [
            (user["followers"] / max_followers) * 100,
            (user["total_stars"] / max_stars) * 100,
            (user["forks"] / max_forks) * 100,
            (user["contribution"] / max_contrib) * 100,
        ],
    )
radar_chart.x_labels = ["Followers", "Stars", "Forks", "Contributions"]
display_pygal_chart(radar_chart)
Box Plot – Age Distribution by Passenger Class
Compare age distributions across different Titanic passenger classes:
# Load titanic dataset
titanic = sns.load_dataset("titanic")
# Filter out passengers with missing age data
titanic_with_age = titanic.dropna(subset=["age"])
# Create box plot comparing age across passenger classes
box_plot = pygal.Box(
title="Titanic: Age Distribution by Passenger Class",
y_title="Age (years)",
box_mode="tukey", # Shows outliers beyond 1.5 IQR
)
# Get passenger classes and add data for each
classes = ["First", "Second", "Third"]
for class_name in classes:
    class_passengers = titanic_with_age[titanic_with_age["class"] == class_name]
    age_data = class_passengers["age"].tolist()  # Pygal expects lists
    # Add class with passenger count for context
    passenger_count = len(class_passengers)
    box_plot.add(f"{class_name} Class (n={passenger_count})", age_data)
display_pygal_chart(box_plot)
Minimal dependencies and fast loading
Pygal requires minimal dependencies compared to other visualization libraries, making it ideal for lightweight applications where deployment size and startup speed matter.
Cons
Limited to 14 basic chart types
Pygal’s biggest limitation is its restricted chart type collection. With only 14 basic options (bar, line, pie, scatter, etc.), it lacks advanced statistical visualizations like violin plots, heatmaps, or complex multi-axis charts that are essential for sophisticated data analysis and scientific visualization.
Key Takeaways
Pygal excels when you need lightweight, scalable charts for web applications. Its SVG output, built-in interactivity, and minimal dependencies make it perfect for responsive dashboards, but consider other libraries for complex statistical analysis or specialized chart types.
Plotly
Plotly’s Python graphing library provides an effortless way to create interactive and high-quality graphs. It offers a range of chart types similar to Matplotlib and seaborn, including line plots, scatter plots, area charts, bar charts, and more.
Pros
Easily create beautiful interactive plots
Plotly excels at creating interactive visualizations with minimal code. Plotly Express makes this especially easy, allowing you to create beautiful interactive plots with just a single line of Python code:
import plotly.express as px
fig = px.scatter(
new_profile[:100],
x="followers",
y="total_stars",
color="forks",
size="contribution",
)
fig.show()
Simplicity in complex plots
Plotly simplifies the creation of complex plots that might be challenging with other libraries.
For example, if we want to visualize the locations of GitHub users given their latitudes and longitudes, we can plot the locations on a map with a single function call:
location_df = pd.read_csv(
"https://gist.githubusercontent.com/khuyentran1401/ce61bbad3bc636bf2548d70d197a0e3f/raw/ab1b1a832c6f3e01590a16231ba25ca5a3d761f3/location_df.csv",
index_col=0,
)
m = px.scatter_geo(
location_df,
lat="latitude",
lon="longitude",
color="total_stars",
size="forks",
hover_data=["user_name", "followers"],
title="Locations of Top Users",
)
m.show()
In this example, the color of the bubbles represents the number of stars, while the size corresponds to the number of forks.
Business intelligence features
Plotly offers enterprise-grade visualization features including interactive drill-down charts, real-time data updates, and cross-filtering capabilities. Business users can create comprehensive dashboards that allow stakeholders to explore data relationships, filter across multiple dimensions, and export insights for presentations.
The following example shows an interactive sunburst chart for hierarchical data exploration. Users can click through data layers from regions to departments and view detailed revenue breakdowns on hover.
# Create hierarchical data for drill-down
df = pd.DataFrame({
'Region': ['North', 'North', 'South', 'South'],
'Department': ['Sales', 'Marketing', 'Sales', 'Marketing'],
'Revenue': [250000, 180000, 200000, 150000],
'Quarter': ['Q4', 'Q4', 'Q4', 'Q4']
})
# Sunburst chart for hierarchical drill-down
fig = px.sunburst(df, path=['Region', 'Department'],
values='Revenue',
title='Revenue Drill-down: Region → Department')
# Show each segment's label and its percentage of the parent segment
fig.update_traces(textinfo="label+percent parent")
fig.update_layout(height=500)
fig.show()
The following example shows an animated bubble chart. Setting animation_frame adds play and stop buttons and a year slider, so users can step through decades of data.
df = px.data.gapminder()
fig = px.scatter(
df,
x="gdpPercap",
y="lifeExp",
animation_frame="year",
size="pop",
color="continent",
log_x=True,
size_max=55,
range_x=[100, 100000],
range_y=[25, 90],
title="GDP vs Life Expectancy by Year",
)
fig.show()
Cons
Heavy dependencies
Plotly comes with substantial dependencies that can significantly increase your project’s size and deployment complexity. The full Plotly package includes multiple rendering engines, which may be overkill for simple visualization needs and can slow down application startup times.
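One way to check this claim in your own environment is to measure how much disk space each installed library occupies. The sketch below uses only the standard library's importlib.metadata. The installed_size_mb helper is a hypothetical name, and the result is a rough estimate: it counts only files recorded by the installer and ignores each package's dependencies.

```python
from importlib import metadata


def installed_size_mb(package_name):
    """Return the on-disk size of an installed package in megabytes.

    Returns None when the package is not installed or lists no files.
    """
    try:
        files = metadata.files(package_name)
    except metadata.PackageNotFoundError:
        return None
    if not files:
        return None
    total_bytes = 0
    for file in files:
        path = file.locate()
        if path.is_file():
            total_bytes += path.stat().st_size
    return total_bytes / 1_000_000


for name in ["matplotlib", "seaborn", "pygal", "plotly", "altair", "bokeh"]:
    size = installed_size_mb(name)
    label = "not installed" if size is None else f"{size:.1f} MB"
    print(f"{name}: {label}")
```

Numbers vary by version and platform, so treat them as a starting point for comparing deployment footprints rather than as a benchmark.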
Key Takeaways
Plotly excels at creating interactive and publication-quality visualizations with minimal code required. While it offers a wide range of visualizations and simplifies complex plots, consider the substantial dependencies that can increase project size and deployment complexity.
Altair
Altair is a powerful declarative statistical visualization library for Python that is based on Vega-Lite. It shines when it comes to creating plots that require extensive statistical transformations.
Pros
Simple visualization grammar
Altair utilizes intuitive grammar for creating visualizations. You only need to specify the links between data columns and encoding channels, and the rest of the plotting is handled automatically. This simplicity makes visualizing information fast and intuitive.
For instance, to count the number of people in each class using the Titanic dataset:
import altair as alt
titanic = sns.load_dataset("titanic")
alt.Chart(titanic).mark_bar().encode(alt.X("class"), y="count()")
Altair’s concise syntax allows you to focus on the data and its relationships, resulting in efficient and expressive visualizations.
Easy data transformation
Altair makes it effortless to perform data transformations while creating charts.
For example, if you want to find the average age of each sex in the Titanic dataset, you can perform the transformation within the code itself:
mean_age_chart = (
    alt.Chart(titanic)
    .mark_bar()
    .encode(x="sex:N", y="mean_age:Q")
    .transform_aggregate(mean_age="mean(age)", groupby=["sex"])
)
mean_age_chart
Altair’s transform_aggregate() function enables you to aggregate data on the fly and use the results in your visualization.
You can also specify the data type, such as nominal (categorical data without any order) or quantitative (measures of values), using the :N or :Q notation.
A full list of data transformations is available in the Altair documentation.
Linked plots
Altair provides impressive capabilities for linking multiple plots together. You can use selections to filter the contents of the attached plots based on user interactions.
For example, to visualize the number of people in each class within a selected interval on a scatter plot:
brush = alt.selection_interval()
points = (
alt.Chart(titanic)
.mark_point()
.encode(
x="age:Q",
y="fare:Q",
color=alt.condition(brush, "class:N", alt.value("lightgray")),
)
.add_params(brush)
)
bars = (
alt.Chart(titanic)
.mark_bar()
.encode(y="class:N", color="class:N", x="count(class):Q")
.transform_filter(brush)
)
points & bars
As you select an interval within the scatter plot, the bar chart dynamically updates to reflect the filtered data. Altair’s ability to link plots allows for highly interactive visualizations with on-the-fly calculations, without the need for a running Python server.
Cons
Limited styling options
Altair’s simple charts, such as bar charts, may not look as styled as those in libraries like seaborn or Plotly unless you specify custom styling.
Dataset size limitations
By default, Altair raises a MaxRowsError when a dataset has more than 5,000 rows, because the data is embedded in the chart specification. It recommends aggregating or filtering the data before visualization, so handling larger datasets requires additional steps to manage data size and complexity.
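A common workaround is to aggregate in pandas first and pass only the summary to the chart. The sketch below uses synthetic data and made-up column names purely for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for a large dataset (values are illustrative)
rng = np.random.default_rng(seed=0)
n_rows = 20_000
large_df = pd.DataFrame(
    {
        "category": rng.choice(["A", "B", "C"], size=n_rows),
        "value": rng.normal(loc=50, scale=10, size=n_rows),
    }
)

# Aggregate in pandas first so the chart only receives one row per category
summary = (
    large_df.groupby("category", as_index=False)
    .agg(mean_value=("value", "mean"), n=("value", "size"))
)

print(len(large_df), "rows reduced to", len(summary))
# The small frame can then be charted, e.g. alt.Chart(summary).mark_bar()
```

Altair also provides alt.data_transformers.disable_max_rows() to lift the limit, but the full dataset is then embedded in the chart, which can produce very large notebooks and HTML files.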
Key Takeaways
Altair excels at statistical visualization with intuitive grammar and linked plots that enable interactive exploration. While it simplifies complex data transformations, its styling limitations and constraints with large datasets may require workarounds for specialized visualization needs.
Bokeh
Bokeh is a highly flexible interactive visualization library designed for web browsers.
Pros
Interactive version of Matplotlib
Bokeh stands out as the most similar library to Matplotlib when it comes to interactive visualization. While Matplotlib is a low-level visualization library, Bokeh offers both high-level and low-level interfaces. With Bokeh, you can create sophisticated plots similar to Matplotlib but with fewer lines of code and higher resolution.
For example, the circle plot of Matplotlib:
fig, ax = plt.subplots()
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 8, 2, 7]
for x, y in zip(xs, ys):
    ax.add_patch(
        plt.Circle((x, y), 0.5, edgecolor="#f03b20", facecolor="#9ebcda", alpha=0.8)
    )
# Keep circles round; adjustable="datalim" changes the data limits rather than the box size.
ax.set_aspect("equal", adjustable="datalim")
ax.set_xbound(3, 4)
ax.plot() # Causes an autoscale update.
plt.show()
…can be achieved with better resolution and interactivity using Bokeh:
from bokeh.io import show, output_notebook
from bokeh.models.glyphs import Scatter
from bokeh.plotting import figure
output_notebook()
plot = figure(tools="tap", title="Select a circle")
renderer = plot.scatter([1, 2, 3, 4, 5], [2, 5, 8, 2, 7], size=50, marker="circle")
selected_circle = Scatter(size=50, fill_alpha=1, fill_color="firebrick", line_color=None, marker="circle")
nonselected_circle = Scatter(size=50, fill_alpha=0.2, fill_color="blue", line_color="firebrick", marker="circle")
renderer.selection_glyph = selected_circle
renderer.nonselection_glyph = nonselected_circle
show(plot)
Link between plots
Bokeh makes it incredibly easy to establish links between plots. Changes applied to one plot can be automatically reflected in another plot with similar variables. This feature allows for exploring relationships between multiple plots.
For instance, if you create three graphs side by side and want to observe their relationship, you can utilize linked brushing:
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
source = ColumnDataSource(new_profile)
TOOLS = "box_select,lasso_select,help"
TOOLTIPS = [
("user", "@user_name"),
("followers", "@followers"),
("following", "@following"),
("forks", "@forks"),
("contribution", "@contribution"),
]
s1 = figure(tooltips=TOOLTIPS, title=None, tools=TOOLS)
s1.scatter(x="followers", y="following", source=source)
s2 = figure(tooltips=TOOLTIPS, title=None, tools=TOOLS)
s2.scatter(x="followers", y="forks", source=source)
s3 = figure(tooltips=TOOLTIPS, title=None, tools=TOOLS)
s3.scatter(x="followers", y="contribution", source=source)
p = gridplot([[s1, s2, s3]])
show(p)
By utilizing ColumnDataSource, the data can be shared among plots. Thus, when a change is applied to one plot, the other plots automatically update accordingly.
Fine-grained control over web deployment
Bokeh offers exceptional control over visualization deployment in web applications. You can embed plots in existing websites, create standalone HTML files, or build complete web applications with custom servers, providing seamless integration flexibility.
Here’s how to embed a Bokeh plot in your existing website. First, create your Bokeh visualization as usual:
from bokeh.plotting import figure
# Create sample sales data
df = pd.DataFrame({
'sales': [100, 150, 200, 120, 180, 250, 300, 220],
'profit': [20, 30, 45, 25, 40, 60, 75, 50],
'category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
})
source = ColumnDataSource(df)
# Create plot
p = figure(title="Sales Analysis Widget", width=500, height=300)
p.scatter('sales', 'profit', source=source, size=10, alpha=0.6)
Next, generate the HTML components for web embedding:
from bokeh.embed import components
# Generate components for embedding
script, div = components(p)
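Finally, integrate these components into your website. Add the BokehJS script from the Bokeh CDN to your HTML template’s head section, using the version that matches your installed Bokeh package (3.3.0 in this example):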
<head>
<script src="https://cdn.bokeh.org/bokeh/release/bokeh-3.3.0.min.js"></script>
</head>
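Then include the plot components in your page body where you want the visualization to appear. This snippet assumes a Jinja2-style template that receives div and script under the names plot_div and plot_script: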
<body>
<!-- Your existing HTML content -->
{{ plot_div|safe }} <!-- Insert the div component -->
{{ plot_script|safe }} <!-- Insert the script component -->
</body>
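To show how the values returned by components() reach the template, here is a minimal sketch using Jinja2 directly. The script and div strings are short placeholders, not real Bokeh output, and the template omits the CDN script tag from the head section for brevity:

```python
from jinja2 import Template

# Placeholder strings standing in for the values returned by
# bokeh.embed.components(p); real values are much longer.
script = '<script type="text/javascript">/* Bokeh plot code */</script>'
div = '<div id="plot-placeholder"></div>'

page_template = Template(
    """<!DOCTYPE html>
<html>
<head><title>Sales Analysis</title></head>
<body>
{{ plot_div|safe }}
{{ plot_script|safe }}
</body>
</html>"""
)

# The keyword names must match the variable names used in the template
html = page_template.render(plot_div=div, plot_script=script)
print(html)
```

In a web framework such as Flask or Django, the same two variables would typically be passed to the framework's own template rendering function instead of calling Template.render() by hand.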
Cons
Verbose code requirements
While Bokeh offers powerful customization, it demands considerably more setup code to make even simple plots look professional compared to simpler alternatives like seaborn and Plotly.
For a bar chart of survivors per passenger class in the Titanic dataset, you must aggregate the data beforehand and manually configure bar width and colors to achieve an attractive result.
If we didn’t add width for the bar graph, the graph would look like this:
from bokeh.transform import factor_cmap
from bokeh.palettes import Spectral6
titanic_groupby = titanic.groupby("class")["survived"].sum().reset_index()
p = figure(x_range=list(titanic_groupby["class"]))
p.vbar(
x="class",
top="survived",
source=titanic_groupby,
fill_color=factor_cmap(
"class", palette=Spectral6, factors=list(titanic_groupby["class"])
),
)
show(p)
Thus, we need to manually adjust the dimensions to make the plot nicer:
p = figure(x_range=list(titanic_groupby["class"]))
p.vbar(
x="class",
top="survived",
width=0.9,
source=titanic_groupby,
fill_color=factor_cmap(
"class", palette=Spectral6, factors=list(titanic_groupby["class"])
),
)
show(p)
Key Takeaways
Bokeh excels at web deployment with flexible integration options for existing websites, standalone files, and custom servers. However, its comprehensive capabilities often require more verbose code compared to higher-level libraries like Plotly or seaborn when creating similar visualizations.
Conclusion
Congratulations! You’ve explored six powerful Python visualization libraries, each excelling in different scenarios:
• Matplotlib – Choose for complete customization control and publication-quality static plots
• seaborn – Select for statistical analysis and elegant plots with minimal code
• Plotly – Use for interactive dashboards and web-ready visualizations
• Altair – Pick for declarative data exploration through grammar of graphics
• Pygal – Opt for lightweight web integration and simple SVG charts
• Bokeh – Choose for complex web applications and flexible deployment options
Match your project’s specific requirements – whether it’s interactivity, customization level, or deployment target – to select the optimal library.
Related Resources
- Scale your data processing with Polars for high-performance DataFrame operations
- Enhance your development workflow with Marimo notebooks for reproducible visualization creation