Top 6 Python Libraries for Visualization: Which One to Use?

Motivation

If you’re new to Python visualization, the vast number of libraries and examples available might seem overwhelming. Some popular libraries for visualization include Matplotlib, Seaborn, Plotly, Bokeh, Altair, and Folium.

When visualizing a DataFrame, choosing the right library can be challenging as different libraries excel in specific cases.

This article will show the pros and cons of each library. By the end, you will gain a better understanding of their distinct features, making it easier for you to select the optimal library.

We will do this by focusing on a few specific attributes:

Interactivity

Do you want interactive visualization? Libraries like Altair, Bokeh, and Plotly allow you to create interactive graphs that users can explore and interact with.

Alternatively, some libraries like Matplotlib render visualizations as static images, making them suitable for explaining concepts in papers, slide decks, or presentations.
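
As a rough sketch of what "static" means in practice, the snippet below renders a small bar chart to PDF and PNG without opening a window. The user names and follower counts are made up for illustration, and the in-memory buffers are only there to keep the example self-contained; passing a file name such as "followers.pdf" to savefig should work the same way.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt

# Hypothetical follower counts, made up for illustration only
users = ["user_a", "user_b", "user_c"]
followers = [120, 85, 40]

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(users, followers)
ax.set_ylabel("followers")

# Vector output (PDF) scales cleanly in papers
pdf_buffer = io.BytesIO()
fig.savefig(pdf_buffer, format="pdf", bbox_inches="tight")

# Raster output (PNG) with a high dpi suits slide decks
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png", dpi=300, bbox_inches="tight")

plt.close(fig)
```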

Syntax and Flexibility

How does the syntax differ across libraries? Lower-level libraries such as Matplotlib provide extensive flexibility, allowing you to accomplish almost anything. However, they come with a more complex API.

Declarative libraries like Altair simplify the mapping of data to visualizations, offering a more intuitive syntax.

Type of data and visualization

Are you working with specialized use cases, such as geographical plots or large datasets? Consider whether a specific library supports the plot types or handles large datasets effectively.

Data

To explore each plot, we will use the data of GitHub users:

import pandas as pd  

new_profile = pd.read_csv('https://gist.githubusercontent.com/khuyentran1401/98658198f0ef0cb12abb34b4f2361fd8/raw/ece16eb32e1b41f5f20c894fb72a4c198e86a5ea/github_users.csv')
new_profile

Feel free to fork and play with the code for this article in the accompanying GitHub repo.

Matplotlib

Matplotlib is probably the most common Python library for visualizing data. Almost everyone interested in data science has likely utilized Matplotlib at least once.

Pros

Easy to interpret data properties

When analyzing data, it’s often helpful to get a quick overview of its distribution.

For example, if you want to examine the distribution of the top 100 users with the most followers, Matplotlib is typically sufficient.

import matplotlib.pyplot as plt

top_followers = new_profile.sort_values(by="followers", axis=0, ascending=False)[:100]

fig = plt.figure()

plt.bar(top_followers.user_name, top_followers.followers)
plt.show()

Despite Matplotlib’s suboptimal x-axis representation, the graph provides a clear understanding of the data distribution.
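
If the crowded x-axis gets in the way, one possible fix is to widen the figure and rotate the tick labels. The sketch below uses a small made-up DataFrame as a stand-in for top_followers, so the names and numbers are assumptions rather than the real data:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the top_followers frame
top = pd.DataFrame(
    {
        "user_name": [f"user_{i}" for i in range(20)],
        "followers": range(2000, 0, -100),
    }
)

fig, ax = plt.subplots(figsize=(10, 4))  # a wider figure leaves room for labels
ax.bar(top["user_name"], top["followers"])

# Rotate the x tick labels so neighboring names do not overlap
ax.tick_params(axis="x", labelrotation=90)
ax.set_ylabel("followers")
fig.tight_layout()
```

With 100 users the labels will still be dense, so plotting fewer bars or dropping the labels altogether may be the more readable choice.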

Versatility

Matplotlib is very versatile and capable of generating a wide range of graph types. The Matplotlib website offers comprehensive documentation and a gallery of various graphs, making it easy to find tutorials for virtually any type of plot.

fig = plt.figure()

plt.text(
    0.6,
    0.7,
    "learning",
    size=40,
    rotation=20.0,
    ha="center",
    va="center",
    bbox=dict(
        boxstyle="round",
        ec=(1.0, 0.5, 0.5),
        fc=(1.0, 0.8, 0.8),
    ),
)

plt.text(
    0.55,
    0.6,
    "machine",
    size=40,
    rotation=-25.0,
    ha="right",
    va="top",
    bbox=dict(
        boxstyle="square",
        ec=(1.0, 0.5, 0.5),
        fc=(1.0, 0.8, 0.8),
    ),
)

plt.show()

Cons

While Matplotlib can plot virtually anything, generating non-basic plots or adjusting plots for aesthetic purposes can be complex.

If you intend to present your data to others, customizing the x-axis, y-axis, and other plot elements may require substantial effort. This is due to Matplotlib’s low-level interface.

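num_features = new_profile.select_dtypes("int64")
correlation = num_features.corr()

fig, ax = plt.subplots()
im = ax.imshow(correlation)

# Place one tick per column before labeling; otherwise the labels are misaligned
ax.set_xticks(range(len(correlation.columns)))
ax.set_yticks(range(len(correlation.columns)))
ax.set_xticklabels(correlation.columns)
ax.set_yticklabels(correlation.columns)

plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
plt.show()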

Takeaway: Matplotlib is capable of producing any plot, but creating complex plots often requires more code compared to other libraries.

Seaborn

Seaborn is a Python data visualization library built on top of Matplotlib. It offers a higher-level interface, simplifying the process of creating visually appealing plots.

Pros

Reduced code

Seaborn provides a higher-level interface for generating the same kinds of plots as Matplotlib. This means you can achieve similar visualizations with less code and a more visually pleasing design.

For instance, using the same data as before, we can create a heatmap without explicitly setting the x and y labels:

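import seaborn as sns

# Keep numeric columns only; corr() raises an error on text columns in recent pandas
correlation = new_profile.select_dtypes("number").corr()

sns.heatmap(correlation, annot=True)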

This results in a more visually appealing heatmap without the need for additional configuration.

Improved aesthetics for common plots

Seaborn is a popular choice for common plot types such as bar plots, box plots, count plots, and histograms. Not only does Seaborn require less code to generate these plots, but they also have enhanced visual aesthetics.

In the following example, the count plot appears more visually appealing due to Seaborn’s default settings:

sns.set(style="darkgrid")
titanic = sns.load_dataset("titanic")
ax = sns.countplot(x="class", data=titanic)

Cons

Seaborn, despite its advantages, does not have as extensive a collection of plot types as Matplotlib. While it excels in popular plot types, it may not offer the same breadth of options for more specialized or custom plots.
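
One way to soften this limitation is to remember that Seaborn sits on top of Matplotlib: axes-level functions such as countplot return a Matplotlib Axes, so you can keep customizing with Matplotlib calls. A minimal sketch, using a tiny made-up DataFrame instead of the Titanic dataset:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical passenger classes, made up for illustration only
df = pd.DataFrame({"class": ["First", "Third", "Third", "Second", "Third", "First"]})

ax = sns.countplot(x="class", data=df, order=["First", "Second", "Third"])

# The returned object is a Matplotlib Axes, so Matplotlib methods apply
ax.set_title("Passengers per class")
ax.axhline(2, linestyle="--", color="gray")  # reference line drawn with Matplotlib
```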

Takeaway: Seaborn is a higher-level version of Matplotlib. Even though it does not offer as wide a collection of plots as Matplotlib, Seaborn makes popular plots such as bar plots, box plots, and heatmaps look good with less code.

Plotly

Plotly’s Python graphing library provides an effortless way to create interactive and high-quality graphs. It offers a range of chart types similar to Matplotlib and Seaborn, including line plots, scatter plots, area charts, bar charts, and more.

Pros

Similar to R

If you’re familiar with creating plots in R and miss its features when working with Python, Plotly is an excellent choice. It allows you to achieve the same level of quality plots using Python.

Plotly Express, in particular, stands out as it enables the creation of impressive plots with just a single line of Python code. For example:

import plotly.express as px

fig = px.scatter(
    new_profile[:100],
    x="followers",
    y="total_stars",
    color="forks",
    size="contribution",
)
fig.show()

Interactive plot creation

Plotly excels in creating interactive plots, which not only enhance the visual appeal but also allow viewers to explore the data in greater detail.

Let’s consider the bar plot example from earlier created with Matplotlib. Here’s how it can be achieved with Plotly:

top_followers = new_profile.sort_values(by="followers", axis=0, ascending=False)[:100]

fig = px.bar(
    top_followers,
    x="user_name",
    y="followers",
)

fig.show()

With a similar amount of code, Plotly generates an interactive plot where users can hover over each bar to view the corresponding user and follower count. This interactivity empowers the consumer of your visualization to explore the data on their own.

Simplicity in complex plots

Plotly simplifies the creation of complex plots that might be challenging with other libraries.

For example, if we want to visualize the locations of GitHub users on a map, we can obtain their latitudes and longitudes and plot them accordingly:

location_df = pd.read_csv(
    "https://gist.githubusercontent.com/khuyentran1401/ce61bbad3bc636bf2548d70d197a0e3f/raw/ab1b1a832c6f3e01590a16231ba25ca5a3d761f3/location_df.csv",
    index_col=0,
)

m = px.scatter_geo(
    location_df,
    lat="latitude",
    lon="longitude",
    color="total_stars",
    size="forks",
    hover_data=["user_name", "followers"],
    title="Locations of Top Users",
)

m.show()

With just a few lines of code, Plotly beautifully represents the locations of users on a map. The color of the bubbles represents the total number of stars, while the size corresponds to the number of forks.

Takeaway: Plotly is an excellent choice for creating interactive and publication-quality graphs with minimal code required. It offers a wide range of visualizations and simplifies the process of creating complex plots.

Altair

Altair is a powerful declarative statistical visualization library for Python that is based on Vega-Lite. It shines when it comes to creating plots that require extensive statistical transformations.

Pros

Simple visualization grammar

Altair uses an intuitive grammar for creating visualizations. You only need to specify the links between data columns and encoding channels, and the rest of the plotting is handled automatically. This simplicity makes visualizing information fast and intuitive.

For instance, to count the number of people in each class using the Titanic dataset:

import seaborn as sns
import altair as alt

titanic = sns.load_dataset("titanic")

alt.Chart(titanic).mark_bar().encode(alt.X("class"), y="count()")

Altair’s concise syntax allows you to focus on the data and its relationships, resulting in efficient and expressive visualizations.

Easy data transformation

Altair makes it effortless to perform data transformations while creating charts.

For example, if you want to find the average age of each sex in the Titanic dataset, you can perform the transformation within the code itself:

mean_age_chart = (
    alt.Chart(titanic)
    .mark_bar()
    .encode(x="sex:N", y="mean_age:Q")
    .transform_aggregate(mean_age="mean(age)", groupby=["sex"])
)

mean_age_chart

Altair’s transform_aggregate() function enables you to aggregate data on the fly and use the results in your visualization.

You can also specify the data type, such as nominal (categorical data without any order) or quantitative (measures of values), using the :N or :Q notation.

The Altair documentation provides a full list of data transformations.

Linked plots

Altair provides impressive capabilities for linking multiple plots together. You can use selections to filter the contents of the attached plots based on user interactions.

For example, to visualize the number of people in each class within a selected interval on a scatter plot:

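# selection_interval() and add_params() are the Altair 5 spellings;
# on Altair 4, use .add_selection(brush) instead of .add_params(brush)
brush = alt.selection_interval()

points = (
    alt.Chart(titanic)
    .mark_point()
    .encode(
        x="age:Q",
        y="fare:Q",
        color=alt.condition(brush, "class:N", alt.value("lightgray")),
    )
    .add_params(brush)
)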

bars = (
    alt.Chart(titanic)
    .mark_bar()
    .encode(y="class:N", color="class:N", x="count(class):Q")
    .transform_filter(brush)
)

points & bars

As you select an interval within the scatter plot, the bar chart dynamically updates to reflect the filtered data. Altair’s ability to link plots allows for highly interactive visualizations with on-the-fly calculations, without the need for a running Python server.

Cons

Altair’s simple charts, such as bar charts, may not look as styled as those in libraries like Seaborn or Plotly unless you specify custom styling.

By default, Altair raises a MaxRowsError when a dataset has more than 5,000 rows, and it recommends aggregating or otherwise reducing the data before visualization. Handling larger datasets therefore requires additional steps to manage data size.

Takeaway: Altair is an excellent choice for creating sophisticated statistical charts. While it may lack some default styling options and have limitations with large datasets, Altair’s simplicity, data transformation capabilities, and linked plots make it a powerful tool for statistical visualization.

Bokeh

Bokeh is a highly flexible interactive visualization library designed for web browsers.

Pros

Interactive version of Matplotlib

Bokeh stands out as the most similar library to Matplotlib when it comes to interactive visualization. While Matplotlib is a low-level visualization library, Bokeh offers both high-level and low-level interfaces. With Bokeh, you can create sophisticated plots similar to Matplotlib but with fewer lines of code and higher resolution.

For example, this circle plot in Matplotlib:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

xs = [1, 2, 3, 4, 5]
ys = [2, 5, 8, 2, 7]

# Use separate loop names so the coordinate lists are not overwritten
for cx, cy in zip(xs, ys):
    ax.add_patch(
        plt.Circle((cx, cy), 0.5, edgecolor="#f03b20", facecolor="#9ebcda", alpha=0.8)
    )

# Keep the circles round by giving both axes the same scale
ax.set_aspect("equal", adjustable="datalim")
ax.set_xbound(3, 4)

ax.plot()  # Causes an autoscale update.
plt.show()

… can be achieved with better resolution and greater utility using Bokeh:

from bokeh.io import show, output_notebook
from bokeh.models import Circle
from bokeh.plotting import figure

output_notebook()

plot = figure(tools="tap", title="Select a circle")
renderer = plot.circle([1, 2, 3, 4, 5], [2, 5, 8, 2, 7], size=50)

selected_circle = Circle(fill_alpha=1, fill_color="firebrick", line_color=None)
nonselected_circle = Circle(fill_alpha=0.2, fill_color="blue", line_color="firebrick")

renderer.selection_glyph = selected_circle
renderer.nonselection_glyph = nonselected_circle

show(plot)

Link between plots

Bokeh makes it incredibly easy to establish links between plots. Changes applied to one plot can be automatically reflected in another plot with similar variables. This feature allows for exploring relationships between multiple plots.

For instance, if you create three graphs side by side and want to observe their relationship, you can utilize linked brushing:

from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource


source = ColumnDataSource(new_profile)

TOOLS = "box_select,lasso_select,help"
TOOLTIPS = [
    ("user", "@user_name"),
    ("followers", "@followers"),
    ("following", "@following"),
    ("forks", "@forks"),
    ("contribution", "@contribution"),
]

s1 = figure(tooltips=TOOLTIPS, title=None, tools=TOOLS)
s1.circle(x="followers", y="following", source=source)

s2 = figure(tooltips=TOOLTIPS, title=None, tools=TOOLS)
s2.circle(x="followers", y="forks", source=source)

s3 = figure(tooltips=TOOLTIPS, title=None, tools=TOOLS)
s3.circle(x="followers", y="contribution", source=source)

p = gridplot([[s1, s2, s3]])
show(p)

By utilizing ColumnDataSource, the data can be shared among plots. Thus, when a change is applied to one plot, the other plots automatically update accordingly.

Cons

As a library with a somewhat middle-level interface, Bokeh usually requires less code than Matplotlib but more code than Seaborn, Altair, or Plotly to produce a plot of similar quality.

For example, to create a bar plot of the Titanic data similar to the earlier count plot (here, the number of survivors in each class), we need to transform the data in advance and also set the bar width and color if we want the graph to look nice.

If we don’t set a width for the bars, the graph looks like this:

from bokeh.transform import factor_cmap
from bokeh.palettes import Spectral6

titanic_groupby = titanic.groupby("class")["survived"].sum().reset_index()

p = figure(x_range=list(titanic_groupby["class"]))
p.vbar(
    x="class",
    top="survived",
    source=titanic_groupby,
    fill_color=factor_cmap(
        "class", palette=Spectral6, factors=list(titanic_groupby["class"])
    ),
)
show(p)

Thus, we need to manually adjust the dimensions to make the plot nicer:

p = figure(x_range=list(titanic_groupby["class"]))
p.vbar(
    x="class",
    top="survived",
    width=0.9,
    source=titanic_groupby,
    fill_color=factor_cmap(
        "class", palette=Spectral6, factors=list(titanic_groupby["class"])
    ),
)
show(p)

Takeaway: Bokeh’s unique advantage lies in its ability to provide a range of interfaces, from low to high, enabling the creation of versatile and visually appealing graphics. However, this flexibility often results in the need for more code compared to other libraries when aiming for similar plot quality.

Folium

Folium simplifies the process of visualizing data on an interactive Leaflet map. This library provides built-in tilesets from OpenStreetMap, Mapbox, and Stamen.

Pros

Easy to create a map with markers

Compared to other options like Plotly, Altair, and Bokeh, Folium offers a more straightforward approach by building on OpenStreetMap tiles. This gives a similar experience to Google Maps with minimal code.

Remember the map we created using Plotly to visualize the locations of GitHub users? With Folium, we can enhance the map’s appearance even further.

import folium

# Select latitudes, longitudes, and location names as Series
lats = location_df["latitude"]
lons = location_df["longitude"]
names = location_df["location"]

# Center the map on the first location (iloc is positional, so it works for any index)
m = folium.Map(location=[lats.iloc[0], lons.iloc[0]])

for lat, lon, name in zip(lats, lons, names):
    # Create marker with other locations
    folium.Marker(
        location=[lat, lon], popup=name, icon=folium.Icon(color="green")
    ).add_to(m)

m

With just a few lines of code, we have created a real map displaying user locations.

Add potential location

Folium makes it easy to add potential locations of other users by allowing the inclusion of markers.

# Enable adding more locations in the map
m = m.add_child(folium.ClickForMarker(popup="Potential Location"))

Click on the map to generate a new location marker right where you click.

Plugins

Folium offers various plugins that can be used with your map, including a plugin for Altair. For instance, if we want to visualize a heatmap of total stars for GitHub users worldwide and identify areas with a high number of top users and stars, the Folium heatmap plugin allows us to achieve that.

# heatmap

from folium.plugins import HeatMap

m = folium.Map(location=[lats.iloc[0], lons.iloc[0]])

HeatMap(data=location_df[["latitude", "longitude", "total_stars"]]).add_to(m)

m

Takeaway: Folium enables the creation of interactive maps with just a few lines of code. It provides a user experience similar to Google Maps.

Conclusion

Congratulations! You have just learned about six different visualization libraries. This article aims to provide you with an understanding of each library’s functionality and when to use each one. Knowing the key features of each library will help you select the appropriate tool for your needs more quickly.

I like to write about basic data science concepts and play with different algorithms and data science tools. You could connect with me on LinkedIn and Twitter.

Star this repo if you want to check out the code for all of the articles I have written.


Work with Khuyen Tran
