Visualization

Mermaid: Create Flow Chart Using Code

Have you ever wanted to create a flow chart using simple code logic? That is when Mermaid comes in handy.

The code to create the diagram above:

graph TD
A[Should you go to work today?] --> B(Do you like working?)
B --Yes--> C{Go to work}
B --No--> D(Are you feeling sick?)
D --Yes--> E{Go to the doctor}
D --No--> F(Do you have a lot of work to do?)
F --Yes--> H(And you don't want to go?)
F --No--> H
H --Yes, I don't want to--> I(You signed up for this. Get dressed and go to work!)

Link to Mermaid Editor.

Add Statistical Significance Annotations on Seaborn Plots

Have you ever looked at two box plots and wondered if there is a significant difference between the means of the two groups? statannotations makes it easy for you to add statistical significance annotations on seaborn plots.

In the code above, we use an independent t-test to compare the means of two independent groups.

From the plot, we can see statistical evidence that the mean taxi fare in Manhattan differs significantly from the mean taxi fare in Brooklyn, Bronx, or Queens.
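To make the test behind those annotations concrete, here is a minimal sketch of the statistic an independent t-test computes. It assumes the equal-variance (Student) form of the test and uses only the standard library. The function name and the two small samples are hypothetical, chosen for illustration, and are not the taxi data from the post.

```python
from math import sqrt
from statistics import mean, variance


def independent_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / dof
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, dof


# Hypothetical samples, for illustration only.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]

t, dof = independent_t_statistic(group_a, group_b)
print(f"t = {t:.2f}, degrees of freedom = {dof}")
```

The larger the absolute value of t for a given number of degrees of freedom, the smaller the p-value, and the p-value is what gets summarized as a significance annotation on the plot.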

Full code.

Link to statannotations.

Analyze and Visualize URLs with Network Graph

Have you ever tried to extract features and insights from URLs, but found it difficult to do so? Wouldn’t it be nice if you could extract features and create a network graph for your URLs, as shown in the graph above?

In my latest article, you will learn how to use the combination of yarl and PyGraphistry to do exactly that.

Link to the article.

Link to the source code.

texthero: Reduce Dimension and Visualize Text in One Line of Code

Visualizing text data in 2D typically requires several steps: cleaning, encoding, and dimensionality reduction. These processes can be time-consuming.

The texthero library simplifies this task by letting you chain all of these steps in a single expression.

The following example demonstrates how to use texthero to visualize CNN news article descriptions from a Kaggle dataset. Each point in the resulting plot represents an article, color-coded by its category.

import pandas as pd
import texthero as hero
import matplotlib.pyplot as plt

# Load the data
df = pd.read_csv("small_CNN.csv")

# Process and reduce dimensionality of the text data
df["pca"] = (
    df["Description"]
    .pipe(hero.clean)
    .pipe(hero.tfidf)
    .pipe(hero.pca)
)

# Create the visualization
plt.figure(figsize=(10, 3))
hero.scatterplot(df, col="pca", color="Category", title="CNN News")
plt.show()

This code efficiently cleans the text, applies TF-IDF encoding, performs PCA, and creates a 2D scatter plot of the articles, all in just a few lines of code.

Link to texthero.

Visualize Feature Importances with Yellowbrick

The more features a model has, the more sensitive the model is to errors due to variance. Thus, we want to select the minimum required features to produce a valid model.

A common approach is to drop the features that are least important to the model, then re-evaluate whether the model actually performs better during cross-validation.

Yellowbrick’s FeatureImportances is ideal for this task since it helps us to visualize the relative importance of the features for the model.
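The drop-and-re-evaluate loop described above can be sketched with scikit-learn alone. This is a rough illustration under several assumptions, not the code from the full article: the dataset is synthetic, the model is a random forest, and dropping exactly three features is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption for illustration): 10 features, 4 of them informative.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=4,
    n_redundant=0,
    random_state=0,
)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Cross-validated accuracy with every feature included.
baseline = cross_val_score(model, X, y, cv=5).mean()

# Rank the features by the importances of a model fitted on all of them.
importances = model.fit(X, y).feature_importances_
ranking = np.argsort(importances)  # least important first

# Drop the three least important features, then re-evaluate.
keep = np.sort(ranking[3:])
reduced = cross_val_score(model, X[:, keep], y, cv=5).mean()

print(f"all {X.shape[1]} features: {baseline:.3f}")
print(f"top {len(keep)} features: {reduced:.3f}")
```

Whether the reduced model scores higher depends on the data, so the comparison has to be rerun for each dataset. The `importances` array holds the kind of values that a feature importance bar chart displays.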

Link to Yellowbrick.

My full article about Yellowbrick.


pydeps: Python Module Dependency Visualization

If you want to generate a graph showing the dependencies between your Python modules, try pydeps.

For example, to generate the dependency graph for files in the folder top_github_scraper, I type:

$ pydeps top_github_scraper

The image above is the output of the command.

Link to pydeps.


    Work with Khuyen Tran