texthero: Reduce Dimension and Visualize Text in One Line of Code

Visualizing text data in 2D typically requires several steps: cleaning the text, encoding it as numeric vectors, and reducing those vectors to two dimensions. Writing each step by hand can be time-consuming.

The texthero library simplifies this task by letting you chain all of these steps together with the pandas pipe method.

The following example demonstrates how to use texthero to visualize CNN news article descriptions from a Kaggle dataset. Each point in the resulting plot represents an article, color-coded by its category.

import pandas as pd
import texthero as hero

# Load the data
df = pd.read_csv("small_CNN.csv")

# Clean the descriptions, encode them with TF-IDF, then reduce them to 2D with PCA
df["pca"] = (df["Description"]
             .pipe(hero.clean)
             .pipe(hero.tfidf)
             .pipe(hero.pca))

# Plot one point per article, colored by category.
# hero.scatterplot draws an interactive Plotly chart and displays it itself,
# so no matplotlib calls are needed.
hero.scatterplot(df, col="pca", color="Category", title="CNN News")

This code cleans the text, encodes it with TF-IDF, uses PCA to reduce each TF-IDF vector to two components, and draws a 2D scatter plot of the articles: one chained expression for the processing and one call for the plot.
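To make the three stages less of a black box, here is a rough from-scratch sketch of what they do, written with NumPy only. It is not texthero's implementation: texthero builds on scikit-learn and uses a much larger stopword list and its own tokenization defaults. The helper names clean_text, tfidf_matrix, and pca_2d, the tiny stopword set, and the sample headlines are all hypothetical, chosen only for illustration.

```python
import re

import numpy as np

# Hypothetical, deliberately tiny stopword list (texthero ships a much larger one).
STOPWORDS = {"a", "an", "the", "in", "on", "of", "and", "to", "is", "for"}


def clean_text(text: str) -> list[str]:
    """Lowercase, drop digits and punctuation, split on whitespace, remove stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove digits
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf_matrix(docs: list[list[str]]) -> tuple[np.ndarray, list[str]]:
    """Build an L2-normalized TF-IDF matrix (rows = documents, columns = terms)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {term: j for j, term in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc:
            tf[i, index[tok]] += 1            # raw term counts
    n_docs = len(docs)
    doc_freq = (tf > 0).sum(axis=0)           # number of documents containing each term
    idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1  # smoothed IDF, like scikit-learn's default
    X = tf * idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                   # leave empty documents as zero rows
    return X / norms, vocab


def pca_2d(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project the rows of X onto the top principal components via SVD of the centered matrix."""
    X_centered = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X_centered, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # principal-component scores


# Hypothetical sample headlines standing in for the "Description" column.
descriptions = [
    "Stocks rally as markets rebound in 2020",
    "Markets slide on inflation fears",
    "Team wins the championship final",
    "Star striker scores twice in final",
]
tokens = [clean_text(d) for d in descriptions]
X, vocab = tfidf_matrix(tokens)
coords = pca_2d(X)
print(coords.shape)  # (4, 2)
```

Each row of coords is one article's 2D position; coloring those points by category is the job hero.scatterplot does with the "pca" column.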

Link to texthero.

Work with Khuyen Tran