Faster Chart Analysis with Hugging Face Vision Models


Motivation

Manually analyzing charts is time-consuming. Data teams spend hours examining dashboards, extracting insights from visualizations, and documenting findings – a process that doesn’t scale when dealing with dozens of reports daily.

# Current manual process: time-consuming chart analysis
import matplotlib.pyplot as plt
import pandas as pd

# Load sales data and create visualizations
df = pd.read_csv("quarterly_sales.csv")
charts = [
    df.groupby('region').sum(numeric_only=True).plot(kind='bar'),
    df.plot(x='date', y='revenue', kind='line'),
    df.corr(numeric_only=True).style.background_gradient()  # Correlation heatmap
]

# Manual analysis required for each chart:
# 1. Open and examine each visualization
# 2. Identify patterns and trends visually
# 3. Extract key insights manually
# 4. Document findings for stakeholders

Automated insight extraction with human oversight changes this dynamic. Use AI to quickly analyze visualizations and generate initial insights, then review and refine the output. This approach reduces analysis time from hours to minutes while maintaining accuracy through human validation.

Key Takeaways

Here’s what you’ll learn:

  • Analyze charts and visualizations using natural language queries instead of manual pattern recognition
  • Process documents, receipts, and forms automatically with SmolVLM’s multimodal capabilities
  • Build interactive web dashboards with Gradio for non-technical stakeholders to analyze visualizations
  • Reduce chart analysis time from hours to minutes while maintaining human oversight for accuracy
  • Deploy lightweight 500M parameter models that run efficiently on standard hardware without GPU requirements

Introduction to SmolVLM

SmolVLM enables this workflow by combining image analysis with natural language processing. This lightweight multimodal AI model, hosted on Hugging Face, processes images and text together for visual question-answering tasks.

SmolVLM excels at:

  • Chart and graph analysis
  • Document understanding
  • Visual question answering
  • Image content summarization

Install the required packages:

pip install "transformers>=4.45.0" pillow torch

💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!

Getting Started

SmolVLM eliminates the complexity of traditional image analysis by enabling direct natural language queries about visual content.

Let’s start by loading the model:

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
import requests

# Load the processor and model
model_id = "HuggingFaceTB/SmolVLM-500M-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16
)

# Move model to available device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

SmolVLM setup explained:

  • AutoProcessor: Converts images and text into the format the model expects
  • AutoModelForVision2Seq: Loads the actual SmolVLM neural network weights
  • torch.float16: Uses half-precision to reduce memory usage by 50%
  • Device detection: Automatically uses GPU if available for faster inference
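
To sanity-check the half-precision setup, you can estimate the model's weight memory from its parameter count. This is a rough back-of-the-envelope sketch that assumes the model loaded above:

# Rough memory estimate for the loaded SmolVLM weights
num_params = sum(p.numel() for p in model.parameters())
bytes_per_param = 2  # float16 stores each weight in 2 bytes
print(f"Parameters: {num_params / 1e6:.0f}M")
print(f"Approximate weight memory: {num_params * bytes_per_param / 1e9:.2f} GB")

For the 500M-parameter checkpoint this works out to roughly 1 GB of weights, which is why it runs comfortably on standard hardware without a GPU.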

Now, let’s create a helper function to analyze a single image with a natural language question:

def analyze_image_with_smolvlm(image, question, max_tokens=200):
    """Analyze an image with SmolVLM using a natural language question."""
    # Format input as chat conversation
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question}
        ]
    }]

    # Convert to model input format
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)

    # Generate response
    generated_ids = model.generate(
        **inputs, max_new_tokens=max_tokens, do_sample=True, temperature=0.3
    )

    # Extract and return the response text
    response = processor.batch_decode(
        generated_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0]

    return response

The function above does the following:

  • apply_chat_template: Formats the conversation for SmolVLM’s expected input structure
  • processor(): Tokenizes text and preprocesses images into tensors
  • model.generate(): Runs the actual AI inference with configurable parameters
  • batch_decode(): Converts model output tokens back to readable text

Next, create a helper function to analyze multiple questions:

def print_qa_results(questions, answers, separator_length=40):
    """Print question and answer pairs with formatted output."""
    for question, answer in zip(questions, answers):
        print(f"Question: {question}")
        print(f"Answer: {answer}")
        print("-" * separator_length)

Chart Analysis

Let’s put our helper function to work by analyzing a real heatmap using natural language queries.

We’ll analyze this correlation heatmap:

Heatmap showing correlation matrix with color-coded values

Here are the questions we’ll ask:

# Load a complex chart for analysis - financial correlation heatmap
image_url = (
    "https://eodhd.com/financial-academy/wp-content/uploads/2023/12/heatmap_sector.png"
)
image = Image.open(requests.get(image_url, stream=True).raw)

# Ask questions about the chart
questions = [
    "What type of chart is this?",
    "What are the main trends shown in this visualization?",
    "What insights can you derive from this data?",
]

answers = [analyze_image_with_smolvlm(image, q) for q in questions]
print_qa_results(questions, answers, separator_length=50)

Output:

Question: What type of chart is this?
Answer:  Heatmap.
--------------------------------------------------
Question: What are the main trends shown in this visualization?
Answer: This is a sector heat map showing percentage allocation across financial sectors. Consumer Staples has the highest allocation, followed by Energy and Industrials sectors.
--------------------------------------------------
Question: What insights can you derive from this data?
Answer: Key insight: Consumer-focused sectors (services and staples) dominate the market, with communication services also performing strongly.
--------------------------------------------------

The model provides direct answers about chart type, trends, and insights without requiring manual preprocessing or specialized analysis tools.
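
Note that analyze_image_with_smolvlm samples its output (do_sample=True, temperature=0.3), so the exact wording of the answers above may vary between runs. If you need reproducible answers, swap the model.generate call inside the helper for greedy decoding, as in this small variation:

# Inside analyze_image_with_smolvlm: greedy decoding for reproducible answers
generated_ids = model.generate(
    **inputs, max_new_tokens=max_tokens, do_sample=False
)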

Document Understanding

SmolVLM excels at extracting information from documents, receipts, and forms. Let’s test this with a receipt:

Receipt

and ask the following questions:

# Analyze a receipt or invoice
receipt_url = "https://raw.githubusercontent.com/mistralai/cookbook/main/mistral/ocr/receipt.png"
receipt_image = Image.open(requests.get(receipt_url, stream=True).raw)

# Document analysis questions
document_questions = [
    "What type of document is this?",
    "What is the total amount?",
    "What items can you identify?",
]

answers = [analyze_image_with_smolvlm(receipt_image, q, max_tokens=150) for q in document_questions]
print_qa_results(document_questions, answers)

Output:

Question: What type of document is this?
Answer:  This is an invoice or receipt for a parking permit.
----------------------------------------
Question: What is the total amount?
Answer:  The total amount is $15.00.
----------------------------------------
Question: What items can you identify?
Answer:  The text contains a list of items, but specific details about each item are not provided.
----------------------------------------
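
Prompting for a specific output format can make answers easier to post-process, although a 500M-parameter model will not always follow the format exactly. Here is a quick experiment using the receipt_image loaded above:

# Ask for a structured, line-per-item answer (format adherence is not guaranteed)
structured_question = (
    "List each purchased item and its price, one per line, in the format 'item: price'."
)
print(analyze_image_with_smolvlm(receipt_image, structured_question, max_tokens=150))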

Image Content Summarization

We can also use SmolVLM to generate a summary of an image.

Let’s use the following image:

Street scene

and ask the following questions:

# Analyze a street scene
street_url = "https://images.unsplash.com/photo-1449824913935-59a10b8d2000?w=500"
street_image = Image.open(requests.get(street_url, stream=True).raw)

# Summarization prompts
summary_prompts = [
    "What's the main activity happening here?",
    "Summarize the key elements of this scene"
]

answers = [analyze_image_with_smolvlm(street_image, q, max_tokens=250) for q in summary_prompts]
print_qa_results(summary_prompts, answers, separator_length=50)

Output:

Question: What's the main activity happening here?
Answer:  There are many people on the road in the image.
--------------------------------------------------
Question: Summarize the key elements of this scene
Answer:  A busy city street with a few people walking and a few cars.
--------------------------------------------------

Automated Business Intelligence with Chart Analysis

See how SmolVLM integrates into a real data analysis workflow. We’ll create a sales performance chart and then analyze it with natural language queries.

First, generate sample sales data and create the visualization:

# Generate quarterly sales data
import pandas as pd
import matplotlib.pyplot as plt

data = {
    'Quarter': ['Q1 2024', 'Q2 2024', 'Q3 2024', 'Q4 2024'],
    'Product A': [45000, 52000, 48000, 61000],
    'Product B': [38000, 41000, 39000, 44000],
    'Product C': [23000, 28000, 32000, 35000]
}

df = pd.DataFrame(data)
df.set_index('Quarter').plot(kind='bar', figsize=(10, 6))
plt.title('Quarterly Sales Performance by Product')
plt.ylabel('Sales ($)')
plt.xticks(rotation=45)
plt.savefig('sales_chart.png', dpi=300)
plt.show()

Sales Chart

Now analyze the chart with SmolVLM using targeted business questions:

from PIL import Image

# Load the chart we just created
chart_image = Image.open('sales_chart.png')

questions = [
    "What quarter had the best overall performance?",
    "Are there any concerning trends I should investigate?"
]

for question in questions:
    answer = analyze_image_with_smolvlm(chart_image, question)
    print(f"Q: {question}")
    print(f"A: {answer}\n")

Output:

Q: What quarter had the best overall performance?
A:  Q4 2024.

Q: Are there any concerning trends I should investigate?
A:  The sales figures for Q1 2024 and Q4 2024 are significantly higher than those for Q2 2024 and Q3 2024.

This workflow demonstrates SmolVLM’s value in everyday data analysis: create visualizations with your preferred tools, then get instant insights through natural language queries.
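
If you prefer not to write charts to disk first, you can render a figure into an in-memory buffer and pass it straight to the helper. The fig_to_image function below is a small convenience wrapper, not part of SmolVLM, and it reuses the df from the example above:

import io
from PIL import Image

def fig_to_image(fig):
    """Render a matplotlib figure to a PIL image in memory."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=150, bbox_inches="tight")
    buf.seek(0)
    return Image.open(buf)

# Analyze the sales chart without saving it to disk
ax = df.set_index('Quarter').plot(kind='bar', figsize=(10, 6))
answer = analyze_image_with_smolvlm(
    fig_to_image(ax.get_figure()), "Which product grew the most?"
)
print(answer)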

Building a Web Dashboard

To make chart analysis accessible for non-technical stakeholders, you can create an interactive web dashboard using Gradio. This Python framework enables rapid deployment of machine learning applications with just a few lines of code.

To use Gradio, start by installing it:

pip install gradio

Next, create a dashboard function that combines image upload with question input. This function uses the SmolVLM helper we defined earlier:

def dashboard_analysis(image, question):
    if image is None:
        return "Please upload an image to analyze."

    # Use the analyze_image_with_smolvlm function from earlier in the article
    return analyze_image_with_smolvlm(image, question, max_tokens=200)

If running this dashboard code independently, make sure to include the model loading and helper function from the “Getting Started” section above.

The function handles image uploads and passes user questions directly to SmolVLM. Now build the Gradio interface with three components:

import gradio as gr

dashboard = gr.Interface(
    fn=dashboard_analysis,
    inputs=[
        gr.Image(type="pil", label="Upload Chart or Visualization"),
        gr.Textbox(value="What are the key trends in this chart?", label="Ask a Question")
    ],
    outputs=gr.Textbox(label="Analysis Results"),
    title="SmolVLM Chart Analysis Dashboard"
)

Launch the dashboard to make it accessible to your team:

dashboard.launch(share=True)  # share=True creates public link

The dashboard provides a clean interface where users upload charts and ask questions in natural language.

SmolVLM Chart Analysis Dashboard

Conclusion

SmolVLM democratizes visual analysis for data scientists. No specialized computer vision expertise needed – just natural language queries on your existing charts and visualizations.

Practical applications using SmolVLM’s capabilities:

  • Chart analysis: Upload any matplotlib/seaborn visualization and ask about patterns, outliers, and trends
  • Document understanding: Extract information from scanned forms, receipts, and research papers automatically
  • Image summarization: Generate descriptions of complex multi-panel figures from research publications
  • Dashboard automation: Batch process weekly report screenshots for automated insight generation (see the sketch after this list)
  • Visual validation: Ask “Does this chart support my hypothesis?” to strengthen analytical conclusions
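
For the dashboard automation idea above, a simple loop over a folder of screenshots turns the helper into a batch report. This is a minimal sketch; the reports/ folder and the question are placeholders to adapt to your own setup:

from pathlib import Path
from PIL import Image

# Batch-analyze every chart screenshot in a folder
report_dir = Path("reports")
question = "What are the key trends in this chart?"

for chart_path in sorted(report_dir.glob("*.png")):
    chart = Image.open(chart_path)
    insight = analyze_image_with_smolvlm(chart, question)
    print(f"{chart_path.name}: {insight}")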
