Faster Chart Analysis with Hugging Face Vision Models

Table of Contents

Motivation
Introduction to SmolVLM
Getting Started
Chart Analysis
Document Understanding
Image Content Summarization
Automated Business Intelligence with Chart Analysis
Building a Web Dashboard
Conclusion

Motivation
Manually analyzing charts is time-consuming. Data teams spend hours examining dashboards, extracting insights from visualizations, and documenting findings – a process that doesn’t scale when dealing with dozens of reports daily.
# Current manual process: time-consuming chart analysis
import matplotlib.pyplot as plt
import pandas as pd

# Load sales data and create visualizations
df = pd.read_csv("quarterly_sales.csv")
charts = [
    df.groupby('region').sum(numeric_only=True).plot(kind='bar'),
    df.plot(x='date', y='revenue', kind='line'),
    df.corr(numeric_only=True).style.background_gradient()  # Correlation heatmap
]

# Manual analysis required for each chart:
# 1. Open and examine each visualization
# 2. Identify patterns and trends visually
# 3. Extract key insights manually
# 4. Document findings for stakeholders

Automated insight extraction with human oversight changes this dynamic. Use AI to quickly analyze visualizations and generate initial insights, then review and refine the output. This approach reduces analysis time from hours to minutes while maintaining accuracy through human validation.
Key Takeaways
Here’s what you’ll learn:

Analyze charts and visualizations using natural language queries instead of manual pattern recognition
Process documents, receipts, and forms automatically with SmolVLM’s multimodal capabilities
Build interactive web dashboards with Gradio for non-technical stakeholders to analyze visualizations
Reduce chart analysis time from hours to minutes while maintaining human oversight for accuracy
Deploy lightweight 500M parameter models that run efficiently on standard hardware without GPU requirements

Introduction to SmolVLM
SmolVLM enables this workflow by combining image analysis with natural language processing. This lightweight multimodal AI model, hosted on Hugging Face, processes images and text together for visual question-answering tasks.
SmolVLM excels at:

Chart and graph analysis
Document understanding
Visual question answering
Image content summarization

Install the required packages:
pip install "transformers>=4.45.0" pillow torch requests

💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!

Getting Started
SmolVLM eliminates the complexity of traditional image analysis by enabling direct natural language queries about visual content.
Let’s start by loading the model:
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
import requests

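# Pick the device first so the dtype can match it
device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 halves weight memory on GPU; float32 is the safer choice on CPU
dtype = torch.float16 if device == "cuda" else torch.float32

# Load the processor and model
model_id = "HuggingFaceTB/SmolVLM-500M-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=dtype
)

# Move model to the selected device
model = model.to(device)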

SmolVLM setup explained:

AutoProcessor: Converts images and text into the format the model understands
AutoModelForVision2Seq: Loads the actual SmolVLM neural network weights
torch.float16: Uses half-precision weights, which roughly halves memory use compared with float32
Device detection: Automatically uses GPU if available for faster inference
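
To make the memory claim concrete, here is a back-of-the-envelope estimate of weight memory for a 500M-parameter model at the two precisions. It is a sketch, not a measurement: the helper name `estimate_weight_memory_gb` is hypothetical, the parameter count is rounded to 500 million, and activations, buffers, and framework overhead are ignored.

```python
# Rough weight-memory estimate; ignores activations and framework overhead.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def estimate_weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

num_params = 500_000_000  # assumption: rounded parameter count
fp32_gb = estimate_weight_memory_gb(num_params, "float32")
fp16_gb = estimate_weight_memory_gb(num_params, "float16")
print(f"float32: {fp32_gb:.1f} GB, float16: {fp16_gb:.1f} GB")
```

Actual memory use during inference will be higher than these figures, since the image features and generated tokens also take space.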

Now, let’s create a helper function to analyze a single image with a natural language question:
def analyze_image_with_smolvlm(image, question, max_tokens=200):
    """Analyze an image with SmolVLM using a natural language question."""
    # Format input as chat conversation
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question}
        ]
    }]

    # Convert to model input format
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)

    # Generate response
    generated_ids = model.generate(
        **inputs, max_new_tokens=max_tokens, do_sample=True, temperature=0.3
    )

    # Drop the prompt tokens, then decode only the newly generated ones
    response = processor.batch_decode(
        generated_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0]

    return response

The function above does the following:

apply_chat_template: Formats the conversation for SmolVLM’s expected input structure
processor(): Tokenizes text and preprocesses images into tensors
model.generate(): Runs the actual AI inference with configurable parameters
batch_decode(): Converts model output tokens back to readable text
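
The slicing step inside the decode call is easy to miss: `generate` returns the prompt tokens followed by the new tokens, so the helper drops the first `inputs.input_ids.shape[1]` positions before decoding. The toy example below mimics that with plain Python lists. The token IDs are made up, and `strip_prompt_tokens` is a hypothetical name used only for illustration.

```python
def strip_prompt_tokens(generated_ids, prompt_length):
    """Keep only the tokens produced after the prompt, per sequence."""
    return [sequence[prompt_length:] for sequence in generated_ids]

# Made-up token IDs: the first four positions stand in for the prompt.
prompt_ids = [101, 7, 8, 9]
generated_ids = [prompt_ids + [42, 43, 2]]  # batch of one sequence

new_tokens = strip_prompt_tokens(generated_ids, len(prompt_ids))
print(new_tokens)  # [[42, 43, 2]]
```

Without this step the decoded text would begin with the question itself rather than the answer.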

Next, create a helper function to print the question-and-answer pairs:
def print_qa_results(questions, answers, separator_length=40):
    """Print question and answer pairs with formatted output."""
    for question, answer in zip(questions, answers):
        print(f"Question: {question}")
        print(f"Answer: {answer}")
        print("-" * separator_length)

Chart Analysis
Let’s put our helper function to work by analyzing a real heatmap using natural language queries.
We’ll analyze this correlation heatmap:

Here are the questions we’ll ask:
# Load a complex chart for analysis – financial correlation heatmap
image_url = (
    "https://eodhd.com/financial-academy/wp-content/uploads/2023/12/heatmap_sector.png"
)
image = Image.open(requests.get(image_url, stream=True).raw)

# Ask questions about the chart
questions = [
    "What type of chart is this?",
    "What are the main trends shown in this visualization?",
    "What insights can you derive from this data?",
]

answers = [analyze_image_with_smolvlm(image, q) for q in questions]
print_qa_results(questions, answers, separator_length=50)

Output:
Question: What type of chart is this?
Answer: Heatmap.
--------------------------------------------------
Question: What are the main trends shown in this visualization?
Answer: This is a sector heat map showing percentage allocation across financial sectors. Consumer Staples has the highest allocation, followed by Energy and Industrials sectors.
--------------------------------------------------
Question: What insights can you derive from this data?
Answer: Key insight: Consumer-focused sectors (services and staples) dominate the market, with communication services also performing strongly.
--------------------------------------------------

The model provides direct answers about chart type, trends, and insights without requiring manual preprocessing or specialized analysis tools.
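
Because the workflow relies on human review, it can help to keep each answer next to its question in a structured record rather than only printing it. The sketch below is one way to do that. `collect_answers` and the `reviewed` field are hypothetical, and the analysis function is passed in as a parameter so that the example runs with a stand-in instead of the real model.

```python
from typing import Callable, Dict, List

def collect_answers(image, questions: List[str],
                    analyze_fn: Callable[[object, str], str]) -> List[Dict]:
    """Run analyze_fn for each question and return records for later review."""
    records = []
    for question in questions:
        records.append({
            "question": question,
            "answer": analyze_fn(image, question).strip(),
            "reviewed": False,  # flipped to True once a person has checked it
        })
    return records

# Stand-in for analyze_image_with_smolvlm so the sketch runs without the model.
def fake_analyze(image, question):
    return f" (model answer to: {question}) "

records = collect_answers(None, ["What type of chart is this?"], fake_analyze)
print(records[0]["answer"])
```

With the real model loaded, `collect_answers(image, questions, analyze_image_with_smolvlm)` would return the same structure.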
Document Understanding
SmolVLM excels at extracting information from documents, receipts, and forms. Let’s test this with a receipt:

and ask the following questions:
# Analyze a receipt or invoice
receipt_url = "https://raw.githubusercontent.com/mistralai/cookbook/main/mistral/ocr/receipt.png"
receipt_image = Image.open(requests.get(receipt_url, stream=True).raw)

# Document analysis questions
document_questions = [
    "What type of document is this?",
    "What is the total amount?",
    "What items can you identify?",
]

answers = [
    analyze_image_with_smolvlm(receipt_image, q, max_tokens=150)
    for q in document_questions
]
print_qa_results(document_questions, answers)

Output:
Question: What type of document is this?
Answer: This is an invoice or receipt for a parking permit.
----------------------------------------
Question: What is the total amount?
Answer: The total amount is $15.00.
----------------------------------------
Question: What items can you identify?
Answer: The text contains a list of items, but specific details about each item are not provided.
----------------------------------------
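
Answers come back as free text, so downstream code usually needs to pull structured values out of them. As a minimal sketch, assuming the answer states dollar amounts in a form like `$15.00`, a regular expression can extract them as `Decimal` values. The function name `extract_dollar_amounts` is hypothetical, and other currencies or formats would need their own pattern.

```python
import re
from decimal import Decimal
from typing import List

# Matches amounts such as $15.00, $1,250 or $3.5
_AMOUNT_PATTERN = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")

def extract_dollar_amounts(text: str) -> List[Decimal]:
    """Return every dollar amount found in text, in order of appearance."""
    return [
        Decimal(match.replace(",", ""))  # drop thousands separators
        for match in _AMOUNT_PATTERN.findall(text)
    ]

answer = "The total amount is $15.00."
print(extract_dollar_amounts(answer))  # [Decimal('15.00')]
```

An empty list signals that the model did not report an amount in the expected format, which is a useful cue to check the document by hand.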

Image Content Summarization
We can also use SmolVLM to generate a summary of an image.
Let’s use the following image:

and ask the following questions:
# Analyze a street scene
street_url = "https://images.unsplash.com/photo-1449824913935-59a10b8d2000?w=500"
street_image = Image.open(requests.get(street_url, stream=True).raw)

# Summarization prompts
summary_prompts = [
    "What's the main activity happening here?",
    "Summarize the key elements of this scene"
]

answers = [
    analyze_image_with_smolvlm(street_image, q, max_tokens=250)
    for q in summary_prompts
]
print_qa_results(summary_prompts, answers, separator_length=50)

Output:
Question: What's the main activity happening here?
Answer: There are many people on the road in the image.
--------------------------------------------------
Question: Summarize the key elements of this scene
Answer: A busy city street with a few people walking and a few cars.
--------------------------------------------------

Automated Business Intelligence with Chart Analysis
See how SmolVLM integrates into a real data analysis workflow. We’ll create a sales performance chart and then analyze it with natural language queries.
First, generate sample sales data and create the visualization:
# Generate quarterly sales data
import pandas as pd
import matplotlib.pyplot as plt

data = {
    'Quarter': ['Q1 2024', 'Q2 2024', 'Q3 2024', 'Q4 2024'],
    'Product A': [45000, 52000, 48000, 61000],
    'Product B': [38000, 41000, 39000, 44000],
    'Product C': [23000, 28000, 32000, 35000]
}

df = pd.DataFrame(data)
df.set_index('Quarter').plot(kind='bar', figsize=(10, 6))
plt.title('Quarterly Sales Performance by Product')
plt.ylabel('Sales ($)')
plt.xticks(rotation=45)
plt.savefig('sales_chart.png', dpi=300)
plt.show()

Now analyze the chart with SmolVLM using targeted business questions:
from PIL import Image

# Load the chart we just created
chart_image = Image.open('sales_chart.png')

questions = [
"What quarter had the best overall performance?",
"Are there any concerning trends I should investigate?"
]

for question in questions:
    answer = analyze_image_with_smolvlm(chart_image, question)
    print(f"Q: {question}")
    print(f"A: {answer}\n")

Output:
Q: What quarter had the best overall performance?
A: Q4 2024.

Q: Are there any concerning trends I should investigate?
A: The sales figures for Q1 2024 and Q4 2024 are significantly higher than those for Q2 2024 and Q3 2024.

This workflow demonstrates SmolVLM’s value in everyday data analysis: create visualizations with your preferred tools, then get instant insights through natural language queries.
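
Since the chart was generated from known numbers, the model's answers can be checked against the data directly. The sketch below recomputes the total per quarter from the same figures in plain Python. `quarter_totals` is a hypothetical helper, not part of the original workflow.

```python
data = {
    "Quarter": ["Q1 2024", "Q2 2024", "Q3 2024", "Q4 2024"],
    "Product A": [45000, 52000, 48000, 61000],
    "Product B": [38000, 41000, 39000, 44000],
    "Product C": [23000, 28000, 32000, 35000],
}

def quarter_totals(table: dict) -> dict:
    """Sum all product columns for each quarter."""
    products = [name for name in table if name != "Quarter"]
    return {
        quarter: sum(table[product][i] for product in products)
        for i, quarter in enumerate(table["Quarter"])
    }

totals = quarter_totals(data)
best_quarter = max(totals, key=totals.get)
print(totals)
print(best_quarter)  # Q4 2024
```

The totals confirm the first answer: Q4 2024 is highest at 140,000. They also show Q1 2024 with the lowest total (106,000), which does not match the second answer above, so that response is one a reviewer would want to correct.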
Building a Web Dashboard
To make chart analysis accessible for non-technical stakeholders, you can create an interactive web dashboard using Gradio. This Python framework enables rapid deployment of machine learning applications with just a few lines of code.
To use Gradio, start by installing it:
pip install gradio

Next, create a dashboard function that combines image upload with question input. This function uses the SmolVLM helper we defined earlier:
def dashboard_analysis(image, question):
    if image is None:
        return "Please upload an image to analyze."

    # Use the analyze_image_with_smolvlm function from earlier in the article
    return analyze_image_with_smolvlm(image, question, max_tokens=200)

If running this dashboard code independently, make sure to include the model loading and helper function from the “Getting Started” section above.
The function handles image uploads and passes user questions directly to SmolVLM. Now build the Gradio interface with three components:
import gradio as gr

dashboard = gr.Interface(
    fn=dashboard_analysis,
    inputs=[
        gr.Image(type="pil", label="Upload Chart or Visualization"),
        gr.Textbox(value="What are the key trends in this chart?", label="Ask a Question")
    ],
    outputs=gr.Textbox(label="Analysis Results"),
    title="SmolVLM Chart Analysis Dashboard"
)

Launch the dashboard to make it accessible to your team:
dashboard.launch(share=True)  # share=True creates a temporary public link; omit it to serve locally only

The dashboard provides a clean interface where users upload charts and ask questions in natural language.
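
The dashboard function above guards against a missing image but not against an empty question. One possible extension, shown as a sketch, also rejects blank questions and turns model errors into a readable message. `make_dashboard_fn` is a hypothetical factory; it takes the analysis function as an argument so the example can run with a stand-in for `analyze_image_with_smolvlm`.

```python
def make_dashboard_fn(analyze_fn, max_tokens=200):
    """Wrap analyze_fn with input checks suitable for a Gradio callback."""
    def dashboard_fn(image, question):
        if image is None:
            return "Please upload an image to analyze."
        if not question or not question.strip():
            return "Please enter a question about the image."
        try:
            return analyze_fn(image, question.strip(), max_tokens=max_tokens)
        except Exception as exc:  # show failures in the UI instead of crashing
            return f"Analysis failed: {exc}"
    return dashboard_fn

# Stand-in analysis function so the sketch runs without loading the model.
def fake_analyze(image, question, max_tokens=200):
    return f"answer to '{question}'"

safe_dashboard = make_dashboard_fn(fake_analyze)
print(safe_dashboard(None, "What is shown?"))
print(safe_dashboard("image-placeholder", "   "))
print(safe_dashboard("image-placeholder", "What is shown?"))
```

With the real model, `fn=make_dashboard_fn(analyze_image_with_smolvlm)` could be passed to `gr.Interface` in place of `dashboard_analysis`.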

Conclusion
SmolVLM democratizes visual analysis for data scientists. No specialized computer vision expertise needed – just natural language queries on your existing charts and visualizations.
Practical applications using SmolVLM’s capabilities:

Chart analysis: Upload any matplotlib/seaborn visualization and ask about patterns, outliers, and trends
Document understanding: Extract information from scanned forms, receipts, and research papers automatically
Image summarization: Generate descriptions of complex multi-panel figures from research publications
Dashboard automation: Batch process weekly report screenshots for automated insight generation
Visual validation: Ask “Does this chart support my hypothesis?” to strengthen analytical conclusions
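
For the dashboard-automation idea, a batch job mostly comes down to finding the screenshot files and running the same questions on each. The sketch below covers the file-discovery part with the standard library. `find_chart_images`, the extension list, and the file names are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import List

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # assumption: formats worth scanning

def find_chart_images(folder: str) -> List[Path]:
    """Return image files directly inside folder, sorted by path."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )

# Demonstrate on a temporary folder containing two images and one text file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["week_02.png", "week_01.PNG", "notes.txt"]:
        (Path(tmp) / name).touch()
    found = [path.name for path in find_chart_images(tmp)]

print(found)  # ['week_01.PNG', 'week_02.png']
```

Each returned path could then be opened with `Image.open(path)` and passed to `analyze_image_with_smolvlm` along with a fixed list of questions.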

Related Resources
Here are some related resources that you may find useful:

Docling guide for transforming PDFs into searchable AI data
MLflow guide for building quality metrics
