Table of Contents
- Motivation
- Introduction to SmolVLM
- Getting Started
- Chart Analysis
- Document Understanding
- Image Content Summarization
- Automated Business Intelligence with Chart Analysis
- Building a Web Dashboard
- Conclusion
Motivation
Manually analyzing charts is time-consuming. Data teams spend hours examining dashboards, extracting insights from visualizations, and documenting findings – a process that doesn’t scale when dealing with dozens of reports daily.
# Current manual process: time-consuming chart analysis
import matplotlib.pyplot as plt
import pandas as pd
# Load sales data and create visualizations
df = pd.read_csv("quarterly_sales.csv")
charts = [
    df.groupby('region').sum(numeric_only=True).plot(kind='bar'),
    df.plot(x='date', y='revenue', kind='line'),
    df.corr(numeric_only=True).style.background_gradient(),  # Correlation heatmap
]
# Manual analysis required for each chart:
# 1. Open and examine each visualization
# 2. Identify patterns and trends visually
# 3. Extract key insights manually
# 4. Document findings for stakeholders
Automated insight extraction with human oversight changes this dynamic. Use AI to quickly analyze visualizations and generate initial insights, then review and refine the output. This approach can cut analysis time from hours to minutes, while human validation keeps the results accurate.
Key Takeaways
Here’s what you’ll learn:
- Analyze charts and visualizations using natural language queries instead of manual pattern recognition
- Process documents, receipts, and forms automatically with SmolVLM’s multimodal capabilities
- Build interactive web dashboards with Gradio for non-technical stakeholders to analyze visualizations
- Reduce chart analysis time from hours to minutes while maintaining human oversight for accuracy
- Deploy lightweight 500M parameter models that run efficiently on standard hardware without GPU requirements
Introduction to SmolVLM
SmolVLM enables this workflow by combining image analysis with natural language processing. This lightweight multimodal AI model, hosted on Hugging Face, processes images and text together for visual question-answering tasks.
SmolVLM excels at:
- Chart and graph analysis
- Document understanding
- Visual question answering
- Image content summarization
Install the required packages:
pip install "transformers>=4.45.0" pillow torch requests
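The install command asks for transformers 4.45.0 or newer, so it can help to confirm what is actually installed before loading the model. The following is a minimal sketch using only the standard library; `parse_version` and `check_transformers` are hypothetical helper names introduced here, and the parser handles only simple numeric versions.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 45, 0)  # minimum stated in the install command


def parse_version(text):
    """Turn '4.45.0' into (4, 45, 0), ignoring suffixes like '.dev0' or '+cu121'."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad '4.45' to (4, 45, 0) so comparisons behave
        parts.append(0)
    return tuple(parts)


def check_transformers():
    """Return a short status message about the installed transformers version."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if parse_version(installed) < MIN_TRANSFORMERS:
        return f"transformers {installed} is older than 4.45.0"
    return f"transformers {installed} looks fine"


print(check_transformers())
```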
💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!
Getting Started
SmolVLM eliminates the complexity of traditional image analysis by enabling direct natural language queries about visual content.
Let’s start by loading the model:
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
import requests
# Pick the device first so the precision can match it
device = "cuda" if torch.cuda.is_available() else "cpu"
# Half precision saves memory on GPU; float32 is the safer choice on CPU
dtype = torch.float16 if device == "cuda" else torch.float32
# Load the processor and model
model_id = "HuggingFaceTB/SmolVLM-500M-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=dtype)
# Move model to available device
model = model.to(device)
SmolVLM setup explained:
- AutoProcessor: Converts images and text into the format the model understands
- AutoModelForVision2Seq: Loads the actual SmolVLM neural network weights
- torch.float16: Uses half-precision, which halves the memory needed for the weights compared with float32
- Device detection: Automatically uses GPU if available for faster inference
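The memory saving from half-precision follows directly from the bytes used per parameter. The sketch below works through the arithmetic, assuming a nominal 500 million parameters (the exact count may differ) and counting only the weights, not activations or other runtime overhead.

```python
# Assumption: a nominal 500 million parameters; the real count may differ.
NUM_PARAMS = 500_000_000
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(num_params, dtype):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
# float32: 2.0 GB
# float16: 1.0 GB
```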
Now, let’s create a helper function to analyze a single image with a natural language question:
def analyze_image_with_smolvlm(image, question, max_tokens=200):
    """Analyze an image with SmolVLM using a natural language question."""
    # Format input as chat conversation
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question}
        ]
    }]
    # Convert to model input format
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)
    # Generate response
    generated_ids = model.generate(
        **inputs, max_new_tokens=max_tokens, do_sample=True, temperature=0.3
    )
    # Extract and return the response text
    response = processor.batch_decode(
        generated_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0]
    return response
The function above does the following:
- apply_chat_template: Formats the conversation for SmolVLM’s expected input structure
- processor(): Tokenizes text and preprocesses images into tensors
- model.generate(): Runs the actual AI inference with configurable parameters
- batch_decode(): Converts model output tokens back to readable text
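One detail in the helper that is easy to miss is the slice applied before decoding: the generated sequence starts with the prompt tokens, so the function drops the first `inputs.input_ids.shape[1]` positions to keep only the answer. The sketch below mimics that step with plain Python lists and made-up token IDs; it is an illustration, not the library's tensor API.

```python
# Made-up token IDs for illustration; real IDs come from the processor.
prompt_ids = [101, 7, 8, 9]          # tokens of the formatted prompt
new_ids = [42, 43, 44]               # tokens produced by the model
generated = [prompt_ids + new_ids]   # one row per input: prompt first, then new tokens

prompt_length = len(prompt_ids)      # plays the role of inputs.input_ids.shape[1]
answer_only = [row[prompt_length:] for row in generated]
print(answer_only)  # [[42, 43, 44]]
```

Without the slice, the decoded text would repeat the question before the answer.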
Next, create a helper function to print each question alongside its answer:
def print_qa_results(questions, answers, separator_length=40):
    """Print question and answer pairs with formatted output."""
    for question, answer in zip(questions, answers):
        print(f"Question: {question}")
        print(f"Answer: {answer}")
        print("-" * separator_length)
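Printing works well in a notebook, but findings usually need to be documented for stakeholders too. As one possible variation, the sketch below returns the same question and answer pairs as a Markdown string that can be pasted into a report. `qa_to_markdown` is a hypothetical helper introduced here, and the sample answer is a placeholder.

```python
def qa_to_markdown(title, questions, answers):
    """Format question/answer pairs as a Markdown section for a written report."""
    lines = [f"## {title}", ""]
    for question, answer in zip(questions, answers):
        lines.append(f"- **{question}** {answer.strip()}")
    return "\n".join(lines)


# Placeholder input so the sketch runs without the model
report = qa_to_markdown(
    "Sector heatmap review",
    ["What type of chart is this?"],
    ["Heatmap.\n"],
)
print(report)
```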
Chart Analysis
Let’s put our helper function to work by analyzing a real heatmap using natural language queries.
The example below loads a financial correlation heatmap and asks three questions about it:
# Load a complex chart for analysis - financial correlation heatmap
image_url = (
    "https://eodhd.com/financial-academy/wp-content/uploads/2023/12/heatmap_sector.png"
)
image = Image.open(requests.get(image_url, stream=True).raw)
# Ask questions about the chart
questions = [
    "What type of chart is this?",
    "What are the main trends shown in this visualization?",
    "What insights can you derive from this data?",
]
answers = [analyze_image_with_smolvlm(image, q) for q in questions]
print_qa_results(questions, answers, separator_length=50)
Output:
Question: What type of chart is this?
Answer: Heatmap.
--------------------------------------------------
Question: What are the main trends shown in this visualization?
Answer: This is a sector heat map showing percentage allocation across financial sectors. Consumer Staples has the highest allocation, followed by Energy and Industrials sectors.
--------------------------------------------------
Question: What insights can you derive from this data?
Answer: Key insight: Consumer-focused sectors (services and staples) dominate the market, with communication services also performing strongly.
--------------------------------------------------
The model provides direct answers about chart type, trends, and insights without requiring manual preprocessing or specialized analysis tools.
Document Understanding
SmolVLM excels at extracting information from documents, receipts, and forms. Let’s test this with a receipt and ask the following questions:
# Analyze a receipt or invoice
receipt_url = "https://raw.githubusercontent.com/mistralai/cookbook/main/mistral/ocr/receipt.png"
receipt_image = Image.open(requests.get(receipt_url, stream=True).raw)
# Document analysis questions
document_questions = [
    "What type of document is this?",
    "What is the total amount?",
    "What items can you identify?",
]
answers = [analyze_image_with_smolvlm(receipt_image, q, max_tokens=150) for q in document_questions]
print_qa_results(document_questions, answers)
Output:
Question: What type of document is this?
Answer: This is an invoice or receipt for a parking permit.
----------------------------------------
Question: What is the total amount?
Answer: The total amount is $15.00.
----------------------------------------
Question: What items can you identify?
Answer: The text contains a list of items, but specific details about each item are not provided.
----------------------------------------
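Answers come back as free text, so a downstream system usually needs one more step to turn something like "The total amount is $15.00." into a number. The sketch below uses a regular expression for that; `extract_amount` is a hypothetical helper, and it assumes dollar amounts written with a leading `$`. Other currencies or formats would need their own patterns.

```python
import re
from decimal import Decimal


def extract_amount(answer):
    """Return the first dollar amount found in the text as a Decimal, or None."""
    match = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)", answer)
    if match is None:
        return None
    # Remove thousands separators before converting
    return Decimal(match.group(1).replace(",", ""))


print(extract_amount("The total amount is $15.00."))  # 15.00
print(extract_amount("No total is visible."))         # None
```

Using `Decimal` rather than `float` avoids rounding surprises when the amounts are later summed or compared.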
Image Content Summarization
We can also use SmolVLM to generate a summary of an image.
Let’s load a street scene and ask the following questions:
# Analyze a street scene
street_url = "https://images.unsplash.com/photo-1449824913935-59a10b8d2000?w=500"
street_image = Image.open(requests.get(street_url, stream=True).raw)
# Summarization prompts
summary_prompts = [
    "What's the main activity happening here?",
    "Summarize the key elements of this scene"
]
answers = [analyze_image_with_smolvlm(street_image, q, max_tokens=250) for q in summary_prompts]
print_qa_results(summary_prompts, answers, separator_length=50)
Output:
Question: What's the main activity happening here?
Answer: There are many people on the road in the image.
--------------------------------------------------
Question: Summarize the key elements of this scene
Answer: A busy city street with a few people walking and a few cars.
--------------------------------------------------
Automated Business Intelligence with Chart Analysis
See how SmolVLM integrates into a real data analysis workflow. We’ll create a sales performance chart and then analyze it with natural language queries.
First, generate sample sales data and create the visualization:
# Generate quarterly sales data
import pandas as pd
import matplotlib.pyplot as plt
data = {
    'Quarter': ['Q1 2024', 'Q2 2024', 'Q3 2024', 'Q4 2024'],
    'Product A': [45000, 52000, 48000, 61000],
    'Product B': [38000, 41000, 39000, 44000],
    'Product C': [23000, 28000, 32000, 35000]
}
df = pd.DataFrame(data)
df.set_index('Quarter').plot(kind='bar', figsize=(10, 6))
plt.title('Quarterly Sales Performance by Product')
plt.ylabel('Sales ($)')
plt.xticks(rotation=45)
plt.savefig('sales_chart.png', dpi=300)
plt.show()
Now analyze the chart with SmolVLM using targeted business questions:
from PIL import Image
# Load the chart we just created
chart_image = Image.open('sales_chart.png')
questions = [
    "What quarter had the best overall performance?",
    "Are there any concerning trends I should investigate?"
]
for question in questions:
    answer = analyze_image_with_smolvlm(chart_image, question)
    print(f"Q: {question}")
    print(f"A: {answer}\n")
Output:
Q: What quarter had the best overall performance?
A: Q4 2024.
Q: Are there any concerning trends I should investigate?
A: The sales figures for Q1 2024 and Q4 2024 are significantly higher than those for Q2 2024 and Q3 2024.
This workflow demonstrates SmolVLM’s value in everyday data analysis: create visualizations with your preferred tools, then get instant insights through natural language queries.
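Because this chart was generated from known numbers, the model's answers can be checked against the underlying data as part of the human review step. The sketch below recomputes the quarterly totals from the same values in plain Python and compares the best quarter with the model's first answer. The `model_answer` string is copied from the output above, and the substring check is a deliberately simple assumption about how answers are phrased.

```python
data = {
    "Quarter": ["Q1 2024", "Q2 2024", "Q3 2024", "Q4 2024"],
    "Product A": [45000, 52000, 48000, 61000],
    "Product B": [38000, 41000, 39000, 44000],
    "Product C": [23000, 28000, 32000, 35000],
}

# Sum all products for each quarter
products = [name for name in data if name != "Quarter"]
totals = {
    quarter: sum(data[product][i] for product in products)
    for i, quarter in enumerate(data["Quarter"])
}
best_quarter = max(totals, key=totals.get)

model_answer = "Q4 2024."  # first answer quoted in the output above
print(totals)
# {'Q1 2024': 106000, 'Q2 2024': 121000, 'Q3 2024': 119000, 'Q4 2024': 140000}
print("Model agrees with data:", best_quarter in model_answer)
# Model agrees with data: True
```

The totals rise from Q1 to Q2, dip slightly in Q3, and peak in Q4, which is worth comparing with the model's second answer before passing it on.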
Building a Web Dashboard
To make chart analysis accessible for non-technical stakeholders, you can create an interactive web dashboard using Gradio. This Python framework enables rapid deployment of machine learning applications with just a few lines of code.
To use Gradio, start by installing it:
pip install gradio
Next, create a dashboard function that combines image upload with question input. This function uses the SmolVLM helper we defined earlier:
def dashboard_analysis(image, question):
    if image is None:
        return "Please upload an image to analyze."
    # Use the analyze_image_with_smolvlm function from earlier in the article
    return analyze_image_with_smolvlm(image, question, max_tokens=200)
If running this dashboard code independently, make sure to include the model loading and helper function from the “Getting Started” section above.
The function handles image uploads and passes user questions directly to SmolVLM. Now build the Gradio interface with three components:
import gradio as gr
dashboard = gr.Interface(
    fn=dashboard_analysis,
    inputs=[
        gr.Image(type="pil", label="Upload Chart or Visualization"),
        gr.Textbox(value="What are the key trends in this chart?", label="Ask a Question")
    ],
    outputs=gr.Textbox(label="Analysis Results"),
    title="SmolVLM Chart Analysis Dashboard"
)
Launch the dashboard to make it accessible to your team:
dashboard.launch(share=True)  # share=True creates a temporary public link
The dashboard provides a clean interface where users upload charts and ask questions in natural language.
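The dashboard function shown earlier only guards against a missing image. For a tool shared with non-technical users, a slightly more defensive wrapper may be worth considering. The sketch below is one possible shape: `make_dashboard_fn` and `fake_analyze` are hypothetical names, the character limit is an arbitrary assumption, and the stub analyzer stands in for the model so the example runs on its own.

```python
def make_dashboard_fn(analyze, max_question_chars=500):
    """Wrap an analyzer with basic input checks suited to a shared dashboard."""
    def dashboard_fn(image, question):
        if image is None:
            return "Please upload an image to analyze."
        question = (question or "").strip()
        if not question:
            return "Please enter a question about the image."
        if len(question) > max_question_chars:
            return f"Please keep the question to {max_question_chars} characters or fewer."
        try:
            return analyze(image, question)
        except Exception as error:  # show failures in the UI instead of crashing
            return f"Analysis failed: {error}"
    return dashboard_fn


# Stand-in analyzer so the sketch runs without loading the model
def fake_analyze(image, question):
    return f"(stub) answer to: {question}"


dashboard_fn = make_dashboard_fn(fake_analyze)
print(dashboard_fn(None, "What is this?"))
print(dashboard_fn(object(), "   "))
print(dashboard_fn(object(), "What are the key trends?"))
```

In the real dashboard, the wrapper would be built with `make_dashboard_fn(analyze_image_with_smolvlm)` and passed as `fn` to `gr.Interface`.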
Conclusion
SmolVLM democratizes visual analysis for data scientists. No specialized computer vision expertise needed – just natural language queries on your existing charts and visualizations.
Practical applications using SmolVLM’s capabilities:
- Chart analysis: Upload any matplotlib/seaborn visualization and ask about patterns, outliers, and trends
- Document understanding: Extract information from scanned forms, receipts, and research papers automatically
- Image summarization: Generate descriptions of complex multi-panel figures from research publications
- Dashboard automation: Batch process weekly report screenshots for automated insight generation
- Visual validation: Ask “Does this chart support my hypothesis?” to strengthen analytical conclusions
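The batch-processing idea in the list above can be sketched as a loop over a folder of screenshots. This is a minimal illustration under stated assumptions: `batch_analyze` is a hypothetical helper, the analyzer and image loader are passed in as arguments, and the demo uses empty placeholder files with stand-in functions so it runs without the model.

```python
import tempfile
from pathlib import Path


def batch_analyze(folder, question, analyze, load_image, pattern="*.png"):
    """Ask the same question about every matching image in a folder."""
    results = []
    for path in sorted(Path(folder).glob(pattern)):
        image = load_image(path)
        results.append({
            "file": path.name,
            "question": question,
            "answer": analyze(image, question),
        })
    return results


# Demo with placeholder files and stand-in functions
with tempfile.TemporaryDirectory() as folder:
    for name in ("week_02.png", "week_01.png"):
        (Path(folder) / name).write_bytes(b"")  # empty files, not real images
    demo = batch_analyze(
        folder,
        "What are the key trends?",
        analyze=lambda image, question: f"(stub) {image}",
        load_image=lambda path: path.stem,
    )

for row in demo:
    print(row["file"], "->", row["answer"])
```

With the model loaded, the same function could be called with `analyze=analyze_image_with_smolvlm` and `load_image=Image.open`, and the resulting rows reviewed by a person before they are shared.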
Related Resources
Here are some related resources that you may find useful: