Build Production-Ready RAG Systems with MLflow Quality Metrics

Table of Contents

What is MLflow GenAI?
Article Overview
Quick Setup

Installation
Environment Configuration
Importing Libraries
RAG System with Ollama Llama3.2
Evaluation Dataset

Core RAG Metrics

Faithfulness Evaluation
Answer Relevance Evaluation

Running and Interpreting Results

Comprehensive Evaluation with MLflow
Viewing Results in MLflow Dashboard

Interpreting the Results
Next Steps

How do you know if your AI model actually works? AI model outputs can be inconsistent – sometimes providing inaccurate responses, irrelevant information, or answers that don’t align with the input context. Manual evaluation of these issues is time-consuming and doesn’t scale as your system grows.
MLflow for GenAI automates evaluation across two critical areas: faithfulness (responses match retrieved context) and answer relevance (outputs address user questions). This guide teaches you to implement these evaluations and systematically improve your AI system’s performance.
What is MLflow GenAI?
MLflow is an open-source platform for managing machine learning lifecycles – tracking experiments, packaging models, and managing deployments. Traditional MLflow focuses on numerical metrics like accuracy and loss.
MLflow for GenAI extends this foundation specifically for generative AI applications. It evaluates subjective qualities that numerical metrics can’t capture:

Response relevance: Measures whether outputs address user questions
Factual accuracy: Checks if responses stay truthful to source material
Context adherence: Evaluates whether answers stick to retrieved information
Automated scoring: Uses AI judges instead of manual evaluation
Scalable assessment: Handles large datasets without human reviewers

Article Overview
This guide walks you through a complete AI evaluation workflow. You’ll build a RAG (Retrieval-Augmented Generation) system, test it with real data, and measure its performance using automated tools. For comprehensive RAG fundamentals, see our LangChain and Ollama guide.
What you’ll build:

RAG system: Create a question-answering system using Ollama’s Llama3.2
Test dataset: Design evaluation data that reveals system strengths and weaknesses
Automated evaluation: Use OpenAI-powered metrics to score response quality
MLflow interface: Track experiments and visualize results in an interactive dashboard
Results analysis: Interpret scores and identify areas for improvement

Quick Setup
Installation
Start by installing the necessary packages for this guide.
pip install 'mlflow>=3.0.0rc0' langchain-ollama pandas

Environment Configuration
We’ll use Ollama to run Llama3.2 locally for our RAG system. Ollama lets you download and run AI models on your computer, keeping your question-answering data private while eliminating API costs.
Ensure you have Ollama installed locally and the Llama3.2 model downloaded.
# Install Ollama (if not already installed)
# Visit https://ollama.ai for installation instructions

# Pull the Llama3.2 model
ollama pull llama3.2

Importing Libraries
Import the necessary libraries for our RAG system and MLflow evaluation.
import os
import pandas as pd
import mlflow
from mlflow.metrics.genai import faithfulness, answer_relevance, make_genai_metric
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Note: Ensure Ollama is installed and llama3.2 model is available
# Run: ollama pull llama3.2

RAG System with Ollama Llama3.2
We’ll create a real RAG (Retrieval-Augmented Generation) system using Ollama’s Llama3.2 model that retrieves context and generates answers.
This function creates a question-answering system that:

Takes a question and available documents as input
Uses the most relevant documents to provide context
Generates accurate answers using the Llama3.2 model
Returns both the answer and the sources used

def ollama_rag_system(question, context_docs):
    """Real RAG system using Ollama Llama3.2"""
    # Retrieve top 2 most relevant documents
    retrieved_context = "\n".join(context_docs[:2])

    # Create prompt template
    prompt = ChatPromptTemplate.from_template(
        """Answer the question based on the provided context.
        Be concise and accurate.

        Context: {context}
        Question: {question}

        Answer:"""
    )

    # Initialize Llama3.2 model
    llm = ChatOllama(model="llama3.2", temperature=0)

    # Create chain and get response
    chain = prompt | llm | StrOutputParser()
    answer = chain.invoke({"context": retrieved_context, "question": question})

    return {
        "answer": answer,
        "retrieved_context": retrieved_context,
        "retrieved_docs": context_docs[:2],
    }

Evaluation Dataset
An evaluation dataset helps you measure system quality systematically. It reveals how well your RAG system handles different question types and identifies areas for improvement.
To create an evaluation dataset, start with a knowledge base of documents that answer questions. Build the dataset with questions, expected answers, and context from this knowledge base.
knowledge_base = [
    "MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides experiment tracking, model packaging, versioning, and deployment capabilities.",
    "RAG systems combine retrieval and generation to provide accurate, contextual responses. They first retrieve relevant documents then generate answers.",
    "Vector databases store document embeddings for efficient similarity search. They enable fast retrieval of relevant information."
]

eval_data = pd.DataFrame({
    "question": [
        "What is MLflow?",
        "How does RAG work?",
        "What are vector databases used for?"
    ],
    "expected_answer": [
        "MLflow is an open-source platform for managing machine learning workflows",
        "RAG combines retrieval and generation for contextual responses",
        "Vector databases store embeddings for similarity search"
    ],
    "context": [
        knowledge_base[0],
        knowledge_base[1],
        knowledge_base[2]
    ]
})

eval_data

| Index | Question | Expected Answer | Context |
| --- | --- | --- | --- |
| 0 | What is MLflow? | Open-source ML workflow platform | MLflow manages ML lifecycles with tracking, packaging… |
| 1 | How does RAG work? | Combines retrieval and generation | RAG systems retrieve documents then generate answers… |
| 2 | What are vector databases used for? | Store embeddings for similarity search | Vector databases enable fast retrieval of information… |

Generate answers for each question using the RAG system. This creates the responses we’ll evaluate for quality and accuracy.
# Generate answers for evaluation
def generate_answers(row):
    result = ollama_rag_system(row['question'], [row['context']])
    return result['answer']

eval_data['generated_answer'] = eval_data.apply(generate_answers, axis=1)

Print the first row to see the question, context, and generated answer.
# Display the first row to see question, context, and answer
print(f"Question: {eval_data.iloc[0]['question']}")
print(f"Context: {eval_data.iloc[0]['context']}")
print(f"Generated Answer: {eval_data.iloc[0]['generated_answer']}")

The output displays three key components:

The question shows what we asked.
The context shows which documents the system used to generate the answer.
The answer contains the RAG system’s response.

Question: What is MLflow?
Context: MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides experiment tracking, model packaging, versioning, and deployment capabilities.
Generated Answer: MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, providing features such as experiment tracking, model packaging, versioning, and deployment capabilities.

Core RAG Metrics
Faithfulness Evaluation
Faithfulness measures whether the generated answer stays true to the retrieved context, preventing hallucination:
In the code below, we define the function evaluate_faithfulness that:

Creates an AI judge using GPT-4 to evaluate faithfulness.
Takes the generated answer, question, and context as input.
Returns a score from 1-5, where 5 indicates perfect faithfulness.

We then apply this function to the evaluation dataset to get the faithfulness score for each question.
# Evaluate faithfulness for each answer
def evaluate_faithfulness(row):
    # Initialize faithfulness metric with OpenAI GPT-4 as judge
    faithfulness_metric = faithfulness(model="openai:/gpt-4")
    score = faithfulness_metric(
        predictions=[row['generated_answer']],
        inputs=[row['question']],
        context=[row['context']],
    )
    return score.scores[0]

eval_data['faithfulness_score'] = eval_data.apply(evaluate_faithfulness, axis=1)
print("Faithfulness Evaluation Results:")
print(eval_data[['question', 'faithfulness_score']])

Faithfulness Evaluation Results:

| Question | Faithfulness Score |
| --- | --- |
| What is MLflow? | 5 |
| How does RAG work? | 5 |
| What are vector databases used for? | 5 |

Perfect scores of 5 show the RAG system answers remain faithful to the source material. No hallucination or unsupported claims were detected.
Answer Relevance Evaluation
Answer relevance measures whether the response actually addresses the question asked:
# Evaluate answer relevance
def evaluate_relevance(row):
    # Initialize answer relevance metric
    relevance_metric = answer_relevance(model="openai:/gpt-4")
    score = relevance_metric(
        predictions=[row['generated_answer']],
        inputs=[row['question']]
    )
    return score.scores[0]

eval_data['relevance_score'] = eval_data.apply(evaluate_relevance, axis=1)
print("Answer Relevance Results:")
print(eval_data[['question', 'relevance_score']])

Answer Relevance Results:

| Question | Relevance Score |
| --- | --- |
| What is MLflow? | 5 |
| How does RAG work? | 5 |
| What are vector databases used for? | 5 |

Perfect scores of 5 show the RAG system’s responses directly address the questions asked. No irrelevant or off-topic answers were generated.
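Beyond the built-in judges, the make_genai_metric helper imported earlier lets you define your own AI-judged criteria. The sketch below is a minimal, hypothetical example: the metric name, definition, and grading prompt are illustrative, and the metric is applied the same way as the built-in ones.
# Minimal sketch of a custom AI-judged metric (the name, definition, and
# grading prompt are illustrative, not part of the original tutorial)
conciseness_metric = make_genai_metric(
    name="conciseness",
    definition="Measures whether the answer is brief while still addressing the question.",
    grading_prompt=(
        "Score from 1 to 5, where 5 means the answer is as short as possible "
        "without dropping information needed to answer the question."
    ),
    model="openai:/gpt-4",
    greater_is_better=True,
)

# Apply it exactly like the built-in metrics above
score = conciseness_metric(
    predictions=[eval_data.iloc[0]["generated_answer"]],
    inputs=[eval_data.iloc[0]["question"]],
)
print(score.scores[0])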
Running and Interpreting Results
We’ll now combine individual metrics into a comprehensive MLflow evaluation. This creates detailed reports, tracks experiments, and enables result comparison. Finally, we’ll analyze the scores to identify areas for improvement.
Comprehensive Evaluation with MLflow
Start by using MLflow’s evaluation framework to run all metrics together.
The following code:

Defines a model function that MLflow can evaluate systematically
Takes a DataFrame of questions and processes them through the RAG system
Converts results to a list format required by MLflow
Combines all metrics into a single evaluation run for comprehensive reporting

# Prepare data for MLflow evaluation
def rag_model_function(input_df):
    """Model function for MLflow evaluation"""
    def process_row(row):
        result = ollama_rag_system(row["question"], [row["context"]])
        return result["answer"]

    return input_df.apply(process_row, axis=1).tolist()

# Define the judge metrics at this scope so mlflow.evaluate can reference them
faithfulness_metric = faithfulness(model="openai:/gpt-4")
relevance_metric = answer_relevance(model="openai:/gpt-4")

# Run comprehensive evaluation
with mlflow.start_run() as run:
    evaluation_results = mlflow.evaluate(
        model=rag_model_function,
        data=eval_data[
            ["question", "context", "expected_answer"]
        ],  # Include expected_answer column
        targets="expected_answer",
        extra_metrics=[faithfulness_metric, relevance_metric],
        evaluator_config={
            "col_mapping": {
                "inputs": "question",
                "context": "context",
                "predictions": "predictions",
                "targets": "expected_answer",
            }
        },
    )

After running the code, the evaluation results get stored in MLflow’s tracking system. You can now compare different runs and analyze performance metrics through the dashboard.
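If you prefer to compare runs in code rather than in the UI, mlflow.search_runs returns the logged runs as a pandas DataFrame. The following is a minimal sketch; metric columns follow MLflow’s "metrics.&lt;name&gt;" naming, and the column guard keeps the snippet from failing if a metric was not logged.
# Sketch: compare evaluation runs programmatically
runs = mlflow.search_runs(order_by=["start_time DESC"])

cols = [
    "run_id",
    "metrics.faithfulness/v1/mean",
    "metrics.answer_relevance/v1/mean",
]
# Keep only the columns that were actually logged
print(runs[[c for c in cols if c in runs.columns]].head())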
Viewing Results in MLflow Dashboard
Launch the MLflow UI to explore evaluation results interactively:
mlflow ui

Navigate to http://localhost:5000 to access the dashboard.
The MLflow dashboard shows the Experiments table with two evaluation runs. Each run displays the run name (like “bold-slug-816”), creation time, dataset information, and duration. You can select runs to compare their performance metrics.

Click on any experiment to see the details of the evaluation. When you scroll down to the Metrics section, you will see detailed evaluation metrics including faithfulness and relevance scores for each question.

Clicking on “Traces” will show you the detailed request-response pairs for each evaluation question for debugging and analysis.

Clicking on “Artifacts” reveals the evaluation results table containing the complete evaluation data, metric scores, and a downloadable format for external analysis.

Interpreting the Results
Raw scores need interpretation to drive improvements. Use MLflow’s evaluation data to identify specific areas for enhancement.
The analysis:

Extracts performance metrics from comprehensive evaluation results
Calculates mean scores across all questions for both metrics
Identifies underperforming questions that require attention
Generates targeted feedback for systematic improvement

def interpret_evaluation_results(evaluation_results):
    """Analyze MLflow evaluation results"""

    # Extract metrics and data
    metrics = evaluation_results.metrics
    eval_table = evaluation_results.tables['eval_results_table']

    # Overall performance
    avg_faithfulness = metrics.get('faithfulness/v1/mean', 0)
    avg_relevance = metrics.get('answer_relevance/v1/mean', 0)

    print("Average Scores:")
    print(f"Faithfulness: {avg_faithfulness:.2f}")
    print(f"Answer Relevance: {avg_relevance:.2f}")

    # Identify problematic questions
    low_performing = eval_table[
        (eval_table['faithfulness/v1/score'] < 3) |
        (eval_table['answer_relevance/v1/score'] < 3)
    ]

    if not low_performing.empty:
        print(f"\nQuestions needing improvement: {len(low_performing)}")
        for _, row in low_performing.iterrows():
            print(f"- {row['inputs']}")
    else:
        print("\nAll questions performing well!")

# Usage
interpret_evaluation_results(evaluation_results)

Average Scores:
Faithfulness: 5.00
Answer Relevance: 5.00

All questions performing well!

Perfect scores indicate the RAG system generates accurate, contextual responses without hallucination. This baseline establishes a benchmark for future system modifications and more complex evaluation datasets.
Next Steps
This evaluation framework provides the foundation for systematically improving your RAG system:

Regular Evaluation: Run these metrics on your test dataset with each system change
Threshold Setting: Establish minimum acceptable scores for each metric based on your requirements
Automated Monitoring: Integrate these evaluations into your CI/CD pipeline
Iterative Improvement: Use the insights to guide retrieval improvements, prompt engineering, and model selection

The combination of faithfulness, answer relevance, and retrieval quality metrics gives you a comprehensive view of your RAG system’s performance, enabling data-driven improvements and reliable quality assurance.
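As a concrete starting point for threshold setting and CI integration, the sketch below turns the mean scores from evaluation_results into a simple pass/fail gate. The threshold values are illustrative; choose ones that match your own requirements.
# Minimal quality-gate sketch for a CI pipeline (thresholds are illustrative)
MIN_FAITHFULNESS = 4.0
MIN_RELEVANCE = 4.0

metrics = evaluation_results.metrics
faithfulness_mean = metrics.get("faithfulness/v1/mean", 0)
relevance_mean = metrics.get("answer_relevance/v1/mean", 0)

if faithfulness_mean < MIN_FAITHFULNESS or relevance_mean < MIN_RELEVANCE:
    raise SystemExit(
        f"Quality gate failed: faithfulness={faithfulness_mean:.2f}, "
        f"relevance={relevance_mean:.2f}"
    )

print("Quality gate passed: safe to promote this RAG configuration.")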

Transform Any PDF into Searchable AI Data with Docling

Table of Contents

Setting Up Your Document Processing Pipeline

What is Docling?
What is RAG?

Quick Start: Your First Document Conversion
Export Options for Different Use Cases
Configuring PdfPipelineOptions for Advanced Processing

Enable Image Extraction
Table Recognition Enhancement
AI-Powered Content Understanding
Performance and Memory Management

Building Your RAG Pipeline

Tools for RAG Pipelines
Document Processing
Chunking
Creating a Vector Store

Conclusion

What if complex research papers could be transformed into AI-searchable data using fewer than 10 lines of Python?
Financial reports, research documents, and analytical papers often contain vital tables and formulas that traditional PDF tools fail to extract properly. This results in the loss of structured data that could inform key decisions.
Docling, developed by IBM Research, is an AI-first document processing tool that preserves the relationships between text, tables, and formulas. With just three lines of code, you can convert any document into structured data.
In this tutorial, you’ll learn how to build a complete pipeline that takes documents in any format and turns them into high-quality RAG-ready chunks for AI applications.
Setting Up Your Document Processing Pipeline
What is Docling?
Docling is an AI-first document processing tool developed by IBM Research. It transforms complex documents—like PDFs, Excel spreadsheets, and Word files—into structured data while preserving their original structure, including text, tables, and formulas.
To install Docling, run the following command:
pip install docling

What is RAG?
RAG (Retrieval-Augmented Generation) is an AI technique that combines document retrieval with language generation. Instead of relying solely on training data, RAG systems search through external documents to find relevant information, then use that context to generate accurate, up-to-date responses.
This process requires converting documents into structured, searchable chunks. Docling handles this conversion seamlessly.

Quick Start: Your First Document Conversion
Docling transforms any document into structured data with just three lines of code. Let’s see this in action by converting a PDF document – specifically, Docling’s own technical report from arXiv. It’s a good example because it contains many different element types, including tables, formulas, and plain text.
from docling.document_converter import DocumentConverter
import pandas as pd

# Initialize converter with default settings
converter = DocumentConverter()

# Convert any document format – we'll use the Docling technical report itself
source_url = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source_url)

# Access structured data immediately
doc = result.document
print(f"Successfully processed document from: {source_url}")

To iterate through each document element, we will use the doc.iterate_items() method. This method returns tuples of (item, level). For example:

(TextItem(label='paragraph', text='Introduction text…'), 0) – top-level paragraph
(TableItem(label='table', text='| Col1 | Col2 |…'), 1) – table at depth 1
(TextItem(label='heading', text='Section 2'), 0) – section heading

from collections import defaultdict

# Create a dictionary to categorize all document elements by type
element_types = defaultdict(list)

# Iterate through all document elements and group them by label
for item, _ in doc.iterate_items():
    element_type = item.label
    element_types[element_type].append(item)

# Display the breakdown of document structure
print("Document structure breakdown:")
for element_type, items in element_types.items():
    print(f" {element_type}: {len(items)} elements")

The output shows the different types of elements Docling extracted from the document.
Document structure breakdown:
picture: 13 elements
section_header: 31 elements
text: 102 elements
list_item: 22 elements
code: 2 elements
footnote: 1 elements
caption: 3 elements
table: 5 elements

Let’s look specifically for structured elements like tables and formulas that are crucial for RAG applications:
first_table = element_types["table"][0]
print(first_table.export_to_dataframe().to_markdown())

|    | CPU | Thread budget | native backend.TTS | native backend.Pages/s | native backend.Mem | pypdfium backend.TTS | pypdfium backend.Pages/s | pypdfium backend.Mem |
|---:|---|---|---|---|---|---|---|---|
|  0 | Apple M3 Max | 4 | 177 s 167 s | 1.27 1.34 | 6.20 GB | 103 s 92 s | 2.18 2.45 | 2.56 GB |
|  1 | (16 cores) Intel(R) E5-2690 | 16 4 16 | 375 s 244 s | 0.60 0.92 | 6.16 GB | 239 s 143 s | 0.94 1.57 | 2.42 GB |

Here is how the table looks in the original PDF:

The extracted table shows Docling’s accuracy and structural differences from the original PDF. Docling captured all numerical data and text perfectly but flattened the merged cell structure into separate columns.
While this loses visual formatting, it benefits RAG applications since each row contains complete information without complex cell merging logic.
Next, look at the first list item element:
first_list_items = element_types["list_item"][0:6]
for list_item in first_list_items:
    print(list_item.text)

· Converts PDF documents to JSON or Markdown format, stable and lightning fast
· Understands detailed page layout, reading order, locates figures and recovers table structures
· Extracts metadata from the document, such as title, authors, references and language
· Optionally applies OCR, e.g. for scanned PDFs
· Can be configured to be optimal for batch-mode (i.e high throughput, low time-to-solution) or interactive mode (compromise on efficiency, low time-to-solution)
· Can leverage different accelerators (GPU, MPS, etc).

These match the list items in the original PDF.

Look at the first caption element:
first_caption = element_types["caption"][0]
print(first_caption.text)

This matches the image caption in the original PDF.
Export Options for Different Use Cases
Docling provides multiple ways to export the document data, including Markdown, JSON, and dictionary formats.
For human review and documentation, Markdown format preserves the document structure beautifully.
# Human-readable markdown for review
markdown_content = doc.export_to_markdown()
print(markdown_content[:500] + "…")

<!-- image -->

## Docling Technical Report

Version 1.0

Christoph Auer Maksym Lysak Ahmed Nassar Michele Dolfi Nikolaos Livathinos Panos Vagenas Cesar Berrospi Ramis Matteo Omenetti Fabian Lindlbauer Kasper Dinkla Lokesh Mishra Yusik Kim Shubham Gupta Rafael Teixeira de Lima Valery Weber Lucas Morin Ingmar Meijer Viktor Kuropiatnyk Peter W. J. Staar

AI4K Group, IBM Research R¨ uschlikon, Switzerland

## Abstract

This technical report introduces Docling , an easy to use, self-contained, MITli…

Compare this to the original PDF:

Docling preserves all original content while converting complex PDF formatting into clean markdown. Every author name, title, and abstract text remains intact, creating searchable structure perfect for RAG applications.
For programmatic processing and API integrations, JSON format provides structured access to all document elements:
import json

# JSON for programmatic processing
json_dict = doc.export_to_dict()

print('JSON keys:', json_dict.keys())

JSON keys: dict_keys(['schema_name', 'version', 'name', 'origin', 'furniture', 'body', 'groups', 'texts', 'pictures', 'tables', 'key_value_items', 'form_items', 'pages'])

The JSON structure reveals Docling’s comprehensive document analysis. Key sections include texts for paragraphs, tables for structured data, pictures for images, and pages for layout information.
For Python development workflows, the dictionary format enables immediate access to all document elements.
# Python dictionary for immediate use
dict_repr = doc.export_to_dict()

# Preview the structure
num_texts = len(dict_repr['texts'])
num_tables = len(dict_repr['tables'])

print(f"Text elements: {num_texts}")
print(f"Table elements: {num_tables}")

Text elements: 985
Table elements: 5

Configuring PdfPipelineOptions for Advanced Processing
The default Docling configuration works well for most documents, but PdfPipelineOptions unlocks advanced processing capabilities. These options control OCR engines, table recognition, AI enrichments, and performance settings.
PdfPipelineOptions becomes essential when working with scanned documents, complex layouts, or specialized content requiring AI-powered understanding.
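For example, scanned PDFs need OCR before any text can be extracted. The sketch below enables Docling’s bundled EasyOCR backend; treat the exact option values (such as the language list) as illustrative choices for your own documents.
from docling.datamodel.pipeline_options import PdfPipelineOptions, EasyOcrOptions
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption

# Sketch: turn on OCR for scanned PDFs (language list is illustrative)
pipeline_options = PdfPipelineOptions(
    do_ocr=True,
    ocr_options=EasyOcrOptions(lang=["en"]),
)

converter_ocr = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)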
Enable Image Extraction
By default, Docling does not extract images from the document. However, you can enable image extraction by setting the generate_picture_images option to True.
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat
from docling.document_converter import PdfFormatOption

pipeline_options = PdfPipelineOptions(generate_picture_images=True)

# Create converter with image extraction enabled
converter_enhanced = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)

result_enhanced = converter_enhanced.convert("https://arxiv.org/pdf/2408.09869")
doc_enhanced = result_enhanced.document

Display the first image:
# Extract and display the first image
from IPython.display import Image, display

for item, _ in doc_enhanced.iterate_items():
    if item.label == "picture":
        image_data = item.image

        # Get the image URI
        uri = str(image_data.uri)

        # Display the image using IPython
        display(Image(url=uri))
        break

The output image matches the first image of the PDF.
Table Recognition Enhancement
To use the more sophisticated AI model for table extraction instead of the default fast model, you can set the table_structure_options.mode to TableFormerMode.ACCURATE.
from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
from docling.datamodel.base_models import InputFormat
from docling.document_converter import PdfFormatOption

# Enhanced table processing for complex layouts
pipeline_options = PdfPipelineOptions()
pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE

# Create converter with enhanced table processing
converter_enhanced = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)

result_enhanced = converter_enhanced.convert("https://arxiv.org/pdf/2408.09869")
doc_enhanced = result_enhanced.document

AI-Powered Content Understanding
AI enrichments enhance extracted content with semantic understanding. Picture descriptions, formula detection, and code parsing improve RAG accuracy by adding crucial context.
In the code below, we:

Set the do_picture_description option to True to enable picture description extraction
Set the picture_description_options option to use the SmolVLM-256M-Instruct model from Hugging Face.

from docling.datamodel.pipeline_options import PictureDescriptionVlmOptions

# AI-powered content enrichment
pipeline_options = PdfPipelineOptions(
    do_picture_description=True,  # AI-generated image descriptions
    picture_description_options=PictureDescriptionVlmOptions(
        repo_id="HuggingFaceTB/SmolVLM-256M-Instruct",
        prompt="Describe this picture. Be precise and concise.",
    ),
    generate_picture_images=True,
)

converter_enhanced = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)

result_enhanced = converter_enhanced.convert("https://arxiv.org/pdf/2408.09869")
doc_enhanced = result_enhanced.document

Extract the picture description from the second picture:
second_picture = doc_enhanced.pictures[1]

print(f"Caption: {second_picture.caption_text(doc=doc_enhanced)}")

# Check for annotations
for annotation in second_picture.annotations:
    print(annotation.text)

Caption: Figure 1: Sketch of Docling's default processing pipeline. The inner part of the model pipeline is easily customizable and extensible.
### Image Description

The image is a flowchart that depicts a sequence of steps from a document, likely a report or a document. The flowchart is structured with various elements such as text, icons, and arrows. Here is a detailed description of the flowchart:

#### Step 1: Parse
– **Description:** The first step in the process is to parse the document. This involves converting the text into a format that can be easily understood by the user.

#### Step 2: Ocr
– **Description:** The second step is to perform OCR (Optical Character Recognition) on the document. This involves converting the text into a format that can be easily read by the OCR software.

#### Step 3: Layout Analysis
– **Description:** The third step is to analyze the document's layout. This involves examining the document's structure, including the layout of the text, the alignment of the text, and the alignment of the document's content

Here is the original image:

The detailed description shows how Docling’s picture analysis transforms visual content into text that can be indexed and searched, making diagrams accessible to RAG systems.
Performance and Memory Management
Processing a large document can be time-consuming. To speed up the process, we can use:

The page_range option to process only a specific page range.
The max_num_pages option to limit the number of pages to process.
The images_scale option to reduce the image resolution for speed.
The generate_page_images option to skip page images to save memory.
The do_table_structure option to skip table structure extraction.
The enable_parallel_processing option to use multiple cores.

# Optimized for large documents
pipeline_options = PdfPipelineOptions(
    max_num_pages=4,  # Limit processing to first 4 pages
    page_range=[1, 3],  # Process specific page range
    generate_page_images=False,  # Skip page images to save memory
    do_table_structure=False,  # Skip table structure extraction
    enable_parallel_processing=True  # Use multiple cores
)

Building Your RAG Pipeline
We’ll build our RAG pipeline in five steps:

Document Processing: Use Docling to convert documents into structured data
Chunking: Break documents into smaller, searchable pieces
Create Embeddings: Convert text chunks into vector representations
Store in Vector Database: Save embeddings in FAISS for fast similarity search
Query: Retrieve relevant chunks and generate contextual responses

Tools for RAG Pipelines
Building RAG pipelines requires three essential tools:

Docling: converts documents into structured data
LangChain: manages document workflows, chain orchestration, and provides embedding models
FAISS: stores and retrieves document chunks

These tools work together to create complete RAG pipelines that can process, store, and retrieve document content intelligently.
LangChain
LangChain simplifies building AI applications by providing components for document loading, text processing, and chain orchestration. It integrates seamlessly with vector stores and language models.
For a comprehensive introduction to LangChain fundamentals and local AI workflows, see our LangChain and Ollama guide.
FAISS
FAISS (Facebook AI Similarity Search) is a library for efficient similarity search in high-dimensional spaces. It enables fast retrieval of the most relevant document chunks based on embedding similarity.
For production use cases requiring robust database integration, consider implementing semantic search with pgvector in PostgreSQL or using Pinecone for cloud-based vector search as alternatives to FAISS.
Let’s install the additional packages for RAG functionality:
# Install additional packages for RAG functionality
pip install docling sentence-transformers langchain-community langchain-huggingface faiss-cpu
# Note: Use faiss-gpu if you have CUDA support

Document Processing
Convert the document into structured data using Docling.
from docling.document_converter import DocumentConverter

# Initialize converter with default settings
converter = DocumentConverter()

# Convert the document into structured data
source_url = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source_url)

# Access structured data immediately
doc = result.document

Chunking
AI models have limited context windows that can’t process entire documents at once. Chunking solves this by breaking documents into smaller, searchable pieces that fit within these constraints. This improves retrieval accuracy by finding the most relevant sections rather than entire documents.
Docling provides two main chunking strategies:

HierarchicalChunker: Focuses purely on document structure, creating chunks based on headings and sections
HybridChunker: Combines structure-aware chunking with token-based limits, preserving document hierarchy while respecting model constraints

Let’s compare how these chunkers process the same document.
First, create a helper function to print the chunk content:
def print_chunk(chunk):
    print(f"Chunk length: {len(chunk.text)} characters")
    if len(chunk.text) > 30:
        print(f"Chunk content: {chunk.text[:30]}…{chunk.text[-30:]}")
    else:
        print(f"Chunk content: {chunk.text}")
    print("-" * 50)

Next, process the document with the HierarchicalChunker:
from docling.chunking import HierarchicalChunker

# Process with HierarchicalChunker (structure-based)
hierarchical_chunker = HierarchicalChunker()
hierarchical_chunks = list(hierarchical_chunker.chunk(doc))

print(f"HierarchicalChunker: {len(hierarchical_chunks)} chunks")

# Print the first 5 chunks
for chunk in hierarchical_chunks[:5]:
    print_chunk(chunk)

HierarchicalChunker: 114 chunks
Chunk length: 11 characters
Chunk content: Version 1.0
--------------------------------------------------
Chunk length: 295 characters
Chunk content: Christoph Auer Maksym Lysak Ah… Kuropiatnyk Peter W. J. Staar
--------------------------------------------------
Chunk length: 50 characters
Chunk content: AI4K Group, IBM Research R¨ us…arch R¨ uschlikon, Switzerland
--------------------------------------------------
Chunk length: 431 characters
Chunk content: This technical report introduc…on of new features and models.
--------------------------------------------------
Chunk length: 792 characters
Chunk content: Converting PDF documents back … gap to proprietary solutions.
--------------------------------------------------

Compare this to the HybridChunker:
from docling.chunking import HybridChunker

# Process with HybridChunker (token-aware)
hybrid_chunker = HybridChunker(max_tokens=512, overlap_tokens=50)
hybrid_chunks = list(hybrid_chunker.chunk(doc))

print(f"HybridChunker: {len(hybrid_chunks)} chunks")

# Print the first 5 chunks
for chunk in hybrid_chunks[:5]:
    print_chunk(chunk)

HybridChunker: 50 chunks
Chunk length: 358 characters
Chunk content: Version 1.0
Christoph Auer Mak…arch R¨ uschlikon, Switzerland
--------------------------------------------------
Chunk length: 431 characters
Chunk content: This technical report introduc…on of new features and models.
--------------------------------------------------
Chunk length: 1858 characters
Chunk content: Converting PDF documents back … accelerators (GPU, MPS, etc).
--------------------------------------------------
Chunk length: 1436 characters
Chunk content: To use Docling, you can simply…and run it inside a container.
--------------------------------------------------
Chunk length: 796 characters
Chunk content: Docling implements a linear pi…erialized to JSON or Markdown.
--------------------------------------------------

The comparison shows key differences:

HierarchicalChunker: Creates many small chunks by splitting at every section boundary
HybridChunker: Creates fewer, larger chunks by combining related sections within token limits

We will use HybridChunker because it respects document boundaries (won’t split tables inappropriately) while ensuring chunks fit within embedding model constraints.
from docling.chunking import HybridChunker

# Initialize the chunker
chunker = HybridChunker(max_tokens=512, overlap_tokens=50)

# Create the chunks
rag_chunks = list(chunker.chunk(doc))

print(f"Created {len(rag_chunks)} intelligent chunks")

Created 50 intelligent chunks

Creating a Vector Store
A vector store is a database that converts text into numerical vectors called embeddings. These vectors capture semantic meaning, allowing the system to find related content even when different words are used.
When you search for “document processing,” the vector store finds chunks about “PDF parsing” or “text extraction” because their embeddings are mathematically close. This enables semantic search beyond exact keyword matching.
Create the vector store for semantic search across your document chunks:
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

# Create embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Create the vector store
texts = [chunk.text for chunk in rag_chunks]
vectorstore = FAISS.from_texts(texts, embeddings)

print(f"Built vector store with {len(texts)} chunks")

Built vector store with 50 chunks

Now you can search your knowledge base with semantic similarity:
# Search the knowledge base
query = "How does document processing work?"
relevant_docs = vectorstore.similarity_search(query, k=3)

print(f"Query: '{query}'")
print(f"Found {len(relevant_docs)} relevant chunks:")

for i, doc in enumerate(relevant_docs, 1):
    print(f"\nResult {i}:")
    print(f"Content: {doc.page_content[:150]}…")

Query: 'How does document processing work?'
Found 3 relevant chunks:

Result 1:
Content: Docling implements a linear pipeline of operations, which execute sequentially on each given document (see Fig. 1). Each document is first parsed by a…

Result 2:
Content: In the final pipeline stage, Docling assembles all prediction results produced on each page into a well-defined datatype that encapsulates a converted…

Result 3:
Content: Docling is designed to allow easy extension of the model library and pipelines. In the future, we plan to extend Docling with several more models, suc…

The search results show effective semantic retrieval. The vector store found relevant chunks about Docling’s architecture and design when searching for “document processing” – demonstrating how RAG systems match meaning, not just keywords.
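To complete step 5 of the pipeline (query and generate), you can pass the retrieved chunks to a local model with the same langchain-ollama setup used in the MLflow article above. The following is a minimal sketch, assuming Ollama and the llama3.2 model are installed locally:
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Join the retrieved chunks into a single context block
retrieved_context = "\n\n".join(d.page_content for d in relevant_docs)

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context: {context}\n\nQuestion: {question}\n\nAnswer:"
)
llm = ChatOllama(model="llama3.2", temperature=0)
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"context": retrieved_context, "question": query}))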
Conclusion
This tutorial demonstrated building a robust document processing pipeline that handles complex, real-world documents. Your pipeline preserves critical elements like tables, mathematical formulas, and document structure while generating semantically meaningful chunks for retrieval-augmented generation systems.
The capability to transform any document format into AI-ready data using minimal code—at no cost—represents a significant advancement in document processing workflows. For enhanced reasoning capabilities in your RAG workflows, explore our guide on building data science workflows with DeepSeek and LangChain which combines advanced language models with document processing pipelines.