Table of Contents
- Introduction
- What You Will Learn
- Introduction to LlamaIndex
- Basic Image Processing with LlamaParse
- Structured Data Extraction with Pydantic
- Compare Extraction with Ground Truth
- Process the Images for Better Extraction
- Export Clean Data to CSV or Excel
- Speed Up Processing with Async Parallel Execution
- Try It Yourself
- Conclusion and Next Steps
Introduction
Manual data entry from receipts, invoices, and contracts wastes hours and introduces errors. What if you could automatically extract structured data from these documents in minutes?
In this article, you’ll learn how to transform receipt images into structured data using LlamaIndex, then export the results to a spreadsheet for analysis.
💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!
What You Will Learn
- Convert scanned receipts to structured data with LlamaParse and Pydantic models
- Validate extraction accuracy by comparing results against ground truth annotations
- Fix parsing errors by preprocessing low-quality images
- Export clean receipt data to spreadsheet format
Introduction to LlamaIndex
LlamaIndex is a framework that connects LLMs with your data through three core capabilities:
- Data ingestion: Built-in readers for PDFs, images, web pages, and databases that automatically parse content into processable nodes.
- Structured extraction: LLM-powered conversion of unstructured text into Pydantic models with automatic validation.
- Retrieval and indexing: Vector stores and semantic search that enable context-augmented queries over your documents.
It eliminates boilerplate code for loading, parsing, and querying data, letting you focus on building LLM applications.
The table below compares LlamaIndex with two other popular frameworks for LLM applications:
Framework | Purpose | Best For |
---|---|---|
LlamaIndex | Document ingestion and structured extraction | Converting unstructured documents into query-ready data |
LangChain | LLM orchestration and tool integration | Building conversational agents with multiple LLM calls |
LangGraph | Stateful workflow management | Coordinating long-running, multi-agent processes |
Installation
Start by installing the required packages for this tutorial:
- llama-index: Core LlamaIndex framework with base indexing and retrieval functionality
- llama-parse: Document parsing service for PDFs, images, and complex layouts
- llama-index-program-openai: OpenAI integration for structured data extraction with Pydantic
- python-dotenv: Load environment variables from .env files
- rapidfuzz: Fuzzy string matching library for comparing company names with minor variations
pip install llama-index llama-parse llama-index-program-openai python-dotenv rapidfuzz
Environment Setup
Create a .env file to store your API keys:
# .env
LLAMA_CLOUD_API_KEY="your-llama-parse-key"
OPENAI_API_KEY="your-openai-key"
Get your API keys from:
- LlamaParse API: cloud.llamaindex.ai
- OpenAI API: platform.openai.com/api-keys
Load the environment variables from the .env file with load_dotenv:
from dotenv import load_dotenv
import os
load_dotenv()
Configure the default LLM with Settings:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.context_window = 8000
Settings stores global defaults so every query engine and program reuses the same LLM configuration. Keeping temperature at 0 nudges the model to return deterministic, structured outputs.
Basic Image Processing with LlamaParse
In this tutorial, we will use the SROIE Dataset v2 from Kaggle. This dataset contains real-world receipt scans from the ICDAR 2019 competition.
You can download the dataset directly from Kaggle’s website or use the Kaggle CLI:
# Install the Kaggle CLI once
uv pip install kaggle
# Configure Kaggle credentials (run once per environment)
export KAGGLE_USERNAME=your_username
export KAGGLE_KEY=your_api_key
# Create a workspace folder and download the full archive (~1 GB)
mkdir -p data
kaggle datasets download urbikn/sroie-datasetv2 -p data
# Extract everything and inspect a few image files
unzip -q -o data/sroie-datasetv2.zip -d data
This tutorial uses data from the data/SROIE2019/train/ directory, which contains:
- img: Original receipt images
- entities: Ground truth annotations for validation
Load the first 10 receipts into a list of paths:
from pathlib import Path
receipt_dir = Path("data/SROIE2019/train/img")
num_receipts = 10
receipt_paths = sorted(receipt_dir.glob("*.jpg"))[:num_receipts]
Take a look at the first receipt:
from IPython.display import Image
first_receipt_path = receipt_paths[0]
Image(filename=first_receipt_path)
Next, use LlamaParse to convert the first receipt into markdown.
from llama_parse import LlamaParse
# Parse receipts with LlamaParse
parser = LlamaParse(
api_key=os.environ["LLAMA_CLOUD_API_KEY"],
result_type="markdown", # Output format
num_workers=4, # Number of parallel workers for faster processing
language="en", # Language hint for OCR accuracy
skip_diagonal_text=True, # Ignore rotated or diagonal text
)
first_receipt = parser.load_data(first_receipt_path)[0]
Preview the markdown for the first receipt:
# Preview the first receipt
preview = "\n".join(first_receipt.text.splitlines()[:10])
print(preview)
Output:
tan woon yann
BOOK TA K (TAMAN DAYA) SDN BHD
789417-W
NO.5: 55,57 & 59, JALAN SAGU 18,
TAMAN DaYA,
81100 JOHOR BAHRU,
JOHOR.
LlamaParse successfully converts receipt images to text, but there is no structure: vendor names, dates, and totals are all mixed together in plain text. This format is not ideal for exporting to spreadsheets or analytics tools for further analysis.
The next section uses Pydantic models to extract structured fields like company, total, and purchase_date automatically.
Structured Data Extraction with Pydantic
Pydantic is a Python library that uses type hints for data validation and automatic type conversion. By defining a receipt schema once, you can extract consistent structured data from receipts regardless of their format or layout.
Start by defining two Pydantic models that represent receipt structure:
from datetime import date
from typing import List, Optional
from pydantic import BaseModel, Field
class ReceiptItem(BaseModel):
"""Represents a single line item extracted from a receipt."""
description: str = Field(description="Item name exactly as shown on the receipt")
quantity: int = Field(default=1, ge=1, description="Integer quantity of the item")
unit_price: Optional[float] = Field(
default=None, ge=0, description="Price per unit in the receipt currency"
)
discount_amount: float = Field(
default=0.0, ge=0, description="Discount applied to this line item"
)
class Receipt(BaseModel):
"""Structured fields extracted from a retail receipt."""
company: str = Field(description="Business or merchant name")
purchase_date: Optional[date] = Field(
default=None, description="Date in YYYY-MM-DD format"
)
address: Optional[str] = Field(default=None, description="Address of the business")
total: float = Field(description="Final charged amount")
items: List[ReceiptItem] = Field(default_factory=list)
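Pydantic also coerces compatible inputs automatically: numeric strings become floats, ISO date strings become date objects, and nested dictionaries become ReceiptItem instances. A quick sanity check with made-up values illustrates this:
sample = Receipt(
    company="EXAMPLE TRADING SDN BHD",
    purchase_date="2018-12-25",  # ISO string coerced to datetime.date
    total="9.00",                # numeric string coerced to float
    items=[{"description": "MODELLING CLAY", "unit_price": 9.0}],
)
print(sample.total, type(sample.total))  # 9.0 <class 'float'>
print(sample.purchase_date)              # 2018-12-25
print(sample.items[0].quantity)          # defaults to 1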
Create an OpenAIPydanticProgram that instructs the LLM to extract data according to our Receipt model:
from llama_index.program.openai import OpenAIPydanticProgram
prompt = """
You are extracting structured data from a receipt.
Use the provided text to populate the Receipt model.
Interpret every receipt date as day-first.
If a field is missing, return null.
{context_str}
"""
receipt_program = OpenAIPydanticProgram.from_defaults(
output_cls=Receipt,
llm=Settings.llm,
prompt_template_str=prompt,
)
Process the first parsed document to make sure everything works before scaling to the full batch:
# Process the first receipt
structured_first_receipt = receipt_program(context_str=first_receipt.text)
# Print the receipt as a JSON string for better readability
print(structured_first_receipt.model_dump_json(indent=2))
Output:
{
"company": "tan woon yann BOOK TA K (TAMAN DAYA) SDN BHD",
"purchase_date": "2018-12-25",
"address": "NO.5: 55,57 & 59, JALAN SAGU 18, TAMAN DaYA, 81100 JOHOR BAHRU, JOHOR.",
"total": 9.0,
"items": [
{
"description": "KF MODELLING CLAY KIDDY FISH",
"quantity": 1,
"unit_price": 9.0,
"discount_amount": 0.0
}
]
}
LlamaIndex populates the Pydantic schema with extracted values:
- company: Vendor name from the receipt header
- purchase_date: Parsed date (2018-12-25)
- total: Final amount (9.0)
- items: Line items with description, quantity, and price
Now that the extraction works, let’s scale it to process all receipts in a batch. The function uses each receipt’s filename as a unique identifier:
def extract_documents(paths: List[str], prompt: str, id_column: str = "receipt_id") -> List[dict]:
"""Extract structured data from documents using LlamaParse and LLM."""
results: List[dict] = []
# Initialize parser with OCR settings
parser = LlamaParse(
api_key=os.environ["LLAMA_CLOUD_API_KEY"],
result_type="markdown",
num_workers=4,
language="en",
skip_diagonal_text=True,
)
# Convert images to markdown text
documents = parser.load_data(paths)
# Create structured extraction program
program = OpenAIPydanticProgram.from_defaults(
output_cls=Receipt,
llm=Settings.llm,
prompt_template_str=prompt,
)
# Extract structured data from each document
for path, doc in zip(paths, documents):
document_id = Path(path).stem
parsed_document = program(context_str=doc.text)
results.append(
{
id_column: document_id,
"data": parsed_document,
}
)
return results
# Extract structured data from all receipts
structured_receipts = extract_documents(receipt_paths, prompt)
Convert the extracted receipts into a DataFrame for easier inspection:
import pandas as pd
def transform_receipt_columns(df: pd.DataFrame) -> pd.DataFrame:
"""Apply standard transformations to receipt DataFrame columns."""
df = df.copy()
df["company"] = df["company"].str.upper()
df["total"] = pd.to_numeric(df["total"], errors="coerce")
df["purchase_date"] = pd.to_datetime(
df["purchase_date"], errors="coerce", dayfirst=True
).dt.date
return df
def create_extracted_df(records: List[dict], id_column: str = "receipt_id") -> pd.DataFrame:
df = pd.DataFrame(
[
{
id_column: record[id_column],
"company": record["data"].company,
"total": record["data"].total,
"purchase_date": record["data"].purchase_date,
}
for record in records
]
)
return transform_receipt_columns(df)
extracted_df = create_extracted_df(structured_receipts)
extracted_df
| | receipt_id | company | total | purchase_date |
|---|---|---|---|---|
| 0 | X00016469612 | TAN WOON YANN BOOK TA K (TAMAN DAYA) SDN BHD | 9 | 2018-12-25 |
| 1 | X00016469619 | INDAH GIFT & HOME DECO | 60.3 | 2018-10-19 |
| 2 | X00016469620 | MR D.I.Y. (JOHOR) SDN BHD | 33.9 | 2019-01-12 |
| 3 | X00016469622 | YONGFATT ENTERPRISE | 80.9 | 2018-12-25 |
| 4 | X00016469623 | MR D.I.Y. (M) SDN BHD | 30.9 | 2018-11-18 |
| 5 | X00016469669 | ABC HO TRADING | 31 | 2019-01-09 |
| 6 | X00016469672 | SOON HUAT MACHINERY ENTERPRISE | 327 | 2019-01-11 |
| 7 | X00016469676 | S.H.H. MOTOR (SUNGAI RENGIT SN. BHD. (801580-T) | 20 | 2019-01-23 |
| 8 | X51005200938 | TH MNAN | 0 | 2023-10-11 |
| 9 | X51005230617 | GERBANG ALAF RESTAURANTS SDN BHD | 26.6 | 2018-01-18 |
Most receipts are extracted correctly, but receipt X51005200938 shows issues:
- The company name is incomplete (“TH MNAN”)
- Total is 0 instead of the actual amount
- Date (2023-10-11) appears incorrect
Compare Extraction with Ground Truth
To verify the extraction accuracy, load the ground-truth annotations from data/SROIE2019/train/entities:
def normalize_date(value: str) -> str:
"""Normalize date strings to consistent format."""
value = (value or "").strip()
if not value:
return value
# Convert hyphens to slashes
value = value.replace("-", "/")
parts = value.split("/")
# Convert 2-digit years to 4-digit (e.g., 18 -> 2018)
if len(parts[-1]) == 2:
parts[-1] = f"20{parts[-1]}"
return "/".join(parts)
def create_ground_truth_df(
label_paths: List[str], id_column: str = "receipt_id"
) -> pd.DataFrame:
"""Create ground truth DataFrame from label JSON files."""
records = []
# Load each JSON file and extract key fields
for path in label_paths:
payload = pd.read_json(Path(path), typ="series").to_dict()
records.append(
{
id_column: Path(path).stem,
"company": payload.get("company"),
"total": payload.get("total"),
"purchase_date": normalize_date(payload.get("date")),
}
)
df = pd.DataFrame(records)
# Apply same transformations as extracted data
return transform_receipt_columns(df)
# Load ground truth annotations
label_dir = Path("data/SROIE2019/train/entities")
label_paths = sorted(label_dir.glob("*.txt"))[:num_receipts]
ground_truth_df = create_ground_truth_df(label_paths)
ground_truth_df
| | receipt_id | company | total | purchase_date |
|---|---|---|---|---|
| 0 | X00016469612 | BOOK TA .K (TAMAN DAYA) SDN BHD | 9 | 2018-12-25 |
| 1 | X00016469619 | INDAH GIFT & HOME DECO | 60.3 | 2018-10-19 |
| 2 | X00016469620 | MR D.I.Y. (JOHOR) SDN BHD | 33.9 | 2019-01-12 |
| 3 | X00016469622 | YONGFATT ENTERPRISE | 80.9 | 2018-12-25 |
| 4 | X00016469623 | MR D.I.Y. (M) SDN BHD | 30.9 | 2018-11-18 |
| 5 | X00016469669 | ABC HO TRADING | 31 | 2019-01-09 |
| 6 | X00016469672 | SOON HUAT MACHINERY ENTERPRISE | 327 | 2019-01-11 |
| 7 | X00016469676 | S.H.H. MOTOR (SUNGAI RENGIT) SDN. BHD. | 20 | 2019-01-23 |
| 8 | X51005200938 | PERNIAGAAN ZHENG HUI | 112.45 | 2018-02-12 |
| 9 | X51005230617 | GERBANG ALAF RESTAURANTS SDN BHD | 26.6 | 2018-01-18 |
Let’s validate extraction accuracy by comparing results against ground truth.
Company names often have minor variations (spacing, punctuation, extra characters), so we’ll use fuzzy matching to tolerate these formatting differences.
from rapidfuzz import fuzz
def fuzzy_match_score(text1: str, text2: str) -> int:
"""Calculate fuzzy match score between two strings."""
return fuzz.token_set_ratio(str(text1), str(text2))
Test the fuzzy matching with sample company names:
# Nearly identical strings score high
print(f"Score: {fuzzy_match_score('BOOK TA K SDN BHD', 'BOOK TA .K SDN BHD'):.2f}")
# Different punctuation still matches well
print(f"Score: {fuzzy_match_score('MR D.I.Y. JOHOR', 'MR DIY JOHOR'):.2f}")
# Completely different strings score low
print(f"Score: {fuzzy_match_score('ABC TRADING', 'XYZ COMPANY'):.2f}")
Output:
Score: 97.14
Score: 55.17
Score: 27.27
Now build a comparison function that merges extracted and ground truth data, then applies fuzzy matching for company names and exact matching for numeric fields:
def compare_receipts(
extracted_df: pd.DataFrame,
ground_truth_df: pd.DataFrame,
id_column: str,
fuzzy_match_cols: List[str],
exact_match_cols: List[str],
fuzzy_threshold: int = 80,
) -> pd.DataFrame:
"""Compare extracted and ground truth data with explicit column specifications."""
comparison_df = extracted_df.merge(
ground_truth_df,
on=id_column,
how="inner",
suffixes=("_extracted", "_truth"),
)
# Fuzzy matching
for col in fuzzy_match_cols:
extracted_col = f"{col}_extracted"
truth_col = f"{col}_truth"
comparison_df[f"{col}_score"] = comparison_df.apply(
lambda row: fuzzy_match_score(row[extracted_col], row[truth_col]),
axis=1,
)
comparison_df[f"{col}_match"] = comparison_df[f"{col}_score"] >= fuzzy_threshold
# Exact matching
for col in exact_match_cols:
extracted_col = f"{col}_extracted"
truth_col = f"{col}_truth"
comparison_df[f"{col}_match"] = (
comparison_df[extracted_col] == comparison_df[truth_col]
)
return comparison_df
comparison_df = compare_receipts(
extracted_df,
ground_truth_df,
id_column="receipt_id",
fuzzy_match_cols=["company"],
exact_match_cols=["total", "purchase_date"],
)
Inspect any rows where the company, total, or purchase-date checks fail:
def get_mismatch_rows(comparison_df: pd.DataFrame) -> pd.DataFrame:
"""Get mismatched rows, excluding match indicator columns."""
# Extract match columns and data columns
match_columns = [col for col in comparison_df.columns if col.endswith("_match")]
data_columns = sorted([col for col in comparison_df.columns if col.endswith("_extracted") or col.endswith("_truth")])
# Check for rows where not all matches are True
has_mismatch = comparison_df[match_columns].all(axis=1).eq(False)
return comparison_df[has_mismatch][data_columns]
mismatch_df = get_mismatch_rows(comparison_df)
mismatch_df
| | company_extracted | company_truth | purchase_date_extracted | purchase_date_truth | total_extracted | total_truth |
|---|---|---|---|---|---|---|
| 8 | TH MNAN | PERNIAGAAN ZHENG HUI | 2023-10-11 | 2018-02-12 | 0 | 112.45 |
This confirms what we saw earlier. All receipts match the ground truth annotations except receipt X51005200938, which fails on the following fields:
- Company name
- Total
- Purchase date
Let’s take a closer look at this receipt to see if we can identify the issue.
import IPython.display as display
file_to_inspect = receipt_dir / "X51005200938.jpg"
display.Image(filename=file_to_inspect)
This receipt appears smaller than the others in the dataset, which may affect OCR readability. In the next section, we will scale up the receipt to improve the extraction.
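You can confirm the size difference by printing this receipt's pixel dimensions next to the first receipt's (a quick check; the exact numbers depend on your copy of the dataset):
from PIL import Image as PILImage  # aliased to avoid clashing with IPython.display.Image

# Compare the dimensions of the problematic receipt with the first receipt
for path in [first_receipt_path, file_to_inspect]:
    with PILImage.open(path) as img:
        print(f"{path.name}: {img.width} x {img.height} pixels")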
Process the Images for Better Extraction
Create a function to scale up the receipt:
from PIL import Image
def scale_image(image_path: Path, output_dir: Path, scale_factor: int = 3) -> Path:
"""Scale up an image using high-quality resampling.
Args:
image_path: Path to the original image
output_dir: Directory to save the scaled image
scale_factor: Factor to scale up the image (default: 3x)
Returns:
Path to the scaled image
"""
# Load the image
img = Image.open(image_path)
# Scale up the image using high-quality resampling
new_size = (img.width * scale_factor, img.height * scale_factor)
img_resized = img.resize(new_size, Image.Resampling.LANCZOS)
# Save to output directory with same filename
output_dir.mkdir(parents=True, exist_ok=True)
output_path = output_dir / image_path.name
img_resized.save(output_path, quality=95)
return output_path
Apply the function to the problematic receipt:
problematic_receipt_path = receipt_dir / "X51005200938.jpg"
adjusted_receipt_dir = Path("data/SROIE2019/train/img_adjusted")
scaled_image_path = scale_image(problematic_receipt_path, adjusted_receipt_dir, scale_factor=3)
Let’s extract the structured data from the scaled image:
problematic_structured_receipts = extract_documents([scaled_image_path], prompt)
problematic_extracted_df = create_extracted_df(problematic_structured_receipts)
problematic_extracted_df
| | receipt_id | company | total | purchase_date |
|---|---|---|---|---|
| 0 | X51005200938 | PERNIAGAAN ZHENG HUI | 112.46 | 2018-02-12 |
Nice! Scaling fixes the extraction: the company name and purchase date are now accurate. The total reads 112.46 instead of 112.45, which is acceptable because the printed 112.45 is easy to misread as 112.46 on the scanned receipt.
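To double-check the fix programmatically, you can reuse compare_receipts on just this receipt. Note that the exact-match check on total will still flag the one-cent difference even though the company and date now line up:
# Compare the re-extracted receipt against its ground truth row
fixed_comparison_df = compare_receipts(
    problematic_extracted_df,
    ground_truth_df,
    id_column="receipt_id",
    fuzzy_match_cols=["company"],
    exact_match_cols=["total", "purchase_date"],
)
fixed_comparison_df[["receipt_id", "company_match", "total_match", "purchase_date_match"]]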
Export Clean Data to CSV or Excel
Now build the final set of processed images: keep the scaled receipt and copy the remaining originals into the processed directory:
import shutil
clean_receipt_paths = [scaled_image_path]
# Copy all receipts except the already processed one
for receipt_path in receipt_paths:
if receipt_path != problematic_receipt_path: # Skip the already scaled image
output_path = adjusted_receipt_dir / receipt_path.name
shutil.copy2(receipt_path, output_path)
clean_receipt_paths.append(output_path)
print(f"Copied {receipt_path.name}")
Let’s run the pipeline again with the processed images:
clean_structured_receipts = extract_documents(clean_receipt_paths, prompt)
clean_extracted_df = create_extracted_df(clean_structured_receipts)
clean_extracted_df
| | receipt_id | company | total | purchase_date |
|---|---|---|---|---|
| 0 | X51005200938 | PERNIAGAAN ZHENG HUI | 112.46 | 2018-02-12 |
| 1 | X00016469612 | TAN WOON YANN | 9 | 2018-12-25 |
| 2 | X00016469619 | INDAH GIFT & HOME DECO | 60.3 | 2018-10-19 |
| 3 | X00016469620 | MR D.I.Y. (JOHOR) SDN BHD | 33.9 | 2019-01-12 |
| 4 | X00016469622 | YONGFATT ENTERPRISE | 80.9 | 2018-12-25 |
| 5 | X00016469623 | MR D.I.Y. (M) SDN BHD | 30.9 | 2018-11-18 |
| 6 | X00016469669 | ABC HO TRADING | 31 | 2019-01-09 |
| 7 | X00016469672 | SOON HUAT MACHINERY ENTERPRISE | 327 | 2019-01-11 |
| 8 | X00016469676 | S.H.H. MOTOR (SUNGAI RENGIT SN. BHD. (801580-T) | 20 | 2019-01-23 |
| 9 | X51005230617 | GERBANG ALAF RESTAURANTS SDN BHD | 26.6 | 2018-01-18 |
Awesome! All receipts now match the ground truth annotations.
Now we can export the dataset to a spreadsheet with just a few lines of code:
import pandas as pd
# Export to CSV
output_path = Path("reports/receipts.csv")
output_path.parent.mkdir(parents=True, exist_ok=True)
clean_extracted_df.to_csv(output_path, index=False)
print(f"Exported {len(clean_extracted_df)} receipts to {output_path}")
Output:
Exported 10 receipts to reports/receipts.csv
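For an Excel file instead, DataFrame.to_excel works the same way (this assumes the openpyxl package is installed, which pandas uses as its default .xlsx writer):
# Export to Excel (requires openpyxl: pip install openpyxl)
excel_path = Path("reports/receipts.xlsx")
clean_extracted_df.to_excel(excel_path, index=False, sheet_name="receipts")
print(f"Exported {len(clean_extracted_df)} receipts to {excel_path}")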
The exported data can now be imported into spreadsheet applications, analytics tools, or business intelligence platforms.
Speed Up Processing with Async Parallel Execution
LlamaIndex supports asynchronous processing to handle multiple receipts concurrently. By using async/await with the parser's aload_data() method, you can process receipts in parallel instead of sequentially, significantly reducing total processing time.
Here’s how to modify the extraction function to use async processing. Setting num_workers=10 means the parser will process up to 10 receipts concurrently:
import asyncio
async def extract_documents_async(
paths: List[str], prompt: str, id_column: str = "receipt_id"
) -> List[dict]:
"""Extract structured data from documents using async LlamaParse."""
results: List[dict] = []
parser = LlamaParse(
api_key=os.environ["LLAMA_CLOUD_API_KEY"],
result_type="markdown",
num_workers=10, # Process 10 receipts concurrently
language="en",
skip_diagonal_text=True,
)
# Use async method for parallel processing
documents = await parser.aload_data(paths)
program = OpenAIPydanticProgram.from_defaults(
output_cls=Receipt,
llm=Settings.llm,
prompt_template_str=prompt,
)
for path, doc in zip(paths, documents):
document_id = Path(path).stem
parsed_document = program(context_str=doc.text)
results.append({id_column: document_id, "data": parsed_document})
return results
# In a notebook, you can await the coroutine directly since Jupyter runs an event loop
structured_receipts = await extract_documents_async(receipt_paths, prompt)
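In a standalone script there is no running event loop, so wrap the coroutine with asyncio.run instead; a minimal sketch:
def main() -> None:
    """Run the async extraction pipeline outside a notebook."""
    receipts = asyncio.run(extract_documents_async(receipt_paths, prompt))
    print(f"Extracted {len(receipts)} receipts")

if __name__ == "__main__":
    main()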
See the LlamaIndex async documentation for more details.
Try It Yourself
The concepts from this tutorial are available as a reusable pipeline in this GitHub repository. The code includes both synchronous and asynchronous versions:
Synchronous pipelines (simple, sequential processing):
- Generic pipeline (document_extraction_pipeline.py): Reusable extraction function that works with any Pydantic schema
- Receipt pipeline (extract_receipts_pipeline.py): Complete example with Receipt schema, image scaling, and data transformations
Asynchronous pipelines (parallel processing with 3-10x speedup):
- Async generic pipeline (async_document_extraction_pipeline.py): Concurrent document processing
- Async receipt pipeline (async_extract_receipts_pipeline.py): Batch receipt processing with progress tracking
Run the receipt extraction example:
# Synchronous version (simple, sequential)
uv run extract_receipts_pipeline.py
# Asynchronous version (parallel processing, 3-10x faster)
uv run async_extract_receipts_pipeline.py
Or create your own extractor by importing extract_structured_data() and providing your custom Pydantic schema, extraction prompt, and optional preprocessing functions.
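As a rough sketch, a custom invoice extractor might look like the snippet below. The Invoice model and invoice_prompt here are placeholders, and the exact signature of extract_structured_data() lives in the repository, so check it before wiring up the call:
from typing import Optional
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    """Hypothetical schema for invoice extraction."""
    vendor: str = Field(description="Vendor or supplier name")
    invoice_number: Optional[str] = Field(default=None, description="Invoice reference number")
    total: float = Field(description="Total amount due")

invoice_prompt = """
You are extracting structured data from an invoice.
Use the provided text to populate the Invoice model.
If a field is missing, return null.
{context_str}
"""

# Hypothetical call; confirm the argument names against the repository code
# invoices = extract_structured_data(paths, output_cls=Invoice, prompt=invoice_prompt)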
Learn production-ready practices for data science and AI projects in Production-Ready Data Science.
Conclusion and Next Steps
This tutorial demonstrated how LlamaIndex automates receipt data extraction with minimal code. You converted scanned images to structured data, validated results against ground truth, and exported a clean CSV ready for analysis.
Here are some ideas to enhance this receipt extraction pipeline:
- Richer schemas: Add nested Pydantic models for vendor details, payment methods, and itemized line items
- Validation rules: Flag outliers like totals over $500 or future dates for manual review
- Multi-stage workflows: Create custom workflows that combine image preprocessing, extraction, validation, and export steps with error handling