Validating Polars DataFrames with Pandera

Pandera is a Python library for validating DataFrames, originally built for pandas. It now also supports Polars, a fast DataFrame library written in Rust. This post shows how to use Pandera to validate Polars DataFrames and LazyFrames.

New to Polars? Before diving into schema validation, check out my in-depth comparison Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames to understand how Polars differs from Pandas in terms of speed, architecture, and multi-core processing.

Defining a Schema

To validate a Polars DataFrame, we first define a schema with the pandera.polars module. Here the schema is a class that subclasses pa.DataFrameModel: each annotated attribute declares a column and its data type, and pa.Field attaches value checks to that column.

import pandera.polars as pa
import polars as pl

class Schema(pa.DataFrameModel):
    state: str = pa.Field(isin=["FL", "CA"])  # only "FL" or "CA" allowed
    city: str  # any string, no extra checks
    price: int = pa.Field(in_range={"min_value": 5, "max_value": 20})  # 5 <= price <= 20

In this example, the schema defines three columns: state, city, and price. The state column may only contain "FL" or "CA", city can be any string, and price must be an integer between 5 and 20, with both bounds inclusive.

Validating a Polars DataFrame

Once the schema is defined, we can validate a Polars DataFrame or LazyFrame by passing it to Schema.validate(). Here we validate a LazyFrame:

lf = pl.LazyFrame(
    {
        "state": ["FL", "FL", "FL", "CA", "CA", "CA"],
        "city": [
            "Orlando",
            "Miami",
            "Tampa",
            "San Francisco",
            "Los Angeles",
            "San Diego",
        ],
        "price": [8, 12, 10, 16, 20, 18],
    }
)
Schema.validate(lf).collect()
state  city             price
str    str              i64
"FL"   "Orlando"        8
"FL"   "Miami"          12
"FL"   "Tampa"          10
"CA"   "San Francisco"  16
"CA"   "Los Angeles"    20
"CA"   "San Diego"      18

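The validate() method checks whether the data conforms to the schema and returns the validated object with the same type as its input: here a LazyFrame, which is why .collect() is called to materialize the result. Note that for a LazyFrame, Pandera by default runs only schema-level checks (column names and data types), so the query is not executed during validation. Value checks such as isin and in_range run when you validate an eager pl.DataFrame (for example, the result of lf.collect()) or when the environment variable PANDERA_VALIDATION_DEPTH is set to SCHEMA_AND_DATA.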

Using the check_types() Decorator

Pandera also provides a check_types() decorator that validates a function's arguments and return value at runtime against the schemas declared in its type annotations.

from pandera.typing.polars import LazyFrame

@pa.check_types
def filter_state(lf: LazyFrame[Schema], state: str) -> LazyFrame[Schema]:
    return lf.filter(pl.col("state").eq(state))

filter_state(lf, "CA").collect()
state  city             price
str    str              i64
"CA"   "San Francisco"  16
"CA"   "Los Angeles"    20
"CA"   "San Diego"      18

In this example, filter_state() is decorated with @pa.check_types, so both the lf argument and the returned LazyFrame are validated against Schema, as declared by their LazyFrame[Schema] annotations.

Conclusion

Pandera provides a simple and efficient way to validate Polars DataFrames. By defining a schema and using the validate() method or the check_types() decorator, you can ensure that your DataFrames conform to a specific structure and set of constraints. This can help prevent errors and make your code more robust and maintainable.

Link to Pandera.
