What’s New in pandas 3.0: Expressions, Copy-on-Write, and Faster Strings

January 10, 2026

Introduction
Setup
Cleaner Column Operations with pd.col
Copy-on-Write Is Now the Default
A Dedicated String Dtype
Final Thoughts

Introduction

pandas 3.0 brings some of the most significant changes to the library in years. This article covers:

pd.col expressions: Cleaner column operations without lambdas
Copy-on-Write: Predictable copy behavior by default
PyArrow-backed strings: Faster operations and better type safety

💻 Get the Code: Open the notebook in Google Colab to run it in your browser, or grab the source from GitHub.

Setup

pandas 3.0 requires Python 3.11 or higher. Install it with:

pip install --upgrade pandas

To test these features before upgrading, enable them in pandas 2.3:

pd.options.future.infer_string = True
pd.options.mode.copy_on_write = True

Cleaner Column Operations with pd.col

The Traditional Approaches

If you’ve ever had to modify an existing column or create a new one, you may be used to one of these approaches.

Square-bracket notation is the most common way to add a column. You reference the new column name and assign the result:

import pandas as pd

df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})
df['temp_f'] = df['temp_c'] * 9/5 + 32
df

	temp_c	temp_f
0	0	32.0
1	20	68.0
2	30	86.0
3	100	212.0

This modifies your original DataFrame in place, which means you can’t compare before and after without first making a copy.

df_original = pd.DataFrame({"temp_c": [0, 20, 30]})
df_original['temp_f'] = df_original['temp_c'] * 9/5 + 32
# df_original is now modified - no way to see the original state
df_original

	temp_c	temp_f
0	0	32.0
1	20	68.0
2	30	86.0

Assignment is also a statement rather than an expression, so it produces no value that you can chain other operations onto. Method-chaining lets you write df.assign(...).query(...).sort_values(...) in one expression instead of multiple separate statements.

df = pd.DataFrame({"temp_c": [0, 20, 30]})

# This doesn't work - assignment is a statement, so there is no result to call .query() on
# (df['temp_f'] = df['temp_c'] * 9/5 + 32).query('temp_f > 50')  # SyntaxError

# You need separate statements instead
df['temp_f'] = df['temp_c'] * 9/5 + 32
df = df.query('temp_f > 50')
df

	temp_c	temp_f
1	20	68.0
2	30	86.0

Using assign solves the chaining problem by returning a new DataFrame instead of modifying in-place:

df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})
df = (
    df.assign(temp_f=lambda x: x['temp_c'] * 9/5 + 32)
    .query('temp_f > 50')
)
df

	temp_c	temp_f
1	20	68.0
2	30	86.0
3	100	212.0

This works for chaining but relies on lambda functions. A lambda looks up the variables it uses when it is called, not when it is defined (late binding), which can cause bugs:

df = pd.DataFrame({"x": [1, 2, 3]})
results = {}
for factor in [10, 20, 30]:
    results[f'x_times_{factor}'] = lambda df: df['x'] * factor

df = df.assign(**results)
df

	x	x_times_10	x_times_20	x_times_30
0	1	30	30	30
1	2	60	60	60
2	3	90	90	90

What went wrong: We expected x_times_10 to multiply by 10, x_times_20 by 20, and x_times_30 by 30. Instead, all three columns multiply by 30.

Why: Lambdas don’t save values, they save variable names. All three lambdas point to the same variable factor. After the loop ends, factor = 30. When assign() executes the lambdas, they all read factor and get 30.
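The same late-binding behavior can be reproduced without pandas. The sketch below is illustrative only: it shows the bug in plain Python and one common workaround, binding the current value through a default argument.

```python
# Late binding: every lambda reads `factor` when it is called,
# and by then the loop has finished with factor == 30.
late_bound = []
for factor in [10, 20, 30]:
    late_bound.append(lambda x: x * factor)

late_results = [f(1) for f in late_bound]
print(late_results)  # [30, 30, 30]

# Workaround: a default argument is evaluated when the lambda is defined,
# so each lambda keeps its own copy of the value.
early_bound = []
for factor in [10, 20, 30]:
    early_bound.append(lambda x, factor=factor: x * factor)

early_results = [f(1) for f in early_bound]
print(early_results)  # [10, 20, 30]
```

The default-argument trick works, but it is easy to forget, which is part of the motivation for an expression-based API.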

The pandas 3.0 Solution: pd.col

pandas 3.0 introduces pd.col, which lets you reference columns without lambda functions. The syntax is borrowed from PySpark and Polars.

Here’s the temp_f conversion rewritten with pd.col:

df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})
df = df.assign(temp_f=pd.col('temp_c') * 9/5 + 32)
df

	temp_c	temp_f
0	0	32.0
1	20	68.0
2	30	86.0
3	100	212.0

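Unlike square-bracket assignment, assign with pd.col returns a new DataFrame, so it fits into a method chain. Unlike a lambda, a pd.col expression is built immediately, so any Python value used in it (such as a loop variable) is fixed at that point and you avoid the scoping bug shown earlier.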
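To illustrate the chaining point, here is a small sketch that converts, filters, and sorts in a single expression. The hasattr check and the lambda fallback are assumptions added so the snippet also runs on pandas versions without pd.col; on pandas 3.0 only the first branch is used.

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})

if hasattr(pd, "col"):
    # pandas 3.0+: build the expression once, with no lambda involved
    to_fahrenheit = pd.col("temp_c") * 9 / 5 + 32
else:
    # Fallback for older pandas (assumption, not part of the 3.0 API)
    to_fahrenheit = lambda d: d["temp_c"] * 9 / 5 + 32

result = (
    df.assign(temp_f=to_fahrenheit)
    .query("temp_f > 50")
    .sort_values("temp_f", ascending=False)
)
print(result)
```

The original df is left untouched, so the before and after states can still be compared.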

Remember the lambda scoping bug? With pd.col, each multiplier is captured correctly:

df = pd.DataFrame({"x": [1, 2, 3]})
results = {}
for factor in [10, 20, 30]:
    results[f'x_times_{factor}'] = pd.col('x') * factor

df = df.assign(**results)
df

	x	x_times_10	x_times_20	x_times_30
0	1	10	20	30
1	2	20	40	60
2	3	30	60	90

Filtering with Expressions

Traditional filtering references df twice:

df = pd.DataFrame({"temp_c": [-10, 0, 15, 25, 30]})
df = df.loc[df['temp_c'] >= 0]  # df appears twice
df

	temp_c
1	0
2	15
3	25
4	30

With pd.col, you reference the column directly:

df = pd.DataFrame({"temp_c": [-10, 0, 15, 25, 30]})
df = df.loc[pd.col('temp_c') >= 0]  # cleaner
df

	temp_c
1	0
2	15
3	25
4	30
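One place where this helps is filtering on a column that was created earlier in the same chain, which df.loc[df[...]] cannot do because df does not have that column yet. A minimal sketch, with the 59 °F threshold chosen only for illustration and a lambda fallback (an assumption) for pandas versions without pd.col:

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [-10, 0, 15, 25, 30]})

if hasattr(pd, "col"):
    # pandas 3.0+: both the new column and the filter are expressions
    to_fahrenheit = pd.col("temp_c") * 9 / 5 + 32
    is_warm = pd.col("temp_f") >= 59
else:
    # Fallback for older pandas (assumption): callables play the same role
    to_fahrenheit = lambda d: d["temp_c"] * 9 / 5 + 32
    is_warm = lambda d: d["temp_f"] >= 59

# temp_f does not exist on df, only on the intermediate result of assign
warm = df.assign(temp_f=to_fahrenheit).loc[is_warm]
print(warm)
```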

Combining Multiple Columns

With lambdas, you need to repeat lambda x: x[...] for every column:

df = pd.DataFrame({
    "price": [100, 200, 150],
    "quantity": [2, 3, 4]
})

df = df.assign(
    total=lambda x: x["price"] * x["quantity"],
    discounted=lambda x: x["price"] * x["quantity"] * 0.9
)
df

	price	quantity	total	discounted
0	100	2	200	180.0
1	200	3	600	540.0
2	150	4	600	540.0

With pd.col, the same logic is more readable:

df = pd.DataFrame({
    "price": [100, 200, 150],
    "quantity": [2, 3, 4]
})

df = df.assign(
    total=pd.col("price") * pd.col("quantity"),
    discounted=pd.col("price") * pd.col("quantity") * 0.9
)
df

	price	quantity	total	discounted
0	100	2	200	180.0
1	200	3	600	540.0
2	150	4	600	540.0
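Because an expression is an ordinary Python object, it can also be stored in a variable and reused. The sketch below builds total once and derives discounted from it; the lambda fallback is an assumption so the snippet runs on pandas versions without pd.col.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [100, 200, 150],
    "quantity": [2, 3, 4],
})

if hasattr(pd, "col"):
    # pandas 3.0+: compose expressions like regular values
    total = pd.col("price") * pd.col("quantity")
    discounted = total * 0.9
else:
    # Fallback for older pandas (assumption)
    total = lambda d: d["price"] * d["quantity"]
    discounted = lambda d: total(d) * 0.9

priced = df.assign(total=total, discounted=discounted)
print(priced)
```

This avoids repeating the price-times-quantity logic in every derived column.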

Note that, unlike Polars and PySpark, pd.col cannot yet be used in groupby operations:

# This works in Polars: df.group_by("category").agg(pl.col("value").mean())
# But this doesn't work in pandas 3.0:
df.groupby("category").agg(pd.col("value").mean())  # Not supported yet

This limitation may be removed in future versions.
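Until then, the existing named-aggregation syntax covers the same use case. A minimal sketch with made-up data (the category and value columns are illustrative, matching the names in the snippet above):

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["a", "a", "b"],
    "value": [1.0, 3.0, 10.0],
})

# Named aggregation: output column name = (input column, aggregation)
means = df.groupby("category").agg(mean_value=("value", "mean"))
print(means)
```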

Copy-on-Write Is Now the Default

If you’ve used pandas, you’ve probably seen the SettingWithCopyWarning at some point. It appears when pandas can’t tell if you’re modifying a view or a copy of your data:

# This pattern caused confusion in pandas < 3.0
df2 = df[df["value"] > 10]
df2["status"] = "high"  # SettingWithCopyWarning!

Did this modify df or just df2? The answer depends on whether df2 is a view or a copy, and pandas can’t always predict which one it created. That’s what the warning is telling you.

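pandas 3.0 makes the answer simple: any DataFrame or Series derived from another, including the result of filtering with df[...], always behaves as a copy. Modifying df2 never affects df.

This is called Copy-on-Write (CoW). Where an operation can avoid copying, such as selecting a column, the new object shares memory with its parent as long as you only read from it. pandas makes a separate copy only when one of the two is modified.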

Now when you filter and modify, there’s no warning and no uncertainty:

df = pd.DataFrame({"value": [5, 15, 25], "status": ["low", "low", "low"]})

# pandas 3.0: just works, no warning
df2 = df[df["value"] > 10]
df2["status"] = "high"  # Modifies df2 only, not df

df2

	value	status
1	15	high
2	25	high

df

	value	status
0	5	low
1	15	low
2	25	low

We can see that df is unchanged and no warning was raised.
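The same guarantee applies to column selection, which could hand back a view in earlier versions. The sketch below enables CoW explicitly on pandas 2.x (an assumption so the result is the same on both versions) and then modifies a selected column:

```python
import pandas as pd

# On pandas 2.x, opt in to Copy-on-Write; on 3.0+ it is already the default
if int(pd.__version__.split(".")[0]) < 3:
    pd.options.mode.copy_on_write = True

df = pd.DataFrame({"value": [5, 15, 25]})

col = df["value"]   # may share memory with df while it is only read
col.iloc[0] = 99    # the write triggers a copy, so df is not touched

print(col.tolist())          # [99, 15, 25]
print(df["value"].tolist())  # [5, 15, 25]
```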

Breaking Change: Chained Assignment

One pattern that breaks is chained assignment. With CoW, df["foo"] behaves as a copy, so assigning to it only modifies that temporary object and never the original. pandas flags this pattern with a ChainedAssignmentError warning:

# This NO LONGER modifies df in pandas 3.0:
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})

df["foo"][df["bar"] > 5] = 100
df

	foo	bar
0	1	4
1	2	6
2	3	8

Notice foo still contains [1, 2, 3]. This is because the value 100 was assigned to a copy that was immediately discarded.

Use .loc instead to modify the original DataFrame:

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})
df.loc[df["bar"] > 5, "foo"] = 100
df

	foo	bar
0	1	4
1	100	6
2	100	8
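If you prefer not to modify df in place at all, a chain-friendly alternative is to build the new column with Series.mask and pass it to assign. This is a sketch of one option, not the only way to write it:

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})

# mask replaces values where the condition is True and keeps the rest
updated = df.assign(foo=df["foo"].mask(df["bar"] > 5, 100))

print(updated["foo"].tolist())  # [1, 100, 100]
print(df["foo"].tolist())       # [1, 2, 3]
```

The original df stays as it was, and updated can feed directly into further chained calls.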

A Dedicated String Dtype

By default, pandas 2.x stores strings as object dtype, which is both slow and ambiguous. You can’t tell from the dtype alone whether a column is purely strings:

pd.options.future.infer_string = False  # pandas 2.x behavior

text = pd.Series(["hello", "world"])
messy = pd.Series(["hello", 42, {"key": "value"}])

print(f"text dtype: {text.dtype}")
print(f"messy dtype: {messy.dtype}")

text dtype: object
messy dtype: object
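With object dtype, the only way to tell these two columns apart is to inspect the values. One way to do that is pd.api.types.infer_dtype, sketched here on the same data:

```python
import pandas as pd

text = pd.Series(["hello", "world"])
messy = pd.Series(["hello", 42, {"key": "value"}])

# infer_dtype scans the values and returns a label describing them
text_kind = pd.api.types.infer_dtype(text)
messy_kind = pd.api.types.infer_dtype(messy)

print(f"text values:  {text_kind}")
print(f"messy values: {messy_kind}")
```

Only the first Series is reported as "string"; the second gets a mixed label because it holds values of several types.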

pandas 3.0 introduces a dedicated str dtype that only holds strings, making the type immediately clear:

pd.options.future.infer_string = True  # pandas 3.0 behavior

ser = pd.Series(["a", "b", "c"])
print(f"dtype: {ser.dtype}")

dtype: str

Performance Gains

The new string dtype is backed by PyArrow if it is installed; otherwise pandas falls back to storing Python string objects in a NumPy array. With PyArrow, you get significant performance improvements:

String operations can run 5-10x faster because PyArrow processes data in contiguous memory blocks instead of individual Python objects
Memory usage can drop by up to 50% since strings are stored in a compact binary format rather than as Python objects with overhead
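The actual savings depend on your data, so it is worth measuring. The sketch below compares the memory footprint of the same values stored as object dtype and as a PyArrow-backed string dtype. The sample data is made up, and the explicit "string[pyarrow]" dtype is used here only so the comparison also works on pandas 2.x; it is skipped if PyArrow is not installed.

```python
import pandas as pd

words = ["alpha", "beta", "gamma", "delta"] * 25_000

# Object dtype: one Python str object per element
as_object = pd.Series(words, dtype=object)
object_bytes = as_object.memory_usage(deep=True)

try:
    import pyarrow  # noqa: F401  (only needed to back the string dtype)

    # Arrow-backed strings: one contiguous buffer plus offsets
    as_arrow = pd.Series(words, dtype="string[pyarrow]")
    arrow_bytes = as_arrow.memory_usage(deep=True)
except ImportError:
    arrow_bytes = None  # PyArrow missing, nothing to compare against

print(f"object dtype: {object_bytes:,} bytes")
if arrow_bytes is not None:
    print(f"arrow-backed: {arrow_bytes:,} bytes")
```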

Arrow Ecosystem Interoperability

With PyArrow-backed strings, string columns are already stored in Arrow format, so Arrow-based tools like Polars and DuckDB can consume them without first converting from Python objects:

import polars as pl

pandas_df = pd.DataFrame({"name": ["alice", "bob", "charlie"]})
polars_df = pl.from_pandas(pandas_df)  # String data is already in Arrow format
polars_df

	name
0	alice
1	bob
2	charlie

Final Thoughts

pandas 3.0 brings meaningful improvements to your daily workflow:

Write cleaner code with pd.col expressions instead of lambdas
Avoid SettingWithCopyWarning confusion with Copy-on-Write as the default
Get 5-10x faster string operations with the new PyArrow-backed str dtype
Pass DataFrames to Polars and DuckDB without data conversion

Related Resources

For more on DataFrame tools and performance optimization:

Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames – Compare pandas with Polars for performance-critical workflows
Scaling Pandas Workflows with PySpark’s Pandas API – Use familiar pandas syntax on distributed data
pandas vs Polars vs DuckDB: A Data Scientist’s Guide – Choose the right tool for your data analysis needs

💡 The expressions section was inspired by a blog post contributed by Marco Gorelli, Senior Software Engineer at Quansight Labs.

Khuyen Tran

Marco Gorelli
