What’s New in pandas 3.0: Expressions, Copy-on-Write, and Faster Strings

Introduction

pandas 3.0 brings some of the most significant changes to the library in years. This article covers:

  • pd.col expressions: Cleaner column operations without lambdas
  • Copy-on-Write: Predictable copy behavior by default
  • PyArrow-backed strings: Faster operations and better type safety

💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!

Setup

pandas 3.0 is currently in pre-release. To follow along with the examples, install the release candidate:

pip install --upgrade --pre pandas==3.0.0rc1
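
To confirm the pre-release is active before running the examples, check the version string:

import pandas as pd

print(pd.__version__)  # should print 3.0.0rc1 (or newer)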

Cleaner Column Operations with pd.col

The Traditional Approaches

If you’ve ever had to modify an existing column or create a new one, you may be used to one of these approaches.

Square-bracket notation is the most common way to add a column. You reference the new column name and assign the result:

import pandas as pd

df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})
df['temp_f'] = df['temp_c'] * 9/5 + 32
df
temp_c temp_f
0 0 32.0
1 20 68.0
2 30 86.0
3 100 212.0

This modifies your original DataFrame in place, which means you can’t compare before and after without first making a copy.

df_original = pd.DataFrame({"temp_c": [0, 20, 30]})
df_original['temp_f'] = df_original['temp_c'] * 9/5 + 32
# df_original is now modified - no way to see the original state
df_original
temp_c temp_f
0 0 32.0
1 20 68.0
2 30 86.0

It also doesn’t return anything, so you can’t chain it with other operations. Method-chaining lets you write df.assign(...).query(...).sort_values(...) in one expression instead of multiple separate statements.

df = pd.DataFrame({"temp_c": [0, 20, 30]})

# This doesn't work - square-bracket assignment is a statement, not an
# expression, so there's nothing to chain .query() onto:
# (df['temp_f'] = df['temp_c'] * 9/5 + 32).query('temp_f > 50')  # SyntaxError

# You need separate statements instead
df['temp_f'] = df['temp_c'] * 9/5 + 32
df = df.query('temp_f > 50')
df
temp_c temp_f
1 20 68.0
2 30 86.0

Using assign solves the chaining problem by returning a new DataFrame instead of modifying in-place:

df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})
df = (
    df.assign(temp_f=lambda x: x['temp_c'] * 9/5 + 32)
    .query('temp_f > 50')
)
df
temp_c temp_f
1 20 68.0
2 30 86.0
3 100 212.0

This works for chaining but relies on lambda functions. Lambda functions capture variables by reference, not by value, which can cause bugs:

df = pd.DataFrame({"x": [1, 2, 3]})
results = {}
for factor in [10, 20, 30]:
    results[f'x_times_{factor}'] = lambda df: df['x'] * factor

df = df.assign(**results)
df
x x_times_10 x_times_20 x_times_30
0 1 30 30 30
1 2 60 60 60
2 3 90 90 90

What went wrong: We expected x_times_10 to multiply by 10, x_times_20 by 20, and x_times_30 by 30. Instead, all three columns multiply by 30.

Why: Lambdas don’t capture values; they capture the variable itself. All three lambdas point to the same variable factor, and by the time assign() executes them, the loop has finished and factor is 30, so every lambda reads 30.
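
The classic workaround is to freeze the value with a default argument, which works but is easy to forget:

df = pd.DataFrame({"x": [1, 2, 3]})
results = {}
for factor in [10, 20, 30]:
    # factor=factor binds the current value as a default argument,
    # so each lambda keeps its own copy instead of sharing the loop variable
    results[f'x_times_{factor}'] = lambda df, factor=factor: df['x'] * factor

df = df.assign(**results)  # now produces 10/20/30 as expected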

The pandas 3.0 Solution: pd.col

pandas 3.0 introduces pd.col, which lets you reference columns without lambda functions. The syntax is borrowed from PySpark and Polars.

Here’s the temp_f conversion rewritten with pd.col:

df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})
df = df.assign(temp_f=pd.col('temp_c') * 9/5 + 32)
df
temp_c temp_f
0 0 32.0
1 20 68.0
2 30 86.0
3 100 212.0

Unlike square-bracket notation, pd.col supports method-chaining. Unlike lambdas, it doesn’t capture variables by reference, so you avoid the scoping bugs shown earlier.
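
That means the assign-then-filter pipeline from earlier collapses into a single chain:

df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})
df = (
    df.assign(temp_f=pd.col('temp_c') * 9/5 + 32)
    .query('temp_f > 50')
)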

Remember the lambda scoping bug? With pd.col, each multiplier is captured correctly:

df = pd.DataFrame({"x": [1, 2, 3]})
results = {}
for factor in [10, 20, 30]:
    results[f'x_times_{factor}'] = pd.col('x') * factor

df = df.assign(**results)
df
x x_times_10 x_times_20 x_times_30
0 1 10 20 30
1 2 20 40 60
2 3 30 60 90

Filtering with Expressions

Traditional filtering repeats df twice:

df = pd.DataFrame({"temp_c": [-10, 0, 15, 25, 30]})
df = df.loc[df['temp_c'] >= 0]  # df appears twice
df
temp_c
1 0
2 15
3 25
4 30

With pd.col, you reference the column directly:

df = pd.DataFrame({"temp_c": [-10, 0, 15, 25, 30]})
df = df.loc[pd.col('temp_c') >= 0]  # cleaner
df
temp_c
1 0
2 15
3 25
4 30
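
Expressions can also be combined with the usual boolean operators. Here’s a small sketch, assuming pd.col comparisons compose with & and | the same way boolean Series do (note the parentheses around each condition):

df = pd.DataFrame({"temp_c": [-10, 0, 15, 25, 30]})
df = df.loc[(pd.col('temp_c') >= 0) & (pd.col('temp_c') <= 25)]  # keep mild temperatures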

Combining Multiple Columns

With lambdas, you need to repeat lambda x: x[...] for every column:

df = pd.DataFrame({
    "price": [100, 200, 150],
    "quantity": [2, 3, 4]
})

df = df.assign(
    total=lambda x: x["price"] * x["quantity"],
    discounted=lambda x: x["price"] * x["quantity"] * 0.9
)
df
price quantity total discounted
0 100 2 200 180.0
1 200 3 600 540.0
2 150 4 600 540.0

With pd.col, the same logic is more readable:

df = pd.DataFrame({
    "price": [100, 200, 150],
    "quantity": [2, 3, 4]
})

df = df.assign(
    total=pd.col("price") * pd.col("quantity"),
    discounted=pd.col("price") * pd.col("quantity") * 0.9
)
df
price quantity total discounted
0 100 2 200 180.0
1 200 3 600 540.0
2 150 4 600 540.0

Note that, unlike Polars and PySpark, pd.col cannot yet be used in groupby operations:

# This works in Polars: df.group_by("category").agg(pl.col("value").mean())
# But this doesn't work in pandas 3.0:
df.groupby("category").agg(pd.col("value").mean())  # Not supported yet

This limitation may be removed in future versions.
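
Until then, the standard groupby patterns work unchanged:

df = pd.DataFrame({"category": ["a", "a", "b"], "value": [1, 2, 3]})
df.groupby("category")["value"].mean()  # plain column selection + aggregation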

Copy-on-Write Is Now the Default

If you’ve used pandas, you’ve probably seen the SettingWithCopyWarning at some point. It appears when pandas can’t tell if you’re modifying a view or a copy of your data:

# This pattern caused confusion in pandas < 3.0
df2 = df[df["value"] > 10]
df2["status"] = "high"  # SettingWithCopyWarning!

Did this modify df or just df2? The answer depends on whether df2 is a view or a copy, and pandas can’t always predict which one it created. That’s what the warning is telling you.

pandas 3.0 makes the answer simple: filtering with df[...] always returns a copy. Modifying df2 never affects df.

This is called Copy-on-Write (CoW). If you just read df2, pandas shares memory with df. Only when you change df2 does pandas create a separate copy.
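
One rough way to observe this is np.shares_memory on the underlying arrays. This is a sketch, assuming Series.to_numpy() returns a zero-copy view here (it can fall back to copying for some dtypes):

import numpy as np

df = pd.DataFrame({"value": [5, 15, 25]})
df2 = df.reset_index(drop=True)  # under CoW, a lazy, shallow copy

# Both objects still point at the same underlying buffer...
print(np.shares_memory(df["value"].to_numpy(), df2["value"].to_numpy()))  # True

df2["value"] = 0  # ...until a write triggers the actual copy
print(np.shares_memory(df["value"].to_numpy(), df2["value"].to_numpy()))  # False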

Now when you filter and modify, there’s no warning and no uncertainty:

df = pd.DataFrame({"value": [5, 15, 25], "status": ["low", "low", "low"]})

# pandas 3.0: just works, no warning
df2 = df[df["value"] > 10]
df2["status"] = "high"  # Modifies df2 only, not df

df2
value status
1 15 high
2 25 high
df
value status
0 5 low
1 15 low
2 25 low

We can see that df is unchanged and no warning was raised.

Breaking Change: Chained Assignment

One pattern that breaks is chained assignment. With CoW, df["foo"] is a copy, so assigning to it only modifies the copy and doesn’t modify the original:

# This NO LONGER modifies df in pandas 3.0:
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})

df["foo"][df["bar"] > 5] = 100
df
foo bar
0 1 4
1 2 6
2 3 8

Notice foo still contains [1, 2, 3]. This is because the value 100 was assigned to a copy that was immediately discarded.

Use .loc instead to modify the original DataFrame:

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})
df.loc[df["bar"] > 5, "foo"] = 100
df
foo bar
0 1 4
1 100 6
2 100 8
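
If you prefer a chainable alternative, Series.mask expresses the same update as an expression:

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})
# mask replaces values where the condition is True
df = df.assign(foo=df["foo"].mask(df["bar"] > 5, 100))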

A Dedicated String Dtype

pandas 2.x stores strings as object dtype, which is both slow and ambiguous. You can’t tell from the dtype alone whether a column is purely strings:

pd.options.future.infer_string = False  # pandas 2.x behavior

text = pd.Series(["hello", "world"])
messy = pd.Series(["hello", 42, {"key": "value"}])

print(f"text dtype: {text.dtype}")
print(f"messy dtype: {messy.dtype}")
text dtype: object
messy dtype: object

pandas 3.0 introduces a dedicated str dtype that only holds strings, making the type immediately clear:

pd.options.future.infer_string = True  # pandas 3.0 behavior

ser = pd.Series(["a", "b", "c"])
print(f"dtype: {ser.dtype}")
dtype: str
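
Because the dtype only holds strings, assigning a non-string now fails loudly instead of silently widening the column back to object. A small sketch (the exact error message may differ):

try:
    ser[0] = 123  # the str dtype rejects non-string values
except TypeError as e:
    print(f"TypeError: {e}")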

Performance Gains

The new string dtype is backed by PyArrow (if installed), which provides significant performance improvements:

  • String operations run 5-10x faster because PyArrow processes data in contiguous memory blocks instead of individual Python objects
  • Memory usage reduced by up to 50% since strings are stored in a compact binary format rather than as Python objects with overhead
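
A quick way to sanity-check these claims on your own machine is a micro-benchmark. This is a rough sketch; exact numbers depend on your data, hardware, and whether PyArrow is installed:

import time

words = pd.Series(["hello", "world", "pandas"] * 1_000_000)

for label, ser in [("object", words.astype("object")), ("str", words.astype("str"))]:
    start = time.perf_counter()
    ser.str.upper()  # a representative vectorized string operation
    print(f"{label:>6} dtype: {time.perf_counter() - start:.3f}s")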

Arrow Ecosystem Interoperability

DataFrames can be passed to Arrow-based tools like Polars and DuckDB without copying or converting data:

import polars as pl

pandas_df = pd.DataFrame({"name": ["alice", "bob", "charlie"]})
polars_df = pl.from_pandas(pandas_df)  # Zero-copy - data already in Arrow format
polars_df
shape: (3, 1)
┌─────────┐
│ name    │
│ ---     │
│ str     │
╞═════════╡
│ alice   │
│ bob     │
│ charlie │
└─────────┘
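
DuckDB can do the same, querying the DataFrame in place by name via its replacement scans:

import duckdb

# DuckDB resolves `pandas_df` from the surrounding Python scope
duckdb.sql("SELECT name FROM pandas_df WHERE name LIKE 'a%'").df()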

Final Thoughts

pandas 3.0 brings meaningful improvements to your daily workflow:

  • Write cleaner code with pd.col expressions instead of lambdas
  • Avoid SettingWithCopyWarning confusion with Copy-on-Write as the default
  • Get 5-10x faster string operations with the new PyArrow-backed str dtype
  • Pass DataFrames to Polars and DuckDB without data conversion

You can test the string and Copy-on-Write changes in pandas 2.3 before upgrading by enabling the future flags (pd.col itself ships only in 3.0):

import pandas as pd

# Enable PyArrow-backed strings
pd.options.future.infer_string = True

# Enable Copy-on-Write behavior
pd.options.mode.copy_on_write = True

Fix any deprecation warnings that appear, and you’ll be ready for 3.0.

The expressions section was inspired by a blog post contributed by Marco Gorelli, Senior Software Engineer at Quansight Labs.

📚 Want to go deeper? Learning new techniques is the easy part. Knowing how to structure, test, and deploy them is what separates side projects from real work. My book shows you how to build data science projects that actually make it to production. Get the book →
