What’s New in pandas 3.0: Expressions, Copy-on-Write, and Faster Strings
Table of Contents
Introduction
Setup
Cleaner Column Operations with pd.col
Copy-on-Write Is Now the Default
A Dedicated String Dtype
Final Thoughts
Introduction
pandas 3.0 brings some of the most significant changes to the library in years. This article covers:
pd.col expressions: Cleaner column operations without lambdas
Copy-on-Write: Predictable copy behavior by default
PyArrow-backed strings: Faster operations and better type safety
💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!
Stay Current with CodeCut
Easy-to-digest articles on Python, AI, and open-source tools. Delivered twice a week.
.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
/* WordPress dark-theme overrides */
.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}
/* Mobile responsive */
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}
Subscribe
Setup
pandas 3.0 requires Python 3.11 or higher. Install it with:
pip install –upgrade pandas
To test these features before upgrading, enable them in pandas 2.3:
pd.options.future.infer_string = True
pd.options.mode.copy_on_write = True
Cleaner Column Operations with pd.col
The Traditional Approaches
If you’ve ever had to modify an existing column or create a new one, you may be used to one of these approaches.
Square-bracket notation is the most common way to add a column. You reference the new column name and assign the result:
import pandas as pd
df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})
df['temp_f'] = df['temp_c'] * 9/5 + 32
df
temp_c
temp_f
0
0
32.0
1
20
68.0
2
30
86.0
3
100
212.0
This overwrites your original DataFrame, which means you can’t compare before and after without first making a copy.
df_original = pd.DataFrame({"temp_c": [0, 20, 30]})
df_original['temp_f'] = df_original['temp_c'] * 9/5 + 32
# df_original is now modified – no way to see the original state
df_original
temp_c
temp_f
0
0
32.0
1
20
68.0
2
30
86.0
It also doesn’t return anything, so you can’t chain it with other operations. Method-chaining lets you write df.assign(…).query(…).sort_values(…) in one expression instead of multiple separate statements.
df = pd.DataFrame({"temp_c": [0, 20, 30]})
# This doesn't work – square-bracket assignment returns None
# df['temp_f'] = df['temp_c'] * 9/5 + 32.query('temp_f > 50')
# You need separate statements instead
df['temp_f'] = df['temp_c'] * 9/5 + 32
df = df.query('temp_f > 50')
df
temp_c
temp_f
1
20
68.0
2
30
86.0
Using assign solves the chaining problem by returning a new DataFrame instead of modifying in-place:
df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})
df = (
df.assign(temp_f=lambda x: x['temp_c'] * 9/5 + 32)
.query('temp_f > 50')
)
df
temp_c
temp_f
1
20
68.0
2
30
86.0
3
100
212.0
This works for chaining but relies on lambda functions. Lambda functions capture variables by reference, not by value, which can cause bugs:
df = pd.DataFrame({"x": [1, 2, 3]})
results = {}
for factor in [10, 20, 30]:
results[f'x_times_{factor}'] = lambda df: df['x'] * factor
df = df.assign(**results)
df
x
x_times_10
x_times_20
x_times_30
0
1
30
30
30
1
2
60
60
60
2
3
90
90
90
What went wrong: We expected x_times_10 to multiply by 10, x_times_20 by 20, and x_times_30 by 30. Instead, all three columns multiply by 30.
Why: Lambdas don’t save values, they save variable names. All three lambdas point to the same variable factor. After the loop ends, factor = 30. When assign() executes the lambdas, they all read factor and get 30.
The pandas 3.0 Solution: pd.col
pandas 3.0 introduces pd.col, which lets you reference columns without lambda functions. The syntax is borrowed from PySpark and Polars.
Here’s the temp_f conversion rewritten with pd.col:
df = pd.DataFrame({"temp_c": [0, 20, 30, 100]})
df = df.assign(temp_f=pd.col('temp_c') * 9/5 + 32)
df
temp_c
temp_f
0
0
32.0
1
20
68.0
2
30
86.0
3
100
212.0
Unlike square-bracket notation, pd.col supports method-chaining. Unlike lambdas, it doesn’t capture variables by reference, so you avoid the scoping bugs shown earlier.
Remember the lambda scoping bug? With pd.col, each multiplier is captured correctly:
df = pd.DataFrame({"x": [1, 2, 3]})
results = {}
for factor in [10, 20, 30]:
results[f'x_times_{factor}'] = pd.col('x') * factor
df = df.assign(**results)
df
x
x_times_10
x_times_20
x_times_30
0
1
10
20
30
1
2
20
40
60
2
3
30
60
90
Filtering with Expressions
Traditional filtering repeats df twice:
df = pd.DataFrame({"temp_c": [-10, 0, 15, 25, 30]})
df = df.loc[df['temp_c'] >= 0] # df appears twice
df
temp_c
1
0
2
15
3
25
4
30
With pd.col, you reference the column directly:
df = pd.DataFrame({"temp_c": [-10, 0, 15, 25, 30]})
df = df.loc[pd.col('temp_c') >= 0] # cleaner
df
temp_c
1
0
2
15
3
25
4
30
Combining Multiple Columns
With lambdas, you need to repeat lambda x: x[…] for every column:
df = pd.DataFrame({
"price": [100, 200, 150],
"quantity": [2, 3, 4]
})
df = df.assign(
total=lambda x: x["price"] * x["quantity"],
discounted=lambda x: x["price"] * x["quantity"] * 0.9
)
df
price
quantity
total
discounted
0
100
2
200
180.0
1
200
3
600
540.0
2
150
4
600
540.0
With pd.col, the same logic is more readable:
df = pd.DataFrame({
"price": [100, 200, 150],
"quantity": [2, 3, 4]
})
df = df.assign(
total=pd.col("price") * pd.col("quantity"),
discounted=pd.col("price") * pd.col("quantity") * 0.9
)
df
price
quantity
total
discounted
0
100
2
200
180.0
1
200
3
600
540.0
2
150
4
600
540.0
Note that, unlike Polars and PySpark, pd.col cannot yet be used in groupby operations:
# This works in Polars: df.group_by("category").agg(pl.col("value").mean())
# But this doesn't work in pandas 3.0:
df.groupby("category").agg(pd.col("value").mean()) # Not supported yet
This limitation may be removed in future versions.
Copy-on-Write Is Now the Default
If you’ve used pandas, you’ve probably seen the SettingWithCopyWarning at some point. It appears when pandas can’t tell if you’re modifying a view or a copy of your data:
# This pattern caused confusion in pandas < 3.0
df2 = df[df["value"] > 10]
df2["status"] = "high" # SettingWithCopyWarning!
Did this modify df or just df2? The answer depends on whether df2 is a view or a copy, and pandas can’t always predict which one it created. That’s what the warning is telling you.
pandas 3.0 makes the answer simple: filtering with df[…] always returns a copy. Modifying df2 never affects df.
This is called Copy-on-Write (CoW). If you just read df2, pandas shares memory with df. Only when you change df2 does pandas create a separate copy.
Now when you filter and modify, there’s no warning and no uncertainty:
df = pd.DataFrame({"value": [5, 15, 25], "status": ["low", "low", "low"]})
# pandas 3.0: just works, no warning
df2 = df[df["value"] > 10]
df2["status"] = "high" # Modifies df2 only, not df
df2
value
status
1
15
high
2
25
high
df
value
status
0
5
low
1
15
low
2
25
low
We can see that df is unchanged and no warning was raised.
Breaking Change: Chained Assignment
One pattern that breaks is chained assignment. With CoW, df["foo"] is a copy, so assigning to it only modifies the copy and doesn’t modify the original:
# This NO LONGER modifies df in pandas 3.0:
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})
df["foo"][df["bar"] > 5] = 100
df
foo
bar
0
1
4
1
2
6
2
3
8
Notice foo still contains [1, 2, 3]. This is because the value 100 was assigned to a copy that was immediately discarded.
Use .loc instead to modify the original DataFrame:
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})
df.loc[df["bar"] > 5, "foo"] = 100
df
foo
bar
0
1
4
1
100
6
2
100
8
A Dedicated String Dtype
pandas 2.x stores strings as object dtype, which is both slow and ambiguous. You can’t tell from the dtype alone whether a column is purely strings:
pd.options.future.infer_string = False # pandas 2.x behavior
text = pd.Series(["hello", "world"])
messy = pd.Series(["hello", 42, {"key": "value"}])
print(f"text dtype: {text.dtype}")
print(f"messy dtype: {messy.dtype}")
text dtype: object
messy dtype: object
pandas 3.0 introduces a dedicated str dtype that only holds strings, making the type immediately clear:
pd.options.future.infer_string = True # pandas 3.0 behavior
ser = pd.Series(["a", "b", "c"])
print(f"dtype: {ser.dtype}")
dtype: str
Performance Gains
The new string dtype is backed by PyArrow (if installed), which provides significant performance improvements:
String operations run 5-10x faster because PyArrow processes data in contiguous memory blocks instead of individual Python objects
Memory usage reduced by up to 50% since strings are stored in a compact binary format rather than as Python objects with overhead
Arrow Ecosystem Interoperability
DataFrames can be passed to Arrow-based tools like Polars and DuckDB without copying or converting data:
import polars as pl
pandas_df = pd.DataFrame({"name": ["alice", "bob", "charlie"]})
polars_df = pl.from_pandas(pandas_df) # Zero-copy – data already in Arrow format
polars_df
name
0
alice
1
bob
2
charlie
Final Thoughts
pandas 3.0 brings meaningful improvements to your daily workflow:
Write cleaner code with pd.col expressions instead of lambdas
Avoid SettingWithCopyWarning confusion with Copy-on-Write as the default
Get 5-10x faster string operations with the new PyArrow-backed str dtype
Pass DataFrames to Polars and DuckDB without data conversion
Related Resources
For more on DataFrame tools and performance optimization:
Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames – Compare pandas with Polars for performance-critical workflows
Scaling Pandas Workflows with PySpark’s Pandas API – Use familiar pandas syntax on distributed data
pandas vs Polars vs DuckDB: A Data Scientist’s Guide – Choose the right tool for your data analysis needs
💡 The expressions section was inspired by a blog post contributed by Marco Gorelli, Senior Software Engineer at Quansight Labs.
Stay Current with CodeCut
Easy-to-digest articles on Python, AI, and open-source tools. Delivered twice a week.
.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
/* WordPress dark-theme overrides */
.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}
/* Mobile responsive */
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}
Subscribe
What’s New in pandas 3.0: Expressions, Copy-on-Write, and Faster Strings Read More »
