Newsletter #256: Build Scalable Pipelines with DuckDB Memory Spilling
📅 Today’s Picks
Marimo: Keep All Notebook Cells in Sync Without Manual Reruns
Problem
In Jupyter notebooks, changing an input value doesn’t automatically update dependent cells.
Forget to rerun one cell, and you might make decisions based on outdated results without realizing anything is wrong.
Solution
Marimo automatically detects changes and re-executes all dependent cells.
When you change a variable like threshold from 50 to 30, every downstream cell that uses it updates immediately.
Build Scalable Pipelines with DuckDB Memory Spilling
Problem
When datasets exceed available RAM, most tools crash mid-operation.
This forces manual data chunking or expensive hardware upgrades just to complete basic queries.
Solution
DuckDB automatically spills intermediate results to temporary files when data exceeds configured memory limits.
Key benefits:
Process datasets larger than RAM without code changes
Configure memory limits to prevent system crashes
Automatic disk spillover when memory fills
No manual chunking or batching required
📢 ANNOUNCEMENTS
Cyber Monday: 30% Off Production-Ready Data Science
My book Production-Ready Data Science is on sale for Cyber Monday.
Get 58% off the ebook or 10% off the paperback through December 8th.
The book covers everything I’ve learned about taking data science from prototype to production: dependency management, testing, CI/CD, and workflow automation.
☕️ Weekly Finds
Nano-PDF
[LLM]
– Natural language PDF editing using Gemini with multi-page parallel processing
Codon
[Python Utils]
– High-performance Python compiler that generates native machine code for 10-100x speedups
lm-evaluation-harness
[LLM]
– Unified framework for few-shot evaluation of language models across 200+ tasks
Looking for a specific tool? Explore 70+ Python tools →
Stay Current with CodeCut
Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.
.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}
.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}
Subscribe
Newsletter #256: Build Scalable Pipelines with DuckDB Memory Spilling Read More »









