Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
Filter by Categories
About Article
Analyze Data
Archive
Best Practices
Better Outputs
Blog
Code Optimization
Code Quality
Command Line
Course
Daily tips
Dashboard
Data Analysis & Manipulation
Data Engineer
Data Visualization
DataFrame
Delta Lake
DevOps
DuckDB
Environment Management
Feature Engineer
Git
Jupyter Notebook
LLM
LLM Tools
Machine Learning
Machine Learning & AI
Machine Learning Tools
Manage Data
MLOps
Natural Language Processing
Newsletter Archive
NumPy
Pandas
Polars
PySpark
Python Helpers
Python Tips
Python Utilities
Scrape Data
SQL
Testing
Time Series
Tools
Visualization
Visualization & Reporting
Workflow & Automation
Workflow Automation

Newsletter Archive

Automated newsletter archive from Klaviyo campaigns

Newsletter #277: Handle Messy Data with RapidFuzz Fuzzy Matching

📅 Today’s Picks

Swap AI Prompts Instantly with MLflow Prompt Registry

Problem
Finding the right prompt often takes experimentation: tweaking wording, adjusting tone, testing different instructions.
But with prompts hardcoded in your codebase, each test requires a code change and redeployment.
Solution
MLflow Prompt Registry solves this with aliases. Your code references an alias like “production” instead of a version number, so you can swap versions without changing it.
Here’s how it works:

Every prompt edit creates a new immutable version with a commit message
Register prompts once, then assign aliases to specific versions
Deploy to different environments by creating aliases like “staging” and “production”
Track full version history with metadata and tags for each prompt

⭐ View GitHub

🔄 Worth Revisiting

Automate LLM Evaluation at Scale with MLflow make_judge()

Problem
When you ship LLM features without evaluating them, models might hallucinate, violate safety guidelines, or return incorrectly formatted responses.
Manual review doesn’t scale. Reviewers might miss subtle issues when evaluating thousands of outputs, and scoring standards often vary between people.
Solution
MLflow make_judge() applies the same evaluation standards to every output, whether you’re checking 10 or 10,000 responses.
Key capabilities:

Define evaluation criteria once, reuse everywhere
Automatic rationale explaining each judgment
Built-in judges for safety, toxicity, and hallucination detection
Typed outputs that never return unexpected formats

⭐ View GitHub

☕️ Weekly Finds

gspread
[Data Processing]
– Google Sheets Python API for reading, writing, and formatting spreadsheets

zeppelin
[Data Analysis]
– Web-based notebook for interactive data analytics with SQL, Scala, and more

vectorbt
[Data Science]
– Fast engine for backtesting, algorithmic trading, and research in Python

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}

.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}

Subscribe

Newsletter #277: Handle Messy Data with RapidFuzz Fuzzy Matching Read More »

Newsletter #296: Scrapling: Adaptive Web Scraping in Python

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

Scrapling: Adaptive Web Scraping in Python

Problem
Traditional scraping with BeautifulSoup uses hardcoded CSS selectors to find elements on a page.
If the site updates its layout, those selectors no longer match and the scraper ends up returning empty data.
Solution
Instead of relying only on selectors, Scrapling records how elements appear during the initial scrape.
If the site is redesigned later, it can use that stored structure to find the same elements again.

Ibis: One Python API for 25+ Database Backends

Problem
Many data workflows begin with pandas for quick experimentation, while production pipelines might run on databases like PostgreSQL or BigQuery.
Moving from prototype to production usually means rewriting the same transformation logic in SQL. That translation takes time and can easily introduce errors.
Solution
Ibis solves this by letting you define transformations once in Python and compiling them into native SQL for 25+ backends automatically.

📖 View Full Article

☕️ Weekly Finds

Kronos
[Machine Learning]
– A decoder-only foundation model pre-trained on K-line sequences for financial market forecasting

pixi
[Environment Management]
– Fast, cross-platform package manager built on the Conda ecosystem, written in Rust

MinerU
[OCR/PDF Processing]
– One-stop tool for converting PDFs, webpages, and e-books into machine-readable markdown and JSON

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}

.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}

Subscribe

Newsletter #296: Scrapling: Adaptive Web Scraping in Python Read More »

Newsletter #295: Marker: Smart PDF Extraction with Hybrid LLM Mode

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

Marker: Smart PDF Extraction with Hybrid LLM Mode

Problem
Standard OCR pipelines often miss inline math, split tables across pages, and lose the relationships between form fields.
Sending the full document to an LLM can improve accuracy, but it’s slow and expensive at scale.
Solution
Marker‘s hybrid mode takes a more targeted approach:

Its deep learning pipeline handles the bulk of conversion
Then an LLM steps in only for the hard parts: table merging, LaTeX formatting, and form extraction

Marker supports OpenAI, Gemini, Claude, Ollama, and Azure out of the box.

📖 View Full Article

Qdrant: Fast Vector Search in Rust with a Python API

Problem
Building semantic search typically starts with storing vectors in Python lists and computing cosine similarity manually.
But brute-force comparison scales linearly with your dataset, making every query slower as your data grows.
Solution
Qdrant is a vector search engine built in Rust that indexes your vectors for fast retrieval.
Key features:

In-memory mode for local prototyping with no server setup
Seamlessly scale to millions of vectors in production with the same Python API
Built-in support for cosine, dot product, and Euclidean distance
Sub-second query times even for millions of vectors

🧪 Run code

📚 Latest Deep Dives

PDF Table Extraction: Docling vs Marker vs LlamaParse Compared

Extracting tables from PDFs can be surprisingly difficult. A table that looks neatly structured in a PDF is actually saved as text placed at specific coordinates on the page. This makes it difficult to preserve the original layout when extracting the table.
This article will introduce three Python tools that attempt to solve this problem: Docling, Marker, and LlamaParse.

📖 View Full Article

☕️ Weekly Finds

Dify
[LLM]
– Open-source LLM app development platform with AI workflow, RAG pipeline, and agent capabilities

PageIndex
[RAG]
– Document index for vectorless, reasoning-based RAG

MCP Server Chart
[Data Visualization]
– A visualization MCP server for generating 25+ visual charts using AntV

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}

.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}

Subscribe

Newsletter #295: Marker: Smart PDF Extraction with Hybrid LLM Mode Read More »

Newsletter #294: pandas 3.0: 5-10x Faster String Operations with PyArrow

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

pandas 3.0: 5-10x Faster String Operations with PyArrow

Problem
Traditionally, pandas stores strings as object dtype, where each string is a separate Python object scattered across memory.
This makes string operations slow and the dtype ambiguous, since both pure string columns and mixed-type columns show up as object.
Solution
pandas 3.0 introduces a dedicated str dtype backed by PyArrow, which stores strings in contiguous memory blocks instead of individual Python objects.
Key benefits:

5-10x faster string operations because data is stored contiguously
50% lower memory by eliminating Python object overhead
Clear distinction between string and mixed-type columns

📖 View Full Article

🧪 Run code

Build Self-Documenting Regex with Pregex

Problem
Regex patterns like [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} are difficult to read and intimidating.
Team members without regex expertise might struggle to understand and modify these validation patterns.
Solution
Team members without regex expertise might struggle to understand and modify these validation patterns.
Pregex transforms regex into readable Python code using descriptive components.
Key benefits:

Code that explains its intent without comments
Easy modification without regex expertise
Composable patterns for complex validation
Export to regex format when needed

📖 View Full Article

🧪 Run code

📚 Latest Deep Dives

PDF Table Extraction: Docling vs Marker vs LlamaParse Compared

PDF files do not store tables as structured data. Instead, they position text at specific coordinates on the page.
Table extraction tools must reconstruct the structure by determining which values belong in which rows and columns.
The problem becomes even harder when tables include multi-level headers, merged cells, or complex layouts.
To explore this problem, I experimented with three tools designed for PDF table extraction: LlamaParse, Marker, and Docling. Each tool takes a different approach.
Performance overview:

Docling: Fastest local option, but struggles with complex tables
Marker: Handles complex layouts well and runs locally, but is much slower
LlamaParse: Most accurate on complex tables and fastest overall, but requires a cloud API

In this article, I share the code, examples, and results from testing each tool.
📖 View Full Article

☕️ Weekly Finds

Lance
[Data Processing]
– Modern columnar data format for ML with 100x faster random access than Parquet

Mathesar
[Dashboard]
– Spreadsheet-like interface for PostgreSQL that lets anyone view, edit, and query data

dotenvx
[DevOps]
– A better dotenv with encryption, multiple environments, and cross-platform support

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}

.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}

Subscribe

Newsletter #294: pandas 3.0: 5-10x Faster String Operations with PyArrow Read More »

Newsletter #293: act: Run GitHub Actions Locally with Docker

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

act: Run GitHub Actions Locally with Docker

Problem
GitHub Actions has no local execution mode. You can’t test a step, inspect an environment variable, or reproduce a runner-specific failure on your own machine.
Each change requires a commit and a wait for the cloud runner. A small mistake like a missing secret means starting the loop again.
Solution
With act, you can execute workflows locally using Docker. Failures surface immediately, making it easier to iterate and commit only when the workflow passes.

ScrapeGraphAI: Research Multiple Sites with One Prompt

Problem
With BeautifulSoup, every site needs its own selectors, and you need to manually combine the results into a unified format.
When any site redesigns its layout, those selectors break and you are back to fixing code.
Solution
ScrapeGraphAI‘s SearchGraph fixes this by replacing selectors with a natural language prompt.
Here’s what it handles:

Automatic web search for relevant pages
AI-powered scraping that adapts to any layout
Structured output with source URLs for verification
Works with any LLM provider (OpenAI, Ollama, etc.)

📖 View Full Article

🎓 Latest Interactive Course

Python Data Modeling with Dataclasses and Pydantic

Choosing between dict, NamedTuple, dataclass, and Pydantic comes down to how much safety you need. In this free interactive course, you’ll learn when to use each:

Dictionary: Flexible, but no built-in field checks. Typos and missing keys only show up at runtime.
NamedTuple: Immutable with fixed fields, helping catch mistakes early.
dataclass: Mutable data containers with defaults and optional validation logic.
Pydantic: Strong type validation, automatic coercion, and detailed error reporting.

All exercises run directly in your browser. No installation required.

☕️ Weekly Finds

agent-browser
[Agents]
– Headless browser automation CLI for AI agents, built on Playwright

pyscn
[Code Quality]
– Intelligent Python code quality analyzer with dead code detection and complexity analysis

pyupgrade
[Code Quality]
– Automatically upgrade Python syntax to newer versions of the language

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}

.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}

Subscribe

Newsletter #293: act: Run GitHub Actions Locally with Docker Read More »

Newsletter #292: SQLFluff: Auto-Fix Messy SQL with One Command

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

Evaluate LLM Apps in One Line with PydanticAI

Problem
Testing LLM apps means validating multiple factors at once: is the answer correct, properly structured, fast enough, and natural sounding?
Rewriting this logic for every project is inefficient and error-prone.
Solution
pydantic-ai includes pydantic-evals, which provides these capabilities out of the box. Simply choose the evaluators you need and add them to your evaluation suite.
Built-in evaluators:

Deterministic: validate that outputs are correct, properly typed, and fast enough
LLM-as-judge: have another LLM grade qualities like helpfulness or tone
Report-level: generate classification metrics across all cases automatically

🧪 Run code

SQLFluff: Auto-Fix Messy SQL with One Command

Problem
Consistent SQL style matters. It improves readability, speeds up code reviews, and makes bugs easier to identify.
Manual reviews can catch formatting issues, but they’re time-consuming and often inconsistent.
Solution
SQLFluff solves this with automated linting and formatting across 30+ SQL dialects. It identifies violations, applies consistent standards, and auto-corrects many problems.
SQLFluff also supports the following templates:

Jinja
SQL placeholders (e.g. SQLAlchemy parameters)
Python format strings
dbt (requires plugin)

🧪 Run code

🎓 Latest Interactive Course

Python Data Modeling with Dataclasses and Pydantic

Choosing between dict, NamedTuple, dataclass, and Pydantic comes down to how much safety you need. In this free interactive course, you’ll learn when to use each:

Dictionary: Flexible, but no built-in field checks. Typos and missing keys only show up at runtime.
NamedTuple: Immutable with fixed fields, helping catch mistakes early.
dataclass: Mutable data containers with defaults and optional validation logic.
Pydantic: Strong type validation, automatic coercion, and detailed error reporting.

All exercises run directly in your browser. No installation required.

☕️ Weekly Finds

spec-kit
[Dev Tools]
– Toolkit for Spec-Driven Development that helps define specs, generate plans and tasks, and implement code with AI coding tools

ty
[Code Quality]
– Extremely fast Python type checker and language server written in Rust, by the creators of uv and Ruff

nbQA
[Code Quality]
– Run ruff, isort, pyupgrade, mypy, pylint, flake8, and more on Jupyter Notebooks

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}

.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}

Subscribe

Newsletter #292: SQLFluff: Auto-Fix Messy SQL with One Command Read More »

Newsletter #291: Docling: Turn DOCX Reviewer Feedback into Structured Data

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

Narwhals: One Decorator for pandas, Polars, and DuckDB

Problem
Writing a DataFrame function that supports multiple libraries usually means maintaining separate versions of the same logic for each one.
If changes are needed, they need to be applied to every version.
Solution
With Narwhals‘ @narwhalify decorator, you write the logic once using a unified API.
The function then works with whatever DataFrame type is passed in and returns the same type, reducing friction when switching tools.
How is this different from Ibis? Ibis is built for data scientists switching between SQL backends. Narwhals is built for library authors who need their code to work with any DataFrame type.

📖 View Full Article

🧪 Run code

Docling: Turn DOCX Reviewer Feedback into Structured Data

Problem
Pulling comments from Word files turns informal feedback into data you can analyze, manage, and act on in code.
Traditionally, this requires parsing raw XML and manually mapping each comment back to its referenced text.
Solution
Docling v2.71.0 simplifies this process. Converted documents now attach a comments field to every text item, making reviewer annotations accessible without manual XML handling.
This opens up workflows that were previously too tedious to automate:

Flag unresolved comments before merging document versions
Build dashboards tracking reviewer feedback across teams
Feed comment data into LLMs for sentiment analysis or summarization

📖 View Full Article

📚 Latest Deep Dives

Portable DataFrames in Python: When to Use Ibis, Narwhals, or Fugue
– Write your DataFrame logic once and run it on any backend. Compare Ibis, Narwhals, and Fugue to find the right portability strategy for your Python workflow.

☕️ Weekly Finds

pdfGPT
[LLM]
– Chat with the contents of your PDF files using GPT capabilities and semantic search with sentence embeddings

SandDance
[Data Viz]
– Microsoft Research data visualization tool that maps every data row to a visual mark for interactive exploration

trafilatura
[Web Scraping]
– Python package and CLI for web crawling, scraping, and text extraction with output as CSV, JSON, HTML, or XML

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}

.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}

Subscribe

Newsletter #291: Docling: Turn DOCX Reviewer Feedback into Structured Data Read More »

Newsletter #290: Quarkdown: Build LaTeX-Quality Docs with Just Markdown

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

Quarkdown: Build LaTeX-Quality Docs with Just Markdown

Problem
LaTeX produces beautiful academic papers, but its verbose syntax and nested environments make even simple layouts painful to write.
Solution
Quarkdown generates the same professional paged output using clean Markdown syntax you already know.
Key features:

Write once, export as paged documents, presentation slides, or websites
Define reusable functions with conditionals and loops inside your documents
Embed Mermaid diagrams and charts without external tools
Live preview in VS Code as you type

Ibis: One Python API for 22+ Database Backends

Problem
Running queries across multiple databases often means rewriting the same logic for each backend’s SQL dialect. A query that works in DuckDB may require syntax changes for PostgreSQL, and another rewrite for BigQuery.
Solution
Ibis removes that friction by compiling Python expressions into each backend’s native SQL. Swap the connection, and the same code runs across 22+ databases.
Key features:

Write once, run on DuckDB, PostgreSQL, BigQuery, Snowflake, and 18+ more
Lazy execution that builds and optimizes the query plan before sending it to the database
Intuitive chaining syntax similar to Polars

📖 View Full Article

📚 Latest Deep Dives

Portable DataFrames in Python: When to Use Ibis, Narwhals, or Fugue
– Write your DataFrame logic once and run it on any backend. Compare Ibis, Narwhals, and Fugue to find the right portability strategy for your Python workflow.

☕️ Weekly Finds

graphiti
[LLM]
– Build real-time, temporally-aware knowledge graphs for AI agents with automatic entity and relationship extraction

doris
[SQL]
– High-performance MPP analytics database with MySQL compatibility that handles real-time ingestion and sub-second queries at scale

smallpond
[Data Processing]
– Lightweight distributed data processing framework by DeepSeek that scales DuckDB to PB-scale datasets using Ray

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}

.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}

Subscribe

Newsletter #290: Quarkdown: Build LaTeX-Quality Docs with Just Markdown Read More »

Newsletter #289: Python 3.14: Type-Safe String Interpolation with t-strings

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

Loguru: From Print Statements to Production Logging in One Line

Problem
Data scientists often rely on print statements to monitor data processing pipelines during development.
But print provides no timestamps or severity levels.
Python’s built-in logging fixes that, but demands boilerplate: handlers, formatters, and log-level configuration just to get started.
Solution
Loguru replaces both with a single import: one line gives you structured, colored logging with no setup required.
Key features:

Modern {} formatting that matches Python f-string syntax
One-line file logging with automatic rotation and retention
Readable tracebacks that show variable values at each stack level
Custom sinks to route logs to Slack, email, or databases

📖 View Full Article

🧪 Run code

Python 3.14: Type-Safe String Interpolation with t-strings

Problem
Building SQL queries with f-strings directly embeds user input into the query string, allowing attackers to inject malicious SQL commands.
Parameterized queries are secure but require you to maintain query templates and value lists separately.
Solution
Python 3.14 introduces template string literals (t-strings). Instead of returning strings, they return Template objects that safely expose interpolated values.
This lets you validate and sanitize interpolated values before building the final query.

🧪 Run code

☕️ Weekly Finds

mistune
[Python Utilities]
– Fast Python Markdown parser with custom renderers and plugins that converts Markdown to HTML with minimal overhead

pyparsing
[Python Utilities]
– Python library for creating readable PEG parsers that handles whitespace, quoted strings, and comments without regex complexity

fastlite
[SQL]
– Lightweight SQLite wrapper by Jeremy Howard that adds Pythonic syntax, dataclass support, and diagram visualization for interactive use

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}

.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}

Subscribe

Newsletter #289: Python 3.14: Type-Safe String Interpolation with t-strings Read More »

Newsletter #288: MLflow: Track Every LLM API Call with 1 Line of Code

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

MLflow: Track Every LLM API Call with 1 Line of Code

Problem
Most teams building LLM apps don’t set up tracking because they see API calls as simple request-response operations.
But every API call costs money, and without token-level visibility, you can’t tell where your budget is going until it’s already spent.
Solution
MLflow‘s autolog traces every OpenAI API call with just one line of code, so you always know what was sent, what came back, and how many tokens it used.
Key capabilities:

Track token usage per call to identify which requests consume the most
View full prompt and response content for every call
Measure latency per call to find which requests are slowing down your app
Works with OpenAI, Anthropic, LangChain, LlamaIndex, and DSPy

Rembg: Remove Image Backgrounds in 2 Lines of Python

Problem
Removing backgrounds from images typically requires Photoshop, online tools, or AI assistants like ChatGPT.
But these options come with subscription costs, upload limits, or privacy concerns with your images on external servers.
Solution
Rembg uses AI models to remove backgrounds locally with just 2 lines of Python.
It’s also open source and compatible with common Python imaging libraries.

🧪 Run code

☕️ Weekly Finds

awesome-claude-skills
[AI Tools]
– Curated list of Claude Skills, resources, and tools for customizing Claude AI workflows with community-contributed templates and integrations

zvec
[Vector Database]
– In-process vector database built on Alibaba’s Proxima engine that searches billions of vectors in milliseconds with zero server setup

langflow
[AI Agents]
– Visual platform for building and deploying AI-powered agents and workflows with drag-and-drop interface, multi-agent orchestration, and MCP server support

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

.codecut-subscribe-form .codecut-input {
background: #2F2D2E !important;
border: 1px solid #72BEFA !important;
color: #FFFFFF !important;
}
.codecut-subscribe-form .codecut-input::placeholder {
color: #999999 !important;
}
.codecut-subscribe-form .codecut-subscribe-btn {
background: #72BEFA !important;
color: #2F2D2E !important;
}
.codecut-subscribe-form .codecut-subscribe-btn:hover {
background: #5aa8e8 !important;
}

.codecut-subscribe-form {
max-width: 650px;
display: flex;
flex-direction: column;
gap: 8px;
}
.codecut-input {
-webkit-appearance: none;
-moz-appearance: none;
appearance: none;
background: #FFFFFF;
border-radius: 8px !important;
padding: 8px 12px;
font-family: ‘Comfortaa’, sans-serif !important;
font-size: 14px !important;
color: #333333;
border: none !important;
outline: none;
width: 100%;
box-sizing: border-box;
}
input[type=”email”].codecut-input {
border-radius: 8px !important;
}
.codecut-input::placeholder {
color: #666666;
}
.codecut-email-row {
display: flex;
align-items: stretch;
height: 36px;
gap: 8px;
}
.codecut-email-row .codecut-input {
flex: 1;
}
.codecut-subscribe-btn {
background: #72BEFA;
color: #2F2D2E;
border: none;
border-radius: 8px;
padding: 8px 14px;
font-family: ‘Comfortaa’, sans-serif;
font-size: 14px;
font-weight: 500;
cursor: pointer;
text-decoration: none;
display: flex;
align-items: center;
justify-content: center;
transition: background 0.3s ease;
}
.codecut-subscribe-btn:hover {
background: #5aa8e8;
}
.codecut-subscribe-btn:disabled {
background: #999;
cursor: not-allowed;
}
.codecut-message {
font-family: ‘Comfortaa’, sans-serif;
font-size: 12px;
padding: 8px;
border-radius: 6px;
display: none;
}
.codecut-message.success {
background: #d4edda;
color: #155724;
display: block;
}
@media (max-width: 480px) {
.codecut-email-row {
flex-direction: column;
height: auto;
gap: 8px;
}
.codecut-input {
border-radius: 8px;
height: 36px;
}
.codecut-subscribe-btn {
width: 100%;
text-align: center;
border-radius: 8px;
height: 36px;
}
}

Subscribe

Newsletter #288: MLflow: Track Every LLM API Call with 1 Line of Code Read More »

Scroll to Top

Work with Khuyen Tran

Work with Khuyen Tran