Newsletter Archive

Automated newsletter archive from Klaviyo campaigns

Newsletter #276: Polars v1.37.0: Faster Lookups with min_by and max_by

📅 Today’s Picks

Stop Manually Tracing Dependencies with uv tree

Problem
Debugging version conflicts requires knowing which packages depend on what. But tracing these relationships manually through nested dependencies is tedious.
Solution
uv tree handles this automatically, displaying the full dependency graph so you can trace any package back to its source.
Key capabilities:

Complete dependency visualization
Flag dependencies with available updates
Find which packages depend on a specific library
Filter the tree to show only a specific package’s dependencies
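To make the "which packages depend on a specific library" question concrete, here is a plain-Python sketch over a made-up dependency graph. The package graph and the `dependents_of` helper are hypothetical. The sketch does not call uv or reproduce its output; it only shows the reverse lookup that a dependency tree makes possible.

```python
from collections import deque

# Hypothetical graph for illustration: package -> its direct dependencies
DEPENDENCIES = {
    "my-app": ["pandas", "requests"],
    "pandas": ["numpy", "python-dateutil"],
    "requests": ["urllib3", "idna"],
    "python-dateutil": ["six"],
    "numpy": [],
    "urllib3": [],
    "idna": [],
    "six": [],
}


def dependents_of(target: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every package that depends on `target`, directly or transitively."""
    # Invert the edges so each package maps to the packages that require it
    reverse: dict[str, list[str]] = {name: [] for name in graph}
    for package, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(package)

    # Breadth-first walk up the inverted graph, nearest dependents first
    found: list[str] = []
    seen = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in reverse.get(current, []):
            if parent not in seen:
                seen.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


print(dependents_of("six", DEPENDENCIES))
```

Reading the result from left to right traces the chain from the library back to the top-level project, which is the path you need when deciding where to pin or upgrade.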

📖 View Full Article

⭐ View GitHub

Polars v1.37.0: Faster Lookups with min_by and max_by

Problem
Finding the row with the minimum or maximum value based on another column requires sorting, grouping, or complex filter expressions.
Solution
Polars v1.37.0 adds min_by and max_by expression methods. These methods return a column's value from the row where another column is smallest or largest, in a single, readable expression.
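The newsletter does not show the call syntax, so the following is only a plain-Python sketch of the lookup being described: pick the row where one column is smallest or largest, then read another column from it. The `min_by` and `max_by` helpers and the sample rows are illustrative and are not the Polars API.

```python
# Hypothetical sample data: one dict per row
rows = [
    {"product": "A", "price": 30, "rating": 4.9},
    {"product": "B", "price": 12, "rating": 4.1},
    {"product": "C", "price": 45, "rating": 3.9},
]


def min_by(rows: list[dict], value: str, by: str):
    """Value of column `value` in the row where column `by` is smallest."""
    return min(rows, key=lambda row: row[by])[value]


def max_by(rows: list[dict], value: str, by: str):
    """Value of column `value` in the row where column `by` is largest."""
    return max(rows, key=lambda row: row[by])[value]


cheapest = min_by(rows, "product", by="price")
priciest = max_by(rows, "product", by="price")
best_rated = max_by(rows, "product", by="rating")
print(cheapest, priciest, best_rated)
```

No sorting or grouping step is needed: the lookup is a single pass over the rows, which is the convenience the expression methods are meant to provide.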

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

lmql
[LLM]
– A programming language for constraint-guided and efficient LLM programming based on a superset of Python.

helicone
[MLOps]
– Open-source LLM observability platform with one-line integration for monitoring, analytics, and management.

responses
[Python Utils]
– A utility library for mocking out the Python Requests library in tests.

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

What’s New in pandas 3.0: Expressions, Copy-on-Write, and Faster Strings
– Learn what’s new in pandas 3.0: pd.col expressions for cleaner code, Copy-on-Write for predictable behavior, and PyArrow-backed strings for 5-10x faster operations.

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

Newsletter #275: DrawDB: Visual Schema Design to Production SQL in Minutes

📅 Today’s Picks

pd.col: Polars-Like Column References in pandas 3.0

Problem
Before pandas 3.0, creating columns meant:

Bracket notation: repeats DataFrame name, breaks chaining
assign() with lambdas: verbose syntax, scoping bugs from variable capture

Solution
pandas 3.0 solves this with pd.col expressions: clean column references that chain naturally, with syntax as readable as Polars and PySpark.
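The variable-capture problem with lambdas can be reproduced without pandas. The sketch below uses a toy `Col` class as a hypothetical stand-in for a column expression (it is not pandas' `pd.col`) to show the difference between a lambda that looks up a loop variable when it runs and an expression that binds the value when it is built.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, float]

# Late binding: each lambda looks up `factor` when it is called,
# so every lambda ends up seeing the final loop value.
late = [lambda row: row["price"] * factor for factor in (2, 3)]


@dataclass(frozen=True)
class Col:
    """Toy column reference (hypothetical, for illustration only)."""

    name: str

    def __mul__(self, k: float) -> Callable[[Row], float]:
        # `k` is bound here, when the expression is built, not when it is evaluated
        return lambda row: row[self.name] * k


eager = [Col("price") * factor for factor in (2, 3)]

row = {"price": 10.0}
late_results = [fn(row) for fn in late]
eager_results = [fn(row) for fn in eager]
print(late_results, eager_results)
```

Both lambdas in `late` multiply by the last factor, while the expressions in `eager` keep the factor they were built with. That is the kind of scoping bug an expression-based column reference avoids.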

📖 View Full Article

🧪 Run code

⭐ View GitHub

DrawDB: Visual Schema Design to Production SQL in Minutes

Problem
Have you ever sketched a database schema on a whiteboard, then spent hours converting it to SQL?
There’s a faster way to go from diagram to production-ready code.
Solution
With DrawDB, your database diagram becomes the code. Just drag tables onto a canvas, connect them visually, and export SQL for 6 databases.
Key benefits:

Draw tables and relationships on a visual canvas
Export production-ready SQL for MySQL, PostgreSQL, SQLite, MariaDB, MSSQL, and Oracle
No account or subscription required
Share diagrams with your team instantly

⭐ View GitHub

☕️ Weekly Finds

timescaledb
[Data Engineer]
– A time-series database for high-performance real-time analytics packaged as a Postgres extension

rembg
[Python Utils]
– A tool to remove image backgrounds with Python

grip
[Python Utils]
– Preview GitHub README.md files locally before committing them

Newsletter #274: ChromaDB: Metadata Filtering for Precise Semantic Search

📅 Today’s Picks

ChromaDB: Metadata Filtering for Precise Semantic Search

Problem
Search for “latest ML research” and semantic search might return highly relevant papers from 2019.
That’s because similarity doesn’t understand constraints. You need metadata filtering to enforce “year >= 2024” at the database level.
Solution
ChromaDB’s where clause lets you combine “find similar” with “but only from 2024.” The database filters first, then ranks by similarity.
Key operators:

$eq, $ne for exact matching
$gt, $gte, $lt, $lte for range queries
$in, $nin for set membership
$and, $or for combining conditions
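To show how these operators compose, here is a small plain-Python sketch that evaluates a filter dictionary against metadata records. The `matches` helper and the sample records are hypothetical, and the matching rules are simplified (for example, a missing key never matches). In ChromaDB itself the filter dictionary would be passed as the where clause of a query rather than evaluated by hand.

```python
from typing import Any

# Comparison operators, keyed by the names listed above
OPS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
}


def matches(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies the filter (simplified toy semantics)."""
    for key, cond in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            if key not in metadata:
                return False
            if not isinstance(cond, dict):
                cond = {"$eq": cond}  # a bare value is shorthand for equality
            if not all(OPS[op](metadata[key], arg) for op, arg in cond.items()):
                return False
    return True


# Hypothetical metadata records
papers = [
    {"title": "Old survey", "year": 2019, "topic": "ml"},
    {"title": "New benchmark", "year": 2024, "topic": "ml"},
    {"title": "New database paper", "year": 2025, "topic": "db"},
]

where = {"$and": [{"year": {"$gte": 2024}}, {"topic": {"$in": ["ml", "nlp"]}}]}
recent_ml = [p["title"] for p in papers if matches(p, where)]
print(recent_ml)
```

The 2019 survey is excluded by the year constraint no matter how similar its text is to the query, which is the behavior the newsletter describes as filtering first and ranking second.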

📖 View Full Article

🧪 Run code

⭐ View GitHub

🔄 Worth Revisiting

Semantic Search in PostgreSQL with pgvector

Problem
Traditional PostgreSQL keyword queries return limited results because they require exact string matches. This approach misses semantically related data that shares meaning but uses different terminology.
Solution
pgvector enables vector search within PostgreSQL. This allows semantic matching of contextually similar content.
Key benefits:

Native PostgreSQL integration with existing databases
Fast exact and approximate nearest neighbor search
Six distance metrics including L2, cosine, inner product, and Hamming
Seamless Python integration via SQLAlchemy or psycopg2
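For intuition about the distance metrics named above, the following plain-Python sketch computes L2, inner product, cosine, and Hamming distance on small hand-made vectors. The function names and sample vectors are illustrative. Inside PostgreSQL, pgvector performs these computations itself; this only shows what each metric measures.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Straight-line (Euclidean) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: list[float], b: list[float]) -> float:
    """Dot product; larger means more aligned (and longer) vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; ignores vector length, compares direction only."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norm


def hamming_distance(a: list[int], b: list[int]) -> int:
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 2-D "embeddings"
query = [1.0, 0.0]
docs = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 1.0],
    "opposite": [-1.0, 0.0],
}

# Nearest-neighbor search is a sort by distance to the query vector
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
print(ranked)
```

Note that the "same direction" vector has cosine distance 0 from the query but L2 distance 1, so the choice of metric changes which rows count as nearest.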

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

RAGxplorer
[LLM]
– Open-source tool to visualize RAG embeddings and explore retrieval augmented generation pipelines interactively

CAMEL
[LLM]
– The first multi-agent framework enabling AI agents to communicate and collaborate while assuming different roles

claude-scientific-skills
[LLM]
– A set of ready-to-use scientific skills for Claude, enabling advanced research and analysis workflows

Newsletter #273: MarkItDown: YouTube Transcripts to Markdown in One Line

📅 Today’s Picks

MarkItDown: YouTube Transcripts to Markdown in One Line

Problem
Videos contain rich information that’s difficult to search or analyze programmatically.
Manually transcribing and formatting them into structured text is tedious and error-prone.
Solution
MarkItDown eliminates manual transcription by converting YouTube URLs to structured Markdown automatically.
Key benefits:

Output ready for RAG systems or content summarization
Multi-format support: same API for PDFs, Word docs, Excel, and images
Lightweight with minimal dependencies
Consistent Markdown output across all file types

Build question-answering systems over video content without manual transcription.

📖 View Full Article

🧪 Run code

⭐ View GitHub

UV: Define Conflicting Dependencies in One Project

Problem
What happens when your project needs two incompatible versions of the same package?
Version conflicts are a frequent issue in many projects. A typical solution is to split dependencies across different requirements files or environments, which works but adds ongoing maintenance overhead.
Solution
UV’s conflicts declaration lets you define both versions in one project. Just add a flag to switch between them.
Key benefits:

One pyproject.toml for all configurations
Separate resolution paths in a single lockfile
Flag-based switching between environments
Protection from accidentally installing both

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

owl
[LLM]
– Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

dexter
[LLM]
– An autonomous agent for deep financial research

bandit
[Python Utils]
– A tool designed to find common security issues in Python code

Newsletter #272: Split Large Parquet Files Automatically with Polars

📅 Today’s Picks

Split Large Parquet Files Automatically with Polars

Problem
When writing large datasets to Parquet, you either end up with one massive file that is slow to read or have to split the data into smaller files manually.
Solution
With Polars PartitionMaxSize, output is automatically broken into multiple Parquet files according to a defined size limit.
This enables:

Parallel reads across multiple cores
Faster, more reliable cloud storage transfers

📖 View Full Article

🧪 Run code

⭐ View GitHub

Coiled: One Decorator Replaces Your Entire Docker Workflow (Sponsored)

Problem
Have you ever had code work locally but fail on cloud VMs because of missing dependencies or version mismatches?
Docker solves this by freezing dependencies, but introduces friction: Dockerfiles, slow builds, registry pushes, and full redeploys for minor package changes.
Solution
Coiled can remove Docker from the workflow entirely. With a single decorator, it automatically syncs your local environment to the cloud.
Key features:

Exact dependency replication from local to cloud
No need for container builds or registry management
Compatible with pandas, Polars, DuckDB, Dask, and more
Faster deployments through smart caching

📖 View Full Article

🌐 Visit website

☕️ Weekly Finds

crewAI
[LLM]
– Framework for orchestrating role-playing autonomous AI agents that work together to accomplish complex tasks

Ray
[MLOps]
– Unified framework for scaling AI and Python applications from laptop to cluster with distributed runtime and ML libraries

Metabase
[Data Viz]
– Open-source business intelligence tool that lets everyone visualize, analyze, and share data insights

Newsletter #271: Automate LLM Evaluation at Scale with MLflow make_judge()

📅 Today’s Picks

Automate LLM Evaluation at Scale with MLflow make_judge()

Problem
When you ship LLM features without evaluating them, models might hallucinate, violate safety guidelines, or return incorrectly formatted responses.
Manual review doesn’t scale. Reviewers might miss subtle issues when evaluating thousands of outputs, and scoring standards often vary between people.
Solution
MLflow make_judge() applies the same evaluation standards to every output, whether you’re checking 10 or 10,000 responses.
Key capabilities:

Define evaluation criteria once, reuse everywhere
Automatic rationale explaining each judgment
Built-in judges for safety, toxicity, and hallucination detection
Typed outputs that never return unexpected formats

🧪 Run code

⭐ View GitHub

🔄 Worth Revisiting

LangChain v1.0: Auto-Protect Sensitive Data with PIIMiddleware

Problem
User messages often contain sensitive information like emails and phone numbers.
Logging or storing this data without protection creates compliance and security risks.
Solution
LangChain v1.0 introduces PIIMiddleware to automatically protect sensitive data before model processing.
PIIMiddleware pairs detection with several protection strategies:

5 built-in detectors (email, credit card, IP, MAC, URL)
Custom regex for any PII pattern
Replace with [REDACTED], mask as ****1234, or block entirely
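The three strategies in the last bullet can be illustrated with the standard-library `re` module. This is a minimal sketch, not LangChain's implementation: the `protect` helper, the `PIIBlocked` exception, and both regular expressions are hypothetical and deliberately simplistic.

```python
import re

# Simplistic detectors, for illustration only
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")


class PIIBlocked(ValueError):
    """Raised when the 'block' strategy finds a match."""


def protect(text: str, pattern: re.Pattern, strategy: str) -> str:
    """Apply one protection strategy to every match of `pattern` in `text`."""
    if strategy == "block":
        if pattern.search(text):
            raise PIIBlocked("sensitive data detected")
        return text
    if strategy == "redact":
        return pattern.sub("[REDACTED]", text)
    if strategy == "mask":
        # Keep only the last four digits of each match
        return pattern.sub(
            lambda m: "****" + re.sub(r"\D", "", m.group())[-4:], text
        )
    raise ValueError(f"unknown strategy: {strategy}")


message = "Email jane@example.com, card 4111 1111 1111 1234"
safe = protect(protect(message, EMAIL, "redact"), CARD, "mask")
print(safe)
```

Applying the protection before the text reaches the model or the logs is the point: downstream code only ever sees the sanitized string.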

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

litellm
[LLM]
– Python SDK and Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI format with cost tracking, guardrails, and logging.

parlant
[LLM]
– LLM agents built for control with behavioral guidelines, ensuring predictable and consistent agent behavior.

GLiNER2
[ML]
– Unified schema-based information extraction for NER, text classification, and structured data parsing in one pass.

Newsletter #270: PydanticAI: Type-Safe LLM Outputs with Auto-Validation

📅 Today’s Picks

Yellowbrick: Detect Overfitting vs Underfitting Visually

Problem
Hyperparameter tuning requires finding the sweet spot between underfitting (model too simple) and overfitting (model memorizes training data).
You could write the loop, run cross-validation for each value, collect scores, and format the plot yourself. But that’s boilerplate you’ll repeat across projects.
Solution
Yellowbrick is a machine learning visualization library built for exactly this.
Its ValidationCurve shows you what’s working, what’s not, and what to fix next without the boilerplate or inconsistent formatting.
How to read the plot in this example:

Training score (blue) stays high as max_depth increases
Validation score (green) drops after depth 4
The growing gap means the model memorizes training data but fails on new data

Action: Pick max_depth around 3-4 where validation score peaks before the gap widens.

📖 View Full Article

🧪 Run code

⭐ View GitHub

PydanticAI: Type-Safe LLM Outputs with Auto-Validation

Problem
Without structured outputs, you’re working with raw text that might not match your expected format.
Unexpected responses, missing fields, or wrong data types can cause errors that are easy to miss during development.
Solution
PydanticAI uses Pydantic models to automatically validate and structure LLM responses.
Key benefits:

Type safety at runtime with validated Python objects
Automatic retry on validation failures
Direct field access without manual parsing
Integration with existing Pydantic workflows

LangChain works too, but PydanticAI is a lighter alternative when you just need structured outputs.
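The validate-and-retry pattern behind these benefits can be sketched with the standard library alone. Everything here is a hypothetical stand-in: `Recipe` plays the role of a Pydantic model, `validate` does the type checks by hand, and `fake_model` replaces a real LLM call. It is not PydanticAI's API.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Recipe:
    name: str
    servings: int


def validate(raw: str) -> Recipe:
    """Parse raw model text into a Recipe, raising ValueError if it does not fit."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, servings = data.get("name"), data.get("servings")
    if not isinstance(name, str):
        raise ValueError("name must be a string")
    if not isinstance(servings, int) or isinstance(servings, bool):
        raise ValueError("servings must be an integer")
    return Recipe(name=name, servings=servings)


def run_with_retries(
    model: Callable[[str], str], prompt: str, max_retries: int = 2
) -> Recipe:
    """Call the model, validate the reply, and re-ask with feedback on failure."""
    feedback = ""
    for _ in range(max_retries + 1):
        raw = model(prompt + feedback)
        try:
            return validate(raw)
        except ValueError as err:
            feedback = f"\nPrevious reply was invalid ({err}). Reply with JSON only."
    raise RuntimeError("no valid response after retries")


# Fake model: the first reply has the wrong type, the second is valid
replies = iter(
    [
        '{"name": "Pancakes", "servings": "four"}',
        '{"name": "Pancakes", "servings": 4}',
    ]
)


def fake_model(prompt: str) -> str:
    return next(replies)


recipe = run_with_retries(fake_model, "Give me a pancake recipe as JSON.")
print(recipe.name, recipe.servings)  # direct field access, no manual parsing
```

The caller never touches raw text: it either receives a typed object or an explicit error after the retry budget is spent.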

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

pdfplumber
[Data Processing]
– Plumb a PDF for detailed information about each char, rectangle, line, et cetera – and easily extract text and tables.

cognee
[LLM]
– Memory for AI Agents in 6 lines of code – transforms data into knowledge graphs for persistent, scalable AI memory.

featuretools
[ML]
– An open source Python library for automated feature engineering from relational and temporal datasets.

📚 Top 5 Articles of 2025

A Deep Dive into DuckDB for Data Scientists
– Query billions of rows on your laptop with DuckDB. Learn SQL analytics, Parquet integration, and when to choose DuckDB over pandas.

Top 6 Python Libraries for Visualization: Which One to Use?
– Compare Matplotlib, Seaborn, Plotly, Altair, Bokeh, and PyGWalker. Find the right visualization library for your data science workflow.

Transform Any PDF into Searchable AI Data with Docling
– Extract text, tables, and structure from PDFs for RAG pipelines. Docling handles complex layouts that break traditional parsers.

Narwhals: Unified DataFrame Functions for pandas, Polars, and PySpark
– Write DataFrame code once, run it on pandas, Polars, or PySpark. Narwhals provides a unified API without vendor lock-in.

Goodbye Pip and Poetry. Why UV Might Be All You Need
– Replace pip, virtualenv, pyenv, and Poetry with one tool. UV handles Python versions, dependencies, and reproducible builds in a single workflow.

Newsletter #269: LangChain v1.2.0: Build Multi-Provider Agents with Extras

📅 Today’s Picks

LangChain v1.2.0: Build Multi-Provider Agents with Extras

Problem
Different LLM providers require different tool configurations: parallel vs sequential execution, strict mode, token limits.
This creates scattered configs and manual provider switching throughout your code.
Solution
LangChain v1.2.0 introduces the extras attribute that attaches provider-specific configurations directly to tool definitions.
With extras, you can:

Define all provider configs in one place
Switch providers without touching multiple files
Keep configs in sync across environments

📖 View Full Article

⭐ View GitHub

GLiNER: Extract Any Entity Type with Zero-Shot NER

Problem
Named Entity Recognition (NER) extracts key information like names, dates, and organizations from text. But standard models are limited to predefined entity types like PERSON, ORG, and DATE.
If you need to extract something specific, you’d normally have to train a custom model with thousands of labeled examples.
Solution
GLiNER changes that with zero-shot entity extraction, allowing you to extract any entity type without training.
Key benefits:

Works out-of-the-box with any text domain
Handles multiple entity types in a single pass
Returns confidence scores for each extraction
Integrates with spaCy and other NLP pipelines

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

timescaledb
[Data Engineer]
– PostgreSQL extension for high-performance real-time analytics on time-series and event data

slim
[MLOps]
– Inspect, optimize, and minify Docker container images without sacrificing functionality

drawdb
[Data Engineer]
– Free, simple, and intuitive online database diagram editor and SQL generator

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

Newsletter #268: Faster Table Joins with Polars Multi-Threading

📅 Today’s Picks

Faster Table Joins with Polars Multi-Threading

Problem
pandas processes joins on a single CPU core, leaving other cores idle during large table operations.
Solution
Polars distributes join operations across all available CPU cores, achieving significantly faster joins than pandas on large datasets.
What makes Polars fast:

Processes rows in parallel batches
Uses all available CPU cores
Zero configuration required

📖 View Full Article

🧪 Run code

⭐ View GitHub

🔄 Worth Revisiting

Faster Polars Queries with Programmatic Expressions

Problem
When you apply similar transformations in a for loop, Polars processes each with_columns() call sequentially.
This prevents the optimizer from seeing the full computation plan.
Solution
Instead, generate all Polars expressions programmatically before applying them together.
This enables Polars to:

See the complete computation plan upfront
Optimize across all expressions simultaneously
Parallelize operations across CPU cores

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

Mole
[Python Utils]
– Deep clean and optimize your Mac with a simple command-line tool.

marker
[LLM]
– Convert PDF, DOCX, PPTX, and other documents to markdown with high speed and accuracy.

pathway
[Data Engineer]
– Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

Newsletter #267: Build Professional Python Packages with UV --package

🔄 Worth Revisiting

Build Professional Python Packages with UV --package

Problem
Python packages turn your code into reusable modules you can share across projects.
But building them requires complex setup with setuptools, managing build systems, and understanding distribution mechanics.
Solution
UV, a fast Python package installer and resolver, reduces the entire process to two simple steps:

uv init --package sets up your package structure instantly
uv build and uv publish create the distribution files and upload them to PyPI

📖 Learn more

⭐ View GitHub

Generate Time-Sortable IDs with Python 3.14’s UUID v7

Problem
UUID4 generates purely random identifiers that lack chronological ordering.
Without embedded timestamps, you need separate timestamp fields and custom sorting logic to organize records by creation time.
Solution
Python 3.14 introduces UUID version 7 with built-in timestamp ordering.
Key features:

Determine creation order by comparing two UUIDs directly
Retrieve exact creation time by extracting the embedded timestamp
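
A minimal sketch of both features. On Python 3.14 the built-in generator is uuid.uuid7(). The make_uuid7 and uuid7_unix_ms helpers below are hypothetical stand-ins, built by hand from the RFC 9562 UUIDv7 bit layout (a 48-bit Unix millisecond timestamp in the top bits), so the example also runs on older interpreters. The timestamps are made-up values.

```python
import datetime as dt
import os
import time
import uuid


def make_uuid7(unix_ms=None):
    """Hypothetical helper: build a UUIDv7 by hand from the RFC 9562 layout."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    value = (value & ~(0xF << 76)) | (0x7 << 76)  # version nibble = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)  # variant bits = 0b10
    return uuid.UUID(int=value)


def uuid7_unix_ms(u):
    """Hypothetical helper: read the embedded Unix timestamp in milliseconds."""
    return u.int >> 80


# Two IDs created one millisecond apart (illustrative timestamps)
earlier = make_uuid7(1_700_000_000_000)
later = make_uuid7(1_700_000_000_001)

# Creation order falls out of a plain comparison
print(earlier < later)

# Creation time is recovered from the ID itself
created = dt.datetime.fromtimestamp(uuid7_unix_ms(earlier) / 1000, tz=dt.timezone.utc)
print(created.isoformat())
```

Because the timestamp occupies the most significant bits, sorting the IDs as integers or as strings orders them by creation time down to the millisecond.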

☕️ Weekly Finds

smolagents
[LLM]
– A barebones library for agents that think in code

rembg
[ML]
– A tool to remove image backgrounds

Scrapegraph-ai
[LLM]
– Python scraper based on AI

📚 Latest Deep Dives

Visualize Machine Learning Results with Yellowbrick
– Learn to visualize ML model performance with Yellowbrick. Create confusion matrices, ROC curves, and feature importance plots in scikit-learn pipelines.

    Work with Khuyen Tran