
Newsletter Archive

Automated newsletter archive from Klaviyo campaigns

Newsletter #281: MarkItDown: From Images to Searchable Text in Seconds

Grab your coffee. Here are this week’s highlights.

🤝 COLLABORATION

What Data Engineers Really Think About Airflow (5.8K Surveyed)
Astronomer analyzed 5.8k+ responses from data engineers on how they are navigating Airflow today, and the findings might surprise you.
You’ll learn:

How early adopters are using Airflow 3 features in production
Which teams are bringing AI into production and what’s holding others back
35.6% believe that Airflow is beneficial to their career

🔗 Download the State of Airflow 2026 Report

📅 Today’s Picks

Query Multiple Databases at Once with DuckDB

Problem
Working with data across PostgreSQL, MySQL, and SQLite often means managing multiple database connections and additional integration overhead.
That overhead adds up quickly when your goal is simply to analyze data across sources.
Solution
DuckDB removes the friction by allowing you to join tables across databases with a single query.
Key benefits:

Join SQLite, PostgreSQL, MySQL, and Parquet files in a single SQL statement
Automatic connection handling across all sources
Filters run at the source database, so only matching rows are transferred
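To make the "one query, several databases" idea concrete, here is a minimal sketch that attaches a second database file and joins across both in a single SQL statement. It deliberately uses Python's standard-library sqlite3 and its ATTACH statement as a stand-in, not DuckDB itself, so it runs without extra installs. The file names, tables, and sample rows are all made up for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def create_db(path: Path, ddl: str, insert_sql: str, rows: list) -> None:
    """Create a small SQLite file with one table (sample data is made up)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
        conn.commit()
    finally:
        conn.close()


def spend_per_customer(crm_path: Path, sales_path: Path, min_amount: float) -> list:
    """Join customers (one file) with orders (another file) in one statement."""
    conn = sqlite3.connect(crm_path)
    try:
        # ATTACH makes the second file addressable as the "sales" schema.
        conn.execute("ATTACH DATABASE ? AS sales", (str(sales_path),))
        return conn.execute(
            """
            SELECT c.name, SUM(o.amount)
            FROM customers AS c
            JOIN sales.orders AS o ON o.customer_id = c.id
            WHERE o.amount >= ?
            GROUP BY c.name
            ORDER BY c.name
            """,
            (min_amount,),
        ).fetchall()
    finally:
        conn.close()


def demo() -> list:
    """Build two throwaway databases and run the cross-database join."""
    with tempfile.TemporaryDirectory() as tmp:
        crm, sales = Path(tmp) / "crm.db", Path(tmp) / "sales.db"
        create_db(
            crm,
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
            "INSERT INTO customers VALUES (?, ?)",
            [(1, "Ada"), (2, "Grace")],
        )
        create_db(
            sales,
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)",
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(1, 1, 20.0), (2, 1, 5.0), (3, 2, 50.0)],
        )
        return spend_per_customer(crm, sales, min_amount=10.0)
```

Calling demo() returns one total per customer, counting only orders of at least 10. DuckDB applies the same attach-then-join pattern across different database engines.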

⭐ View GitHub

MarkItDown: From Images to Searchable Text in Seconds

Problem
Charts, diagrams, and screenshots in your documents need text descriptions to be searchable and processable.
But writing descriptions manually is slow and produces inconsistent results across large document sets.
Solution
MarkItDown, an open-source library from Microsoft, integrates with OpenAI to automatically generate detailed descriptions of images.
Key capabilities:

Generate consistent descriptions across hundreds of images
Process images from documents like PowerPoint and PDF files
Customize the description prompt for your specific needs

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

Skill_Seekers
[LLM]
– Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection

sqlit
[Data]
– A user-friendly TUI for SQL databases supporting SQL Server, MySQL, PostgreSQL, SQLite, Turso and more

giskard
[ML]
– Open-source CI/CD platform for ML teams to eliminate AI bias and deliver quality ML products faster

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

From CSS Selectors to Natural Language: Web Scraping with ScrapeGraphAI
– Web scraping without selector maintenance. ScrapeGraphAI uses LLMs to extract data from any site using plain English prompts and Pydantic schemas.

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.


Newsletter #280: ScrapeGraphAI: Scrape Any Website with Natural Language

🤝 COLLABORATION

What Data Engineers Really Think About Airflow (5.8K Surveyed)
Astronomer analyzed 5.8k+ responses from data engineers on how they are navigating Airflow today, and the findings might surprise you.
You’ll learn:

How early adopters are using Airflow 3 features in production
Which teams are bringing AI into production and what’s holding others back
35.6% believe that Airflow is beneficial to their career

🔗 Download the State of Airflow 2026 Report

📅 Today’s Picks

ScrapeGraphAI: Scrape Any Website with Natural Language

Problem
Traditional scraping with BeautifulSoup follows a familiar pattern: fetch HTML, inspect elements in DevTools, and write CSS selectors to extract your data.
But websites don’t stay static. When the HTML structure changes, your selectors break and you’re back to rewriting code.
Solution
ScrapeGraphAI uses LLMs to extract data from natural language descriptions. Simply describe what you want in plain English, and the LLM figures out the extraction logic automatically.
Key features:

Self-healing scrapers that adapt when websites are redesigned
Type-safe output with Pydantic schema validation
Built-in JavaScript rendering for React, Vue, and Angular sites
Multi-page scraping with SearchGraph for research tasks
Cloud or local models via OpenAI, Anthropic, or Ollama

Plus, ScrapeGraphAI is open source! Install it with “pip install scrapegraphai”.

📖 View Full Article

🧪 Run code

⭐ View GitHub

🔄 Worth Revisiting

Analyze GitHub Repositories with LangChain Document Loaders

Problem
Are you tired of manually searching through hundreds of GitHub issues with keyword search to find what you need?
Solution
With LangChain’s GitHubIssuesLoader, you can load repository issues into a vector store and query them with natural language instead of exact keywords.
You can ask questions like “What feature requests are related to video?” and get instant, relevant answers from your issue history.

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

hf-mem
[ML]
– CLI to estimate inference memory requirements for Hugging Face models before downloading

fake2db
[Testing]
– Create custom test databases populated with fake data for SQLite, MySQL, PostgreSQL, and MongoDB

MiraTTS
[LLM]
– High-quality text-to-speech model fine-tuned from Spark-TTS with enhanced realism and stability

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

From CSS Selectors to Natural Language: Web Scraping with ScrapeGraphAI
– Web scraping without selector maintenance. ScrapeGraphAI uses LLMs to extract data from any site using plain English prompts and Pydantic schemas.


Newsletter #279: LlamaIndex: From Documents to AI Chatbot in 4 Lines

🤝 COLLABORATION

Building Data Apps with Streamlit
Streamlit makes it easy to turn Python scripts into interactive web apps. But building production-ready applications requires more than basic widgets.
This hands-on guide covers Streamlit’s architecture, caching, session state, and multipage workflows. You’ll learn to handle secrets, work with APIs and databases, and deploy polished apps to the cloud.
By the end, you’ll build a complete solution that analyzes datasets, trains ML models, and powers an AI chatbot with Google Gemini.
🔗 Get the book

📅 Today’s Picks

Slim: Reduce Docker Images by 30x Without Dockerfile Changes

Problem
Docker images include the entire OS layer. A simple Python app never touches most of the shells, compilers, and system utilities bundled inside, so that layer is largely dead weight.
This inflates images to hundreds of megabytes, wasting storage and adding time to every deploy.
Solution
Slim automatically analyzes your container at runtime to identify which files are actually used, then builds a minimal image with only essential components.
Slim works alongside Docker, not instead of it:

Step 1: Build your image with docker build
Step 2: Minify with slim build your-image
Step 3: Push the .slim image to your registry
Your Dockerfile and workflow stay the same

⭐ View GitHub

LlamaIndex: From Documents to AI Chatbot in 4 Lines


Problem
Building LLM applications from scratch requires managing document loading across different formats, configuring embeddings, setting up vector stores, and orchestrating queries. You end up writing boilerplate code instead of focusing on your application logic.
Solution
LlamaIndex provides a unified framework that handles the entire RAG pipeline with minimal code.
Here’s what it gives you:

Auto-detect and load any document format (PDF, TXT, CSV, DOCX)
Create searchable vector indexes instantly
Query with natural language or multi-turn conversations
Built-in memory management for chat applications

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

fiftyone [ML] – Open source toolkit for building high-quality datasets and computer vision models with visualization and data management

everything-claude-code [LLM] – Complete Claude Code configuration collection with agents, skills, hooks, commands, rules, and MCP servers

qsv [Data Processing] – Ultra-fast CSV command line toolkit for indexing, slicing, analyzing, and transforming CSV files

Looking for a specific tool? Explore 70+ Python tools →


Newsletter #278: LangExtract: LLM-Powered Entity Extraction with One Example

📅 Today’s Picks

Skip Freshly Released Packages Automatically with uv

Problem
Installing updated package versions is essential to benefit from new features and bug fixes.
However, freshly released versions can introduce bugs or incompatibilities before the community has time to catch them.
Solution
uv’s exclude-newer option lets you set a cooldown period to skip packages released within a specified timeframe.
To use it, add exclude-newer = "7 days" under the [tool.uv] table in pyproject.toml and adjust the duration as needed.
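The cooldown idea itself is easy to sketch in plain Python. The snippet below is not uv's resolver: it only illustrates the filtering rule, assuming a hypothetical mapping of version strings to publish times, and it picks the most recently published eligible release rather than doing real version ordering.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Hypothetical release history for one package (dates are made up).
RELEASES = {
    "1.0.0": datetime(2026, 1, 1, tzinfo=timezone.utc),
    "1.1.0": datetime(2026, 2, 1, tzinfo=timezone.utc),
    "1.2.0": datetime(2026, 2, 8, tzinfo=timezone.utc),
}


def newest_allowed(
    releases: Dict[str, datetime], now: datetime, cooldown: timedelta
) -> Optional[str]:
    """Return the latest-published version that is older than the cooldown."""
    cutoff = now - cooldown
    # Keep only releases published at or before the cutoff.
    eligible = {version: ts for version, ts in releases.items() if ts <= cutoff}
    if not eligible:
        return None
    # Simplification: choose by publish time, not by version number.
    return max(eligible, key=lambda version: eligible[version])
```

With a 7-day cooldown on 10 February 2026, version 1.2.0 (published two days earlier) is skipped and 1.1.0 is selected.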

📖 View Full Article

⭐ View GitHub

LangExtract: LLM-Powered Entity Extraction with One Example

Problem
Named entity recognition extracts entities like names, dates, and organizations from text.
But pre-trained NER models can fail on domain-specific text. They weren’t trained on medical terms, so “Metformin 500mg” gets labeled as “LAW” instead of “medication”.
Fixing this means retraining with thousands of labeled examples.
Solution
LangExtract is Google’s LLM-powered extraction library that skips retraining entirely. It works on any domain with just one example.
Plus, every extraction includes:

Exact character positions for source verification
Attribute grouping to link related entities
Interactive visualizations to review results
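Character positions matter because they let you check an extraction against the original text. The sketch below shows that verification step in plain Python. It is not LangExtract's API: the Extraction class, the locate helper, and the sample sentence are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extraction:
    label: str   # entity class, e.g. "medication"
    text: str    # the extracted span
    start: int   # index of the first character in the source
    end: int     # index one past the last character


def locate(source: str, label: str, text: str) -> Optional[Extraction]:
    """Attach character offsets to an extracted span, or None if not found."""
    start = source.find(text)
    if start == -1:
        return None  # the span does not appear verbatim in the source
    return Extraction(label, text, start, start + len(text))


def is_grounded(source: str, extraction: Extraction) -> bool:
    """Verify that the stored offsets really point at the extracted text."""
    return source[extraction.start:extraction.end] == extraction.text
```

An extraction whose offsets do not round-trip through the source is a sign that the model paraphrased or invented the span.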

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

pypdf
[Python Utils]
– Pure-Python PDF library for splitting, merging, cropping, and transforming PDF files

buzz
[ML]
– Transcribe and translate audio offline using OpenAI’s Whisper on your personal computer

autogluon
[ML]
– AWS AutoML toolkit for automating machine learning tasks with strong predictive performance

Looking for a specific tool? Explore 70+ Python tools →


Newsletter #277: Swap AI Prompts Instantly with MLflow Prompt Registry

📅 Today’s Picks

Swap AI Prompts Instantly with MLflow Prompt Registry

Problem
Finding the right prompt often takes experimentation: tweaking wording, adjusting tone, testing different instructions.
But with prompts hardcoded in your codebase, each test requires a code change and redeployment.
Solution
MLflow Prompt Registry solves this with aliases. Your code references an alias like “production” instead of a version number, so you can swap versions without changing it.
Here’s how it works:

Every prompt edit creates a new immutable version with a commit message
Register prompts once, then assign aliases to specific versions
Deploy to different environments by creating aliases like “staging” and “production”
Track full version history with metadata and tags for each prompt
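The alias mechanism described above boils down to one level of indirection. Here is a toy in-memory model of it. This is not MLflow's API: the PromptRegistry class and its method names are hypothetical, chosen only to show why swapping an alias requires no change in the calling code.

```python
from typing import Dict, List, Tuple


class PromptRegistry:
    """Toy registry: immutable numbered versions plus movable aliases."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}
        self._aliases: Dict[Tuple[str, str], int] = {}

    def register(self, name: str, template: str) -> int:
        """Store a new version and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        return len(versions)

    def set_alias(self, name: str, alias: str, version: int) -> None:
        """Point an alias such as "production" at an existing version."""
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise ValueError(f"{name} has no version {version}")
        self._aliases[(name, alias)] = version

    def load(self, name: str, alias: str) -> str:
        """Resolve the alias to a version, then return that template."""
        version = self._aliases[(name, alias)]
        return self._versions[name][version - 1]
```

Application code only ever calls load("summarize", "production"). Promoting a new prompt means moving the alias, not editing or redeploying that call.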

⭐ View GitHub

🔄 Worth Revisiting

Automate LLM Evaluation at Scale with MLflow make_judge()

Problem
When you ship LLM features without evaluating them, models might hallucinate, violate safety guidelines, or return incorrectly formatted responses.
Manual review doesn’t scale. Reviewers might miss subtle issues when evaluating thousands of outputs, and scoring standards often vary between people.
Solution
MLflow make_judge() applies the same evaluation standards to every output, whether you’re checking 10 or 10,000 responses.
Key capabilities:

Define evaluation criteria once, reuse everywhere
Automatic rationale explaining each judgment
Built-in judges for safety, toxicity, and hallucination detection
Typed outputs that never return unexpected formats

⭐ View GitHub

☕️ Weekly Finds

gspread
[Data Processing]
– Google Sheets Python API for reading, writing, and formatting spreadsheets

zeppelin
[Data Analysis]
– Web-based notebook for interactive data analytics with SQL, Scala, and more

vectorbt
[Data Science]
– Fast engine for backtesting, algorithmic trading, and research in Python

Looking for a specific tool? Explore 70+ Python tools →


Newsletter #276: Polars v1.37.0: Faster Lookups with min_by and max_by

📅 Today’s Picks

Stop Manually Tracing Dependencies with uv tree

Problem
Debugging version conflicts requires knowing which packages depend on what. But tracing these relationships manually through nested dependencies is tedious.
Solution
uv tree handles this automatically, displaying the full dependency graph so you can trace any package back to its source.
Key capabilities:

Complete dependency visualization
Flag dependencies with available updates
Find which packages depend on a specific library
Filter the tree to show only a specific package’s dependencies
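Finding "which packages depend on a specific library" is a reverse walk over the dependency graph. The sketch below shows that walk in plain Python. It is not how uv is implemented, and the graph is a made-up example.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical project graph: package -> direct dependencies.
GRAPH = {
    "my-app": ["pandas", "requests"],
    "pandas": ["numpy"],
    "requests": ["urllib3"],
    "numpy": [],
    "urllib3": [],
}


def dependents_of(graph: Dict[str, List[str]], target: str) -> List[str]:
    """Return every package that depends on target, directly or transitively."""
    # Invert the edges so we can walk from a dependency up to its parents.
    reverse: Dict[str, Set[str]] = {}
    for package, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(package)

    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in reverse.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)
```

Here numpy is pulled in by pandas, which is pulled in by my-app, so both show up as dependents.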

📖 View Full Article

⭐ View GitHub

Polars v1.37.0: Faster Lookups with min_by and max_by

Problem
Finding the row with the minimum or maximum value based on another column requires sorting, grouping, or complex filter expressions.
Solution
Polars v1.37.0 adds min_by and max_by expression methods. They return the value of one column from the row where another column is smallest or largest, in a single, readable expression.
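To pin down what such a lookup computes, here is a plain-Python equivalent. It is not Polars code. It assumes the semantics are "take the value of one field from the row where another field is smallest or largest", and the helper names and sample rows are hypothetical.

```python
from typing import Any, Dict, List

# Made-up sample rows.
ROWS = [
    {"product": "laptop", "price": 1200.0},
    {"product": "mouse", "price": 25.0},
    {"product": "monitor", "price": 300.0},
]


def value_at_min(rows: List[Dict[str, Any]], value_key: str, by_key: str) -> Any:
    """Return rows[i][value_key] for the row with the smallest by_key."""
    return min(rows, key=lambda row: row[by_key])[value_key]


def value_at_max(rows: List[Dict[str, Any]], value_key: str, by_key: str) -> Any:
    """Return rows[i][value_key] for the row with the largest by_key."""
    return max(rows, key=lambda row: row[by_key])[value_key]
```

The same question without such a helper usually means sorting the whole table and taking the first row, which does more work and hides the intent.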

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

lmql
[LLM]
– A programming language for constraint-guided and efficient LLM programming based on a superset of Python.

helicone
[MLOps]
– Open-source LLM observability platform with one-line integration for monitoring, analytics, and management.

responses
[Python Utils]
– A utility library for mocking out the Python Requests library in tests.

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

What’s New in pandas 3.0: Expressions, Copy-on-Write, and Faster Strings
– Learn what’s new in pandas 3.0: pd.col expressions for cleaner code, Copy-on-Write for predictable behavior, and PyArrow-backed strings for 5-10x faster operations.


Newsletter #275: DrawDB: Visual Schema Design to Production SQL in Minutes

📅 Today’s Picks

pd.col: Polars-Like Column References in pandas 3.0

Problem
Before pandas 3.0, creating columns meant:

Bracket notation: repeats DataFrame name, breaks chaining
assign() with lambdas: verbose syntax, scoping bugs from variable capture
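The variable-capture bug is a general Python behavior, so it can be shown without pandas. Lambdas look up captured variables when they are called, not when they are created. The helper names below are hypothetical.

```python
from typing import Callable, List


def build_scalers_buggy(factors: List[int]) -> List[Callable[[int], int]]:
    """Every lambda shares the same 'factor' variable (late binding)."""
    return [lambda x: x * factor for factor in factors]


def build_scalers_fixed(factors: List[int]) -> List[Callable[[int], int]]:
    """A default argument freezes the current value of 'factor' per lambda."""
    return [lambda x, factor=factor: x * factor for factor in factors]


def apply_all(scalers: List[Callable[[int], int]], value: int) -> List[int]:
    """Call each scaler with the same input."""
    return [scale(value) for scale in scalers]
```

With factors [1, 2, 3] and input 10, the buggy version yields [30, 30, 30] because every lambda sees the final value of factor, while the fixed version yields [10, 20, 30]. The same trap appears when lambdas passed to assign() are built inside a loop.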

Solution
pandas 3.0 solves this with pd.col expressions: clean column references that chain naturally, with syntax as readable as Polars and PySpark.

📖 View Full Article

🧪 Run code

⭐ View GitHub

DrawDB: Visual Schema Design to Production SQL in Minutes

Problem
Have you ever sketched a database schema on a whiteboard, then spent hours converting it to SQL?
There’s a faster way to go from diagram to production-ready code.
Solution
With DrawDB, your database diagram becomes the code. Just drag tables onto a canvas, connect them visually, and export SQL for 6 databases.
Key benefits:

Draw tables and relationships on a visual canvas
Export production-ready SQL for MySQL, PostgreSQL, SQLite, MariaDB, MSSQL, and Oracle
No account or subscription required
Share diagrams with your team instantly

⭐ View GitHub

☕️ Weekly Finds

timescaledb
[Data Engineer]
– A time-series database for high-performance real-time analytics packaged as a Postgres extension

rembg
[Python Utils]
– A tool to remove image backgrounds with Python

grip
[Python Utils]
– Preview GitHub README.md files locally before committing them

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

What’s New in pandas 3.0: Expressions, Copy-on-Write, and Faster Strings
– Learn what’s new in pandas 3.0: pd.col expressions for cleaner code, Copy-on-Write for predictable behavior, and PyArrow-backed strings for 5-10x faster operations.


Newsletter #274: ChromaDB: Metadata Filtering for Precise Semantic Search

📅 Today’s Picks

ChromaDB: Metadata Filtering for Precise Semantic Search

Problem
Search for “latest ML research” and semantic search might return highly relevant papers from 2019.
That’s because similarity doesn’t understand constraints. You need metadata filtering to enforce “year >= 2024” at the database level.
Solution
ChromaDB’s where clause lets you combine “find similar” with “but only from 2024.” The database filters first, then ranks by similarity.
Key operators:

$eq, $ne for exact matching
$gt, $gte, $lt, $lte for range queries
$in, $nin for set membership
$and, $or for combining conditions
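
To make the filter-then-rank behavior concrete, here is a minimal plain-Python sketch of how a where-style filter could be evaluated before similarity ranking. It is not ChromaDB's implementation: the toy documents, the two-dimensional embeddings, the helper names, and the choice of cosine distance are all assumptions made for illustration. The `where` dictionary uses the same operator syntax you would pass to a ChromaDB `collection.query(..., where=...)` call.

```python
import math

# Hypothetical toy corpus: (title, metadata, embedding). Embeddings are made up.
DOCS = [
    ("Attention survey", {"year": 2019, "topic": "ml"}, [0.9, 0.1]),
    ("Efficient fine-tuning", {"year": 2024, "topic": "ml"}, [0.8, 0.3]),
    ("Vector DB benchmark", {"year": 2025, "topic": "db"}, [0.2, 0.9]),
]

# Comparison operators named after the ones listed above.
OPS = {
    "$eq": lambda v, x: v == x,
    "$ne": lambda v, x: v != x,
    "$gt": lambda v, x: v > x,
    "$gte": lambda v, x: v >= x,
    "$lt": lambda v, x: v < x,
    "$lte": lambda v, x: v <= x,
    "$in": lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
}


def matches(meta, where):
    """Return True if a metadata dict satisfies a where-style filter."""
    if "$and" in where:
        return all(matches(meta, w) for w in where["$and"])
    if "$or" in where:
        return any(matches(meta, w) for w in where["$or"])
    for field, cond in where.items():
        if not isinstance(cond, dict):
            cond = {"$eq": cond}  # a bare value is treated as equality
        if field not in meta:
            return False  # assumption: missing fields never match
        if not all(OPS[op](meta[field], target) for op, target in cond.items()):
            return False
    return True


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))


def query(query_embedding, where, n_results=2):
    """Filter on metadata first, then rank the survivors by distance."""
    candidates = [d for d in DOCS if matches(d[1], where)]
    candidates.sort(key=lambda d: cosine_distance(query_embedding, d[2]))
    return [title for title, _, _ in candidates[:n_results]]


where = {"$and": [{"year": {"$gte": 2024}}, {"topic": {"$in": ["ml", "db"]}}]}
results = query([1.0, 0.0], where)
print(results)
```

Without the filter, the 2019 document is the closest match to this query vector; with it, only documents from 2024 onward are ranked.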

📖 View Full Article

🧪 Run code

⭐ View GitHub

🔄 Worth Revisiting

Semantic Search in PostgreSQL with pgvector

Problem
Traditional PostgreSQL keyword queries return limited results because they require exact string matches. This approach misses semantically related data that shares meaning but uses different terminology.
Solution
pgvector enables vector search within PostgreSQL. This allows semantic matching of contextually similar content.
Key benefits:

Native PostgreSQL integration with existing databases
Fast exact and approximate nearest neighbor search
Six distance metrics including L2, cosine, inner product, and Hamming
Seamless Python integration via SQLAlchemy or psycopg2
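
As a rough illustration of what the distance metrics compute, the plain-Python sketch below implements three of them and runs an exact nearest-neighbor search over a few rows. The comments name the pgvector operator each function corresponds to. The row contents, embeddings, and the table and column names in the SQL comment are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance (pgvector operator: <->)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity (pgvector operator: <=>)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))


def negative_inner_product(a, b):
    """Negated dot product, so smaller means closer (pgvector operator: <#>)."""
    return -sum(x * y for x, y in zip(a, b))


# Hypothetical rows: text content mapped to a made-up 3-dimensional embedding.
ROWS = {
    "reset your password": [0.9, 0.1, 0.0],
    "recover account access": [0.8, 0.2, 0.1],
    "quarterly sales report": [0.0, 0.2, 0.9],
}

# Made-up embedding for a query such as "forgot my login".
QUERY = [1.0, 0.0, 0.0]


def nearest(query, distance, k=2):
    """Exact nearest-neighbor search: sort every row by distance to the query."""
    return sorted(ROWS, key=lambda content: distance(query, ROWS[content]))[:k]


# Rough SQL equivalent (hypothetical table and column names):
#   SELECT content FROM documents ORDER BY embedding <=> '[1,0,0]' LIMIT 2;
for metric in (l2_distance, cosine_distance, negative_inner_product):
    print(metric.__name__, nearest(QUERY, metric))
```

None of the returned rows contains the word "login", which is the point: ranking by vector distance surfaces related content that an exact string match would miss.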

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

RAGxplorer
[LLM]
– Open-source tool to visualize RAG embeddings and explore retrieval augmented generation pipelines interactively

CAMEL
[LLM]
– The first multi-agent framework enabling AI agents to communicate and collaborate while assuming different roles

claude-scientific-skills
[LLM]
– A set of ready-to-use scientific skills for Claude, enabling advanced research and analysis workflows

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

What’s New in pandas 3.0: Expressions, Copy-on-Write, and Faster Strings
– Learn what’s new in pandas 3.0: pd.col expressions for cleaner code, Copy-on-Write for predictable behavior, and PyArrow-backed strings for 5-10x faster operations.

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.



Newsletter #273: MarkItDown: YouTube Transcripts to Markdown in One Line

📅 Today’s Picks

MarkItDown: YouTube Transcripts to Markdown in One Line

Problem
Videos contain rich information that’s difficult to search or analyze programmatically.
Manually transcribing and formatting them into structured text is tedious and error-prone.
Solution
MarkItDown eliminates manual transcription by converting YouTube URLs to structured Markdown automatically.
Key benefits:

Output ready for RAG systems or content summarization
Multi-format support: same API for PDFs, Word docs, Excel, and images
Lightweight with minimal dependencies
Consistent Markdown output across all file types

Build question-answering systems over video content without manual transcription.
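
A minimal sketch of the conversion, assuming the `markitdown` package is installed with its YouTube transcript dependencies (for example through the `markitdown[all]` extra) and that the video has a transcript available. The wrapper function name is hypothetical, and the URL in the usage comment is a placeholder.

```python
def youtube_to_markdown(url: str) -> str:
    """Convert a YouTube video page to Markdown text using MarkItDown."""
    # Imported lazily so this module still loads where markitdown is not installed.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(url)  # accepts URLs as well as local file paths
    return result.text_content


# Usage (placeholder URL, substitute a real video ID):
#   markdown = youtube_to_markdown("https://www.youtube.com/watch?v=VIDEO_ID")
#   print(markdown[:500])
```

The returned string can then be chunked and indexed like any other Markdown document.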

📖 View Full Article

🧪 Run code

⭐ View GitHub

UV: Define Conflicting Dependencies in One Project

Problem
What happens when your project needs two incompatible versions of the same package?
Version conflicts are a frequent issue in many projects. A typical solution is to split dependencies across different requirements files or environments, which works but adds ongoing maintenance overhead.
Solution
UV’s conflicts declaration lets you define both versions in one project as mutually exclusive extras or dependency groups. Select one with a command-line flag when syncing or running.
Key benefits:

One pyproject.toml for all configurations
Separate resolution paths in a single lockfile
Flag-based switching between environments
Protection from accidentally installing both
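
As a sketch of what such a configuration might look like, the pyproject.toml fragment below declares two extras that pin incompatible versions and marks them as conflicting. The project name, extra names, and version pins are hypothetical.

```toml
[project]
name = "demo-project"
version = "0.1.0"
requires-python = ">=3.10"

[project.optional-dependencies]
# Two extras that cannot be installed together
legacy = ["numpy<2"]
modern = ["numpy>=2"]

[tool.uv]
# Tell the resolver to lock each extra on its own resolution path
conflicts = [
    [
        { extra = "legacy" },
        { extra = "modern" },
    ],
]
```

With this in place, `uv sync --extra legacy` or `uv sync --extra modern` selects one configuration, and requesting both extras at once is rejected.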

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

owl
[LLM]
– Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

dexter
[LLM]
– An autonomous agent for deep financial research

bandit
[Python Utils]
– A tool designed to find common security issues in Python code




Newsletter #272: Split Large Parquet Files Automatically with Polars

📅 Today’s Picks

Split Large Parquet Files Automatically with Polars

Problem
When writing large datasets to Parquet, you either end up with one massive file that is slow to read or have to split the data into smaller files manually.
Solution
With Polars’ PartitionMaxSize, output is automatically split into multiple Parquet files according to a defined size limit.
This enables:

Parallel reads across multiple cores
Faster, more reliable cloud storage transfers
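
A hedged sketch of how this might look, assuming a Polars release that ships the `pl.PartitionMaxSize` scheme (the API is marked unstable and may change between versions) and assuming `max_size` is counted in rows. The function name, paths, and default limit are hypothetical.

```python
def write_partitioned(source_path: str, out_dir: str, max_rows: int = 100_000) -> None:
    """Stream a large Parquet file into several smaller files under out_dir."""
    # Imported lazily so this module still loads where polars is not installed.
    import polars as pl

    lazy_frame = pl.scan_parquet(source_path)
    # Assumption: max_size is a row count. Once a file reaches the limit,
    # it is closed and a new one is started.
    lazy_frame.sink_parquet(pl.PartitionMaxSize(out_dir, max_size=max_rows))


# Usage (hypothetical paths):
#   write_partitioned("events.parquet", "events_split/", max_rows=1_000_000)
```

Because the data is read lazily and written through a sink, the full dataset does not need to fit in memory.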

📖 View Full Article

🧪 Run code

⭐ View GitHub

Coiled: One Decorator Replaces Your Entire Docker Workflow (Sponsored)

Problem
Have you ever had code work locally but fail on cloud VMs because of missing dependencies or version mismatches?
Docker solves this by freezing dependencies, but introduces friction: Dockerfiles, slow builds, registry pushes, and full redeploys for minor package changes.
Solution
Coiled can remove Docker from the workflow entirely. With a single decorator, it automatically syncs your local environment to the cloud.
Key features:

Exact dependency replication from local to cloud
No need for container builds or registry management
Compatible with pandas, Polars, DuckDB, Dask, and more
Faster deployments through smart caching

📖 View Full Article

🌐 Visit website

☕️ Weekly Finds

crewAI
[LLM]
– Framework for orchestrating role-playing autonomous AI agents that work together to accomplish complex tasks

Ray
[MLOps]
– Unified framework for scaling AI and Python applications from laptop to cluster with distributed runtime and ML libraries

Metabase
[Data Viz]
– Open-source business intelligence tool that lets everyone visualize, analyze, and share data insights




Work with Khuyen Tran
