
Newsletter Archive

Automated newsletter archive from Klaviyo campaigns

Newsletter #285: Narwhals: One Function for pandas, Polars, and DuckDB

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

Narwhals: One Function for pandas, Polars, and DuckDB

Problem
Teams today use multiple DataFrame libraries side by side. Each backend has its own syntax, so your utility functions end up full of if/elif chains checking types.
This makes even small logic changes expensive, since every backend implementation must be updated.
Solution
Narwhals removes this complexity by providing a unified DataFrame API.
How it works:

Wrap any DataFrame with nw.from_native() (pandas, Polars, DuckDB, PySpark, PyArrow)
Write transformations once using Polars-style operations
Convert back to the original type with nw.to_native()
Zero extra dependencies. Each backend keeps its native performance

📖 View Full Article

🧪 Run code

⭐ View GitHub

uv: Switch Python Versions Without Rebuilding Environments

Problem
Switching Python versions typically requires recreating virtual environments and reinstalling all dependencies from scratch.
This workflow wastes time and can introduce version conflicts when dependencies need to be resolved again.
Solution
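uv makes Python version upgrades quick. uv python pin records the new version, and uv sync updates the project environment from the existing lockfile instead of resolving everything from scratch.
The process is simple:

Pin the version with uv python pin 3.x
Sync dependencies with uv sync
Packages already in uv’s global cache are reused rather than downloaded again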

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

Airbyte
[Data Engineering]
– Data integration platform with 600+ connectors for ETL/ELT pipelines from APIs, databases, and files to warehouses and lakes

act
[DevOps]
– Run GitHub Actions locally for fast feedback without commit/push cycles, using Docker containers

Dash
[AI Agents]
– Self-learning text-to-SQL agent that grounds answers in six layers of context and improves automatically from failures

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

5 Python Tools for Structured LLM Outputs: A Practical Comparison
– Compare 5 Python tools for structured LLM outputs. Learn when to use Instructor, PydanticAI, LangChain, Outlines, or Guidance for JSON extraction.

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

Newsletter #284: Build AI Agent Memory with Graphiti Knowledge Graphs

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

Build AI Agent Memory with Graphiti Knowledge Graphs

Problem
Traditional RAG pipelines rely on batch processing and static document summaries. When data changes, you re-embed, re-index, and wait.
That delay means your agent is always working with stale information, unable to track how facts evolve over time.
Solution
Graphiti is an open-source Python framework that builds knowledge graphs with real-time, incremental updates. This lets you add new information at any time without reprocessing your entire dataset.
Key features:

Track when facts happened and when they were recorded, so you always know what’s current
Search by meaning, keywords, or relationships in one query
Get the most relevant results for a specific person, company, or entity
Works with Neo4j, FalkorDB, and Kuzu as the graph backend

⭐ View GitHub

Polars sink_csv: Stream Million-Row Exports Without Memory Spikes

Problem
Calling write_csv on a large query result means first collecting the entire dataset into memory as a DataFrame, which can exhaust RAM on big exports.
Solution
Polars’ streaming CSV sink avoids this. Calling sink_csv on a LazyFrame processes the query in batches and writes each batch to disk, so the full result never has to fit in memory.
Key benefits:

Eliminate out-of-memory errors on large exports
Write multi-million row DataFrames with minimal RAM
Support for cloud storage destinations (S3, GCS, Azure)

Switch from write_csv to sink_csv on a lazy frame to enable streaming.

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

Flowise
[AI Agents]
– Low-code platform for building AI agents and workflows visually with drag-and-drop components

cleanlab
[Machine Learning]
– Data-centric AI package that automatically detects data quality issues, label errors, and outliers in ML datasets

OpenBB
[Finance]
– Open-source financial data platform for analysts, quants, and AI agents with dozens of data integrations

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

5 Python Tools for Structured LLM Outputs: A Practical Comparison
– Compare 5 Python tools for structured LLM outputs. Learn when to use Instructor, PydanticAI, LangChain, Outlines, or Guidance for JSON extraction.

Newsletter #283: MLflow: Built-in Scorers for LLM Evaluation Without Custom Logic

Grab your coffee. Here are this week’s highlights.

📅 Today’s Picks

MLflow: Built-in Scorers for LLM Evaluation Without Custom Logic

Problem
Ensuring consistent LLM quality means checking correctness, relevance, and guideline adherence.
But writing custom evaluation logic for each criterion is tedious.
Solution
MLflow provides pre-built scorers for common evaluation patterns with simple decorator syntax for custom metrics.
Key capabilities:

Built-in scorers for correctness and guideline compliance
Simple @scorer decorator (from mlflow.genai.scorers) for custom metrics
Standardized evaluation patterns across projects
Visual summary of all assessment results in the MLflow UI

🧪 Run code

⭐ View GitHub

🔄 Worth Revisiting

Swap AI Prompts Instantly with MLflow Prompt Registry

Problem
Finding the right prompt often takes experimentation: tweaking wording, adjusting tone, testing different instructions.
But with prompts hardcoded in your codebase, each test requires a code change and redeployment.
Solution
MLflow Prompt Registry solves this with aliases. Your code references an alias like “production” instead of a version number, so you can swap versions without changing it.
Here’s how it works:

Every prompt edit creates a new immutable version with a commit message
Register prompts once, then assign aliases to specific versions
Deploy to different environments by creating aliases like “staging” and “production”
Track full version history with metadata and tags for each prompt

⭐ View GitHub

📢 ANNOUNCEMENTS

Introducing CodeCut Premium
I put a lot of effort into making every CodeCut blog post clear, practical, and example-driven. Still, there’s a gap between reading code and actually writing it yourself.
CodeCut Premium bridges that gap with interactive courses that let you:

Execute code directly in your browser
Skip installation and environment setup
Test your understanding with built-in quizzes
Learn faster than sitting through long video courses

I plan to add new courses regularly, with a focus on quality and depth. The catalog is still growing, and Founding Members get early access plus exclusive perks as it expands.
Founding Members receive lifetime $12/month pricing, full access to all courses, and early influence on future content.
Founding pricing ends March 31, 2026.

🔗 Learn More

☕️ Weekly Finds

zipline
[Finance]
– Pythonic algorithmic trading library with event-driven backtesting for building and testing trading strategies

outlines
[LLM]
– Structured text generation library that constrains LLM outputs to follow specific schemas, formats, and data types

responses
[Testing]
– Utility library for mocking out the Python Requests library in tests with simple decorators and context managers

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

5 Python Tools for Structured LLM Outputs: A Practical Comparison
– Compare 5 Python tools for structured LLM outputs. Learn when to use Instructor, PydanticAI, LangChain, Outlines, or Guidance for JSON extraction.

Newsletter #282: Build Structured LLM Outputs with Guidance Constraints

Grab your coffee. Here are this week’s highlights.

🤝 COLLABORATION

What Data Engineers Really Think About Airflow (5.8K Surveyed)
Astronomer analyzed 5.8k+ responses from data engineers on how they are navigating Airflow today, and the findings might surprise you.
You’ll learn:

How early adopters are using Airflow 3 features in production
Which teams are bringing AI into production and what’s holding others back
94% believe that Airflow is beneficial to their career

🔗 Download the State of Airflow 2026 Report

📅 Today’s Picks

Build Structured LLM Outputs with Guidance Constraints

Problem
Tools like Instructor and PydanticAI validate outputs after generation. If validation fails, they send the error back to the LLM and retry.
Each retry means paying for tokens that didn’t produce usable output.
Solution
Guidance works differently. It constrains tokens during generation, so invalid outputs can’t be produced in the first place.
Key capabilities:

Constrained outputs via regex patterns and selection functions
Python control flow (if/else, loops) during generation
JSON generation with Pydantic schema validation

📖 View Full Article

🧪 Run code

⭐ View GitHub

pandas 3.0: The End of SettingWithCopyWarning

Problem
When you filter a DataFrame and modify the result, you expect the original to stay unchanged.
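But depending on whether pandas returned a view or a copy, your change might or might not reach the original. That ambiguity is why pandas raised the SettingWithCopyWarning.
Solution
pandas 3.0 fixes this by making Copy-on-Write the default. Any DataFrame derived from another, including a filtered result, now behaves as an independent copy, so modifying it never affects your original data.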
Upgrade to pandas 3.0 with “pip install -U pandas”.
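Here is a minimal sketch of the new behavior, assuming pandas 3.0. The column names are hypothetical. On pandas 2.x, the first assignment still leaves df unchanged because boolean filtering already returned a copy, but it emits SettingWithCopyWarning.

```python
import pandas as pd

df = pd.DataFrame({"product": ["a", "b", "c"], "price": [10, 20, 30]})

# Filter, then modify the result. Under Copy-on-Write the filtered frame
# behaves as an independent copy, so this never touches df.
expensive = df[df["price"] > 15]
expensive["price"] = expensive["price"] * 2
original_prices = df["price"].tolist()

print(original_prices)              # [10, 20, 30]
print(expensive["price"].tolist())  # [40, 60]

# To change the original, assign through .loc on df itself in one step.
# Chained assignment such as df[mask]["price"] = ... never updates df under CoW.
mask = df["price"] > 15
df.loc[mask, "price"] = df.loc[mask, "price"] * 2
print(df["price"].tolist())         # [10, 40, 60]
```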

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

fake2db
[Data]
– Create custom test databases populated with fake data across SQLite, MySQL, PostgreSQL, MongoDB, Redis, and CouchDB

POT
[ML]
– Python Optimal Transport library providing solvers for optimization problems in signal processing, image processing, and machine learning

graphic-walker
[Data Viz]
– Open-source Tableau alternative for data scientists to analyze data and visualize patterns with drag-and-drop operations

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

From CSS Selectors to Natural Language: Web Scraping with ScrapeGraphAI
– Web scraping without selector maintenance. ScrapeGraphAI uses LLMs to extract data from any site using plain English prompts and Pydantic schemas.

Newsletter #281: MarkItDown: From Images to Searchable Text in Seconds

Grab your coffee. Here are this week’s highlights.

🤝 COLLABORATION

What Data Engineers Really Think About Airflow (5.8K Surveyed)
Astronomer analyzed 5.8k+ responses from data engineers on how they are navigating Airflow today, and the findings might surprise you.
You’ll learn:

How early adopters are using Airflow 3 features in production
Which teams are bringing AI into production and what’s holding others back
35.6% believe that Airflow is beneficial to their career

🔗 Download the State of Airflow 2026 Report

📅 Today’s Picks

Query Multiple Databases at Once with DuckDB

Problem
Working with data across PostgreSQL, MySQL, and SQLite often means managing multiple database connections and additional integration overhead.
That overhead adds up quickly when your goal is simply to analyze data across sources.
Solution
DuckDB removes the friction by allowing you to join tables across databases with a single query.
Key benefits:

Join SQLite, PostgreSQL, MySQL, and Parquet files in a single SQL statement
Automatic connection handling across all sources
Filters run at the source database, so only matching rows are transferred

⭐ View GitHub

MarkItDown: From Images to Searchable Text in Seconds

Problem
Charts, diagrams, and screenshots in your documents need text descriptions to be searchable and processable.
But writing descriptions manually is slow and produces inconsistent results across large document sets.
Solution
MarkItDown, an open-source library from Microsoft, integrates with OpenAI to automatically generate detailed descriptions of images.
Key capabilities:

Generate consistent descriptions across hundreds of images
Process images from documents like PowerPoint and PDF files
Customize the description prompt for your specific needs

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

Skill_Seekers
[LLM]
– Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection

sqlit
[Data]
– A user-friendly TUI for SQL databases supporting SQL Server, MySQL, PostgreSQL, SQLite, Turso and more

giskard
[ML]
– Open-source CI/CD platform for ML teams to eliminate AI bias and deliver quality ML products faster

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

From CSS Selectors to Natural Language: Web Scraping with ScrapeGraphAI
– Web scraping without selector maintenance. ScrapeGraphAI uses LLMs to extract data from any site using plain English prompts and Pydantic schemas.

Newsletter #280: ScrapeGraphAI: Scrape Any Website with Natural Language

🤝 COLLABORATION

What Data Engineers Really Think About Airflow (5.8K Surveyed)
Astronomer analyzed 5.8k+ responses from data engineers on how they are navigating Airflow today, and the findings might surprise you.
You’ll learn:

How early adopters are using Airflow 3 features in production
Which teams are bringing AI into production and what’s holding others back
35.6% believe that Airflow is beneficial to their career

🔗 Download the State of Airflow 2026 Report

📅 Today’s Picks

ScrapeGraphAI: Scrape Any Website with Natural Language

Problem
Traditional scraping with BeautifulSoup follows a familiar pattern: fetch HTML, inspect elements in DevTools, and write CSS selectors to extract your data.
But websites don’t stay static. When the HTML structure changes, your selectors break and you’re back to rewriting code.
Solution
ScrapeGraphAI uses LLMs to extract data from natural language descriptions. Simply describe what you want in plain English, and the LLM figures out the extraction logic automatically.
Key features:

Self-healing scrapers that adapt when websites are redesigned
Type-safe output with Pydantic schema validation
Built-in JavaScript rendering for React, Vue, and Angular sites
Multi-page scraping with SearchGraph for research tasks
Cloud or local models via OpenAI, Anthropic, or Ollama

Plus, ScrapeGraphAI is open source! Install it with “pip install scrapegraphai”.

📖 View Full Article

🧪 Run code

⭐ View GitHub

🔄 Worth Revisiting

Analyze GitHub Repositories with LangChain Document Loaders

Problem
Are you tired of manually searching through hundreds of GitHub issues with keyword search to find what you need?
Solution
With LangChain’s GitHubIssuesLoader, you can load repository issues into a vector store and query them with natural language instead of exact keywords.
You can ask questions like “What feature requests are related to video?” and get instant, relevant answers from your issue history.

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

hf-mem
[ML]
– CLI to estimate inference memory requirements for Hugging Face models before downloading

fake2db
[Testing]
– Create custom test databases populated with fake data for SQLite, MySQL, PostgreSQL, and MongoDB

MiraTTS
[LLM]
– High-quality text-to-speech model fine-tuned from Spark-TTS with enhanced realism and stability

Looking for a specific tool? Explore 70+ Python tools →

📚 Latest Deep Dives

From CSS Selectors to Natural Language: Web Scraping with ScrapeGraphAI
– Web scraping without selector maintenance. ScrapeGraphAI uses LLMs to extract data from any site using plain English prompts and Pydantic schemas.

Newsletter #279: LlamaIndex: From Documents to AI Chatbot in 4 Lines

🤝 COLLABORATION

Building Data Apps with Streamlit
Streamlit makes it easy to turn Python scripts into interactive web apps. But building production-ready applications requires more than basic widgets.
This hands-on guide covers Streamlit’s architecture, caching, session state, and multipage workflows. You’ll learn to handle secrets, work with APIs and databases, and deploy polished apps to the cloud.
By the end, you’ll build a complete solution that analyzes datasets, trains ML models, and powers an AI chatbot with Google Gemini.
🔗 Get the book

📅 Today’s Picks

Slim: Reduce Docker Images by 30x Without Dockerfile Changes

Problem
Standard base images ship a full OS userland. A simple Python app never touches most of the shells, compilers, and system utilities bundled inside.
This inflates images to hundreds of megabytes, wasting storage and adding time to every deploy.
Solution
Slim automatically analyzes your container at runtime to identify which files are actually used, then builds a minimal image with only essential components.
Slim works alongside Docker, not instead of it:

Step 1: Build your image with docker build
Step 2: Minify with slim build your-image
Step 3: Push the .slim image to your registry
Your Dockerfile and workflow stay the same

⭐ View GitHub

LlamaIndex: From Documents to AI Chatbot in 4 Lines


Problem
Building LLM applications from scratch requires managing document loading across different formats, configuring embeddings, setting up vector stores, and orchestrating queries. You end up writing boilerplate code instead of focusing on your application logic.
Solution
LlamaIndex provides a unified framework that handles the entire RAG pipeline with minimal code.
Here’s what it gives you:

Auto-detect and load common document formats (PDF, TXT, CSV, DOCX)
Create searchable vector indexes with a single call
Query with natural language or multi-turn conversations
Built-in memory management for chat applications

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

fiftyone
[ML]
– Open-source toolkit for building high-quality datasets and computer vision models with visualization and data management

everything-claude-code
[LLM]
– Complete Claude Code configuration collection with agents, skills, hooks, commands, rules, and MCP servers

qsv
[Data Processing]
– Ultra-fast CSV command line toolkit for indexing, slicing, analyzing, and transforming CSV files

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut
Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.



Newsletter #278: LangExtract: LLM-Powered Entity Extraction with One Example

📅 Today’s Picks

Skip Freshly Released Packages Automatically with uv

Problem
Installing updated package versions is essential to benefit from new features and bug fixes.
However, freshly released versions can introduce bugs or incompatibilities before the community has time to catch them.
Solution
uv’s exclude-newer option lets you set a cooldown period, so uv skips package versions released within that window.
To use it, add exclude-newer = "7 days" under the [tool.uv] table in pyproject.toml and adjust the duration as needed.

📖 View Full Article

⭐ View GitHub

LangExtract: LLM-Powered Entity Extraction with One Example

Problem
Named entity recognition (NER) extracts entities like names, dates, and organizations from text.
But pre-trained NER models can fail on domain-specific text. A general-purpose model wasn’t trained on medical terms, so it might label “Metformin 500mg” as “LAW” instead of “medication”.
Fixing this usually means retraining on thousands of labeled examples.
Solution
LangExtract is Google’s LLM-powered extraction library that skips retraining entirely. You adapt it to a new domain by providing as little as one example.
Plus, LangExtract gives you:

Exact character positions for source verification
Attribute grouping to link related entities
Interactive visualizations to review results
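
This is a minimal sketch of one-example extraction. It assumes langextract is installed and a Gemini API key is configured (the library reads LANGEXTRACT_API_KEY). The call follows the shape of LangExtract's README: lx.extract with lx.data.ExampleData and lx.data.Extraction. The example note, the medication class, the dose attribute, and the model ID are assumptions. The is_grounded and check_result helpers are hypothetical. They show how the reported character positions let you verify each extraction against its source text.

```python
def extract_medications(text: str):
    # Imported lazily so the helpers below work without langextract installed
    import langextract as lx

    example = lx.data.ExampleData(
        text="Patient was started on Lisinopril 10mg daily.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="Lisinopril 10mg",
                attributes={"dose": "10mg"},
            )
        ],
    )
    return lx.extract(
        text_or_documents=text,
        prompt_description="Extract medications and their doses using exact text from the note.",
        examples=[example],          # a single worked example
        model_id="gemini-2.5-flash",  # assumed model ID
    )


def is_grounded(source: str, start: int, end: int, extracted: str) -> bool:
    """Check that the reported character span really contains the extracted text."""
    return 0 <= start <= end <= len(source) and source[start:end] == extracted


def check_result(source: str, result) -> list[tuple[str, bool]]:
    # Assumes each extraction exposes char_interval.start_pos / end_pos (None if unaligned)
    report = []
    for ext in result.extractions:
        span = ext.char_interval
        ok = span is not None and is_grounded(source, span.start_pos, span.end_pos, ext.extraction_text)
        report.append((ext.extraction_text, ok))
    return report
```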

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

pypdf [Python Utils] – Pure-Python PDF library for splitting, merging, cropping, and transforming PDF files
buzz [ML] – Transcribe and translate audio offline using OpenAI’s Whisper on your personal computer
autogluon [ML] – AWS AutoML toolkit for automating machine learning tasks with strong predictive performance


Newsletter #277: Swap AI Prompts Instantly with MLflow Prompt Registry

📅 Today’s Picks

Swap AI Prompts Instantly with MLflow Prompt Registry

Problem
Finding the right prompt often takes experimentation: tweaking wording, adjusting tone, testing different instructions.
But with prompts hardcoded in your codebase, each test requires a code change and redeployment.
Solution
MLflow Prompt Registry solves this with aliases. Your code references an alias like “production” instead of a version number, so you can swap prompt versions without touching the code.
Here’s how it works:

Every prompt edit creates a new immutable version with a commit message
Register prompts once, then assign aliases to specific versions
Deploy to different environments by creating aliases like “staging” and “production”
Track full version history with metadata and tags for each prompt
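
This is a minimal sketch of the alias workflow. It assumes MLflow 3.x, where the registry functions live under mlflow.genai and templates use double-brace {{variable}} placeholders. The prompt name, template, and helper names are hypothetical. Only prompt_uri runs without MLflow; the other helpers import it lazily.

```python
def prompt_uri(name: str, alias: str) -> str:
    # Alias-based URI: application code never mentions a version number
    return f"prompts:/{name}@{alias}"


def publish(name: str, template: str, message: str, alias: str) -> int:
    import mlflow

    # Every registration creates a new immutable version with a commit message
    version = mlflow.genai.register_prompt(name=name, template=template, commit_message=message)
    # Point the alias (e.g. "staging" or "production") at the new version
    mlflow.genai.set_prompt_alias(name=name, alias=alias, version=version.version)
    return version.version


def render(name: str, alias: str, **variables) -> str:
    import mlflow

    # Loads whichever version the alias currently points to
    prompt = mlflow.genai.load_prompt(prompt_uri(name, alias))
    return prompt.format(**variables)
```

In this sketch, the application only calls render("summarizer", "production", text=...). Promoting a new prompt is a publish(...) call that moves the alias, with no change to the calling code.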

⭐ View GitHub

🔄 Worth Revisiting

Automate LLM Evaluation at Scale with MLflow make_judge()

Problem
When you ship LLM features without evaluating them, models might hallucinate, violate safety guidelines, or return incorrectly formatted responses.
Manual review doesn’t scale. Reviewers might miss subtle issues when evaluating thousands of outputs, and scoring standards often vary between people.
Solution
MLflow make_judge() applies the same evaluation standards to every output, whether you’re checking 10 or 10,000 responses.
Key capabilities:

Define evaluation criteria once, reuse everywhere
Automatic rationale explaining each judgment
Built-in judges for safety, toxicity, and hallucination detection
Typed outputs, so results come back in a predictable format
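
This is a minimal sketch that assumes a recent MLflow 3.x release shipping make_judge in mlflow.genai.judges, plus an OpenAI key for the judge model. The judge name, instructions, and model URI are hypothetical. The instructions use the {{ inputs }} and {{ outputs }} template variables, which MLflow fills in for each evaluated row. The judge_all loop is an illustrative helper.

```python
def build_format_judge():
    # Imported lazily so the module loads without MLflow installed
    from mlflow.genai.judges import make_judge

    return make_judge(
        name="format_check",
        instructions=(
            "Check whether the response in {{ outputs }} answers the question "
            "in {{ inputs }} as valid JSON with a 'summary' key. "
            "Answer 'pass' or 'fail'."
        ),
        model="openai:/gpt-4o-mini",  # assumed judge model URI
    )


def judge_all(judge, rows: list[dict]) -> list[tuple[str, str]]:
    # The same criteria apply to every row, whether there are 10 or 10,000
    results = []
    for row in rows:
        feedback = judge(inputs=row["inputs"], outputs=row["outputs"])
        results.append((feedback.value, feedback.rationale))  # rationale explains each verdict
    return results
```

For larger runs, mlflow.genai.evaluate(data=..., scorers=[judge]) applies judges across a whole dataset. The loop above just makes the per-row behavior explicit.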

⭐ View GitHub

☕️ Weekly Finds

gspread [Data Processing] – Google Sheets Python API for reading, writing, and formatting spreadsheets
zeppelin [Data Analysis] – Web-based notebook for interactive data analytics with SQL, Scala, and more
vectorbt [Data Science] – Fast engine for backtesting, algorithmic trading, and research in Python


Newsletter #276: Polars v1.37.0: Faster Lookups with min_by and max_by

📅 Today’s Picks

Stop Manually Tracing Dependencies with uv tree

Problem
Debugging version conflicts requires knowing which packages depend on what. But tracing these relationships manually through nested dependencies is tedious.
Solution
uv tree handles this automatically, displaying the full dependency graph so you can trace any package back to its source.
Key capabilities:

Complete dependency visualization
Flag dependencies with available updates
Find which packages depend on a specific library
Filter the tree to show only a specific package’s dependencies
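
Each of the four capabilities maps to a uv tree invocation. This minimal sketch runs them from Python. It assumes uv is on your PATH and that you run it inside a uv project. The package names requests and urllib3 are only examples, and the TREE_COMMANDS mapping and show helper are hypothetical.

```python
import subprocess

TREE_COMMANDS = {
    # Complete dependency graph of the current project
    "full": ["uv", "tree"],
    # Annotate packages that have newer versions available
    "outdated": ["uv", "tree", "--outdated"],
    # Reverse view: which packages pull in urllib3?
    "who_needs_urllib3": ["uv", "tree", "--invert", "--package", "urllib3"],
    # Only the subtree under requests
    "requests_only": ["uv", "tree", "--package", "requests"],
}


def show(name: str) -> str:
    # check=True raises CalledProcessError if uv exits with an error
    result = subprocess.run(TREE_COMMANDS[name], capture_output=True, text=True, check=True)
    return result.stdout
```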

📖 View Full Article

⭐ View GitHub

Polars v1.37.0: Faster Lookups with min_by and max_by

Problem
Finding a column’s value at the row where another column is smallest or largest usually requires sorting, grouping, or complex filter expressions.
Solution
Polars v1.37.0 adds the min_by and max_by expression methods. They return a column’s value at the row where another column reaches its minimum or maximum, in a single readable expression.
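
This minimal sketch compares the new expressions with the older sort-based workaround. It assumes Polars 1.37.0 or later, and that min_by and max_by take the column to rank by as their argument. The category, product, and price columns are hypothetical.

```python
def cheapest_and_priciest(df):
    import polars as pl

    return df.group_by("category").agg(
        # New in 1.37.0 (assumed signature): product at the row with the lowest/highest price
        pl.col("product").min_by("price").alias("cheapest"),
        pl.col("product").max_by("price").alias("priciest"),
    )


def cheapest_and_priciest_legacy(df):
    import polars as pl

    # Pre-1.37 workaround: sort each group's products by price, then take the ends
    return df.group_by("category").agg(
        pl.col("product").sort_by("price").first().alias("cheapest"),
        pl.col("product").sort_by("price").last().alias("priciest"),
    )
```

Both functions should give the same answer on data without price ties. The first states the intent directly and avoids sorting each group.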

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

lmql [LLM] – A programming language for constraint-guided and efficient LLM programming based on a superset of Python.
helicone [MLOps] – Open-source LLM observability platform with one-line integration for monitoring, analytics, and management.
responses [Python Utils] – A utility library for mocking out the Python Requests library in tests.

📚 Latest Deep Dives

What’s New in pandas 3.0: Expressions, Copy-on-Write, and Faster Strings
– Learn what’s new in pandas 3.0: pd.col expressions for cleaner code, Copy-on-Write for predictable behavior, and PyArrow-backed strings for 5-10x faster operations.


    Work with Khuyen Tran
