Code example: Automate LLM Evaluation at Scale with MLflow make_judge()

Newsletter #271: Automate LLM Evaluation at Scale with MLflow make_judge()

📅
Today’s Picks

Automate LLM Evaluation at Scale with MLflow make_judge()

Problem:

When you ship LLM features without evaluating them, models might hallucinate, violate safety guidelines, or return incorrectly formatted responses. Manual review doesn’t scale: reviewers might miss subtle issues when evaluating thousands of outputs, and scoring standards often vary between people.

Solution:

MLflow make_judge() applies the same evaluation standards to every output, whether you’re checking 10 or 10,000 responses. Key capabilities:
Define evaluation criteria once, reuse everywhere
Automatic rationale explaining each judgment
Built-in judges for safety, toxicity, and hallucination detection
Typed outputs that never return unexpected formats

Run Code

View GitHub


Worth Revisiting

LangChain v1.0: Auto-Protect Sensitive Data with PIIMiddleware

Problem:

User messages often contain sensitive information like emails and phone numbers. Logging or storing this data without protection creates compliance and security risks.

Solution:

LangChain v1.0 introduces PIIMiddleware to automatically protect sensitive data before model processing. PIIMiddleware supports multiple protection modes:
5 built-in detectors (email, credit card, IP, MAC, URL)
Custom regex for any PII pattern
Replace with [REDACTED], mask as ****1234, or block entirely
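The redaction idea can be sketched with the standard library alone. This is not the LangChain PIIMiddleware API, just a minimal stdlib analogue of what a custom-regex detector with a `[REDACTED]` replacement strategy does:

```python
# Stdlib sketch of regex-based PII redaction: detect a pattern
# (email here) and replace it before the text is logged or sent on.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(text: str) -> str:
    return EMAIL.sub("[REDACTED]", text)

print(redact("Contact me at jane.doe@example.com please"))
# Contact me at [REDACTED] please
```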

Full Article:

Build Production-Ready LLM Agents with LangChain 1.0 Middleware

Run Code

View GitHub

☕️
Weekly Finds

litellm

LLM

Python SDK and Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI format with cost tracking, guardrails, and logging.

parlant

LLM

LLM agents built for control with behavioral guidelines, ensuring predictable and consistent agent behavior.

GLiNER2

ML

Unified schema-based information extraction for NER, text classification, and structured data parsing in one pass.

Looking for a specific tool?
Explore 70+ Python tools →


Code example: PydanticAI: Type-Safe LLM Outputs with Auto-Validation

Newsletter #270: PydanticAI: Type-Safe LLM Outputs with Auto-Validation

📅
Today’s Picks

Yellowbrick: Detect Overfitting vs Underfitting Visually

Problem:

Hyperparameter tuning requires finding the sweet spot between underfitting (model too simple) and overfitting (model memorizes training data). You could write the loop, run cross-validation for each value, collect scores, and format the plot yourself. But that’s boilerplate you’ll repeat across projects.

Solution:

Yellowbrick is a machine learning visualization library built for exactly this. Its ValidationCurve shows you what’s working, what’s not, and what to fix next without the boilerplate or inconsistent formatting. How to read the plot in this example:
Training score (blue) stays high as max_depth increases
Validation score (green) drops after depth 4
The growing gap means the model memorizes training data but fails on new data
Action: Pick max_depth around 3-4 where validation score peaks before the gap widens.

Full Article:

Visualize Machine Learning Results with Yellowbrick

Run Code

View GitHub

PydanticAI: Type-Safe LLM Outputs with Auto-Validation

Problem:

Without structured outputs, you’re working with raw text that might not match your expected format. Unexpected responses, missing fields, or wrong data types can cause errors that are easy to miss during development.

Solution:

PydanticAI uses Pydantic models to automatically validate and structure LLM responses. Key benefits:
Type safety at runtime with validated Python objects
Automatic retry on validation failures
Direct field access without manual parsing
Integration with existing Pydantic workflows
LangChain works too, but PydanticAI is a lighter alternative when you just need structured outputs.
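The validation layer PydanticAI builds on can be seen with plain Pydantic (v2 API assumed). The model and field names below are made up for illustration; the point is that malformed "LLM output" is rejected and valid JSON is coerced into typed fields:

```python
# Pydantic validates JSON (e.g. raw LLM output) into typed objects,
# coercing compatible values and raising on missing fields.
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    vendor: str
    total: float

raw = '{"vendor": "Acme", "total": "19.99"}'  # total arrives as a string
invoice = Invoice.model_validate_json(raw)     # coerced to float 19.99
print(invoice.total)

try:
    Invoice.model_validate_json('{"vendor": "Acme"}')  # missing "total"
except ValidationError:
    print("validation failed")
```

PydanticAI wraps this same mechanism around the model call, retrying automatically when validation fails.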

Full Article:

Enforce Structured Outputs from LLMs with PydanticAI

Run Code

View GitHub

📚
Top 5 Articles of 2025

A Deep Dive into DuckDB for Data Scientists

Query billions of rows on your laptop with DuckDB. Learn SQL analytics, Parquet integration, and when to choose DuckDB over pandas.

Top 6 Python Libraries for Visualization: Which One to Use?

Compare Matplotlib, Seaborn, Plotly, Altair, Bokeh, and PyGWalker. Find the right visualization library for your data science workflow.

Transform Any PDF into Searchable AI Data with Docling

Extract text, tables, and structure from PDFs for RAG pipelines. Docling handles complex layouts that break traditional parsers.

Narwhals: Unified DataFrame Functions for pandas, Polars, and PySpark

Write DataFrame code once, run it on pandas, Polars, or PySpark. Narwhals provides a unified API without vendor lock-in.

Goodbye Pip and Poetry. Why UV Might Be All You Need

Replace pip, virtualenv, pyenv, and Poetry with one tool. UV handles Python versions, dependencies, and reproducible builds in a single workflow.

☕️
Weekly Finds

pdfplumber

Data Processing

Plumb a PDF for detailed information about each char, rectangle, line, et cetera – and easily extract text and tables.

cognee

LLM

Memory for AI Agents in 6 lines of code – transforms data into knowledge graphs for persistent, scalable AI memory.

featuretools

ML

An open source Python library for automated feature engineering from relational and temporal datasets.


Code example: LangChain v1.2.0: Build Multi-Provider Agents with Extras

Newsletter #269: LangChain v1.2.0: Build Multi-Provider Agents with Extras

📅
Today’s Picks

LangChain v1.2.0: Build Multi-Provider Agents with Extras

Problem:

Different LLM providers require different tool configurations: parallel vs sequential execution, strict mode, token limits. This creates scattered configs and manual provider switching throughout your code.

Solution:

LangChain v1.2.0 introduces the extras attribute, which attaches provider-specific configurations directly to tool definitions. With extras, you can:
Define all provider configs in one place
Switch providers without touching multiple files
Keep configs in sync across environments

Full Article:

Run Private AI Workflows with LangChain and Ollama

View GitHub

GLiNER: Extract Any Entity Type with Zero-Shot NER

Problem:

Named Entity Recognition (NER) extracts key information like names, dates, and organizations from text. But standard models are limited to predefined entity types like PERSON, ORG, and DATE. If you need to extract something specific, you’d normally have to train a custom model with thousands of labeled examples.

Solution:

GLiNER changes that with zero-shot entity extraction, allowing you to extract any entity type without training. Key benefits:
Works out-of-the-box with any text domain
Handles multiple entity types in a single pass
Returns confidence scores for each extraction
Integrates with spaCy and other NLP pipelines

Full Article:

langextract vs spaCy: AI-Powered vs Rule-Based Entity Extraction

Run Code

View GitHub

☕️
Weekly Finds

timescaledb

Data Engineer

PostgreSQL extension for high-performance real-time analytics on time-series and event data

slim

MLOps

Inspect, optimize, and minify Docker container images without sacrificing functionality

drawdb

Data Engineer

Free, simple, and intuitive online database diagram editor and SQL generator


Code example: Faster Table Joins with Polars Multi-Threading

Newsletter #268: Faster Table Joins with Polars Multi-Threading

📅
Today’s Picks

Faster Table Joins with Polars Multi-Threading

Problem:

pandas processes joins on a single CPU core, leaving other cores idle during large table operations.

Solution:

Polars distributes join operations across all available CPU cores, achieving significantly faster joins than pandas on large datasets. What makes Polars fast:
Processes rows in parallel batches
Uses all available CPU cores
Zero configuration required

Full Article:

pandas vs Polars vs DuckDB: A Data Scientist’s Guide to Choosing the Right Tool

Run Code

View GitHub


Worth Revisiting

Faster Polars Queries with Programmatic Expressions

Problem:

When you apply similar transformations in a for loop, each Polars with_columns() call processes sequentially. This prevents the optimizer from seeing the full computation plan.

Solution:

Instead, generate all Polars expressions programmatically and apply them together. This enables Polars to:
See the complete computation plan upfront
Optimize across all expressions simultaneously
Parallelize operations across CPU cores

Full Article:

Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames

Run Code

View GitHub

☕️
Weekly Finds

Mole

Python Utils

Deep clean and optimize your Mac with a simple command-line tool.

marker

LLM

Convert PDF, DOCX, PPTX, and other documents to markdown with high speed and accuracy.

pathway

Data Engineer

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.


Code example: Build Professional Python Packages with UV --package

Newsletter #267: Build Professional Python Packages with UV --package


Worth Revisiting

Build Professional Python Packages with UV --package

Problem:

Python packages turn your code into reusable modules you can share across projects. But building them requires complex setup with setuptools, managing build systems, and understanding distribution mechanics.

Solution:

UV, a fast Python package installer and resolver, reduces the entire process to two simple commands:
uv init --package sets up your package structure instantly
uv build and uv publish create and distribute to PyPI

Learn More:

Production-Ready Data Science: From Prototyping to Production with Python

View GitHub

Generate Time-Sortable IDs with Python 3.14’s UUID v7

Problem:

UUID4 generates purely random identifiers that lack chronological ordering. Without embedded timestamps, you need separate timestamp fields and custom sorting logic to organize records by creation time.

Solution:

Python 3.14 introduces UUID version 7 with built-in timestamp ordering. Key features:
Determine creation order by comparing two UUIDs directly
Retrieve exact creation time by extracting the embedded timestamp
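On Python 3.14 this is just uuid.uuid7(); on earlier versions the idea can be sketched by hand following the RFC 9562 layout, which makes the ordering property concrete: a 48-bit millisecond timestamp in the top bits means earlier IDs always compare smaller. The helper below is a simplified illustration, not the stdlib implementation:

```python
# Hand-rolled UUIDv7 sketch per RFC 9562: 48-bit ms timestamp,
# version nibble 7, variant bits 10, remaining bits random.
import os
import time
import uuid

def uuid7_sketch(ts_ms=None):
    ts = int(time.time() * 1000) if ts_ms is None else ts_ms
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    value = (ts & 0xFFFFFFFFFFFF) << 80            # bits 127-80: timestamp
    value |= 0x7 << 76                             # bits 79-76: version 7
    value |= ((rand >> 68) & 0xFFF) << 64          # bits 75-64: rand_a
    value |= 0x2 << 62                             # bits 63-62: variant
    value |= rand & 0x3FFFFFFFFFFFFFFF             # bits 61-0: rand_b
    return uuid.UUID(int=value)

a = uuid7_sketch(ts_ms=1)
b = uuid7_sketch(ts_ms=2)
print(a < b)                 # earlier timestamp sorts first
print(a.int >> 80)           # extract the embedded timestamp (1)
```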

📚
Latest Deep Dives

Visualize Machine Learning Results with Yellowbrick

Learn to visualize ML model performance with Yellowbrick. Create confusion matrices, ROC curves, and feature importance plots in scikit-learn pipelines.

☕️
Weekly Finds

smolagents

LLM

A barebones library for agents that think in code

rembg

ML

A tool to remove images background

Scrapegraph-ai

LLM

Python scraper based on AI


Code example: Python 3.14: Type-Safe String Interpolation with t-strings

Newsletter #266: Python 3.14: Type-Safe String Interpolation with t-strings


Worth Revisiting

Python 3.14: Type-Safe String Interpolation with t-strings

Problem:

Building SQL queries with f-strings directly embeds user input into the query string, allowing attackers to inject malicious SQL commands. Parameterized queries are secure but require you to maintain query templates and value lists separately.
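The problem in miniature, using the stdlib sqlite3 module for the demo (table and values are made up): an f-string splices user input into the SQL text, while a parameterized query keeps it as data.

```python
# f-string SQL vs parameterized query: only the former is injectable.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice'), ('bob')")

user_input = "' OR '1'='1"

# f-string: the injected clause matches every row
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{user_input}'"
).fetchall()
print(len(unsafe))  # 2 rows: injection succeeded

# parameterized: the input is treated as a literal string, no rows match
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(len(safe))    # 0 rows
```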

Solution:

Python 3.14 introduces template string literals (t-strings). Instead of returning strings, they return Template objects that safely expose interpolated values. This lets you validate and sanitize interpolated values before building the final query.

Run Code

Build Self-Documenting Regex with Pregex

Problem:

Regex patterns like [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} are difficult to read and intimidating. Team members without regex expertise might struggle to understand and modify these validation patterns.

Solution:

Pregex transforms regex into readable Python code using descriptive components. Key benefits:
Code that explains its intent without comments
Easy modification without regex expertise
Composable patterns for complex validation
Export to regex format when needed
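If Pregex isn’t available, the standard library offers a partial analogue of the self-documenting idea: re.VERBOSE lets the same email pattern carry per-part comments. This is a stdlib substitute, not Pregex’s component API:

```python
# re.VERBOSE: annotate each part of a regex so intent survives
# without external regex expertise.
import re

EMAIL = re.compile(
    r"""
    [a-zA-Z0-9._%+-]+   # local part: letters, digits, common symbols
    @                   # literal at sign
    [a-zA-Z0-9.-]+      # domain name
    \.                  # dot before the TLD
    [a-zA-Z]{2,}        # top-level domain, 2+ letters
    """,
    re.VERBOSE,
)

print(bool(EMAIL.fullmatch("user@example.com")))   # True
print(bool(EMAIL.fullmatch("not-an-email")))       # False
```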

Full Article:

Choose the Right Text Pattern Tool: Regex, Pregex, or Pyparsing

Run Code

View GitHub

📚
Latest Deep Dives

Visualize Machine Learning Results with Yellowbrick

Learn to visualize ML model performance with Yellowbrick. Create confusion matrices, ROC curves, and feature importance plots in scikit-learn pipelines.

☕️
Weekly Finds

MindsDB

LLM

AI data automation solution that connects and unifies enterprise data for real-time decision-making.

MarkItDown

Python Utils

Lightweight Python utility for converting various files to Markdown for use with LLMs.

Reflex

Python Utils

Open-source framework empowering Python developers to build web apps faster in a single language.


Code example: PySpark 4.0: Query Nested JSON Without StructType

Newsletter #265: PySpark 4.0: Query Nested JSON Without StructType

📅
Today’s Picks

PySpark 4.0: Query Nested JSON Without StructType

Problem:

Extracting nested JSON in PySpark requires defining StructType inside StructType inside StructType. This creates verbose, inflexible code that breaks when your JSON structure changes.

Solution:

PySpark 4.0’s Variant type lets you skip schema definitions entirely. All you need is parse_json() to load and variant_get() to extract with JSONPath. Key benefits:
No upfront schema definition
Handle any nesting depth with simple $.path syntax
Schema changes don’t break your code
Extract only the fields you need, when you need them

Full Article:

What’s New in PySpark 4.0

Run Code

View GitHub


Worth Revisiting

PySpark 4.0: Native Plotting API for DataFrames

Problem:

Visualizing PySpark DataFrames typically requires converting to Pandas first, adding memory overhead and extra processing steps.

Solution:

PySpark 4.0 adds native Plotly-powered plotting, enabling direct .plot() calls on DataFrames without Pandas conversion.

Full Article:

What’s New in PySpark 4.0

Run Code

View GitHub

📚
Latest Deep Dives

Visualize Machine Learning Results with Yellowbrick

Learn to visualize ML model performance with Yellowbrick. Create confusion matrices, ROC curves, and feature importance plots in scikit-learn pipelines.

☕️
Weekly Finds

toon

LLM

Compact, human-readable JSON encoding for LLM prompts with schema-aware Token-Oriented Object Notation

cocoindex

Data Processing

Ultra performant data transformation framework for AI with incremental processing

sqlfluff

Data Engineer

Modular SQL linter and auto-formatter with support for multiple dialects and templated code


Code example: Codon: One Decorator to Turn Python into C Speed

Newsletter #264: Codon: One Decorator to Turn Python into C Speed

📅
Today’s Picks

Stream Large CSVs to Parquet with Polars sink_parquet

Problem:

Traditional workflows load the full CSV into memory before writing, which crashes when the file is too large.

Solution:

Polars sink_parquet() streams data directly from CSV to Parquet without loading the entire file into memory. Instead of load-then-write, sink_parquet uses read-write-release:
Reads a chunk from CSV
Writes it to Parquet
Releases memory before next chunk
Repeats until complete

Full Article:

pandas vs Polars vs DuckDB: A Data Scientist’s Guide to Choosing the Right Tool

Run Code

View GitHub

Codon: One Decorator to Turn Python into C Speed

Problem:

Slow Python functions in large codebases are painful to optimize. You might try Numba, but it only works for numerical code with NumPy arrays. You might try Cython, but it needs .pyx files, variable type annotations, and build setup. That’s hours of refactoring before you see any speedup.

Solution:

Codon solves this with a single @codon.jit decorator that compiles your Python to machine code. Key benefits:
Works on any Python code, not just NumPy arrays
No type annotations required since types are inferred automatically
Compiled functions are cached for instant repeated calls
Zero code changes beyond adding the decorator

Run Code

View GitHub

☕️
Weekly Finds

metabase

Data Viz

Open-source Business Intelligence and Embedded Analytics tool that lets everyone work with data

Surprise

ML

Python scikit for building and analyzing recommender systems with SVD, KNN, and more algorithms

highdimensional-decision-boundary-plot

Data Viz

Scikit-learn compatible approach to plot high-dimensional decision boundaries for intuitive model understanding


Code example: Analyze GitHub Repositories with LangChain Document Loaders

Newsletter #263: Analyze GitHub Repositories with LangChain Document Loaders

📅
Today’s Picks

Build a Simple Portfolio Analyzer in Python with ffn

Problem:

If you have ever wanted a simple way to analyze your investment portfolio as a side project, you know how tedious it is to piece together multiple Python libraries.

Solution:

ffn consolidates the entire portfolio analysis workflow into one package with a Pandas-like API. Core features:
Fetch stock prices directly from Yahoo Finance
Calculate returns and risk metrics automatically
Find the best allocation across your assets
Plot performance comparisons and correlations

Run Code

View GitHub

Analyze GitHub Repositories with LangChain Document Loaders

Problem:

Are you tired of manually searching through hundreds of GitHub issues with keyword search to find what you need?

Solution:

With LangChain’s GitHubIssuesLoader, you can load repository issues into a vector store and query them with natural language instead of exact keywords. You can ask questions like “What feature requests are related to video?” and get instant, relevant answers from your issue history.

Full Article:

Run Private AI Workflows with LangChain and Ollama

Run Code

View GitHub

☕️
Weekly Finds

PlotNeuralNet

Data Viz

LaTeX code for drawing publication-quality neural network diagrams for reports and presentations

yellowbrick

ML

Visual analysis and diagnostic tools for machine learning with scikit-learn integration

TPOT

MLOps

Python Automated Machine Learning tool that optimizes ML pipelines using genetic programming


Code example: Build Visual Tables with Great Tables Nanoplots

Newsletter #261: Build Visual Tables with Great Tables Nanoplots

🤝
COLLABORATION

Data Contracts: Developing Production Grade Pipelines at Scale

Poor data quality can cause major problems for data teams, from disrupting pipelines to losing consumer trust. Many teams struggle with this, especially when data comes from upstream workflows outside their control. The solution: data contracts. They document expectations, establish ownership, and enforce constraints within CI/CD workflows. This practical book introduces data contract architecture, explains why the industry needs it, and shares real-world production use cases. You’ll learn to implement components and build a case for adoption in your organization.

Try Chapter 7 in your browser

📅
Today’s Picks

Build Visual Tables with Great Tables Nanoplots

Problem:

Data tables with raw numbers lack visual context. You can’t spot trends or patterns at a glance when looking at columns of digits.

Solution:

Great Tables’ fmt_nanoplot() embeds mini line or bar charts directly into table cells. Key features:
Transform numeric series into scannable visualizations
Customize colors and styles for data points and lines
Switch between line plots and bar charts
Add data area shading for emphasis

Full Article:

Great Tables: Build Publication-Ready Tables in Python

Run Code

View GitHub


Related Post

Great Tables: Transform DataFrames into Publication-Ready Reports

Problem:

Standard DataFrame output can feel clunky and unfinished. Without clean headers, readable dates, or currency formatting, even great data can look unprofessional.

Solution:

Great Tables elevates your DataFrames into polished tables built for reports, dashboards, and presentations, all through one chainable interface. Key features:
Number formatting: currency, dates, compact notation
Visual enhancements: mini charts, color gradients, embedded images
Table structure: headers, subtitles, column control
Multi-format export: PNG, PDF, HTML

Full Article:

Great Tables: Build Publication-Ready Tables in Python

Run Code

View GitHub

☕️
Weekly Finds

TabPFN

ML

Foundation model for tabular data with zero-shot classification and regression capabilities

scikit-survival

ML

Survival analysis built on top of scikit-learn for time-to-event prediction

dedupe

Data Processing

Python library for fuzzy matching, record deduplication and entity resolution using machine learning

