Newsletter Archive

Automated newsletter archive from Klaviyo campaigns

Code example: Build Grammar Rules with PyParsing Without Regex Maintenance

Newsletter #236: Build Grammar Rules with PyParsing Without Regex Maintenance

📅
Today’s Picks

Build Grammar Rules with PyParsing Without Regex Maintenance

Problem:

Regular expressions can be powerful but often become verbose and hard to maintain, especially when accounting for variable whitespace or special characters.

Solution:

PyParsing offers a cleaner alternative. It lets you define grammar rules using Python classes, making the parsing logic explicit and easier to maintain.

PyParsing advantages over regex:
Whitespace: Automatically handled without extra tokens
Readability: Self-documenting code structure
Data access: Use dot notation rather than numeric groups
Scalability: Combine reusable components to build complex grammars
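
As a minimal sketch of the points above, the grammar below parses a "key = value" setting. The setting format, the variable names, and the sample input are illustrative assumptions, not code from the article. Word, Suppress, and parse_string come from pyparsing 3.

```python
from pyparsing import Suppress, Word, alphanums, alphas, nums

# Reusable components: each one names the piece of data it captures.
key = Word(alphas, alphanums + "_")("key")
value = Word(nums)("value")

# Combine the components into a larger rule; "=" is matched but not kept.
assignment = key + Suppress("=") + value

# Extra whitespace around tokens is skipped by default.
result = assignment.parse_string("timeout   =   30")

# Parsed fields are available by name instead of by group number.
print(result.key, result.value)
```

The equivalent regex would need explicit \s* tokens around the equals sign and numeric group indexes to read the values back.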

Full Article:

Choose the Right Text Pattern Tool: Regex, Pregex, or Pyparsing

Run Code

View GitHub


Related Post

Build Self-Documenting Regex with Pregex

Problem:

Regex patterns like [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} are difficult to read and intimidating. Team members without regex expertise might struggle to understand and modify these validation patterns.

Solution:

Pregex transforms regex into readable Python code using descriptive components.

Key benefits:
Code that explains its intent without comments
Easy modification without regex expertise
Composable patterns for complex validation
Export to regex format when needed

Full Article:

Choose the Right Text Pattern Tool: Regex, Pregex, or Pyparsing

Run Code

View GitHub

☕️
Weekly Finds

superduper

LLM

End-to-end framework for building custom AI applications and agents

pgai

LLM

A Python library that transforms PostgreSQL into a robust, production-ready retrieval engine for RAG and Agentic applications

lakeFS

Data Engineer

An open-source tool that transforms your object storage into a Git-like repository, enabling you to manage your data lake the way you manage your code


Code example: Python 3.14: Type-Safe String Interpolation with t-strings

Newsletter #235: Python 3.14: Type-Safe String Interpolation with t-strings

📅
Today’s Picks

Python 3.14: Type-Safe String Interpolation with t-strings

Problem:

Building SQL queries with f-strings directly embeds user input into the query string, allowing attackers to inject malicious SQL commands.

Parameterized queries are secure but require you to maintain query templates and value lists separately.
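
The contrast described above can be sketched with the standard-library sqlite3 module. The table, the rows, and the malicious input are made up for illustration. The example shows the f-string problem and the parameterized alternative, not t-strings themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)

# Hypothetical attacker-controlled input.
user_input = "nobody' OR '1'='1"

# Unsafe: the f-string pastes the input into the SQL text,
# so the injected OR clause matches every row.
unsafe_query = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_query).fetchall()

# Safe: the query template and the values are passed separately,
# so the input is treated as a plain string value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The unsafe query returns both users, while the parameterized query returns no rows because no user has that literal name.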

Solution:

Python 3.14 introduces template string literals (t-strings). Instead of returning strings, they return Template objects that safely expose interpolated values.

This lets you validate and sanitize interpolated values before building the final query.

Run Code

Sync Only Changed Database Records with CloudQuery (Sponsored)

Problem:

Syncing data frequently is essential for real-time analytics and data pipelines. However, transferring large datasets between providers is resource-intensive and time-consuming, especially when syncing frequently.

Solution:

CloudQuery’s incremental sync tracks what’s already synced and fetches only the changes.

How incremental sync works:
Stores last sync timestamp in a state table
Queries the source for records modified after that timestamp
Updates only changed data in the destination database
In the example above, after the initial full sync of 33 seconds, incremental runs complete in just 5 seconds.
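
The three steps listed above can be sketched in plain Python with sqlite3. This illustrates the general timestamp-based pattern only and is not CloudQuery's implementation. The table names, the integer timestamps, and the incremental_sync function are all hypothetical.

```python
import sqlite3


def incremental_sync(source, dest):
    """Copy rows changed since the last sync and return how many were copied."""
    dest.execute(
        "CREATE TABLE IF NOT EXISTS sync_state (id INTEGER PRIMARY KEY, last_sync INTEGER)"
    )
    dest.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)"
    )

    # Step 1: read the last sync timestamp from the state table.
    row = dest.execute("SELECT last_sync FROM sync_state WHERE id = 1").fetchone()
    last_sync = row[0] if row else 0

    # Step 2: ask the source only for records modified after that timestamp.
    changed = source.execute(
        "SELECT id, title, updated_at FROM items WHERE updated_at > ?", (last_sync,)
    ).fetchall()

    # Step 3: update only the changed rows, then record the new high-water mark.
    dest.executemany("INSERT OR REPLACE INTO items VALUES (?, ?, ?)", changed)
    if changed:
        newest = max(r[2] for r in changed)
        dest.execute("INSERT OR REPLACE INTO sync_state VALUES (1, ?)", (newest,))
    return len(changed)


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)")
source.executemany(
    "INSERT INTO items VALUES (?, ?, ?)", [(1, "first", 100), (2, "second", 110)]
)
dest = sqlite3.connect(":memory:")

first_run = incremental_sync(source, dest)   # full sync: both rows are new
source.execute("UPDATE items SET title = 'second (edited)', updated_at = 120 WHERE id = 2")
second_run = incremental_sync(source, dest)  # only the edited row is copied
third_run = incremental_sync(source, dest)   # nothing changed since the last run
print(first_run, second_run, third_run)
```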

Full Article:

Hacker News Semantic Search: Production RAG with CloudQuery and Postgres

Run Code

View GitHub

☕️
Weekly Finds

pyscn

Data Engineer

An Intelligent Python Code Quality Analyzer that performs structural analysis to help maintain code quality for AI-assisted development.

TradingAgents

LLM

A multi-agent trading framework that uses LLM-powered agents to collaboratively evaluate market conditions and inform trading decisions.

vulture

Data Engineer

Vulture finds unused code in Python programs to help clean up and improve code quality by identifying dead or unreachable code.


Code example: Faker: Generate Realistic Test Data with One Command

Newsletter #234: Faker: Generate Realistic Test Data with One Command

📅
Today’s Picks

Faker: Generate Realistic Test Data with One Command

Problem:

Creating realistic test data manually is time-consuming.

Solution:

Faker generates authentic-looking test data with single-line commands.

Key features:
Realistic names, emails, and addresses
50+ language locales (en_US, vi_VN, etc.)
One-line profile generation with custom fields

Full Article:

Faker: Generate Realistic Test Data in Python with One Line of Code

Run Code

View GitHub

Persist Agent State Across Restarts with LangGraph Checkpointing

Problem:

Checkpointing is a persistence layer that maintains agent workflow state between executions. Without checkpointing, agents lose all state when systems restart, requiring users to start over with new conversations.

Solution:

With LangGraph’s checkpointing, you can persist agent state to databases, enabling:
Conversation continuity through restarts
Same conversation accessible from any application instance
Flexible persistence with PostgreSQL, SQLite, or MongoDB backends

Full Article:

Building Coordinated AI Agents with LangGraph: A Hands-On Tutorial

Run Code

View GitHub

☕️
Weekly Finds

git-who

Data Engineer

Git blame for file trees – visualize code authorship and contributions across entire directory structures

nanochat

LLM

The best ChatGPT that $100 can buy – minimal, hackable LLM implementation with full training pipeline

ManimML

ML

Animate and visualize machine learning concepts with Manim – create neural network visualizations and educational content


Code example: Build Self-Documenting Regex with Pregex

Newsletter #233: Build Self-Documenting Regex with Pregex

📅
Today’s Picks

Build Self-Documenting Regex with Pregex

Problem:

Regex patterns like [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} are difficult to read and intimidating. Team members without regex expertise might struggle to understand and modify these validation patterns.

Solution:

Pregex transforms regex into readable Python code using descriptive components.

Key benefits:
Code that explains its intent without comments
Easy modification without regex expertise
Composable patterns for complex validation
Export to regex format when needed

Full Article:

Choose the Right Text Pattern Tool: Regex, Pregex, or Pyparsing

Run Code

View GitHub


Related Post

Handle Messy Data with RapidFuzz Fuzzy Matching

Problem:

Traditional regex approaches require hours of preprocessing but still break with common data variations like missing spaces, typos, or inconsistent formatting.

Solution:

RapidFuzz eliminates data cleaning overhead with intelligent fuzzy matching.

Key benefits:
Automatic handling of typos, spacing, and case variations
Production-ready C++ performance for large datasets
Full spectrum of fuzzy algorithms in one library

Full Article:

4 Text Similarity Tools: When Regex Isn’t Enough

Run Code

View GitHub

☕️
Weekly Finds

xlwings

Python Utils

Python library that makes it easy to call Python from Excel and vice versa, with support for Excel on Windows, macOS, and web

juvio

Python Utils

UV kernel for Jupyter with inline dependency management for notebooks

drawdb

Data Engineer

Free, simple, and intuitive online database diagram editor and SQL generator


Code example: Build Data Analysis with LangChain Pandas Agent

Newsletter #232: Build Data Analysis with LangChain Pandas Agent

📅
Today’s Picks

Build Data Analysis with LangChain Pandas Agent

Problem:

Do you find yourself writing the same pandas correlation, groupby, and filtering code repeatedly for data exploration?

Complex, multi-step analyses often involve tedious manual calculations and comparisons, pulling data scientists away from higher-value tasks like modeling and insight generation.

Solution:

LangChain Pandas DataFrame Agent lets you analyze data using natural language, eliminating repetitive code and speeding up your workflow.

Key capabilities:
Ask complex analytical questions in plain English
Multi-step analysis in single requests
Get results with automatic explanations of methodology
Select from multiple AI models based on your query complexity

Full Article:

Run Private AI Workflows with LangChain and Ollama

Run Code

View GitHub

Faster Type Checking with Ty’s Rust Engine

Problem:

Traditional type checkers like mypy are slow on large codebases, making iteration cycles longer and development less efficient.

Solution:

Ty is a Rust-based type checker that provides instant feedback on type errors. When testing the FastAPI codebase, Ty completes type checking 9x faster than mypy.

Key benefits:
Significantly faster than mypy/pyright on large codebases
Auto-checks every save for immediate feedback while coding
Real-time IDE integration for VS Code and popular editors
Zero setup: run with uvx instantly, respects .gitignore automatically

View GitHub

☕️
Weekly Finds

hyperfine

Python Utils

A command-line benchmarking tool for measuring the execution time of commands with statistical analysis across multiple runs

SurfSense

LLM

Open Source Alternative to NotebookLM / Perplexity / Glean, connected to external sources such as search engines (Tavily, Linkup), Slack, Linear, Notion, YouTube, GitHub and more

stanza

ML

Stanford NLP Python library for tokenization, sentence segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and more


Code example: Transform Document Images into Spreadsheets with LlamaParse

Newsletter #231: Transform Document Images into Spreadsheets with LlamaParse

📅
Today’s Picks

Transform Document Images into Spreadsheets with LlamaParse

Problem:

Converting document images such as receipts to structured spreadsheet data requires tedious typing and careful validation.

Solution:

LlamaParse automates document data extraction by combining OCR parsing with schema validation, eliminating manual typing and human error.

Here is an example pipeline for extracting receipt data:
Parse receipt images to markdown using LlamaParse OCR engine
Define receipt structure with Pydantic models (company, date, items, totals)
Extract structured data automatically with OpenAI integration
Validate types and enforce business rules (positive prices, valid dates)
Export to pandas DataFrames or spreadsheets for analysis
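
The schema and validation steps of this pipeline can be sketched with Pydantic v2 alone. The field names, the sample receipt, and the future-date rule are assumptions made for illustration. The LlamaParse OCR call and the OpenAI extraction step are left out.

```python
from datetime import date

from pydantic import BaseModel, Field, ValidationError, field_validator


class LineItem(BaseModel):
    description: str
    price: float = Field(gt=0)  # business rule: prices must be positive


class Receipt(BaseModel):
    company: str
    purchase_date: date
    items: list[LineItem]
    total: float = Field(gt=0)

    @field_validator("purchase_date")
    @classmethod
    def not_in_future(cls, value: date) -> date:
        # Business rule: a receipt cannot be dated after today.
        if value > date.today():
            raise ValueError("purchase date cannot be in the future")
        return value


# Hypothetical structured output from the extraction step.
raw = {
    "company": "Corner Cafe",
    "purchase_date": "2024-01-15",
    "items": [{"description": "Latte", "price": 4.5}],
    "total": 4.5,
}
receipt = Receipt.model_validate(raw)  # strings are coerced to typed fields

# Invalid data is rejected instead of silently reaching the spreadsheet.
try:
    LineItem(description="Refund", price=-2)
    rejected = False
except ValidationError:
    rejected = True

print(receipt.purchase_date, rejected)
```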

Full Article:

Turn Receipt Images into Spreadsheets with LlamaIndex

Run Code

View GitHub

Solve Algebra Symbolically in Python with SymPy

Problem:

Have you ever needed to expand or factor complex expressions but found yourself doing tedious algebra by hand?

Numeric libraries like NumPy can’t solve symbolic equations or manipulate algebraic expressions.

Solution:

SymPy transforms Python into a powerful symbolic mathematics system.

Key capabilities:
Solve equations for any variable symbolically
Perform algebraic manipulations like expand, factor, and substitute
Generate LaTeX output for mathematical documentation
Integrate seamlessly with Jupyter notebooks and NumPy workflows
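
A brief sketch of these capabilities, assuming SymPy is installed. The expressions are arbitrary examples.

```python
from sympy import Eq, expand, factor, latex, solve, symbols

x, y = symbols("x y")

# Algebraic manipulation: expand, factor, and substitute.
expanded = expand((x + 2) ** 2)
factored = factor(x**2 - 5 * x + 6)
substituted = expanded.subs(x, 3)  # evaluates the expanded form at x = 3

# Solve 2x + y = 10 for x symbolically; the answer is expressed in terms of y.
solutions = solve(Eq(2 * x + y, 10), x)

# Produce a LaTeX string for documentation.
latex_source = latex(expanded)

print(expanded, factored, substituted, solutions, latex_source)
```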

Full Article:

3 Tools That Automatically Convert Python Code to LaTeX Math

Run Code

View GitHub

☕️
Weekly Finds

BERTopic

ML

Leveraging BERT and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions

mesop

Python Utils

Rapidly build AI apps in Python – A Python-based UI framework that allows you to rapidly build web apps like demos and internal apps

crawlee-python

Data Processing

A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs


Code example: PySpark Transformations: Python API vs SQL Expressions

Newsletter #230: PySpark Transformations: Python API vs SQL Expressions

📅
Today’s Picks

PySpark Transformations: Python API vs SQL Expressions

Problem:

PySpark offers two ways to handle SQL transformations. How do you know which one to use?

Solution:

Choose based on your development style and team expertise.

Use the DataFrame API if you’re comfortable with Python and need Python-native development with type safety and autocomplete support.

Use selectExpr() if you’re comfortable with SQL and need familiar SQL patterns and simplified CASE statements.

Both methods deliver the same performance, so pick the approach that fits your workflow.

Full Article:

The Complete PySpark SQL Guide: DataFrames, Aggregations, Window Functions, and Pandas UDFs

Run Code

View GitHub

☕️
Weekly Finds

dotenvx

Python Utils

A secure dotenv with encryption, syncing, and zero-knowledge key sharing to make .env files secure and team-friendly

databases

Data Processing

Async database support for Python with support for PostgreSQL, MySQL, and SQLite

pomegranate

ML

Fast and flexible probabilistic modeling in Python implemented in PyTorch


Related Post

DuckDB: Zero-Config SQL Database for DataFrames

Problem:

Setting up database servers for SQL operations requires complex configuration, service management, and credential setup. This creates barriers between data scientists and their analytical workflows.

Solution:

DuckDB provides an embedded SQL database with zero configuration required.

Key benefits:
No server installation or management needed
Direct SQL operations on DataFrames and files
Compatible with pandas, Polars, and Arrow ecosystems
Fast analytical queries with columnar storage
Open-source with active development community
Query your data instantly without database administration overhead.

Full Article:

A Deep Dive into DuckDB for Data Scientists

Run Code

View GitHub


Code example: latexify: Turn Python Functions Into Clean Math Formulas

Newsletter #229: latexify: Turn Python Functions Into Clean Math Formulas

📅
Today’s Picks

Build Faster Tests with pytest Session Fixtures

Problem:

pytest fixtures provide reusable test data, but they reload for every test function by default. When your fixture loads a large DataFrame, every test reloads the same data, wasting time and delaying your development workflow.

Solution:

Session-scoped fixtures load data once at the start and reuse it across all test functions.

Apply this pattern to:
Load large datasets once instead of reloading for each test function
Share a database connection across all tests without passing it as a parameter
Automatically set random seeds for reproducible train/test splits
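
The pattern can be sketched as a small test module. The load_large_dataset helper is a hypothetical stand-in for an expensive load, and the seed value is arbitrary.

```python
import random

import pytest


def load_large_dataset():
    """Stand-in for an expensive load, such as reading a large file."""
    return [{"id": i, "value": i * 2} for i in range(1000)]


@pytest.fixture(scope="session")
def dataset():
    # Runs once per test session; every test that requests it shares the result.
    return load_large_dataset()


@pytest.fixture(scope="session", autouse=True)
def fixed_seed():
    # autouse applies this without any test having to request it.
    random.seed(42)


def test_row_count(dataset):
    assert len(dataset) == 1000


def test_values_are_even(dataset):
    assert all(row["value"] % 2 == 0 for row in dataset)
```

Because the fixture is shared, tests should treat the returned data as read-only. A test that mutates it would affect every test that runs afterwards.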

Learn More:

Production-Ready Data Science: From Prototyping to Production with Python

Run Code

latexify: Turn Python Functions Into Clean Math Formulas

Problem:

Mathematical formulas written as Python code are hard to present to executives and stakeholders, who are often unfamiliar with Python. However, writing the LaTeX by hand to show the formulas is time-consuming and tedious.

Solution:

latexify transforms Python functions into clean mathematical notation with a single decorator. No manual LaTeX required.

Key features:
Automatic LaTeX generation from Python functions
Functions remain executable for calculations
Compatible with various notebooks such as Jupyter, Colab, and Marimo

Full Article:

3 Tools That Automatically Convert Python Code to LaTeX Math

Run Code

View GitHub

☕️
Weekly Finds

ty

Python Utils

An extremely fast Python type checker and language server, written in Rust

giotto-tda

ML

A high-performance topological machine learning toolbox in Python built on top of scikit-learn

vibekit

MLOps

Run Claude Code, Gemini, Codex — or any coding agent — in a clean, isolated sandbox with sensitive data redaction and observability baked in


Code example: Create Dynamic Scatter Plots with Plotly Animation

Newsletter #228: Create Dynamic Scatter Plots with Plotly Animation

📅
Today’s Picks

Create Dynamic Scatter Plots with Plotly Animation

Problem:

Static scatter plots can’t show how data clusters change and evolve over time.

Solution:

Plotly Express creates animated scatter plots that change over time in one line of code.

Key benefits:
Simply add the animation_frame="time_column" parameter to px.scatter to create an animated scatter plot
Automatic smooth transitions between time periods
Built-in playback controls for user interaction
Works with any time-series dataset

Full Article:

Top 6 Python Libraries for Visualization: Which One to Use

Run Code

View GitHub

CloudQuery: Move RAG Data with 18-Line YAML (Sponsored)

Problem:

RAG applications need data from various sources moved into vector stores. Manual API integration means writing boilerplate for rate limiting, pagination, and error handling instead of building AI.

Solution:

CloudQuery handles the entire data-to-embeddings pipeline with declarative YAML config and native pgvector support.

Key benefits:
Pre-built connectors for AWS, GCP, Azure, and 100+ platforms
Sync state persistence with incremental processing and automatic schema evolution
Built-in PII removal, column obfuscation, and data cleaning for compliance
Native pgvector support: text splitting, embeddings, semantic indexing for RAG

Full Article:

Hacker News Semantic Search: Production RAG with CloudQuery and Postgres

View GitHub

☕️
Weekly Finds

ShinkaEvolve

ML

An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency

claude-code-router

LLM

A powerful tool to route Claude Code requests to different models and customize any request

data-formulator

Data Viz

AI-driven tool designed to streamline the creation of data visualizations


Code example: LangGraph: Turn Any Python Function Into Agent Tools

Newsletter #227: LangGraph: Turn Any Python Function Into Agent Tools

📅
Today’s Picks

LangGraph: Turn Any Python Function Into Agent Tools

Problem:

AI agents need specialized tools to interact with the world beyond their training data, such as searching the web, querying databases, executing code, and integrating with APIs.

However, if there are too many tools, it becomes difficult to connect them to user requests intelligently.

Solution:

LangGraph’s create_react_agent removes the need for hand-written routing logic by letting the LLM reason about which tool to call.

Key benefits of ReAct agents:
Handles fuzzy user requests by letting the LLM choose tools on the fly
Lets you drop in new @tool functions without touching control flow
Turns any Python function into an agent-accessible tool

Full Article:

Building Coordinated AI Agents with LangGraph: A Hands-On Tutorial

Run Code

View GitHub

☕️
Weekly Finds

MindsDB

ML

AI data automation solution that connects and unifies petabyte scale enterprise data, enabling informed decision-making in real-time

gspread

Python Utils

Google Sheets Python API for managing Google Spreadsheets programmatically

wrapt

Python Utils

Python module for decorators, wrappers and monkey patching with transparent object proxy


Related Post

Query GitHub Issues with Natural Language Using LangChain

Problem:

Have you ever spent hours clicking through GitHub pages to understand project status, track bugs, or review recent changes? Manual repository analysis wastes development time that could be spent building features.

Solution:

LangChain’s GitHubIssuesLoader converts repository issues and PRs into searchable content that responds to natural language questions about bugs, features, and project status.

This method integrates seamlessly with LangChain workflows.

Full Article:

Run Private AI Workflows with LangChain and Ollama

Run Code

View GitHub


    Work with Khuyen Tran

    Work with Khuyen Tran