Newsletter Archive

Automated newsletter archive from Klaviyo campaigns

Newsletter #222: Build Dynamic AI Prompts with LangChain Templates

📅
Today’s Picks

DuckDB: Zero-Config SQL Database for DataFrames

Problem:

Setting up database servers for SQL operations requires complex configuration, service management, and credential setup. This creates barriers between data scientists and their analytical workflows.

Solution:

DuckDB provides an embedded SQL database with zero configuration required.

Key benefits:
No server installation or management needed
Direct SQL operations on DataFrames and files
Compatible with pandas, Polars, and Arrow ecosystems
Fast analytical queries with columnar storage
Open-source with active development community

Query your data instantly without database administration overhead.

Full Article:

A Deep Dive into DuckDB for Data Scientists

Run Code

View GitHub

Build Dynamic AI Prompts with LangChain Templates

Problem:

Hard-coded prompts limit flexibility and make it difficult to adapt AI applications to different contexts or user inputs. Creating separate functions for each prompt variation leads to duplicate code with no reusability.

Solution:

LangChain’s PromptTemplate enables dynamic, reusable prompts with variable substitution.

Create one template that adapts to multiple contexts:
Variable substitution with {topic}, {audience}, {examples}
Single template for unlimited prompt variations
Clean, maintainable code structure
Compatible with all major LLM providers

Transform repetitive hard-coded prompts into flexible, reusable templates that scale with your AI application needs.
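
The variable-substitution idea can be illustrated with a small stand-in that uses only Python's built-in str.format rather than LangChain itself. The template wording and the build_prompt helper are hypothetical, so treat this as a sketch of the pattern, not of LangChain's API.

```python
# Stand-in sketch using plain str.format, not LangChain's PromptTemplate.
# The template wording and the build_prompt helper are hypothetical.
TEMPLATE = (
    "Explain {topic} to {audience}.\n"
    "Include {examples} short examples."
)


def build_prompt(topic: str, audience: str, examples: int) -> str:
    """Fill the shared template with context-specific values."""
    return TEMPLATE.format(topic=topic, audience=audience, examples=examples)


# One template, two different prompts.
beginner_prompt = build_prompt("list comprehensions", "new Python users", 2)
expert_prompt = build_prompt("vectorization", "data engineers", 3)
print(beginner_prompt)
print(expert_prompt)
```

Keeping the wording in one template means a change to the prompt is made once, and every caller picks it up.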

Full Article:

Run Private AI Workflows with LangChain and Ollama

View GitHub

☕️
Weekly Finds

GHunt

Python Utils

Modulable OSINT tool designed to investigate Google accounts and objects using various techniques

nbQA

Python Utils

Run ruff, isort, pyupgrade, mypy, pylint, flake8, and more on Jupyter Notebooks

pg_vectorize

LLM

Postgres extension that automates the transformation and orchestration of text to embeddings for vector and semantic search

Newsletter #221: handcalcs: Generate LaTeX Step-by-Step Calculations from Python

📅
Today’s Picks

handcalcs: Generate LaTeX Step-by-Step Calculations from Python

Problem:

Showing the intermediate steps of a calculation helps stakeholders understand it and verify the results. However, writing LaTeX for each calculation step is manual and time-consuming.

Solution:

handcalcs eliminates manual LaTeX writing by auto-generating mathematical documentation from your Python calculations. Perfect for engineering reports, data science documentation, and educational materials.

Full Article:

3 Tools That Automatically Convert Python Code to LaTeX Math

Run Code

View GitHub

☕️
Weekly Finds

nanoGPT

LLM

The simplest, fastest repository for training/finetuning medium-sized GPTs. A clean, minimal implementation of GPT in PyTorch.

GHunt

Python Utils

Modulable OSINT tool designed to evolve over the years, incorporates many techniques to investigate Google accounts.

beartype

Python Utils

Fast, efficient runtime type checking for Python. Open-source pure-Python runtime type checker emphasizing efficiency and portability.


Related Post

TinyDB: Python Databases Without SQL Complexity

Problem:

Databases provide essential persistence, queries, and data integrity that Python lists can’t match. However, setting up PostgreSQL or MySQL servers creates unnecessary complexity for small applications.

Solution:

TinyDB delivers these database capabilities through file-based JSON storage with simple Python dict-like operations.

Key benefits:
No SQL syntax required – use familiar Python dictionary operations
Single JSON file storage – perfect for prototyping and small applications
Zero configuration setup – just import and start storing data
Pure Python implementation with no external dependencies

Start storing data with just three lines of code.

Run Code

View GitHub

Newsletter #220: Altair: Multi-Chart Filtering in Pure Python

📅
Today’s Picks

LangChain: Smart Text Chunking Without Breaking Context

Problem:

RAG (Retrieval-Augmented Generation) applications require splitting documents into smaller chunks for processing. However, basic text splitting breaks semantic meaning, making your embeddings less effective for retrieval.

Solution:

LangChain’s RecursiveCharacterTextSplitter ensures your document chunks maintain meaning and context for better RAG performance. It intelligently splits text by trying these separators in order:
Double newlines (paragraphs)
Single newlines
Periods
Spaces
Individual characters (as last resort)

RecursiveCharacterTextSplitter also lets you configure the chunk size and overlap for your specific use case.
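
The separator fallback listed above can be sketched in plain Python. This is a simplified illustration of the idea, not LangChain's implementation: the recursive_split function and its parameters are hypothetical, chunk overlap is omitted, and separators are dropped at chunk boundaries.

```python
# Simplified sketch of recursive separator fallback (not LangChain's code).
# Separator order follows the list above; overlap is intentionally left out.
DEFAULT_SEPARATORS = ("\n\n", "\n", ". ", " ", "")


def recursive_split(text, chunk_size, separators=DEFAULT_SEPARATORS):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = (
    "First paragraph is short.\n\n"
    "Second paragraph is a bit longer than the first one.\n\n"
    "Third."
)
chunks = recursive_split(sample, chunk_size=40)
for chunk in chunks:
    print(repr(chunk))
```

Paragraph breaks are tried first, so short paragraphs stay whole, and only the oversized middle paragraph is split at spaces.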

Full Article:

Build a Complete RAG System with 5 Open-Source Tools

Run Code

View GitHub

Altair: Multi-Chart Filtering in Pure Python

Problem:

Static individual charts fail to show relationships between different data views and perspectives. Traditional dashboards require complex backend infrastructure for interactive filtering.

Solution:

Altair’s linked plots enable interactive selections that dynamically filter multiple connected visualizations.

Other features of Altair:
Declarative syntax that makes visualization intuitive
Built-in data transformations and aggregations
Seamless chart composition and layering

Full Article:

Top 6 Python Libraries for Visualization: Which One to Use

Run Code

View GitHub

☕️
Weekly Finds

Boruta-Shap

ML

A tree-based feature selection tool that combines the Boruta feature selection algorithm with Shapley values for interpretable feature importance

py-roughviz

Data Viz

A python visualization library for creating sketchy/hand-drawn styled charts that look fun and catchy compared to standard matplotlib graphs

prek

Python Utils

Better pre-commit re-engineered in Rust – automatically installs required Python versions and creates virtual environments with no hassle

Newsletter #219: GLiNER: Zero-Shot Entity Recognition Without Retraining

📅
Today’s Picks

Create Safe Temporary Files with Python tempfile

Problem:

Unit tests that create files for testing data processing functions often leave behind test artifacts or fail due to file conflicts. Running test suites in parallel or repeatedly creates naming conflicts and cluttered test environments.

Solution:

Python’s tempfile module ensures test isolation by creating unique temporary files that are cleaned up automatically after each test.

Key benefits:
Automatic cleanup after test completion
Secure file creation with proper permissions
No naming conflicts between parallel tests
Production-safe workflows for processing large datasets

Use tempfile.NamedTemporaryFile() with context managers to process data in chunks without leaving artifacts behind.
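
A minimal sketch of that pattern follows. The count_rows function and the CSV contents are hypothetical stand-ins for a real data processing function and its test data. Reopening the file by name while it is still open works on Unix but not on Windows, where tempfile.TemporaryDirectory may be a better fit.

```python
import csv
import os
import tempfile


def count_rows(path: str) -> int:
    """Hypothetical function under test: count data rows in a CSV file."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1  # exclude the header row


def check_count_rows() -> str:
    # delete=True (the default) removes the file when the block exits.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="") as tmp:
        csv.writer(tmp).writerows([["id", "value"], [1, "a"], [2, "b"]])
        tmp.flush()  # make sure the rows are on disk before reading
        assert count_rows(tmp.name) == 2
        return tmp.name


leftover_path = check_count_rows()
print(os.path.exists(leftover_path))  # the temporary file is already gone
```

Each call gets a unique file name, so parallel or repeated runs do not collide.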

Run Code

GLiNER: Zero-Shot Entity Recognition Without Retraining

Problem:

While spaCy provides excellent NER capabilities, its models need retraining for new entity types, which requires collecting training data, labeling examples, and running expensive model fine-tuning. This means weeks of model preparation before you can extract custom entities from your text data.

Solution:

GLiNER enables zero-shot entity recognition by accepting entity types as runtime parameters. With GLiNER, you can simply specify your desired entity types and get instant extraction results without any training.

Full Article:

langextract vs spaCy: AI-Powered vs Rule-Based Entity Extraction

Run Code

View GitHub

☕️
Weekly Finds

browser-use

LLM

Make websites accessible for AI agents. Automate tasks online with ease.

tiktoken

LLM

tiktoken is a fast BPE tokeniser for use with OpenAI’s models.

FuzzTypes

Python Utils

Pydantic extension for annotating autocorrecting fields.

Newsletter #218: Delta Lake: Time Travel Your Data Pipeline

📅
Today’s Picks

Delta Lake: Time Travel Your Data Pipeline

Problem:

Once data is overwritten in pandas, previous versions are lost forever. You can’t debug pipeline issues or roll back bad changes when your data history disappears.

Solution:

Delta Lake maintains a version history, allowing you to query any previous state of your data by timestamp or version number.

Use cases:
Compare today’s sales data with yesterday’s to spot revenue anomalies
Recover accidentally deleted customer records from last week’s backup
Audit financial reports using data exactly as it existed at quarter-end

Full Article:

Delta Lake: Transform pandas Prototypes into Production

Run Code

View GitHub

☕️
Weekly Finds

DALEX

ML

Model Agnostic Language for Exploration and eXplanation – helps explore and explain behavior of complex machine learning models

OpenBB

Data Processing

Investment Research for Everyone, Anywhere – free and open-source financial platform with analytics tools

fastlite

Python Utils

A bit of extra usability for sqlite – quality-of-life improvements for interactive use of sqlite-utils library


Related Post

Delta Lake: Never Lose Data to Failed Writes Again

Problem:

Have you ever had a pandas operation fail midway through writing data, leaving you with corrupted datasets? Partial writes create inconsistent data states that can break downstream analysis and reporting workflows.

Solution:

Delta Lake provides ACID transactions that guarantee all-or-nothing writes with automatic rollback on failures.

ACID properties:
Atomicity: Complete transaction success or automatic rollback
Consistency: Data consistency guaranteed
Isolation: Safe concurrent operations
Durability: Version history with time travel

Full Article:

Delta Lake: Transform pandas Prototypes into Production

View GitHub

Newsletter #217: Whenever: Python DateTime Done Right

📅
Today’s Picks

Build Dynamic Log Filters with Loguru Callables

Problem:

Logging is informative, but unnecessary logs can distract from the important ones. While you can filter by log level, sometimes you need to filter by some specific metric values.

Solution:

Loguru allows you to add a custom callable filter to your logger based on your specific criteria. This is significantly easier than setting up a custom filter class with standard logging.

Other features of Loguru:
Beautiful logging output out of the box
Significantly simpler to use than standard logging
Rich exception tracebacks with variable values

Learn More:

Production-Ready Data Science: From Prototyping to Production with Python

Run Code

View GitHub

Whenever: Python DateTime Done Right

Problem:

Standard library datetime arithmetic ignores Daylight Saving Time (DST) transitions, producing incorrect results. Your time calculations can be off by an hour during DST changes.

Solution:

Whenever’s ZonedDateTime automatically accounts for Daylight Saving Time during time calculations.

Why use Whenever:
Type-safe datetime operations prevent mixing errors
DST transitions handled automatically (no surprises)
Faster performance than standard library, Arrow and Pendulum
Drop-in replacement for standard library
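
The pitfall described in the problem statement can be reproduced with the standard library alone. The sketch below does not use Whenever; it only shows the behaviour Whenever is meant to avoid, using the 2024 US spring-forward date in the America/New_York zone as an assumed example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")
# 10 PM on the evening before US clocks spring forward (2024-03-10, 2 AM).
start = datetime(2024, 3, 9, 22, 0, tzinfo=ny)

# Standard-library arithmetic moves the wall clock forward by 8 hours...
naive_result = start + timedelta(hours=8)
# ...but only 7 real hours separate the two instants.
elapsed = naive_result.astimezone(timezone.utc) - start.astimezone(timezone.utc)

# Doing the arithmetic in UTC and converting back gives the true answer.
correct_result = (start.astimezone(timezone.utc) + timedelta(hours=8)).astimezone(ny)

print(naive_result.isoformat())
print(elapsed)
print(correct_result.isoformat())
```

The workaround of converting to UTC and back is easy to forget, which is the gap a DST-aware type is meant to close.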

Run Code

View GitHub

☕️
Weekly Finds

ExtractThinker

LLM

Document Intelligence library for LLMs offering ORM-style interaction for flexible and powerful document workflows

pytest-mock

Python Utils

Thin wrapper around the mock package for easier use with pytest

ecco

LLM

Explain, analyze, and visualize NLP language models with interactive visualizations for Transformer models

Code example: Milvus: Unified Search Across Text, Images, and Audio (Sponsored)

Newsletter #216: Milvus: Unified Search Across Text, Images, and Audio

📅
Today’s Picks

Create Compelling Animated Visualizations with Matplotlib Animation

Problem:

Static charts can’t reveal how data patterns and relationships change over time.

Solution:

With Matplotlib’s animation module, you can transform static plots into dynamic, interactive data stories.

Some use cases of Matplotlib animation:
Time series data visualization showing trends over periods
Machine learning model convergence and training progress
Scientific simulations and mathematical function behavior
Business metrics dashboards with real-time updates

Full Article:

Top 6 Python Libraries for Visualization: Which One to Use

Run Code

View GitHub

Milvus: Unified Search Across Text, Images, and Audio

Problem:

It is a pain to search across text documents, images, and audio files in different search systems. Traditional search engines excel at text but struggle with visual content, while media-specific tools can’t understand textual context.

Solution:

Milvus supports multi-modal search by storing embeddings from different data types in a single collection. This allows you to query text, images, and audio simultaneously.

Here’s how Milvus works:
Generate embeddings for text, images, and audio using specialized models
Store all embeddings in a unified Milvus collection with metadata
Execute similarity searches across all content types simultaneously
Return ranked results regardless of original data format

Run Code

View GitHub

☕️
Weekly Finds

phoenix

MLOps

Open-source AI observability platform for experimentation, evaluation, and troubleshooting of LLM applications

mesop

Python Utils

Python-based UI framework for rapidly building web apps and ML/AI demos

crawlee-python

Python Utils

Web scraping and browser automation library

Newsletter #215: All or Nothing: DuckDB Transaction Guarantee

📅
Today’s Picks

All or Nothing: DuckDB Transaction Guarantee

Problem:

Data operations can fail partway through, leaving databases in inconsistent states. Money transfers, inventory updates, and other critical operations need guaranteed atomicity.

Solution:

DuckDB uses ACID transactions to maintain data integrity. Operations either complete fully or roll back completely using BEGIN, COMMIT, and ROLLBACK commands.

Why ACID transactions matter:
Atomicity: prevents half-completed operations
Consistency: maintains database integrity rules
Isolation: stops concurrent operations from conflicting
Durability: ensures committed data survives system failures
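
The all-or-nothing behaviour can be sketched with the same BEGIN, COMMIT, and ROLLBACK statements. To keep the example dependency-free, Python's built-in sqlite3 module stands in for DuckDB here; the accounts table, the names, and the transfer helper are hypothetical.

```python
import sqlite3

# sqlite3 stands in for DuckDB; table, names, and amounts are made up.
# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
con.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")


def transfer(con, sender, receiver, amount):
    """Move money between accounts; undo everything if any step fails."""
    con.execute("BEGIN")
    try:
        con.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, receiver),
        )
        # This step violates the CHECK constraint when funds are short.
        con.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, sender),
        )
        con.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        con.execute("ROLLBACK")  # also discards the credit applied above
        return False


failed = transfer(con, "alice", "bob", 500)  # not enough funds
ok = transfer(con, "alice", "bob", 30)
balances = dict(con.execute("SELECT name, balance FROM accounts"))
print(failed, ok, balances)
```

The failed transfer credits the receiver before the debit is rejected, so without the rollback the table would be left half-updated.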

Full Article:

A Deep Dive into DuckDB for Data Scientists

Run Code

View GitHub

☕️
Weekly Finds

gpt-migrate

AI Tools

Easily migrate your codebase from one framework or language to another using AI

lmql

LLM

A query language for programming large language models with structured outputs

respx

Python Utils

Mock HTTPX with awesome request patterns and response side effects for testing


Related Post

Secure Database Queries with DuckDB Parameters

Problem:

F-strings create SQL injection vulnerabilities by inserting values directly into queries.

Solution:

DuckDB’s parameterized queries use placeholders to safely pass parameters and prevent SQL injection attacks.

Other key features of DuckDB:
In-Process Analytics – No external database needed
Fast Performance – Columnar storage for speed
Zero Setup – Works instantly in Python
DataFrame Integration – Native pandas support
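
The difference between f-string queries and placeholders can be sketched as follows. Python's built-in sqlite3 module is used as a stand-in so the example runs without extra packages; the users table and the input string are hypothetical, and the placeholder syntax of a given DuckDB version should be checked against its documentation.

```python
import sqlite3

# sqlite3 stands in for DuckDB; the table and its rows are made up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, role TEXT)")
con.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "viewer")],
)

user_input = "nobody' OR '1'='1"  # a typical injection attempt

# Unsafe: the f-string splices the input into the SQL text itself,
# so the OR clause becomes part of the query and matches every row.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = con.execute(unsafe_sql).fetchall()

# Safe: the placeholder passes the input as a value, never as SQL.
safe_rows = con.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

With the placeholder, the whole input is compared as a single literal name, so no row matches.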

Full Article:

A Deep Dive into DuckDB for Data Scientists

Run Code

View GitHub

Newsletter #214: Create Compelling Animated Visualizations with Matplotlib Animation

📅
Today’s Picks

Create Compelling Animated Visualizations with Matplotlib Animation

Problem:

Static charts can’t reveal how data patterns and relationships change over time.

Solution:

With Matplotlib’s animation module, you can transform static plots into dynamic, interactive data stories.

Some use cases of Matplotlib animation:
Time series data visualization showing trends over periods
Machine learning model convergence and training progress
Scientific simulations and mathematical function behavior
Business metrics dashboards with real-time updates

Full Article:

Create Compelling Animated Visualizations with Matplotlib Animation

Run Code

View GitHub

☕️
Weekly Finds

implicit

Machine Learning

Fast Python implementations of several different popular recommendation algorithms for implicit feedback datasets

developer

AI Development

AI-powered code generation tool designed to automate software development processes and build entire codebases with prompts

datasets-server

Data Infrastructure

Backend API for visualizing and exploring all types of datasets – computer vision, speech, text, and tabular – stored on Hugging Face Hub

Newsletter #213: Query GitHub Issues with Natural Language Using LangChain

📅
Today’s Picks

Query GitHub Issues with Natural Language Using LangChain

Problem:

Have you ever spent hours clicking through GitHub pages to understand project status, track bugs, or review recent changes? Manual repository analysis wastes development time that could be spent building features.

Solution:

LangChain’s GitHubIssuesLoader converts repository issues and PRs into searchable content that responds to natural language questions about bugs, features, and project status. This method integrates seamlessly with LangChain workflows.

Full Article:

Run Private AI Workflows with LangChain and Ollama

Run Code

View GitHub

Mock External APIs for Fast, Reliable Tests

Problem:

Testing with real APIs and databases is slow, expensive, and unreliable. External dependencies create flaky tests that can fail due to network issues, rate limits, or service downtime rather than code problems.

Solution:

The patch decorator replaces external calls with controllable mock objects for isolated testing.

Key benefits:
Reproducible results across different machines
Fast, reliable tests that focus on your logic
Test edge cases and error conditions that are hard to trigger naturally

Test your data processing logic without waiting for external services or consuming API quotas.
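
A minimal sketch of the patch decorator from unittest.mock is shown below. The fetch_price function, the URL, and the JSON payload are hypothetical; the point is only that the network call is replaced, so the check runs offline.

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_price(symbol: str) -> float:
    """Hypothetical function under test; the URL is a placeholder."""
    url = f"https://api.example.com/price/{symbol}"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())["price"]


@patch("urllib.request.urlopen")
def check_fetch_price(mock_urlopen):
    # Build a fake response that works as a context manager
    # and returns canned JSON bytes from read().
    response = MagicMock()
    response.read.return_value = b'{"price": 101.5}'
    mock_urlopen.return_value.__enter__.return_value = response

    result = fetch_price("ABC")

    # The real network was never touched; only the mock was called.
    mock_urlopen.assert_called_once_with("https://api.example.com/price/ABC")
    return result


price = check_fetch_price()
print(price)
```

Setting side_effect on the mock to an exception, instead of a return value, is one way to exercise the error paths that are hard to trigger against a live service.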

Full Article:

Pytest for Data Scientists

Run Code

☕️
Weekly Finds

filprofiler

Performance Profiling

A Python memory profiler for data processing applications with native Jupyter support

organize

Automation

The file management automation tool for sorting, renaming, and organizing files

plotnine

Data Visualization

A Grammar of Graphics for Python based on ggplot2 for data visualization

    Work with Khuyen Tran
