Newsletter Archive

Automated newsletter archive from Klaviyo campaigns

Newsletter #224: Delta Lake vs pandas: Stop Silent Data Corruption

📅 Today’s Picks

Delta Lake vs pandas: Stop Silent Data Corruption

Problem
pandas silently coerces types during DataFrame operations: a single string value added to a numeric column converts the whole column to object dtype, which can break downstream systems and corrupt data integrity.
Solution
Delta Lake prevents these issues through strict schema enforcement at write time, validating data types before ingestion to maintain table integrity.
Other features of Delta Lake:

Time travel provides instant access to any historical data version
ACID transactions guarantee data consistency across all operations
Data skipping uses per-file statistics to avoid scanning files that can’t match a query
Incremental processing handles billion-row updates efficiently
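The core idea behind write-time schema enforcement can be sketched in plain Python. The `SchemaEnforcedTable` class below is hypothetical and only illustrates the check (validate the whole batch, then write or reject it); it is not Delta Lake's API, which performs this validation inside its own write path.

```python
from dataclasses import dataclass, field


class SchemaMismatchError(ValueError):
    """Raised when a batch does not match the table's recorded schema."""


@dataclass
class SchemaEnforcedTable:
    # Hypothetical in-memory table: column name -> expected Python type
    schema: dict[str, type]
    rows: list[dict] = field(default_factory=list)

    def append(self, batch: list[dict]) -> None:
        # Validate every row first so a bad batch never partially lands
        for i, row in enumerate(batch):
            if set(row) != set(self.schema):
                raise SchemaMismatchError(
                    f"row {i}: columns {sorted(row)} != {sorted(self.schema)}"
                )
            for col, expected in self.schema.items():
                if not isinstance(row[col], expected):
                    raise SchemaMismatchError(
                        f"row {i}: {col!r} expected {expected.__name__}, "
                        f"got {type(row[col]).__name__}"
                    )
        self.rows.extend(batch)


sales = SchemaEnforcedTable(schema={"order_id": int, "amount": float})
sales.append([{"order_id": 1, "amount": 9.99}])

try:
    # In pandas this stray string would silently turn "amount" into object dtype
    sales.append([{"order_id": 2, "amount": 5.0}, {"order_id": 3, "amount": "N/A"}])
except SchemaMismatchError as err:
    print(err)  # row 1: 'amount' expected float, got str

print(len(sales.rows))  # 1: the whole bad batch was rejected
```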

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

ZeroFS
[Data Engineer]
– ZeroFS – The Filesystem That Makes S3 your Primary Storage. Provides file-level access via NFS and 9P and block-level access via NBD on S3 storage with encryption, caching, and high performance.

vicinity
[ML]
– Lightweight Nearest Neighbors with Flexible Backends. Provides a unified interface for vector similarity search with support for multiple backends like HNSW, FAISS, Annoy, and more.

vec2text
[LLM]
– Utilities for decoding deep representations (like sentence embeddings) back to text. Train models to reconstruct text sequences from embeddings and invert pre-trained embeddings.

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

Subscribe

Newsletter #223: ChromaDB’s Automatic Indexing: Fast Vector Search Made Easy

📅 Today’s Picks

Type-Safe Configuration Management with Hydra

Problem
Configuration errors and type mismatches often go undetected until runtime, wasting time and computing resources.
Solution
Hydra’s structured configurations use dataclasses to validate types when the configuration is loaded, before your application code runs, so configuration errors surface immediately instead of crashing a job midway.
What Hydra adds to dataclasses:

Runtime parameter overrides from command line
Configuration composition and inheritance
Built-in experiment management and logging
Sweep over multiple parameter values in one command (multirun)
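Hydra builds on OmegaConf, whose structured configs use a dataclass as both the schema and the defaults. The stdlib-only sketch below imitates that idea: it applies `key=value` overrides and rejects any value that can't be converted to the field's declared type before training code runs. `TrainConfig` and `apply_overrides` are hypothetical names, not Hydra APIs; in Hydra itself you would register the dataclass with `ConfigStore` and let `@hydra.main` parse the command line.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical training parameters; the annotations double as the schema
    lr: float = 0.001
    epochs: int = 10
    optimizer: str = "adam"


def apply_overrides(cfg: TrainConfig, overrides: list[str]) -> TrainConfig:
    """Apply "key=value" strings, converting each value to the field's declared type."""
    # f.type is the real class here because annotations are not postponed.
    # Note: bool("false") is True, so boolean fields would need special handling.
    types = {f.name: f.type for f in fields(cfg)}
    updates = {}
    for item in overrides:
        key, _, raw = item.partition("=")
        if key not in types:
            raise KeyError(f"unknown config key: {key!r}")
        try:
            updates[key] = types[key](raw)  # e.g. int("ten") raises ValueError
        except ValueError:
            raise TypeError(f"{key}={raw!r} is not a valid {types[key].__name__}") from None
    return replace(cfg, **updates)


cfg = apply_overrides(TrainConfig(), ["lr=0.01", "epochs=20"])
print(cfg)  # TrainConfig(lr=0.01, epochs=20, optimizer='adam')

try:
    apply_overrides(TrainConfig(), ["epochs=ten"])
except TypeError as err:
    print(err)  # epochs='ten' is not a valid int
```

A real Hydra app also handles nested configs, config groups and multirun sweeps, which this sketch leaves out.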

📖 Learn more

🧪 Run code

⭐ View GitHub

ChromaDB’s Automatic Indexing: Fast Vector Search Made Easy

Problem
Why isn’t saving vector embeddings to a file enough?
Basic file storage forces you to scan every single embedding for similarity search, creating massive performance bottlenecks as your dataset grows.
Solution
ChromaDB provides persistent vector storage with automatic indexing and metadata filtering capabilities.
Key benefits:

Find relevant content by meaning, not just keyword matching
Handle large datasets without memory crashes using efficient indexing
Complete toolkit included: similarity scoring, deduplication, search ranking, and more
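To see what the index saves you from, here is the flat-file baseline from the Problem: every query scores every stored embedding, so cost grows linearly with the collection. This is a plain-Python illustration with made-up 3-dimensional vectors, not ChromaDB code; ChromaDB replaces the full scan with an approximate nearest-neighbor index (HNSW) and applies metadata filters for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query, records, top_k=2, where=None):
    """Score every stored embedding (O(n) per query), after an optional metadata filter."""
    candidates = [
        r for r in records
        if where is None or all(r["metadata"].get(k) == v for k, v in where.items())
    ]
    scored = sorted(
        ((cosine_similarity(query, r["embedding"]), r["id"]) for r in candidates),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions
records = [
    {"id": "cats", "embedding": [0.9, 0.1, 0.0], "metadata": {"lang": "en"}},
    {"id": "kittens", "embedding": [0.8, 0.2, 0.1], "metadata": {"lang": "en"}},
    {"id": "stocks", "embedding": [0.0, 0.1, 0.9], "metadata": {"lang": "en"}},
    {"id": "gatos", "embedding": [0.85, 0.15, 0.0], "metadata": {"lang": "es"}},
]

results = brute_force_search([1.0, 0.1, 0.0], records, top_k=2, where={"lang": "en"})
print(results)  # ['cats', 'kittens']
```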

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

wrapt
[Python Utils]
– A Python module for decorators, wrappers and monkey patching

TabPFN
[ML]
– A transformer-based foundation model for tabular data, reported to outperform traditional methods such as gradient-boosted trees on small datasets

superduperdb
[Data Processing]
– A Python framework for integrating AI models, APIs, and vector search engines directly with your existing databases

Newsletter #222: Build Dynamic AI Prompts with LangChain Templates

📅 Today’s Picks

DuckDB: Zero-Config SQL Database for DataFrames

Problem
Setting up database servers for SQL operations requires complex configuration, service management, and credential setup.
This creates barriers between data scientists and their analytical workflows.
Solution
DuckDB provides an embedded SQL database with zero configuration required.
Key benefits:

No server installation or management needed
Direct SQL operations on DataFrames and files
Compatible with pandas, Polars, and Arrow ecosystems
Fast analytical queries with columnar storage
Open-source with active development community

Query your data instantly without database administration overhead.
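DuckDB runs inside your Python process, so "setup" is just an import. The sketch below shows the same in-process, zero-config pattern using the standard library's `sqlite3` module, swapped in only so the example runs with no extra install. sqlite is row-oriented and built for transactional workloads; DuckDB is columnar and can also query pandas or Polars DataFrames directly by variable name (for example `duckdb.sql("SELECT ... FROM df")`).

```python
import sqlite3

# In-memory database: no server, no config file, no credentials
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

totals = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('north', 150.0), ('south', 80.0)]
conn.close()
```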

📖 View Full Article

🧪 Run code

⭐ View GitHub

Build Dynamic AI Prompts with LangChain Templates

Problem
Hard-coded prompts limit flexibility and make it difficult to adapt AI applications to different contexts or user inputs.
Creating separate functions for each prompt variation leads to duplicate code with no reusability.
Solution
LangChain’s PromptTemplate enables dynamic, reusable prompts with variable substitution.
Create one template that adapts to multiple contexts:

Variable substitution with {topic}, {audience}, {examples}
Single template for unlimited prompt variations
Clean, maintainable code structure
Compatible with all major LLM providers

Transform repetitive hard-coded prompts into flexible, reusable templates that scale with your AI application needs.
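LangChain’s `PromptTemplate` uses Python f-string-style `{placeholders}` by default. As a dependency-free sketch of the same idea, the hypothetical `SimpleTemplate` class below collects the variable names with `string.Formatter` and refuses to render until every variable is supplied. In LangChain itself you would write `PromptTemplate.from_template(...)` and call `.format(...)` on the result.

```python
from string import Formatter


class SimpleTemplate:
    """Hypothetical minimal prompt template with {name}-style variables."""

    def __init__(self, template: str):
        self.template = template
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **kwargs: str) -> str:
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise KeyError(f"missing variables: {sorted(missing)}")
        return self.template.format(**kwargs)


explain = SimpleTemplate("Explain {topic} to {audience}. Use these examples: {examples}")
print(explain.input_variables)  # ['audience', 'examples', 'topic']

prompt = explain.format(topic="recursion", audience="beginners", examples="factorial")
print(prompt)  # Explain recursion to beginners. Use these examples: factorial
```

One template object now covers every topic and audience combination; only the keyword arguments change.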

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

GHunt
[Python Utils]
– Modular OSINT tool designed to investigate Google accounts and objects using various techniques

nbQA
[Python Utils]
– Run ruff, isort, pyupgrade, mypy, pylint, flake8, and more on Jupyter Notebooks

pg_vectorize
[LLM]
– Postgres extension that automates the transformation and orchestration of text to embeddings for vector and semantic search

Newsletter #221: handcalcs: Generate LaTeX Step-by-Step Calculations from Python

📅 Today’s Picks

handcalcs: Generate LaTeX Step-by-Step Calculations from Python

Problem
Showing the intermediate steps of a calculation helps stakeholders understand it and verify the results.
However, writing LaTeX for each step by hand is slow and tedious.
Solution
handcalcs eliminates manual LaTeX writing by auto-generating mathematical documentation from your Python calculations.
Perfect for engineering reports, data science documentation, and educational materials.
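handcalcs renders each calculation as the symbolic formula, the formula with values substituted, and the result. The hypothetical `render_calc` helper below mimics that output shape for a single line in plain Python. handcalcs itself does this automatically from your Python source (for example with its `%%render` cell magic in Jupyter), so treat this only as an illustration of what the generated LaTeX looks like.

```python
import re
from typing import Callable


def render_calc(
    name: str, latex_formula: str, func: Callable[..., float], **values: float
) -> str:
    """Return 'name = formula = substituted formula = result' as a LaTeX string."""
    substituted = latex_formula
    for symbol, value in values.items():
        # Match the symbol only as a standalone token, never inside a command like \cdot
        pattern = rf"(?<![\\A-Za-z]){re.escape(symbol)}(?![A-Za-z])"
        substituted = re.sub(pattern, str(value), substituted)
    result = func(**values)
    return f"{name} = {latex_formula} = {substituted} = {result}"


# Hypothetical example: area of a rectangular cross-section
area = render_calc("A", r"b \cdot h", lambda b, h: b * h, b=3, h=4)
print(area)  # A = b \cdot h = 3 \cdot 4 = 12
```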

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

nanoGPT
[LLM]
– The simplest, fastest repository for training/finetuning medium-sized GPTs. A clean, minimal implementation of GPT in PyTorch.

GHunt
[Python Utils]
– Modular OSINT tool designed to evolve over the years; it incorporates many techniques to investigate Google accounts.

beartype
[Python Utils]
– Fast, efficient runtime type checking for Python. Open-source pure-Python runtime type checker emphasizing efficiency and portability.

Newsletter #220: Altair: Multi-Chart Filtering in Pure Python

📅 Today’s Picks

LangChain: Smart Text Chunking Without Breaking Context

Problem
RAG (Retrieval-Augmented Generation) applications require splitting documents into smaller chunks for processing.
However, basic text splitting breaks semantic meaning, making your embeddings less effective for retrieval.
Solution
LangChain’s RecursiveCharacterTextSplitter helps your document chunks preserve meaning and context for better RAG performance.
By default, it splits text by trying these separators in order, moving to the next one only when a piece is still larger than the chunk size:

Double newlines (paragraphs)
Single newlines
Spaces
Individual characters (as a last resort)

RecursiveCharacterTextSplitter also lets you configure the chunk size, the chunk overlap, and the separator list itself (for example, adding ". " to split on sentence boundaries).
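The recursive strategy can be sketched in a few lines of plain Python, using LangChain’s default separator list (paragraphs, newlines, spaces, characters). This simplified `recursive_split` is hypothetical: it omits chunk overlap and LangChain’s exact merging rules, and only shows the core idea of falling back to a finer separator when a piece is still too long. In LangChain you would call `RecursiveCharacterTextSplitter(chunk_size=..., chunk_overlap=...).split_text(text)`.

```python
def recursive_split(
    text: str, chunk_size: int, separators: tuple[str, ...] = ("\n\n", "\n", " ", "")
) -> list[str]:
    """Greedily pack pieces up to chunk_size, recursing with finer separators."""
    if len(text) <= chunk_size:
        return [text]
    sep, finer = separators[0], separators[1:]
    pieces = list(text) if sep == "" else text.split(sep)
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep growing the chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with the next separator
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph here.\n\nSecond paragraph is a bit longer."
chunks = recursive_split(text, chunk_size=25)
print(chunks)  # ['First paragraph here.', 'Second paragraph is a bit', 'longer.']
```

The first paragraph fits, so it stays whole; only the oversized second paragraph falls back to splitting on spaces.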

📖 View Full Article

🧪 Run code

⭐ View GitHub

Altair: Multi-Chart Filtering in Pure Python

Problem
Static individual charts fail to show relationships between different data views and perspectives.
Traditional dashboards require complex backend infrastructure for interactive filtering.
Solution
Altair’s linked plots enable interactive selections that dynamically filter multiple connected visualizations.
Other features of Altair:

Declarative syntax that makes visualization intuitive
Built-in data transformations and aggregations
Seamless chart composition and layering

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

Boruta-Shap
[ML]
– A tree-based feature selection algorithm that combines the Boruta feature selection algorithm with Shapley values for interpretable feature importance

py-roughviz
[Data Viz]
– A Python visualization library for sketchy, hand-drawn-style charts that stand out from standard matplotlib graphs

prek
[Python Utils]
– A pre-commit alternative re-engineered in Rust; it automatically installs the required Python versions and creates virtual environments with no extra setup

Newsletter #219: GLiNER: Zero-Shot Entity Recognition Without Retraining

📅 Today’s Picks

Create Safe Temporary Files with Python tempfile

Problem
Unit tests that create files for testing data processing functions often leave behind test artifacts or fail due to file conflicts.
Running test suites in parallel or repeatedly creates naming conflicts and cluttered test environments.
Solution
Python’s tempfile module supports test isolation by creating uniquely named temporary files that are automatically cleaned up when the file is closed or its context manager exits.
Key benefits:

Automatic cleanup after test completion
Secure file creation with proper permissions
No naming conflicts between parallel tests
Production-safe workflows for processing large datasets

Use tempfile.NamedTemporaryFile() with context managers to process data in chunks without leaving artifacts behind.
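A minimal pytest-style sketch, assuming a hypothetical `count_large_amounts` function under test. The test passes the open file object to the function instead of reopening the file by name, which keeps it portable: on Windows, reopening a still-open `NamedTemporaryFile` by name may fail.

```python
import csv
import os
import tempfile


def count_large_amounts(file_obj, threshold: float) -> int:
    """Hypothetical function under test: count CSV rows whose amount exceeds threshold."""
    return sum(1 for row in csv.DictReader(file_obj) if float(row["amount"]) > threshold)


def test_count_large_amounts():
    # mode="w+" lets the test write the fixture, rewind, and read it back
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".csv", newline="") as tmp:
        writer = csv.writer(tmp)
        writer.writerow(["id", "amount"])
        writer.writerows([[1, 5.0], [2, 50.0], [3, 500.0]])
        tmp.flush()
        tmp.seek(0)
        assert count_large_amounts(tmp, threshold=10) == 2
        path = tmp.name
    # Leaving the with-block deletes the file, so no artifacts remain
    assert not os.path.exists(path)


if __name__ == "__main__":
    test_count_large_amounts()
```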

🧪 Run code

GLiNER: Zero-Shot Entity Recognition Without Retraining

Problem
While spaCy provides excellent NER capabilities, its models need retraining for new entity types, which requires collecting training data, labeling examples, and running expensive model fine-tuning.
This can mean weeks of model preparation before you can extract custom entities from your text data.
Solution
GLiNER enables zero-shot entity recognition by accepting entity types as runtime parameters.
With GLiNER, you can simply specify your desired entity types and get instant extraction results without any training.

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

browser-use
[LLM]
– Make websites accessible for AI agents. Automate tasks online with ease.

tiktoken
[LLM]
– tiktoken is a fast BPE tokeniser for use with OpenAI’s models.

FuzzTypes
[Python Utils]
– Pydantic extension for annotating autocorrecting fields.

Newsletter #218: Delta Lake: Time Travel Your Data Pipeline

📅 Today’s Picks

Delta Lake: Time Travel Your Data Pipeline

Problem
Once data is overwritten in pandas, previous versions are lost forever.
You can’t debug pipeline issues or roll back bad changes when your data history disappears.
Solution
Delta Lake maintains version history, allowing you to query any previous state of your data by timestamp or version number.
Use cases:

Compare today’s sales data with yesterday’s to spot revenue anomalies
Recover accidentally deleted customer records from last week’s version of the table
Audit financial reports using data exactly as it existed at quarter-end
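Delta Lake gets time travel from its transaction log: each commit produces a new table version instead of destroying the old one. The hypothetical `VersionedTable` below is a toy, in-memory illustration of that idea (copy-on-write snapshots indexed by version number), not Delta Lake’s storage format or API. With the `deltalake` package you would read an older state with something like `DeltaTable(path, version=0)`.

```python
import copy
from typing import Optional


class VersionedTable:
    """Toy table that keeps every committed version instead of overwriting it."""

    def __init__(self) -> None:
        self._versions: list[list[dict]] = []

    def write(self, rows: list[dict], mode: str = "append") -> int:
        """Commit a new version and return its version number."""
        current = self._versions[-1] if self._versions else []
        new_rows = rows if mode == "overwrite" else current + rows
        # Deep-copy so later changes to `rows` can't alter a committed version
        self._versions.append(copy.deepcopy(new_rows))
        return len(self._versions) - 1

    def read(self, version: Optional[int] = None) -> list[dict]:
        """Return the latest version, or an earlier one by version number."""
        idx = len(self._versions) - 1 if version is None else version
        return copy.deepcopy(self._versions[idx])


customers = VersionedTable()
v0 = customers.write([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}])
v1 = customers.write([{"id": 1, "name": "Ada"}], mode="overwrite")  # oops: Grace is gone

print(customers.read())            # [{'id': 1, 'name': 'Ada'}]
print(customers.read(version=v0))  # both rows, recovered from version 0
```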

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

DALEX
[ML]
– Model Agnostic Language for Exploration and eXplanation – helps explore and explain behavior of complex machine learning models

OpenBB
[Data Processing]
– Investment Research for Everyone, Anywhere – free and open-source financial platform with analytics tools

fastlite
[Python Utils]
– A bit of extra usability for sqlite – quality-of-life improvements for interactive use of sqlite-utils library

Newsletter #217: Whenever: Python DateTime Done Right

🤝 COLLABORATION

Get Apache Airflow® 3 certified (for free)
On September 16, Beyond Analytics kicks off with a live Airflow 3 Certification Crash Course, where you can ask questions and prepare for the Airflow 3 certification exam.
Join Marc Lamberti, creator of “Data with Marc”, for a live session where you will:

Learn about the Airflow 3 features that will be covered in the exam, such as scheduling, DAG versioning, and backfills
Get your certification questions answered live
Receive a $150 voucher for the official Airflow 3 certification exam

Register here

📅 Today’s Picks

Build Dynamic Log Filters with Loguru Callables

Problem
Logging is informative, but unnecessary logs can distract from the important ones. Filtering by log level isn’t always enough; sometimes you need to filter on specific values, such as a metric attached to a log record.
Solution
Loguru lets you pass any callable as a filter when adding a sink, so you can keep or drop records based on your own criteria. This takes less setup than wiring filters into handlers with the standard logging module.
Other features of Loguru:

Beautiful logging output out of the box
Significantly simpler to use than standard logging
Rich exception tracebacks with variable values
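For comparison, the standard library also accepts a plain function as a filter (any callable that takes a `LogRecord`). The sketch below keeps only records whose hypothetical `accuracy` field exceeds 0.9. Loguru’s equivalent passes a similar function via `logger.add(sink, filter=...)`; that function receives Loguru’s record dict, where values attached with `logger.bind(...)` appear under `record["extra"]`.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted messages in memory (handy for demos and tests)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def high_accuracy_only(record: logging.LogRecord) -> bool:
    # Keep records whose "accuracy" extra field exceeds 0.9; drop the rest
    return getattr(record, "accuracy", 0.0) > 0.9


logger = logging.getLogger("training_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # don't also print through the root logger

handler = ListHandler()
handler.addFilter(high_accuracy_only)  # a plain function works as a filter (Python 3.2+)
logger.addHandler(handler)

for epoch, acc in enumerate([0.72, 0.88, 0.93, 0.95], start=1):
    logger.info("epoch %d accuracy %.2f", epoch, acc, extra={"accuracy": acc})

print(handler.messages)  # ['epoch 3 accuracy 0.93', 'epoch 4 accuracy 0.95']
```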

📖 View Full Article

🧪 Run code

⭐ View GitHub

Whenever: Python DateTime Done Right

Problem
Standard library datetime arithmetic ignores Daylight Saving Time (DST) transitions, producing incorrect results.
Your time calculations can be off by an hour during DST changes.
Solution
Whenever’s ZonedDateTime automatically accounts for Daylight Saving Time during time calculations.
Why use Whenever:

Type-safe datetime operations prevent mixing errors
DST transitions handled automatically (no surprises)
Faster performance than standard library, Arrow and Pendulum
Interoperates with the standard library’s datetime objects when you need them
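You can reproduce the pitfall from the Problem with the standard library alone. The `ToyEastern2024` class below is a hypothetical stand-in for a real zone (normally you would use `zoneinfo.ZoneInfo("America/New_York")`). It is hard-coded to the 2024 US Eastern DST dates so the sketch runs without a time zone database. Adding `timedelta(hours=24)` to an aware datetime moves the wall clock forward 24 hours within the same zone, so across the March transition only 23 real hours elapse. Whenever’s ZonedDateTime is designed to prevent this kind of error.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyEastern2024(tzinfo):
    """Toy US Eastern zone hard-coded to 2024 DST dates (illustration only)."""

    DST_START = datetime(2024, 3, 10, 2)  # clocks jump 02:00 -> 03:00
    DST_END = datetime(2024, 11, 3, 2)    # clocks fall back 02:00 -> 01:00

    def dst(self, dt):
        in_dst = self.DST_START <= dt.replace(tzinfo=None) < self.DST_END
        return timedelta(hours=1) if in_dst else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


tz = ToyEastern2024()
start = datetime(2024, 3, 9, 12, 0, tzinfo=tz)  # noon, the day before DST begins

# Same-zone arithmetic is wall-clock arithmetic: the result reads 12:00 the next day...
wall_clock = start + timedelta(hours=24)
# ...but only 23 real hours have passed (compare in UTC to see the true difference)
elapsed = wall_clock.astimezone(timezone.utc) - start.astimezone(timezone.utc)
print(wall_clock, elapsed)

# Doing the arithmetic in UTC gives the instant exactly 24 hours later (13:00 local)
exact = (start.astimezone(timezone.utc) + timedelta(hours=24)).astimezone(tz)
print(exact)
```

The UTC round trip in the last lines is the manual workaround that a DST-aware library handles for you.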

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

ExtractThinker
[LLM]
– Document Intelligence library for LLMs offering ORM-style interaction for flexible and powerful document workflows

pytest-mock
[Python Utils]
– Thin-wrapper around the mock package for easier use with pytest

ecco
[LLM]
– Explain, analyze, and visualize NLP language models with interactive visualizations for Transformer models

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

Newsletter #216: Milvus: Unified Search Across Text, Images, and Audio

🤝 COLLABORATION

Get Apache Airflow® 3 certified (for free)
On September 16, Beyond Analytics kicks off with a live Airflow 3 Certification Crash Course, where you can ask questions and prepare for the Airflow 3 certification exam.
Join Marc Lamberti, creator of “Data with Marc”, for a live session where you will:

Learn about the Airflow 3 features that will be covered in the exam, such as scheduling, DAG versioning, and backfills
Get your certification questions answered live
Receive a $150 voucher for the official Airflow 3 certification exam

Register here

📅 Today’s Picks

Create Compelling Animated Visualizations with Matplotlib Animation

Problem
Static charts can’t reveal how data patterns and relationships change over time.
Solution
With Matplotlib’s animation module, you can turn static plots into animated data stories that show how values evolve frame by frame.
Some use cases of Matplotlib animation:

Time series data visualization showing trends over periods
Machine learning model convergence and training progress
Scientific simulations and mathematical function behavior
Business metrics dashboards with real-time updates
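As a rough illustration, here is a minimal sketch built on `matplotlib.animation.FuncAnimation`. The sine-wave data and the `update` function are hypothetical; the idea is that each frame redraws a line with one more point, so the curve grows over time. Exporting to GIF (commented out) assumes Pillow is installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Hypothetical data: a sine wave revealed one point per frame
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.1, 1.1)
(line,) = ax.plot([], [], lw=2)


def update(frame):
    """Show the first `frame + 1` points so the curve grows over time."""
    line.set_data(x[: frame + 1], y[: frame + 1])
    return (line,)


# One frame per data point, 50 ms apart; blit=True redraws only the changed line
anim = FuncAnimation(fig, update, frames=len(x), interval=50, blit=True)

# To export (assumes Pillow is available):
# anim.save("sine.gif", writer="pillow", fps=20)
```

Keeping a reference to `anim` matters: if the animation object is garbage-collected, Matplotlib stops it.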

📖 View Full Article

🧪 Run code

⭐ View GitHub

Milvus: Unified Search Across Text, Images, and Audio

Problem
Searching across text documents, images, and audio files is painful when each type lives in a different search system. Traditional search engines excel at text but struggle with visual content, while media-specific tools can’t understand textual context.
Solution
Milvus supports multimodal search by storing embeddings from different data types in a single collection, so one query can return matching text, images, and audio together.
Here’s how a multimodal search workflow with Milvus looks:

Generate embeddings for text, images, and audio using specialized models
Store all embeddings in a single Milvus collection, along with metadata
Run similarity searches across all content types at once
Return ranked results regardless of the original data format
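In Milvus itself, the server stores the collection and runs the search (for example through `pymilvus.MilvusClient`). The plain-Python stand-in below is not the Milvus API; it only sketches the steps above. It assumes every embedding comes from models that share one vector space (for example a CLIP-style model); the file names and 3-dimensional vectors are made up.

```python
import math

# Toy stand-in for a Milvus collection: one list holds vectors from every modality.
# ASSUMPTION: all vectors share one embedding space; values are made up for illustration.
collection = [
    {"id": 1, "modality": "text", "source": "cat_article.txt", "vector": [0.9, 0.1, 0.0]},
    {"id": 2, "modality": "image", "source": "cat_photo.jpg", "vector": [0.8, 0.2, 0.1]},
    {"id": 3, "modality": "audio", "source": "meow.wav", "vector": [0.7, 0.1, 0.3]},
    {"id": 4, "modality": "image", "source": "car_photo.jpg", "vector": [0.0, 0.9, 0.4]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, limit=3):
    """Rank every entry by similarity to the query, ignoring its modality."""
    scored = [(cosine(query_vector, item["vector"]), item) for item in collection]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(item["source"], item["modality"], round(score, 3)) for score, item in scored[:limit]]


# A query embedding close to the "cat" region returns text, image, and audio hits together
results = search([1.0, 0.0, 0.0])
for source, modality, score in results:
    print(f"{score:.3f}  {modality:<5}  {source}")
```

Because all modalities live in one collection, a single ranked list comes back; the `modality` field plays the role of the metadata Milvus would store alongside each vector.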

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

phoenix
[MLOps]
– Open-source AI observability platform for experimentation, evaluation, and troubleshooting of LLM applications

mesop
[Python Utils]
– Python-based UI framework for rapidly building web apps and ML/AI demos

crawlee-python
[Python Utils]
– Web scraping and browser automation library

Newsletter #215: All or Nothing: DuckDB Transaction Guarantee

🤝 COLLABORATION

Beyond Analytics: Get Apache Airflow® 3 certified (for free)
On September 16, Beyond Analytics kicks off with a live Airflow 3 Certification Crash Course, where you can ask questions and prepare for the Airflow 3 certification exam.
Join Marc Lamberti, creator of “Data with Marc”, for a live session where you will:

Learn about the Airflow 3 features that will be covered in the exam, such as scheduling, DAG versioning, and backfills
Get your certification questions answered live
Receive a $150 voucher for the official Airflow 3 certification exam

Register here

📅 Today’s Picks

All or Nothing: DuckDB Transaction Guarantee

Problem
Data operations can fail partway through, leaving databases in inconsistent states.
Money transfers, inventory updates, and other critical operations need guaranteed atomicity.
Solution
DuckDB uses ACID transactions to maintain data integrity. Statements wrapped in BEGIN and COMMIT either all take effect or, after a ROLLBACK, none of them do.
Why ACID transactions matter:

Atomicity: prevents half-completed operations
Consistency: maintains database integrity rules
Isolation: keeps concurrent transactions from seeing each other’s uncommitted changes
Durability: ensures committed data survives system failures

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

gpt-migrate
[AI Tools]
– Easily migrate your codebase from one framework or language to another using AI

lmql
[LLM]
– A query language for programming large language models with structured outputs

respx
[Python Utils]
– Mock HTTPX with flexible request patterns and response side effects for testing

Work with Khuyen Tran