Newsletter Archive

Automated newsletter archive from Klaviyo campaigns

Newsletter #214: Create Compelling Animated Visualizations with Matplotlib Animation

📅 Today’s Picks

Create Compelling Animated Visualizations with Matplotlib Animation

Problem
Static charts can’t reveal how data patterns and relationships change over time.
Solution
With Matplotlib’s animation module, you can turn static plots into animated data stories that show how values change frame by frame.
Some use cases of Matplotlib animation:

Time series data visualization showing trends over periods
Machine learning model convergence and training progress
Scientific simulations and mathematical function behavior
Business metrics dashboards with real-time updates
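
As a minimal sketch of the idea, the snippet below uses Matplotlib’s FuncAnimation to draw a sine curve point by point. The sine data, the update function name, and the frame interval are assumptions chosen for illustration, not taken from the full article.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Hypothetical data: one period of a sine wave sampled at 100 points
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.1, 1.1)
(line,) = ax.plot([], [])


def update(frame):
    """Show the first `frame` points so the curve grows over time."""
    line.set_data(x[:frame], y[:frame])
    return (line,)


# Each frame calls update() with the next value from `frames`
anim = FuncAnimation(
    fig, update, frames=range(1, len(x) + 1), interval=30, blit=True
)

# Optional: write the animation to disk (assumes Pillow is installed)
# anim.save("sine.gif", writer="pillow")
```

In a notebook or desktop session you would drop the Agg line and call plt.show() to watch the animation instead of saving it.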

📖 View Full Article

🧪 Run code

⭐ View GitHub

☕️ Weekly Finds

implicit
[Machine Learning]
– Fast Python implementations of several different popular recommendation algorithms for implicit feedback datasets

developer
[AI Development]
– AI-powered code generation tool designed to automate software development processes and build entire codebases with prompts

datasets-server
[Data Infrastructure]
– Backend API for visualizing and exploring all types of datasets – computer vision, speech, text, and tabular – stored on Hugging Face Hub

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.


Newsletter #213: Query GitHub Issues with Natural Language Using LangChain

📅 Today’s Picks

Query GitHub Issues with Natural Language Using LangChain

Problem
Have you ever spent hours clicking through GitHub pages to understand project status, track bugs, or review recent changes? Manual repository analysis wastes development time that could be spent building features.
Solution
LangChain’s GitHubIssuesLoader converts repository issues and PRs into searchable content that responds to natural language questions about bugs, features, and project status.
The loader returns standard LangChain documents, so the results plug into existing LangChain retrieval and question-answering workflows.

📖 View Full Article

🧪 Run code

⭐ View GitHub

Mock External APIs for Fast, Reliable Tests

Problem
Testing with real APIs and databases is slow, expensive, and unreliable.
External dependencies create flaky tests that can fail due to network issues, rate limits, or service downtime rather than code problems.
Solution
The patch decorator from unittest.mock replaces external calls with controllable mock objects for isolated testing.
Key benefits:

Reproducible results across different machines
Fast, reliable tests that focus on your logic
Test edge cases and error conditions that are hard to trigger naturally

Test your data processing logic without waiting for external services or consuming API quotas.
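
Here is a small sketch of how this can look with the standard library only. The exchange-rate API URL and the two helper functions are hypothetical and exist only to give patch something to replace.

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch


def get_usd_rate(currency):
    """Fetch an exchange rate from a hypothetical API (placeholder URL)."""
    url = f"https://api.example.com/rates/{currency}"
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())["rate"]


def convert_to_usd(amount, currency):
    """Logic under test: multiply the amount by the fetched rate."""
    return round(amount * get_usd_rate(currency), 2)


@patch("urllib.request.urlopen")
def test_convert_to_usd(mock_urlopen):
    # Fake response object returned by the `with urlopen(...)` block
    fake_response = MagicMock()
    fake_response.read.return_value = b'{"rate": 1.25}'
    mock_urlopen.return_value.__enter__.return_value = fake_response

    assert convert_to_usd(100, "GBP") == 125.0
    mock_urlopen.assert_called_once_with("https://api.example.com/rates/GBP")


@patch("urllib.request.urlopen", side_effect=TimeoutError("simulated timeout"))
def test_timeout_propagates(mock_urlopen):
    # Edge case that is hard to trigger against a live service
    try:
        convert_to_usd(100, "GBP")
    except TimeoutError:
        pass
    else:
        raise AssertionError("expected TimeoutError")


# No network access happens in either test
test_convert_to_usd()
test_timeout_propagates()
```

Patching urllib.request.urlopen rather than the helper keeps the sketch independent of the module name it runs under. In a real project you would usually patch the name where your own code looks it up.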

📖 View Full Article

🧪 Run code

☕️ Weekly Finds

filprofiler
[Performance Profiling]
– A Python memory profiler for data processing applications with native Jupyter support

organize
[Automation]
– The file management automation tool for sorting, renaming, and organizing files

plotnine
[Data Visualization]
– A Grammar of Graphics for Python based on ggplot2 for data visualization


Newsletter #212: Delta Lake: Never Lose Data to Failed Writes Again

📅 Today’s Picks

Delta Lake: Never Lose Data to Failed Writes Again

Problem
Have you ever had a pandas operation fail midway through writing data, leaving you with corrupted datasets?
Partial writes create inconsistent data states that can break downstream analysis and reporting workflows.
Solution
Delta Lake provides ACID transactions that guarantee all-or-nothing writes with automatic rollback on failures.
ACID properties:

Atomicity: Each write either commits fully or rolls back automatically
Consistency: Readers always see a valid table state, never a partial write
Isolation: Safe concurrent operations
Durability: Committed changes persist, with version history and time travel

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

TinyDB
[Database]
– Lightweight, document-oriented database written in pure Python with no external dependencies. Designed to be simple and developer-friendly, storing data in JSON format by default.

ollama-python
[LLM]
– Python library that provides the easiest way to integrate Python 3.8+ projects with Ollama, an open-source large language model platform. Offers both synchronous and asynchronous client interfaces for seamless AI model interaction.

PyMC
[ML]
– Python package for Bayesian statistical modeling that focuses on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. Enables researchers and data scientists to build sophisticated Bayesian models with minimal algorithmic complexity.


Newsletter #211: Secure Database Queries with DuckDB Parameters

🤝 COLLABORATION

How to write better DAGs in Airflow
DAGs (Directed Acyclic Graphs) are Airflow’s workflow definition format. They specify how data tasks connect and execute in sequence.
Well-designed DAGs handle edge cases, scale with data volume changes, and remain maintainable as your pipeline complexity grows.
What you’ll learn:

Design DAGs that are easier to read, test, and maintain
Make your pipelines adapt to your data at runtime with dynamic task mapping
Avoid common pitfalls that can cause performance issues
Create data-aware pipelines with XComs and event-driven scheduling
Learn proven DAG writing best practices including Airflow 3’s latest features

This covers practical patterns for building production-ready workflows that handle failures gracefully and scale with your data infrastructure needs.
Speakers:

Kenten Danas – Senior Manager, Developer Relations at Astronomer
Tamara Fingerlin – Developer Advocate at Astronomer

Register here

📅 Today’s Picks

Secure Database Queries with DuckDB Parameters

Problem
F-strings create SQL injection vulnerabilities by inserting values directly into queries.
Solution
DuckDB’s parameterized queries use placeholders to safely pass parameters and prevent SQL injection attacks.
Other key features of DuckDB:

In-Process Analytics – No external database needed
Fast Performance – Columnar storage for speed
Zero Setup – Works instantly in Python
DataFrame Integration – Native pandas support

📖 View Full Article

⭐ View GitHub

Build Semantic Text Matching with Sentence Transformers

Problem
RapidFuzz, which I introduced in my previous post, excels at lightning-fast string matching.
However, it cannot understand semantic relationships. It scores ‘running shoes’ vs ‘athletic footwear’ at only 0.267 despite describing similar product categories.
RapidFuzz compares characters, not meaning, so different words describing identical concepts get low scores.
Solution
Sentence Transformers comprehends conceptual similarity by analyzing word meanings.
Sentence Transformers follows this process:

Creates embedding vectors that represent word concepts
Similar meanings produce similar embedding patterns
Compares these concept embeddings to identify semantically similar text
Recognizes synonyms and related terminology automatically
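
The comparison step can be sketched in plain Python. The three-dimensional vectors below are toy values invented for illustration; they are not real Sentence Transformers output, which has hundreds of dimensions and comes from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values, chosen so related phrases point
# in similar directions)
toy_embeddings = {
    "running shoes": [0.9, 0.8, 0.1],
    "athletic footwear": [0.8, 0.9, 0.2],
    "kitchen blender": [0.1, 0.2, 0.9],
}

related = cosine_similarity(
    toy_embeddings["running shoes"], toy_embeddings["athletic footwear"]
)
unrelated = cosine_similarity(
    toy_embeddings["running shoes"], toy_embeddings["kitchen blender"]
)

print(f"related: {related:.3f}, unrelated: {unrelated:.3f}")
```

Phrases with no characters in common can still score high here, because the comparison happens between vectors rather than between strings.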

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

tenacity
[Testing & Reliability]
– Apache 2.0 licensed general-purpose retrying library for Python to simplify adding retry behavior to just about anything

ParadeDB
[Database & Search]
– Modern Elasticsearch alternative built on Postgres for real-time, update-heavy workloads with full-text search capabilities

responses
[Testing & Mocking]
– Utility library for mocking out the Python Requests library, making it easy to test HTTP API interactions


Newsletter #210: MarkItDown: Convert PDFs to Clean Markdown in 3 Lines

📅 Today’s Picks

Feature Engineering Without Complex Nested Loops

Problem
Nested loops for sequence permutations create exponential complexity that becomes unmanageable as data grows.
Solution
The itertools.permutations() function automatically generates all ordered arrangements of items from your sequences.
Perfect for generating interaction features that preserve temporal or logical ordering in your feature set.
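
A minimal sketch of the approach, assuming a hypothetical list of user events and a made-up naming scheme for the resulting features:

```python
from itertools import permutations

# Hypothetical event names from a user journey
events = ["view", "cart", "purchase"]

# Ordered pairs: ("view", "cart") and ("cart", "view") are distinct
pairs = list(permutations(events, 2))

# Turn each ordered pair into an interaction feature name
feature_names = [f"{first}_then_{second}" for first, second in pairs]

print(len(feature_names))
print(feature_names)
```

The second argument sets the length of each arrangement, so changing 2 to 3 yields ordered triples without adding another loop.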

📖 View Full Article

MarkItDown: Convert PDFs to Clean Markdown in 3 Lines

Problem
Have you ever wanted to convert PDFs to text for analysis and search, but found it hard to do?
While there are many tools to convert PDFs to text, they often lose structure and readability.
Solution
Microsoft MarkItDown preserves document structure while converting PDFs to clean markdown format.
The library handles multiple file types and maintains formatting hierarchy:

Clean markdown output with preserved headers and structure
Support for PDFs, Word docs, PowerPoint, and Excel files
Simple three-line implementation for any document type
Seamless integration with existing RAG pipelines

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

scalene
[Performance & Profiling]
– A high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

bandit
[Security & Code Quality]
– A tool designed to find common security issues in Python code through static code analysis

river
[Machine Learning]
– Online machine learning in Python – enabling incremental learning algorithms for streaming data


Newsletter #209: Transform PDFs to Pandas with Docling’s Complete Pipeline

🤝 COLLABORATION

Learn ML Engineering for Free on ML Zoomcamp
Learn ML engineering for free on ML Zoomcamp and receive a certificate! Join online for practical, hands-on experience with the tech stack and workflows used in production ML. The next cohort of the course starts on September 15, 2025. Here’s what you’ll learn:
Core foundations:

Python ecosystem: Jupyter, NumPy, Pandas, Matplotlib, Seaborn
ML frameworks: Scikit-learn, TensorFlow, Keras

Applied projects:

Supervised learning with CRISP-DM framework
Classification/regression with evaluation metrics
Advanced models: decision trees, ensembles, neural nets, CNNs

Production deployment:

APIs and containers: Flask, Docker, Kubernetes
Cloud solutions: AWS Lambda, TensorFlow Serving/Lite

Register here

📅 Today’s Picks

Transform PDFs to Pandas with Docling’s Complete Pipeline

Problem
Most PDF processing tools force you to stitch together multiple solutions – one for extraction, another for parsing, and yet another for chunking.
Each step introduces potential data loss and format incompatibilities, making document processing complex and error-prone.
Solution
Docling handles the entire workflow from raw PDFs to structured, searchable content in a single solution.
Key features:

Universal format support for PDF, DOCX, PPTX, HTML, and images
AI-powered extraction with TableFormer and Vision models
Direct export to pandas DataFrames, JSON, and Markdown
RAG-ready output maintains context and structure

📖 View Full Article

☕️ Weekly Finds

semantic-kernel
[AI Orchestration]
– Model-agnostic SDK that empowers developers to build, orchestrate, and deploy AI agents and multi-agent systems with enterprise-grade reliability.

transformers
[Machine Learning]
– The model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

whisper
[Speech Recognition]
– Robust Speech Recognition via Large-Scale Weak Supervision. A multitasking model for multilingual speech recognition, translation, and language identification.


Newsletter #208: Stop Loading Full Datasets: Use itertools.islice() for Smart Sampling

📅 Today’s Picks

Stop Loading Full Datasets: Use itertools.islice() for Smart Sampling

Problem
Data prototyping typically requires loading entire datasets into memory before sampling.
A 1-million-row dataset consumes 7.6 MB of memory even when you only need 10 rows for initial feature exploration, creating unnecessary resource overhead.
Solution
Use itertools.islice() to extract slices from iterators without loading full datasets into memory first.
Key benefits:

Memory-efficient data sampling
Faster prototyping workflows
Less computational load on laptops
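
The sketch below shows the pattern with a generator standing in for a large data source. The read_rows function and its row layout are hypothetical; any iterator, such as a file handle or csv.reader, works the same way.

```python
from itertools import islice


def read_rows(n_rows):
    """Stand-in for a large data source: yields one row at a time."""
    for i in range(n_rows):
        yield {"id": i, "value": i * 2}


# islice pulls only the first 10 rows; the rest are never generated
sample = list(islice(read_rows(1_000_000), 10))

print(len(sample))
print(sample[0])
```

Because islice consumes the iterator lazily, the cost of sampling depends on how many rows you take, not on how many the source could produce.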

📖 View Full Article

From pandas Full Reloads to Delta Lake Incremental Updates

Problem
Processing entire datasets when you only need to add a few new records wastes time and memory.
Pandas lacks incremental append capabilities, requiring full dataset reload for data updates.
Solution
Delta Lake’s append mode processes only new data without touching existing records.
Key advantages:

Append new records without full dataset reload
Memory usage scales with new data size, not total dataset size
Automatic data protection prevents corruption during updates
Time travel enables rollback to previous dataset versions

Perfect for production data pipelines that need reliable incremental updates.

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

Semantic Kernel
[AI Framework]
– Model-agnostic SDK that empowers developers to build, orchestrate, and deploy AI agents and multi-agent systems with enterprise-grade reliability

Ray
[Distributed Computing]
– AI compute engine with core distributed runtime and AI Libraries for accelerating ML workloads from laptop to cluster

Apache Airflow
[Workflow Orchestration]
– Platform for developing, scheduling, and monitoring workflows with powerful data pipeline orchestration capabilities


Newsletter #207: Build Automated Chart Analysis with Hugging Face SmolVLM

📅 Today’s Picks

Build Automated Chart Analysis with Hugging Face SmolVLM

Problem
Data teams spend hours manually analyzing charts and extracting insights from complex visualizations.
Manual chart analysis creates bottlenecks in decision-making workflows and reduces time available for strategic insights.
Solution
Hugging Face’s SmolVLM transforms this workflow by instantly generating insights, allowing analysts to focus on validation, strategic context, and decision-making rather than basic pattern recognition.
The complete workflow could look like this:

Automated chart interpretation using vision language models
Expert review and validation of AI findings
Strategic context addition by domain specialists

📖 View Full Article

⭐ View GitHub

Hydra Multi-run: Test All Parameters in One Command

Problem
When you run a Python script with different preprocessing strategies and hyperparameter combinations, waiting for each variation to complete before testing the next is time-consuming.
Solution
Hydra multi-run executes all parameter combinations in a single command, saving you time and effort.
Plus, Hydra offers:

YAML-based configuration management
Override parameters from the command line
Compose configs from multiple files
Environment-specific configuration switching
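
To make the idea of a sweep concrete, here is a minimal sketch in plain Python of how comma-separated values expand into one job per combination. It is not Hydra's implementation, and the parameter names (preprocessing, model.lr) are hypothetical. With Hydra itself, the equivalent would be a single command along the lines of `python train.py --multirun preprocessing=standard,minmax model.lr=0.01,0.1`, where train.py is an assumed script name.

```python
from itertools import product


def expand_sweep(overrides: dict[str, list[str]]) -> list[list[str]]:
    """Expand per-parameter value lists into one override list per job.

    Illustration only: this mimics the Cartesian-product idea behind a
    multi-run sweep, not Hydra's actual launcher.
    """
    keys = list(overrides)
    value_lists = [overrides[key] for key in keys]
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*value_lists)
    ]


# Hypothetical parameters: two preprocessing strategies x two learning rates.
sweep = {
    "preprocessing": ["standard", "minmax"],
    "model.lr": ["0.01", "0.1"],
}

jobs = expand_sweep(sweep)
for job_number, job in enumerate(jobs):
    print(job_number, " ".join(job))
```

Two values for each of two parameters produce four jobs, which is why a sweep grows quickly as you add parameters.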

📖 View Full Article

⭐ View GitHub

☕️ Weekly Finds

Scrapegraph-ai
[Data Extraction]
– Python scraper based on AI

Marker
[Document Processing]
– Convert PDF to markdown quickly with high accuracy

EdgeDB
[Database]
– A graph-relational database with declarative schema, built-in migration system, and a next-generation query language

Looking for a specific tool? Explore 70+ Python tools →

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.


Newsletter #206: Handle Messy Data with RapidFuzz Fuzzy Matching

📅 Today’s Picks

Handle Messy Data with RapidFuzz Fuzzy Matching

Problem
Traditional regex approaches require hours of preprocessing but still break with common data variations like missing spaces, typos, or inconsistent formatting.
Solution
RapidFuzz reduces that cleanup work by scoring how similar two strings are, so near-matches can be linked without writing exact-match rules.
Key benefits:

Automatic handling of typos, spacing, and case variations
Production-ready C++ performance for large datasets
Full spectrum of fuzzy algorithms in one library

📖 View Full Article

⭐ View GitHub



Newsletter #205: Build Debuggable Tests: One Assertion Per Function

🤝 COLLABORATION

Learn ML Engineering for Free on ML Zoomcamp
Learn ML engineering for free on ML Zoomcamp and receive a certificate! Join online for practical, hands-on experience with the tech stack and workflows used in production ML. The next cohort of the course starts on September 15, 2025. Here’s what you’ll learn:
Core foundations:

Python ecosystem: Jupyter, NumPy, Pandas, Matplotlib, Seaborn
ML frameworks: Scikit-learn, TensorFlow, Keras

Applied projects:

Supervised learning with CRISP-DM framework
Classification/regression with evaluation metrics
Advanced models: decision trees, ensembles, neural nets, CNNs

Production deployment:

APIs and containers: Flask, Docker, Kubernetes
Cloud solutions: AWS Lambda, TensorFlow Serving/Lite

Register here

📅 Today’s Picks

Ruff: Stop AI Code Complexity Before It Hits Production

Problem
AI agents often create overengineered code with multiple nested if/else and try/except blocks, increasing technical debt and making functions difficult to test.
Checking each function manually is time-consuming.
Solution
Ruff’s C901 complexity check automatically flags overly complex functions before they enter your codebase.
The check counts decision points (if/else branches, loops) that create multiple execution paths in your code.
Key benefits:

Automatic detection of complex functions during development
Configurable complexity thresholds for your team standards
Integration with pre-commit hooks for automated validation
Clear error messages showing exact complexity scores

No more manual code reviews to catch overengineered functions.
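
To show what counting decision points means, here is a simplified sketch built on Python's ast module. It is an approximation for illustration, not Ruff's implementation: the set of node types and the threshold of 3 are assumptions. In Ruff itself the threshold is set with the max-complexity option under the mccabe settings.

```python
import ast

# Node types treated as decision points in this simplified count
# (an assumption for illustration, not Ruff's exact rule set).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approximate_complexity(source: str) -> dict[str, int]:
    """Return {function_name: 1 + number of decision points} per function.

    Nested functions are also counted toward their enclosing function here.
    """
    tree = ast.parse(source)
    scores = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(
                isinstance(child, BRANCH_NODES) for child in ast.walk(node)
            )
            scores[node.name] = 1 + branches
    return scores


SAMPLE = '''
def simple(x):
    return x + 1

def tangled(items):
    total = 0
    for item in items:
        if item is None:
            continue
        try:
            if item > 10:
                total += item
            else:
                total -= item
        except TypeError:
            total += 0
    return total
'''

scores = approximate_complexity(SAMPLE)
MAX_COMPLEXITY = 3  # hypothetical team threshold
flagged = [name for name, score in scores.items() if score > MAX_COMPLEXITY]
print(scores, flagged)
```

The loop, the two if statements, and the except handler each add one path through tangled, which is what pushes it over the threshold while simple stays at the minimum score of 1.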

📖 View Full Article

Build Debuggable Tests: One Assertion Per Function

Problem
Tests with multiple assertions make debugging harder.
When a test fails, you can’t tell which assertion broke without examining the code.
Solution
Write a separate, specific test function for each scenario of the function under test.
Follow these practices for focused test functions:

One assertion per test function for clear failure points
Use descriptive test names that explain the expected behavior
Maintain consistent naming patterns across your test suite

This approach makes your test suite more maintainable and failures easier to diagnose.
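
A small sketch of the pattern, written as pytest-style test functions. The apply_discount function is hypothetical and exists only to give the tests something to check.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: reduce price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


# One scenario and one assertion per test, with names that state the
# expected behavior, so a failing test name already tells you what broke.
def test_apply_discount_reduces_price_by_percent():
    assert apply_discount(200.0, 25) == 150.0


def test_apply_discount_with_zero_percent_returns_original_price():
    assert apply_discount(80.0, 0) == 80.0


def test_apply_discount_with_full_percent_returns_zero():
    assert apply_discount(80.0, 100) == 0.0
```

If the zero-percent case regresses, only test_apply_discount_with_zero_percent_returns_original_price fails, and the other two keep passing.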

📖 View Full Article




    Work with Khuyen Tran
