Newsletter #297: Polars scan_csv: Merge CSVs with Different Schemas in One Call

Grab your coffee. Here are this week’s highlights.


📅 Today’s Picks

Polars scan_csv: Merge CSVs with Different Schemas in One Call

Problem

Polars’ scan_csv lets you load multiple CSV files lazily, reading data only when needed.

But before v1.39.0, every file had to share the same columns, or you’d get a SchemaError.

Solution

Polars v1.39.0 introduces missing_columns="insert" in scan_csv, allowing you to combine multiple files in one call while null-filling any missing columns.


Build Professional Python Packages with UV --package

Problem

Python packages turn your code into reusable modules you can share across projects.

But building them requires complex setup with setuptools, managing build systems, and understanding distribution mechanics.

Solution

UV, a fast Python package installer and resolver, reduces the entire process to two simple steps:

  • uv init --package sets up your package structure instantly
  • uv build and uv publish create the distribution files and upload them to PyPI
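The order of the commands and the directory each one runs in can be sketched as a small Python driver. The package name my_pkg and the helper functions are hypothetical. The script defaults to a dry run that only prints the commands, since uv publish needs PyPI credentials.

```python
import shutil
import subprocess
from pathlib import Path


def packaging_steps(name: str, parent: Path = Path(".")):
    """Return (command, working directory) pairs in the order to run them."""
    project_dir = parent / name
    return [
        (["uv", "init", "--package", name], parent),  # scaffold the project
        (["uv", "build"], project_dir),               # build into dist/
        (["uv", "publish"], project_dir),             # upload dist/ to PyPI
    ]


def run(name: str, dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for cmd, cwd in packaging_steps(name):
        print(f"({cwd}) $ {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if shutil.which("uv") is None:
    print("uv not found on PATH; showing commands only")

run("my_pkg")  # dry run: nothing is created or uploaded
```

Calling run("my_pkg", dry_run=False) would execute the steps for real, assuming uv is installed and publishing credentials are configured.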

📚 Latest Deep Dives

uv vs pixi: Which Python Environment Manager Should You Use for Data Science?

What if one tool could manage both your Python packages and compiled system libraries?

uv installs Python packages from PyPI, but it doesn’t manage compiled C/C++ system libraries that live outside PyPI.

The typical workaround is to install system libraries separately using an OS package manager, then manually align versions with your Python dependencies.

Since these system dependencies aren’t captured in project files, reproducing the environment across machines can be unreliable.

pixi solves this by managing both Python packages from PyPI and compiled system libraries from conda-forge in a single tool.

Quick comparison:

  • uv: fast, reliable lockfiles, Python-only
  • conda: system libraries supported, but slower and no built-in lockfiles
  • pixi: fast, unified, with system libraries, lockfiles, and a built-in task runner

In this article, I compare uv and pixi on a real ML project so you can see how they perform in practice.

📖 View Full Article


โ˜•๏ธ Weekly Finds

datachain [Data Processing] – Process and curate unstructured data from cloud storage using local ML models and Python

label-studio [Data Processing] – Open source data labeling and annotation tool with standardized output format for ML workflows

qsv [Command Line] – Blazingly fast CSV command-line toolkit for slicing, dicing, and analyzing tabular data

Looking for a specific tool? Explore 70+ Python tools →
