Grab your coffee. Here are this week’s highlights.
Today’s Picks
Polars scan_csv: Merge CSVs with Different Schemas in One Call
Problem
Polars’ scan_csv lets you load multiple CSV files lazily, reading data only when needed.
But before v1.39.0, every file had to share the same columns, or you’d get a SchemaError.
Solution
Polars v1.39.0 introduces missing_columns="insert" in scan_csv, allowing you to combine multiple files in one call while null-filling any missing columns.
Build Professional Python Packages with uv --package

Problem
Python packages turn your code into reusable modules you can share across projects.
But building them requires complex setup with setuptools, managing build systems, and understanding distribution mechanics.
Solution
uv, a fast Python package installer and resolver, reduces the entire process to two simple steps:
- uv init --package sets up your package structure instantly
- uv build and uv publish create the distributions and upload them to PyPI
Latest Deep Dives
uv vs pixi: Which Python Environment Manager Should You Use for Data Science?
What if one tool could manage both your Python packages and compiled system libraries?
uv installs Python packages from PyPI, but it doesn’t manage the compiled C/C++ system libraries those packages may depend on.
The typical workaround is to install system libraries separately using an OS package manager, then manually align versions with your Python dependencies.
Since these system dependencies aren’t captured in project files, reproducing the environment across machines can be unreliable.
pixi solves this by managing both Python packages from PyPI and compiled system libraries from conda-forge in a single tool.
Quick comparison:
- uv: fast, reliable lockfiles, Python-only
- conda: system libraries supported, but slower and no built-in lockfiles
- pixi: fast, unified, with system libraries, lockfiles, and a built-in task runner
In this article, I compare uv and pixi on a real ML project so you can see how they perform in practice.
View Full Article
Weekly Finds
datachain [Data Processing] – Process and curate unstructured data from cloud storage using local ML models and Python
label-studio [Data Processing] – Open source data labeling and annotation tool with standardized output format for ML workflows
qsv [Command Line] – Blazingly fast CSV command-line toolkit for slicing, dicing, and analyzing tabular data
Looking for a specific tool? Explore 70+ Python tools




