Newsletter #272: Split Large Parquet Files Automatically with Polars


📅 Today’s Picks

Split Large Parquet Files Automatically with Polars

Problem

When writing a large dataset to Parquet, you either end up with one massive file that is slow to read, or you have to split the data into smaller files by hand.

Solution

With Polars' PartitionMaxSize, the output is automatically split into multiple Parquet files, each capped at a size limit you define.

This enables:

  • Parallel reads across multiple cores
  • Faster, more reliable cloud storage transfers

Coiled: One Decorator Replaces Your Entire Docker Workflow (Sponsored)

Problem

Have you ever had code work locally but fail on cloud VMs because of missing dependencies or version mismatches?

Docker solves this by freezing dependencies, but introduces friction: Dockerfiles, slow builds, registry pushes, and full redeploys for minor package changes.

Solution

Coiled can remove Docker from the workflow entirely. With a single decorator, it automatically syncs your local environment to the cloud.

Key features:

  • Exact dependency replication from local to cloud
  • No need for container builds or registry management
  • Compatible with pandas, Polars, DuckDB, Dask, and more
  • Faster deployments through smart caching

โ˜•๏ธ Weekly Finds

crewAI [LLM] – Framework for orchestrating role-playing autonomous AI agents that work together to accomplish complex tasks

Ray [MLOps] – Unified framework for scaling AI and Python applications from laptop to cluster with distributed runtime and ML libraries

Metabase [Data Viz] – Open-source business intelligence tool that lets everyone visualize, analyze, and share data insights

Looking for a specific tool? Explore 70+ Python tools โ†’

    Work with Khuyen Tran