📅 Today’s Picks |
Delta Lake: Time Travel Your Data Pipeline
Problem:
Once data is overwritten in pandas, previous versions are lost forever.
You can’t debug pipeline issues or rollback bad changes when your data history disappears.
Solution:
Delta Lake maintains version history allowing you to query any previous state of your data by timestamp or version number.
Use cases:
- Compare today’s sales data with yesterday’s to spot revenue anomalies
- Recover accidentally deleted customer records from last week’s backup
- Audit financial reports using data exactly as it existed at quarter-end
Full Article:
☕️ Weekly Finds |
DALEX
ML
Model Agnostic Language for Exploration and eXplanation – helps explore and explain behavior of complex machine learning models
OpenBB
Data Processing
Investment Research for Everyone, Anywhere – free and open-source financial platform with analytics tools
fastlite
Python Utils
A bit of extra usability for sqlite – quality-of-life improvements for interactive use of sqlite-utils library
⭐ Related Post |
Delta Lake: Never Lose Data to Failed Writes Again
Problem:
Have you ever had a pandas operation fail midway through writing data, leaving you with corrupted datasets?
Partial writes create inconsistent data states that can break downstream analysis and reporting workflows.
Solution:
Delta Lake provides ACID transactions that guarantee all-or-nothing writes with automatic rollback on failures.
ACID properties:
- Atomicity: Complete transaction success or automatic rollback
- Consistency: Data consistency guaranteed
- Isolation: Safe concurrent operations
- Durability: Version history with time travel
Full Article:
|