| 🤝 COLLABORATION |
How to write better DAGs in Airflow
DAGs (Directed Acyclic Graphs) are Airflow’s workflow definition format. They specify how data tasks depend on one another and the order in which they run.
Well-designed DAGs handle edge cases, scale with data volume changes, and remain maintainable as your pipeline complexity grows.
What you’ll learn:
- Design DAGs that are easier to read, test, and maintain
- Make your pipelines adapt to your data at runtime with dynamic task mapping
- Avoid common pitfalls that can cause performance issues
- Create data-aware pipelines with XComs and event-driven scheduling
- Apply proven DAG-writing best practices, including Airflow 3’s latest features
This session covers practical patterns for building production-ready workflows that handle failures gracefully and scale with your data infrastructure needs.
Speakers:
- Kenten Danas – Senior Manager, Developer Relations at Astronomer
- Tamara Fingerlin – Developer Advocate at Astronomer
|
| 📅 Today’s Picks |
Feature Engineering Without Complex Nested Loops
Problem:
Writing nested loops to generate sequence permutations requires one loop per position, and the code quickly becomes unmanageable as sequences grow.
Solution:
The itertools.permutations() function automatically generates all ordered arrangements of items from your sequences.
Perfect for generating interaction features that preserve temporal or logical ordering in your feature set.
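As a minimal sketch of the idea, the snippet below builds ordered interaction feature names from a short list of events. The event names and the `ordered_pair_features` helper are hypothetical and used only for illustration; `itertools.permutations` is the standard-library function named above.

```python
from itertools import permutations


def ordered_pair_features(events):
    """Return one feature name per ordered pair of distinct events."""
    # permutations(events, 2) yields every ordered pair, so
    # ("view", "cart") and ("cart", "view") are separate features.
    return [f"{first}_then_{second}" for first, second in permutations(events, 2)]


events = ["view", "cart", "purchase"]  # hypothetical event names
features = ordered_pair_features(events)
print(features)
```

Because order matters, three events produce six features. Use `itertools.combinations` instead when order is irrelevant.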
Full Article:
MarkItDown: Convert PDFs to Clean Markdown in 3 Lines
Problem:
Have you ever wanted to convert PDFs to text for analysis and search but found it hard to do?
While there are many tools to convert PDFs to text, they often lose structure and readability.
Solution:
Microsoft MarkItDown preserves document structure while converting PDFs to clean markdown format.
The library handles multiple file types and maintains formatting hierarchy:
- Clean markdown output with preserved headers and structure
- Support for PDFs, Word docs, PowerPoint, and Excel files
- Simple three-line implementation for any document type
- Seamless integration with existing RAG pipelines
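The three-line usage might look like the sketch below. It assumes the `markitdown` package is installed with its PDF extras; the `pdf_to_markdown` wrapper and the `report.pdf` path are hypothetical names added for illustration.

```python
from pathlib import Path


def pdf_to_markdown(pdf_path):
    """Convert one document to Markdown text with MarkItDown."""
    # Imported lazily so this sketch loads even if markitdown is not installed.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(str(pdf_path))
    return result.text_content


if __name__ == "__main__":
    sample = Path("report.pdf")  # hypothetical file, replace with your own
    if sample.exists():
        print(pdf_to_markdown(sample)[:500])
```

The returned string can then be passed to whatever chunking or indexing step your RAG pipeline already uses.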
Full Article:
|
| ⭐ Related Post |
Transform PDFs to Pandas with Docling’s Complete Pipeline
Problem:
Most PDF processing tools force you to stitch together multiple solutions – one for extraction, another for parsing, and yet another for chunking. Each step introduces potential data loss and format incompatibilities, making document processing complex and error-prone.
Solution:
Docling handles the entire workflow from raw PDFs to structured, searchable content in a single solution.
Key features:
- Universal format support for PDF, DOCX, PPTX, HTML, and images
- AI-powered extraction with TableFormer and Vision models
- Direct export to pandas DataFrames, JSON, and Markdown
- RAG-ready output that maintains context and structure
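A rough sketch of the PDF-to-pandas path is shown below. It assumes the `docling` package is installed and that its table items expose `export_to_dataframe()`, which may differ between versions; the `pdf_to_markdown_and_tables` helper and the `paper.pdf` path are hypothetical.

```python
from pathlib import Path


def pdf_to_markdown_and_tables(source):
    """Return (markdown_text, list_of_dataframes) for one document."""
    # Imported lazily so this sketch loads even if docling is not installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    markdown_text = result.document.export_to_markdown()
    # Each detected table is exported as its own pandas DataFrame.
    tables = [table.export_to_dataframe() for table in result.document.tables]
    return markdown_text, tables


if __name__ == "__main__":
    sample = Path("paper.pdf")  # hypothetical file, replace with your own
    if sample.exists():
        text, tables = pdf_to_markdown_and_tables(str(sample))
        print(f"{len(tables)} tables extracted")
```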
|
| ☕️ Weekly Finds |
scalene
Performance & Profiling
A high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
bandit
Security & Code Quality
A tool designed to find common security issues in Python code through static code analysis
river
Machine Learning
Online machine learning in Python – enabling incremental learning algorithms for streaming data


