📅 Today’s Picks |
PySpark Transformations: Python API vs SQL Expressions
Problem:
PySpark offers two ways to express SQL-style transformations: the DataFrame API and SQL expression strings. How do you know which one to use?
Solution:
Choose based on your development style and team expertise.
Use the DataFrame API if you’re comfortable with Python and need Python-native development with type safety and autocomplete support.
Use selectExpr() if you’re comfortable with SQL and need familiar SQL patterns and simplified CASE statements.
Both approaches compile to the same Spark execution plan, so performance is identical; pick the one that fits your workflow.
☕️ Weekly Finds |
dotenvx
Python Utils
An encrypted dotenv with syncing and zero-knowledge key sharing that makes .env files secure and team-friendly.
databases
Data Processing
Async database access for Python, supporting PostgreSQL, MySQL, and SQLite.
pomegranate
ML
Fast, flexible probabilistic modeling in Python, implemented in PyTorch.
⭐ Related Post |
DuckDB: Zero-Config SQL Database for DataFrames
Problem:
Setting up database servers for SQL operations requires complex configuration, service management, and credential setup.
This creates barriers between data scientists and their analytical workflows.
Solution:
DuckDB provides an embedded SQL database with zero configuration required.
Key benefits:
- No server installation or management needed
- Direct SQL operations on DataFrames and files
- Compatible with pandas, Polars, and Arrow ecosystems
- Fast analytical queries with columnar storage
- Open-source with active development community
Query your data instantly without database administration overhead.
Full Article: