Pandas

Efficient Data Appending in Parquet Files: Delta Lake vs. Pandas

Appending data to an existing Parquet file with pandas means loading the existing table, concatenating the new data with it, and writing the combined result back.

Because the whole table is read and rewritten on every append, this process can be time-consuming and memory-intensive.

With Delta Lake, you can add, remove, or modify columns without the need to recreate the entire table.

Raise an Exception for a Chained Assignment in pandas

pandas allows chained assignments, which perform multiple indexing operations in a single statement, but they can lead to unexpected results or errors.

A chained assignment can fail to modify the values in the original DataFrame as intended, yet it doesn’t throw an error.

Setting pd.options.mode.chained_assignment to 'raise' will cause pandas to raise an exception if a chained assignment occurs.
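A minimal sketch of this behavior follows. The DataFrame, its column names, and the helper function are hypothetical, and the result depends on the pandas version: releases that honor this option raise an error, while releases that use Copy-on-Write handle chained assignment differently, so the helper reports whatever happened rather than assuming an outcome.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})


def try_chained_assignment(frame):
    """Attempt a chained assignment with the option set to 'raise'.

    Returns the name of the exception class if one is raised,
    otherwise None.
    """
    try:
        # option_context restores the previous setting on exit.
        with pd.option_context("mode.chained_assignment", "raise"):
            # Two indexing operations in one statement: the boolean
            # filter returns a copy, so the assignment targets that
            # copy instead of `frame`.
            frame[frame["col1"] > 1]["col2"] = 0
    except Exception as exc:
        return type(exc).__name__
    return None


error_name = try_chained_assignment(df)
print("Exception raised:", error_name)

# The original DataFrame is not modified by the chained assignment.
print(df)
```

To actually update the values, a single indexing step such as `df.loc[df["col1"] > 1, "col2"] = 0` avoids the intermediate copy.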

Optimizing Memory Usage in a pandas DataFrame with infer_objects

pandas DataFrames that contain columns of mixed data types are stored in a more general format (such as object), resulting in inefficient memory usage and slower computation times.

df.infer_objects() infers the true data types of columns in a DataFrame, which helps optimize memory usage in your code.

In the article's example, df.infer_objects() converts the data type of “col1” from object to int64, saving approximately 27 MB of memory.
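The conversion can be sketched with a small hypothetical DataFrame (not the article's data, so the memory figures will differ). The column starts as object because it is built from mixed values, and it stays object after the non-numeric row is dropped.

```python
import pandas as pd

# Hypothetical data: one string followed by integers makes "col1"
# an object column.
df = pd.DataFrame({"col1": ["a"] + list(range(1000))})

# Drop the string row; the remaining values are all integers, but
# the column keeps the object dtype.
df = df.iloc[1:]
print(df["col1"].dtype)

# infer_objects() detects that the values are integers and returns
# a new DataFrame with a numeric dtype for the column.
converted = df.infer_objects()
print(converted["col1"].dtype)

# Compare memory usage of the column data before and after.
before = df.memory_usage(deep=True, index=False).sum()
after = converted.memory_usage(deep=True, index=False).sum()
print(f"before: {before} bytes, after: {after} bytes")
```

Note that infer_objects() returns a new DataFrame rather than modifying df in place, so the result has to be assigned to a variable.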
