
Pandas

Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames

As a data scientist, you might be familiar with both Pandas and SQL. However, there might be some queries and transformations that you feel more comfortable doing in SQL than in Python.

Wouldn’t it be nice if you could query a pandas DataFrame using SQL, or use a Python function within a SQL query?

That is where FugueSQL comes in handy. FugueSQL is a Python library that lets users combine Python code and SQL commands.

In my latest article, we explore some utilities of FugueSQL and compare it with other tools such as pandasql.

Link to the article.

Link to the source code.


Filter a pandas DataFrame by Value Counts

To filter a pandas DataFrame based on how often each category occurs, you might try df.groupby followed by count. However, count returns one row per group, so the resulting Series is shorter than the original DataFrame and does not line up with its rows. Using it as a filter therefore raises an error.

Instead of calling count directly, use transform (for example, transform("count")). This method returns a Series with the same length and index as the original DataFrame, so you can filter without encountering any error.
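Here is a minimal sketch of the difference, using a small hypothetical DataFrame with a `type` column (the data and column names are illustrative and not taken from the original notebook). `count` collapses each group to a single row, while `transform("count")` repeats each group's count on every row, so the result can serve as a boolean mask.

```python
import pandas as pd

# Hypothetical data: a category column and a value column
df = pd.DataFrame(
    {
        "type": ["a", "b", "a", "c", "a", "b"],
        "size": [1, 2, 3, 4, 5, 6],
    }
)

# count() returns one row per group (index: a, b, c), so it cannot mask df's rows
counts_per_group = df.groupby("type")["size"].count()

# transform("count") puts each group's count of non-null "size" values on every row,
# keeping df's original index, so it lines up with df
counts_per_row = df.groupby("type")["size"].transform("count")

# Keep only rows whose category appears more than once
frequent = df[counts_per_row > 1]
print(frequent)
```

In this data, category `c` appears only once, so its row is dropped and rows 0, 1, 2, 4 and 5 remain.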

You can play with the code in this Colab notebook.

Link to my previous tips on pandas.


pandas.DataFrame.combine_first: Update Null Elements Using Another DataFrame

If you want to fill null values in one DataFrame with non-null values at the same locations from another DataFrame, use pandas.DataFrame.combine_first.

In the code above, the null values in the first row of store1 are filled with the corresponding values from the first row of store2.
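Here is a minimal sketch of the same idea. The values and column names for `store1` and `store2` are hypothetical, because the original example's data is not reproduced here. Only the NaN cells of `store1` are replaced, its non-null values are kept, and a new DataFrame is returned.

```python
import numpy as np
import pandas as pd

# Hypothetical inventories: store1 is missing both values in its first row
store1 = pd.DataFrame({"orange": [np.nan, 5.0], "apple": [np.nan, 1.0]})
store2 = pd.DataFrame({"orange": [2.0, 3.0], "apple": [4.0, 6.0]})

# Fill store1's nulls with the values at the same row/column labels in store2;
# store1 itself is left unchanged
result = store1.combine_first(store2)
print(result)
```

combine_first aligns the two DataFrames on their row and column labels. As a result, any rows or columns that exist only in `store2` are also added to the result.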

Link to my previous tips on pandas.



    Work with Khuyen Tran