Pandas Archives

pandas.DataFrame.corrwith: Compute Pairwise Correlation Between 2 DataFrames

Leave a Comment / Pandas / Khuyen Tran

If you want to compute the pairwise correlation between rows or columns of two DataFrames, use corrwith.
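As a minimal sketch of how corrwith behaves (the DataFrames `df1` and `df2` and their values are made up for illustration), the call below correlates each column of one DataFrame with the column of the same name in the other:

```python
import pandas as pd

# Hypothetical data: two DataFrames with matching column names
df1 = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 3, 4, 6]})
df2 = pd.DataFrame({"a": [4, 2, 3, 1], "b": [4, 6, 8, 12]})

# Pearson correlation between columns with the same label (axis=0 is the default)
col_corr = df1.corrwith(df2)
print(col_corr)
```

Passing `axis=1` should correlate matching rows instead of matching columns.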

Link to my previous tips on pandas.


pandas.crosstab: Create a Cross Tabulation from a Pandas DataFrame

Leave a Comment / Pandas / Khuyen Tran

Cross tabulation allows you to analyze the relationship between multiple variables. To turn a pandas DataFrame into a cross tabulation, use pandas.crosstab.
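A small sketch of a cross tabulation, assuming a hypothetical DataFrame with two categorical columns (`gender` and `plan` are invented names): pandas.crosstab counts how often each combination of values occurs.

```python
import pandas as pd

# Hypothetical data with two categorical variables
df = pd.DataFrame(
    {
        "gender": ["F", "M", "F", "M", "F"],
        "plan": ["basic", "basic", "premium", "premium", "basic"],
    }
)

# Rows are the values of "gender", columns are the values of "plan",
# and each cell holds the number of co-occurrences
table = pd.crosstab(df["gender"], df["plan"])
print(table)
```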


Google Colab notebook of this tip.


df.query: Query Columns Using Boolean Expression

Leave a Comment / Pandas / Khuyen Tran

It can be lengthy to filter a pandas DataFrame by its column values using brackets. To shorten the filtering statements, use df.query instead.
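To illustrate the difference, here is a short sketch on invented data (the `fruit`, `price`, and `taste` columns are assumptions) that applies the same filter with brackets and with df.query:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame(
    {
        "fruit": ["apple", "orange", "grape", "grape"],
        "price": [4, 3, 5, 6],
        "taste": [5, 4, 3, 2],
    }
)

# Bracket-based filtering: the DataFrame name is repeated for every condition
with_brackets = df[(df["price"] > 4) & (df["taste"] > 1)]

# The same filter written as a boolean expression string
with_query = df.query("price > 4 and taste > 1")
print(with_query)
```

Both approaches should select the same rows; the query version simply refers to columns by name inside the string.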

Link to the Google Colab of this tip.



pandas.melt: Unpivot a DataFrame

Leave a Comment / Pandas / Khuyen Tran

If you want to unpivot a DataFrame from wide to long format, use pandas.melt.

For example, you can use pandas.melt to turn multiple columns (Aldi, Walmart, Costco) into values of one column (store).
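A sketch of that transformation, assuming a hypothetical wide table with a `fruit` identifier column and made-up prices per store (only the store names come from the tip itself):

```python
import pandas as pd

# Hypothetical wide-format data: one column per store
df = pd.DataFrame(
    {
        "fruit": ["apple", "orange"],
        "Aldi": [1, 2],
        "Walmart": [3, 4],
        "Costco": [5, 6],
    }
)

# Unpivot: the store column names become values of a single "store" column
long_df = pd.melt(
    df,
    id_vars=["fruit"],
    value_vars=["Aldi", "Walmart", "Costco"],
    var_name="store",
    value_name="price",
)
print(long_df)
```

The names passed to `var_name` and `value_name` are illustrative choices; each (fruit, store) pair ends up on its own row.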

Google Colab notebook of this tip.



Assign Name to a Pandas Aggregation

Leave a Comment / Pandas / Khuyen Tran

By default, the result of aggregating a column keeps the name of that column. If you want to assign a new name to the aggregation, pass name=(column, agg_method) to agg.
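As a brief sketch with invented data (the `category` and `price` columns and the name `mean_price` are assumptions), named aggregation looks like this:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame(
    {
        "category": ["S", "S", "M", "L"],
        "price": [44.0, 30.0, 10.0, 19.0],
    }
)

# Default: the aggregated column is still called "price"
default_name = df.groupby("category").agg({"price": "mean"})

# Named aggregation: new_name=(column, agg_method)
renamed = df.groupby("category").agg(mean_price=("price", "mean"))
print(renamed)
```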

Link to Google Colab notebook of this tip.


Get Certain Values From a DataFrame

Leave a Comment / Pandas / Khuyen Tran

If you want to get the count of each value in a column, use value_counts. If you want the proportion of each value instead, add normalize=True to value_counts.
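A minimal sketch on a hypothetical `size` column shows both forms. Note that normalize=True returns fractions between 0 and 1, so multiply by 100 if you need percentages:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"size": ["S", "S", "M", "L"]})

# Number of occurrences of each value
counts = df["size"].value_counts()

# Share of each value (fractions that sum to 1)
shares = df["size"].value_counts(normalize=True)

print(counts)
print(shares)
```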


Link to Google Colab of this tip.


Filter a pandas DataFrame by Value Counts

Leave a Comment / Pandas / Khuyen Tran

To filter a pandas DataFrame based on the occurrences of categories, you might attempt to combine groupby with count. However, count returns one value per group, so the resulting Series is shorter than the original DataFrame and you will get an error when you use it as a filter.

Instead of calling count directly, use transform with a count aggregation. This returns a Series with the same length as the original DataFrame, so you can filter without encountering any error.
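The sketch below uses invented data (the `type` and `value` columns and the threshold of 2 are assumptions) to contrast the two results and keep only categories that occur at least twice:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame(
    {
        "type": ["A", "A", "O", "B", "O", "A"],
        "value": [5, 3, 2, 1, 4, 2],
    }
)

# count gives one row per group, so it cannot be used directly as a row mask
per_group = df.groupby("type")["type"].count()

# transform broadcasts the group count back to every original row
per_row = df.groupby("type")["type"].transform("count")

# Keep rows whose category occurs at least twice
filtered = df[per_row >= 2]
print(filtered)
```

Here `per_group` has one entry per category, while `per_row` is aligned with the index of `df`, which is what makes the boolean mask valid.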

You can play with the code in this Colab notebook.



pandas.DataFrame.combine_first: Update Null Elements Using Another DataFrame

Leave a Comment / Pandas / Khuyen Tran

If you want to fill null values in one DataFrame with non-null values at the same locations from another DataFrame, use pandas.DataFrame.combine_first.

For example, if the values at the first row of store1 are null, they are updated with the values at the first row of store2.
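A minimal sketch of this behavior, reusing the names store1 and store2 with made-up values (the `orange` and `apple` columns and all numbers are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of store1 is entirely null
store1 = pd.DataFrame({"orange": [np.nan, 5, 9], "apple": [np.nan, 4, 12]})
store2 = pd.DataFrame({"orange": [31, 52, 91], "apple": [11, 71, 21]})

# Null cells in store1 are filled from the same positions in store2;
# non-null cells in store1 are kept as they are
combined = store1.combine_first(store2)
print(combined)
```

Only the null cells are replaced, so the second and third rows of store1 stay unchanged.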

