
Tag Archive for pythonpandasdataframe

How to set values in a slice of a MultiIndex DataFrame from another slice

I have a MultiIndex pandas DataFrame, and I need to assign values to a slice of the DataFrame (selected on one index level), computed from another slice of the same DataFrame (selected on the same index level).
I have tried to assign the values using loc, but the entire slice ends up NaN.
I have written a simple example to make the problem I'm having clear.
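A likely cause of the NaNs: when the right-hand side is a Series, pandas aligns it on its index labels before assigning. The source slice is labelled with one group (say "a") and the target with another ("b"), so no labels match and every target row becomes NaN. The minimal sketch below uses hypothetical data (the frame, the level names `group`/`step`, and the "multiply by 10" calculation are all made up) and sidesteps alignment by assigning a plain NumPy array. It assumes both slices have the same length and row order.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: two groups ("a", "b") with three steps each
index = pd.MultiIndex.from_product([["a", "b"], [0, 1, 2]], names=["group", "step"])
df = pd.DataFrame({"x": np.arange(6, dtype=float)}, index=index)

idx = pd.IndexSlice
source = df.loc[idx["a", :], "x"]  # Series labelled ("a", 0), ("a", 1), ("a", 2)

# Assigning `source * 10` directly would align on those labels, which never match
# ("b", 0..2), so the target rows would end up NaN. .to_numpy() drops the labels,
# so values are assigned by position instead.
df.loc[idx["b", :], "x"] = source.to_numpy() * 10
```

If you would rather keep label alignment, relabelling the source first (for example with `Series.rename(index={"a": "b"}, level="group")`) makes its labels match the target rows.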

Convert category structure to merge with other df

I have two DataFrames. The first contains transactions, each with a category ID; the categories are hierarchical, and the number of levels varies. The second DataFrame contains the categories: for each category it has the category ID and the parent ID. I would like to prepare the second DataFrame so that I can merge it with the first and get every level of each transaction's category in the first DataFrame.
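One way to approach this, sketched under assumptions: walk each category's parent links up to the root, spread the resulting path into `level_1`, `level_2`, ... columns, and merge that table onto the transactions. The column names (`category_id`, `parent_id`, `txn_id`) and the sample data are hypothetical, and roots are assumed to have a missing parent ID.

```python
import pandas as pd

# Hypothetical inputs: column names and values are assumptions, not from the question
categories = pd.DataFrame({
    "category_id": [1, 2, 3, 4],
    "parent_id": pd.array([None, 1, 2, 1], dtype="Int64"),  # <NA> marks a root category
})
transactions = pd.DataFrame({"txn_id": [10, 11, 12], "category_id": [3, 4, 1]})

parent = categories.set_index("category_id")["parent_id"].to_dict()

def path_to_root(cat_id):
    """Return the chain [root, ..., cat_id] by following parent links upward."""
    path = [cat_id]
    while pd.notna(parent.get(path[-1])):
        if len(path) > len(parent):
            raise ValueError(f"cycle detected at category {cat_id}")
        path.append(parent[path[-1]])
    return path[::-1]

paths = {cid: path_to_root(cid) for cid in categories["category_id"]}
depth = max(len(p) for p in paths.values())

# One row per category, one column per level, padded with None for shallower categories
levels = pd.DataFrame(
    [p + [None] * (depth - len(p)) for p in paths.values()],
    index=list(paths),
    columns=[f"level_{i + 1}" for i in range(depth)],
).rename_axis("category_id").reset_index()

merged = transactions.merge(levels, on="category_id", how="left")
```

Because the number of levels varies, shallower categories get missing values in the deeper `level_*` columns rather than a fixed schema.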

Sampling unbalanced DataFrame columns

Suppose I have a DataFrame df with five columns, 'A', 'B', 'C', 'D', and 'E', all containing Python strings. Currently, columns 'B', 'C', 'D', and 'E' are unbalanced: some unique values appear in many more rows than others. How can I sample df so that in each of 'B', 'C', 'D', and 'E' every unique value has the same number of rows? I want to sample with replacement so that the result has the same length as the original DataFrame, even though some rows may be duplicated and others omitted. Thanks!
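Exact balance in all four columns at once is generally not achievable with a single resample, because each row carries one value in every column. A common heuristic, offered here as an assumption rather than a built-in pandas feature, is to weight each row by the inverse frequency of its value in each target column and pass those weights to `DataFrame.sample`. For a single column this gives every unique value the same expected share of rows; with several columns, the product of the weights only approximately balances them. The helper name `balanced_sample` and the sample data are hypothetical.

```python
import pandas as pd

def balanced_sample(df, cols, random_state=None):
    """Resample df with replacement, favoring rows whose values are rare in `cols`."""
    weights = pd.Series(1.0, index=df.index)
    for col in cols:
        # How often each row's value occurs in this column, aligned to df's rows
        counts = df[col].map(df[col].value_counts())
        weights = weights / counts
    # sample() normalizes the weights itself; n=len(df) keeps the original length
    return df.sample(n=len(df), replace=True, weights=weights, random_state=random_state)

# Hypothetical data with unbalanced string columns
df = pd.DataFrame({
    "A": ["r1", "r2", "r3", "r4", "r5", "r6"],
    "B": ["x", "x", "x", "x", "y", "z"],
    "C": ["p", "p", "q", "q", "q", "q"],
    "D": ["m", "n", "m", "n", "m", "n"],
    "E": ["u", "u", "u", "u", "u", "v"],
})
out = balanced_sample(df, ["B", "C", "D", "E"], random_state=0)
```

The counts are only balanced in expectation; on small frames the realized counts can still differ noticeably.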

How to divide values in columns in one dataframe by the same value in another df in Pandas?

I want to divide every value in certain columns of the DataFrame rpk by a per-sample value from the DataFrame scaling_factor, matched on sample_name. I know how to do it for a single sample (e.g., all values in the column 'P1-6' of rpk should be divided by 2, the factor value for 'P1-6' in the scaling_factor DataFrame), but how can I do it for all samples at once?
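A minimal sketch, assuming scaling_factor has one row per sample with columns `sample_name` and `factor` (the name `factor` and the `gene` column in rpk are assumptions). Turning scaling_factor into a Series indexed by sample name lets `DataFrame.div(..., axis="columns")` match each rpk column to its own factor by label, so no loop over samples is needed.

```python
import pandas as pd

# Hypothetical data: sample columns in rpk match sample_name values in scaling_factor
rpk = pd.DataFrame({"gene": ["g1", "g2"], "P1-6": [4.0, 8.0], "P1-7": [9.0, 3.0]})
scaling_factor = pd.DataFrame({"sample_name": ["P1-6", "P1-7"], "factor": [2.0, 3.0]})

# Series: sample name -> factor
factors = scaling_factor.set_index("sample_name")["factor"]

# Only divide columns that have a factor; other columns (e.g. "gene") are left alone
sample_cols = [c for c in rpk.columns if c in factors.index]
rpk[sample_cols] = rpk[sample_cols].div(factors[sample_cols], axis="columns")
```

Dividing by a Series with `axis="columns"` aligns on labels, so the order of rows in scaling_factor does not need to match the column order in rpk.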

Trying to exclude blank entries on pandas dataframe.loc command

I have two lists of people in two separate systems: one is our data warehouse and the other is an autodialer. We recently had an issue where some of the data was duplicated in the dialer. To identify the problem records, I'm trying to create two lists: a priority list containing only active people, and a complete list containing everybody. I merged the two DataFrames into a single DataFrame and then replaced the NaNs with blank strings so it is more readable for the end users. Now I'm trying to remove the blank records from the priority list (the ID is blanked out as part of the process when a person is made inactive), but my attempt isn't working. Here is what I've got:
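One likely pitfall: after `fillna("")` the blanks are empty strings rather than missing values, so `notna()` or `dropna()` no longer sees them. A minimal sketch with made-up data (the `ID` and `name` column names are assumptions) shows two options: filter on the empty string after filling, or filter on `notna()` before filling.

```python
import numpy as np
import pandas as pd

# Hypothetical merged data: inactive people have a missing ID
merged = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "ID": [101, np.nan, 103]})
display_df = merged.fillna("")  # NaN -> "" for readability

# Option 1: after fillna(""), compare against the empty string
# (astype(str) and strip() also catch IDs that are only whitespace)
priority = display_df.loc[display_df["ID"].astype(str).str.strip() != ""]

# Option 2: filter on real missing values first, then blank out for display
priority_alt = merged.loc[merged["ID"].notna()].fillna("")
```

Option 2 avoids string comparisons entirely, which keeps the filter independent of how the blank is rendered.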