How to change the value of a column based on a condition and an inner join?
I have two Pandas dataframes; let’s call them df_a and df_b.
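A minimal sketch of one way to approach this, assuming hypothetical columns `key`, `amount`, and `status` (none of these names come from the question). Rows of df_a whose key also appears in df_b (the rows an inner join would keep) and that meet an extra condition get a new value:

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration only.
df_a = pd.DataFrame(
    {"key": [1, 2, 3, 4], "amount": [10, 20, 30, 40], "status": ["old"] * 4}
)
df_b = pd.DataFrame({"key": [2, 3, 5]})

# Rows of df_a that would survive an inner join with df_b on "key".
in_both = df_a["key"].isin(df_b["key"])

# An additional condition evaluated on df_a itself.
condition = df_a["amount"] > 20

# Update only the rows that satisfy both criteria.
df_a.loc[in_both & condition, "status"] = "updated"
```

Using `isin` avoids building the joined frame when only membership matters; if values from df_b are needed in the update, an actual `merge` would be the more natural choice.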
Merging 2 DataFrames to find different rows
I have two DataFrames that contain some log data; let’s call them CorrectData and WrongData.
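One common way to find rows that differ between two frames is an outer merge with `indicator=True`. The sketch below assumes both frames share the same columns; the `id` and `value` columns and their contents are made up for illustration:

```python
import pandas as pd

# Hypothetical log data; column names and values are assumptions.
correct_data = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
wrong_data = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "x", "c"]})

# Outer merge on all shared columns; "_merge" records where each row came from.
merged = correct_data.merge(wrong_data, how="outer", indicator=True)

# Rows present in only one of the two frames.
different = merged[merged["_merge"] != "both"]
```

Here `left_only` marks rows found only in `correct_data` and `right_only` marks rows found only in `wrong_data`.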
How to set value to a slice of a multi-index dataframe from another slice
I have a multi-index pandas dataframe, and I need to assign values to a slice of the dataframe (based on one index), using a calculation on another slice of the same dataframe (based on the same index).
I have tried to assign the values using loc, but the entire slice ends up NaN.
I have written this simple example code to illustrate the problem I’m having.
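The example code is not included in this excerpt. A plausible cause of the NaN result is index alignment: when a Series is assigned through `loc`, pandas matches it to the target by index labels, and labels from one slice do not occur in the other. A minimal sketch of a workaround, assuming a hypothetical two-level index (`group`, `step`) and a column `x`, and assuming both slices have the same length and order:

```python
import pandas as pd

# Hypothetical frame; index level names and the column name are assumptions.
index = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1, 2]], names=["group", "step"]
)
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]}, index=index)

group = df.index.get_level_values("group")
source = df.loc[group == "a", "x"]

# Converting to a NumPy array drops the index, so values are assigned by
# position instead of being aligned on the (non-matching) "a" labels.
df.loc[group == "b", "x"] = source.to_numpy() * 2
```

Positional assignment only makes sense when the two slices line up row for row; otherwise the source would need to be re-indexed to the target labels first.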
How to set only a subset of columns and rows to nan in a pandas dataframe based on a condition? [duplicate]
I have a pandas dataframe (let’s call it df), and one of the columns of this df (let’s say column 10) specifies if a sensor has acquired data (1 it did, 0 it […]
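A minimal sketch of conditional assignment with `loc`, assuming that 0 in a flag column means the sensor did not acquire data. The column names `acquired`, `temp`, `pressure`, and `label` are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sensor data; all column names are assumptions.
df = pd.DataFrame(
    {
        "acquired": [1, 0, 1, 0],
        "temp": [20.5, 99.0, 21.0, 98.0],
        "pressure": [1.0, 0.0, 1.1, 0.0],
        "label": ["a", "b", "c", "d"],
    }
)

# Rows where the sensor did not acquire data.
no_data = df["acquired"] == 0

# Blank out only the chosen columns on those rows; other columns are untouched.
df.loc[no_data, ["temp", "pressure"]] = np.nan
```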
Convert category structure to merge with other df
I have 2 dfs. The first df has transactions that each have a category id; the categories are multi-layered and the number of layers varies. The second df has the categories: for each category it has the category id and the parent id. I would like to prepare df 2 in such a way that I can merge it with df 1 and then have all the layers of the categories in df 1.
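One possible way to flatten the hierarchy, sketched under several assumptions: the columns are named `category_id` and `parent_id`, root categories have a missing parent, and the parent links contain no cycles. All names and data below are hypothetical:

```python
import pandas as pd

# Hypothetical data; column names are assumptions.
transactions = pd.DataFrame({"tx_id": [100, 101, 102], "category_id": [3, 2, 4]})
categories = pd.DataFrame(
    {"category_id": [1, 2, 3, 4], "parent_id": [None, 1, 2, None]}
)

# Map each category to its parent (None for root categories).
parent_of = {
    int(row.category_id): (None if pd.isna(row.parent_id) else int(row.parent_id))
    for row in categories.itertuples(index=False)
}

def path_from_root(category_id):
    """Walk up the parent links and return the ids from root to category."""
    path = []
    current = category_id
    while current is not None:  # assumes the hierarchy has no cycles
        path.append(current)
        current = parent_of.get(current)
    return path[::-1]

paths = {cid: path_from_root(cid) for cid in parent_of}
depth = max(len(p) for p in paths.values())

# One row per category, one column per layer, padded with NA for short paths.
layers = pd.DataFrame(
    [[cid] + p + [pd.NA] * (depth - len(p)) for cid, p in paths.items()],
    columns=["category_id"] + [f"level_{i + 1}" for i in range(depth)],
)

result = transactions.merge(layers, on="category_id", how="left")
```

Each transaction then carries `level_1` (the root) through the deepest level present, with missing values where its category sits higher in the tree.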
Sampling unbalanced data frame columns
I have a data frame df with five columns, ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’, which contain Python strings. Currently, ‘B’, ‘C’, ‘D’, and ‘E’ have unbalanced unique values (i.e., some unique values have more rows than others). How can I sample df so that columns ‘B’, ‘C’, ‘D’, and ‘E’ have a balanced number of unique values (i.e., each unique value in a specific column has the same number of rows)? I want to sample with replacement so that the resulting data frame has the same length as the original data frame, though some rows may be duplicated and some may be omitted. Thanks!
How to divide values in columns in one dataframe by the same value in another df in Pandas?
I want to divide all values from particular columns in the dataframe rpk by the same value from the dataframe scaling_factor, according to sample_name. I know how to do it for a particular value (e.g. for the column ‘P1-6’ in rpk all the values should be divided by 2, according to the value of factor for ‘P1-6’ in the scaling_factor dataframe), but how do I do it for all samples?
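A minimal sketch of one way to do this for every sample at once, assuming rpk has one column per sample and scaling_factor has one row per sample with columns `sample_name` and `factor`. The `gene` column, the second sample, and all values are made up for illustration:

```python
import pandas as pd

# Hypothetical data; only 'P1-6', sample_name, and factor appear in the question.
rpk = pd.DataFrame(
    {"gene": ["g1", "g2"], "P1-6": [4.0, 8.0], "P1-12": [9.0, 3.0]}
)
scaling_factor = pd.DataFrame(
    {"sample_name": ["P1-6", "P1-12"], "factor": [2.0, 3.0]}
)

# Series of factors indexed by sample name.
factors = scaling_factor.set_index("sample_name")["factor"]

# Only the rpk columns that have a matching factor.
samples = [col for col in rpk.columns if col in factors.index]

# Divide each sample column by its own factor; the Series index is
# aligned with the column labels.
scaled = rpk.copy()
scaled[samples] = rpk[samples].div(factors[samples], axis="columns")
```

Columns without an entry in scaling_factor (here `gene`) are left unchanged.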
Trying to exclude blank entries on pandas dataframe.loc command
I have two lists of people in two separate systems. One is our data warehouse, and the other is an autodialer. We recently had an issue where some of the data got duplicated in the dialer. We are trying to identify the problem records, and I’m trying to create two lists: a priority list with only active people, and a complete list with everybody. I have two DataFrames that I merged into a single DataFrame and then substituted a blank string for the NaNs so it’s more readable for the end users. Now I’m trying to remove the blank records for the “priority” list (the ID gets blanked out as part of the process when the person is made inactive). However, when I try, it’s not working. Here is what I’ve got:
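The code referred to above is not included in this excerpt. As a separate, hedged sketch of one way to drop blank records, assuming a hypothetical merged frame with an `ID` column in which inactive people have an empty or whitespace-only string:

```python
import pandas as pd

# Hypothetical merged data; column names and values are assumptions.
merged = pd.DataFrame(
    {"ID": ["A1", "", "  ", "B7"], "name": ["Ann", "Bob", "Cy", "Dee"]}
)

# True where the ID still has content after stripping whitespace.
has_id = merged["ID"].str.strip() != ""

# Priority list: only rows with a non-blank ID.
priority = merged.loc[has_id]

# Complete list: everybody, blank IDs included.
complete = merged
```

Stripping before comparing guards against IDs that look blank but contain spaces, which is one common reason a plain `!= ""` filter appears not to work.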
Counting number of separate events in dataframe
I am trying to count the number of separate events in a DataFrame (not the number of occurrences).
How do I select rows from a DataFrame based on column values?
How can I select rows from a DataFrame based on values in some column in Pandas?
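A short sketch of the usual options, boolean masks with `loc`, `isin`, and `query`, on a made-up frame (the columns `name`, `city`, and `age` are hypothetical):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {
        "name": ["ann", "bob", "cy", "dee"],
        "city": ["Oslo", "Rome", "Oslo", "Lima"],
        "age": [31, 25, 47, 38],
    }
)

# Rows where a column equals a single value.
oslo = df.loc[df["city"] == "Oslo"]

# Rows where a column matches any value in a list.
oslo_or_lima = df.loc[df["city"].isin(["Oslo", "Lima"])]

# Several conditions: wrap each in parentheses and combine with & or |.
older_in_oslo = df.loc[(df["city"] == "Oslo") & (df["age"] > 40)]

# The same kind of filter written as a query string.
by_query = df.query("age > 30 and city != 'Rome'")
```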