Divide by the total dataframe row count in PySpark efficiently

Let’s assume I have an expensive PySpark query that produces a large dataframe sdf_input. I now want to add a column to this dataframe that requires the total number of rows in sdf_input. For simplicity, say I want to divide another column A by that total row count, total_num_rows. The catch is that naively calling .count() and then building on sdf_input would trigger the expensive query twice, because PySpark evaluates lazily and recomputes the lineage for each action.