Efficiently divide by the total row count of a PySpark dataframe
Let's assume I have an expensive PySpark query that results in a large dataframe `sdf_input`. I now want to add a column to this dataframe that requires the count of the total number of rows in `sdf_input`. For simplicity, say I want to divide another column `A` by the total number of rows, `total_num_rows`.
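To make the setup concrete, here is a minimal sketch of the naive approach I have in mind (the toy `createDataFrame` stands in for the expensive query, and the output column name `A_over_n` is just illustrative):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Stand-in for the expensive query; in reality sdf_input is the result
# of a costly upstream computation.
sdf_input = spark.createDataFrame([(1.0,), (2.0,), (3.0,)], ["A"])

# Naive approach: .count() is a separate Spark action, so without the
# cache() the expensive query could be recomputed just to get the count.
sdf_input = sdf_input.cache()
total_num_rows = sdf_input.count()

# Divide column A by the total number of rows.
sdf_output = sdf_input.withColumn("A_over_n", F.col("A") / F.lit(total_num_rows))
```

My concern is that the `.count()` triggers an extra action over the whole dataframe. Is there a more efficient or idiomatic way to fold the total row count into the computation?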