Issue writing a PySpark or SQL query to conditionally count a column

I'm trying to write a query that creates and populates an ID column that counts up row by row and restarts at 1 whenever a new non-null value appears in the Data column. Here's an example of the data:

+-------+
| Data  |
+-------+
|"this" |
| null  |
|"that" |
|"those"|
| null  |
| null  |
+-------+

And here’s a sample of what the output should look like:

+-------+---+
| Data  | ID|
+-------+---+
|"this" | 1 |
| null  | 2 |
|"that" | 1 |
|"those"| 1 |
| null  | 2 |
| null  | 3 |
+-------+---+
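
To pin down the rule the expected output follows, here is a minimal plain-Python sketch (no Spark involved) that walks the rows in order: every non-null value resets the counter to 1 and every null increments it. The function name reset_counter_ids is hypothetical, and it assumes the rows are already in the intended order.

```python
from typing import Optional, Sequence


def reset_counter_ids(values: Sequence[Optional[str]]) -> list[int]:
    """Hypothetical helper: 1 on each non-null value, previous ID + 1 on each null."""
    ids: list[int] = []
    counter = 0
    for value in values:
        if value is not None:
            counter = 1   # a new non-null value restarts the count
        else:
            counter += 1  # a null continues the current run
        ids.append(counter)
    return ids


data = ["this", None, "that", "those", None, None]
print(list(zip(data, reset_counter_ids(data))))
# [('this', 1), (None, 2), ('that', 1), ('those', 1), (None, 2), (None, 3)]
```

Nulls that appear before the first non-null value simply count up from 1 in this sketch.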

I've tried both row_number() and monotonically_increasing_id(), but neither produces the output I need.
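
row_number() on its own cannot restart at each non-null value because there is no column to partition by yet. One common window-function pattern is to derive that key first: COUNT(Data) ignores nulls, so a running count over the ordered rows only advances on non-null values and labels each run with its own group number; ROW_NUMBER() partitioned by that group then yields the ID. The sketch below runs that SQL against an in-memory SQLite database (window functions need SQLite 3.25 or newer) purely so it can be executed without a cluster. The table name t and the ordering column ord are assumptions: SQL rows have no inherent order, so some explicit ordering column must exist.

```python
import sqlite3

# Hypothetical table "t"; "ord" is an assumed explicit ordering column.
rows = [(1, "this"), (2, None), (3, "that"), (4, "those"), (5, None), (6, None)]

QUERY = """
WITH grouped AS (
    SELECT ord,
           Data,
           -- COUNT(Data) skips NULLs, so this running count only advances
           -- on non-null rows: each run of rows gets its own group number
           COUNT(Data) OVER (
               ORDER BY ord
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS grp
    FROM t
)
SELECT Data,
       -- number the rows within each run, restarting at 1 per group
       ROW_NUMBER() OVER (PARTITION BY grp ORDER BY ord) AS ID
FROM grouped
ORDER BY ord
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ord INTEGER, Data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
result = conn.execute(QUERY).fetchall()
conn.close()

for data, row_id in result:
    print(data, row_id)
```

The same query text should carry over to Spark SQL (for example through spark.sql on a registered view), provided the data has a trustworthy ordering column. In Spark, monotonically_increasing_id() can supply one, but its values are only guaranteed to be increasing and unique, not consecutive, and they follow the DataFrame's current partition order, so they are only a valid ordering key if that order is the one you mean.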

Any ideas?

Thank you!
