Pyspark – stream to stream join – state store not getting cleaned up
I am trying to do a stream-to-stream join in PySpark. I have two streams reading from Kafka. Here's the schema:
StreamA : EventTime, Key, ValueA
StreamB : EventTime, Key, ValueB
I have set a watermark of 1 hour on both streams.
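For reference, here is a minimal sketch of how such a join is often written so that Spark can drop old state. It assumes the two streams are already parsed into DataFrames with the columns listed above. The helper names `join_condition` and `join_streams`, the aliases `a` and `b`, and the 1 hour event-time range are hypothetical and not taken from the question. In Structured Streaming, an inner stream-stream join generally needs an event-time constraint between the two sides in addition to the watermarks before state can be cleaned up. Whether a missing constraint is the cause here cannot be confirmed from the excerpt.

```python
def join_condition(max_lag="1 hour"):
    # Build a Spark SQL join condition: equality on Key plus an event-time
    # range between the two sides. The range is what lets the engine decide
    # when a buffered row can no longer match and may be evicted.
    # "a" and "b" are assumed aliases for StreamA and StreamB.
    return (
        "a.Key = b.Key AND "
        f"b.EventTime BETWEEN a.EventTime - INTERVAL {max_lag} "
        f"AND a.EventTime + INTERVAL {max_lag}"
    )


def join_streams(stream_a, stream_b, watermark="1 hour", max_lag="1 hour"):
    # pyspark is imported lazily so the condition helper above can be
    # inspected without a Spark installation.
    from pyspark.sql.functions import expr

    # Watermark both inputs on their event-time column, then alias them so
    # the SQL condition can refer to each side unambiguously.
    a = stream_a.withWatermark("EventTime", watermark).alias("a")
    b = stream_b.withWatermark("EventTime", watermark).alias("b")

    # Inner join with the key match and the event-time range constraint.
    return a.join(b, expr(join_condition(max_lag)), "inner")
```

Calling `join_streams(stream_a, stream_b)` returns a streaming DataFrame that can be written out with `writeStream` as usual. The range width is a design choice: a wider range keeps more rows in the state store for longer.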
Using pyspark structured streaming to parse Kafka but getting null
I am trying to parse Kafka messages using the code below:
How to find if Data Frame has null column or schema is not matching
I have the fixed schema below