Tag Archive for apache-spark, spark-structured-streaming

Spark Arbitrary Stateful Operations – how does Spark manage the timeout?

I’m looking at this example and also tried to apply it to my use case, but I couldn’t understand how Spark uses hasTimedOut to manage the data in memory. For example, what happens if I handle a session by its key and stop receiving events for that key? That means I will never enter the mergeSessions function for this key again, and consequently will never evict the events related to it, so they will remain in memory.
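To make the scenario in this question concrete, here is a toy model in plain Python, not Spark code. It rests on one point from Spark's GroupState documentation: when a timeout is configured, the user function is also invoked for a key that received no new data once its timeout has expired, with no values and hasTimedOut set to true, and the function is expected to remove the state itself. Everything else is an assumption made for illustration: the names ToyGroupState, merge_sessions and run_micro_batch are hypothetical, as are the 10-second timeout and the idea that deadlines are checked once per micro-batch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyGroupState:
    """Hypothetical stand-in for per-key state; not Spark's GroupState class."""
    events: List[str] = field(default_factory=list)
    timeout_at: Optional[int] = None  # processing-time deadline, in seconds
    has_timed_out: bool = False


def merge_sessions(key, new_events, state, now, timeout_secs, store):
    """User function: called with new events, or with none when the key timed out."""
    if state.has_timed_out:
        # Timeout call: emit the finished session and evict the key's state.
        del store[key]
        return (key, list(state.events), "closed")
    state.events.extend(new_events)
    state.timeout_at = now + timeout_secs  # push the deadline forward
    return (key, list(state.events), "open")


def run_micro_batch(store: Dict[str, ToyGroupState], batch, now, timeout_secs=10):
    """Toy engine step: handle keys with data, then keys whose deadline has passed."""
    out = []
    for key, events in batch.items():
        state = store.setdefault(key, ToyGroupState())
        out.append(merge_sessions(key, events, state, now, timeout_secs, store))
    # Keys with no data in this batch are still visited if their deadline expired.
    expired = [k for k, s in store.items()
               if k not in batch and s.timeout_at is not None and s.timeout_at <= now]
    for key in expired:
        state = store[key]
        state.has_timed_out = True
        out.append(merge_sessions(key, [], state, now, timeout_secs, store))
    return out


store: Dict[str, ToyGroupState] = {}
first = run_micro_batch(store, {"user-a": ["click"]}, now=0)
second = run_micro_batch(store, {"user-b": ["view"]}, now=15)
```

In this sketch "user-a" sends nothing in the second batch, yet merge_sessions is still called for it with an empty event list, and that call is where its state is dropped. The model also shows a limitation noted in Spark's documentation: the timeout only fires when a batch actually runs, so there is no strict upper bound on how late the eviction can happen.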

Spark Structured Streaming window

I want to process all the data in a 60-second window, but I found data that belongs to the previous window. How can I avoid this?

Does the unbounded table contain old data in Spark Structured Streaming?

I’m using Spark Structured Streaming and don’t understand some of its mechanics. Does an unbounded table keep old data in memory? If so, is there any way to delete old data from the unbounded table? When using append output mode, does the old data in the unbounded table still exist?