Apache Flink: stream files from a directory in order (by timestamp)


Is it possible to use Flink’s FileSource to produce elements in ascending order (by filename or timestamp of creation)?

The problem is described here: Apache Flink: Process data in order with mapPartition
However, that solution seems to be deprecated and relates to the DataSet API. I am currently using the DataStream API and reading files from a directory continuously. Under normal conditions (the live case) this is not a problem, because I can bound the out-of-orderness: I know how many files are created within the polling interval specified in the Flink application. When processing old files with the same application logic, however, the configured waiting time no longer holds, because the files are read in an arbitrary order (yes, I know this is due to the process scheduler).

Is there anything I can do? My first idea was to order the files myself, either with a Bash script or by using DataGen instead of FileSource. A second idea was to use separate application logic just for initialization (reading the old files and thereby initializing the state). However, none of these solutions seems very professional.
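To make the first idea concrete, here is a minimal sketch of the ordering step in plain Java rather than Bash. It uses only the JDK and no Flink API. The class name `OrderedFileLister` and the method `listOldestFirst` are hypothetical, and it sorts by last-modified time (with the filename as a tie-breaker) on the assumption that this is an acceptable stand-in for the creation timestamp, which not every file system records reliably.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: lists the regular files in a
// directory in ascending order of last-modified time.
final class OrderedFileLister {

    private OrderedFileLister() {}

    static List<Path> listOldestFirst(Path dir) throws IOException {
        // Read each timestamp once up front so the sort sees stable values
        // even if a file is touched while we are sorting.
        Map<Path, FileTime> modified = new HashMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    modified.put(p, Files.getLastModifiedTime(p));
                }
            }
        }

        List<Path> ordered = new ArrayList<>(modified.keySet());
        // Assumption: last-modified time approximates creation time.
        // Files with equal timestamps are ordered by filename.
        ordered.sort(Comparator
                .comparing((Path p) -> modified.get(p))
                .thenComparing(p -> p.getFileName().toString()));
        return ordered;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listOldestFirst(dir)) {
            System.out.println(p);
        }
    }
}
```

The resulting list could drive a staging step that hands the old files to the job one at a time, in order. Whether that counts as more "professional" than a Bash script is open, but it does keep the ordering logic in the same language as the Flink application.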

I have not tried any of these solutions yet.
