Overwrite a Hive table without downtime

I have a Hive table backed by an HDFS path. The table is overwritten by a periodic job and has a few downstream consumers. While the overwrite runs, the table is dropped, and if a downstream consumer tries to read it during that window the read throws an error and the consumer's job fails. How can I prevent the table from being unavailable?
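For context, the periodic job overwrites the table along these lines (a simplified sketch; the table name and source path are placeholders):

    // Simplified sketch of the periodic overwrite; names are placeholders.
    // Assumes spark is a Hive-enabled SparkSession.
    // mode("overwrite") with saveAsTable drops and recreates the Hive table,
    // which is the window in which downstream reads fail.
    spark.read.parquet("/data/source")   // placeholder upstream path
      .write
      .mode("overwrite")
      .saveAsTable("db.my_table")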

Here's an approach I tried that doesn't seem to work (a code sketch of these steps follows the list):

  1. Write the data to a temporary table (a copy of the original table)
  2. Get the new location of the temporary table
  3. Update the original table's location to point at the temporary table's location (spark.sql(s"ALTER TABLE $originalTable SET LOCATION '$tempTableLocation'"))
  4. Run spark.sql(s"MSCK REPAIR TABLE $originalTable")
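
Putting the steps together, this is roughly what I run (a sketch; the table names and the source of the new data are placeholders):

    // Sketch of the swap described above; all names are placeholders.
    // Assumes spark is a Hive-enabled SparkSession.
    import org.apache.spark.sql.functions.col

    val originalTable = "db.my_table"
    val tempTable     = "db.my_table_tmp"

    // 1. Write the new data to a temporary copy of the table.
    spark.table("db.staging_data")       // placeholder for the new data
      .write
      .mode("overwrite")
      .saveAsTable(tempTable)

    // 2. Read the temporary table's HDFS location from the metastore.
    val tempTableLocation = spark.sql(s"DESCRIBE FORMATTED $tempTable")
      .filter(col("col_name") === "Location")
      .select("data_type")
      .first()
      .getString(0)

    // 3. Point the original table at the temporary table's location.
    spark.sql(s"ALTER TABLE $originalTable SET LOCATION '$tempTableLocation'")

    // 4. Re-sync the partition metadata.
    spark.sql(s"MSCK REPAIR TABLE $originalTable")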

The location appears to be updated when I run DESCRIBE FORMATTED $originalTable, but when I load data from the original table it still reads from the previous path.
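
Here is how I observe the mismatch (same placeholder names as above):

    import org.apache.spark.sql.functions.col

    // The metastore reports the new location...
    spark.sql(s"DESCRIBE FORMATTED $originalTable")
      .filter(col("col_name") === "Location")
      .show(truncate = false)

    // ...but a read still returns rows from the previous path.
    spark.table(originalTable).show()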

How can I fix this?