Oops, I dropped my Hive Metastore database

The Hive Metastore (HMS) is backed up by a database (MySQL for us), which stores information about HDFS files, stats and more. Without this database, you have no Hive.

A few days ago, a DROP has been issued on the wrong server. Bye Bye metastore. What do you do then?

First things first, I looked at the backups. We had one from 3 hours before, so still quite recent. After it was restored I noticed that Hive worked, but not all data was present visible.

For context, our tables are ORC and transactional. As HDFS files cannot be updated, the way for ORC (and other DBs I know of) to manage transactions is to have a base directory as well as some deltas holding new changes. Reading such a table thus means reading the base directory and all the deltas to apply potential updates. As reading the deltas can become expensive, they are eventually compacted, in 2 possible ways. Quite often the deltas are squashed together (minor compaction for ORC) and once in a while, the base directory is fully rewritten to apply all the deltas (major compaction for ORC).

In my case, between the backup and the restore some new data had been added (new delta directories appeared) but no compaction happened.

This means that the HMS was completely ignoring all the new delta files, but was happily using all the files it knew about. I should add that I have a way to replay data of the last hours/days if needed. Once this was understood and confirmed, the fix was easy:

 

  1. Run a major compaction on all tables
    • All tables ended up consisting of one base directory (used and known by HMS) and a few delta directories ignored by HMS.
  2. Delete all remaining deltas
    • They were not used anyway.
  3. Replay my data
    • Hive was up to date again.

Of course, my tables are partitioned and compaction happens per partition so there was a bit of bash-fu to explicitly compact all partitions, then double check that all remaining deltas were created after the backup. Nothing too complex.

This worked because no compactions ran between restore and backup. A compaction would completely change the files on disk, and the HMS metadata would thus be completely out of sync with the actual files. I have no idea how I would have recovered that one.

On a side note, Hive hanged a few times with this error message in the log:

insert into “NOTIFICATION_LOG
” (“NL_ID”, “EVENT_ID”, “EVENT_TIME”, “EVENT_TYPE”, “DB_NAME”, “TBL_NAME”, “MESSAGE”, “MESSAGE_FORMAT”) values(774946,774869,1543829236,’OPEN_TXN’,’null’,’ ‘,'{“txnIds”:null,”timestamp”:1543829236,”fromTxnId”
:54788,”toTxnId”:54788,”server”:”thrift://x.x.x.x:9083,thrift://y.y.y.y:9083″,”servicePrincipal”:”hive/_HOST@EXAMPLE.COM”}’,’json-0.2′)

metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) – MetaException(message:Unable to execute direct SQL java.sql.SQLInte
grityConstraintViolationException: Duplicate entry ‘774946’ for key ‘PRIMARY’

The fix was quite easy. Log into the metastore DB, remove the offending row from NOTIFICATION_LOG, and update the sequence (NOTIFICICATION_LOG_SEQUENCE table) to the maximum value of NL_ID in NOTIFICATION_LOG + 1. The NOTIFCATION_LOG table has something to do with compactions, and as you compacted all tables anyway old values do not matter much.

What is the takeaway? Compactions are run dynamically by Hive. It would be smart to keep an eye on them, and when some happened to run a backup just after. At least you would still be able to partially recover consistent recent enough data.

Advertisements