Hive logs are very verbose, and I personally find it a pain to wade through them when I try to understand which queries my ETL tool decided to generate.
To help with this, I created this small python script which looks at hive logs files and output the SQL queries and only the queries, with some information about them if known: time started, duration, success.
Usage:
./hqe.py --help usage: hqe.py [-h] [--since SINCE] [--to TO] [--logdir LOGDIR] [--glob LOGFILE_GLOB] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}] Displays queries ran on Hive. optional arguments: -h, --help show this help message and exit --since SINCE how far to look back. (default: 15m) --to TO How far to look forward. (default: now) --logdir LOGDIR Directory of hive log files. (default: /var/log/hive) --glob LOGFILE_GLOB Shell pattern of hive logfiles inside their logdir. (default: hiveserver2.log*) --loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}, -l {DEBUG,INFO,WARNING,ERROR,CRITICAL} Log level. (default: warn)
Sample output:
Started at 2017-06-22 05:30:58 for 12.788000s by hive on ip-10-0-0-10.eu-west-2.compute.internal (Probably success). (Thread id: 79733, query id: hive_20170622053058_676612af-7bb8-4c4b-8fce-51bd1ae7be71, txn id: 0): SELECT id, count(*) FROM raw.event GROUP BY 1 ORDER BY -- required for next step sys_partition Started at 2017-06-22 05:31:25 for 0.018000s by Unknown on Unknown (Probably success). (Thread id: 79770, query id: hive_20170622053125_7d8e644a-5c23-4ca8-ab0f-20becdd65c3b, txn id: Unknown): use events Started at 2017-06-22 05:31:25 for Unknowns by Unknown on Unknown (FAILED). (Thread id: handler-46, query id: Unknown, txn id: Unknown): MERGE INTO mart.click dst USING ( SELECT [big sql...] ) as r FROM raw.click WHERE ${SEQ_CHECKER_SQL} ) src ON [big sql...] WHEN NOT MATCHED THEN INSERT VALUES ( [more sql] ) Error: ParseException line 36:4 cannot recognize input near '$' '{' 'SEQ_CHECKER_SQL' in expression specification
As you can see:
- If user, hostname and duration are know they are displayed,
- query is displayed with the same formatting as it was sent, inclusive comments,
- error (if any) is showed. In my case, a variable is not expanded by the ETL tool.
You can find the source on github.