Hive logs are very verbose, and I personally find it a pain to wade through them when I try to understand which queries my ETL tool decided to generate.
To help with this, I created this small python script which looks at hive logs files and output the SQL queries and only the queries, with some information about them if known: time started, duration, success.
Usage:
./hqe.py --help
usage: hqe.py [-h] [--since SINCE] [--to TO] [--logdir LOGDIR]
[--glob LOGFILE_GLOB]
[--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
Displays queries ran on Hive.
optional arguments:
-h, --help show this help message and exit
--since SINCE how far to look back. (default: 15m)
--to TO How far to look forward. (default: now)
--logdir LOGDIR Directory of hive log files. (default: /var/log/hive)
--glob LOGFILE_GLOB Shell pattern of hive logfiles inside their logdir.
(default: hiveserver2.log*)
--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}, -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}
Log level. (default: warn)
Sample output:
Started at 2017-06-22 05:30:58 for 12.788000s by hive on ip-10-0-0-10.eu-west-2.compute.internal (Probably success). (Thread id: 79733, query id: hive_20170622053058_676612af-7bb8-4c4b-8fce-51bd1ae7be71, txn id: 0):
SELECT
id,
count(*)
FROM
raw.event
GROUP BY
1
ORDER BY -- required for next step
sys_partition
Started at 2017-06-22 05:31:25 for 0.018000s by Unknown on Unknown (Probably success). (Thread id: 79770, query id: hive_20170622053125_7d8e644a-5c23-4ca8-ab0f-20becdd65c3b, txn id: Unknown):
use events
Started at 2017-06-22 05:31:25 for Unknowns by Unknown on Unknown (FAILED). (Thread id: handler-46, query id: Unknown, txn id: Unknown):
MERGE INTO mart.click dst
USING (
SELECT
[big sql...]
) as r
FROM
raw.click
WHERE
${SEQ_CHECKER_SQL}
) src
ON
[big sql...]
WHEN NOT MATCHED THEN INSERT VALUES (
[more sql]
)
Error: ParseException line 36:4 cannot recognize input near '$' '{' 'SEQ_CHECKER_SQL' in expression specification
As you can see:
- If user, hostname and duration are know they are displayed,
- query is displayed with the same formatting as it was sent, inclusive comments,
- error (if any) is showed. In my case, a variable is not expanded by the ETL tool.
You can find the source on github.
Loved reading thiss thanks