Timezones are a pain. This is not new, and every time you deviate from UTC it will bite you. That said, sometimes you have to deviate from UTC, especially for the final display of a date if you want to show it in the reader's local timezone. In that case, adding an explicit offset will save some questions and uncertainty down the line.
There is no get_offset_from_tz() function in Hive, sadly. Using reflect() does not work either, as the method call is too complex for reflect(). Writing a UDF would be possible but feels like overkill.
The solution I give here works in Hive and should probably work in other SQL variants as well, apart from the variables.
The algorithm to find the offset is easy:
- get the time in UTC,
- get the same in another timezone,
- subtract one from the other to get the offset,
- format the offset in a standard way.
The main issue is that you cannot assign results to variables in SQL, meaning that many computations need to be duplicated. They will be optimised away, of course, but they make for ugly code.
In Hive, luckily, you can use variables. They cannot store results but are used as-is, a bit like macros: the variable name is simply replaced by its content, which can be a piece of code.
This sets up the date to find the offset for, as well as a few timezones for testing.
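To make the duplication concrete, here is a rough sketch of what the query looks like with every expression written out in full. It uses the same Hive functions as the rest of this post. The date and timezone literals are only sample values.

```sql
-- Sketch: the offset query with no variables at all.
-- The expression "date converted to the display timezone" appears four times,
-- and the subtraction that yields the offset appears three times.
select concat(
    -- date in TZ
    date_format(from_utc_timestamp('2018-06-01 01:02:02', 'Europe/Amsterdam'), 'yyyy-MM-dd HH:mm:ss')
    -- sign
  , if(cast(date_format(from_utc_timestamp('2018-06-01 01:02:02', 'Europe/Amsterdam'), 'yyyy-MM-dd HH:mm:ss') as timestamp)
         - cast('2018-06-01 01:02:02' as timestamp) < interval '0' minute, '-', '+')
    -- hour
  , lpad(abs(hour(
        cast(date_format(from_utc_timestamp('2018-06-01 01:02:02', 'Europe/Amsterdam'), 'yyyy-MM-dd HH:mm:ss') as timestamp)
         - cast('2018-06-01 01:02:02' as timestamp))), 2, '0')
  , ':'
    -- minute
  , lpad(abs(minute(
        cast(date_format(from_utc_timestamp('2018-06-01 01:02:02', 'Europe/Amsterdam'), 'yyyy-MM-dd HH:mm:ss') as timestamp)
         - cast('2018-06-01 01:02:02' as timestamp))), 2, '0')
) as dtwithoffset;
```

It works, but changing the date or the timezone means editing it in many places.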
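A minimal sketch of that macro-like behaviour, with a made-up variable name (shout) used only for illustration:

```sql
-- The variable holds a piece of code, not a computed value.
set hivevar:shout=upper('hello');

-- Hive replaces ${shout} with its text before running the statement,
-- so this is equivalent to: select upper('hello');
select ${shout};
```

Nothing is evaluated when the variable is set. The expression only runs as part of the query that references it.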
-- Date to display. If you use this from a table you can
-- put here the column that would be used, eg. t.logdate.
set hivevar:D='2018-06-01 01:02:02';

-- A few tests:
-- positive offset +02:00 (in summer)
set hivevar:DISPLAY_TZ='Europe/Amsterdam';
-- negative offset -04:00 (in summer)
set hivevar:DISPLAY_TZ='America/New_York';
-- 0 offset
set hivevar:DISPLAY_TZ='UTC';
-- Non integer offset: +09:30
set hivevar:DISPLAY_TZ='Australia/Adelaide';
These are the macros:
-- Date displayed in the right TZ
set hivevar:dateintz=DATE_FORMAT(FROM_UTC_TIMESTAMP(${D}, ${DISPLAY_TZ}),"yyyy-MM-dd HH:mm:ss");
-- Offset in interval type
set hivevar:delta=cast(${dateintz} as timestamp) - cast(${D} as timestamp);
And the code itself, tiny and readable once variables are used:
select concat(
    -- date in TZ
    ${dateintz}
    -- sign
  , if(${delta} < interval '0' minute, '-', '+')
    -- hour
  , lpad(abs(hour(${delta})), 2, 0)
  , ':'
    -- minute (abs() so that negative non-integer offsets, eg. -02:30, format correctly)
  , lpad(abs(minute(${delta})), 2, 0)
) as dtwithoffset
;
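To apply this to a table rather than a literal, point D at a column, following the t.logdate hint in the setup comments. The sketch below assumes a hypothetical table named logs whose logdate column holds UTC values. Both names are placeholders. The macros are defined after D so that they are certain to pick up the column.

```sql
-- Hypothetical table (logs) and column (logdate); adapt to your schema.
set hivevar:D=t.logdate;
set hivevar:DISPLAY_TZ='Europe/Amsterdam';

-- Same macros as before, defined after D and DISPLAY_TZ.
set hivevar:dateintz=DATE_FORMAT(FROM_UTC_TIMESTAMP(${D}, ${DISPLAY_TZ}),"yyyy-MM-dd HH:mm:ss");
set hivevar:delta=cast(${dateintz} as timestamp) - cast(${D} as timestamp);

-- One formatted date with offset per row.
select concat(
    ${dateintz}
  , if(${delta} < interval '0' minute, '-', '+')
  , lpad(abs(hour(${delta})), 2, '0')
  , ':'
  , lpad(abs(minute(${delta})), 2, '0')
) as dtwithoffset
from logs t;
```

Because the offset is computed per row, dates on either side of a daylight saving change should each get their own offset.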
Et voilà.