Hive and ODBC confusions

Hive has no official ODBC drivers since version 3 at least. All Hadoop distributions (and Microsoft) distribute the ODBC driver from Simba. It works OK if you can use native queries (the driver passes the query as-is to Hive) or if your query is simple. Otherwise, the driver tries to be smart and fails miserably.

As I am neither a customer of Simba nor of Hortonworks, I cannot send a bug report. I asked on the Hortonworks community, but I feel quite isolated. I will share here a few of my experiences, and hopefully a good soul might pop by and tell me what I am doing wrong (or join me in whinging about this driver).

I should note that I cannot use native queries because I need to use parametrised statements, which are not available with native queries.

Parse Error

Syntax or semantic analysis error thrown in server while executing query. Error message from server: Error while compiling statement: FAILED: ParseException line 25:29 cannot recognize input near '?' 'and' 's' in expression specification

You will get that one a lot. Basically, on any error, this is what you will get, with the reported position being that of your first question mark. I thought for a long time that the driver was completely borked, but actually no (just majorly, not completely). If you enable logging (LogLevel=4 and e.g. LogPath=/tmp/hivelogs) in your odbcinst.ini, you will be able to see the inner error, which is a lot more informative.

unix_timestamp

Any query using unix_timestamp will give you

unix_timestamp is not a valid scalar function or procedure call

My guess is that the driver confuses it with unix_timestamp() called with no parameters, which is deprecated. As a workaround, you can cast your date as bigint, which works the same. I was proud of this workaround, but see below (Cast) for the issues it causes.

CTE

Along with the analytic functions, they are the best thing in SQL. The driver does not support them:

  syntax error near 'with<<< ??? >>> init as (select ? as lic, ? as cpg) select * from init'.

The solution is, of course, to use subqueries instead.
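The rewrite is mechanical: the CTE body moves into the FROM clause as an aliased subquery (Hive insists on the alias), and the ? placeholders keep their order. As a rough sketch, the snippet below runs both forms of the query from the error message through Python's built-in sqlite3 module. SQLite is only a stand-in to get something runnable (it is neither Hive nor the Simba driver), and the parameter values are made up.

```python
import sqlite3

# The CTE form that the ODBC driver rejects (query text from the error above).
CTE_QUERY = "with init as (select ? as lic, ? as cpg) select * from init"

# Same query with the CTE inlined as an aliased subquery in the FROM clause.
SUBQUERY_QUERY = "select * from (select ? as lic, ? as cpg) init"

# Hypothetical parameter values, for illustration only.
params = ("ABC-123", 42)

# sqlite3 is a stand-in engine here; it is not Hive.
conn = sqlite3.connect(":memory:")
cte_rows = conn.execute(CTE_QUERY, params).fetchall()
subquery_rows = conn.execute(SUBQUERY_QUERY, params).fetchall()
conn.close()

# Both forms should return the same single row.
print(cte_rows == subquery_rows)
```

With several CTEs that reference each other, the same inlining has to be repeated from the innermost one outwards, which gets ugly quickly.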

‘floor’ is a reserved keyword.

Yes, I agree that it’s reserved, but because it’s an actual function. I should not get this error when I am using e.g. floor(42) in a query.

This one surprises me because a simple query (select floor(42)) will succeed, whereas the same expression used in a more complex query will fail. I see from the logs that the driver raises the error in both cases but is somehow able to recover for simple queries, not for complex ones.

Cast does not only return strings

Casting dates to bigint and taking the difference:

select cast(cast('2019-01-21 01:32:32' as timestamp) as bigint) - cast(cast('2019-02-21 01:32:32' as timestamp) as bigint) as tto

fails as well:

Operand types SQL_WCHAR and SQL_WCHAR are incompatible for the binary minus operator

Same as for floor, in some cases the driver recovers, sometimes not.
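As a sanity check for the cases where the query does go through, the expected value is easy to work out by hand: the two timestamps are exactly 31 days apart, and as far as I know a Hive timestamp cast to bigint is a count of seconds. The sketch below does the same arithmetic in plain Python; it assumes no daylight-saving change falls between the two dates in the session time zone.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# The two timestamp literals from the query above.
first = datetime.strptime("2019-01-21 01:32:32", FMT)
second = datetime.strptime("2019-02-21 01:32:32", FMT)

# Difference in whole seconds, first minus second, as in the query.
tto = int((first - second).total_seconds())

print(tto)  # -2678400, i.e. minus 31 days of 86400 seconds
```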

 
