Hive and integer overflows

I was playing around in Hive (3, from HDP) with big integers when I noticed something weird:

select 9223372036854775807 * 9223372036854775807;
-- 1

select 9223372036854775807 + 1;
-- -9223372036854775808

It turns out that Hive silently overflows integers. This comes from java, which does the same.

It’s lucky I noticed, it could have been very painful for me down the line. The workaround is to use big decimal. As the doc says:

Integral literals larger than BIGINT must be handled with Decimal(38,0). The Postfix BD is required.

For instance:

select 9223372036854775807BD * 9223372036854775807BD;
-- 85070591730234615847396907784232501249

select 9223372036854775807BD + 1;
-- 9223372036854775808

But it is still somewhat weird. Overflows with big decimals won’t error out but will return null:

select 9223372036854775807BD
  * 9223372036854775807BD
  * 9223372036854775807BD
;
-- NULL

Furthermore, if the precision is not 0 some behaviours are not consistent:

create temporary table dec_precision(
  d decimal(38, 18)
);
insert into dec_precision values
    (98765432109876543210.12345)
  , (98765432109876543210.12345)
;

select sum(d) from dec_precision;
-- NULL (but why?)
select sum(98765432109876543210.12345BD) from dec_precision;
-- 197530864219753086420.24690 (as expected)
select 98765432109876543210.12345BD + 98765432109876543210.12345BD;
-- 197530864219753086420.24690 (as expected)

Conversely, Mysql

select 9223372036854775807 * 9223372036854775807;
-- ERROR 1690 (22003): BIGINT value is out of range in '9223372036854775807 * 9223372036854775807'

or Postgres

select 2147483647 * 2147483647;
ERROR:  integer out of range

are a lot safer and friendlier in that regard.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s