Hive: add a column pitfalls

Adding a column to an existing table is easy:


ALTER TABLE tbl ADD COLUMNS (new_col TIMESTAMP)

Easy right? Not always.

As the doc says,

The column change command will only modify Hive’s metadata, and will not modify data. Users should make sure the actual data layout of the table/partition conforms with the metadata definition.

What this means is that this command will change the table metadata, but not the partition metadata and this column will appear as NULL in select queries.

The solution is then easy, just add the CASCADE keyword:


ALTER TABLE tbl ADD COLUMNS (new_col TIMESTAMP) CASCADE

Then partitions will be updated as well.

Easy right? Not always.

If you run this command, with CASCADE, on a table without partition, you will end up with this non-descriptive error:

Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=40000)

In short:

  • if you have a partitioned table you must use CASCADE.
  • if you do not have partitions, you must not use CASCADE.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s