Long story short: the location of a hive managed table is just metadata, if you update it hive will not find its data anymore. You do need to physically move the data on hdfs yourself.
Short story long:
You can decide where on hdfs you put the data of a table, for a managed table:
create table if not exists tstloc (id bigint) clustered by (id) into 4 buckets stored as orc location 'hdfs:///tmp/ttslocorig' tblproperties ("transactional"="true"); insert into tstloc values(1); select * from tstloc;
Now if you want to move this table to another location for any reason, you might run the following statement:
alter table tstloc set location 'hdfs:///tmp/ttslocnew';
But then the table is empty!
select * from tstloc;
will return an empty set. The reason is that the location property is only metadata, telling hive where to look without any effect on said location (except at creation time, where the location will be created if it does not exist for managed tables). If nothing happens to be there, hive will not return anything. Conversely, if it happens to be something, hive will return this something.
To get your data back, you just need to physically move the data on hdfs at the expected location:
hdfs dfs -mv /tmp/ttslocorig /tmp/ttslocnew
For partitioned tables it’s more involved. You also need to relocate every partition to point at the new folder structure, i.e.
alter table tstloc partition () set location ‘hdfs:///tmp/ttslocnew/’
… and so on for each partition
A massive pain if you have many partitions but you can build a script to generate the alter table statements from metadata if you have access to it (sys.tbls, sys.partitions)
Cheers,
Dave.
woohoo thanks Dave! Pulling my hair out over this one.
Yep, a pain – my table has 72 [nested] partitions, but this worked!