Variables in Hive

Here I explain how to set and use variables in Hive.

How to set a variable

Just use the set keyword:

set foo=bar;
set system:foo=bar;

Alternatively, for the hiveconf namespace you can set the variable on the command line:

beeline --hiveconf foo=bar

How to use a variable

Wherever you want to use a value, use this syntax instead: ${namespace:variable_name}. If the namespace is hivevar, it can be omitted. For instance:

select '${hiveconf:foo}', '${system:foo}', '${env:CLASSPATH}', ${bar};

Note that variables will be replaced before anything else happens. This means that this is perfectly valid:

set hivevar:t=employees;
set hivevar:verb=desc;
${verb} ${t};

But this will not do what you expect (hint: you will end up with 4 quotes in your select statement):

set hivevar:s='Hello world';
select '${s}';

 

Furthermore, it means that you need to take care of your data types. Selecting a bare, unquoted string is not valid, so the following code is invalid as well:

set hivevar:v=astring;
select ${v};

You will get:

Error: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'astring': (possible column names are: ) (state=42000,code=10004)

In this case, you just need to quote the variable reference, i.e. select '${v}';.

Note that the unquoted form would work as-is with an int value, as a bare int is valid in a select statement.
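The examples above can be mimicked outside Hive with a toy model of purely textual substitution. This is only a sketch to make the "replace first, parse later" behaviour visible: substitute is a hypothetical shell helper, not something Hive provides, and it says nothing about how Hive implements the replacement internally.

```shell
# Toy model only: Hive performs its own substitution, this script merely imitates it.
# substitute NAME VALUE STATEMENT replaces every literal ${NAME} in STATEMENT with VALUE.
substitute() {
  awk -v tok="\${$1}" -v val="$2" -v stmt="$3" 'BEGIN {
    out = ""
    # Copy text up to each token, append the value, continue after the token.
    while ((i = index(stmt, tok)) > 0) {
      out = out substr(stmt, 1, i - 1) val
      stmt = substr(stmt, i + length(tok))
    }
    print out stmt
  }'
}

# Variables can stand for keywords and table names alike.
substitute t employees "$(substitute verb desc '${verb} ${t};')"
# prints: desc employees;

# The value already carries quotes, so the statement ends up with four.
substitute s "'Hello world'" "select '\${s}';"
# prints: select ''Hello world'';

# An unquoted string value becomes a bare identifier, hence the column error.
substitute v astring 'select ${v};'
# prints: select astring;

# Quoting the reference yields a string literal instead.
substitute v astring "select '\${v}';"
# prints: select 'astring';
```

Seeing the statement as the parser would receive it makes it easier to decide where the quotes belong: in the value or around the reference, but not both.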

Another caveat is to make sure the variable exists. Otherwise, for a quoted variable, you will get the variable reference back as a literal:

select '${donotexists}';
+-----------------+--+
|       _c0       |
+-----------------+--+
| ${donotexists}  |
+-----------------+--+

For an unquoted variable, you will get an unhelpful error message:

> select ${doesnotexist};
Error: Error while compiling statement: FAILED: ParseException line 1:7 cannot recognize input near '$' '{' 'hiveconf' in select clause (state=42000,code=40000)
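Since a missing variable only shows up at run time, it can help to list the references a script contains before running it. A minimal sketch, assuming your statements live in a file: list_var_refs is a hypothetical helper name and the script content below is made up for the demonstration. It only lists the ${...} references; comparing them with what you actually set is left to you.

```shell
# Hypothetical helper: print each distinct ${...} reference found in a HiveQL file.
list_var_refs() {
  grep -o '\${[^}]*}' "$1" | LC_ALL=C sort -u
}

# Made-up script used only for this demonstration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
select '${hiveconf:foo}', '${donotexists}';
${verb} ${t};
EOF

list_var_refs "$tmp"
# prints:
# ${donotexists}
# ${hiveconf:foo}
# ${t}
# ${verb}

rm -f "$tmp"
```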

If you only want to see the value of a variable, you can just use set as well:

set hiveconf:foo;

How to list variables

Just run SET; on its own. This outputs a massive, hard-to-read list, so you are better off redirecting it to a file, e.g.

beeline -e 'SET;' | sed 's/\s\+/ /g' > set.out

Note that I squash the spaces here. As the columns are aligned and some values are very long strings, squashing makes reading much easier.
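To see what the squashing does, here is a small sketch on a single made-up line shaped like a padded row of output. The squash function is a hypothetical name; it uses POSIX bracket classes instead of \s and \+, which are GNU extensions that some other sed implementations may not accept.

```shell
# Hypothetical helper: collapse every run of whitespace into a single space.
squash() {
  sed 's/[[:space:]]\{1,\}/ /g'
}

# Made-up sample line, padded the way aligned columns would be.
line='| env:HOME=/home/hive                    |'

printf '%s\n' "$line" | squash
# prints: | env:HOME=/home/hive |
```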

Then if you want to see a specific set of variables, you can just run:

# system variables
grep '| system:' set.out

# Env variables
grep '| env:' set.out

# other variables
grep -v -e '| env:' -e '| system:' set.out
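If you run these filters often, they can be wrapped in one small function. This is only a sketch: show_ns is a hypothetical name, and the sample lines are made up to resemble a squashed set.out rather than copied from a real server.

```shell
# Hypothetical helper: show the lines of a squashed dump for one namespace,
# or everything outside env: and system: when called with "other".
show_ns() {
  case "$1" in
    other) grep -v -e '| env:' -e '| system:' "$2" ;;
    *)     grep "| $1:" "$2" ;;
  esac
}

# Made-up sample standing in for set.out.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
| env:HOME=/home/hive |
| system:user.name=hive |
| foo=bar |
EOF

show_ns env "$tmp"
# prints: | env:HOME=/home/hive |

show_ns other "$tmp"
# prints: | foo=bar |

rm -f "$tmp"
```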

Namespaces

Hive has 4 namespaces for variables: hivevar, hiveconf, system and env.

hivevar

The hivevar namespace is the easiest to use, as you do not need to mention it explicitly when using a variable.

set hivevar:foo=bar;
select "${foo}";

hiveconf

The hiveconf namespace is used when you call set without an explicit namespace, or when you pass a variable on the command line with --hiveconf foo=bar. Note that you can set these variables without specifying the namespace, but you always need to specify it when using them.

set foo=bar;
select "${hiveconf:foo}";

env

This is the namespace of the shell environment variables. You can read them with the env: prefix, i.e. ${env:NAME}:

SELECT "${env:hostname}";

I chose this variable on purpose. If you run this query yourself, you will see that it is the environment of the Hive server that is used, not the environment of your client. This greatly limits the usefulness of environment variables.

Note that environment variables cannot be set from within Hive.
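Because ${env:...} is resolved on the server, a client-side value has to travel another way. One possibility, sketched here under assumptions, is to pass it with the --hiveconf flag shown earlier. The helper name client_host_arg and the variable name client_host are made up for illustration, and your server may restrict which settings a client is allowed to pass. The query is only attempted where beeline is installed.

```shell
# Hypothetical helper: build the name=value pair carrying the client host name.
client_host_arg() {
  printf 'client_host=%s' "$(uname -n)"
}

# Only attempt the query when beeline is actually available on this machine.
if command -v beeline >/dev/null 2>&1; then
  beeline --hiveconf "$(client_host_arg)" \
          -e "select '\${hiveconf:client_host}';"
fi
```

The dollar sign is escaped so that the local shell leaves the reference alone and Hive receives ${hiveconf:client_host} intact.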

system

The system namespace exposes the Java system properties of the Hive process: for instance JVM settings, log file destinations and more.

 
