Automated Hadoop setup with Ambari

Hadoop is notoriously complex to configure. Cluster managers like Ambari help, of course, but finding the configuration sweet spot is still hard, especially as that spot differs per use case.

My cluster currently runs almost exclusively YARN/Tez, so I wrote this Python script, which looks at the whole cluster via the Ambari API and configures a good chunk of it, based on the number of disks, CPUs and the amount of RAM, following documentation I found scattered all around. It works well even on small clusters and tells you the reasoning behind each configuration value.
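To give an idea of what "looking at the whole cluster" means, here is a minimal sketch of the kind of Ambari REST call involved. The endpoint and fields are standard Ambari API; the function name and credentials are illustrative, not the script's actual internals:

import requests

AMBARI = "http://hadoop.example.com:8080/api/v1"
AUTH = ("admin", "admin")  # assumption: default Ambari credentials

def host_resources(cluster):
    """Return {hostname: {'cpu': ..., 'mem': ..., 'disks': ...}} for every host."""
    fields = "Hosts/cpu_count,Hosts/total_mem,Hosts/disk_info"
    r = requests.get(f"{AMBARI}/clusters/{cluster}/hosts",
                     params={"fields": fields}, auth=AUTH)
    r.raise_for_status()
    return {
        h["Hosts"]["host_name"]: {
            "cpu": h["Hosts"]["cpu_count"],
            "mem": h["Hosts"]["total_mem"] * 1024,  # Ambari reports kB
            "disks": len(h["Hosts"]["disk_info"]),
        }
        for h in r.json()["items"]
    }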

Two caveats: LLAP is not yet managed (an attempt was made, so the option exists, but it does nothing), and the script assumes that all datanodes are identical.

The default run mode is read-only, but you can ask the script to actually update Ambari. You will still need to restart the updated services yourself, on your own time (this is of course just one click in the Ambari UI).
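Updating a value through the Ambari API means posting a whole new version of the config type, not patching a single key. A hedged sketch of that flow, reusing AMBARI and AUTH from the snippet above (the function name and dry_run flag are illustrative; the script's real code may differ):

import time
import requests

def update_property(cluster, cfg_type, key, value, dry_run=True):
    # 1. Find the tag of the currently active version of this config type.
    r = requests.get(f"{AMBARI}/clusters/{cluster}",
                     params={"fields": "Clusters/desired_configs"}, auth=AUTH)
    tag = r.json()["Clusters"]["desired_configs"][cfg_type]["tag"]

    # 2. Fetch the full property set of that version and change one value.
    r = requests.get(f"{AMBARI}/clusters/{cluster}/configurations",
                     params={"type": cfg_type, "tag": tag}, auth=AUTH)
    props = r.json()["items"][0]["properties"]
    props[key] = str(value)

    if dry_run:  # read-only by default, like the script
        print(f"would set {cfg_type}/{key} = {value}")
        return

    # 3. Push the modified set back as a new version with a fresh tag.
    body = {"Clusters": {"desired_config": {
        "type": cfg_type,
        "tag": f"version{int(time.time())}",
        "properties": props,
    }}}
    requests.put(f"{AMBARI}/clusters/{cluster}", json=body, auth=AUTH,
                 headers={"X-Requested-By": "ambari"}).raise_for_status()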

An example run showing only what my script thinks is incorrect:

./settings.py --tofix hadoop.example.com

Basic info

Yarn config.
✘ yarn-site/yarn.scheduler.minimum-allocation-mb = 768, expects 1024 (Min container size.) #75%
✘ capacity-scheduler/yarn.scheduler.capacity.resource-calculator = org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, expects org.apache.hadoop.yarn.util.resource.DominantResourceCalculator (Take all resources into account, not only RAM) #0%

A full run:

./settings.py hadoop.example.com

Basic info
FYI – { 'ip-10-0-0-001.eu-west-1.compute.internal': {'cpu': 8, 'mem': 33566638080},
'ip-10-0-0-002.eu-west-1.compute.internal': {'cpu': 8, 'mem': 33566638080},
'ip-10-0-0-003.eu-west-1.compute.internal': {'cpu': 8, 'mem': 33566638080},
'ip-10-0-0-004.eu-west-1.compute.internal': {'cpu': 8, 'mem': 33566638080},
'ip-10-0-0-005.eu-west-1.compute.internal': {'cpu': 8, 'mem': 33566638080},
'ip-10-0-0-006.eu-west-1.compute.internal': {'cpu': 8, 'mem': 33566633984},
'ip-10-0-0-007.eu-west-1.compute.internal': {'cpu': 8, 'mem': 33566638080},
'ip-10-0-0-008.eu-west-1.compute.internal': {'cpu': 8, 'mem': 33566638080}}: Data nodes.
FYI – {'cpu': 64, 'disk': 104, 'mem': 268533100544}: Total cluster resources.
FYI – 1024: Min container size (MB), based on the amount of RAM/CPU in the cluster.
FYI – 128: Number of containers based on recommendations.
FYI – 0.55: Default queue capacity.

Yarn config.
✔ yarn-site/yarn.nodemanager.resource.memory-mb = 24008, expects 24008 (min(yarn memory for one DN) * 0.75.) #100%
✘ yarn-site/yarn.scheduler.minimum-allocation-mb = 768, expects 1024 (Min container size.) #75%
✔ yarn-site/yarn.scheduler.maximum-allocation-mb = 24008, expects 24008 (Same as yarn.nodemanager.resource.memory-mb) #100%
✔ yarn-site/yarn.nodemanager.resource.cpu-vcores = 7, expects 7 (Assuming the cluster is YARN only. Total cores per node -1) #100%
✔ yarn-site/yarn.scheduler.maximum-allocation-vcores = 7, expects 7 (Assuming the cluster is YARN only. Total cores per node -1) #100%
✘ capacity-scheduler/yarn.scheduler.capacity.resource-calculator = org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, expects org.apache.hadoop.yarn.util.resource.DominantResourceCalculator (Take all resources into account, not only RAM) #0%
Map/reduce config
✔ mapred-site/mapreduce.map.memory.mb = 1024, expects 1024 (Min container size) #100%
✔ mapred-site/mapreduce.reduce.memory.mb = 2048, expects 2048 (2 * min container size) #100%
✔ mapred-site/mapreduce.map.java.opts = 819, expects 819 (0.8 * min container size) #100%
✔ mapred-site/mapreduce.reduce.java.opts = 1638, expects 1638 (0.8 * mapreduce.reduce.memory.mb) #100%
✔ mapred-site/yarn.app.mapreduce.am.resource.mb = 2048, expects 2048 (2 * min container size) #100%
✔ mapred-site/yarn.app.mapreduce.am.command-opts = 1638, expects 1638 (0.8 * yarn.app.mapreduce.am.resource.mb) #100%
✔ mapred-site/mapreduce.task.io.sort.mb = 409, expects 409 (0.4 * min container size) #100%

Hive and Tez configuration
✔ hive-site/hive.execution.engine = tez, expects tez (Use Tez, not map/reduce.) #100%
✔ hive-site/hive.server2.enable.doAs = false, expects false (All queries will run as Hive user, allowing resource sharing/reuse.) #100%
✔ hive-site/hive.optimize.index.filter = true, expects true (This optimizes "select statement with where clause" on ORC tables) #100%
✔ hive-site/hive.fetch.task.conversion = more, expects more (This optimizes "select statement with limit clause;") #100%
✔ hive-site/hive.compute.query.using.stats = true, expects true (This optimizes "select count (1) from table;") #100%
✔ hive-site/hive.vectorized.execution.enabled = true, expects true (Perform operations in batch instead of single row) #100%
✔ hive-site/hive.vectorized.execution.reduce.enabled = true, expects true (Perform operations in batch instead of single row) #100%
✔ hive-site/hive.cbo.enable = true, expects true (Enable CBO. You still need to prepare it by using the ANALYZE HQL command.) #100%
✔ hive-site/hive.compute.query.using.stats = true, expects true (Use CBO.) #100%
✔ hive-site/hive.stats.fetch.column.stats = true, expects true (Use CBO.) #100%
✔ hive-site/hive.stats.fetch.partition.stats = true, expects true (Use CBO.) #100%
✔ hive-site/hive.stats.autogather = true, expects true (Use CBO.) #100%
✔ hive-site/hive.server2.tez.default.queues = default, expects 'lambda x: config.queue in x' (Must contain the queue name) #100%
✔ hive-site/hive.tez.dynamic.partition.pruning = true, expects true (Make sure tez can prune whole partitions) #100%
✔ hive-site/hive.exec.parallel = true, expects true (Allow Hive subqueries to be executed in parallel) #100%
✔ hive-site/hive.auto.convert.join = true, expects true (use map joins as much as possible) #100%
✔ hive-site/hive.auto.convert.join.noconditionaltask = true, expects true (Use map joins for small datasets) #100%
✔ hive-site/hive.tez.container.size = 4096, expects 4096 (Multiple of min container size.) #100%
✔ hive-site/hive.auto.convert.join.noconditionaltask.size = 1417339207, expects 1417339207 (Threshold to perform map join. 1/3 * hive.tez.container.size.) #100%
✔ hive-site/hive.vectorized.groupby.maxentries = 10240, expects 10240 (Reduces execution time on small datasets, but also OK for large ones.) #100%
✔ hive-site/hive.vectorized.groupby.flush.percent = 0.1, expects 0.1 (Reduces execution time on small datasets, but also OK for large ones.) #100%
✔ hive-site/hive.server2.tez.initialize.default.sessions = true, expects true (Enable tez use without session pool if requested) #100%
✔ hive-site/hive.server2.tez.sessions.per.default.queue = 3, expects 3 (Number of parallel executions inside one queue.) #100%

Hive and Tez memory
✔ tez-site/tez.am.resource.memory.mb = 1024, expects 1024 (Appmaster memory == min container size.) #100%
✔ tez-site/tez.am.container.reuse.enabled = true, expects true (Reuse tez containers to prevent reallocation.) #100%
✔ tez-site/tez.container.max.java.heap.fraction = 0.8, expects 0.8 (default % of memory used for java opts) #100%
✔ tez-site/tez.runtime.io.sort.mb = 1024, expects 1024 (memory when the output needs to be sorted. == 0.25 * tezContainerSize (up to 40%)) #100%
✔ tez-site/tez.runtime.unordered.output.buffer.size-mb = 307, expects 307 (Memory when the output does not need to be sorted. 0.075 * hive.tez.container.size (up to 10%).) #100%
✔ tez-site/tez.task.resource.memory.mb = 1024, expects 1024 (Mem to be used by launched tasks. == min container size. Overridden by Hive to hive.tez.container.size anyway.) #100%
✔ tez-site/tez.task.launch.cmd-opts = 819, expects 819 (xmx = 0.8 * minContainerSize) #100%
✔ hive-site/hive.tez.java.opts = 3276, expects 3276 (xmx = 0.8 * tezContainerSize) #100%
✔ hive-site/hive.prewarm.enabled = true, expects true (Enable prewarm to reduce latency) #100%
✔ hive-site/hive.prewarm.numcontainers = 3, expects 'lambda x: x >= 1' (Hold containers to reduce latency, >= 1) #100%
✔ tez-site/tez.session.am.dag.submit.timeout.secs = 300, expects 300 (How long the Tez Application Master waits for a DAG to be submitted before shutting down. Only relevant when reuse is enabled.) #100%
✔ tez-site/tez.am.container.idle.release-timeout-min.millis = 10000, expects 10000 (Tez container min wait before shutting down. Should give an app enough time to send the next query) #100%
✔ tez-site/tez.am.container.idle.release-timeout-max.millis = 20000, expects 20000 (Tez container max wait before shutting down) #100%
✔ tez-site/tez.am.view-acls = *, expects * (Enable Tez UI access) #100%
✔ yarn-site/yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes = org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl, expects org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl (Set up tez UI) #100%
✔ mapred-site/mapreduce.job.acl-view-job = *, expects * (Enable tez ui for mapred jobs) #100%

Compress all
✔ mapred-site/mapreduce.map.output.compress = true, expects true #100%
✔ mapred-site/mapreduce.output.fileoutputformat.compress = true, expects true #100%
✔ hive-site/hive.exec.compress.intermediate = true, expects true #100%
✔ hive-site/hive.exec.compress.output = true, expects true #100%

Queue configuration. Assuming the default queue is a subqueue of root. Note that undefined values are inherited from the parent.
✔ capacity-scheduler/yarn.scheduler.capacity.root.default.maximum-am-resource-percent = 0.2, expects 'lambda x: x != "NOT FOUND" and float(x) >= 0.2' (How much of the queue the AM can use. Must be at least 0.2.) #100%
✔ capacity-scheduler/yarn.scheduler.capacity.root.default.ordering-policy = fair, expects fair (Helps small queries get a chunk of time between big ones) #100%
✔ capacity-scheduler/yarn.scheduler.capacity.root.default.user-limit-factor = 2, expects 'lambda x: x != "NOT FOUND" and int(x) >= 1' (How much of the queue capacity the user can exceed if enough resources are available. Should be at least 1. 1=100%, 2=200%…) #100%
✔ capacity-scheduler/yarn.scheduler.capacity.root.default.minimum-user-limit-percent = 10, expects 'lambda x: x != "NOT FOUND" and int(x) >= 10' (How much of the queue, in percent, a user is guaranteed to get. Should be at least 10.) #100%

Random stuff
✔ hdfs-site/dfs.client.use.datanode.hostname = true, expects true (For AWS only) #100%

LLAP
✔ hive-interactive-env/enable_hive_interactive = false, expects false (Disable LLAP) #100%

More documentation can be found at:
Memory settings:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_command-line-installation/content/determine-hdp-memory-config.html
https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html
Hive performance tuning:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_hive-performance-tuning/content/ch_hive-perf-tuning-intro.html
http://pivotalhd.docs.pivotal.io/docs/performance-tuning-guide.html
https://www.justanalytics.com/blog/hive-tez-query-optimization
llap:
https://community.hortonworks.com/questions/84636/llap-not-using-io-cache.html

Will not update the unexpected parameters without --update.
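As a final note, the container arithmetic behind the "Basic info" FYI lines roughly follows the formula in the HDP memory configuration guide linked above. A quick sketch of the math, applied to one of the data nodes from the full run (this is my reading of the docs, not necessarily the script's exact code):

import math

def containers_per_node(cores, disks, ram_mb, min_container_mb):
    # HDP guideline: at most 2 containers per core, ~1.8 per disk,
    # and no more than the node's YARN memory allows.
    return min(2 * cores, math.ceil(1.8 * disks), ram_mb // min_container_mb)

# One data node from the run above: 8 cores, 13 disks (104 disks / 8 nodes)
# and 24008 MB of YARN memory (75% of 32 GB), with 1024 MB min containers.
print(containers_per_node(8, 13, 24008, 1024))  # -> 16, i.e. 128 on 8 nodes

This matches the "128: Number of containers" line reported by the script.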