This tutorial is based on my own experience installing CDH 5.4 via Cloudera Manager on a single machine, for test purposes only. I used a Mint machine (based on Ubuntu/Debian), so commands are given with apt-get. You can probably just replace apt-get with yum if you are trying to do this on a Red Hat-based server.
Preparation
ssh
Install ssh server on your machine:
apt-get install openssh-server
Make sure you can connect as root if you do not want everything to run under a single user; the installer asks about this on screen 3. Running everything under one user is nice for a one-machine test, but I believe you might run into issues if you later want to extend your cluster. For this reason I chose the normal, multi-user (hdfs, hadoop and so on) installation. Cloudera actually gives a warning for the single-user installation:
The major benefit of this option is that the Agent does not run as root. However, this mode complicates installation, which is described fully in the documentation. Most notably, directories which in the regular mode are created automatically by the Agent, must be created manually on every host with appropriate permissions, and sudo (or equivalent) access must be set up for the configured user.
On my machine, for instance, I needed to update /etc/ssh/sshd_config to contain the line:
PermitRootLogin yes
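If you prefer not to edit the file by hand, the change can be scripted. This is only a minimal sketch: the function name set_permit_root_login is my own, it assumes GNU sed (for -i and -E), and it rewrites any existing PermitRootLogin line, commented out or not.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Cloudera tooling): make sure the given
# sshd config file contains "PermitRootLogin yes".
set_permit_root_login() {
    config_file="$1"
    if grep -Eq '^[#[:space:]]*PermitRootLogin[[:space:]]' "$config_file"; then
        # A directive already exists (possibly commented out): rewrite it in place.
        sed -i -E 's/^[#[:space:]]*PermitRootLogin[[:space:]].*$/PermitRootLogin yes/' "$config_file"
    else
        # No directive at all: append one at the end of the file.
        printf 'PermitRootLogin yes\n' >> "$config_file"
    fi
}
```

Run as root, you would call it as set_permit_root_login /etc/ssh/sshd_config, then restart the ssh server (service ssh restart on Debian-based systems) so the change takes effect.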
Other packages
For the heartbeat, you need supervisor and the ntpdc command (provided by the ntp package):
apt-get install supervisor ntp
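To double check that both tools ended up on the PATH, something like the following can help. It is a sketch only: missing_commands is a name I made up, and I am assuming that the supervisor package provides a binary called supervisord.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list
# that cannot be found on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Prints nothing if both tools are installed.
# "supervisord" is an assumption about the binary name shipped by supervisor.
missing_commands supervisord ntpdc
```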
Supported platforms
Officially, Cloudera supports only some versions of Debian and Ubuntu. If you use a derivative, things might work (YMMV), but the installer will refuse to run. You can fool the installer by changing the lsb-release file:
sudo mv /etc/lsb-release /etc/lsb-release.orig
sudo ln -s /etc/upstream-release/lsb-release /etc/lsb-release
# After installation you can revert with:
sudo rm /etc/lsb-release
sudo mv /etc/lsb-release.orig /etc/lsb-release
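To see which distribution the installer is likely to detect, you can read the DISTRIB_ID entry of the file. A small sketch, assuming the file uses the usual KEY=value layout; distrib_id is a name I chose for the example.

```shell
#!/bin/sh
# Hypothetical helper: print the DISTRIB_ID value of an lsb-release style file.
# Assumes one KEY=value pair per line; surrounding double quotes are stripped.
distrib_id() {
    sed -n 's/^DISTRIB_ID=//p' "$1" | tr -d '"'
}
```

Calling distrib_id /etc/lsb-release before and after the swap should show the derivative's name first and the upstream name afterwards.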
Installation
Follow the documentation from Cloudera:
wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
chmod u+x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin
Note that it will install the Oracle JDK (1.7 for CDH 5.4.0) and PostgreSQL. At the end, your browser should open and connect you to http://localhost:7180. Do not panic if the connection cannot be established at first: the servers need time to start up properly, and on a machine that is not very powerful this can take 2 minutes. Try again in a minute or two. The username and password there are admin/admin.
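Instead of reloading the page by hand, you can poll the port until the server answers. A minimal sketch, assuming curl is installed; wait_for_url and its three parameters are my own invention.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers or we run out of attempts.
#   $1 = URL, $2 = number of attempts, $3 = seconds to sleep between attempts
wait_for_url() {
    url="$1"
    attempts="$2"
    delay="$3"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # --fail makes curl return non-zero on HTTP errors (4xx/5xx);
        # connection failures return non-zero as well.
        if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, wait_for_url http://localhost:7180 30 10 && echo "Cloudera Manager is up" keeps retrying every 10 seconds for up to 30 attempts.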
Problems/Tips
IP address
Click ‘Continue’ a few times, and you will be asked to enter an IP address. As you are only testing on your machine, type yours, which you can find via hostname -I in your terminal. Make sure to use your real IP, not 127.0.0.1. The reason is that if you later extend your cluster with another node, and this node number 2 (n2) wants to access node number 1 (n1), it would try to access n1 via 127.0.0.1, which would of course point to n2 itself. This is good practice in general. As a host is added to Cloudera Manager as soon as it heartbeats, a partial installation might make a ghost host (localhost) appear in ‘Currently Managed Host’. In that case, make sure it is not selected before carrying on.
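If the machine has several addresses, a small filter can pick the first one that is not a loopback address. This is a sketch; first_non_loopback is a made-up name. On Debian-based systems hostname -I should already leave out the loopback interface, so the check is mostly a guard for addresses typed by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the first address from the argument list that is
# not a loopback address; return non-zero if every address is loopback.
first_non_loopback() {
    for addr in "$@"; do
        case "$addr" in
            127.*|::1) continue ;;                      # skip IPv4/IPv6 loopback
            *) printf '%s\n' "$addr"; return 0 ;;
        esac
    done
    return 1
}
```

Typical use would be first_non_loopback $(hostname -I), left unquoted on purpose so that each address becomes its own argument.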
Acquiring installation lock
If you are blocked on ‘Acquiring installation lock’, click ‘Abort’, then:
rm -rf /tmp/scm_prepare*
rm -f /tmp/.scm_prepare_node.lock
# if above is not enough:
service cloudera-scm-agent restart
service cloudera-scm-server-db restart
service cloudera-scm-server restart
and ‘retry failed host’
Full restart
If, like me, you screwed everything up, you can always uninstall everything (make sure to say yes when asked to delete the database files). Cloudera explains (parts of) what to do, but the violent and complete way is as follows, to be run as root:
/usr/share/cmf/uninstall-cloudera-manager.sh
# kill any PID listed by this ps below:
ps aux | grep cloudera
# this command does it automatically
kill $(ps ax --format pid,command | grep cloudera | sed -r 's/^\s*([0-9]+).*$/\1/')
# purge all cloudera packages
apt-get purge cloudera-manager-server-db-2 cloudera-manager-server cloudera-manager-daemons cloudera-manager-agent
# I am not so sure when this one is installed or not:
apt-get purge cloudera-manager-repository
# your choice, would clean up orphaned packages (postgres)
apt-get autoremove
# purge all droppings
rm -rf /etc/cloudera*
rm -rf /tmp/scm_prepare*
rm -f /tmp/.scm_prepare_node.lock
rm -rf /var/lib/cloudera*
rm -rf /var/log/cloudera*
rm -rf /usr/share/cmf
rm -rf /var/cache/yum/cloudera*
rm -rf /usr/lib/cmf
Could not connect to host monitor
After everything has completed successfully, you go back to the home page and see a lot of sad empty graphs with ‘query error’. This means that the management services are not running.
You can easily fix this by clicking ‘Add Cloudera Management Service’ at the top left, and following the wizard from there.