Easy management of Hadoop Clusters in Azure with Cloudbreak
Have you ever thought about automating the deployment of Hadoop clusters to the cloud? We did!
Due to a new project we needed to deploy a new HDP 2.3 (Hortonworks Data Platform) environment to Microsoft Azure, so we decided to work with Cloudbreak.
Cloudbreak provides an easy-to-use web UI that lets you create an HDP cluster based on an Ambari blueprint and deploy it to the major cloud providers (Microsoft, Amazon, Google, OpenStack).
It also provides an auto-scaling feature that automatically sizes your cluster based on usage or time metrics, so you can use your cluster very efficiently.
The company behind Cloudbreak, SequenceIQ (recently acquired by Hortonworks), hosts an open Cloudbreak-deployer portal for everyone.
We set up our own Cloudbreak server in Azure.
So let's look at the installation process step by step:
You will need an X.509 certificate later on for the Azure deployment, and you can use the same certificate for the Cloudbreak host. So I used an existing Linux VM to create a certificate with OpenSSL:
- openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout my_azure_private.key -out my_azure_cert.pem
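If you want to script this step, the certificate can also be created without the interactive prompts and checked afterwards. The following is only a sketch: the function names make_cert and show_cert and the common name you pass in are made up for illustration, and the -subj option simply supplies the subject that openssl would otherwise ask for.

```shell
# Sketch: create the key pair non-interactively, then inspect the certificate.
# make_cert and show_cert are illustrative helper names, not part of OpenSSL.

make_cert() {
  # $1 = private key path, $2 = certificate path, $3 = certificate common name
  openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -subj "/CN=$3" -keyout "$1" -out "$2" &&
  chmod 600 "$1"   # keep the private key readable by its owner only
}

show_cert() {
  # Print the subject and the validity period of the certificate in $1
  openssl x509 -in "$1" -noout -subject -dates
}
```

For example, make_cert my_azure_private.key my_azure_cert.pem cloudbreak followed by show_cert my_azure_cert.pem should print the subject and the notBefore/notAfter dates of the new certificate.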
I deployed a CentOS 7.1 machine from the Azure Marketplace as the host for our Cloudbreak server.
For SSH authentication I used the X.509 certificate I just created.
After deployment you can connect to the machine over SSH.
- putty user@cloudbreak_host_url -i generated_cert.ppk
If you use PuTTY, you need to convert the private key with PuTTYgen first, because PuTTY does not work with the generated key file. (Conversions -> Import key -> Save private key)
Now we need to install Docker as it is required for Cloudbreak.
First we update yum and install Docker:
- sudo yum update
- curl -sSL https://get.docker.com/ | sh
After installation start the docker service:
- sudo service docker start
I had trouble starting the service and got it to work after installing docker-selinux:
- sudo yum install docker-selinux
Verify that Docker is running and make Docker start when the machine boots:
- sudo docker run hello-world
- sudo chkconfig docker on
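Right after the service is started, the Docker daemon may need a moment before it answers. A small retry helper can wait for it instead of failing on the first attempt. This is a sketch only; retry is a hypothetical helper written for this post, not a Docker command.

```shell
# Sketch: run a command repeatedly until it succeeds or the attempts run out.
# retry is a hypothetical helper, not part of Docker or CentOS.
retry() {
  # $1 = maximum attempts, $2 = seconds to wait between attempts, rest = command
  attempts="$1"; delay="$2"; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

Used as retry 10 3 sudo docker info, it would try up to ten times with three seconds in between before giving up.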
If they are not installed yet, install wget and unzip:
- sudo yum install wget
- sudo yum install unzip
Now create a directory for your Cloudbreak installation. Then download the Cloudbreak Deployer into this directory and unpack the archive:
- mkdir cloudbreak
- cd cloudbreak
- wget http://publicrepo1.hortonworks.com/HDP/cloudbreak/cloudbreakdeployer_1.0.0_Linux_x86_64.tgz
- tar xvf cloudbreakdeployer_1.0.0_Linux_x86_64.tgz
Next copy cbd to your bin folder and run cbd init from the cloudbreak directory:
- sudo cp cbd /usr/local/bin/
- cbd init
After initialization you can create a profile with your public server address and a custom user:
- echo export PUBLIC_IP=cloudbreakservername.cloudapp.net > Profile
- echo export UAA_DEFAULT_USER_EMAIL=E-Mail Address >> Profile
- echo export UAA_DEFAULT_USER_PW=Password >> Profile
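The three echo lines above can also be wrapped into one small function that quotes the values, so a password containing spaces does not break the file. This is a sketch under two assumptions: the Profile is read as shell syntax (as the export lines suggest), and none of the values contains a single quote. The function name write_profile is made up for this example.

```shell
# Sketch: write the Profile in one step with quoted values.
# write_profile is an illustrative helper; the variable names are the ones
# used in the echo commands above.
write_profile() {
  # $1 = target directory, $2 = public address, $3 = e-mail, $4 = password
  {
    printf "export PUBLIC_IP='%s'\n" "$2"
    printf "export UAA_DEFAULT_USER_EMAIL='%s'\n" "$3"
    printf "export UAA_DEFAULT_USER_PW='%s'\n" "$4"
  } > "$1/Profile" &&
  chmod 600 "$1/Profile"   # the file contains a password
}
```

From inside the cloudbreak directory this would be called as write_profile . cloudbreakservername.cloudapp.net with your e-mail address and password as the last two arguments.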
To finish the installation, generate cloudbreak-config and start the server:
- cbd generate
- cbd start
Work with Cloudbreak
After you have installed the Cloudbreak Deployer, visit the URL (http://yourcloudbreakurl:3000) and log in with the credentials you provided in the created profile.
Now you need to create credentials for your cloud provider.
To do that, use the certificate you generated during installation. Click on “create credentials”, open the created .pem file and copy its text into "SSH Certificate".
After successful creation you can download a certificate which is used to allow Cloudbreak to manage your Azure subscription.
The downloaded certificate can be uploaded to Azure: Settings –> Manage Certificates (The certificate name depends on the exporting Cloudbreak user.)
The next step is to create a blueprint. The blueprint defines which service will be installed on which node in our cluster by Ambari.
There are three predefined blueprints. We used the HDP-small-default (copy the JSON and edit it in an XML editor) and added Spark to it.
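Before pasting an edited blueprint back into Cloudbreak, it can help to check that the component you added is really in the file. The sketch below only searches for a "name" entry with a given value. The helper name has_component and the file name blueprint.json are made up, and the component names in the usage note (SPARK_CLIENT, SPARK_JOBHISTORYSERVER) are assumptions that you should compare with the names your HDP stack uses.

```shell
# Sketch: check whether a blueprint file contains a "name" entry with a value.
# has_component is an illustrative helper. It also matches host group names,
# because those use the same "name" key.
has_component() {
  # $1 = blueprint file, $2 = component name to look for
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$2\"" "$1"
}
```

For example: has_component blueprint.json SPARK_CLIENT && echo "Spark client found". The same check with SPARK_JOBHISTORYSERVER would cover the history server.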
Now create the resources that you want to use in your cluster.
For example, we created a datanode resource that we want to use for all datanodes.
After you have created all needed resources you can also create a custom network and security group. We used the default ones. Now select the created credential and create a cluster:
Set a cluster name, choose the created blueprint and map the created resources to the cluster groups. To change the default Ambari username and password you can use the advanced options.
Now “Create and Start” the cluster. (Make sure that the time on the Cloudbreak host is correct.) You will see the current state of the creation process in the event history. After successful creation you will see the Ambari IP and the current cluster state.
The cluster can now be managed from our Cloudbreak portal. You can also activate auto-scaling policies to make your cluster elastic.
To connect to the cluster, open an SSH connection to the client node. As the cluster runs in Docker, you have to connect to the ambari-agent Docker container. First switch to root:
- sudo su -
List the running Docker containers:
- docker ps (note the container ID of the Ambari agent)
Open a shell in the Ambari agent container:
- docker exec -it ContainerID bash
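Instead of copying the container ID by hand, you can let a small filter pick it out of the docker ps output. This is a sketch: container_id is a made-up helper, and the search text ambari-agent in the usage note is an assumption about how the container or image is named on your node.

```shell
# Sketch: read `docker ps` output on stdin and print the ID of the first
# container whose row contains the given text.
# container_id is an illustrative helper, not a Docker command.
container_id() {
  # $1 = text to look for in a row; the header row (line 1) is skipped
  awk -v pat="$1" 'NR > 1 && index($0, pat) { print $1; exit }'
}
```

It could then be combined with the command above: docker exec -it "$(docker ps | container_id ambari-agent)" bash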
If Cloudbreak thinks your cluster is in a bad state, you may have to help yourself by updating the Cloudbreak database:
- docker exec -it cbreak_cbdb_1 bash
- psql -U postgres
- > update stack set status='AVAILABLE' where name like 'cluster-name';
- > update cluster set status='AVAILABLE' where name like 'cluster-name';
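If you have to do this more than once, the statements can be generated for a given cluster name and piped into psql in one go. This is a sketch: reset_status_sql is a hypothetical helper, it assumes the cluster name contains no quote characters, and the extra select is only there to show the current status before it is overwritten.

```shell
# Sketch: print the SQL that shows and then resets the status of one cluster.
# reset_status_sql is an illustrative helper. The table and column names are
# the ones used in the update statements above.
reset_status_sql() {
  # $1 = cluster name as shown in Cloudbreak (assumed to contain no quotes)
  printf "select name, status from stack where name = '%s';\n" "$1"
  printf "update stack set status='AVAILABLE' where name = '%s';\n" "$1"
  printf "update cluster set status='AVAILABLE' where name = '%s';\n" "$1"
}
```

Usage: reset_status_sql cluster-name | docker exec -i cbreak_cbdb_1 psql -U postgres. Running reset_status_sql on its own first lets you read the statements before anything touches the database.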