Configuring a Hadoop Cluster Using Ansible
Today, we will discuss how you can configure a Hadoop cluster using Ansible.
So, let’s get started.
The Setup: I have two nodes, one of which is the controller node, and the other is the target node. Both are RHEL systems. I have configured local name resolution through the /etc/hosts file by listing the hostname and IP address of each node. The controller node is called master, and the target node is called node1.
I have also configured password-less SSH login from the controller node to all the target nodes. This keeps the inventory file simple, since no SSH passwords need to be stored in it.
The inventory: The playbook targets the [hadoop] group defined in the inventory file. The controller node, master, is set up as the Hadoop master, while the other node, node1, serves as the Hadoop slave.
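For reference, here is a minimal sketch of what such an inventory could look like, written in Ansible's YAML inventory format (the equivalent of an INI-style [hadoop] section). Placing both master and node1 in the group is an assumption based on the description above, not a copy of the original file.

```yaml
# inventory.yml: hypothetical file name
# Assumption: both hosts belong to the "hadoop" group and resolve via /etc/hosts.
all:
  children:
    hadoop:
      hosts:
        master:   # controller node, acts as the Hadoop master
        node1:    # target node, acts as the Hadoop slave
```

Because password-less SSH is already in place, no connection variables such as passwords are needed here. Running `ansible -i inventory.yml hadoop -m ping` is a quick way to check that both hosts are reachable.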
Assumptions: The playbook was written assuming that the controller node already has the RPM files jdk-8u171-linux-x64.rpm and hadoop-1.2.1-1.x86_64.rpm in the /root directory.
With the setup in place, the playbook performs the following steps.
Step 1: Create a directory where the software is to be installed.
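A minimal sketch of this step using the `file` module. The path in `software_dir` is a hypothetical choice for illustration; the original does not name the directory.

```yaml
- hosts: hadoop
  vars:
    software_dir: /opt/software   # hypothetical path, not from the original post
  tasks:
    - name: Create the directory that will hold the RPM files
      file:
        path: "{{ software_dir }}"
        state: directory
        mode: "0755"
```

With `state: directory`, the task is idempotent: it creates the directory only if it is missing.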
Step 2: Copy the RPM files into the directory created in Step 1.
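One way this could be written is with the `copy` module, which reads `src` from the controller and writes `dest` on each target. The file names and the /root source location come from the assumptions stated earlier; `software_dir` is a hypothetical path.

```yaml
- hosts: hadoop
  vars:
    software_dir: /opt/software   # hypothetical path, not from the original post
  tasks:
    - name: Copy the JDK and Hadoop RPMs from the controller
      copy:
        src: "/root/{{ item }}"          # location on the controller node
        dest: "{{ software_dir }}/"      # directory on the target node
      loop:
        - jdk-8u171-linux-x64.rpm
        - hadoop-1.2.1-1.x86_64.rpm
```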
Step 3: Install the software (JDK and Hadoop).
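A hedged sketch of the installation, assuming the RPMs sit in the hypothetical `software_dir`. The JDK is installed with the `yum` module, which accepts a path to a local RPM. For Hadoop, this sketch calls `rpm` directly with `--force`; that flag is an assumption for cases where the package conflicts with files already on the system, and may not be needed on your hosts.

```yaml
- hosts: hadoop
  vars:
    software_dir: /opt/software   # hypothetical path, not from the original post
  tasks:
    - name: Install the JDK from the local RPM
      yum:
        name: "{{ software_dir }}/jdk-8u171-linux-x64.rpm"
        state: present
        disable_gpg_check: yes

    - name: Check whether the hadoop package is already installed
      command: rpm -q hadoop
      register: hadoop_installed
      failed_when: false    # a non-zero exit only means "not installed"
      changed_when: false

    - name: Install Hadoop with rpm (--force is an assumption, see above)
      command: rpm -ivh --force "{{ software_dir }}/hadoop-1.2.1-1.x86_64.rpm"
      when: hadoop_installed.rc != 0
```

The `rpm -q` check keeps the play re-runnable, since the `command` module has no built-in notion of whether a package is already present.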
Step 4: Create a directory for managing the cluster and storing its files.
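As a sketch, the master needs a directory for NameNode metadata and the slave needs one for DataNode blocks. The paths /nn and /dn are hypothetical names chosen for illustration.

```yaml
- hosts: hadoop
  tasks:
    - name: Create the NameNode metadata directory on the master
      file:
        path: /nn            # hypothetical path
        state: directory
      when: inventory_hostname == "master"

    - name: Create the DataNode storage directory on the slave
      file:
        path: /dn            # hypothetical path
        state: directory
      when: inventory_hostname == "node1"
```

The `when` conditions let a single play against the hadoop group treat the two roles differently.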
Step 5: Edit the configuration files.
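A possible approach is the `blockinfile` module, which inserts a managed block after the opening `<configuration>` tag. This sketch assumes the configuration files live in /etc/hadoop (where the Hadoop 1.x RPM places them), uses the Hadoop 1.x property names `fs.default.name`, `dfs.name.dir`, and `dfs.data.dir`, and picks a hypothetical port 9001 and the hypothetical directories /nn and /dn.

```yaml
- hosts: hadoop
  vars:
    namenode_port: 9001   # hypothetical port, not from the original post
  tasks:
    - name: Point every node at the NameNode in core-site.xml
      blockinfile:
        path: /etc/hadoop/core-site.xml
        insertafter: "<configuration>"
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"   # XML-safe marker
        block: |
          <property>
            <name>fs.default.name</name>
            <value>hdfs://master:{{ namenode_port }}</value>
          </property>

    - name: Set the NameNode directory on the master in hdfs-site.xml
      blockinfile:
        path: /etc/hadoop/hdfs-site.xml
        insertafter: "<configuration>"
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"
        block: |
          <property>
            <name>dfs.name.dir</name>
            <value>/nn</value>
          </property>
      when: inventory_hostname == "master"

    - name: Set the DataNode directory on the slave in hdfs-site.xml
      blockinfile:
        path: /etc/hadoop/hdfs-site.xml
        insertafter: "<configuration>"
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"
        block: |
          <property>
            <name>dfs.data.dir</name>
            <value>/dn</value>
          </property>
      when: inventory_hostname == "node1"
```

The custom `marker` matters here: the module's default marker starts with `#`, which is not a valid comment in XML.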
Step 6: Format the NameNode.
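Formatting only applies to the master. In this sketch the confirmation is piped in with `echo Y` in case the command prompts before formatting an existing directory, and `creates` skips the task once the NameNode directory has been formatted. The path /nn is a hypothetical NameNode directory.

```yaml
- hosts: hadoop
  tasks:
    - name: Format the NameNode (only on the master, only once)
      shell: echo Y | hadoop namenode -format
      args:
        creates: /nn/current   # assumption: /nn is the configured dfs.name.dir
      when: inventory_hostname == "master"
```

Guarding with `creates` is important, because reformatting an existing NameNode erases the file system metadata.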
Step 7: Start the cluster.
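A minimal sketch of starting the daemons with `hadoop-daemon.sh`, the per-daemon start script that ships with Hadoop 1.x. It assumes the script is on the PATH after the RPM installation.

```yaml
- hosts: hadoop
  tasks:
    - name: Start the NameNode on the master
      command: hadoop-daemon.sh start namenode
      when: inventory_hostname == "master"

    - name: Start the DataNode on the slave
      command: hadoop-daemon.sh start datanode
      when: inventory_hostname == "node1"
```

Once both tasks finish, running `hadoop dfsadmin -report` on the master should list node1 as a live DataNode. Note that re-running this play while the daemons are already up will likely report a failure, since the script refuses to start a daemon that is already running.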