Configuring a Hadoop Cluster Using Ansible

Inshiya Nalawala
Mar 24, 2021

Today, we will discuss how you can configure a Hadoop cluster using Ansible.

So, let’s get started.

The Setup: I have two nodes. One is the controller node and the other is the target node. Both are RHEL (Linux) systems. I have set up local name resolution through the /etc/hosts file, mapping each node's hostname to its IP address. The controller node is named master, and the target node is named node1.

I have also configured passwordless SSH login from the controller node to all target nodes, so the inventory file does not need to hold any SSH passwords.

The inventory: The playbook targets the [hadoop] inventory group. The controller node, master, is set up as the Hadoop master, while the other node, node1, serves as the Hadoop slave.
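
The inventory file itself isn't shown in the article. The Python below is a minimal sketch that renders an INI-style inventory, assuming both master and node1 are listed under the [hadoop] group. The printed text is what would go into the inventory file. Ansible's default location is /etc/ansible/hosts; a file passed with -i also works.

```python
from __future__ import annotations


def render_inventory(groups: dict[str, list[str]]) -> str:
    """Render a {group: [hosts]} mapping as an INI-style Ansible inventory."""
    lines: list[str] = []
    for group, hosts in groups.items():
        lines.append(f"[{group}]")  # group header, e.g. [hadoop]
        lines.extend(hosts)         # one host name per line
        lines.append("")            # blank line after each group
    return "\n".join(lines)


# Assumption: both nodes belong to the [hadoop] group, and the names
# resolve through the /etc/hosts entries described above.
inventory = render_inventory({"hadoop": ["master", "node1"]})
print(inventory)
```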

Assumptions: The playbook assumes that the controller node already has the RPM files jdk-8u171-linux-x64.rpm and hadoop-1.2.1-1.x86_64.rpm in its /root directory.

Step 1: Create a directory where the software is to be installed.
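
Ansible's file module with state: directory is the usual way to express this step. The sketch below builds the task as a Python dict whose keys mirror the YAML a playbook would contain. The path /root/hadoop_software is hypothetical, since the article does not name the directory.

```python
import json

# Hypothetical path: the article does not say which directory it creates.
SOFTWARE_DIR = "/root/hadoop_software"


def create_dir_task(path: str) -> dict:
    """Build an Ansible task that makes sure a directory exists."""
    return {
        "name": "Create a directory for the software packages",
        # The file module with state=directory creates the path (and any
        # missing parents) and reports "ok" if it already exists.
        "file": {"path": path, "state": "directory"},
    }


task = create_dir_task(SOFTWARE_DIR)
print(json.dumps(task, indent=2))
```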

Step 2: Copy the RPM files into the directory created in Step 1.
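
This sketch uses the copy module and loops over both packages. src is resolved on the controller, where the RPMs sit in /root, and dest is resolved on each target node. A dest ending in / is treated as a directory, so each file keeps its name. The destination is the same hypothetical /root/hadoop_software used for Step 1.

```python
import json

# Hypothetical destination (the Step 1 directory); file names from the article.
SOFTWARE_DIR = "/root/hadoop_software"
RPMS = ["jdk-8u171-linux-x64.rpm", "hadoop-1.2.1-1.x86_64.rpm"]


def copy_rpms_task(rpms: list, dest_dir: str) -> dict:
    """Build an Ansible copy task that pushes each RPM from the controller."""
    return {
        "name": "Copy the RPM files to the target node",
        # "{{ item }}" is filled in by Ansible from the loop list below.
        "copy": {"src": "/root/{{ item }}", "dest": dest_dir + "/"},
        "loop": rpms,
    }


task = copy_rpms_task(RPMS, SOFTWARE_DIR)
print(json.dumps(task, indent=2))
```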

Step 3: Install the software (JDK and Hadoop).
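
The article doesn't show which install command it used. One straightforward option is to run rpm through the command module, installing the JDK first since Hadoop runs on Java. The sketch builds one task per package and again uses the hypothetical /root/hadoop_software directory. Note that rpm -ivh fails if a package is already installed, so a re-run of the playbook would report an error here unless the task is guarded.

```python
import json

# Hypothetical directory from Step 1; file names as given in the article.
SOFTWARE_DIR = "/root/hadoop_software"
RPMS = ["jdk-8u171-linux-x64.rpm", "hadoop-1.2.1-1.x86_64.rpm"]


def install_rpm_tasks(rpms: list, src_dir: str) -> list:
    """Build one command task per RPM, installed in list order (JDK first)."""
    return [
        {
            "name": f"Install {rpm}",
            # Not idempotent: rpm -ivh errors out if the package is installed.
            "command": f"rpm -ivh {src_dir}/{rpm}",
        }
        for rpm in rpms
    ]


for task in install_rpm_tasks(RPMS, SOFTWARE_DIR):
    print(json.dumps(task))
```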

Step 4: Create a directory for cluster management and for storing cluster files.
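
In a Hadoop 1.x cluster, the master needs a directory for NameNode metadata, and each slave needs one for DataNode blocks. This sketch uses two file tasks, each restricted by a when condition on Ansible's inventory_hostname variable. The paths /nn and /dn are hypothetical and must match the values used in the configuration files.

```python
import json

# Hypothetical paths: the article does not name them.
NAMENODE_DIR = "/nn"  # metadata directory on the Hadoop master
DATANODE_DIR = "/dn"  # block storage directory on the Hadoop slave


def cluster_dir_tasks(master_host: str) -> list:
    """Create the NameNode dir on the master and the DataNode dir elsewhere."""
    return [
        {
            "name": "Create NameNode directory",
            "file": {"path": NAMENODE_DIR, "state": "directory"},
            "when": f"inventory_hostname == '{master_host}'",
        },
        {
            "name": "Create DataNode directory",
            "file": {"path": DATANODE_DIR, "state": "directory"},
            "when": f"inventory_hostname != '{master_host}'",
        },
    ]


for task in cluster_dir_tasks("master"):
    print(json.dumps(task))
```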

Step 5: Edit the configuration files.
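
Hadoop 1.x reads its site configuration from core-site.xml and hdfs-site.xml. For the RPM install, these typically live in /etc/hadoop. The relevant Hadoop 1.x properties are:

- fs.default.name in core-site.xml.
- dfs.name.dir (on the master) or dfs.data.dir (on the slave) in hdfs-site.xml.

The sketch below renders these files with the standard library. Port 9001 and the /nn and /dn paths are assumptions. In a playbook, the resulting strings could be written out with the copy module's content parameter or kept as Jinja2 templates.

```python
import xml.etree.ElementTree as ET


def hadoop_config(properties: dict) -> str:
    """Render a Hadoop *-site.xml <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


# Assumed values: port 9001 and the /nn, /dn directories are hypothetical.
core_site = hadoop_config({"fs.default.name": "hdfs://master:9001"})  # all nodes
hdfs_site_master = hadoop_config({"dfs.name.dir": "/nn"})             # master only
hdfs_site_slave = hadoop_config({"dfs.data.dir": "/dn"})              # slave only
print(core_site)
```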

Step 6: Format the NameNode.
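
Formatting should run only on the master and only once, because reformatting wipes the HDFS metadata. This sketch uses the shell module's creates parameter to skip the task when /nn/current already exists. Formatting creates that subdirectory inside dfs.name.dir, and /nn is the same hypothetical path used for Step 4. Piping Y answers the confirmation prompt that Hadoop 1.x shows when the name directory already exists, which is the case after Step 4.

```python
import json

# Hypothetical NameNode directory; must match dfs.name.dir in hdfs-site.xml.
NAMENODE_DIR = "/nn"


def format_namenode_task(master_host: str) -> dict:
    """Build a task that formats HDFS once, on the master only."""
    return {
        "name": "Format the NameNode",
        "shell": {
            # Answer Hadoop 1.x's "Re-format filesystem ... (Y or N)" prompt.
            "cmd": "echo Y | hadoop namenode -format",
            # Skip if a previous format already created the metadata dir,
            # so re-running the playbook does not wipe HDFS.
            "creates": f"{NAMENODE_DIR}/current",
        },
        "when": f"inventory_hostname == '{master_host}'",
    }


task = format_namenode_task("master")
print(json.dumps(task, indent=2))
```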

Step 7: Start the cluster.
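
Starting an HDFS cluster in Hadoop 1.x means starting the NameNode daemon on the master and a DataNode daemon on each slave, which hadoop-daemon.sh handles. This sketch assumes hadoop-daemon.sh is on the PATH after the RPM install. It uses the same kind of when conditions as the directory tasks.

```python
import json


def start_daemon_tasks(master_host: str) -> list:
    """Start the NameNode on the master and a DataNode on every other host."""
    return [
        {
            "name": "Start the NameNode",
            "command": "hadoop-daemon.sh start namenode",
            "when": f"inventory_hostname == '{master_host}'",
        },
        {
            "name": "Start the DataNode",
            "command": "hadoop-daemon.sh start datanode",
            "when": f"inventory_hostname != '{master_host}'",
        },
    ]


for task in start_daemon_tasks("master"):
    print(json.dumps(task))
```

Afterwards, you can check that the cluster is up in two ways. Running jps on each node should list NameNode or DataNode. Running hadoop dfsadmin -report on the master should show node1 as a live DataNode.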
