Hadoop Cluster setup using Ansible 🎯

👉Our Aim: Configure Hadoop and start the cluster services using an Ansible Playbook.

🔑What is Ansible?

Ansible is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows.

🔑What is Ansible PlayBook ?

An Ansible® playbook is a blueprint of automation tasks — which are complex IT actions executed with limited or no human involvement. Ansible playbooks are executed on a set, group, or classification of hosts, which together make up an Ansible inventory.
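For this task, the inventory groups the hosts into `namenode` and `datanode` groups, which the plays below target. A minimal example inventory could look like this (the IP addresses are placeholders, not the actual lab machines):

```ini
# /etc/ansible/hosts — example inventory; IPs are hypothetical
[namenode]
192.168.1.10

[datanode]
192.168.1.11
192.168.1.12
```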

🔑What is Hadoop ?

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

Let’s get hands-on with the task now…

We will be performing this task on RHEL8 booted in VirtualBox.

🟥Let’s first install Ansible.

Ansible is built on top of Python, so we will use pip to install it:

#pip install ansible

🟥Now we will write the Ansible Playbook. Excited!!!

👉Before commencing, jot down what is to be achieved:

1️⃣ Mounting the DVD on RHEL8

2️⃣ Configuring Yum Repo

3️⃣ JDK Installation for Hadoop

4️⃣ Hadoop Installation

5️⃣ NameNode Configuration

6️⃣ DataNode Configuration

7️⃣ Starting the Hadoop Daemon service
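The steps above map naturally onto a playbook with two plays: one targeting the `namenode` group and one targeting the `datanode` group. A rough skeleton (task bodies elided, filled in by the sections that follow) could look like:

```yaml
# hadoop.yml — skeleton only
- hosts: namenode
  tasks:
    # mount DVD, configure yum repos, install JDK and Hadoop,
    # configure hdfs-site.xml / core-site.xml, format and start the NameNode

- hosts: datanode
  tasks:
    # mount DVD, configure yum repos, install JDK and Hadoop,
    # configure hdfs-site.xml / core-site.xml, start the DataNode
```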

📀Let’s see how we write the YAML code for mounting the DVD (NameNode as well as DataNode):

- name: Mounting DVD
  mount:
    src: "/dev/cdrom"
    path: "/dvd"
    state: mounted
    fstype: "iso9660"

🎨Configuring the Yum repos (NameNode as well as DataNode) — the baseurls point inside the /dvd mount created above:

- name: Local Repo
  yum_repository:
    name: Repo1
    description: "Local repo 1"
    baseurl: "file:///dvd/AppStream"
    gpgcheck: 0
- name: Local Repo setup
  yum_repository:
    name: Repo2
    description: "Local repo 2"
    baseurl: "file:///dvd/BaseOS"
    gpgcheck: 0
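Each `yum_repository` task writes a .repo file on the managed node. Assuming the DVD is mounted at /dvd as above, the rendered /etc/yum.repos.d/Repo1.repo would look roughly like:

```ini
[Repo1]
name = Local repo 1
baseurl = file:///dvd/AppStream
gpgcheck = 0
```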

👉Installing the JDK (Java) (NameNode as well as DataNode) — note that the status check uses the shell module, since the command module does not support pipes:

- name: "Checking whether JDK exists or not"
  tags: jdk
  shell: "rpm -qa | grep jdk1.8"
  changed_when: false
  ignore_errors: yes
  register: java_install_status
- name: "JDK installation"
  tags: jdk
  command: "rpm -ivh jdk-8u171-linux-x64.rpm"
  ignore_errors: yes
  when: java_install_status.rc != 0

👉Installing Hadoop (NameNode as well as DataNode):

- name: "Checking whether Hadoop exists or not"
  tags: hadoop
  shell: "rpm -qa | grep hadoop-1.2.1-1.x86_64"
  changed_when: false
  ignore_errors: yes
  register: hadoop_status
- name: "Hadoop installation"
  tags: hadoop
  command: "rpm -ivh hadoop-1.2.1-1.x86_64.rpm --force"
  ignore_errors: yes
  when: hadoop_status.rc != 0
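As a side note, the check-then-install pattern above can be collapsed into a single task: the `yum` module is idempotent on its own when pointed at a local RPM. A sketch, assuming the RPM sits in the remote user's working directory:

```yaml
- name: "Hadoop installation (idempotent alternative)"
  yum:
    name: "hadoop-1.2.1-1.x86_64.rpm"   # path to the local RPM on the managed node
    state: present
    disable_gpg_check: yes
```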

👉Stopping the firewall so the nodes can connect (NameNode as well as DataNode):

- name: stop firewalld
  shell: "systemctl stop firewalld"
  ignore_errors: yes
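Alternatively, Ansible's `systemd` module can stop the service and, unlike a raw shell command, reports changed/ok status correctly:

```yaml
- name: stop firewalld
  systemd:
    name: firewalld
    state: stopped
  ignore_errors: yes
```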

🛠Hadoop NameNode Configuration:

👉 Creating the directory on the NameNode (the directory name is taken interactively via vars_prompt):

- hosts: namenode
  gather_facts: no
  vars_prompt:
    - name: namedir
      private: no
      prompt: "Enter namenode dir name e.g. /{dir}"
  tasks:
    - name: Creating namenode dir
      file:
        state: directory
        path: "{{ namedir }}"
      ignore_errors: True
      register: directory

👉 Configuring hdfs-site.xml in /etc/hadoop/ using the lineinfile module:

- name: configuring hdfs-site.xml
  lineinfile:
    path: "/etc/hadoop/hdfs-site.xml"
    insertafter: "<configuration>"
    line: "<property>\n\t<name>dfs.name.dir</name>\n\t<value>{{ namedir }}</value>\n</property>"
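After this task runs (with, say, /nn entered at the prompt — a hypothetical value), hdfs-site.xml would contain roughly:

```xml
<configuration>
<property>
	<name>dfs.name.dir</name>
	<value>/nn</value>
</property>
</configuration>
```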

👉Configuring core-site.xml in /etc/hadoop/:

- name: configuring core-site.xml
  lineinfile:
    path: "/etc/hadoop/core-site.xml"
    insertafter: "<configuration>"
    # 0.0.0.0 makes the NameNode listen on all interfaces;
    # the port must match the DataNode's core-site.xml (9001 below)
    line: "<property>\n\t<name>fs.default.name</name>\n\t<value>hdfs://0.0.0.0:9001</value>\n</property>"

👉Formatting the NameNode directory (only when the directory was newly created):

- name: formatting namenode dir
  shell: "echo Y | hadoop namenode -format -force"
  ignore_errors: yes
  when: directory.changed

👉 Starting the hadoop-daemon NameNode service:

- name: "Hadoop daemon service status check!"
  tags: namenode
  shell: "jps | grep NameNode"
  changed_when: false
  ignore_errors: yes
  register: namenode_status
- name: starting hadoop-daemon namenode service
  tags: namenode
  ignore_errors: yes
  shell: "hadoop-daemon.sh start namenode"
  when: namenode_status.rc != 0

🛠DataNode Configuration:

👉 Creating the DataNode directory:

- hosts: datanode
  gather_facts: no
  vars_prompt:
    - name: datadir
      private: no
      prompt: "Enter datanode dir name e.g. /{dir}"
  tasks:
    - name: "Creating datanode dir"
      file:
        state: directory
        path: "{{ datadir }}"
      ignore_errors: True

👉 Configuring hdfs-site.xml:

- name: configuring hdfs-site.xml
  lineinfile:
    path: "/etc/hadoop/hdfs-site.xml"
    insertafter: "<configuration>"
    line: "<property>\n\t<name>dfs.data.dir</name>\n\t<value>{{ datadir }}</value>\n</property>"

👉Configuring core-site.xml so each DataNode points at the first host of the namenode inventory group:

- name: configuring core-site.xml
  lineinfile:
    path: "/etc/hadoop/core-site.xml"
    insertafter: "<configuration>"
    line: "<property>\n\t<name>fs.default.name</name>\n\t<value>hdfs://{{ groups['namenode'][0] }}:9001</value>\n</property>"
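With a namenode group whose first host is, say, 192.168.1.10 (a hypothetical address), the rendered core-site.xml on each DataNode would read roughly:

```xml
<configuration>
<property>
	<name>fs.default.name</name>
	<value>hdfs://192.168.1.10:9001</value>
</property>
</configuration>
```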

👉Starting the DataNode:

- name: checking datanode daemon status
  tags: datanode
  shell: "jps | grep DataNode"
  changed_when: false
  ignore_errors: yes
  register: datanode_status
- name: starting data node daemon
  tags: datanode
  command: "hadoop-daemon.sh start datanode"
  ignore_errors: yes
  when: datanode_status.rc != 0

Here’s the GitHub link for the playbook: Hadoop_Ansible

📚Let’s see how it works now:

To check whether the syntax is fine:

#ansible-playbook --syntax-check <playbook>.yml

To run the playbook:

#ansible-playbook <playbook>.yml

(Screenshots: JDK installation; Hadoop installation and NameNode directory creation; formatting the NameNode directory and starting services; DataNode configuration.)

Bravo! The NameNode has started and one DataNode is connected to it.

To check whether the NameNode launched and to list the connected DataNodes:

#hadoop dfsadmin -report

Also, the DataNode is configured and launched.

Web UI view:

In your browser, open the NameNode web UI (for Hadoop 1.x this is typically http://<namenode-ip>:50070).

That’s how we succeeded in configuring the Hadoop cluster using Ansible.

Ansible has made our task easy. Rather than configuring the cluster manually, automating it lets us configure as many clusters as desired in far less time.

Keep Sharing ……🤗

Happy Learning …🧮
