If you have a new node and want to add it to a running Hadoop cluster, how do you get it done properly without restarting the entire cluster?

Here are the steps to follow from scratch. Suppose the new node is called hadoop-newdatanode.

Step 1: Install Java

Skip this step if your servers already have Java installed, but make sure the Java version is compatible with your Hadoop release.

# java -version 

java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
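
If Java is not installed, you can pull it from your distribution's repositories. A minimal sketch, assuming a CentOS/RHEL-style server using yum (the package name may differ on your distribution):

# yum install -y java-1.8.0-openjdk-devel
# java -version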

Step 2: Create User Account

Create a system user account on both the master and slave systems to use for the Hadoop installation.

# useradd hadoop
# passwd hadoop
Changing password for user hadoop.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
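
To confirm the account was created the same way on every server, a quick check such as the following can help:

# id hadoop
# ls -ld /home/hadoop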

Step 3: Add FQDN Mapping

Skip this step if your working environment has centralized hostname mapping.

Edit the /etc/hosts file on all master and slave servers and add the following entries.

# vi /etc/hosts
192.168.1.1 hadoop-namenode
192.168.1.2 hadoop-datanode-1
192.168.1.3 hadoop-datanode-2
192.168.1.4 hadoop-newdatanode
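
To confirm the mapping works, try resolving the new hostname from each server, for example:

# getent hosts hadoop-newdatanode
# ping -c 1 hadoop-newdatanode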

Step 4: Configure SSH Key Pair Login

The Hadoop framework itself doesn't need SSH; the administration scripts such as start-dfs.sh and stop-dfs.sh use it to start and stop the various daemons on remote nodes. Thus, ssh must be installed and sshd must be running on every node you want to manage with the Hadoop scripts.

Use the following commands to configure passwordless login between all Hadoop cluster servers.

# su - hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-newdatanode
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
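
To verify that passwordless login works, run a remote command from the namenode as the hadoop user; it should complete without asking for a password:

# su - hadoop
$ ssh hadoop@hadoop-newdatanode hostname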

Step 5: Copy Hadoop Folder to the New Slave Server

With the configuration above in place, copy the Hadoop environment and installation files to the new slave server.

# su - hadoop
$ rsync -auvx /home/hadoop/.bashrc hadoop-newdatanode:/home/hadoop/
$ rsync -auvx $HADOOP_HOME/ hadoop-newdatanode:$HADOOP_HOME/
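
As a quick sanity check (assuming ~/.bashrc on the new node exports HADOOP_HOME, as copied above), you can confirm the variable resolves remotely:

$ ssh hadoop-newdatanode 'source ~/.bashrc; echo $HADOOP_HOME'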

Step 6: Configure Hadoop on the namenode Server Only

Go to the Hadoop configuration folder on hadoop-namenode and add the new node to the slaves file.

# su - hadoop
$ cd $HADOOP_HOME/etc/hadoop
$ vi slaves
hadoop-datanode-1
hadoop-datanode-2
hadoop-newdatanode
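
Note: on Hadoop 3.x the worker list lives in a file named workers rather than slaves; if that is your version, add the new node there instead, for example:

$ echo "hadoop-newdatanode" >> $HADOOP_HOME/etc/hadoop/workers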

Step 7: Configure Hadoop on the New datanode Server Only

On the new datanode, you may want to specify where the node stores its block data; a datanode can have multiple data directories configured through dfs.data.dir.

Edit hdfs-site.xml and change the dfs.data.dir value accordingly.

# vi hdfs-site.xml
Add the following inside the configuration tag:

<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop/dfs/name/data</value>
  <final>true</final>
</property>
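
Since a datanode can serve blocks from more than one disk, dfs.data.dir accepts a comma-separated list of directories. A sketch with hypothetical mount points (adjust the paths to your disks; on Hadoop 2.x and later the property is named dfs.datanode.data.dir, with the old name kept as a deprecated alias):

<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop/dfs/data1,/opt/hadoop/dfs/data2</value>
  <final>true</final>
</property>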

Step 8: Start the datanode Hadoop Service

Use one of the following commands to start the datanode service on the new node.

On the namenode, run

$ start-dfs.sh

Or, on new datanode, run

$ hadoop-daemon.sh start datanode
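
Either way, you can confirm the DataNode daemon is running on the new node with jps (shipped with the JDK); a DataNode process should appear in the output:

$ jps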

Step 9: Check Hadoop cluster status

On any of the Hadoop cluster nodes, run

$ hdfs dfsadmin -report
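
The report lists each datanode with its capacity and state; hadoop-newdatanode should now appear among the live nodes. To see just that summary line, something like the following works (the exact wording of the output can vary between Hadoop versions):

$ hdfs dfsadmin -report | grep -i "live datanodes"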