If you have a new node and want to add it to a running Hadoop cluster, how do you get it joined properly without restarting the entire cluster?
Here are the steps, from scratch. Suppose the new node is called hadoop-newdatanode.
Step 1. Install Java
Skip this step if your servers already have Java installed, but make sure the Java version is compatible across the cluster.
# java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
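If you are scripting the rollout across many nodes, a small check can fail fast on an incompatible Java. A sketch, assuming the version string has the quoted format shown above:

```shell
# check_java_ver: accept Java 1.8 or newer, given the quoted version string
# printed by `java -version` (format assumed as in the output above)
check_java_ver() {
  case "$1" in
    1.8.*|9*|1[0-9]*) echo "java $1 OK" ;;
    *) echo "java $1 too old or unrecognized"; return 1 ;;
  esac
}

# On a node:
#   check_java_ver "$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
```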
Step 2. Create User Account
Create a system user account on both master and slave systems to use for the Hadoop installation.
# useradd hadoop
# passwd hadoop
Changing password for user hadoop.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
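When preparing many nodes, you can confirm the account exists non-interactively. A minimal sketch using the standard id utility:

```shell
# have_user: exit 0 if the given account exists on this node
have_user() { id "$1" >/dev/null 2>&1; }

if have_user hadoop; then
  echo "user hadoop exists"
else
  echo "user hadoop missing"
fi
```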
Step 3: Add FQDN Mapping
Skip this step if your working environment has centralized hostname mapping.
Edit the /etc/hosts file on all master and slave servers and add the following entries.
# vi /etc/hosts
192.168.1.1 hadoop-namenode
192.168.1.2 hadoop-datanode-1
192.168.1.3 hadoop-datanode-2
192.168.1.4 hadoop-newdatanode
(Use the new node's actual IP address; 192.168.1.4 is an example.)
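Before moving on, it is worth confirming that every hostname actually resolves from each server. A sketch, using the hostnames mapped above:

```shell
# resolves: exit 0 if the hostname can be looked up (via /etc/hosts or DNS)
resolves() { getent hosts "$1" >/dev/null; }

for h in hadoop-namenode hadoop-datanode-1 hadoop-datanode-2 hadoop-newdatanode; do
  resolves "$h" && echo "$h resolves" || echo "$h NOT resolvable"
done
```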
Step 4. Configuring SSH key pair login
The Hadoop framework itself doesn't need SSH; the administration tools like start-dfs.sh and stop-dfs.sh need it to start and stop the various daemons. Thus, ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
Use the following commands to configure password-less login between all Hadoop cluster servers.
# su - hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-newdatanode
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
Step 5: Copy Hadoop folder to Slave Servers
After updating the configuration above, we need to copy the Hadoop installation and environment files to the new slave server.
# su - hadoop
$ rsync -auvx /home/hadoop/.bashrc hadoop-newdatanode:/home/hadoop/
$ rsync -auvx $HADOOP_HOME/ hadoop-newdatanode:$HADOOP_HOME/
Step 6: Configure Hadoop on namenode Server Only
Go to the Hadoop configuration folder on hadoop-namenode and make the following change.
# su - hadoop
$ cd $HADOOP_HOME/etc/hadoop
$ vi slaves
Append hadoop-newdatanode to the file, one hostname per line.
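Editing by hand works; for automation, an idempotent append avoids duplicate entries. A sketch, assuming the default configuration layout under $HADOOP_HOME:

```shell
# add_slave: append a hostname to the slaves file only if not already listed
add_slave() {
  host=$1 file=$2
  grep -qxF "$host" "$file" 2>/dev/null || echo "$host" >> "$file"
}

# On hadoop-namenode:
#   add_slave hadoop-newdatanode "$HADOOP_HOME/etc/hadoop/slaves"
```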
Step 7: Configure Hadoop on new datanode server Only
On the new datanode, you may want to specify where block data is stored; a datanode can have multiple data directories, listed comma-separated in dfs.data.dir.
Edit hdfs-site.xml and change the dfs.data.dir value accordingly.
# vi hdfs-site.xml
# Add the following inside the <configuration> tag
<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop/dfs/name/data</value>
  <final>true</final>
</property>
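The directory named in dfs.data.dir must exist and be writable by the hadoop user before the datanode starts. A sketch, using the path from the property above (run the chown as root):

```shell
# prepare_data_dir: create a datanode storage directory with sane permissions
prepare_data_dir() {
  dir=$1
  mkdir -p "$dir"
  chmod 755 "$dir"
  # give the hadoop user ownership; needs root, ignored otherwise
  chown -R hadoop:hadoop "$dir" 2>/dev/null || true
}

# On the new datanode:
#   prepare_data_dir /opt/hadoop/dfs/name/data
```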
Step 8: Start datanode Hadoop Services
Use one of the following commands to start the datanode service on the new node.
On hadoop-namenode, run
$ start-dfs.sh
Or, on new datanode, run
$ hadoop-daemon.sh start datanode
Step 9: Check Hadoop cluster status
On any Hadoop cluster node, run
$ hdfs dfsadmin -report
and verify that hadoop-newdatanode is listed among the live datanodes.
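To script the check, the live-datanode count can be pulled out of the report. A sketch, assuming the `Live datanodes (N):` line format that Hadoop 2.x prints:

```shell
# live_count: read a dfsadmin report on stdin, print the live-datanode count
live_count() {
  awk '/^Live datanodes/ { gsub(/[^0-9]/, "", $0); print; exit }'
}

# On any cluster node:
#   hdfs dfsadmin -report | live_count
```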