In another article, I showed how to add a new DataNode to a Hadoop cluster. Here are quick instructions for decommissioning DataNodes in a Hadoop cluster:
Step1. Check dfs.hosts.exclude variable
On the NameNode host, check if dfs.hosts.exclude value defined in hdfs-site.xml, if not, add the following lines into hdfs-site.xml
Then restart Hadoop
Note: You don't have to restart hadoop if you have the variable 'dfs.hosts.exclude' defined.
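As a minimal sketch of this check, the shell functions below test whether an hdfs-site.xml file already mentions dfs.hosts.exclude and print a property block to add if it does not. The function names and the example path /etc/hadoop/conf/dfs.exclude are assumptions for illustration, not something your cluster necessarily uses. The check is a plain text search, so it will miss a property that is formatted unusually or defined in another file.

```shell
# Print "defined" if the given hdfs-site.xml mentions dfs.hosts.exclude,
# otherwise print "missing". This is a simple text search, not an XML parse.
# Usage: check_exclude_property /path/to/hdfs-site.xml
check_exclude_property() {
    if grep -qF '<name>dfs.hosts.exclude</name>' "$1"; then
        echo "defined"
    else
        echo "missing"
    fi
}

# Print a property block to paste between <configuration> and </configuration>.
# $1 is the absolute path of the exclude file (an assumed location; adjust it).
# Usage: print_exclude_property /etc/hadoop/conf/dfs.exclude
print_exclude_property() {
    cat <<EOF
<property>
  <name>dfs.hosts.exclude</name>
  <value>$1</value>
</property>
EOF
}
```

If the Hadoop client is configured on the host, running hdfs getconf -confKey dfs.hosts.exclude is another way to see the value the configuration resolves to.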
Step2. Add the DataNode to dfs.exclude
The dfs.hosts and dfs.hosts.exclude properties in hdfs-site.xml are used to specify the dfs.include and dfs.exclude files.
If no dfs.include file is specified, all DataNodes are considered to be included in the cluster (unless excluded in the dfs.exclude file).
So, on the NameNode host, edit the $HADOOP_HOME/etc/hadoop/dfs.exclude file and add the hostnames of the DataNodes to decommission, one per line.
If your cluster also uses a dfs.include file, leave the DataNode listed there until decommissioning has finished. Afterwards, remove it from the $HADOOP_HOME/dfs.include file on the NameNode and refresh the nodes again.
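Adding a host to the exclude file can be sketched as a small shell function that appends the hostname only when it is not already listed, so running it twice does no harm. The name add_to_exclude is hypothetical, and the sketch assumes the file holds one hostname per line and ends with a newline.

```shell
# Append a hostname to an exclude file unless an identical line already exists.
# Usage: add_to_exclude "$HADOOP_HOME/etc/hadoop/dfs.exclude" datanode1
add_to_exclude() {
    exclude_file=$1
    host=$2
    # Create the file if it does not exist yet.
    touch "$exclude_file"
    # -x matches whole lines only, -F treats the hostname as a literal string.
    if ! grep -qxF "$host" "$exclude_file"; then
        echo "$host" >> "$exclude_file"
    fi
}
```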
Step3. Update the NameNode to exclude the DataNode
Update the NameNode with the new set of excluded DataNodes. On the NameNode host, execute the following commands:
$ su - <HDFS_USER>
$ hdfs dfsadmin -refreshNodes
Step4. Check status
Open the NameNode web UI (http://<NameNode_FQDN>:50070) and navigate to the DataNodes page. Check to see whether the state has changed to Decommission In Progress for the DataNode being decommissioned.
Alternatively, run the following command and look at the Decommission Status line for the node:
$ hdfs dfsadmin -report
Name: 188.8.131.52:50010 (datanode1)
Decommission Status : Decommissioned
Configured Capacity: 36493639680000 (33.19 TB)
DFS Used: 69521602560 (64.75 GB)
When all the DataNodes report their state as Decommissioned (on the DataNodes page, or on the Decommissioned Nodes page at http://<NameNode_FQDN>:8088/cluster/nodes/decommissioned), all of the blocks have been replicated.
Then move on to the next step and shut down the DataNode.
Step5. Stop the Hadoop processes on the DataNode
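The report can also be filtered for a single node instead of being read by eye. This is a minimal sketch, assuming the report prints a Name: line for each DataNode followed by its Decommission Status line, as in the output above. decommission_status is a hypothetical helper name, and it does a substring match, so datanode1 would also match datanode10.

```shell
# Read `hdfs dfsadmin -report` output on stdin and print the decommission
# status of the first DataNode whose "Name:" line contains the given string.
# Usage: hdfs dfsadmin -report | decommission_status datanode1
decommission_status() {
    awk -v node="$1" '
        # Each DataNode section starts with a "Name:" line; remember whether
        # the current section belongs to the node we are looking for.
        /^Name:/ { match_node = (index($0, node) > 0) }
        match_node && /^Decommission Status/ {
            # Strip the label so that only the status text remains.
            sub(/^Decommission Status[ \t]*:[ \t]*/, "")
            print
            exit
        }
    '
}
```

Prints nothing if no section matches, which makes it easy to call from a loop that waits until the status reads Decommissioned.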
$ hadoop-daemon.sh stop datanode
$ hadoop-daemon.sh stop tasktracker
You can now shut down the DataNode.