dfsAdmin Command

The hdfs dfsadmin command supports a few HDFS administration-related operations. (The command used to be bin/hadoop dfsadmin, which is now deprecated.) In this article, we'll go through the dfsadmin commands with examples.

Running hdfs dfsadmin with no arguments lists all the commands currently supported.

Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
    [-report [-live] [-dead] [-decommissioning]]
    [-safemode <enter | leave | get | wait>]
    [-saveNamespace]
    [-rollEdits]
    [-restoreFailedStorage true|false|check]
    [-refreshNodes]
    [-setQuota <quota> <dirname>...<dirname>]
    [-clrQuota <dirname>...<dirname>]
    [-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>]
    [-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>]
    [-finalizeUpgrade]
    [-rollingUpgrade [<query|prepare|finalize>]]
    [-refreshServiceAcl]
    [-refreshUserToGroupsMappings]
    [-refreshSuperUserGroupsConfiguration]
    [-refreshCallQueue]
    [-refresh <host:ipc_port> <key> [arg1..argn]]
    [-reconfig <datanode|...> <host:ipc_port> <start|status>]
    [-printTopology]
    [-refreshNamenodes datanode_host:ipc_port]
    [-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
    [-setBalancerBandwidth <bandwidth in bytes per second>]
    [-fetchImage <local directory>]
    [-allowSnapshot <snapshotDir>]
    [-disallowSnapshot <snapshotDir>]
    [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
    [-getDatanodeInfo <datanode_host:ipc_port>]
    [-metasave filename]
    [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
    [-help [cmd]]

Here are some examples:

dfsadmin -report, report HDFS statistics

$ hdfs dfsadmin -report
Configured Capacity: 124468215349248 (113.20 TB)
Present Capacity: 124466727378944 (113.20 TB)
DFS Remaining: 124441188446208 (113.18 TB)
DFS Used: 25538932736 (23.78 GB)
DFS Used%: 0.02%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.0.1:50010 (datanode1)
Hostname: datanode1
Decommission Status : Normal
Configured Capacity: 36493639548928 (33.19 TB)
DFS Used: 10960756736 (10.21 GB)
Non DFS Used: 1843200 (1.76 MB)
DFS Remaining: 36482676948992 (33.18 TB)
DFS Used%: 0.03%
DFS Remaining%: 99.97%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Dec 02 12:46:57 PST 2016


Name: 192.168.0.2:50010 (datanode2)
Hostname: datanode2
Decommission Status : Normal
Configured Capacity: 87974575800320 (80.01 TB)
DFS Used: 14578176000 (13.58 GB)
Non DFS Used: 1486127104 (1.38 GB)
DFS Remaining: 87958511497216 (80.00 TB)
DFS Used%: 0.02%
DFS Remaining%: 99.98%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Dec 02 12:46:55 PST 2016
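
On a larger cluster the full report gets long, so a small filter can help. The following is a minimal sketch, assuming the per-node blocks keep the "Hostname:" and "DFS Used%:" line layout shown above. The function name report_node_usage is hypothetical and not part of Hadoop.

```shell
# Print "<hostname> <DFS Used%>" for each DataNode found in
# `hdfs dfsadmin -report` output. Reads the report text from stdin.
report_node_usage() {
  awk '
    /^Hostname:/ {
      # Accept both "Hostname: name" and "Hostname:name".
      sub(/^Hostname:[ \t]*/, "")
      host = $0
    }
    # The cluster-wide "DFS Used%" line appears before any Hostname
    # line, so it is skipped; only per-node lines are printed.
    /^DFS Used%:/ && host != "" {
      print host, $NF
      host = ""
    }
  '
}
```

It could be used as: hdfs dfsadmin -report | report_node_usage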

dfsadmin -safemode maintenance command

Safe mode is a Namenode state in which the name space is read-only: the Namenode accepts no changes to the name space and does not replicate or delete blocks.
The Namenode enters safe mode automatically at startup and leaves it automatically when the configured minimum percentage of blocks satisfies the minimum replication condition. Safe mode can also be turned on and off manually.

Enter safe mode

$ hdfs dfsadmin -safemode enter
Safe mode is ON

Check safemode

$ hdfs dfsadmin -safemode get
Safe mode is ON

Leave safe mode

$ hdfs dfsadmin -safemode leave
Safe mode is OFF

Check safemode

$ hdfs dfsadmin -safemode get
Safe mode is OFF
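
In scripts it is often handy to turn the "Safe mode is ON/OFF" message into a simple value. The following is a minimal sketch, assuming the output starts with the text shown above. The function name safemode_state is hypothetical.

```shell
# Read `hdfs dfsadmin -safemode get` output on stdin and print ON or OFF.
# Returns non-zero if no "Safe mode is ..." line is found.
safemode_state() {
  awk '
    /^Safe mode is ON/  { print "ON";  found = 1; exit }
    /^Safe mode is OFF/ { print "OFF"; found = 1; exit }
    END { exit found ? 0 : 1 }
  '
}
```

It could be used as: hdfs dfsadmin -safemode get | safemode_state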

hdfs dfsadmin -saveNamespace

Saves the current namespace into the storage directories and resets the edits log. Requires safe mode. The command writes the namespace image directly to disk and resets the Namenode journal (edits file), so the journal does not need to be replayed at the next startup.
Since saving the image is much faster than digesting the edits, the command can substantially reduce the overall cluster restart time.

$ hdfs dfsadmin -saveNamespace
Save namespace successful

This command can be used before regular (planned) cluster shutdown.
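
Because -saveNamespace requires safe mode, the steps are usually run in sequence. The following is a minimal sketch of that sequence using only the commands shown in this article. The function name checkpoint_namespace is hypothetical, and whether to leave safe mode afterwards depends on your procedure (before a shutdown you may prefer to stay in safe mode).

```shell
# Enter safe mode, save the namespace, then leave safe mode.
# The chain stops at the first failing step, so a failed save
# leaves safe mode ON until it is turned off manually.
checkpoint_namespace() {
  hdfs dfsadmin -safemode enter &&
    hdfs dfsadmin -saveNamespace &&
    hdfs dfsadmin -safemode leave
}
```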

hdfs dfsadmin -rollEdits

Rolls the edit log on the active NameNode. 

$ hdfs dfsadmin -rollEdits 
Successfully rolled edit logs.
New segment starts at txid 22086

hdfs dfsadmin -restoreFailedStorage

This option turns the automatic attempt to restore failed storage replicas on or off. If a failed storage becomes available again, the system will attempt to restore edits and/or fsimage during checkpoint. The ‘check’ option returns the current setting.

restoreFailedStorage check

$ hdfs dfsadmin -restoreFailedStorage check
restoreFailedStorage is set to false

restoreFailedStorage true

$ hdfs dfsadmin -restoreFailedStorage true
restoreFailedStorage is set to true

restoreFailedStorage false

$ hdfs dfsadmin -restoreFailedStorage false
restoreFailedStorage is set to false

hdfs dfsadmin -refreshNodes

Re-read the hosts and exclude files to update the set of Datanodes that are allowed to connect to the Namenode and those that should be decommissioned or recommissioned. 

$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
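
A typical use is to list a host in the exclude file and then refresh. The following is a minimal sketch, assuming it runs where the Namenode's exclude file can be edited. The function name exclude_and_refresh is hypothetical, and the exclude file path is whatever your cluster configuration points to.

```shell
# Add a host to the exclude file (if not already listed), then ask
# the Namenode to re-read the hosts and exclude files.
# $1 = host name, $2 = path to the exclude file (both caller-supplied).
exclude_and_refresh() {
  host=$1
  exclude_file=$2
  # -x matches whole lines only, -F treats the host as a literal string.
  grep -qxF -- "$host" "$exclude_file" 2>/dev/null || echo "$host" >> "$exclude_file"
  hdfs dfsadmin -refreshNodes
}
```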

hdfs dfsadmin -reconfig datanode

Start reconfiguration or get the status of an ongoing reconfiguration. The second parameter specifies the node type. Currently, only reloading DataNode’s configuration is supported.

$ hdfs dfsadmin -reconfig datanode datanode2:50020 status
Reconfiguring status for DataNode[datanode2:50020]: no task was found.

hdfs dfsadmin -setBalancerBandwidth

Changes the network bandwidth used by each datanode during HDFS block balancing. <bandwidth> is the maximum number of bytes per second that will be used by each datanode. This value overrides the dfs.balance.bandwidthPerSec parameter. NOTE: The new value is not persistent on the DataNode.

$ hdfs dfsadmin -setBalancerBandwidth 1000000
Balancer bandwidth is set to 1000000
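
Since the value is given in bytes per second, a small conversion helper avoids counting zeros. The following is a minimal sketch. The function name mb_to_bytes_per_sec is hypothetical, and it assumes 1 MB means 1024 * 1024 bytes.

```shell
# Convert megabytes per second (1 MB = 1024 * 1024 bytes here)
# to bytes per second.
mb_to_bytes_per_sec() {
  echo $(( $1 * 1024 * 1024 ))
}
```

For example, mb_to_bytes_per_sec 10 prints 10485760, which could then be passed as: hdfs dfsadmin -setBalancerBandwidth "$(mb_to_bytes_per_sec 10)"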

hdfs dfsadmin -allowSnapshot/disallowSnapshot

Allows or disallows the creation of snapshots of a directory. If -allowSnapshot completes successfully, the directory becomes snapshottable. All snapshots of a directory must be deleted before snapshots can be disallowed. See the HDFS Snapshot Documentation for more information.

$ hdfs dfsadmin -allowSnapshot /user
Allowing snaphot on /user succeeded

$ hdfs dfsadmin -disallowSnapshot /user
Disallowing snaphot on /user succeeded

hdfs dfsadmin -fetchImage

Downloads the most recent fsimage from the NameNode and saves it in the specified local directory.

$ hdfs dfsadmin -fetchImage /home/hadoop
16/12/02 19:05:30 INFO namenode.TransferFsImage: Opening connection to http://namenode:50070/imagetransfer?getimage=1&txid=latest
16/12/02 19:05:30 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
16/12/02 19:05:31 INFO namenode.TransferFsImage: Transfer took 0.10s at 39.60 KB/s
$ ls
fsimage_0000000000000022083

hdfs dfsadmin -getDatanodeInfo

Gets information about the given datanode. See the Rolling Upgrade document for details.

$ hdfs dfsadmin -getDatanodeInfo datanode2:50020
Uptime: 97635, Software version: 2.7.3, Config version: core-0.23.0,hdfs-1

hdfs dfsadmin -shutdownDatanode

Submits a shutdown request for the given datanode. See the Rolling Upgrade document for details.

$ hdfs dfsadmin -shutdownDatanode datanode2:50020
Submitted a shutdown request to datanode datanode2:50020

hdfs dfsadmin -triggerBlockReport

Trigger a block report for the given datanode. If ‘incremental’ is specified, it will be an incremental block report; otherwise, it will be a full block report.

$ hdfs dfsadmin -triggerBlockReport datanode2:50020
Triggering a full block report on datanode2:50020

hdfs dfsadmin -printTopology

Prints a tree of the racks and their nodes as reported by the Namenode.

$ hdfs dfsadmin -printTopology 
Rack: /default-rack
   192.168.0.1:50010 (datanode1)
   192.168.0.2:50010 (datanode2)
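
To check how nodes are spread across racks, the topology output can be summarized. The following is a minimal sketch, assuming each rack starts with a "Rack:" line followed by one line per node, as shown above. The function name topology_rack_counts is hypothetical.

```shell
# Read `hdfs dfsadmin -printTopology` output on stdin and print
# "<rack> <number of nodes>" for each rack, in order of appearance.
topology_rack_counts() {
  awk '
    /^Rack:/ {
      rack = $2
      if (!(rack in count)) { order[++n] = rack; count[rack] = 0 }
      next
    }
    # Any other non-blank line under a rack is counted as one node.
    NF > 0 && rack != "" { count[rack]++ }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
  '
}
```

It could be used as: hdfs dfsadmin -printTopology | topology_rack_counts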

For a quick view, see the table below.

COMMAND_OPTION
    Description
-report [-live] [-dead] [-decommissioning]
    Reports basic filesystem information and statistics. Optional flags may be used to filter the list of displayed DataNodes.
-safemode enter|leave|get|wait
    Safe mode maintenance command. Safe mode is a Namenode state in which it
    1. does not accept changes to the name space (read-only)
    2. does not replicate or delete blocks.
    The Namenode enters safe mode automatically at startup and leaves it automatically when the configured minimum percentage of blocks satisfies the minimum replication condition. Safe mode can also be entered manually, but then it can only be turned off manually as well.
-saveNamespace
    Save current namespace into storage directories and reset edits log. Requires safe mode.
-rollEdits
    Rolls the edit log on the active NameNode.
-restoreFailedStorage true|false|check
    This option will turn on/off automatic attempt to restore failed storage replicas. If a failed storage becomes available again, the system will attempt to restore edits and/or fsimage during checkpoint. The ‘check’ option will return the current setting.
-refreshNodes
    Re-read the hosts and exclude files to update the set of Datanodes that are allowed to connect to the Namenode and those that should be decommissioned or recommissioned.
-setQuota <quota> <dirname>…<dirname>
    See HDFS Quotas Guide for the detail.
-clrQuota <dirname>…<dirname>
    See HDFS Quotas Guide for the detail.
-setSpaceQuota <quota> <dirname>…<dirname>
    See HDFS Quotas Guide for the detail.
-clrSpaceQuota <dirname>…<dirname>
    See HDFS Quotas Guide for the detail.
-setStoragePolicy <path> <policyName>
    Set a storage policy to a file or a directory.
-getStoragePolicy <path>
    Get the storage policy of a file or a directory.
-finalizeUpgrade
    Finalize upgrade of HDFS. Datanodes delete their previous version working directories, followed by Namenode doing the same. This completes the upgrade process.
-rollingUpgrade [<query>|<prepare>|<finalize>]
    See Rolling Upgrade document for the detail.
-metasave filename
    Save Namenode’s primary data structures to filename in the directory specified by hadoop.log.dir property. filename is overwritten if it exists. filename will contain one line for each of the following:
    1. Datanodes heart beating with Namenode
    2. Blocks waiting to be replicated
    3. Blocks currently being replicated
    4. Blocks waiting to be deleted
-refreshServiceAcl
    Reload the service-level authorization policy file.
-refreshUserToGroupsMappings
    Refresh user-to-groups mappings.
-refreshSuperUserGroupsConfiguration
    Refresh superuser proxy groups mappings.
-refreshCallQueue
    Reload the call queue from config.
-refresh <host:ipc_port> <key> [arg1..argn]
    Triggers a runtime-refresh of the resource specified by <key> on <host:ipc_port>. All other args after are sent to the host.
-reconfig <datanode |…> <host:ipc_port> <start|status>
    Start reconfiguration or get the status of an ongoing reconfiguration. The second parameter specifies the node type. Currently, only reloading DataNode’s configuration is supported.
-printTopology
    Print a tree of the racks and their nodes as reported by the Namenode.
-refreshNamenodes datanodehost:port
    For the given datanode, reloads the configuration files, stops serving the removed block-pools and starts serving new block-pools.
-deleteBlockPool datanode-host:port blockpoolId [force]
    If force is passed, block pool directory for the given blockpool id on the given datanode is deleted along with its contents, otherwise the directory is deleted only if it is empty. The command will fail if datanode is still serving the block pool. Refer to refreshNamenodes to shutdown a block pool service on a datanode.
-setBalancerBandwidth <bandwidth in bytes per second>
    Changes the network bandwidth used by each datanode during HDFS block balancing. <bandwidth> is the maximum number of bytes per second that will be used by each datanode. This value overrides the dfs.balance.bandwidthPerSec parameter. NOTE: The new value is not persistent on the DataNode.
-allowSnapshot <snapshotDir>
    Allows snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable. See the HDFS Snapshot Documentation for more information.
-disallowSnapshot <snapshotDir>
    Disallows snapshots of a directory to be created. All snapshots of the directory must be deleted before disallowing snapshots. See the HDFS Snapshot Documentation for more information.
-fetchImage <local directory>
    Downloads the most recent fsimage from the NameNode and saves it in the specified local directory.
-shutdownDatanode <datanode_host:ipc_port> [upgrade]
    Submit a shutdown request for the given datanode. See Rolling Upgrade document for the detail.
-getDatanodeInfo <datanode_host:ipc_port>
    Get the information about the given datanode. See Rolling Upgrade document for the detail.
-triggerBlockReport [-incremental] <datanode_host:ipc_port>
    Trigger a block report for the given datanode. If ‘incremental’ is specified, it will be an incremental block report; otherwise, it will be a full block report.
-help [cmd]
    Displays help for the given command or all commands if none is specified.