Here are some of the most frequently used commands for Hadoop troubleshooting.

HDFS cluster report

$ hdfs dfsadmin -report
Configured Capacity: 138556755984384 (126.02 TB)
Present Capacity: 138508930407027 (125.97 TB)
DFS Remaining: 138032209448981 (125.54 TB)
DFS Used: 476720958046 (443.98 GB)
DFS Used%: 0.34%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
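
Below the summary, the report also lists details for every DataNode. If you only care about a subset of nodes, recent Hadoop releases let you filter the report (these flags exist in Hadoop 2.7 and later; check hdfs dfsadmin -help on your version):

$ hdfs dfsadmin -report -live
$ hdfs dfsadmin -report -dead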

Trigger block report on DataNodes

Suppose you think a DataNode (DN) is unstable, or you suspect a potential unknown bug in NameNode (NN) replica accounting that you need to work around. As a cluster admin, if you suspect such an issue, you might be tempted to restart a DN, or all of the DNs in the cluster, in order to trigger full block reports. It is much lighter weight to manually trigger a full block report (BR) instead of restarting the DN, which would also force it to rescan all of its data directories.

$ hdfs dfsadmin -triggerBlockReport [-incremental] <datanode_host:ipc_port>

This command does exactly that. If "-incremental" is specified, the DataNode sends an incremental block report (IBR); otherwise, it sends a full block report.

For example:

$ hdfs dfsadmin -triggerBlockReport 192.168.0.2:50020
Triggering a full block report on 192.168.0.2:50020.
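
Note that the port is the DataNode's IPC port, not its data transfer or HTTP port. If you are unsure which port that is on your cluster, you can look up the property that controls it (on Hadoop 2.x the default is typically 0.0.0.0:50020, matching the example above):

$ hdfs getconf -confKey dfs.datanode.ipc.address
0.0.0.0:50020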

Verify block metadata

Say you have a block replica on a DataNode's local disk, and you don't know whether it's corrupt.

$ hdfs debug verify -meta <metadata-file> [-block <block-file>]

This command verifies a block's metadata. The argument "-meta <metadata-file>" is the absolute path of the metadata file on the local file system of the DataNode. The argument "-block <block-file>" is an optional parameter specifying the absolute path of the block file on the local file system of the DataNode.
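
A usage sketch follows; the data directory, block pool ID, and block ID below are made up for illustration. Locate the real files under the DataNode's dfs.datanode.data.dir:

$ hdfs debug verify \
    -meta /data/1/dfs/dn/current/BP-1234567890-10.0.0.1-1400000000000/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta \
    -block /data/1/dfs/dn/current/BP-1234567890-10.0.0.1-1400000000000/current/finalized/subdir0/subdir0/blk_1073741825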

Recover the lease for the file

When you do "hdfs dfs -cat file1" from the command line, you get the exception saying that it "Cannot obtain block length for LocatedBlock". Usually this means the file is still in being-written state, i.e., it has not been closed yet, and the reader cannot successfully identify its current length by communicating with corresponding DataNodes.

Suppose you're pretty sure the writer client is dead, killed, or lost connection to the servers. You're wondering what else you can do other than waiting.

    hdfs debug recoverLease -path <path-of-the-file> [-retries <retry-times>]

This command asks the NameNode to try to recover the lease for the file. Based on the NameNode log, you can then trace down to the individual DataNodes to understand the states of the replicas. The command may successfully close the file if there are still healthy replicas; otherwise, it gives you more internal details about the file/block state. Please refer to https://community.hortonworks.com/questions/37412/cannot-obtain-block-length-for-locatedblock.html for a discussion, especially the answer by @Jing Zhao.

This is a lightweight operation, so running it should not put the server at risk. It is also idempotent, so running the command multiple times against the same path is safe.
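
A usage sketch (the path below is hypothetical):

    hdfs debug recoverLease -path /user/alice/streaming/events.log -retries 3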

Dump NameNode's metadata

You want to dump the NameNode's primary data structures.

$ hdfs dfsadmin -metasave <filename>
Created metasave file filename in the log directory of namenode hdfs://<namenode>:9000

This command saves the NameNode's metadata to <filename> in the directory specified by the hadoop.log.dir property. The saved file contains one line for each of the following:

    Datanodes heart beating with Namenode
    Blocks waiting to be replicated
    Blocks currently being replicated
    Blocks waiting to be deleted

Get a specific Hadoop config

You want to know one specific config value, namely the one actually in effect in the cluster, rather than whatever happens to be written in a particular configuration file.

Besides finding the XML configuration files in your cluster, you can use the hdfs getconf command to get the properties that are actually in effect in your cluster:

    hdfs getconf -confKey <key>

This command shows you the actual, final value of a configuration property as it is really used by Hadoop. Interestingly, it is capable of looking up configuration properties for YARN and MapReduce, not only HDFS.
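
For example (the returned values depend on your cluster; the ones shown here are just the common defaults of 3 replicas and a 128 MB block size):

$ hdfs getconf -confKey dfs.replication
3
$ hdfs getconf -confKey dfs.blocksize
134217728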