Here is a reference of the default ports used by Hadoop and related services, grouped by component.

Hadoop HDFS
DataNode (data transfer): 50010
DataNode HTTP UI: 50075
DataNode IPC: 50020
HA JournalNode: 8485
NameNode (IPC): 8020
Secondary NameNode HTTP UI: 50090
NameNode HTTP UI: 50070
Backup/Checkpoint node HTTP UI: 50105
NameNode HTTPS UI: 50470
HDFS over HTTP (HttpFS): 14000

Hadoop MapReduce
JobHistory HTTP UI: 19888
JobHistory Server (IPC): 10020
Shuffle port: 13562
JobTracker HTTP UI (MRv1): 50030
TaskTracker HTTP UI (MRv1): 50060

Hadoop YARN
ApplicationMasters: random
NodeManager: 8041
NodeManager HTTP UI: 8042
NodeManager localizer: 8040
ResourceManager: 8032
ResourceManager admin: 8033
ResourceManager HTTP UI: 8088
ResourceManager scheduler: 8030
ResourceManager resource tracker: 8031

Hadoop Hive
MetaStore: 9083
HiveServer2: 10000

Others
Hue Server: 8888
Oozie Server admin port: 11001
Oozie Server HTTP interface: 11000
Spark driver HTTP UI (local client): 4040, then 4041, 4042, ... if the port is already in use
Spark YARN Shuffle Service: 7337
ZooKeeper (client port): 2181
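A plain TCP connect test is often enough to see which of these services are listening on a host. The sketch below is a minimal Python example. It assumes the default ports listed above, which your cluster may override in hdfs-site.xml or yarn-site.xml. Note that Hadoop 3 moved several HDFS web ports; for example, the NameNode HTTP UI moved from 50070 to 9870. `DEFAULT_PORTS`, `is_port_open` and `probe` are illustrative names, not part of any Hadoop tool. A firewall that silently drops packets shows up here as a timeout and is reported as closed.

```python
import socket
from typing import Dict, Mapping

# Illustrative subset of the default ports from the table above.
# Override these if your cluster changes them in its *-site.xml files.
DEFAULT_PORTS: Dict[str, int] = {
    "NameNode IPC": 8020,
    "NameNode HTTP UI": 50070,
    "DataNode data transfer": 50010,
    "DataNode HTTP UI": 50075,
    "DataNode IPC": 50020,
    "ResourceManager": 8032,
    "ResourceManager HTTP UI": 8088,
    "NodeManager HTTP UI": 8042,
    "JobHistory HTTP UI": 19888,
    "Hive MetaStore": 9083,
    "ZooKeeper client": 2181,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unresolvable host and timeouts
        # (socket.timeout and socket.gaierror are both OSError subclasses).
        return False


def probe(host: str, ports: Mapping[str, int] = DEFAULT_PORTS) -> Dict[str, bool]:
    """Map each service name to whether its port accepts TCP connections on host."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # Example: check the local machine against the default port list.
    for name, is_open in probe("localhost").items():
        print(f"{name:<28} {DEFAULT_PORTS[name]:>5}  {'open' if is_open else 'closed'}")
```

An open port only shows that something is listening. To confirm it is the expected daemon, request its HTTP UI or use the service's own client.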

Below is a brief introduction, from Cloudera, to the HTTP endpoints behind these ports:

Hadoop daemons expose some information over HTTP. All Hadoop daemons expose the following:

/logs
Exposes, for downloading, log files in the Java system property hadoop.log.dir.
/logLevel
Allows you to dial log4j logging levels up or down. This is similar to hadoop daemonlog on the command line.
/stacks
Stack traces for all threads. Useful for debugging.
/metrics
Metrics for the server. Use /metrics?format=json to retrieve the data in a structured form. Available from Hadoop 0.21.
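These endpoints are plain HTTP, so they are easy to script. The following minimal sketch uses only the Python standard library. It builds URLs for a daemon's common endpoints and fetches the JSON metrics described above.

It makes three assumptions:

- You pass the daemon's HTTP port from the table, for example 50070 for the NameNode.
- The log-level endpoint (`/logLevel`, which `hadoop daemonlog` talks to) takes `log` and `level` query parameters.
- The stack-trace endpoint is `/stacks`.

The helper names and the host `nn.example.com` are hypothetical. On releases without `/metrics?format=json`, the request fails with an HTTP error. Newer releases also serve JMX data as JSON at `/jmx`, which can be passed to `daemon_url` instead.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def daemon_url(host: str, port: int, endpoint: str, **params: str) -> str:
    """Build http://host:port/endpoint[?query] for a Hadoop daemon's web server."""
    query = urllib.parse.urlencode(params)
    base = f"http://{host}:{port}/{endpoint.lstrip('/')}"
    return f"{base}?{query}" if query else base


def log_level_url(host: str, port: int, logger: str, level: Optional[str] = None) -> str:
    """URL that shows (or, when level is given, sets) a log4j logger's level."""
    params = {"log": logger}
    if level is not None:
        params["level"] = level
    return daemon_url(host, port, "logLevel", **params)


def fetch_metrics(host: str, port: int, timeout: float = 5.0) -> Any:
    """Fetch /metrics?format=json from a daemon and decode the JSON body."""
    url = daemon_url(host, port, "metrics", format="json")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    nn = ("nn.example.com", 50070)  # hypothetical NameNode host, default HTTP port
    print(daemon_url(*nn, "stacks"))
    print(log_level_url(*nn, "org.apache.hadoop.hdfs.server.namenode.NameNode", "DEBUG"))
```

Setting a level this way changes it only in the running process. The change is lost when the daemon restarts, just as with `hadoop daemonlog`.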

Individual daemons expose extra daemon-specific endpoints as well. Note that these are not necessarily part of Hadoop’s public API, so they tend to change over time.

The Namenode exposes:

/dfshealth.jsp
Shows information about the namenode as well as the HDFS. There’s a link from here to browse the filesystem, as well.
/dfsnodelist.jsp?whatNodes=(DEAD|LIVE)
Shows lists of nodes that are disconnected from (DEAD) or connected to (LIVE) the namenode.
/fsck
Runs the “fsck” command. Not recommended on a busy cluster.
/listPaths
Returns an XML-formatted directory listing. This is useful if you wish (for example) to poll HDFS to see if a file exists. The URL can include a path (e.g., /listPaths/user/philip) and can take optional GET arguments. For example, /listPaths?recursive=yes will return all files on the file system; /listPaths/user/philip?filter=s.* will return all files in that home directory that start with s; and /listPaths/user/philip?exclude=.txt will return all files in that home directory except text files. Beware that filter and exclude operate on the directory listed in the URL, and they ignore the recursive flag.
/data and /fileChecksum
These forward your HTTP request to an appropriate datanode, which in turn returns the data or the checksum.
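The /listPaths polling use case above can be sketched in a few lines of Python. The script requests the listing for a file's parent directory and checks whether the file appears in the XML.

This is a sketch under stated assumptions, not a description of the exact NameNode output:

- It assumes the response has a `<listing>` root whose `<file>` and `<directory>` children carry the full HDFS path in a `path` attribute.
- The sample document is hand-written for illustration and was not captured from a real cluster.
- The helper names and host are hypothetical.
- 50070 is the NameNode HTTP port from the table.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Set


def list_paths_url(host: str, port: int, path: str, **params: str) -> str:
    """Build a /listPaths URL, e.g. /listPaths/user/philip?filter=s.* (params are URL-encoded)."""
    url = f"http://{host}:{port}/listPaths{urllib.parse.quote(path)}"
    query = urllib.parse.urlencode(params)
    return f"{url}?{query}" if query else url


def listed_files(xml_doc) -> Set[str]:
    """Return the path attribute of every <file> element in a /listPaths response.

    Accepts str or bytes. Assumes <file path="..."/> elements (see lead-in).
    """
    root = ET.fromstring(xml_doc)
    return {elem.get("path") for elem in root.iter("file")}


def hdfs_file_exists(host: str, port: int, path: str, timeout: float = 5.0) -> bool:
    """List the parent directory of path on the NameNode and check whether path is in it."""
    parent = path.rsplit("/", 1)[0] or "/"
    with urllib.request.urlopen(list_paths_url(host, port, parent), timeout=timeout) as resp:
        return path in listed_files(resp.read())


if __name__ == "__main__":
    # Hand-written sample in the assumed format, not real NameNode output.
    sample = """<listing path="/user/philip" recursive="no">
  <directory path="/user/philip/logs"/>
  <file path="/user/philip/sales.csv" size="1024"/>
</listing>"""
    print("/user/philip/sales.csv" in listed_files(sample))
```

Listing only the parent directory keeps each poll cheap, which matters given the warning above about running heavy requests such as fsck on a busy cluster.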

Datanodes expose the following:

/browseBlock.jsp, /browseDirectory.jsp, /tail.jsp, /streamFile, /getFileChecksum
These are the endpoints that the namenode redirects to when you are browsing filesystem content. You probably wouldn’t use these directly, but this is what’s going on underneath.
/blockScannerReport
Every datanode verifies its blocks at configurable intervals. This endpoint provides a listing of that check.

The secondarynamenode exposes a simple status page with information including which namenode it’s talking to, when the last checkpoint was, how big it was, and which directories it’s using.

The jobtracker’s UI is commonly used to look at running jobs, and, especially, to find the causes of failed jobs. The UI is best browsed starting at /jobtracker.jsp. There are over a dozen related pages providing details on tasks, history, scheduling queues, jobs, etc.

Tasktrackers have a simple page (/tasktracker.jsp), which shows running tasks. They also expose /taskLog?taskid= to query logs for a specific task. They use /mapOutput to serve the output of map tasks to reducers, but this is an internal API.

Internally, Hadoop mostly uses Hadoop IPC to communicate amongst servers. (Part of the goal of the Apache Avro project is to replace Hadoop IPC with something that is easier to evolve and more language-agnostic; HADOOP-6170 is the relevant ticket.) Hadoop also uses HTTP (for the secondarynamenode communicating with the namenode and for the tasktrackers serving map outputs to the reducers) and a raw network socket protocol (for datanodes copying around data).