Most popular Linux System Monitoring Tools Every SysAdmin Should Know
The tools introduced below are available on most Linux distributions. The commands discussed are among the most basic, and you can find them on most UNIX systems too.
These tools provide metrics about system activity, which you can use to find the possible causes of a performance problem. They help with system analysis and with debugging server issues such as:
- Finding out bottlenecks.
- Disk (storage) bottlenecks.
- CPU and memory bottlenecks.
- Network bottlenecks.
top – Process Activity Command
The top program provides a dynamic real-time view of a running system, i.e. actual process activity. By default, it displays the most CPU-intensive tasks running on the server and refreshes the list every few seconds (the exact default depends on the top version; change it with the d key or the -d option).
# top
top - 23:11:34 up 10 days, 8:51, 1 user, load average: 0.00, 0.02, 0.05
Tasks: 1732 total, 1 running, 1731 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.2 sy, 0.0 ni, 99.3 id, 0.4 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32847204 total, 12706520 free, 18816592 used, 1324092 buff/cache
KiB Swap: 4095996 total, 4095996 free, 0 used. 12961044 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22 root 20 0 0 0 0 S 5.3 0.0 2:34.70 rcu_sched
1 root 20 0 44552 6400 2288 S 0.0 0.0 0:41.99 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.40 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:20.42 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
8 root rt 0 0 0 0 S 0.0 0.0 0:00.29 migration/0
Commonly Used Hot Keys
Hot Key | Usage |
---|---|
t | Displays summary information off and on. |
m | Displays memory information off and on. |
A | Sorts the display by top consumers of various system resources. Useful for quick identification of performance-hungry tasks on a system. |
f | Enters an interactive configuration screen for top. Helpful for setting up top for a specific task. |
o | Enables you to interactively select the ordering within top. |
r | Issues renice command. |
k | Issues kill command. |
z | Toggles color/mono display. |
vmstat – System Activity, Hardware and System Information
The vmstat command reports information about processes, memory, paging, block I/O, traps, and CPU activity.
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 78876 224652 88896 4995140 0 0 6 19 5 7 1 0 98 1 0
0 0 78876 224388 88904 4995144 0 0 0 246 299 488 0 0 97 3 0
0 0 78876 224388 88908 4995144 0 0 0 21 252 461 0 0 100 0 0
0 0 78876 224388 88916 4995144 0 0 0 22 280 500 0 0 100 0 0
0 0 78876 224264 88920 4995144 0 0 0 22 299 491 0 0 100 0 0
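When scanning a long vmstat run, it can help to filter for the samples that matter. The following is a minimal sketch, not part of vmstat itself: `vmstat_high_iowait` is a hypothetical helper name, and it assumes the default column layout shown above, where the second header row starts with `r` and contains a `wa` column.

```shell
# Hypothetical helper: read vmstat output on stdin and print only the
# sample lines whose iowait ("wa") value is at or above the limit in $1.
vmstat_high_iowait() {
  awk -v limit="$1" '
    # The second header row names the columns; remember where "wa" is.
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
    # Data rows start with a number; compare the wa column to the limit.
    col && $1 ~ /^[0-9]+$/ && $col + 0 >= limit + 0 { print }
  '
}

# Example (not run here): vmstat 5 5 | vmstat_high_iowait 3
```

Looking the column up by name rather than by position keeps the sketch working if a vmstat build adds or drops columns, as long as the header still contains `wa`.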
Display Memory Utilization Slabinfo
# vmstat -m
Get Information About Active / Inactive Memory Pages
# vmstat -a
w – Find Out Who Is Logged on And What They Are Doing
The w command displays information about the users currently logged on to the machine, and their processes.
# w username
# w usertest
Sample outputs:
# w
22:23:06 up 69 days, 13:05, 4 users, load average: 0.02, 0.07, 0.20
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 fibrevillage.com 06Sep16 0.00s 0.01s 0.00s w
root pts/1 fibrevillage.com 30Aug16 3days 4.05s 3.80s ssh hsmhead
root pts/2 fibrevillage.com 07Sep16 3days 0.02s 0.02s -bash
root pts/3 fibrevillage.com 08:44 13:34m 0.01s 0.01s -bash
uptime – Tell How Long The System Has Been Running
The uptime command can be used to see how long the server has been running. It shows the current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
# uptime
22:25:35 up 69 days, 13:07, 4 users, load average: 0.03, 0.07, 0.18
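For scripts and alerts, the load averages usually need to be pulled out of the uptime line. The sketch below is one possible way to do that. The function names `load_averages` and `load_exceeds_cpus` are hypothetical, and the pattern assumes the `load average: a, b, c` wording shown above with a dot as the decimal separator.

```shell
# Hypothetical helper: read uptime output on stdin and print the
# 1-, 5- and 15-minute load averages separated by spaces.
load_averages() {
  sed -n 's/.*load averages*: *\([0-9.]*\), *\([0-9.]*\), *\([0-9.]*\).*/\1 \2 \3/p'
}

# Hypothetical helper: succeed (exit status 0) only when the 1-minute
# load average read from stdin is greater than the CPU count given in $1.
load_exceeds_cpus() {
  load_averages | awk -v cpus="$1" '
    { over = ($1 + 0 > cpus + 0) }
    END { exit !over }
  '
}

# Example (not run here):
#   uptime | load_averages
#   uptime | load_exceeds_cpus 12 && echo "load is above the CPU count"
```

Comparing the load against the number of CPUs is a common rule of thumb rather than a hard limit, so treat the second helper as a starting point for your own thresholds.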
ps – Displays The Processes
ps command will report a snapshot of the current processes. To select all processes use the -A or -e option:
# ps -A
PID TTY TIME CMD
1 ? 00:00:02 init
2 ? 00:00:02 migration/0
3 ? 00:00:01 ksoftirqd/0
4 ? 00:00:00 watchdog/0
5 ? 00:00:00 migration/1
6 ? 00:00:15 ksoftirqd/1
....
ps resembles top, but it prints a one-time snapshot instead of a continuously updated view, and its options can show more detail about each process.
Show Long Format Output
# ps -Al
To turn on extra full mode (it will show command line arguments passed to process):
# ps -AlF
To See Threads (LWP and NLWP)
# ps -AlFL
To See Threads After Processes
# ps -AlLm
Print All Process On The Server
# ps ax
# ps axu
Print A Process Tree
# ps -ejH
# ps axjf
# pstree
Print Security Information
# ps -eo euser,ruser,suser,fuser,f,comm,label
# ps axZ
# ps -eM
See Every Process Running As User fibrevillage
# ps -U fibrevillage -u fibrevillage u
Set Output In a User-Defined Format
# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm
# ps axo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm
# ps -eopid,tt,user,fname,tmout,f,wchan
Display Only The Process IDs of Lighttpd
# ps -C lighttpd -o pid=
OR
# pgrep lighttpd
OR
# pgrep -u fibrevillage php-cgi
Display The Name of PID 55977
# ps -p 55977 -o comm=
Find Out The Top 10 Memory Consuming Processes
# ps auxf | sort -nr -k 4 | head -10
Find Out The Top 10 CPU Consuming Processes
# ps auxf | sort -nr -k 3 | head -10
free – Memory Usage
The command free displays the total amount of free and used physical and swap memory in the system, as well as the buffers used by the kernel.
# free
total used free shared buff/cache available
Mem: 32847204 18821772 12703552 18156 1321880 12956664
Swap: 4095996 0 4095996
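The raw kilobyte counts are easier to judge as a percentage. Here is a minimal sketch that reads free output and reports how much physical memory is still available. `mem_available_pct` is a hypothetical name, and the code assumes a free version that prints an `available` column as in the sample above (older versions without that column produce no output).

```shell
# Hypothetical helper: read free output on stdin and print the share of
# physical memory still available, as a percentage with one decimal.
mem_available_pct() {
  awk '
    # Header row: find the "available" column. Data rows carry an extra
    # leading label (Mem:), so the data field index is one higher.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "available") col = i + 1 }
    # Mem row: field 2 is the total; guard against a zero total.
    $1 == "Mem:" && col && $2 > 0 { printf "%.1f\n", 100 * $col / $2 }
  '
}

# Example (not run here): free | mem_available_pct
```

For the sample output above, 12956664 of 32847204 KiB are available, which this sketch reports as 39.4.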
iostat – Average CPU Load, Disk Activity
The iostat command reports Central Processing Unit (CPU) statistics and input/output statistics for devices, partitions and network filesystems (NFS).
# iostat 3 3
Linux 3.10.0-327.28.3.el7.x86_64 (fibrevillage.com) 09/18/16 _x86_64_ (12 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.05 0.00 0.25 0.45 0.00 99.25
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdi 34.98 2183.13 3.08 1953310368 2753037
sdy 3.13 0.56 5.59 502407 5002991
sdb 3.13 0.56 5.59 500307 5003114
sdac 2.22 0.26 4.38 231198 3916716
sdu 2.26 0.29 4.43 256854 3967682
sdc 3.13 0.55 5.59 494696 5003282
sda 35.00 2183.40 3.08 1953556951 2753037
...
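On servers with many disks, the device list gets long, so ranking devices by activity helps. The following sketch sorts the device table by the tps column. `busiest_disks` is a hypothetical helper, and it assumes the basic (non-extended) iostat layout shown above, where tps is the second column of the device table.

```shell
# Hypothetical helper: read iostat output on stdin and print the devices
# with the highest tps (transfers per second), busiest first.
# $1 is how many devices to show (default 5).
busiest_disks() {
  awk '
    # The device table starts after a header line beginning with "Device".
    $1 ~ /^Device/ { in_dev = 1; next }
    # A blank line or the next avg-cpu header ends the device table.
    NF == 0 || /^avg-cpu/ { in_dev = 0; next }
    # Inside the table, print "tps device" for rows with a numeric tps.
    in_dev && NF >= 2 && $2 ~ /^[0-9.]+$/ { print $2, $1 }
  ' | sort -rn | head -n "${1:-5}"
}

# Example (not run here): iostat | busiest_disks 3
```

When iostat is run with an interval and count, every report is included, so the same device can appear more than once in the ranking.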
sar – Collect and Report System Activity
The sar command is used to collect, report, and save system activity information. To see the network counters, enter:
# sar -n DEV | more
To display the network counters from the 24th:
# sar -n DEV -f /var/log/sa/sa24 | more
You can also display real time usage using sar:
# sar 3 5
Linux 3.10.0-327.28.3.el7.x86_64 (fibrevillage.com) 09/18/16 _x86_64_ (12 CPU)
22:54:30 CPU %user %nice %system %iowait %steal %idle
22:54:33 all 0.00 0.00 0.03 0.00 0.00 99.97
22:54:36 all 0.03 0.00 0.06 0.00 0.00 99.92
22:54:39 all 0.03 0.00 0.03 0.00 0.00 99.94
22:54:42 all 0.03 0.00 0.03 0.00 0.00 99.94
22:54:45 all 0.03 0.00 0.06 0.00 0.00 99.92
Average: all 0.02 0.00 0.04 0.00 0.00 99.94
mpstat – Multiprocessor Usage
The mpstat command displays activities for each available processor, processor 0 being the first one. Use mpstat -P ALL to display average CPU utilization per processor:
# mpstat -P ALL
Linux 3.10.0-327.28.3.el7.x86_64 (fibrevillage.com) 09/18/16 _x86_64_ (12 CPU)
22:55:46 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
22:55:46 all 0.05 0.00 0.24 0.45 0.00 0.01 0.00 0.00 0.00 99.25
22:55:46 0 0.12 0.00 0.49 0.32 0.00 0.05 0.00 0.00 0.00 99.03
...
22:55:46 9 0.04 0.00 0.16 0.48 0.00 0.00 0.00 0.00 0.00 99.33
22:55:46 10 0.03 0.00 0.16 0.47 0.00 0.00 0.00 0.00 0.00 99.34
22:55:46 11 0.03 0.00 0.17 0.30 0.00 0.01 0.00 0.00 0.00 99.49
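With many processors, a single overloaded core is easy to miss in the per-CPU table. As a rough sketch, the helper below picks the CPU with the lowest %idle value. `busiest_cpu` is a hypothetical name, and the code assumes the header row contains the column names `CPU` and `%idle` as in the sample above.

```shell
# Hypothetical helper: read "mpstat -P ALL" output on stdin and print the
# CPU number with the lowest %idle value, followed by that value.
busiest_cpu() {
  awk '
    # Header row: remember which fields hold the CPU id and %idle.
    /%idle/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "CPU") c = i
        if ($i == "%idle") d = i
      }
      next
    }
    # Per-CPU rows only: the "all" summary row is skipped because its
    # CPU field is not a number.
    c && d && $c ~ /^[0-9]+$/ {
      if (!found || $d + 0 < min) { min = $d + 0; cpu = $c; found = 1 }
    }
    END { if (found) print cpu, min }
  '
}

# Example (not run here): mpstat -P ALL | busiest_cpu
```

For the sample output above, the lowest %idle among the rows shown belongs to CPU 0 (99.03).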
pmap – Process Memory Usage
The pmap command reports the memory map of a process. Use this command to find out causes of memory bottlenecks.
To display process memory information for PID 21559, enter:
# pmap -d 21559
21559: /bin/java -server -Xmx1024m -XX:MaxDirectMemorySize=512m -Dsun.net.inetaddr.ttl=1800 -Dorg.globus.tcp.port.range=20000,25000 -Dorg.dcache.dcap.port=0 -Dorg.dcache.net.tcp.portrange=33115:33145 -Dorg.globus.jglobus.delegation.cache.lifetime=30000 -Dorg.globus.jglobus.crl.cache.lifetime=60000 -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=/etc/dcache/jgss.conf -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dca
Address Kbytes Mode Offset Device Mapping
0000000000400000 4 r-x-- 0000000000000000 0fd:00001 java
...
00007f389c007000 104 r--s- 00000000001c7000 0fd:00001 datanucleus-core-4.0.7.jar
00007f389c021000 8 r--s- 000000000000a000 0fd:00001 datanucleus-cache-4.0.4.jar
...
ffffffffff600000 4 r-x-- 0000000000000000 000:00000 [ anon ]
mapped: 8857280K writeable/private: 811924K shared: 7872K
The last line is very important:
- mapped: 8857280K total amount of memory mapped to files
- writeable/private: 811924K the amount of private address space
- shared: 7872K the amount of address space this process is sharing with others
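To track these numbers over time, the summary line can be reduced to a single value. This is a minimal sketch under the assumption that the summary line has the `writeable/private: <number>K` form shown above; `pmap_private_kb` is a hypothetical helper name.

```shell
# Hypothetical helper: read "pmap -d PID" output on stdin and print the
# writeable/private size in kilobytes, without the trailing K.
pmap_private_kb() {
  awk '
    /writeable\/private:/ {
      for (i = 1; i < NF; i++) {
        if ($i == "writeable/private:") {
          value = $(i + 1)
          sub(/K$/, "", value)   # strip the unit suffix
          print value
        }
      }
    }
  '
}

# Example (not run here): pmap -d 21559 | pmap_private_kb
```

Logging this value at intervals for one process is a simple way to see whether its private memory keeps growing.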
netstat and ss – Network Statistics
The netstat command displays network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. The ss command dumps socket statistics and shows information similar to netstat.
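A common first look at socket statistics is a count of TCP sockets per state. The sketch below assumes the output of `ss -tan` (TCP, all sockets, numeric addresses), where the first line is a header and the first column of every other line is the socket state. `count_tcp_states` is a hypothetical helper name.

```shell
# Hypothetical helper: read "ss -tan" output on stdin and print the number
# of TCP sockets in each state, most frequent state first.
count_tcp_states() {
  awk '
    # Skip the header line; the first field of each row is the state.
    NR > 1 && NF > 0 { n[$1]++ }
    END { for (s in n) print n[s], s }
  ' | sort -rn
}

# Example (not run here): ss -tan | count_tcp_states
```

An unusually large count for a single state is often the first hint of a connection-handling problem and a cue to look at the individual sockets.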
iptraf – Real-time Network Statistics
The iptraf command is an interactive, colorful, ncurses-based IP LAN monitor. It generates various network statistics including TCP info, UDP counts, ICMP and OSPF information, Ethernet load info, node stats, IP checksum errors, and others. It can provide the following info in an easy-to-read format:
- Network traffic statistics by TCP connection
- IP traffic statistics by network interface
- Network traffic statistics by protocol
- Network traffic statistics by TCP/UDP port and by packet size
- Network traffic statistics by Layer2 address
tcpdump – Detailed Network Traffic Analysis
tcpdump is a simple command that dumps traffic on a network. However, you need a good understanding of the TCP/IP protocols to use this tool. For example, to display traffic info about DNS, enter:
# tcpdump -i eth1 'udp port 53'
To display all IPv4 HTTP packets to and from port 80, i.e. print only packets that contain data, not, for example, SYN and FIN packets and ACK-only packets, enter:
# tcpdump 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'
To display all FTP sessions to 209.15.192.186, enter:
# tcpdump -i eth1 'dst 209.15.192.186 and (port 21 or 20)'
To display all HTTP sessions to 209.15.192.186:
# tcpdump -ni eth0 'dst 209.15.192.186 and tcp and port http'
To write full packets to a file for detailed analysis later, enter:
# tcpdump -n -i eth1 -s 0 -w output.txt src or dst port 80
strace – System Calls
strace traces system calls and signals. This is useful for debugging web server and other server problems, because it shows what a process is actually doing.
/proc file system – Various Kernel Statistics
The /proc file system provides detailed information about various hardware devices and other Linux kernel information. See the Linux kernel /proc documentation for further details. Common /proc examples:
# cat /proc/cpuinfo
# cat /proc/meminfo
# cat /proc/zoneinfo
# cat /proc/mounts
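Because /proc files are plain text, single values are easy to extract for scripts. Here is a small sketch for /proc/meminfo, which lists one `Name: value kB` entry per line. `meminfo_kb` is a hypothetical helper name, and which fields exist (for example MemAvailable) depends on the kernel version.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the numeric value of the field named in $1 (for example MemFree).
meminfo_kb() {
  awk -v key="$1:" '$1 == key { print $2 }'
}

# Example (not run here): meminfo_kb MemAvailable < /proc/meminfo
```

The helper prints nothing when the requested field is missing, so callers can check for an empty result instead of parsing an error message.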
Nagios – Server And Network Monitoring
Nagios is a popular open source system and network monitoring application. You can easily monitor all your hosts, network equipment and services. It can send alerts when things go wrong and again when they get better. FAN, short for “Fully Automated Nagios”, aims to provide a Nagios installation that includes most tools provided by the Nagios community. FAN provides a CD-ROM image in the standard ISO format, which makes it easy to install a Nagios server. The distribution also bundles a wide range of tools that improve the user experience around Nagios.
Cacti – Web-based Monitoring Tool
Cacti is a complete network graphing solution designed to harness the power of RRDTool’s data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices. It can provide data about network, CPU, memory, logged in users, Apache, DNS servers and much more. See how to install and configure Cacti network graphing tool under CentOS / RHEL.
KDE System Guard – Real-time Systems Reporting and Graphing
KSysguard is a network enabled task and system monitor application for KDE desktop. This tool can be run over ssh session. It provides lots of features such as a client/server architecture that enables monitoring of local and remote hosts. The graphical front end uses so-called sensors to retrieve the information it displays. A sensor can return simple values or more complex information like tables. For each type of information, one or more displays are provided. Displays are organized in worksheets that can be saved and loaded independently from each other. So, KSysguard is not only a simple task manager but also a very powerful tool to control large server farms.
Gnome System Monitor – Real-time Systems Reporting and Graphing
The System Monitor application enables you to display basic system information and monitor system processes, usage of system resources, and file systems. You can also use System Monitor to modify the behavior of your system. Although not as powerful as the KDE System Guard, it provides the basic information which may be useful for new users:
- Displays various basic information about the computer’s hardware and software.
- Linux Kernel version
- GNOME version
- Hardware
- Installed memory
- Processors and speeds
- System Status
- Currently available disk space
- Processes
- Memory and swap space
- Network usage
- File Systems
- Lists all mounted filesystems along with basic information about each.
Bonus: Additional Tools
A few more tools:
- nmap – scan your server for open ports.
- lsof – list open files, network connections and much more.
- ntop web based tool – ntop is the best tool to see network usage in a way similar to what top command does for processes i.e. it is network traffic monitoring software. You can see network status, protocol wise distribution of traffic for UDP, TCP, DNS, HTTP and other protocols.
- Conky – Another good monitoring tool for the X Window System. It is highly configurable and is able to monitor many system variables including the status of the CPU, memory, swap space, disk storage, temperatures, processes, network interfaces, battery power, system messages, e-mail inboxes etc.
- GKrellM – It can be used to monitor the status of CPUs, main memory, hard disks, network interfaces, local and remote mailboxes, and many other things.
- vnstat – vnStat is a console-based network traffic monitor. It keeps a log of hourly, daily and monthly network traffic for the selected interface(s).
- htop – htop is an enhanced version of top, the interactive process viewer, which can display the list of processes in a tree form.
- mtr – mtr combines the functionality of the traceroute and ping programs in a single network diagnostic tool.
Did I miss something? Please add your favorite system monitoring tool in the comments.