blktrace is a block layer IO tracing mechanism which provides detailed information about request queue operations up to user space.
There are three major components:
a kernel component
a utility to record the i/o trace information for the kernel to user space
utilities to analyse and view the trace information
Installation
On RHEL/CentOS
# yum install blktrace
================================================================
Package Arch Version Repository Size
================================================================
Installing:
blktrace x86_64 1.0.5-6.el7 sl 132 k
Transaction Summary
================================================================
Install 1 Package
Total download size: 132 k
Installed size: 297 k
Is this ok [y/d/N]: y
Installed:
blktrace.x86_64 0:1.0.5-6.el7
blktrace Usage
Usage: blktrace
-d <dev> | --dev=<dev>
[ -r <debugfs path> | --relay=<debugfs path> ]
[ -o <file> | --output=<file>]
[ -D <dir> | --output-dir=<dir>
[ -w <time> | --stopwatch=<time>]
[ -a <action field> | --act-mask=<action field>]
[ -A <action mask> | --set-mask=<action mask>]
[ -b <size> | --buffer-size]
[ -n <number> | --num-sub-buffers=<number>]
[ -l | --listen]
[ -h <hostname> | --host=<hostname>]
[ -p <port number> | --port=<port number>]
[ -s | --no-sendfile]
[ -I <devs file> | --input-devs=<devs file>]
[ -v <version> | --version]
[ -V <version> | --version]
-d Use specified device. May also be given last after options
-r Path to mounted debugfs, defaults to /sys/kernel/debug
-o File(s) to send output to
-D Directory to prepend to output file names
-w Stop after defined time, in seconds
-a Only trace specified actions. See documentation
-A Give trace mask as a single value. See documentation
-b Sub buffer size in KiB (default 512)
-n Number of sub buffers (default 4)
-l Run in network listen mode (blktrace server)
-h Run in network client mode, connecting to the given host
-p Network port to use (default 8462)
-s Make the network client NOT use sendfile() to transfer data
-I Add devices found in <devs file>
-v Print program version info
-V Print program version info
Use blktrace to trace disk i/o
1. Trace I/O
The command below trace disk sdp for 10 seconds, and send the trace result to file trace.blktrace.[0-11], by default, the output filename in the example below should be sdp.blktrace.[0-11]
# blktrace -d /dev/sdp -o trace -w 10
=== sdp ===
CPU 0: 8780 events, 412 KiB data
CPU 1: 3173 events, 149 KiB data
CPU 2: 3600 events, 169 KiB data
CPU 3: 1445 events, 68 KiB data
CPU 4: 4604 events, 216 KiB data
CPU 5: 3764 events, 177 KiB data
CPU 6: 1152 events, 54 KiB data
CPU 7: 1142 events, 54 KiB data
CPU 8: 335 events, 16 KiB data
CPU 9: 936 events, 44 KiB data
CPU 10: 1470 events, 69 KiB data
CPU 11: 385 events, 19 KiB data
Total: 30786 events (dropped 0), 1444 KiB data
2. blkparse parse the trace output file
#blkparse -i trace.blktrace.* -d sdp.bin
...
Total (trace):
Reads Queued: 4,833, 117,316KiB Writes Queued: 0, 0KiB
Read Dispatches: 4,833, 117,316KiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 4,828, 116,829KiB Writes Completed: 0, 0KiB
Read Merges: 0, 0KiB Write Merges: 0, 0KiB
IO unplugs: 0 Timer unplugs: 0
Throughput (R/W): 11,692KiB/s / 0KiB/s
Events (trace): 28,993 entries
Skips: 0 forward (0 - 0.0%)
3. Use btt to get device i/o activites statistics
# btt -i sdp.bin | more
==================== All Devices ====================
ALL MIN AVG MAX N
--------------- ------------- ------------- ------------- -----------
Q2Q 0.000002693 0.002064896 0.224947065 4832
Q2G 0.000000289 0.000010970 0.010791768 57996
G2I 0.000000214 0.000000961 0.000033457 57996
I2D 0.000000160 0.000005564 0.005938405 57108
D2C 0.000135164 0.003013303 0.035674715 4805
Q2C 0.000168496 0.003040692 0.035678620 4828
==================== Device Overhead ====================
DEV | Q2G G2I Q2M I2D D2C
---------- | --------- --------- --------- --------- ---------
( 8,240) | 4.3338% 0.3797% 0.0000% 2.1643% 98.6272%
---------- | --------- --------- --------- --------- ---------
Overall | 4.3338% 0.3797% 0.0000% 2.1643% 98.6272%
==================== Device Merge Information ====================
DEV | #Q #D Ratio | BLKmin BLKavg BLKmax Total
---------- | -------- -------- ------- | -------- -------- -------- --------
( 8,240) | 57996 57996 1.0 | 25 48 256 2830284
==================== Device Q2Q Seek Information ====================
DEV | NSEEKS MEAN MEDIAN | MODE
---------- | --------------- --------------- --------------- | ---------------
( 8,240) | 4833 712982.8 0 | 0(2296)
---------- | --------------- --------------- --------------- | ---------------
Overall | NSEEKS MEAN MEDIAN | MODE
Average | 4833 712982.8 0 | 0(2296)
==================== Device D2D Seek Information ====================
DEV | NSEEKS MEAN MEDIAN | MODE
---------- | --------------- --------------- --------------- | ---------------
( 8,240) | 57996 59415.2 0 | 0(55458)
---------- | --------------- --------------- --------------- | ---------------
Overall | NSEEKS MEAN MEDIAN | MODE
Average | 57996 59415.2 0 | 0(55458)
==================== Plug Information ====================
DEV | # Plugs # Timer Us | % Time Q Plugged
---------- | ---------- ---------- | ----------------
DEV | IOs/Unp IOs/Unp(to)
---------- | ---------- ----------
( 8,240) | 0.0 0.0
==================== Active Requests At Q Information ====================
DEV | Avg Reqs @ Q
---------- | -------------
( 8,240) | 0.0
...
Interpreting blktrace Information
Note: All time is mili seconds
Q2Q — time between requests sent to the block layer
Q2G — howlong
it takes from the time a block I/O is queued to the time it gets a request allocated
for
it
G2I — howlong
it takes from the time a request is allocated to the time it is Inserted into the device's queue
Q2M — howlong
it takes from the time a block I/O is queued to the time it gets merged with an existing request
I2D — howlong
it takes from the time a request is inserted into the device's queue to the time it is actually issued to the device
M2D — howlong
it takes from the time a block I/O is merged with an exiting request until the request is issued to the device
D2C — service time of the request by the device
Q2C — total time spent in the block layerfor
a request
I/O sequence
Q------->G------------>I--------->M------------------->D----------------------------->C
|-Q time-|-Insert time-|
|--------- merge time ------------|-merge with other IO|
|----------------scheduler time time-------------------|---driver,adapter,storagetime--|
|----------------------- await time in iostat output ----------------------------------|
Hints and tips for I/O tuning
- If Q2Q is much larger than Q2C, that means the application is not issuing I/O in rapid succession. hus, any performance problems you have may not be at all related to the I/O subsystem
- D2C is very high, then the device is taking a long time to service requests. This can indicate that the device is simply overloaded (which may be due to the fact that it is a shared resource), or it could be because the workload sent down to the device is sub-optimal.
- If Q2G is very high, it means that there are a lot of requests queued concurrently. This could indicate that the storage is unable to keep up with the I/O load.
- await in iostat output = Q2C = Q2I + I2D + D2C
- Q2I + I2D == scheduler time
- The I2D time can include a lot of apparent extra time due to plug and unplug events (not shown above) which are used to improve merging of io within the schedule sort queue
- D2C time covers driver time, adapter time, transport time, and storage service time (and back)
- So if D2C/Q2C is approaching to 1, it means % time spent on storage component is high.
- high D->C times, the underlying transport structure of storage needs to be examined, such as switch counters or maintenance interfaces for storage boxes themselves.
Ref:
http://www.cse.unsw.edu.au/~aaronc/iosched/doc/btt.html#sec:lat-d2c
Comments powered by CComment