I/O alignment is one of the key factors in storage performance. Enhancements to the SCSI and ATA standards now allow storage devices to indicate their preferred (and in some cases, required) I/O alignment and I/O size.


This information is particularly useful with newer disk drives that increase the physical sector size from 512 bytes to 4k bytes. This information may also be beneficial for RAID devices, where the chunk size and stripe size may impact performance.
The Linux I/O stack has been enhanced to process vendor-provided I/O alignment and I/O size information, allowing storage management tools (parted, lvm, mkfs.*, and the like) to optimize data placement and access.

If a legacy device does not export I/O alignment and size data, then storage management tools in Red Hat Enterprise Linux 7 will conservatively align I/O on a 4k (or larger power of 2) boundary. This will ensure that 4k-sector devices operate correctly even if they do not indicate any required/preferred I/O alignment and size.
The default I/O scheduler has changed in Red Hat Enterprise Linux 7. The default is now deadline, except for SATA drives, which use CFQ by default. On faster storage, deadline outperforms CFQ and improves performance without any special tuning.
If the default does not suit particular disks (for example, rotational SAS disks), change their I/O scheduler to CFQ. Whether this helps depends on the workload.
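The scheduler for a disk can be inspected and changed through sysfs. The sketch below assumes the standard /sys/block/<disk>/queue/scheduler file, which lists the available schedulers with the active one in square brackets (for example "noop [deadline] cfq"). The helper names are hypothetical; writing the file requires root, and the change does not persist across reboots.

```python
import re
from pathlib import Path


def active_scheduler(scheduler_line: str) -> str:
    """Return the scheduler shown in [brackets] in a sysfs scheduler line."""
    match = re.search(r"\[([^\]]+)\]", scheduler_line)
    if match is None:
        raise ValueError(f"no active scheduler in {scheduler_line!r}")
    return match.group(1)


def available_schedulers(scheduler_line: str) -> list[str]:
    """Return every scheduler name listed, with the brackets stripped."""
    return [name.strip("[]") for name in scheduler_line.split()]


def set_scheduler(disk: str, name: str, sysfs: Path = Path("/sys/block")) -> None:
    """Switch one disk to another scheduler, e.g. set_scheduler("sdb", "cfq").

    Requires root. The setting is lost at reboot.
    """
    path = sysfs / disk / "queue" / "scheduler"
    if name not in available_schedulers(path.read_text()):
        raise ValueError(f"{name!r} is not available for {disk}")
    path.write_text(name)
```

For example, active_scheduler(Path("/sys/block/sda/queue/scheduler").read_text()) reports the scheduler sda is currently using, and set_scheduler("sdb", "cfq") switches a rotational SAS disk to CFQ.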

Parameters for Storage Access

The operating system uses the following information to determine I/O alignment and size:
     physical_block_size: Smallest internal unit on which the device can operate
     logical_block_size: Used externally to address a location on the device
     alignment_offset: The number of bytes that the beginning of the Linux block device (partition/MD/LVM device) is offset from the underlying physical alignment
     minimum_io_size: The device's preferred minimum unit for random I/O
     optimal_io_size: The device's preferred unit for streaming I/O
For example, certain 4K-sector devices use a 4K physical_block_size internally but expose a more granular 512-byte logical_block_size to Linux. This discrepancy creates the potential for misaligned I/O. To address this, the Red Hat Enterprise Linux 7 I/O stack attempts to start all data areas on a naturally aligned boundary (physical_block_size), accounting for any alignment_offset when the beginning of the block device is offset from the underlying physical alignment.
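The alignment rule can be expressed as a small calculation. This is a minimal sketch, assuming that the naturally aligned byte offsets are those congruent to alignment_offset modulo physical_block_size; the function name is hypothetical and not part of any kernel or tool API.

```python
def aligned_start(candidate: int, physical_block_size: int, alignment_offset: int) -> int:
    """Return the first naturally aligned byte offset at or after `candidate`.

    An offset counts as aligned when (offset - alignment_offset) is a
    multiple of physical_block_size.
    """
    if physical_block_size <= 0:
        raise ValueError("physical_block_size must be positive")
    # Python's % is non-negative for a positive divisor, even when
    # candidate is smaller than alignment_offset.
    remainder = (candidate - alignment_offset) % physical_block_size
    if remainder == 0:
        return candidate
    return candidate + (physical_block_size - remainder)
```

For a drive with 4K physical and 512-byte logical blocks that reports an alignment_offset of 3584, aligned_start(0, 4096, 3584) gives 3584 (LBA 7), and a data area starting at byte 32256 (LBA 63) is already aligned. With alignment_offset 0, the same 32256 is rounded up to 32768 (LBA 64).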

Storage vendors can also supply I/O hints about the preferred minimum unit for random I/O (minimum_io_size) and streaming I/O (optimal_io_size) of a device. For example, minimum_io_size and optimal_io_size may correspond to a RAID device's chunk size and stripe size respectively.
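As a worked example, the hints for a striped array can be derived from its geometry. This sketch assumes the common convention that minimum_io_size is one chunk and optimal_io_size is one full stripe of data (the chunk size times the number of disks holding data: all disks for RAID 0, one fewer for RAID 5, two fewer for RAID 6). A given driver or controller may report other values, so treat the hypothetical function as illustrative.

```python
def raid_io_hints(chunk_size: int, total_disks: int, parity_disks: int = 0) -> tuple[int, int]:
    """Return (minimum_io_size, optimal_io_size) in bytes for a striped layout.

    minimum_io_size is one chunk; optimal_io_size is one chunk on every
    disk that holds data rather than parity.
    """
    data_disks = total_disks - parity_disks
    if chunk_size <= 0 or data_disks <= 0:
        raise ValueError("need a positive chunk size and at least one data disk")
    return chunk_size, chunk_size * data_disks
```

A four-disk RAID 5 with 512 KiB chunks, raid_io_hints(512 * 1024, 4, parity_disks=1), gives (524288, 1572864): random I/O of one chunk and streaming I/O of three chunks.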

Userspace Access

Always take care to use properly aligned and sized I/O. This is especially important for Direct I/O access. Direct I/O should be aligned on a logical_block_size boundary, and its size should be a multiple of the logical_block_size.
With native 4K devices (i.e. logical_block_size is 4K), it is now critical that applications perform direct I/O in multiples of the device's logical_block_size. This means that applications that perform 512-byte-aligned I/O rather than 4K-aligned I/O will fail on native 4K devices.
To avoid this, an application should consult the I/O parameters of a device to ensure it is using the proper I/O alignment and size. I/O parameters are exposed through both the sysfs and the block device ioctl interfaces.
For more details, see man libblkid. This man page is provided by the libblkid-devel package.
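A request can be checked against logical_block_size before it is issued with Direct I/O. This is a minimal, Linux-only sketch with hypothetical helper names. It also allocates the buffer with an anonymous mmap, because O_DIRECT usually requires an aligned memory buffer too (the exact buffer requirement varies by file system and kernel version), and mmap memory is page-aligned.

```python
import mmap
import os


def direct_io_ok(offset: int, length: int, logical_block_size: int) -> bool:
    """True if the file offset and a non-zero length are multiples of logical_block_size."""
    return (
        length > 0
        and offset % logical_block_size == 0
        and length % logical_block_size == 0
    )


def read_direct(path: str, offset: int, length: int, logical_block_size: int) -> bytes:
    """Read `length` bytes at `offset` using O_DIRECT (Linux only)."""
    if not direct_io_ok(offset, length, logical_block_size):
        raise ValueError(f"offset and length must be multiples of {logical_block_size} bytes")
    # Anonymous mmap memory is page-aligned, which also satisfies
    # 512-byte and 4K logical block alignment.
    buf = mmap.mmap(-1, length)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        nread = os.preadv(fd, [buf], offset)
        return buf[:nread]
    finally:
        os.close(fd)
        buf.close()
```

On a native 4K device, direct_io_ok(512, 4096, 4096) is False. This is exactly the 512-byte-aligned request that the text says will fail, so it is caught before the kernel rejects it.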

sysfs Interface




The file location depends on whether the disk is a physical disk (be that a local disk, local RAID, or a multipath LUN) or a virtual disk. The first file location is applicable to physical disks, while the second file location is applicable to virtual disks. This is because virtio-blk will always report an alignment value for the partition. Physical disks may or may not report an alignment value.


The kernel will still export these sysfs attributes for "legacy" devices that do not provide I/O parameters information, for example:

alignment_offset: 0
physical_block_size: 512
logical_block_size: 512
minimum_io_size: 512
optimal_io_size: 0
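A script can collect the same values from sysfs. This sketch assumes the usual kernel layout: the four size attributes under /sys/block/<disk>/queue/, and alignment_offset at /sys/block/<disk>/alignment_offset or, for a partition, /sys/block/<disk>/<partition>/alignment_offset. The function name is hypothetical. The root parameter is also an assumption, added only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional

QUEUE_ATTRS = ("physical_block_size", "logical_block_size", "minimum_io_size", "optimal_io_size")


def io_parameters(disk: str, partition: Optional[str] = None,
                  root: Path = Path("/sys/block")) -> dict[str, int]:
    """Read the five I/O parameters of a disk, or of one of its partitions."""
    disk_dir = root / disk
    # int() tolerates the trailing newline that sysfs files end with.
    params = {name: int((disk_dir / "queue" / name).read_text()) for name in QUEUE_ATTRS}
    # alignment_offset sits next to the disk, or inside the partition directory.
    offset_dir = disk_dir / partition if partition else disk_dir
    params["alignment_offset"] = int((offset_dir / "alignment_offset").read_text())
    return params
```

On a legacy device, io_parameters("sda") would return the values listed above: 512 for the three size attributes other than optimal_io_size, and 0 for optimal_io_size and alignment_offset.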

Block Device ioctls
BLKALIGNOFF: alignment_offset
BLKPBSZGET: physical_block_size
BLKSSZGET: logical_block_size
BLKIOMIN: minimum_io_size
BLKIOOPT: optimal_io_size
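These ioctls can be issued from Python with the standard fcntl module. In the minimal sketch below, the request numbers are rebuilt from the _IO() definitions in <linux/fs.h>. The result types follow the kernel's ioctl handlers: a C int for BLKSSZGET and BLKALIGNOFF, and an unsigned int for the others. The function name is hypothetical. Opening a block device normally requires root or membership in the disk group.

```python
import fcntl
import os
import struct


def _IO(type_: int, nr: int) -> int:
    """Mirror the kernel's _IO() macro: no data direction and no size field."""
    return (type_ << 8) | nr


# Request numbers as defined in <linux/fs.h>.
BLKSSZGET = _IO(0x12, 104)    # logical_block_size
BLKIOMIN = _IO(0x12, 120)     # minimum_io_size
BLKIOOPT = _IO(0x12, 121)     # optimal_io_size
BLKALIGNOFF = _IO(0x12, 122)  # alignment_offset
BLKPBSZGET = _IO(0x12, 123)   # physical_block_size

# struct format of each result: "i" is a C int, "I" a C unsigned int.
REQUESTS = {
    "logical_block_size": (BLKSSZGET, "i"),
    "physical_block_size": (BLKPBSZGET, "I"),
    "alignment_offset": (BLKALIGNOFF, "i"),
    "minimum_io_size": (BLKIOMIN, "I"),
    "optimal_io_size": (BLKIOOPT, "I"),
}


def query_block_device(device: str) -> dict[str, int]:
    """Return the I/O parameters of a block device such as "/dev/sda"."""
    fd = os.open(device, os.O_RDONLY)
    try:
        result = {}
        for name, (request, fmt) in REQUESTS.items():
            # With a bytes argument, fcntl.ioctl returns the filled-in copy.
            raw = fcntl.ioctl(fd, request, bytes(struct.calcsize(fmt)))
            result[name] = struct.unpack(fmt, raw)[0]
        return result
    finally:
        os.close(fd)
```

The ioctl values should match the sysfs attributes for the same device. Because the ioctls need only an open file descriptor, they suit applications that already hold the device open for Direct I/O.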

ATA and SCSI Device Standards


ATA devices must report appropriate information via the IDENTIFY DEVICE command. ATA devices only report I/O parameters for physical_block_size, logical_block_size, and alignment_offset. The additional I/O hints are outside the scope of the ATA Command Set.


I/O parameter support in Red Hat Enterprise Linux 7 requires at least version 3 of the SCSI Primary Commands (SPC-3) protocol. The kernel sends an extended inquiry (which gains access to the BLOCK LIMITS VPD page) and the READ CAPACITY(16) command only to devices that claim compliance with SPC-3.
The READ CAPACITY(16) command provides the block sizes and alignment offset.
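To make that concrete, the sketch below decodes READ CAPACITY(16) parameter data into the three parameters. The field positions in the docstring are taken from the SBC-3 definition as we understand it and should be checked against the standard. The function name is hypothetical. Physical block size is logical block size shifted by the exponent field, and alignment_offset is the lowest aligned LBA times the logical block size.

```python
import struct


def decode_read_capacity_16(data: bytes) -> dict[str, int]:
    """Derive block sizes and alignment offset from READ CAPACITY(16) data.

    Assumed SBC-3 field positions:
      bytes 0-7    RETURNED LOGICAL BLOCK ADDRESS (last LBA)
      bytes 8-11   LOGICAL BLOCK LENGTH IN BYTES
      byte 13      bits 3-0: LOGICAL BLOCKS PER PHYSICAL BLOCK EXPONENT
      bytes 14-15  bits 13-0: LOWEST ALIGNED LOGICAL BLOCK ADDRESS
    """
    if len(data) < 16:
        raise ValueError("READ CAPACITY(16) data must be at least 16 bytes")
    last_lba, logical = struct.unpack_from(">QI", data, 0)
    exponent = data[13] & 0x0F
    # The top two bits of byte 14 are flags, not part of the LBA.
    lowest_aligned_lba = struct.unpack_from(">H", data, 14)[0] & 0x3FFF
    return {
        "capacity_bytes": (last_lba + 1) * logical,
        "logical_block_size": logical,
        "physical_block_size": logical << exponent,
        "alignment_offset": lowest_aligned_lba * logical,
    }
```

A drive with 512-byte logical blocks, an exponent of 3, and a lowest aligned LBA of 7 decodes to physical_block_size 4096 and alignment_offset 3584.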

The BLOCK LIMITS VPD page (0xb0) provides the I/O hints. It also uses OPTIMAL TRANSFER
The sg3_utils package provides the sg_inq utility, which can be used to access the BLOCK LIMITS VPD page. To do so, run:
# sg_inq -p 0xb0 dis
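The hints in the BLOCK LIMITS page can also be decoded from the raw page bytes. In this sketch, the assumption (based on the SBC-3 page layout) is that OPTIMAL TRANSFER LENGTH GRANULARITY at bytes 6-7 maps to minimum_io_size, and OPTIMAL TRANSFER LENGTH at bytes 12-15 maps to optimal_io_size. Both fields count logical blocks, so they are scaled by logical_block_size. The function name is hypothetical.

```python
import struct


def decode_block_limits(page: bytes, logical_block_size: int) -> dict[str, int]:
    """Derive I/O hints in bytes from a BLOCK LIMITS VPD page (0xb0).

    Assumed SBC-3 field positions (values counted in logical blocks):
      bytes 6-7    OPTIMAL TRANSFER LENGTH GRANULARITY -> minimum_io_size
      bytes 12-15  OPTIMAL TRANSFER LENGTH             -> optimal_io_size
    A field value of 0 means the device gave no hint.
    """
    if len(page) < 16 or page[1] != 0xB0:
        raise ValueError("not a BLOCK LIMITS VPD page")
    granularity = struct.unpack_from(">H", page, 6)[0]
    optimal = struct.unpack_from(">I", page, 12)[0]
    return {
        "minimum_io_size": granularity * logical_block_size,
        "optimal_io_size": optimal * logical_block_size,
    }
```

A device with 512-byte logical blocks that reports a granularity of 8 blocks and an optimal transfer length of 128 blocks would expose minimum_io_size 4096 and optimal_io_size 65536.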

Stacking I/O Parameters

All layers of the Linux I/O stack have been engineered to propagate the various I/O parameters up the stack. When a layer consumes an attribute or aggregates many devices, the layer must expose appropriate I/O parameters so that upper-layer devices or tools will have an accurate view of the storage as it is transformed. Some practical examples are:

Only one layer in the I/O stack should adjust for a non-zero alignment_offset; once a layer adjusts accordingly, it will export a device with an alignment_offset of zero.
A striped Device Mapper (DM) device created with LVM must export a minimum_io_size and optimal_io_size relative to the stripe count (number of disks) and user-provided chunk size.

In Red Hat Enterprise Linux 7, Device Mapper and Software RAID (MD) device drivers can be used to arbitrarily combine devices with different I/O parameters. The kernel's block layer will attempt to reasonably combine the I/O parameters of the individual devices. The kernel will not prevent combining heterogeneous devices; however, be aware of the risks associated with doing so.
For instance, a 512-byte device and a 4K device may be combined into a single logical DM device, which would have a logical_block_size of 4K. File systems layered on such a hybrid device assume that 4K will be written atomically, but in reality a 4K write will span 8 logical block addresses when issued to the 512-byte device. Using a 4K logical_block_size for the higher-level DM device increases the potential for a partial write to the 512-byte device if there is a system crash. If combining the I/O parameters of multiple devices results in a conflict, the block layer may issue a warning that the device is susceptible to partial writes and/or is misaligned.
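The hybrid 512-byte/4K case can be reproduced with a toy model. The combining rule used here is an assumed simplification, not the block layer's actual code. It takes the largest logical_block_size, physical_block_size, and minimum_io_size among the members, and the least common multiple of their non-zero optimal_io_size values. The Limits type and both functions are hypothetical.

```python
from dataclasses import dataclass
from math import lcm  # Python 3.9+


@dataclass(frozen=True)
class Limits:
    logical_block_size: int
    physical_block_size: int
    minimum_io_size: int
    optimal_io_size: int  # 0 means "no hint"


def stack(members: list[Limits]) -> Limits:
    """Combine member limits conservatively for a stacked (DM or MD) device."""
    hints = [m.optimal_io_size for m in members if m.optimal_io_size]
    return Limits(
        logical_block_size=max(m.logical_block_size for m in members),
        physical_block_size=max(m.physical_block_size for m in members),
        minimum_io_size=max(m.minimum_io_size for m in members),
        optimal_io_size=lcm(*hints) if hints else 0,
    )


def lbas_spanned(io_size: int, member: Limits) -> int:
    """How many of the member's logical blocks one stacked I/O touches (rounded up)."""
    return -(-io_size // member.logical_block_size)
```

Stacking Limits(512, 512, 512, 0) with Limits(4096, 4096, 4096, 0) yields a logical_block_size of 4096. One such 4K write spans lbas_spanned(4096, old) == 8 blocks on the 512-byte member, and those are the 8 addresses that a crash can leave partially written.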

Logical Volume Manager

LVM provides userspace tools that are used to manage the kernel's DM devices. LVM will shift the start of the data area (that a given DM device will use) to account for a non-zero alignment_offset associated with any device managed by LVM. This means logical volumes will be properly aligned (alignment_offset=0).
By default, LVM will adjust for any alignment_offset, but this behavior can be disabled by setting data_alignment_offset_detection to 0 in /etc/lvm/lvm.conf. Disabling this is not recommended.
LVM will also detect the I/O hints for a device. The start of a device's data area will be a multiple of the minimum_io_size or optimal_io_size exposed in sysfs. LVM will use the minimum_io_size if optimal_io_size is undefined (i.e. 0).
By default, LVM will automatically determine these I/O hints, but this behavior can be disabled by setting data_alignment_detection to 0 in /etc/lvm/lvm.conf. Disabling this is not recommended.
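The placement rules above can be summarized in a few lines. This is a simplified sketch, not LVM's implementation. The hypothetical function starts from an assumed default data-area start of 1 MiB. It rounds that value up to a multiple of optimal_io_size, or of minimum_io_size when optimal_io_size is 0, and then shifts it by alignment_offset. The two boolean flags stand in for the data_alignment_detection and data_alignment_offset_detection settings.

```python
MIB = 1024 * 1024


def lvm_data_start(minimum_io_size: int, optimal_io_size: int, alignment_offset: int,
                   default_start: int = MIB,  # assumed default, not read from lvm.conf
                   alignment_detection: bool = True,
                   offset_detection: bool = True) -> int:
    """Return a byte offset for the start of the data area on a physical volume."""
    start = default_start
    if alignment_detection:
        # Prefer optimal_io_size; fall back to minimum_io_size when it is 0.
        unit = optimal_io_size or minimum_io_size
        if unit:
            start = -(-start // unit) * unit  # round up to a multiple of unit
    if offset_detection:
        start += alignment_offset
    return start
```

For the RAID 5 hints (minimum_io_size 524288, optimal_io_size 1572864), the data area moves from 1 MiB to 1572864 bytes, one full stripe. For a 4K drive that reports alignment_offset 3584 and no optimal_io_size, it starts at 1048576 + 3584 = 1052160.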

Partition and File System Tools

This section describes how different partition and file system management tools interact with a device's I/O parameters.
util-linux-ng's libblkid and fdisk
The libblkid library provided with the util-linux-ng package includes a programmatic API to access a device's I/O parameters. libblkid allows applications, especially those that use Direct I/O, to properly size their I/O requests. The fdisk utility from util-linux-ng uses libblkid to determine the I/O parameters of a device for optimal placement of all partitions. The fdisk utility will align all partitions on a 1MB boundary.

parted and libparted
The libparted library from parted also uses the I/O parameters API of libblkid. The Red Hat Enterprise Linux 7 installer (Anaconda) uses libparted, which means that all partitions created by either the installer or parted will be properly aligned. For all partitions created on a device that does not appear to provide I/O parameters, the default alignment will be 1MB.
The heuristics parted uses are as follows:

Always use the reported alignment_offset as the offset for the start of the first primary partition.
If optimal_io_size is defined (i.e. not 0), align all partitions on an optimal_io_size boundary.
If optimal_io_size is undefined (i.e. 0), alignment_offset is 0, and minimum_io_size is a power of 2, use a 1MB default alignment.
This is the catch-all for "legacy" devices which don't appear to provide I/O hints. As such, by default all partitions will be aligned on a 1MB boundary.
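The heuristics above can be written as one function. This is a sketch of the rules as stated here, not libparted's source, and the function names are hypothetical. It returns an (offset, grain) pair in bytes: partitions start at offset plus a multiple of grain. It returns None for inputs that the listed rules do not cover.

```python
from typing import Optional

MIB = 1024 * 1024


def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def parted_alignment(alignment_offset: int, minimum_io_size: int,
                     optimal_io_size: int) -> Optional[tuple[int, int]]:
    """Return (offset, grain) in bytes following the heuristics above, or None."""
    if optimal_io_size:
        # optimal_io_size is defined: align on it, starting from alignment_offset.
        return alignment_offset, optimal_io_size
    if alignment_offset == 0 and is_power_of_two(minimum_io_size):
        # Catch-all for devices that appear to provide no I/O hints.
        return 0, MIB
    return None  # case not described by the rules above
```

A legacy device (0, 512, 0) gets the 1 MiB default, so with 512-byte sectors the first partition starts at sector 2048. The RAID 5 example (0, 524288, 1572864) is aligned on full 1572864-byte stripes instead.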

Red Hat Enterprise Linux 7 cannot distinguish between devices that don't provide I/O hints and those that do so with alignment_offset=0 and optimal_io_size=0. Such a device might be a single SAS 4K device; as such, at worst 1MB of space is lost at the start of the disk.

File System tools
The different mkfs.filesystem utilities have also been enhanced to consume a device's I/O parameters. These utilities will not allow a file system to be formatted to use a block size smaller than the logical_block_size of the underlying storage device.
Except for mkfs.gfs2, all other mkfs.filesystem utilities also use the I/O hints to lay out on-disk data structures and data areas relative to the minimum_io_size and optimal_io_size of the underlying storage device. This allows file systems to be optimally formatted for various RAID (striped) layouts.
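To show how the hints translate into file system geometry, the sketch below rejects a block size smaller than logical_block_size. It then expresses minimum_io_size and optimal_io_size in file system blocks, the units of the ext4 stride and stripe_width extended options (mkfs.ext4 -E stride=...,stripe_width=...). The function is hypothetical; mkfs.ext4 normally derives these values from the I/O hints on its own.

```python
def ext4_geometry(block_size: int, logical_block_size: int,
                  minimum_io_size: int, optimal_io_size: int) -> dict[str, int]:
    """Return ext4-style stride and stripe_width, in file system blocks."""
    if block_size < logical_block_size:
        raise ValueError(
            f"block size {block_size} is smaller than the logical block size {logical_block_size}"
        )
    return {
        # One chunk (random I/O unit) and one full stripe (streaming unit).
        "stride": minimum_io_size // block_size,
        "stripe_width": optimal_io_size // block_size,
    }
```

With 4096-byte blocks on the RAID 5 example (minimum_io_size 524288, optimal_io_size 1572864), this gives stride 128 and stripe_width 384. Asking for 1024-byte blocks on a device with a 4096-byte logical_block_size raises an error, matching the restriction described above.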

