Regardless of what filesystem you choose, there are some general Linux tuning operations that apply.

Read ahead

The first parameter you should tune on any Linux installation is device read ahead. When an application reads sequentially, moving forward through a file, this feature has Linux request blocks from the disk before the application asks for them.


This is the key to reaching full read performance from today's faster drives. The usual symptom of insufficient read ahead is that sequential write speed to a disk is faster than its sequential read speed.

To check the current read ahead setting of a device:

blockdev --getra /dev/sda

The default is 256 for regular drives, and it may be larger for software RAID devices. The unit is normally 512 bytes, so the default read ahead is 128 kB.

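To set the read ahead for a device (as root; the value is lost at reboot, so add the command to a startup script to make it permanent):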

blockdev --setra 16384 /dev/sda
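
Because blockdev counts read ahead in 512-byte sectors, converting between that value and kilobytes is simple arithmetic. A minimal sketch, assuming the 512-byte unit described above; readahead_to_kb and kb_to_readahead are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: convert a blockdev read-ahead value between
# 512-byte sectors and kilobytes (1 kB = 1024 bytes here).

# readahead_to_kb SECTORS -> prints the read-ahead size in kB
readahead_to_kb() {
  echo $(( $1 * 512 / 1024 ))
}

# kb_to_readahead KB -> prints the value to pass to "blockdev --setra"
kb_to_readahead() {
  echo $(( $1 * 1024 / 512 ))
}

readahead_to_kb 256     # default setting: prints 128
readahead_to_kb 16384   # the --setra example above: prints 8192 (8 MB)
```

So the 16384 used above asks Linux to read up to 8 MB ahead, 64 times the default.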

Use a Linux benchmark tool to measure sequential read throughput at several read ahead values, and keep the one that performs best on your drive.
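
One crude way to compare values is to time a large sequential read at each setting. The following is a rough sketch, not a rigorous benchmark. It assumes it runs as root, that the device and value list you pass in are placeholders for your own, that the kernel provides /proc/sys/vm/drop_caches (2.6.16 or later), and that a 1 GB sequential dd read from the raw device is representative enough of your workload. sweep_readahead and run are hypothetical helper names; setting DRY_RUN=1 prints the commands instead of executing them.

```bash
#!/usr/bin/env bash
# Rough read-ahead sweep (sketch). Must run as root unless DRY_RUN=1.
# Assumptions: device and candidate values are placeholders, and a
# 1 GB sequential dd read stands in for a real benchmark.

# run CMD... : execute CMD, or just print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# sweep_readahead DEVICE VALUE... : time one sequential read per value
sweep_readahead() {
  local device=$1
  shift
  local ra
  for ra in "$@"; do
    echo "== read-ahead $ra sectors =="
    run blockdev --setra "$ra" "$device"
    # Flush dirty data and drop the page cache so the read hits the disk.
    run sync
    run sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    # dd reports elapsed time and throughput on stderr.
    run dd if="$device" of=/dev/null bs=1M count=1024
  done
}

# Example (as root): sweep_readahead /dev/sda 256 1024 4096 16384
```

The device is left at the last value tested, so set the value you settle on with blockdev --setra afterwards.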

File access times

Each time you access a file in Linux, a file attribute called the file's last access time (atime) is updated. This overhead turns into a steady stream of writes even when you are only reading data. That is acceptable for regular file access, but it is unwelcome overhead when working with a database.

To disable this behavior, add noatime to the volume's mount options in /etc/fstab:

/dev/sda1 / ext3 noatime,errors=remount-ro 0 1
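
The fstab change takes effect at the next mount; on a running system it can usually be applied as root with mount -o remount,noatime /. To confirm that a mount point is really using noatime, you can check the options column of /proc/mounts. A small sketch; has_noatime is a hypothetical helper name, and it ignores the octal escaping /proc/mounts uses for mount points containing spaces:

```bash
#!/usr/bin/env bash
# has_noatime MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: succeeds (exit 0) if the last entry for MOUNTPOINT
# in MOUNTS_FILE (default /proc/mounts) lists "noatime" among its options.
# /proc/mounts fields: device, mount point, fs type, options, dump, pass.
has_noatime() {
  local mnt=$1
  local file=${2:-/proc/mounts}
  awk -v m="$mnt" '
    $2 == m { opts = $4 }          # remember the last matching entry
    END {
      n = split(opts, o, ",")
      for (i = 1; i <= n; i++)
        if (o[i] == "noatime") exit 0
      exit 1
    }' "$file"
}

if has_noatime /; then
  echo "/ is mounted with noatime"
else
  echo "/ is NOT mounted with noatime"
fi
```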

Some Linux kernels offer two additional access time settings, nodiratime and relatime, each of which turns off only a subset of atime updates. Both are redundant if you use the preferred noatime, which disables them all.

Write cache size

On the write side of things, Linux handles writes to the disk using a daemon named pdflush. It spawns between two and eight pdflush processes to keep up with the amount of outstanding I/O. pdflush is not very aggressive about writes, on the theory that waiting longer before writing things out optimizes total throughput. When you are writing a large data set, both write combining and the ability to sort writes across more data reduce the average amount of seeking while writing.

The main drivers for when the contents of the write cache are aggressively written out to disk are two tunable kernel parameters:

/proc/sys/vm/dirty_background_ratio: maximum percentage of active RAM that can be filled with dirty pages before pdflush begins to write them.

/proc/sys/vm/dirty_ratio: maximum percentage of total memory that can be filled with dirty pages before processes are forced to write dirty buffers themselves during their time slice, instead of being allowed to do more writes.

Note that all processes are blocked for writes when this happens, not just the one that filled the write buffers. This can cause what is perceived as unfair behavior, where one "write-hog" process can block all I/O on the system.
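
Because both settings are percentages, the amount of dirty data they permit grows with RAM. A rough sketch of the arithmetic, assuming MemTotal from /proc/meminfo as the base; the kernel actually computes these limits against the memory it considers available for dirty pages, which is somewhat less, so treat the results as upper-bound estimates. dirty_limit_mb is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# dirty_limit_mb MEM_KB PERCENT -> approximate dirty-page limit in MB
# (hypothetical helper; total RAM as the base gives an overestimate)
dirty_limit_mb() {
  echo $(( $1 * $2 / 100 / 1024 ))
}

# MemTotal is reported in kB in /proc/meminfo.
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
bg=$(cat /proc/sys/vm/dirty_background_ratio)
fg=$(cat /proc/sys/vm/dirty_ratio)

echo "background writeback starts near $(dirty_limit_mb "$mem_kb" "$bg") MB dirty"
echo "processes write synchronously near $(dirty_limit_mb "$mem_kb" "$fg") MB dirty"

# Dirty data currently waiting, and data currently being written back:
grep -E '^(Dirty|Writeback):' /proc/meminfo
```

For example, a server with 16 GB of RAM and dirty_ratio at 10 can accumulate roughly 1.6 GB of dirty data before writing processes are forced to flush it themselves.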

On older Linux kernels these values were set too high. Starting with Linux 2.6.22 they default to the values below, which you can also set by hand as root:

echo 10 > /proc/sys/vm/dirty_ratio
echo 5 > /proc/sys/vm/dirty_background_ratio
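
Values written to /proc/sys this way are lost at reboot. A common way to make them permanent is a line such as vm.dirty_ratio = 10 in /etc/sysctl.conf, loaded with sysctl -p. The sketch below edits such a file idempotently, replacing an existing entry or appending a new one. set_sysctl is a hypothetical helper name, and it assumes the simple key = value format; commented-out lines are left alone.

```bash
#!/usr/bin/env bash
# set_sysctl KEY VALUE [FILE]
# Hypothetical helper: replace an existing "KEY = VALUE" line in FILE
# (default /etc/sysctl.conf) or append one if KEY is not present yet.
set_sysctl() {
  local key=$1 value=$2 file=${3:-/etc/sysctl.conf}
  local tmp status
  [ -e "$file" ] || : > "$file"
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    {
      line = $0
      sub(/^[ \t]*/, "", line)       # ignore leading whitespace
      split(line, kv, "=")
      name = kv[1]
      sub(/[ \t]+$/, "", name)       # trim spaces before "="
      if (name == k) { print k " = " v; done = 1; next }
      print
    }
    END { if (!done) print k " = " v }' "$file" > "$tmp" &&
    cat "$tmp" > "$file"             # rewrite in place, keeping file owner/mode
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

As root, set_sysctl vm.dirty_ratio 10 and set_sysctl vm.dirty_background_ratio 5 followed by sysctl -p would record both settings and load them.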

I/O scheduler elevator

The article Linux IO scheduler describes the four Linux I/O schedulers in detail. In most cases, changing the Linux I/O scheduler makes only a minimal difference. In a few cases, for example on systems with a large device read and write cache, such as some RAID controllers and many SANs, any kernel scheduling just gets in the way. The operating system sorts the data it knows about, but that does not take into account the data already sitting in the controller or SAN cache. The noop scheduler, which just pushes data quickly toward the hardware, can improve performance if your hardware has its own large cache to manage.

However, for local disks without a RAID controller, anticipatory seems to be a good choice.

The default Linux I/O scheduler is CFQ, so unless you have a particular reason not to use it, stick with it.
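
The scheduler can be inspected and changed per device through sysfs: /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets (for example noop anticipatory deadline [cfq]), and writing a name to it as root switches that device until reboot. The elevator= boot parameter sets the default for all devices. Which names appear depends on the kernel version. A sketch, with active_scheduler and set_scheduler as hypothetical helper names:

```bash
#!/usr/bin/env bash
# active_scheduler FILE -> print the scheduler shown in [brackets]
# FILE is normally /sys/block/<device>/queue/scheduler (hypothetical helper).
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# set_scheduler DEVICE NAME -> switch DEVICE (e.g. sda) to scheduler NAME.
# Hypothetical helper; requires root, and the change lasts until reboot.
set_scheduler() {
  echo "$2" > "/sys/block/$1/queue/scheduler"
}

# Report the active scheduler of every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
  [ -e "$f" ] || continue
  dev=${f#/sys/block/}
  dev=${dev%%/*}
  echo "$dev: $(active_scheduler "$f")"
done
```

For a disk behind a controller with a large cache, set_scheduler sda noop would be the experiment to benchmark against the default.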

Read caching and swapping

Linux tries to use any spare RAM to cache the filesystem. When the system runs low on RAM, the kernel has a decision to make: rather than reducing the size of its buffer cache, the OS might instead swap inactive pages out to disk. A tunable named swappiness controls how often the kernel considers this action.

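You can adjust it at runtime in /proc/sys/vm/swappiness:

echo 0 > /proc/sys/vm/swappiness

or make it persistent by adding a line to /etc/sysctl.conf:

vm.swappiness = 0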
A value of 0 prefers shrinking the filesystem cache over using swap, which is the recommended behavior for most applications (especially applications using fast external disks).
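
To see whether the kernel is actually swapping under your workload, you can sample the pswpin and pswpout counters in /proc/vmstat (pages swapped in and out since boot) at two points in time and compare them. A rough sketch; swap_counters and swap_activity are hypothetical helper names, and the sampling interval is up to you:

```bash
#!/usr/bin/env bash
# swap_counters [VMSTAT_FILE] -> print "pswpin pswpout" (pages since boot)
# Hypothetical helper; reads /proc/vmstat by default.
swap_counters() {
  awk '$1 == "pswpin" { i = $2 } $1 == "pswpout" { o = $2 }
       END { print i + 0, o + 0 }' "${1:-/proc/vmstat}"
}

# swap_activity SECONDS -> report pages swapped in/out over the interval
swap_activity() {
  local secs=$1 before after
  before=$(swap_counters)
  sleep "$secs"
  after=$(swap_counters)
  echo "$before $after" | awk -v s="$secs" \
    '{ printf "swapped in: %d pages, out: %d pages in %s s\n", $3 - $1, $4 - $2, s }'
}

echo "vm.swappiness = $(cat /proc/sys/vm/swappiness)"
```

Running swap_activity 10 while the database is under load gives a quick indication of whether lowering swappiness changed anything.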

