In the previous article, I introduced huge pages and transparent huge pages; here I'm going to show you how to enable, use, monitor, and disable huge pages.


Enable HugePages and Transparent HugePages

Starting with Red Hat Enterprise Linux 7.1, there are two ways of reserving huge pages: at boot time and at run time. Reserving at boot time increases the possibility of success because the memory has not yet been significantly fragmented. However, on NUMA machines, the number of pages is automatically split among NUMA nodes. The run-time method allows you to reserve huge pages per NUMA node. If the run-time reservation is done as early as possible in the boot process, the probability of memory fragmentation is lower.

Configure HugePages at boot time

The page size the HugeTLB subsystem supports depends on the architecture. On the AMD64 and Intel 64 architectures, 2 MB huge pages and 1 GB gigantic pages are supported.
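As a quick sizing aid, the number of pages to reserve is simply the target amount of memory divided by the page size. A minimal sketch (the 16 GB target is a made-up example):

```shell
# Hypothetical sizing example: how many pages cover a 16 GB target?
target_gb=16

# 2 MB huge pages: 16 GB * 1024 MB/GB / 2 MB per page
pages_2mb=$(( target_gb * 1024 / 2 ))

# 1 GB gigantic pages: one page per GB
pages_1gb=$(( target_gb ))

echo "2 MB pages needed: $pages_2mb"
echo "1 GB pages needed: $pages_1gb"
```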

1. Create a HugeTLB pool for 1 GB pages by appending the following line to the kernel command-line options:

default_hugepagesz=1G hugepagesz=1G
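After rebooting, you can confirm that the options actually reached the kernel. The sketch below parses a sample command-line string for illustration; on a live system you would read /proc/cmdline instead:

```shell
# Sample kernel command line (on a real system: cmdline=$(cat /proc/cmdline))
cmdline="BOOT_IMAGE=/vmlinuz root=/dev/sda1 default_hugepagesz=1G hugepagesz=1G"

missing=0
for opt in default_hugepagesz=1G hugepagesz=1G; do
    # Pad with spaces so we only match whole options, not substrings
    case " $cmdline " in
        *" $opt "*) echo "$opt: present" ;;
        *)          echo "$opt: MISSING"; missing=1 ;;
    esac
done
```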

2. Create a file named /usr/lib/systemd/system/hugetlb-gigantic-pages.service with the following content:

[Unit]
Description=HugeTLB Gigantic Pages Reservation
DefaultDependencies=no
Before=dev-hugepages.mount
ConditionPathExists=/sys/devices/system/node
ConditionKernelCommandLine=hugepagesz=1G

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/lib/systemd/hugetlb-reserve-pages

[Install]
WantedBy=sysinit.target

3. Create a file named /usr/lib/systemd/hugetlb-reserve-pages with the following content:

#!/bin/sh

nodes_path=/sys/devices/system/node/
if [ ! -d $nodes_path ]; then
	echo "ERROR: $nodes_path does not exist"
	exit 1
fi

reserve_pages()
{
	echo $1 > $nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages
}

# This example reserves 2 1G pages on node0 and 1 1G page on node1. You
# can modify it to your needs or add more lines to reserve memory in
# other nodes. Don't forget to uncomment the lines, otherwise they won't
# be executed.
# reserve_pages 2 node0
# reserve_pages 1 node1

4. Modify /usr/lib/systemd/hugetlb-reserve-pages according to the comments in the file.
5. Run the following commands to enable early boot reservation:

# chmod +x /usr/lib/systemd/hugetlb-reserve-pages
# systemctl enable hugetlb-gigantic-pages

Configure HugePages at run time

Suppose you want to allocate 1 GB of huge pages (512 pages of 2 MB each) at run time:

echo "512" > /proc/sys/vm/nr_hugepages

Or run the following command to reserve huge pages on a particular NUMA node:

echo "512" > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

This example reserves 512 2 MB pages on NUMA node 0. You can verify the reservation with numastat:

# numastat -cm | egrep "Node|Huge"
                 Node 0 Node 1 Total
AnonHugePages        68      2    70
HugePages_Total    1024      0  1024
HugePages_Free     1024      0  1024
HugePages_Surp        0      0     0
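Note that numastat -cm reports sizes in MB, which is why the 512-page reservation shows up as 1024 above. A quick sanity check of that arithmetic:

```shell
# 512 huge pages of 2048 kB each, expressed in MB (as numastat -cm shows them)
pages=512
page_kb=2048
total_mb=$(( pages * page_kb / 1024 ))
echo "reserved: ${total_mb} MB"
```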

You can also add 1024 huge pages on node1:

# echo 1024 >/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
# numastat -cm | egrep "Node|Huge"
                 Node 0 Node 1 Total
AnonHugePages        68      2    70
HugePages_Total    1024   2048  3072
HugePages_Free     1024   2048  3072
HugePages_Surp        0      0     0

You may also want to define the maximum number of additional huge pages available through overcommitting memory:

echo 1024 > /proc/sys/vm/nr_overcommit_hugepages

This setting defines the maximum number of additional huge pages that can be created and used by
the system through overcommitting memory. What does that mean in practice?

It indicates that the system obtains that number of huge pages from the kernel's normal page pool if the persistent huge page pool is exhausted. As these surplus huge pages become unused, they are then freed and returned to the kernel's normal page pool.
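Surplus pages show up in the HugePages_Surp counter of /proc/meminfo. The sketch below parses a sample meminfo excerpt; the counter values are made up for illustration (on a real system, read the file directly):

```shell
# Sample /proc/meminfo excerpt (on a real system: grep Huge /proc/meminfo)
meminfo="HugePages_Total:    1024
HugePages_Free:      512
HugePages_Surp:      256"

# Extract the surplus-page count with awk
surp=$(printf '%s\n' "$meminfo" | awk '/^HugePages_Surp/ {print $2}')
echo "surplus huge pages: $surp"
```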

Configure Transparent HugePages

Huge pages can be difficult to manage manually, and often require significant changes to code in order to be used effectively. As such, Red Hat Enterprise Linux 6 and later releases implement transparent huge pages (THP). THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages, so huge pages do not need to be reserved manually. The THP feature has two modes of operation: system-wide and per-process.

System-wide THP

When THP is enabled system-wide, the kernel tries to assign huge pages to any process when it is possible to allocate huge pages and the process is using a large contiguous virtual memory area.

Per-process THP

If THP is enabled per process, the kernel only assigns huge pages to individual processes' memory areas specified with the madvise() system call. To use transparent huge pages only per process, run:

# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled


Note that the THP feature only supports 2 MB pages, and that transparent huge pages are enabled by default.

To check the current THP status

# cat /sys/kernel/mm/transparent_hugepage/enabled
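The enabled file lists all modes and marks the active one in brackets, for example "[always] madvise never". A small sketch of extracting the active mode, using a sample string rather than the live sysfs file:

```shell
# Sample contents of /sys/kernel/mm/transparent_hugepage/enabled
status="[always] madvise never"

# The active mode is the word inside the brackets
mode=$(printf '%s\n' "$status" | sed 's/.*\[\(.*\)\].*/\1/')
echo "current THP mode: $mode"
```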

To enable transparent huge pages

# echo always > /sys/kernel/mm/transparent_hugepage/enabled

To disable transparent huge pages

# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/defrag

To disable transparent huge pages system-wide

To prevent applications from allocating more memory resources than necessary, you can disable huge pages system-wide and only enable them inside MADV_HUGEPAGE madvise regions by running:

# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled

To disable direct compaction

Sometimes, providing low latency to short-lived allocations has higher priority than immediately achieving the best performance with long-lived allocations. In such cases, direct compaction can be disabled while leaving THP enabled. Direct compaction is a synchronous memory compaction during the huge page allocation. Disabling direct compaction provides no guarantee of saving memory, but can decrease the risk of higher latencies during frequent page faults.

Note that if the workload benefits significantly from THP, disabling direct compaction decreases performance.

To disable direct compaction, run:

# echo madvise > /sys/kernel/mm/transparent_hugepage/defrag

For comprehensive information on transparent huge pages, see the /usr/share/doc/kernel-doc-kernel_version/Documentation/vm/transhuge.txt file, which is available after installing the kernel-doc package.

To disable THP at boot time

Append the following to the kernel command line in grub.conf:

transparent_hugepage=never