Why We Disable Linux’s THP Feature for Databases

What is THP

Transparent Huge Pages (THP) is an important feature of the Linux kernel. It backs memory with larger pages (for example, 2 MB instead of 4 KB on x86_64) so that a single page table entry maps more memory. This reduces the number of page faults and improves the translation lookaside buffer (TLB) hit ratio. The TLB is a cache used by the memory management unit to speed up the translation of virtual memory addresses to physical memory addresses.
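
To make the TLB benefit concrete, here is a rough back-of-the-envelope sketch. The entry count of 1,536 is a hypothetical figure chosen only for illustration, not a measurement of any particular CPU; 4 KB and 2 MB are the usual base and huge page sizes on x86_64.

```shell
# tlb_reach_kb ENTRIES PAGE_KB
# Prints how much memory (in KB) a TLB with ENTRIES entries can map
# when every entry covers one page of PAGE_KB kilobytes.
tlb_reach_kb() {
    echo $(( $1 * $2 ))
}

# Hypothetical TLB with 1536 entries (illustrative value only).
tlb_reach_kb 1536 4       # 4 KB base pages: 6144 KB (6 MB)
tlb_reach_kb 1536 2048    # 2 MB huge pages: 3145728 KB (3 GB)
```

With the same number of entries, the larger page size covers 512 times more memory, which is why a workload with a large, dense working set misses the TLB less often.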

How Linux manages its memory

To understand the harm THP can cause, let’s consider how Linux manages its physical memory.

  • If asynchronous memory reclaim can’t keep up with memory allocation requests, Linux triggers synchronous direct reclaim. In such cases, every thread that requests memory takes part in freeing memory synchronously. Only when enough memory becomes available do the threads get the memory they requested.
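
One way to see whether direct reclaim is happening is to watch the related counters in /proc/vmstat. The sketch below is only a filter; the counter name prefixes are the ones we would expect on reasonably recent kernels, and the exact names vary between kernel versions, so treat the pattern as an assumption to adjust.

```shell
# reclaim_counters
# Reads /proc/vmstat-style "name value" lines on stdin and prints only the
# counters related to direct reclaim and direct compaction.
# The prefixes below are assumptions; counter names differ across kernels.
reclaim_counters() {
    grep -E '^(allocstall|pgscan_direct|pgsteal_direct|compact_stall)'
}

# Usage on a live system: sample twice and compare the values.
#   reclaim_counters < /proc/vmstat; sleep 10; reclaim_counters < /proc/vmstat
```

If these counters grow quickly between two samples, threads are stalling in the allocation path to reclaim or compact memory themselves.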

When THP causes performance fluctuation

If your system performance fluctuates, how can you be sure THP is the cause? I’d like to share three symptoms that we’ve found are related to THP.

The most typical symptom: sys cpu rises

Based on our customer support experience, the most typical symptom of THP-caused performance fluctuation is sharply rising system CPU utilization.
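
If you don't have a monitoring system at hand, a quick way to confirm this symptom is to sample the aggregate cpu line of /proc/stat twice. The following is a minimal sketch, assuming the standard field order of that line (user, nice, system, idle, iowait, irq, softirq, steal); the function name is ours.

```shell
# sys_cpu_pct
# Reads two aggregate "cpu ..." lines (two samples of the first line of
# /proc/stat) on stdin and prints the share of CPU time spent in system
# mode between them, as a truncated integer percentage.
sys_cpu_pct() {
    awk '
        $1 == "cpu" {
            total = 0
            # Fields 2-9: user nice system idle iowait irq softirq steal.
            for (i = 2; i <= 9; i++) total += $i
            if (seen) {
                dt = total - prev_total
                if (dt > 0) printf "%d\n", ($4 - prev_sys) * 100 / dt
            }
            prev_total = total; prev_sys = $4; seen = 1
        }'
}

# Usage on a live system:
#   { head -n1 /proc/stat; sleep 5; head -n1 /proc/stat; } | sys_cpu_pct
```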

The indirect symptom: sys load rises

Many memory issues are not as obvious as those described above. When the system allocates THP or other high-order memory, it doesn’t perform memory compaction directly and leave you an obvious trace. Instead, compaction is often mixed with other tasks, such as direct memory reclaim.

  1. Direct memory reclaim
  2. Direct memory compaction
  3. Out of memory (OOM) collection

  1. Observe the memory fragmentation index. Execute cat /sys/kernel/debug/extfrag/extfrag_index to get the index (this requires root privileges and a mounted debugfs). Focus on the fragmentation index of the blocks whose order is >= 3. On a scale of 0 to 1,000 (the file prints the value divided by 1,000, so 1,000 appears as 1.000), a number close to 1,000 means the fragmentation is severe; a number close to 0 means the memory is insufficient.
  2. View the memory fragmentation status. Execute cat /proc/buddyinfo and cat /proc/pagetypeinfo to show the status. (Refer to the Linux manual page for details.) Focus on the number of pages whose order is >= 3.
    Compared to buddyinfo, pagetypeinfo displays more detailed information grouped by migration types. The buddy system implements anti-fragmentation through migration types. Note that if all the Unmovable pages are grouped in order < 3, the kernel slab objects have severe fragmentation. In such cases, you need to troubleshoot the specific cause of the problem using other tools.
  3. For kernels that support the Berkeley Packet Filter (BPF), such as the one shipped with CentOS 7.6, you can also quantify the latency using drsnoop or compactsnoop, two tools developed by PingCAP.
  4. (Optional) Trace the mm_page_alloc_extfrag event with ftrace. This event fires when, because of memory fragmentation, an allocation for one migration type steals physical pages from a fallback migration type.
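
The first two checks in the list above can be scripted. The sketch below is a rough helper, not an official tool: it assumes the usual /proc/buddyinfo layout ("Node N, zone NAME" followed by one column per order, starting at order 0), and it assumes extfrag_index uses the same column layout with values printed as decimals such as 0.950. The function names and the 0.9 default threshold are ours.

```shell
# high_order_free
# Reads /proc/buddyinfo-style lines on stdin and prints, per zone, the
# number of free blocks of order >= 3.
high_order_free() {
    awk '
        $1 == "Node" {
            node = $2; sub(/,$/, "", node)
            n = 0
            # Field 5 is order 0, so order 3 starts at field 8.
            for (i = 8; i <= NF; i++) n += $i
            printf "node %s zone %s free_blocks_order_ge_3 %d\n", node, $4, n
        }'
}

# fragmented_orders [THRESHOLD]
# Reads extfrag_index-style lines on stdin and prints every order >= 3 whose
# index is at or above THRESHOLD (default 0.9, an arbitrary cut-off).
fragmented_orders() {
    awk -v thr="${1:-0.9}" '
        $1 == "Node" {
            node = $2; sub(/,$/, "", node)
            for (i = 8; i <= NF; i++)
                if ($i + 0 >= thr + 0)
                    printf "node %s zone %s order %d index %s\n", node, $4, i - 5, $i
        }'
}

# Usage on a live system (the second command needs root and debugfs):
#   high_order_free < /proc/buddyinfo
#   fragmented_orders 0.9 < /sys/kernel/debug/extfrag/extfrag_index
```

A zone that reports few free high-order blocks together with an index near 1.000 for those orders is a strong hint that allocations are failing because of fragmentation rather than a plain lack of memory.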

The atypical symptom: abnormal RES usage

Sometimes, when a service starts on an AArch64 server, it occupies dozens of gigabytes of physical memory. By viewing the /proc/<pid>/smaps file, you may see that most of this memory is used for THP. Because the CentOS 7 kernel for AArch64 sets its page size to 64 KB, a transparent huge page is 512 MB instead of the 2 MB used on x86_64 with 4 KB pages, so resident memory usage can be many times larger than on the x86_64 platform.
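
To check how much of a process's memory is backed by THP, you can add up the AnonHugePages fields in its smaps file. This is a minimal sketch; the function name is ours, and it only counts anonymous huge pages, which is the field smaps reports per mapping.

```shell
# anon_huge_kb
# Reads /proc/<pid>/smaps-style text on stdin and prints the total of all
# AnonHugePages fields in kB.
anon_huge_kb() {
    awk '$1 == "AnonHugePages:" { kb += $2 } END { printf "%d\n", kb }'
}

# Usage on a live system, with pid set to the process you are inspecting:
#   anon_huge_kb < "/proc/$pid/smaps"
```

Comparing this total with the process's resident set size shows whether THP accounts for the unexpected memory usage.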

How to deal with THP

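For applications that are not optimized to store their data contiguously, or applications that have sparse workloads, enabling THP and THP defrag is detrimental to long-running services. To check the current setting and then disable both THP and THP defrag at runtime, run the following commands as root. Settings written this way do not persist across reboots.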

cat /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
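
The kernel marks the active value in these files with square brackets, for example "always madvise [never]". If you want to verify the setting from a script, a small helper like the following can extract it. This is a sketch under that assumption about the file format; the function name is ours.

```shell
# thp_active
# Reads the contents of a transparent_hugepage sysfs file on stdin
# (for example "always madvise [never]") and prints the bracketed,
# currently active value.
thp_active() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Usage on a live system:
#   thp_active < /sys/kernel/mm/transparent_hugepage/enabled
#   thp_active < /sys/kernel/mm/transparent_hugepage/defrag
```

To keep THP disabled after a reboot, one common approach is the transparent_hugepage=never kernel boot parameter; how you set boot parameters depends on your distribution and boot loader.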

Join our community

If you have any other questions about database performance tuning, or would like to share your expertise, feel free to join the TiDB Community Slack workspace.

PingCAP

PingCAP is the team behind TiDB, an open-source MySQL compatible NewSQL database. Official website: https://pingcap.com/ GitHub: https://github.com/pingcap