
Linux 6.11 will offer more precise control over swapping

As part of the memory management changes planned for merging in the upcoming Linux 6.11 cycle, the kernel allows more precise control over swappiness, the parameter that determines how aggressively pages are moved out of physical memory and into swap space on disk.

With the new code from Meta, memory.reclaim accepts a swappiness argument. This allows more precise control over swapping behavior without overriding the global swappiness setting.

In the patch, Dan Schatzberg of Meta explains the addition of swappiness= support to memory.reclaim:

Allow proactive reclaimers to submit an additional swappiness=<val> argument to memory.reclaim. This overrides the global or per-memcg swappiness setting for that reclaim attempt.

For example:

echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim

will perform reclaim on the root cgroup with a swappiness setting of 0 (no swap), regardless of the vm.swappiness sysctl setting.
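For scripts that issue such requests repeatedly, the echo example can be wrapped in a small shell function. This is a minimal sketch, not code from the patch. It assumes cgroup v2 is mounted at /sys/fs/cgroup and the kernel is 6.11 or later. The function name reclaim_with_swappiness and the CGROUP_ROOT override are hypothetical. The 0 to 200 bounds check mirrors the range that vm.swappiness itself accepts.

```shell
#!/bin/sh
# Sketch: proactive reclaim with a per-request swappiness override
# (memory.reclaim "swappiness=" argument, Linux 6.11+).
# ASSUMPTIONS: cgroup v2 mounted at /sys/fs/cgroup; the function name and
# the CGROUP_ROOT override are hypothetical, for illustration only.

CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup}

# reclaim_with_swappiness CGROUP AMOUNT SWAPPINESS
#   CGROUP     path relative to the cgroup root ("" for the root cgroup)
#   AMOUNT     bytes to reclaim, optionally with a K/M/G suffix (e.g. 2M)
#   SWAPPINESS integer 0..200, applied to this request only
reclaim_with_swappiness() {
    cg=$1 amount=$2 swappiness=$3

    # Reject empty or non-numeric values.
    case $swappiness in
        ''|*[!0-9]*) echo "swappiness must be an integer" >&2; return 1 ;;
    esac
    # Reject values above the range vm.swappiness accepts.
    if [ "$swappiness" -gt 200 ]; then
        echo "swappiness must be between 0 and 200" >&2
        return 1
    fi

    # Build <root>/memory.reclaim or <root>/<cg>/memory.reclaim.
    file="$CGROUP_ROOT/${cg:+$cg/}memory.reclaim"

    # One write; the global vm.swappiness sysctl is never touched.
    # The function's exit status is the status of this write.
    printf '%s swappiness=%s\n' "$amount" "$swappiness" > "$file"
}
```

Calling `reclaim_with_swappiness "" 2M 0` is equivalent to the echo example. `reclaim_with_swappiness workload.slice 512M 60` would target a hypothetical child cgroup. According to the cgroup v2 documentation, the kernel may reclaim more or less than requested and returns EAGAIN if fewer bytes were reclaimed, so callers should check the exit status.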

Userspace proactive reclaimers use the memory.reclaim interface to trigger reclaim. The memory.reclaim interface offers no way to influence the balance between file-backed and anonymous memory during proactive reclaim. The only approach is to adjust the vm.swappiness setting. However, there are a few reasons why we want to control the balance of file vs. anonymous memory during proactive reclaim separately from reactive reclaim:

* Swap-out should be limited to manage SSD write endurance. In near-OOM situations, we accept lots of swap-out to avoid OOMs. Because these are typically rare events, they have relatively little impact on write endurance. Proactive reclaim, however, runs continuously, so its impact on SSD write endurance is more significant. It is therefore desirable to control swap-out for proactive reclaim separately from reactive reclaim.

* Some userspace OOM killers, such as systemd-oomd[1], support OOM killing on swap exhaustion. This makes sense if swap exhaustion is triggered by reactive reclaim, but less so if it is triggered by proactive reclaim (e.g. OOM kills can occur when free memory is ample but anonymous memory is simply very cold). It is therefore desirable for proactive reclaim to reduce or stop swap-out before the threshold at which OOM killing occurs.

In the case of Meta’s Senpai proactive reclaimer, we adjust vm.swappiness before writing to memory.reclaim[2]. This has been in production for almost two years and meets our needs for controlling proactive vs. reactive reclaim behavior, but it is still not ideal for a number of reasons:

* vm.swappiness is a global setting, so adjusting it can race or interfere with other system administration that wants to control vm.swappiness. In our case, we need to disable Senpai before adjusting vm.swappiness.

* vm.swappiness is stateful, so a crash or restart of Senpai can leave a misconfigured setting behind. This requires additional management to record the “desired” setting and ensure Senpai always adjusts to it.

With this patch, we avoid these downsides of adjusting vm.swappiness globally.
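Both drawbacks Schatzberg lists come from the shape of the old workflow: change a global value, reclaim, then change it back. The sketch below compares that workflow with the new per-request argument. It is an illustration only, not Senpai's actual code. It assumes the usual /proc/sys/vm/swappiness and cgroup v2 paths. The function names and the SWAPPINESS_FILE/RECLAIM_FILE overrides are hypothetical.

```shell
#!/bin/sh
# Sketch contrasting the pre-6.11 workaround with the new argument.
# ASSUMPTIONS: default paths are the usual procfs / cgroup v2 locations;
# function names and the *_FILE overrides are hypothetical.

SWAPPINESS_FILE=${SWAPPINESS_FILE:-/proc/sys/vm/swappiness}
RECLAIM_FILE=${RECLAIM_FILE:-/sys/fs/cgroup/memory.reclaim}

# Old pattern: temporarily change the global sysctl around each reclaim.
# $1 = amount to reclaim, $2 = swappiness to use while reclaiming.
reclaim_via_global_sysctl() {
    saved=$(cat "$SWAPPINESS_FILE") || return 1
    # From here on, all reclaim on the system sees the temporary value, and
    # any change another tool makes meanwhile is overwritten by the restore.
    echo "$2" > "$SWAPPINESS_FILE" || return 1
    echo "$1" > "$RECLAIM_FILE"
    status=$?
    # If the script crashes or is killed before this line, the temporary
    # value stays in effect system-wide.
    echo "$saved" > "$SWAPPINESS_FILE"
    return $status
}

# Linux 6.11+: the override travels with the request; no global state changes.
reclaim_via_argument() {
    echo "$1 swappiness=$2" > "$RECLAIM_FILE"
}
```

The first function has a window between its two writes to SWAPPINESS_FILE. A concurrent change by another tool can be clobbered there, and a crash there leaves the temporary value in place. The second function has no such window because nothing global is modified.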

Good news for systemd-oomd users and others wanting more control over Linux swappiness behavior.