Trying out zswap

Now that I've got 64 GB of RAM to play around with, I'm trying out zswap.

I'm running swappiness at 200 (the maximum):
sudoedit /etc/sysctl.conf

# Swappiness
vm.swappiness=200

... and I'm running the nohang low-memory handler.

sudo add-apt-repository ppa:oibaf/test
sudo apt update
sudo apt install nohang
sudo systemctl enable --now nohang-desktop.service

I've set the priority of each swap drive to 1 (pri=1) in /etc/fstab:
# <file system> <mount point> <type> <options> <dump> <pass>

UUID=21ebe95a-cdd6-40d1-b5a2-ff44a768b47d	none	swap	discard,noatime,pri=1	0	0

UUID=c39fbec2-4aa6-4255-b6c3-e9540b397713	none	swap	discard,noatime,pri=1	0	0

UUID=d4398e10-8183-4a5a-88a6-9d830a6f2a6d	none	swap	discard,noatime,pri=1	0	0

... so the drives are swapped to in round-robin fashion, to speed up the swapping.
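To double-check that the kernel really sees the same priority on every swap drive after a reboot, something like the sketch below could be used. The function name swap_priorities_equal is made up for this example; it only assumes the usual /proc/swaps layout (a header line, then one device per line with the priority in the last column).

```shell
#!/usr/bin/env bash
# Report whether every active swap device shares one priority.
# Reads a file in /proc/swaps format; the path defaults to /proc/swaps.
# Exit status: 0 = one shared priority, 1 = mixed priorities, 2 = no swap.
swap_priorities_equal() {
    local swaps_file="${1:-/proc/swaps}"
    awk 'NR > 1 { seen[$NF] = 1; devices++ }   # skip the header line
         END {
             n = 0
             for (p in seen) n++               # count distinct priorities
             if (devices == 0) { print "no swap devices active"; exit 2 }
             if (n == 1) { print "all " devices " swap devices share one priority"; exit 0 }
             print "swap devices use " n " different priorities"; exit 1
         }' "$swaps_file"
}
```

Run it as swap_priorities_equal with no argument to read the live /proc/swaps. Round-robin only applies among devices that share a priority, so a "different priorities" result would mean the fstab change didn't take.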

zswap intercepts pages that are on their way out to the swap drive(s) and compresses them into a pool in RAM instead. When that compressed pool reaches the user-configurable percentage of total memory (max_pool_percent), zswap starts writing pages from the pool out to the swap drive(s).

Set up zswap:

sudo su
echo z3fold > /sys/module/zswap/parameters/zpool
echo 50 > /sys/module/zswap/parameters/max_pool_percent
echo lz4 > /sys/module/zswap/parameters/compressor
echo Y > /sys/module/zswap/parameters/enabled
echo z3fold >> /etc/initramfs-tools/modules
echo lz4 >> /etc/initramfs-tools/modules
update-initramfs -u
exit
exit

sudoedit /etc/default/grub
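Add the zswap parameters (zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=50 zswap.zpool=z3fold) to the end of the line: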

GRUB_CMDLINE_LINUX_DEFAULT="noplymouth threadirqs preempt=full tsc=reliable numa=on nohz=1-11 rcu_nocbs=1-11 zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=50 zswap.zpool=z3fold"

sudo update-grub

Then reboot.
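After the reboot, one way to confirm that the kernel actually booted with those parameters is to pull them out of /proc/cmdline. This is only a small sketch; the function name zswap_cmdline_params is invented here.

```shell
#!/usr/bin/env bash
# Print every zswap.* parameter found on a kernel command line, one per line.
# The file argument defaults to /proc/cmdline.
# Returns non-zero (grep's status) if no zswap.* parameter is present.
zswap_cmdline_params() {
    local cmdline_file="${1:-/proc/cmdline}"
    tr -s ' ' '\n' < "$cmdline_file" | grep '^zswap\.'
}
```

Called with no argument it reads the running kernel's command line. If it prints nothing, the GRUB change didn't make it into the boot entry.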

If zswap uses 50% of memory (in this case, 32 GB) at 2:1 compression, the pool holds 32*2 = 64 GB of pages. Add the 6 GB of swap space and the remaining 32 GB of RAM, and that gives 64+6+32 = 102 GB of total space to work with, 32 GB more than the machine's actually got (64 GB RAM + 6 GB swap = 70 GB).
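The same back-of-the-envelope estimate can be written as a small function, so other pool sizes or ratios can be tried. The name effective_gb and its argument order are just for this sketch, and it assumes the pool is completely full at the given ratio, which is the best case.

```shell
#!/usr/bin/env bash
# effective_gb RAM_GB POOL_PERCENT RATIO SWAP_GB
# Best-case working space: a full compressed pool, plus the RAM left
# outside the pool, plus the swap drives.
effective_gb() {
    awk -v ram="$1" -v pct="$2" -v ratio="$3" -v swap="$4" 'BEGIN {
        pool = ram * pct / 100                  # RAM given to the zswap pool
        printf "%.2f\n", pool * ratio + (ram - pool) + swap
    }'
}

effective_gb 64 50 2 6    # prints 102.00
```

Passing a measured compression ratio as the third argument gives the estimate for the real workload rather than the assumed 2:1.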

One would really have to seriously over-amp their memory usage to reach an out-of-memory condition... in which case nohang intervenes.

After I reboot, I'll run sudo nohang -m (the memory consumption test in nohang) to see how the system responds, and report back.

One can see whether zswap is enabled by:
dmesg | grep zswap

You should see something like:
[ 0.911797] zswap: loaded using pool lz4/z3fold
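The live settings can also be read back from /sys/module/zswap/parameters/, the same files written during setup. Here's a rough sketch that compares them against the values used in this post; check_zswap_params is a made-up name, and the expected values are hard-coded from the setup above.

```shell
#!/usr/bin/env bash
# Compare the live zswap parameters with the values chosen in this post.
# The directory argument defaults to /sys/module/zswap/parameters.
# Returns 0 if everything matches, 1 otherwise.
check_zswap_params() {
    local dir="${1:-/sys/module/zswap/parameters}"
    local status=0 name want have
    while read -r name want; do
        if [ -r "$dir/$name" ]; then
            have=$(<"$dir/$name")
        else
            have="(missing)"                    # parameter file not present
        fi
        if [ "$have" = "$want" ]; then
            echo "ok   $name=$have"
        else
            echo "DIFF $name=$have (wanted $want)"
            status=1
        fi
    done <<'EOF'
enabled Y
compressor lz4
zpool z3fold
max_pool_percent 50
EOF
    return "$status"
}
```

Run check_zswap_params with no argument after a reboot; any DIFF line points at a setting that didn't stick.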

One can see the compression ratio after everything is set up and running by:
sudo bash -c 'echo "scale=2; " $(</sys/kernel/debug/zswap/stored_pages) " * 4096 /" $(</sys/kernel/debug/zswap/pool_total_size) | bc'
... although you'll get a 'divide by zero' runtime error if nothing has been compressed yet (pool_total_size is 0).

One can see how it's working by:
sudo grep -R . /sys/kernel/debug/zswap/

[EDIT]
Huh... that didn't work at all... zswap didn't even compress any of the pages, nor did it swap to disk.

I even tried tail /dev/zero to consume all the memory... no compression of the pages, no swapping to disk.

Ok, I'm reverting the changes.

[EDIT 2]
Heh... I figured out why it's never swapping... it's because I'm a potato. :potato:

I'd set vm.overcommit_ratio=85 and vm.overcommit_memory=2 in /etc/sysctl.conf, so it could only take 85% of the total of memory and swap space: (64+6)*0.85 = 59.5 GB, which leaves 10.5 GB unused. That's larger than the swap space, so the swap space is never touched.

I'm going to change that to vm.overcommit_ratio=99 (keeping vm.overcommit_memory=2)... that'll leave 700 MB of 'memory space' free (i.e., it'll fill all of memory and all but 700 MB of swap space).
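The headroom figures here can be reproduced with a little arithmetic. This sketch follows the model used in this post (the ratio applied to RAM plus swap); the name headroom_gb is invented, and the kernel's own figure may be computed differently, so it's only an estimate.

```shell
#!/usr/bin/env bash
# headroom_gb RAM_GB SWAP_GB OVERCOMMIT_RATIO
# Space left unused when only RATIO percent of (RAM + swap) may be committed,
# following the model used in this post.
headroom_gb() {
    awk -v ram="$1" -v swap="$2" -v ratio="$3" 'BEGIN {
        total = ram + swap
        printf "%.1f\n", total * (100 - ratio) / 100
    }'
}

headroom_gb 64 6 85    # prints 10.5
headroom_gb 64 6 99    # prints 0.7
```

To see the limit the kernel is actually enforcing, grep Commit /proc/meminfo shows the CommitLimit and Committed_AS lines.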

I did that on the recommendation of a Linux guru, so that in an OOM condition, the machine's still got a bit of memory to allow me the responsiveness to shut down a memory-gobbling application... and it apparently works. :crazy_face:

I'll update after I get everything done again.

[EDIT 3]
Before I re-enable zswap, I'm going to make sure I can actually swap data out to the swap drives.

sudoedit /etc/sysctl.conf

# Swappiness
vm.swappiness=200

# VM Settings
vm.compact_memory=1
vm.compaction_proactiveness=100
vm.overcommit_memory=2
vm.overcommit_ratio=99
vm.page-cluster=4
vm.zone_reclaim_mode=4
vm.watermark_scale_factor=125
vm.watermark_boost_factor=15000
vm.dirty_background_ratio=5
vm.dirty_ratio=10
vm.dirty_expire_centisecs=1000
vm.dirty_writeback_centisecs=250
vm.dirtytime_expire_seconds=300

All of these settings are located at

/proc/sys/vm/

... you can either write the new values into those files from a root shell (sudo su), for example: echo 125 > /proc/sys/vm/watermark_scale_factor, or put the settings in /etc/sysctl.conf so they persist across reboots.
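Each key in /etc/sysctl.conf maps to a file under /proc/sys/ by swapping the dots for slashes (vm.swappiness becomes /proc/sys/vm/swappiness). The sketch below uses that to print the configured value next to the live one. The function name compare_sysctl_conf is made up, and it assumes simple key=value lines like the ones above.

```shell
#!/usr/bin/env bash
# Print the configured and the current value for each key=value line in a
# sysctl.conf-style file. Arguments default to /etc/sysctl.conf and /proc/sys.
compare_sysctl_conf() {
    local conf="${1:-/etc/sysctl.conf}" root="${2:-/proc/sys}"
    local key want path have
    # Skip comment lines, strip whitespace around '=', emit "key value".
    awk -F= '!/^[[:space:]]*[#;]/ && NF == 2 {
                 gsub(/[[:space:]]/, "", $1)
                 gsub(/^[[:space:]]+|[[:space:]]+$/, "", $2)
                 print $1, $2
             }' "$conf" |
    while read -r key want; do
        path="$root/${key//.//}"                # vm.swappiness -> vm/swappiness
        have=$(cat "$path" 2>/dev/null) || have="(unreadable)"
        printf '%s: configured=%s current=%s\n' "$key" "$want" "$have"
    done
}
```

Some entries, such as vm.compact_memory, are write-only triggers and will show up as (unreadable). Running sudo sysctl -p applies /etc/sysctl.conf without a reboot.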


You can add a check with a conditional before running the arithmetic (I'm not quite sure which value would be preferable... I'm checking the pool size here, since that's the divisor):

sudo bash -c 'size=$(</sys/kernel/debug/zswap/pool_total_size); if (( size > 0 )); then echo "scale=2; $(</sys/kernel/debug/zswap/stored_pages) * 4096 / $size" | bc; fi'

As long as the pool size is greater than zero, it will run the calculation. You can add an else clause before the fi to report that zswap wasn't in use.

That keeps a script from aborting on the error, in case you want another operation to take place afterward.
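For use inside a script, the check with an else clause might look like the sketch below. The function name zswap_ratio is invented; it uses awk rather than bc, so the result is rounded to two decimals instead of truncated, and it needs root to read the debugfs files.

```shell
#!/usr/bin/env bash
# Print the zswap compression ratio, or a message if the pool is empty.
# The directory argument defaults to /sys/kernel/debug/zswap.
zswap_ratio() {
    local dir="${1:-/sys/kernel/debug/zswap}"
    local pages size
    pages=$(<"$dir/stored_pages")               # pages held in the pool
    size=$(<"$dir/pool_total_size")             # pool size in bytes (divisor)
    if (( size > 0 )); then
        awk -v p="$pages" -v s="$size" 'BEGIN { printf "%.2f\n", p * 4096 / s }'
    else
        echo "zswap pool is empty, nothing compressed yet"
    fi
}
```

Since the function never divides by zero, whatever comes after it in the script still runs.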

Ok, I got it to use the swap drives... I had to uninstall nohang to do it... it would shut down any memory gobbler at ~1.2 GB of space left.

Now I can max out both memory and swap space to 99%, so now I can try out zswap. Then I'll reinstall and configure nohang.

[EDIT]
Ok, I got it working:

sudo grep -R . /sys/kernel/debug/zswap/

/sys/kernel/debug/zswap/same_filled_pages:7889
/sys/kernel/debug/zswap/stored_pages:171393
/sys/kernel/debug/zswap/pool_total_size:243048448
/sys/kernel/debug/zswap/duplicate_entry:0
/sys/kernel/debug/zswap/written_back_pages:0
/sys/kernel/debug/zswap/reject_compress_poor:35
/sys/kernel/debug/zswap/reject_kmemcache_fail:0
/sys/kernel/debug/zswap/reject_alloc_fail:0
/sys/kernel/debug/zswap/reject_reclaim_fail:0
/sys/kernel/debug/zswap/pool_limit_hit:0

sudo bash -c 'echo "scale=2; " $(</sys/kernel/debug/zswap/stored_pages) " * 4096 /" $(</sys/kernel/debug/zswap/pool_total_size) | bc'
2.84

That means the machine now acts as though it's got:
32 GB * 2.84 = 90.88 GB in the compressed pool, + 32 GB RAM + 6 GB swap = 128.88 GB of working space. That's just over double the 64 GB of RAM it's actually got.

And I figured out that I don't really need nohang. I just have to tweak the settings in /proc/sys/vm/ (and mirror them in /etc/sysctl.conf) so that the system sends a kill signal to any memory-gobbling program.

I've got most everything set up the way I want it. I just have to change vm.overcommit_ratio from 99 to 95, both in /etc/sysctl.conf and in /proc/sys/vm/overcommit_ratio, and everything should be good. That'll kill a memory-gobbling program when total space reaches 3.5 GB free, so the machine should remain fairly responsive.
