Cache warmup to speed up your computer

If you've got lots of RAM and a small OS installation size (smaller than the amount of free RAM you've got), you can really speed your computer up by accessing all the files after the computer starts up. Then the files are cached in memory and needn't be fetched from the drive.

There are several ways to do this... some people cp (copy) the files to /dev/null, some dd the entire drive (this picks up sectors that have no data, however) and some cat (concatenate) files to /dev/null.
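
A minimal sketch of the cat approach, assuming bash and GNU find. The name warm_dir is a hypothetical helper for illustration, not something taken from the script below:

```shell
#!/bin/bash
# warm_dir DIR: read every regular file under DIR once, discard the data,
# and print how many files were read. (warm_dir is a made-up helper name.)
warm_dir() {
    local dir=$1 count=0 file
    # -xdev keeps find on one filesystem; -print0 with read -d '' copes with odd file names
    while IFS= read -r -d '' file; do
        if cat -- "$file" > /dev/null 2>&1; then
            count=$(( count + 1 ))
        fi
    done < <(find "$dir" -xdev -type f -print0 2>/dev/null)
    echo "$count"
}
```

Calling warm_dir /usr would read everything under /usr and print the number of files it managed to read.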

Here's what I've done:

#!/bin/bash

# NAME: cachewarmup.sh
# DESC: Adapted from https://askubuntu.com/questions/1184103/trying-to-cp-a-directory-to-dev-null/1184105#1184105
# DATE: 27 Feb 2023

# Change to root directory
cd /

# Check that we're sudo
[[ $(id -u) != 0 ]] && { echo "Must be called with sudo" >&2 ; exit 2 ; }

# Check that a proper starting path has been included as an argument
[[ $1 != /* ]] && { echo "Input flag must be the starting directory ( 'sudo cachewarmup /' for root directory )" >&2 ; exit 3 ; }

# NOTE: Put Omitted Paths below:
# NOTE: DO NOT USE VARIABLES SUCH AS $USER
# =============================
OP0="/mnt/"
OP1="/media/owner"
OP2=/dev/null
OP3=/dev/null
OP4=/dev/null
OP5=/dev/null
OP6=/dev/null
OP7=/dev/null
OP8=/dev/null
OP9=/dev/null
# =============================

SkipCnt=0	# Count of skipped files
StalCnt=0	# Count of files that existed last run, but don't now
ZeroCnt=0	# Count of zero sized files
DirCnt=0	# Count of directories
LinkCnt=0	# Count of symbolic links
CdevCnt=0	# Count of character devices
BdevCnt=0	# Count of block devices
PipeCnt=0	# Count of pipes
SockCnt=0	# Count of sockets
FileCnt=0	# Count of files cache-loaded

# Start timer
StartSec=$SECONDS

# Clear the screen
clear

echo "Updating Locate's database... please wait..."
# Update locate's database for recently added files
updatedb > /dev/null 2>&1

# Clear the screen
clear

# Hide the cursor
tput civis

# Set IFS delimiter to nothing
IFS=""

# You can use either one of these two, listed fastest to slowest
while read -r file; do
#for file in $(find "$1" -maxdepth 10000 -xdev -ignore_readdir_race); do

	# Count of skipped files
	if [[ "$file" =~ "$OP0" || "$file" =~ "$OP1" || "$file" =~ "$OP2" || "$file" =~ "$OP3" || "$file" =~ "$OP4" || "$file" =~ "$OP5" || "$file" =~ "$OP6" || "$file" =~ "$OP7" || "$file" =~ "$OP8" || "$file" =~ "$OP9" ]] ; then (( SkipCnt++ ))
	# Count of files that existed last run, but don't now
	elif [[ ! -e "$file" ]] ; then (( StalCnt++ ))
	# Count of zero sized files
	elif [[ ! -s "$file" ]] ; then (( ZeroCnt++ ))
	# Count of directories
	elif [[ -d "$file" ]] ; then (( DirCnt++ ))
	# Count of symbolic links
	elif [[ -h "$file" || -L "$file" ]] ; then (( LinkCnt++ ))
	# Count of character devices
	elif [[ -c "$file" ]] ; then (( CdevCnt++ ))
	# Count of block devices
	elif [[ -b "$file" ]] ; then (( BdevCnt++ ))
	# Count of pipes
	elif [[ -p "$file" ]] ; then (( PipeCnt++ ))
	# Count of sockets
	elif [[ -S "$file" ]] ; then (( SockCnt++ ))
	# File must exist, and not be any of the above
	elif [[ -f "$file" && -s "$file" ]] ; then
		# Process these files to cache-load them.
		# You can use any one of these three, listed fastest to slowest
		tar -cS --no-recursion --warning=none "$file" &>/dev/null
		# cp --preserve=all --reflink=never "$file" /dev/null
		# cat "$file" 1>/dev/null

		# Count of files cache-loaded
		(( FileCnt++ ))
	else
		# Count of any files not otherwise processed
		(( SkipCnt++ ))
	fi

	# Print stats to screen
	printf "  Directories:		%d\n"		"$DirCnt"
	printf "  Cached Files:		%d\n"		"$FileCnt"
	printf "  Symbolic Links:	%d\n"		"$LinkCnt"
	printf "  Block Devices:	%d\n"		"$BdevCnt"
	printf "  Character Devices:	%d\n"	"$CdevCnt"
	printf "  Pipes:		%d\n"			"$PipeCnt"
	printf "  Sockets:		%d\n"			"$SockCnt"
	printf "  Zero Sized files:	%d\n"		"$ZeroCnt"
	printf "  Skipped files:	%d\n"		"$SkipCnt"
	printf "  Stale files:		%d\n"		"$StalCnt"
	echo
	# Erase old screen output on lines 11, 12 and 13 (to account for wrap of long lines)
	tput cup 11 1;tput el;printf "%s" "$file"
	tput cup 12 0;tput el
	tput cup 13 0;tput el
	# Calculate elapsed time
	Elapsed=$(( SECONDS - StartSec ))
	# Print elapsed time to line 14
	tput cup 14 1;tput el;printf "Time %dh:%dm:%ds" "$((($Elapsed)/3600))" "$(((($Elapsed)%3600)/60))" "$((($Elapsed)%60))"
	# Return cursor to top-left
	tput cup 0 0
done <<<"$(locate "$1")"

# Make cursor visible
tput cnorm
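
The elapsed-time line in the script is plain integer arithmetic on a seconds count. Pulled out into a hypothetical helper (fmt_elapsed is not in the script above), a sketch of it could look like this:

```shell
#!/bin/bash
# fmt_elapsed SECS: print a duration in the same "Time Hh:Mm:Ss" layout the script uses.
# (fmt_elapsed is a hypothetical helper name.)
fmt_elapsed() {
    local secs=$1
    printf 'Time %dh:%dm:%ds\n' \
        "$(( secs / 3600 ))" \
        "$(( (secs % 3600) / 60 ))" \
        "$(( secs % 60 ))"
}
```

For example, fmt_elapsed 3725 prints Time 1h:2m:5s.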

You can learn about bash test operators here.
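
As a small illustration of those test operators, here is a sketch of a classifier (classify is a hypothetical name, not part of the script). One assumption worth calling out: it checks -L first, because most of the other tests follow symbolic links and would otherwise report on the link's target.

```shell
#!/bin/bash
# classify PATH: print a one-word type for PATH using bash test operators.
# (classify is a hypothetical helper, not part of cachewarmup.)
classify() {
    local f=$1
    if   [[ -L $f ]]; then echo link        # symlink itself, checked before tests that follow links
    elif [[ ! -e $f ]]; then echo missing   # path does not exist
    elif [[ -d $f ]]; then echo dir
    elif [[ -c $f ]]; then echo chardev
    elif [[ -b $f ]]; then echo blockdev
    elif [[ -p $f ]]; then echo pipe
    elif [[ -S $f ]]; then echo socket
    elif [[ -f $f && ! -s $f ]]; then echo empty   # regular file of size zero
    elif [[ -f $f ]]; then echo file
    else echo other
    fi
}
```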

You'd save that file as cachewarmup to /usr/local/bin, then make it executable by right-clicking the file, selecting Properties, going to the Permissions tab, and selecting Allow executing file as program.
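
If you'd rather do that from a terminal, the coreutils install command copies the file and sets the executable bits in one step. A sketch, assuming the script was saved as cachewarmup.sh; install_script is a hypothetical wrapper name:

```shell
#!/bin/bash
# install_script SRC [DESTDIR]: copy SRC to DESTDIR/cachewarmup with mode 755.
# DESTDIR defaults to /usr/local/bin. (install_script is a hypothetical wrapper.)
install_script() {
    local src=$1 destdir=${2:-/usr/local/bin}
    install -m 755 -- "$src" "$destdir/cachewarmup"
}
```

Writing to /usr/local/bin needs root, so the one-liner equivalent would be: sudo install -m 755 cachewarmup.sh /usr/local/bin/cachewarmup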

You'd call it like this:
sudo cachewarmup {path}

Or, if you want to nice it up, you can go to Zorin Menu > System Tools > Startup Applications:
Name: Cache Warmup
Command: gnome-terminal -- /bin/sh -c 'echo Warming up ARC cache...; sleep 60; sudo nice -n 19 cachewarmup /; sleep 300'
The -n 19 gives the script the lowest CPU priority; without -n, nice only applies its default adjustment of 10.

So for instance, to make it go through all files starting from the root directory:
sudo cachewarmup /

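It automatically excludes any files under /mnt/ and /media/owner (ie: files on external drives). Change owner in OP1 to your own username; $USER won't work there because it's root when the script runs under sudo. You can add additional paths to exclude in OP2 through OP9.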
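
The script's =~ test matches an omitted path anywhere inside a file's path, so /home/owner/mnt/notes.txt would also be skipped. A possible alternative, sketched here under the assumption that you only want to match from the start of the path, is one array plus a prefix test. OmitPaths and is_omitted are hypothetical names, not part of the script:

```shell
#!/bin/bash
# Hypothetical replacement for the OP0..OP9 variables: one array of path prefixes.
OmitPaths=( "/mnt/" "/media/owner/" )

# is_omitted PATH: succeed if PATH starts with any non-empty entry in OmitPaths.
is_omitted() {
    local path=$1 prefix
    for prefix in "${OmitPaths[@]}"; do
        # The -n guard means an empty entry can never match every path
        [[ -n $prefix && $path == "$prefix"* ]] && return 0
    done
    return 1
}
```

In the loop you'd then write something like: if is_omitted "$file" ; then (( SkipCnt++ )) ...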

Now, most people running ext4 will have the kernel's page cache (tunable through sysctl) managing their file cache... so I had to come up with something that works for those running ext4 or similar, and for those running the ZFS filesystem. The difference is, for a ZFS filesystem, those files are also cached in the ZFS ARC (Adaptive Replacement Cache) and, if you've set up a cache device, the L2ARC (Level 2 ARC).

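So for those running ZFS, you can push Linux to reclaim its own VFS caches more aggressively, in favor of the ZFS ARC (these writes need root, hence sudo tee):
echo 150 | sudo tee /proc/sys/vm/vfs_cache_pressure   # values above 100 reclaim dentry/inode caches sooner
echo 3 | sudo tee /proc/sys/vm/drop_caches            # one-time flush of page cache, dentries and inodes
echo 2 | sudo tee /proc/sys/vm/overcommit_memory      # strict overcommit accounting; this can make large allocations fail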

I won't give advice on tuning ZFS to hold onto its cache more aggressively, as it takes quite a bit of tuning and testing, and is machine-specific... but you can make that ARC cache pretty much persistent until you reboot... and if you're able to force the tail end of the ZFS ARC cache into L2ARC (hint: increase l2arc_headroom to pick up a larger portion of the ARC, and enable L2ARC prefetch), then you get a boost the next time you boot, because L2ARC is non-volatile.
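
Before changing anything, you can at least look at the current values. This is a read-only sketch that assumes OpenZFS on Linux exposes its module parameters under /sys/module/zfs/parameters, and that l2arc_noprefetch is the switch behind "L2ARC prefetch" (0 meaning prefetched buffers may go to L2ARC). show_l2arc_params is a hypothetical helper name:

```shell
#!/bin/bash
# show_l2arc_params [DIR]: print selected OpenZFS module parameters, if present.
# DIR defaults to /sys/module/zfs/parameters (assumed location on Linux).
show_l2arc_params() {
    local dir=${1:-/sys/module/zfs/parameters} name
    for name in l2arc_headroom l2arc_noprefetch; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s=unavailable\n' "$name"
        fi
    done
}
```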

When I run the above after booting, my memory usage goes from 12% to ~82%. Practically all of my file system then resides in memory. Once I upgrade to 64 GB RAM, I'll have plenty of room to put everything into memory. Remember, idle memory is wasted time.
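
To see how much the kernel's page cache grew, you can compare the Cached line in /proc/meminfo before and after a run. A sketch, with cached_kb as a hypothetical helper name. Note that on ZFS the ARC is not included in that figure; it is reported separately in /proc/spl/kstat/zfs/arcstats.

```shell
#!/bin/bash
# cached_kb [FILE]: print the "Cached" value (in kB) from a meminfo-style file.
# FILE defaults to /proc/meminfo. (cached_kb is a hypothetical helper.)
cached_kb() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}
```

Run cached_kb once after booting and again after the warmup finishes; the difference is roughly how much file data got pulled into the page cache.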

[EDIT]
There's still a glitch in the layout if a read/write error occurs, and for some reason it still accesses external drives... I'll work on it... after I get a new USB stick. Good thing I've got backup upon backup upon backup. :grinning:

[EDIT 2]
New USB stick obtained and populated (from backup) with the data that had been on the failed USB stick. Further testing and improvement in progress.

[EDIT 3]
Whew! I finally got the final bug fixed... bash is a bit potato :potato: on string handling, and combined with regex, it's full-potato :potato::potato:... I was getting false positives for Skipped Files because of a blank string! The unused Omitted Paths variables were empty, and with =~ an empty pattern matches every path, so every file counted as skipped. Fixed now: I set $OP2 through $OP9 (the unused Omitted Paths variables) to /dev/null, and the code works. It's now ~4 times faster than the first code iteration.
