OpenBSD Amsterdam

We're running dedicated vmm(4)/vmd(8) servers to host opinionated VMs.

The post about rdist(1) on johan.huldtgren.com inspired us to write one as well. It's a great, underappreciated tool, and we wanted to show how we wrapped doas(1) around it.

For two services in our infrastructure we wanted to keep the configuration in sync and reload the process whenever the configuration changed: a pair of nsd(8)/unbound(8) hosts and a pair of hosts running relayd(8)/httpd(8) with carp(4) between them.

We didn't have a requirement to go full configuration management with tools like Ansible or Salt Stack. And there wasn't any interest in building additional logic on top of rsync or repositories.

Enter rdist(1), a program to maintain identical copies of files over multiple hosts. It preserves the owner, group, mode, and mtime of files if possible and can update programs that are executing.

The only tricky part with rdist(1) is that copying files owned by a privileged user, and restarting services, has to be done as root. Our solution to the problem was to wrap doas(1) around rdist(1).

We decided to create a separate user account for rdist(1) to operate with on the destination host, for example:

ns2# useradd -m rupdate

Create an ssh key on the source host where you want to copy from:

ns1# ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_rdist

Copy the public key to the destination host for the rupdate user in .ssh/authorized_keys.
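One way to do that copy step by hand is sketched below. The function name add_authorized_key and both of its arguments are our own illustration, not part of the original setup. It is meant to be run as the rupdate user on the destination host, and it only appends the key when that exact line is not already present.

```shell
#!/bin/sh
# Hypothetical helper (not from the original setup): append a public key
# to an authorized_keys file unless that exact line is already there.
# usage: add_authorized_key <public-key-file> <authorized_keys-file>
add_authorized_key() {
    pubkey_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")

    # Keep the directory and file private to the owning user.
    mkdir -p "$auth_dir" && chmod 700 "$auth_dir" || return 1
    touch "$auth_file" && chmod 600 "$auth_file" || return 1

    # -x whole-line match, -F fixed string, -q quiet.
    if ! grep -qxF -- "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

After copying id_ed25519_rdist.pub over, this could be called as add_authorized_key id_ed25519_rdist.pub /home/rupdate/.ssh/authorized_keys, assuming the default home directory created by useradd -m.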

In order to wrap doas(1) around rdistd(1) we have to rename the original file. It's the only way we were able to do this.

Move rdistd to rdistd-orig on the destination host:

ns2# mv /usr/bin/rdistd /usr/bin/rdistd-orig

Create a new shell script rdistd with the following:

#!/bin/sh
/usr/bin/doas /usr/bin/rdistd-orig -S

Make it executable:

ns2# chmod 555 /usr/bin/rdistd

Add rupdate to doas.conf(5) like:

permit nopass rupdate as root cmd /usr/bin/rdistd
permit nopass rupdate as root cmd /usr/bin/rdistd-orig

Once that is all done we can create the files needed for rdist(1).

To copy the nsd(8) and unbound(8) configuration we created a distfile like:

HOSTS = ( rupdate@ns2.example.com )

FILES = ( /var/nsd )

EXCL = ( nsd.conf *.key *.pem )

${FILES} -> ${HOSTS}
	install ;
	except /var/nsd/db ;
	except /var/nsd/etc/${EXCL} ;
	except /var/nsd/run ;
	special "logger rdist update: $REMFILE" ;
	cmdspecial "rcctl reload nsd" ;

unbound:
/var/unbound/etc/unbound.conf -> ${HOSTS}
	install ;
	special "logger rdist update: $REMFILE" ;
	cmdspecial "rcctl reload unbound" ;

The distfile describes the destination HOSTS, the FILES that need to be copied, and the files to be EXCLuded. When it runs, it copies the selected FILES to the destination HOSTS, except for the files and directories listed in the except lines.

The install command is used to copy out-of-date files and/or directories.

The except command is used to update all of the files in the source list except for the files listed in name list.

The special command is used to specify sh(1) commands that are to be executed on the remote host after the file in name list is updated or installed.

The cmdspecial command is similar to the special command, except it is executed only when the entire command is completed instead of after each file is updated.

In our case the unbound(8) config doesn't change very often, so we used a label to only update this when needed. With:

ns1# rdist unbound

To keep our relayd(8)/httpd(8) in sync we did something like:

HOSTS = ( rupdate@relayd2.example.com )

FILES = ( /etc/acme /etc/ssl /etc/httpd.conf /etc/relayd.conf /etc/acme-client.conf )

${FILES} -> ${HOSTS}
	install ;
	special "logger rdist update: $REMFILE" ;
	cmdspecial "rcctl restart relayd httpd" ;

If you want cron(8) to pick this up via the system script daily(8), you can save the file as /etc/Distfile.

To make sure the correct username and key are used you can add this to your .ssh/config file:

Host ns2.example.com
	User rupdate
	IdentityFile ~/.ssh/id_ed25519_rdist

When you don't store the distfile in /etc you can add the following to your .profile:

alias rdist='rdist -f ~/distfile'

Running rdist will result in the following type of logging on the destination host:

==> /var/log/daemon <==
Nov 13 09:59:15 name2 rdistd-orig[763]: ns2: startup for ns1.example.com

==> /var/log/messages <==
Nov 13 09:59:15 ns2 rupdate: rdist update: /var/nsd/zones/reverse/192.168.10.0

==> /var/log/daemon <==
Nov 13 09:59:16 ns2 nsd[164]: zone 10.168.192.in-addr.arpa read with success

You can follow us on Twitter and Mastodon.

OpenBSD Amsterdam was in search of a lightweight toolset to keep track of resource usage, at a minimum the CPU load generated by the vmm(4)/vmd(8) hosts and the traffic from and to the hosts. A couple of weeks ago we ended up with a workable MRTG setup. While it worked, it didn't look very pretty.

In a moment of clarity, we thought about using RRDtool. Heck, why shouldn't we give it a try? From the previous tooling, we already had some required building blocks in place to make MRTG understand the CPU Cores and uptime from OpenBSD.

Before we start:

# pkg_add rrdtool

We decided to split the collection of the different OIDs (SNMP Object Identifiers) into three scripts, which cron(8) calls through a wrapper script.

  • uptime.sh
  • cpu_load.sh
  • interface.sh

uptime.sh

#!/bin/sh
test -n "$1" || exit 1
HOST="$1"
COMMUNITY="public"
UPTIMEINFO="/tmp/${HOST}-uptime.txt"
TICKS=$(snmpctl snmp get ${HOST} community ${COMMUNITY} oid hrSystemUptime.0 | cut -d= -f2)
DAYS=$(echo "${TICKS}/8640000" | bc -l)
HOURS=$(echo "0.${DAYS##*.} * 24" | bc -l)
MINUTES=$(echo "0.${HOURS##*.} * 60" | bc -l)
SECS=$(echo "0.${MINUTES##*.} * 60" | bc -l)
test -n "$DAYS" && printf '%s days, ' "${DAYS%.*}" > ${UPTIMEINFO}
printf '%02d\\:%02d\\:%02d\n' "${HOURS%.*}" "${MINUTES%.*}" "${SECS%.*}" >> ${UPTIMEINFO}

This is a separate script because the uptime of both hosts is used in both graphs.

The origins of this script are detailed in our MRTG Setup.

cpu_load.sh

#!/bin/sh
test -n "$1" || exit 1
HOST="$1"
COMMUNITY="public"
RRDFILES="/var/rrdtool"
IMAGES="/var/www/htdocs"
WATERMARK="OpenBSD Amsterdam - https://obsda.ms"
RRDTOOL="/usr/local/bin/rrdtool"
CPUINFO="/tmp/${HOST}-cpu.txt"
UPTIME=$(cat /tmp/${HOST}-uptime.txt)
NOW=$(date "+%Y-%m-%d %H:%M:%S %Z" | sed 's/:/\\:/g')

if ! test -f "${RRDFILES}/${HOST}-cpu.rrd"
then
echo "Creating ${RRDFILES}/${HOST}-cpu.rrd"
${RRDTOOL} create ${RRDFILES}/${HOST}-cpu.rrd \
        --step 300 \
        DS:ds0:GAUGE:600:U:U \
        RRA:MAX:0.5:1:20000
fi

snmpctl snmp walk ${HOST} community ${COMMUNITY} oid hrProcessorLoad | cut -d= -f2 > ${CPUINFO}
CORES=$(grep -cv "^0$" ${CPUINFO})
CPU_LOAD_SUM=$(awk '{sum += $1} END {print sum}' ${CPUINFO})
CPU_LOAD=$(echo "scale=2; ${CPU_LOAD_SUM}/${CORES}" | bc -l)

${RRDTOOL} update ${RRDFILES}/${HOST}-cpu.rrd N:${CPU_LOAD}

${RRDTOOL} graph ${IMAGES}/${HOST}-cpu.png \
        --start -43200 \
        --title "${HOST} - CPU" \
        --vertical-label "% CPU Used" \
        --watermark "${WATERMARK}" \
        DEF:CPU=${RRDFILES}/${HOST}-cpu.rrd:ds0:MAX \
        AREA:CPU#FFCC00 \
        LINE2:CPU#CC0033:"CPU" \
        GPRINT:CPU:MAX:"Max\:%2.2lf %s" \
        GPRINT:CPU:AVERAGE:"Average\:%2.2lf %s" \
        GPRINT:CPU:LAST:" Current\:%2.2lf %s\n" \
        COMMENT:"\\n" \
        COMMENT:"  SUM CPU Load / Active Cores = % CPU Used\n" \
        COMMENT:"  Up for ${UPTIME} at ${NOW}"

On the first run, RRDtool will create the .rrd file. On every subsequent run, it will update the file with the collected values and update the graph.

The origins of this script are detailed in our MRTG Setup.

interface.sh

#!/bin/sh
test -n "$1" || exit 1
test -n "$2" || exit 1
HOST="$1"
INTERFACE="$2"
COMMUNITY="public"
RRDFILES="/var/rrdtool"
IMAGES="/var/www/htdocs"
WATERMARK="OpenBSD Amsterdam - https://obsda.ms"
RRDTOOL="/usr/local/bin/rrdtool"
UPTIME=$(cat /tmp/${HOST}-uptime.txt)
NOW=$(date "+%Y-%m-%d %H:%M:%S %Z" | sed 's/:/\\:/g')

if ! test -f "${RRDFILES}/${HOST}-${INTERFACE}.rrd"
then
echo "Creating ${RRDFILES}/${HOST}-${INTERFACE}.rrd"
${RRDTOOL} create ${RRDFILES}/${HOST}-${INTERFACE}.rrd \
        --step 300 \
        DS:ds0:COUNTER:600:0:1250000000 \
        DS:ds1:COUNTER:600:0:1250000000 \
        RRA:AVERAGE:0.5:1:600 \
        RRA:AVERAGE:0.5:6:700 \
        RRA:AVERAGE:0.5:24:775 \
        RRA:AVERAGE:0.5:288:797 \
        RRA:MAX:0.5:1:600 \
        RRA:MAX:0.5:6:700 \
        RRA:MAX:0.5:24:775 \
        RRA:MAX:0.5:288:797
fi

IN=$(snmpctl snmp get ${HOST} community ${COMMUNITY} oid ifInOctets.${INTERFACE} | cut -d= -f2)
OUT=$(snmpctl snmp get ${HOST} community ${COMMUNITY} oid ifOutOctets.${INTERFACE} | cut -d= -f2)
DESCR=$(snmpctl snmp get ${HOST} community ${COMMUNITY} oid ifDescr.${INTERFACE} | cut -d= -f2 | tr -d '"')

${RRDTOOL} update ${RRDFILES}/${HOST}-${INTERFACE}.rrd N:${IN}:${OUT}

${RRDTOOL} graph ${IMAGES}/${HOST}-${INTERFACE}.png \
        --start -43200 \
        --title "${HOST} - ${DESCR}" \
        --vertical-label "Bits per Second" \
        --watermark "${WATERMARK}" \
        DEF:IN=${RRDFILES}/${HOST}-${INTERFACE}.rrd:ds0:AVERAGE \
        DEF:OUT=${RRDFILES}/${HOST}-${INTERFACE}.rrd:ds1:AVERAGE \
        CDEF:IN_CDEF="IN,8,*" \
        CDEF:OUT_CDEF="OUT,8,*" \
        AREA:IN_CDEF#00FF00:"In " \
        GPRINT:IN_CDEF:MAX:"Max\:%5.2lf %s" \
        GPRINT:IN_CDEF:AVERAGE:"Average\:%5.2lf %s" \
        GPRINT:IN_CDEF:LAST:" Current\:%5.2lf %s\n" \
        LINE2:OUT_CDEF#0000FF:"Out" \
        GPRINT:OUT_CDEF:MAX:"Max\:%5.2lf %s" \
        GPRINT:OUT_CDEF:AVERAGE:"Average\:%5.2lf %s" \
        GPRINT:OUT_CDEF:LAST:" Current\:%5.2lf %s\n" \
        COMMENT:"\\n" \
        COMMENT:"  Up for ${UPTIME} at ${NOW}"
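
The RRA lines in the create call decide how much history each archive keeps: every row covers step times steps-per-row seconds, and the archive holds the given number of rows. As a rough sketch (rra_span_hours is a name we made up for this illustration), the retention per archive can be worked out with plain shell arithmetic:

```shell
#!/bin/sh
# Hypothetical helper: hours of history kept by one RRA.
# usage: rra_span_hours <step-seconds> <steps-per-row> <rows>
rra_span_hours() {
    echo $(( $1 * $2 * $3 / 3600 ))
}

# The four AVERAGE archives from interface.sh, with --step 300:
rra_span_hours 300 1 600      # 5-minute rows
rra_span_hours 300 6 700      # 30-minute rows
rra_span_hours 300 24 775     # 2-hour rows
rra_span_hours 300 288 797    # 1-day rows
```

This prints 50, 350, 1550 and 19128 hours, which is roughly two days, two weeks, two months and a bit over two years of history.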

To pinpoint the network interface you want to measure the bandwidth for, this command prints the available interfaces:

snmpctl snmp walk [host] community [community] oid ifDescr

This will output a list like:

ifDescr.1="em0"
ifDescr.2="em1"
ifDescr.3="enc0"
ifDescr.4="lo0"
ifDescr.5="bridge880"
ifDescr.6="vlan880"
ifDescr.13="pflog0"
ifDescr.669="tap0"
ifDescr.670="tap1"

The number after ifDescr. is the one you need to pass to interface.sh as the second argument, after the host, for example:

# interface.sh [host] 5
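
If you would rather look the index up by interface name, the walk output can be filtered. This is only a sketch: ifindex_for is a hypothetical helper of ours, and it assumes the ifDescr.N="name" output format shown in the list above.

```shell
#!/bin/sh
# Hypothetical helper: print the ifDescr index for an interface name.
# Reads lines such as ifDescr.5="bridge880" on stdin.
# usage: snmpctl snmp walk [host] community [community] oid ifDescr | ifindex_for bridge880
ifindex_for() {
    awk -F= -v want="$1" '
        {
            gsub(/"/, "", $2)                  # drop the quotes around the name
            if ($2 == want) {
                sub(/^ifDescr\./, "", $1)      # keep only the numeric index
                print $1
            }
        }'
}
```

Nothing is printed when no interface with that name exists, so an empty result can be treated as "not found".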

Finally, the wrapper.sh script calls all the aforementioned scripts:

#!/bin/sh
SCRIPTS="/var/rrdtool"
for i in $(jot 2 1); do ${SCRIPTS}/uptime.sh host${i}.domain.tld; done
for i in $(jot 2 1); do ${SCRIPTS}/cpu_load.sh host${i}.domain.tld; done
${SCRIPTS}/interface.sh host1.domain.tld 12
${SCRIPTS}/interface.sh host2.domain.tld 11

The resulting graphs:

To serve the graphs we use httpd(8) with the following config:

server "default" {
        listen on * port 80
        location "/.well-known/acme-challenge/*" {
                root "/acme"
                request strip 2
        }
        location * {
                block return 302 "https://$HTTP_HOST$REQUEST_URI"
        }
}

server "default" {
        listen on * tls port 443
        tls {
                certificate "/etc/ssl/default-fullchain.pem"
                key "/etc/ssl/private/default.key"
        }
        location "/.well-known/acme-challenge/*" {
                root "/acme"
                request strip 2
        }
        root "/htdocs"
}

All the scripts can be found in our Git Repository.


For OpenBSD Amsterdam we were looking for a lightweight method to keep track of, at least, traffic and CPU load generated by the vmm(4)/vmd(8) hosts.

We had some experience with Observium, which doesn't run well on OpenBSD, and LibreNMS. For some reason we were unable to get LibreNMS working on 6.4 or on -current (6.5), so we decided to look elsewhere.

Considering our needs and what is available on OpenBSD we decided to go back in time and have a look at MRTG again.

Getting MRTG working with OpenBSD snmpd and collecting traffic is not a very big deal, cfgmaker is your friend! Getting CPU load was more of a challenge.

First we had to figure out what the SNMP OIDs for the CPU were, as the default ones in the MRTG documentation didn't cover them. We also had to consider the multi-core machines we are running.

After some digging in the MIBs we found 'hrProcessorLoad' in /usr/local/share/mibs/HOST-RESOURCES-MIB.txt.

$ snmpctl snmp walk <host> community <string> oid hrProcessorLoad
hrProcessorLoad.1=24
hrProcessorLoad.2=57
hrProcessorLoad.3=33
hrProcessorLoad.4=26
hrProcessorLoad.5=21
hrProcessorLoad.6=25
hrProcessorLoad.7=77
hrProcessorLoad.8=68
hrProcessorLoad.9=61
hrProcessorLoad.10=54
hrProcessorLoad.11=24
hrProcessorLoad.12=50
hrProcessorLoad.13=0
hrProcessorLoad.14=0
hrProcessorLoad.15=0
hrProcessorLoad.16=0
hrProcessorLoad.17=0
hrProcessorLoad.18=0
hrProcessorLoad.19=0
hrProcessorLoad.20=0
hrProcessorLoad.21=0
hrProcessorLoad.22=0
hrProcessorLoad.23=0
hrProcessorLoad.24=0

With some Startpage/DuckDuckGo-fu we stumbled upon a script that pulled a specific OID and ran some calculations on the CPU load based on the total number of cores and the sum of the load across these cores.

Here is the heavily modified version of that script.

#!/bin/sh
test -n "$1" || exit 1
HOST="$1"
CPUINFO="/tmp/cpuinfo.${HOST}"

snmpctl snmp walk ${HOST} oid hrProcessorLoad | cut -d= -f2 > ${CPUINFO}
CORES=$(grep -cv "^0$" ${CPUINFO})
CPU_LOAD_SUM=$(awk '{sum += $1} END {print sum}' ${CPUINFO})
CPU_LOAD=$(echo "scale=2; ${CPU_LOAD_SUM}/${CORES}" | bc -l)
echo "$CPU_LOAD"
echo "$CPU_LOAD"

It reads all the CPU information from the host and writes the load of each core to a temporary file in /tmp. The cores are counted and the sum is calculated. Since SMT / Hyper Threading is off by default, we excluded the cores which are not taking any load.
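
The same calculation can be done in a single awk pass. The sketch below is our own variation, not the script we run: avg_active_load is a made-up name, and it assumes input lines of the form hrProcessorLoad.N=value. It also prints 0.00 instead of dividing by zero when every core reports 0, and it rounds to two decimals where bc with scale=2 truncates.

```shell
#!/bin/sh
# Hypothetical helper: average load over the cores reporting a non-zero load.
# Reads "hrProcessorLoad.N=value" lines on stdin.
# usage: snmpctl snmp walk <host> community <string> oid hrProcessorLoad | avg_active_load
avg_active_load() {
    awk -F= '
        $2 > 0 { sum += $2; cores++ }      # skip cores that report 0
        END {
            if (cores > 0)
                printf "%.2f\n", sum / cores
            else
                print "0.00"               # all cores idle: avoid division by zero
        }'
}
```

With the walk output shown earlier, the twelve busy cores sum to 520, so this prints 43.33.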

MRTG expects two values because it primarily deals with inbound and outbound traffic, so we print $CPU_LOAD twice. Job done! Not quite... It also expects the uptime in a readable format, as well as the hostname.

So... TimeTicks here we come! To collect the uptime of an OpenBSD machine we need to query hrSystemUptime.

hrSystemUptime OBJECT-TYPE
    SYNTAX     TimeTicks
    MAX-ACCESS read-only
    STATUS     current
    DESCRIPTION
        "The amount of time since this host was last
        initialized.  Note that this is different from
        sysUpTime in the SNMPv2-MIB [RFC1907] because
        sysUpTime is the uptime of the network management
        portion of the system."
    ::= { hrSystem 1 }

The snmpctl command is:

$ snmpctl snmp get <host> community <string> oid hrSystemUptime.0
0=1525917187

TimeTicks are hundredths of a second, so a few calculations are needed to turn this into a time we can read.

1525917187 / 8640000 = days (+remainder) = 176.6107855324074
0.6107855324074 * 24 = hours (+remainder) = 14.65885277777778
0.65885277777778 * 60 = minutes (+remainder) = 39.53116666666667
0.53116666666667 * 60 = seconds (+fraction) = 31.87

Together with Roman Zolotarev we came up with the following part of the script:

TICKS=$(snmpctl snmp get ${HOST} oid hrSystemUptime.0 | cut -d= -f2)
DAYS=$(echo "${TICKS}/8640000" | bc -l)
HOURS=$(echo "0.${DAYS##*.} * 24" | bc -l)
MINUTES=$(echo "0.${HOURS##*.} * 60" | bc -l)
SECS=$(echo "0.${MINUTES##*.} * 60" | bc -l)
test -n "$DAYS" && printf '%s days, ' "${DAYS%.*}"
printf '%02d:%02d:%02d\n' "${HOURS%.*}" "${MINUTES%.*}" "${SECS%.*}"

This results in: 176 days, 14:39:31
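
For comparison, the same conversion can be written with integer arithmetic only, which avoids the string handling around bc. This is a sketch rather than the script we run: ticks_to_uptime is a hypothetical name, and it relies on TimeTicks being hundredths of a second.

```shell
#!/bin/sh
# Hypothetical helper: format SNMP TimeTicks (1/100 s) as "D days, HH:MM:SS".
# usage: ticks_to_uptime <ticks>
ticks_to_uptime() {
    ticks=$1
    days=$(( ticks / 8640000 ))        # 100 * 60 * 60 * 24 ticks per day
    rest=$(( ticks % 8640000 ))
    hours=$(( rest / 360000 ))         # 100 * 60 * 60 ticks per hour
    rest=$(( rest % 360000 ))
    minutes=$(( rest / 6000 ))         # 100 * 60 ticks per minute
    secs=$(( rest % 6000 / 100 ))      # drop the hundredths
    printf '%s days, %02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$secs"
}
```

For example, ticks_to_uptime 1525917187 prints 176 days, 14:39:31.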

The last part which MRTG expects is the hostname. This can be collected with:

snmpctl snmp get ${HOST} oid sysName.0 | cut -d= -f2 | tr -d '"'

All done!

What MRTG gets from the script is something like:

3.50
3.50
138 days, 02:37:03
server1.openbsd.amsterdam

The complete script can be found in our Git Repository.
