Ads blocking with OpenBSD unbound(8)
1838 words, 9 minutes
The Internet is full of Ads and Trackers. One way to avoid those is to simply never reach the stinky servers. This can be partially done using a local DNS resolver.
This article is a reboot of both the 2019 Blocking Ads using unbound on OpenBSD and Storing unbound logs into InfluxDB posts; hopefully improved.
Introduction
DNS Ads blocking is fairly simple: when you were about to make an Internet request to some server known to host Ads and Trackers, you just don't!
This requires you to set up and maintain a smart DNS server. You also have to tell your devices (smartphones, tablets, computers …) to use it. Under the hood, the DNS server tells your devices that the domain names they’re looking for don’t exist.
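As an illustration, a single always_nxdomain entry in an unbound(8) configuration is enough to blackhole a name; ads.example.com below is just a placeholder:
local-zone: "ads.example.com" always_nxdomain
Any client resolving that name through the server gets an NXDOMAIN answer, which dig(1) would show as "status: NXDOMAIN" in the reply header.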
There are ready-to-use solutions available. Pi-hole and AdGuard Home are well-known ones. uBlock Origin works in another way but uses the same kind of algorithm to protect your privacy: it detects bad resources and doesn't let you go there.
Here, the bad domain names are grabbed using some of the same sources also used by those projects.
Ingredients needed for this recipe:
- Grafana to render the statistics;
- InfluxDB to store the information;
- syslogd(8) and awk(1) to turn DNS queries into statistics;
- collectd(1) and a shell script to store unbound statistics and logs;
- unbound(8) and a shell script to get and block DNS queries.
The DNS server
Looks like unbound(8) came in with OpenBSD 5.2.
Anyway, v1.15.0 now ships stock with OpenBSD 7.1/amd64.
Sourcing Ads and Trackers lists
I’m using a combination of sources that are also used by Pi-hole, AdGuard Home and uBlock Origin. I wrote a simple shell script that parses the lists and turns them into a format that unbound(8) understands:
# cat /home/scripts/unbound-adhosts
#!/bin/sh
PATH="/bin:/sbin:/usr/bin:/usr/sbin"
_tmp="$(mktemp)" # Temp file to use while parsing
_out="/var/unbound/etc/unbound-adhosts.conf" # Unbound formatted zone file
# AdGuard Home
function adguardhome {
# AdGuard DNS filter
_src="https://adguardteam.github.io/AdGuardSDNSFilter/Filters/filter.txt"
ftp -MVo - "$_src" | \
sed -nre 's/^\|\|([a-zA-Z0-9\_\-\.]+)\^$/local-zone: "\1" always_nxdomain/p'
# AdAway default blocklist
_src="https://adaway.org/hosts.txt"
ftp -MVo - "$_src" | \
awk '/^127.0.0.1 / { print "local-zone: \"" $2 "\" always_nxdomain" }'
}
# From Pi-hole
function stevenblack {
_src="https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts"
ftp -MVo - "$_src" | \
awk '/^0.0.0.0 / { print "local-zone: \"" $2 "\" always_nxdomain" }'
}
# StopForumSpam, toxic domains
function stopforumspam {
_src="https://www.stopforumspam.com/downloads/toxic_domains_whole.txt"
ftp -MVo - "$_src" | \
awk '{ print "local-zone: \"" $1 "\" always_nxdomain" }'
}
# uBlock Origin
function ublockorigin {
# Malicious Domains Unbound Blocklist
_src="https://malware-filter.gitlab.io/malware-filter/urlhaus-filter-unbound.conf"
ftp -MVo - "$_src" | grep '^local-zone: '
# Peter Lowe's Ad and tracking server list
_src="https://pgl.yoyo.org/adservers/serverlist.php?showintro=0;hostformat=hosts"
ftp -MVo - "$_src" | \
awk '/^127.0.0.1 / { print "local-zone: \"" $2 "\" always_nxdomain" }'
# AdGuard Français
_src="https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/FrenchFilter/sections/adservers.txt"
ftp -MVo - "$_src" | \
sed -nre 's/^\|\|([a-zA-Z0-9\_\-\.]+)\^.*$/local-zone: "\1" always_nxdomain/p'
}
# Grab and format the data
adguardhome >> "$_tmp"
stevenblack >> "$_tmp"
stopforumspam >> "$_tmp"
ublockorigin >> "$_tmp"
# Clean entries
sed -re 's/\.\" always/" always/' "$_tmp" | \
egrep -v "\"(t.co)\"" | \
sort -u -o "$_tmp"
chmod 0644 "$_tmp"
# Take action if required
diff -q "$_out" "$_tmp" 1>/dev/null
case $? in
0) rm "$_tmp" && exit 0;;
1)
mv "$_tmp" "$_out" && \
doas -u _unbound unbound-checkconf 1>/dev/null && \
exec doas -u _unbound unbound-control reload 1>/dev/null
;;
*) echo "$0: something bad happened!"; exit 1;;
esac
exit 0
#EOF
Cron regularly synchronizes the list content with a dedicated unbound(8) zone file; in the entry below, 0~5 picks a random minute and -s ensures only a single instance runs at a time (see crontab(5)):
# crontab -l
(...)
# Update DNS block list
0~5 */6 * * * -s /home/scripts/unbound-adhosts
(...)
The zone file content can now be used by unbound(8).
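The generated file is just a flat list of local-zone declarations; the actual domains vary with the sources, those below are only placeholders:
# head -2 /var/unbound/etc/unbound-adhosts.conf
local-zone: "ads.example.net" always_nxdomain
local-zone: "tracker.example.org" always_nxdomain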
Configuration
Enable statistics, configure logs, include the Ads/Trackers FQDN zone file:
# cat /var/unbound/etc/unbound.conf
(...)
statistics-cumulative: yes
extended-statistics: yes
(...)
use-syslog: yes
log-queries: no
log-replies: yes
log-local-actions: yes
(...)
include: /var/unbound/etc/unbound-adhosts.conf
(...)
Then apply the new unbound(8) configuration:
# rcctl restart unbound
From now on, each time a client requests DNS resolution for a bad domain, it gets an NXDOMAIN answer and the query is not processed any further.
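A quick way to verify that the blocklist is actually loaded is to count the always_nxdomain zones known to the running daemon, using the same doas(1) rules as in the script above:
# doas -u _unbound unbound-control list_local_zones | grep -c always_nxdomain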
The usage data
The logs and metrics end up in InfluxDB so that I can render a pretty dashboard. There’s nothing special to do on the InfluxDB side: simply create the database(s) and send data to them.
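With an InfluxDB 1.x instance, and assuming db_name matches the database referenced later in the curl(1) configuration, that is a one-liner:
# influx -execute 'CREATE DATABASE db_name'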
Collect the metrics
A shell script parses unbound statistics and writes them into a dedicated InfluxDB measurement:
# cat /home/scripts/collectd-unbound
#!/bin/sh
#
# CollectD Exec unbound(8) stats
# Configure "extended-statistics: yes"
#
PATH="/bin:/sbin:/usr/bin:/usr/sbin"
HOSTNAME="${COLLECTD_HOSTNAME:-$(hostname -s)}"
INTERVAL="${COLLECTD_INTERVAL:-10}"
while sleep "$INTERVAL"; do
doas -u _unbound unbound-control stats_noreset | \
egrep -v "^(histogram\.|time\.now|time\.elapsed)" | \
sed -re "s;^([^=]+)=([0-9\.]+);PUTVAL $HOSTNAME/exec-unbound/gauge-\1 interval=$INTERVAL N:\2;"
awk -v h=$HOSTNAME -v i=$INTERVAL \
'END { print "PUTVAL " h "/exec-unbound/gauge-num.adhosts interval=" i " N:" FNR }' \
/var/unbound/etc/unbound-adhosts.conf
done
exit 0
#EOF
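Each iteration writes collectd(1) exec-plugin PUTVAL lines on stdout, one per statistic; a sample line (with a made-up value) looks like:
PUTVAL openbsd/exec-unbound/gauge-total.num.queries interval=10 N:12345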
# cat /etc/doas.conf
(...)
permit nopass _collectd as _unbound cmd unbound-control
(...)
# cat /etc/collectd.conf
(...)
<Plugin exec>
Exec _collectd "/home/scripts/collectd-unbound"
</Plugin>
(...)
# rcctl restart collectd
In InfluxDB, the data will look like this:
> SELECT * FROM "exec_value" WHERE "instance"='unbound' ORDER BY time DESC LIMIT 10
name: exec_value
time host instance type type_instance value
---- ---- -------- ---- ------------- -----
2022-10-02T17:03:01.66013246Z openbsd unbound gauge num.query.authzone.down 0
2022-10-02T17:03:01.660101373Z openbsd unbound gauge num.query.authzone.up 0
2022-10-02T17:03:01.660069948Z openbsd unbound gauge key.cache.count 4030
2022-10-02T17:03:01.660033432Z openbsd unbound gauge infra.cache.count 491
2022-10-02T17:03:01.659930095Z openbsd unbound gauge rrset.cache.count 37499
2022-10-02T17:03:01.659893329Z openbsd unbound gauge msg.cache.count 108713
2022-10-02T17:03:01.659857007Z openbsd unbound gauge unwanted.replies 9
2022-10-02T17:03:01.659820476Z openbsd unbound gauge unwanted.queries 0
2022-10-02T17:03:01.659784111Z openbsd unbound gauge num.query.aggressive.NXDOMAIN 882
2022-10-02T17:03:01.659747595Z openbsd unbound gauge num.query.aggressive.NOERROR 256
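The same measurement can feed a panel showing how the blocklist grows over time; e.g., using the num.adhosts gauge emitted by the script above:
> SELECT last("value") FROM "exec_value" WHERE "type_instance"='num.adhosts' AND time > now() - 7d GROUP BY time(6h)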
Parse the logs
OpenBSD syslogd(8) has a feature that allows sending some logs to an external program. I decided to write an awk(1) script that gets the logs from syslogd, parses and formats them into a proper InfluxDB dataset and uses curl(1) to actually save the data.
Authentication is configured on my InfluxDB instance, so curl(1) has to use a login/password to be able to store the data. But I noticed that if you use the "--user" flag, anyone can see the credentials using ps(1). So I’m using an extra credentials file for curl(1).
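The difference is easy to demonstrate (the /ping endpoint is InfluxDB's health check):
# curl -s --user db_user:db_pass https://influxdb_host:8086/ping &
# ps -axo args | grep curl   # the login/password pair shows up here
With -K, only the path of the credentials file appears in the process arguments.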
# cat /home/scripts/unbound-logs2influxdb
#!/usr/bin/awk -f
BEGIN {
# Build an associative array (_ptr[ip]=hostname) of known DNS clients.
_fs = FS; FS = "[\" ]+" # Dirty hack to parse unbound logs.
_ptr["127.0.0.1"] = "localhost"
while (getline < "/var/unbound/etc/unbound-tumfatig.conf") {
if ($0 ~ /^local-data-ptr:/) { # only parse PTR.
split($3, _fqdn, "\."); _ptr[$2] = _fqdn[1]
}
}
close("/var/unbound/etc/unbound-tumfatig.conf") # Close the file by name; close($0) would not.
FS = _fs # Rollback dirty hack.
}
$3 == "unbound:" && $5 == "info:" { # Only parse unbound info logs.
if($7 == "static") { # Local zone: authoritative DNS.
split($8, _client, "@") # Client format is IP@PORT
if (_ptr[_client[1]] == "") { _host = "<unknown>" } # If no PTR.
else { _host = _ptr[_client[1]] }
_rec = "unbound_static,host=" $2 ",name=" $9
_rec = _rec ",type=" $10 ",class=" $11
_rec = _rec ",clientip=" _client[1]
_rec = _rec ",client=" _host " matched=1i"
} else if($7 == "always_nxdomain") { # Local zone: AD blocks.
split($8, _client, "@") # Client format is IP@PORT
if (_ptr[_client[1]] == "") { _host = "<unknown>" } # If no PTR.
else { _host = _ptr[_client[1]] }
_rec = "unbound_adblock,host=" $2 ",name=" $6
_rec = _rec ",type=" $10",class=" $11
_rec = _rec ",clientip=" _client[1]
_rec = _rec ",client=" _host " matched=1i"
} else if(NF == 13) { # DNS queries have 13 fields.
if (_ptr[$6] == "") {
_host = "<unknown>" # Set hostname to '<unknown>'
} else { _host = _ptr[$6] } # if no PTR exists in zone file.
_rec = "unbound_queries,host=" $2 ",name=" $7 ",clientip=" $6
_rec = _rec ",client=" _host ",type=" $8 ",class=" $9
_rec = _rec ",return_code=" $10 ",from_cache=" $12
_rec = _rec " time_to_resolve=" $11 ",response_size=" $13 "i"
}
# Build the curl command carrying the InfluxDB line-protocol record,
# but only when one of the branches above actually built one.
if (_rec != "") {
_cmd = "/usr/local/bin/curl -s -XPOST "
_cmd = _cmd "-K /home/scripts/unbound-logs2influxdb.conf "
_cmd = _cmd "--data-binary \"" _rec "\""
# Run the curl command = Insert data in InfluxDB
system(_cmd)
_rec = ""
}
}
# cat /home/scripts/unbound-logs2influxdb.conf
# InfluxDB credentials
url = "https://influxdb_host:8086/write?db=db_name&precision=s"
user = "db_user:db_pass"
The script is run by syslogd(8) and the configuration file contains credentials. So both files require special care regarding permissions and ownership:
# ls -alh /home/scripts/unbound-logs2influxdb*
-rwxr-x--- 1 root _syslogd 1.9K Oct 2 16:04 /home/scripts/unbound-logs2influxdb*
-rw-r----- 1 root _syslogd 505B Sep 29 00:51 /home/scripts/unbound-logs2influxdb.conf
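In case they were created with different modes, the listing above translates to:
# chown root:_syslogd /home/scripts/unbound-logs2influxdb /home/scripts/unbound-logs2influxdb.conf
# chmod 0750 /home/scripts/unbound-logs2influxdb
# chmod 0640 /home/scripts/unbound-logs2influxdb.conf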
syslogd(8) has a special configuration to allow unbound(8) logs, and only them, to be sent to and parsed by the script:
# cat /etc/syslog.conf
(...)
!!unbound
*.* |/home/scripts/unbound-logs2influxdb
!*
(...)
# rcctl restart syslogd
20221014 UPDATE: I’m running syslogd(8) with the -Z flag for historical reasons. If you don’t, the awk script will have to be modified to match field numbers. Thanks @MattPovey2 for the note.
The parsed logs can now be queried from InfluxDB:
> SELECT * FROM "unbound_adblock" ORDER BY time DESC LIMIT 5
name: unbound_adblock
time class client clientip host matched name type
---- ----- ------ -------- ---- ------- ---- ----
2022-10-02T22:14:24Z IN ThinkPad-de-Joel 192.0.0.16 unbound 1 s.youtube.com. A
2022-10-02T22:13:35Z IN - 192.0.0.12 unbound 1 www.googleadservices.com. HTTPS
2022-10-02T22:13:35Z IN - 192.0.0.12 unbound 1 www.googleadservices.com. A
2022-10-02T22:13:34Z IN - 192.0.0.12 unbound 1 s.youtube.com. HTTPS
2022-10-02T22:13:34Z IN - 192.0.0.12 unbound 1 s.youtube.com. A
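These records are what the dashboard panels below are built on; for instance, counting blocked queries per client over the last day:
> SELECT count("matched") FROM "unbound_adblock" WHERE time > now() - 1d GROUP BY "client"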
The dashboard
Doing things is great but checking what you’re doing is better. You could regularly run InfluxDB commands and even parse the results and send emails. But you can also set up a moootiful Web page with Grafana.
One can get the corresponding JSON file from the Grafana dashboard gallery. A local copy is also available here.
Extra - DNS performance
For the most impatient and/or curious, it is possible to benchmark unbound(8) using commonly queried domain names. Grab and parse the Top 10 million domains (based on Open PageRank data) so that they can be used by dnsperf(1).
# pkg_add dnsperf
# ftp https://www.domcop.com/files/top/top10milliondomains.csv.zip
Trying 94.130.193.220...
Requesting https://www.domcop.com/files/top/top10milliondomains.csv.zip
100% |************************************************************| 112 MB 00:09
117800727 bytes received in 9.77 seconds (11.49 MB/s)
# unzip top10milliondomains.csv.zip
# awk -F '[",]' '{ if($5 != "Domain") { print $5 " A" }; \
if($5 ~/^[a-k]/) { print $5 " MX" }; if(FNR == 100000) exit }' \
top10milliondomains.csv > top100k.txt
# dnsperf -s 192.168.0.1 -c 5 -d top100k.txt
Statistics:
Queries sent: 145937
Queries completed: 145821 (99.92%)
Queries lost: 116 (0.08%)
Response codes: NOERROR 137226 (94.11%), SERVFAIL 278 (0.19%), NXDOMAIN 8317 (5.70%)
Average packet size: request 33, response 79
Run time (s): 236.612169
Queries per second: 616.286984
Average Latency (s): 0.155683 (min 0.000112, max 4.990909)
Latency StdDev (s): 0.300890
You can see that unbound(8) replies but runs a bit out of steam: not all queries were served. And collectd seemed to have trouble getting some of the stats under such load.
Looking at the logs, warnings popped out:
warning: cannot increase max open fds from 512 to 4152
warning: continuing with less udp ports: 460
warning: increase ulimit or decrease threads, ports in config to remove this warning
This means my unbound configuration is not tuned properly for such a load. In real conditions, I’m way below 8 req/s, so it’ll be ok for me.
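Should that kind of load ever become real, a possible fix (a sketch, not something I needed) is to raise the daemon's file descriptor limit with a dedicated login.conf(5) class, which rc.d(8) picks up because it bears the daemon's name:
# cat /etc/login.conf
(...)
unbound:\
	:openfiles-max=4608:\
	:openfiles-cur=4608:\
	:tc=daemon:
(...)
# cap_mkdb /etc/login.conf   # only if /etc/login.conf.db is in use
# rcctl restart unbound
Alternatively, lowering num-threads or outgoing-range in unbound.conf(5) reduces the number of descriptors unbound(8) asks for.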
And that’s all for now!