Working from home has raised the importance of monitoring my internet connectivity. But a simple binary check won't suffice.
Even with a simple setup like mine - a few switches and routers and an ISP modem - it's hard to identify which component is causing a QoS issue or disconnecting me from the internet altogether.

Is it my ISP that's cutting service? Maybe the router is lagging behind? Should I replace an RJ45 cable or switch providers?

A few colleagues have expressed interest in how I monitor my own connectivity using a simple RPi 1 B+, so I've decided to simply share.

Methodology

I monitor connectivity and QoS by pinging the network equipment.
We want to monitor each and every hop (i.e. device) in our own home, as well as the ISP's gateway and a WAN IP or site to verify actual internet connectivity:

  1. The RPi is responsible for performing the pings, gathering the results, displaying data and raising alerts.
  2. We need to connect the RPi as far away as possible from the ISP's modem or gateway to the internet. This lets us measure ping latency across every hop (more on that later) and correctly identify the cause of an internet cutout.
  3. So if you have a switch, connected to a router, connected to an ISP modem, you would connect your RPi to said switch.
  4. We also need to pick a stable WAN IP to validate internet access against, such as Google's 8.8.8.8 DNS. The reasoning is that if Google's DNS is down, we have bigger issues than our home connectivity.
    For redundancy you could ping multiple WAN IPs, such as Cloudflare's 1.1.1.1 and OpenDNS' 208.67.222.222.
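
The probing idea in the list above can be sketched in a few lines of shell. This is only an illustration and not part of the Netdata setup: the function names and the PROBE override are made up for this sketch, and you pass in your own hop addresses, nearest hop first.

```shell
# Probe one host with a single ICMP echo (Linux iputils ping assumed:
# -c 1 sends one packet, -W 2 waits up to 2 seconds for the reply).
probe() {
  ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Print "<host> up" or "<host> down" for every host given, in order.
# PROBE is a hypothetical hook: set it to another command name to swap
# the probe (handy for dry runs); it defaults to probe() above.
check_hops() {
  for hop in "$@"; do
    if "${PROBE:-probe}" "$hop"; then
      echo "$hop up"
    else
      echo "$hop down"
    fi
  done
}
```

Running `check_hops SWITCH_IP ROUTER_IP GATEWAY_IP 8.8.8.8` from the RPi prints one line per hop, and the first hop reported as down is usually the place to start looking.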

A typical home network setup might look something similar to this:

Installation

Currently the official installation script only supports armv7l and newer CPU architectures. Older architectures would have to resort to a package manager installation.
You can find out your RPi's architecture here, or follow according to type:

  • armv6, armv7a -> RPi 1, RPi 2, RPi Zero:
    Install via apt:
$ sudo apt update
$ sudo apt install netdata -y
$ sudo systemctl start netdata
$ sudo systemctl enable netdata

Setting up Netdata

We need to make sure Netdata is binding to the right IP address.
Let's find out the RPi's IP address:

$ hostname -I | awk '{print $1}'

Then edit Netdata's configuration:

$ sudo nano /etc/netdata/netdata.conf

Update the file with your correct IP address:

[global]
        run as user = netdata
        web files owner = root
        web files group = root
        # Netdata is not designed to be exposed to potentially hostile
        # networks. See https://github.com/netdata/netdata/issues/164
        bind socket to IP = YOUR_IP_HERE
        dbengine multihost disk space = 512

We're not gonna deal with securing Netdata from outside access. You may read about it here.
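
If you'd rather script the netdata.conf edit than do it by hand, something along these lines should work. It's a sketch that assumes GNU sed and that the `bind socket to IP` line already exists in the file, as in the excerpt above; `set_bind_ip` is a name I made up.

```shell
# Rewrite the value of the "bind socket to IP" line in a Netdata config.
# $1 = IP address to bind to, $2 = path to netdata.conf
# Leading indentation of the line is preserved; GNU sed -i edits in place.
set_bind_ip() {
  sed -i "s|^\([[:space:]]*bind socket to IP = \).*|\1$1|" "$2"
}
```

From a root shell, `set_bind_ip "$(hostname -I | awk '{print $1}')" /etc/netdata/netdata.conf` would plug in the address found earlier. Keep a backup of the file before trying it.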

Testing

Let's restart Netdata:

$ sudo systemctl restart netdata
$ sudo systemctl status netdata

Now browse to http://YOUR_SERVER_IP_ADDRESS:19999 and you should see Netdata in action:
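
If the RPi is headless, the same check can be done from a terminal. The sketch below assumes curl is installed and uses Netdata's `/api/v1/info` endpoint on the port shown above; the two function names are hypothetical.

```shell
# Build the URL of Netdata's info endpoint for a given address.
# Port 19999 is the one used throughout this post.
netdata_info_url() {
  echo "http://$1:19999/api/v1/info"
}

# Return success if Netdata answers within 5 seconds.
# -f: fail on HTTP errors, -s: silent, -S: still print errors.
netdata_is_up() {
  curl -fsS --max-time 5 "$(netdata_info_url "$1")" >/dev/null
}
```

For example, `netdata_is_up YOUR_SERVER_IP_ADDRESS && echo "Netdata is answering"`.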

Setting up network monitoring with fping

<update date="29.01.2023">

Latest Netdata (1.37.x) requires fping 5.1.
While getting the testing version of fping 5.1 to work on my Bullseye 64-bit RPi 3B was easy, I couldn't get Netdata to trigger it. And no error was written to the logs either.
For now I had to resort to using the less capable go.d/ping.

</update>

It's important to remember that our goal is to monitor all the home network equipment as well as the internet gateway. We're gonna do this by having Netdata ping all hops periodically using fping.
Let's make sure fping is installed:

$ fping -v
fping: Version 4.2
fping: comments to david@schweikert.ch

If it's not available, simply install it:

$ sudo apt install -y fping
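
Since the update note above mentions a minimum fping version, here is a small sketch for comparing the installed version against a required one. It assumes the `fping: Version X.Y` output format shown above and GNU `sort -V`; both function names are mine.

```shell
# Extract the version number (e.g. "4.2") from `fping -v` output.
# $1 = the full text printed by `fping -v`
fping_version() {
  printf '%s\n' "$1" | sed -n 's/^fping: Version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed when version $1 is greater than or equal to version $2.
# sort -V orders version strings, so the smaller one comes out first.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

A check could then look like `version_at_least "$(fping_version "$(fping -v 2>&1)")" 5.1 || echo "fping is older than 5.1"`.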

With Netdata, monitors are installed simply by copying templates from /usr/lib/netdata/ to /etc/netdata/. This is done using a built-in script:

$ cd /etc/netdata/
$ sudo ./edit-config fping.conf
$ sudo ./edit-config health.d/fping.conf

Now edit the fping configuration:

$ sudo nano fping.conf

We'll go over the setup line by line.
Here you'll need to enter the individual IP addresses of all of your home network equipment, as well as the WAN IP address we're using to validate internet access (8.8.8.8 in our case):

# a space separated list of hosts to fping
# we suggest to put names here and the IPs of these names in /etc/hosts

hosts="SWITCH_IP ROUTER_IP GATEWAY_IP 8.8.8.8"

We'll set up Netdata to ping each of these every 3.3 seconds and refresh the charts every 10 seconds, which works out to about three pings per host per update:

# The update frequency of the chart in seconds - the default is inherited from netdata

update_every=10

# The time in milliseconds (1 sec = 1000 ms) to ping the hosts
# by default 5 pings per host per iteration
# fping will not allow this to be below 20ms

ping_every=3333

The reason I've chosen these frequencies is that my trusty old RPi is already busy doing other things, and a higher rate proved to consume too many resources. With a dedicated RPi 3 you can reach an almost constant stream by setting update_every to 1 and ping_every to 20.
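
To get a feel for how the two settings interact, here is a tiny helper. It assumes update_every is counted in seconds and ping_every in milliseconds, which is how I understand the fping plugin to pass them on; the function name is made up.

```shell
# Roughly how many pings each host receives per chart update.
# $1 = update_every (seconds), $2 = ping_every (milliseconds)
# Integer division, so the result is rounded down.
pings_per_update() {
  echo $(( $1 * 1000 / $2 ))
}
```

With the values above, `pings_per_update 10 3333` gives 3, while `pings_per_update 1 20` gives 50 for the near-constant stream setup.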

Let's restart Netdata:

$ sudo systemctl restart netdata

Give Netdata a few update cycles to collect data, then check the fping section at http://YOUR_SERVER_IP_ADDRESS:19999/#menu_fping_submenu_8_8_8_8:
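
If the charts stay empty, it can help to run fping by hand and compare. The sketch below condenses the per-host summary lines that `fping -q -c N` prints; it assumes the `xmt/rcv/%loss = ..., min/avg/max = ...` layout of fping 4.x, and the function name is hypothetical.

```shell
# Condense one fping summary line into "<host> loss=<n>% avg=<ms>".
# $1 = a line such as:
#   8.8.8.8 : xmt/rcv/%loss = 5/5/0%, min/avg/max = 10.1/10.5/11.0
summarize_fping_line() {
  printf '%s\n' "$1" | awk '{
    split($5, counts, "/")     # field 5 holds xmt/rcv/%loss values
    loss = counts[3]
    gsub(/[%,]/, "", loss)     # keep just the number
    avg = "n/a"
    if (NF >= 8) {             # timings are absent at 100% loss
      split($8, times, "/")
      avg = times[2] "ms"
    }
    printf "%s loss=%s%% avg=%s\n", $1, loss, avg
  }'
}
```

fping writes these summaries to stderr, so a manual run could look like `fping -q -c 5 ROUTER_IP 8.8.8.8 2>&1 | while read -r line; do summarize_fping_line "$line"; done`.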

Monitoring your connection

Monitoring your connection is done by analyzing Netdata's output and alarms.

Analyzing Netdata output

As the image above shows, Netdata provides continuous QoS data for each of your devices.

  • You get latency ("ping") data, which naturally accumulates over hops. So you can easily identify which of your devices contributes the most latency.
    My RPi to internet latency is ~65ms.
  • Ping loss data helps you identify which of your devices or gateways is most often responsible for cutting your internet connection. Is it your home router or the ISP's modem? Netdata provides a quick and easy way to reach a verdict.
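
Because latency accumulates over hops, each device's own share is roughly the difference between its latency and that of the hop before it. A small sketch of that arithmetic follows; the function name is made up, and apart from my ~65ms total any numbers you feed it are your own measurements.

```shell
# Print each hop's own latency contribution in ms, one per line.
# Arguments: cumulative average latencies, nearest hop first.
hop_contributions() {
  printf '%s\n' "$@" | awk 'BEGIN { prev = 0 } { printf "%.1f\n", $1 - prev; prev = $1 }'
}
```

For instance, `hop_contributions 0.5 1.5 65.0` attributes 0.5ms to the first hop, 1.0ms to the second and 63.5ms to the last leg. Treat small or negative differences with care, since some devices answer pings at low priority.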

Alarms

Alarms are logged events that appear after clicking the Alarms menu at the top. I usually switch straight to the log tab, which gives the best TL;DR of my connection's history:

Looking at the log above, we can deduce a couple of insights:

  1. At 11:18:11PM, the ISP's modem had a small hiccup. This is evident from the fact that out of all of my devices, only it and the WAN site started to exhibit ping loss (I've smudged its IP for security reasons).
    To be more precise - this is the time at which the alarm was cleared, meaning regular service was restored.
  2. At 1:25:14AM the connectivity from my house was cut off completely. As evident, the ISP's modem as well as all of my own equipment were functioning correctly, so it must have been something entirely on the ISP's side.
    At the time this image was taken, service had not been restored yet.
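
The reasoning in both cases boils down to "find the first hop along the path that shows loss". Here is a minimal sketch of that rule, assuming you write down each hop's state yourself as name=ok or name=loss in path order; the function and the hop names are hypothetical.

```shell
# Print the name of the first hop marked "loss", or "none".
# Arguments: hop states in path order, e.g. switch=ok router=ok modem=loss
first_lossy_hop() {
  for entry in "$@"; do
    case "$entry" in
      *=loss)
        echo "${entry%%=*}"   # strip "=loss", keep the hop name
        return 0
        ;;
    esac
  done
  echo "none"
}
```

The first event above corresponds to `first_lossy_hop switch=ok router=ok modem=loss wan=loss`, which points at the modem. The second one, where only the WAN target shows loss, points past my own equipment to the ISP's side.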

More Netdata insights

Prior to monitoring my connectivity, my RPi was used as a PiHole ad blocker and local DNS, and it still is. So I'm using Netdata to monitor this service's health as well. All in all the RPi 1 has proven to be a capable machine.
Should I elaborate on the PiHole setup as well? Ping me in the comments please.