A little while ago, I wrote an article about two wishes for dev tooling, in which one of the main wishes was a way to monitor code quality in a less pass/fail sense than normal testing. What if, instead of just tracking whether eslint reported any severe problems with your codebase, you could also keep track of how many warnings it logged? If instead of just tracking whether tests passed or failed, you could also keep track of how long they took to run? I'm using Code Climate for some code quality measures, but I wanted a way to track arbitrary other things, just like how GitHub Actions can run arbitrary code and give you some sort of output.
I finally got the kick of inspiration I needed reading Nelson Minar's blog about tracking CO₂ levels with a sensor, Telegraf, InfluxDB, and Grafana. Now, for my DIY system, I'm only going to use one of those components: InfluxDB.
InfluxDB is a tool built for serious, large, complex data problems. I'm using it here for simple, unserious vanity metrics. But in my defense, these metrics are eventually useful. The example above, the size of my node_modules directory on disk, isn't really that important. Sure, it'll have some effect on the size of the server VM, but only a fraction of this code will ever be transferred to the client or even loaded and used - a lot of it is dead code, or non-code things in modules, like long README files or tests that the author forgot to exclude from the npm package. But, on the other hand, you'll run into the occasional module that's not just 100kb on disk, but 5mb, and that's an issue. Or a new dependency that makes one of Placemark's geospatial data converters take 5mb of data to boot up. So it's good to know.
So I'm tracking a lot of these metrics right now:
That's just the start, though. Adding a new metric is extremely easy, and I'm still sending a relatively tiny amount of data to InfluxDB.
This is thankfully pretty simple. I use InfluxDB Cloud, on their free, rate-limited plan for now. I might upgrade to a pay-as-you-go plan, which would cost a few dollars a month and give me better data retention. And then I send statistics from GitHub Actions. Right now those stats are sent on every test run, and they aren't even segmented by branch, but Placemark uses a simple branching model: one feature branch at a time, with a stable, continuously-deployed main branch.
InfluxData themselves maintain a GitHub Action that'll download and install the client, and optionally, the server, for you. I store the configuration values in my repository's secrets, load them up with `influx config create`, and now I'm ready to start logging some data!
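That setup step looks something like this - a sketch, assuming the host URL, org, and token are exposed to the workflow as environment variables from repository secrets (the names here are placeholders, not the actual secret names):

```shell
# Create and activate an influx CLI config pointing at InfluxDB Cloud.
# INFLUX_HOST, INFLUX_ORG, and INFLUX_TOKEN are assumed to come from
# the repository's secrets via the workflow's env block.
influx config create \
  --config-name cloud \
  --host-url "$INFLUX_HOST" \
  --org "$INFLUX_ORG" \
  --token "$INFLUX_TOKEN" \
  --active
```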
Here's the first data point that I count: the number of TODO comments in the codebase:
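A minimal sketch of what that looks like - the measurement name, the `src/` directory, and the `metrics.txt` file are my stand-ins here, not necessarily the real ones:

```shell
# Count TODO comments in the source tree and append a point in
# InfluxDB line protocol: measurement, field, nanosecond timestamp.
TODOS=$(grep -r "TODO" src/ | wc -l | tr -d ' ')
echo "todo_count count=${TODOS}i $(date +%s%N)" >> metrics.txt
```

The trailing `i` marks the field as an integer in line protocol; without it, InfluxDB stores the value as a float.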
This is using the line protocol, and appending each new metric to the same file. Then, at the end of the whole process, I just write that whole file to the database:
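The final write is a single CLI call - sketched here with a placeholder bucket name and the same hypothetical `metrics.txt` file:

```shell
# Flush every accumulated line-protocol point to InfluxDB in one request.
influx write --bucket metrics --file metrics.txt
```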
And, voila! Data is stored in a fancy, flexible time-series database. I'd bet something similar is possible with another database, like TimescaleDB, and you could even get this kind of process going with Google Sheets as a backend. It works pretty well, and at this point is a solution that costs nothing and doesn't require any new infrastructure-building.
I only have a few days of data, but already I'm starting to get a little extra satisfaction from dropping dependencies and then seeing the number go down. It's fun to know.