The (Second) Coming of Privacy-First Website Analytics: Roll your own
Old enough to remember the days of Webalizer? That was a time before GDPR, when users' privacy was not on any developer's mind. Yet tools like Webalizer offered a no-frills way of understanding web traffic. These rudimentary systems lacked the complexity of modern solutions, but they were a staple of the webmaster's toolkit and provided basic insights with a simple, straightforward approach.
Thinking back to that time, I can't help wondering: how did we get to where we are today, where the default has become privacy-invasive tracking such as Google Analytics? Before you utter "but the new GA4 is cookieless": it still uses cookies, but due to clever marketing everyone assumes it doesn't. It's cookieless as in less cookies, not free of cookies 🍪🫠
In this post I will list some alternatives and show you a couple of privacy-friendly ones that can be hosted on your own servers. They don't have all the bells and whistles of Google Analytics, but in my experience Google Analytics can't really be trusted nowadays anyway, given its myriad of flaws and issues.
The Rise of Privacy-Focused Alternatives
Recent developments in regulation, such as GDPR and CCPA, along with increasing awareness of users' privacy, have paved the way for new players in the analytics field. Privacy-friendly website analytics tools such as Fathom, Plausible, Matomo, Umami and Piwik are gaining traction, offering a stark contrast to the traditional, often invasive, methods of data collection.
Some of these tools have gone a step further by embracing the open-source movement, allowing users to self-host their analytics platforms. This shift aligns perfectly with the contemporary ethos of data ownership and user privacy.
Why should you care about privacy-friendly analytics?
In the EU there have been several legal rulings and movements in the past year that should give you good reasons to seriously consider alternatives to Google Analytics. These cases were decided before the new EU-US Data Privacy Framework came into effect, so the outcome might have been different today.
I won't go into the details, but here are some links where you can read more about these rulings from my favourite source of all privacy-related news, Rie Aleksandra Walle:
Embracing Self-Hosting in the DevOps Era
At the beginning of my career I was an advocate for self-hosting everything. However, as the technological landscape shifted towards cloud services and Platform as a Service (PaaS) models, I found myself recommending these new options and defaulting to SaaS. Recently, though, my support and love for self-hosting has seen a resurgence, and I find myself shifting a bit back again.
The last decade has been a time when outsourcing everything non-core to your business has been the norm for SaaS companies. However, with the advent of DevOps, containerization, and an increasing focus on user privacy, owning your data has become more crucial than ever. The drawbacks of self-hosting have diminished, making it a viable option for many.
Before diving into self-hosting, consider these factors to weigh the pros and cons of that approach versus using a SaaS:
- The criticality of the service to your business.
- The maintenance requirements and associated efforts.
- The sensitivity of the data processed by the system.
- The due diligence required when sharing data with another processor.
In my experience, tools like Plausible and Umami are noteworthy for their user-friendliness and privacy-centric features. I've used Plausible for the analytics on this site. When I tried to set up the same for the company I work for, I faced challenges integrating it with AWS RDS PostgreSQL, which led me to explore Umami as an alternative.
Plausible vs Umami
Both tools give you a nice and simple overview of views, referrers/sources, geographic area of visitors, and browser/OS/device category. The one thing I'm missing is more advanced support for filtering data based on different criteria. Both of them become a bit too simplistic when you want to generate and view reports on specific user behaviour.
I found Plausible's way of navigating and filtering the result set based on things such as a specific "utm_source" more user friendly. The same is possible in Umami, though it wasn't obvious from the interface.
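To make the "utm_source" filtering concrete, here is a minimal sketch of what such a breakdown boils down to. It does not use either tool's API; the function name and the page-view URLs are invented for illustration, and it only assumes you have a list of visited URLs to work from.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def count_by_utm_source(urls):
    """Count page views per utm_source value.

    Views without a utm_source parameter are grouped under '(none)'.
    """
    counts = Counter()
    for url in urls:
        query = parse_qs(urlparse(url).query)
        # parse_qs returns a list per key; take the first value if present
        source = query.get("utm_source", ["(none)"])[0]
        counts[source] += 1
    return counts


# Hypothetical page-view URLs, invented for this example
page_views = [
    "https://example.com/pricing?utm_source=newsletter&utm_medium=email",
    "https://example.com/blog/post?utm_source=twitter",
    "https://example.com/pricing?utm_source=newsletter",
    "https://example.com/",
]

counts = count_by_utm_source(page_views)
for source, views in counts.most_common():
    print(f"{source}: {views}")
```

Both tools do this grouping for you; the sketch is only meant to show the kind of question a "utm_source" filter answers.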
Where they seem to differ the most is in how they have approached goals/funnels and events. These features can be very useful when you aim to understand user behaviour and optimize website performance.
So which one of these would I recommend? It's hard to say. They are very similar, but both have pros and cons that could affect your decision depending on how you use your statistics. If I had to select one, I think it would be Plausible at this point, but looking at Umami's roadmap, that might change in the future.
Below you can find the links documenting how to host Plausible and Umami on your own. Let me know if you would like a post about how to install and host these services on AWS ECS or Docker in general, though they should be fairly straightforward to get up and running.