The (Second) Coming of Privacy-First Website Analytics: Roll your own

Old enough to remember the days of Webalizer? Back then, before GDPR, user privacy was hardly on any developer's mind. Yet it was also a time when tools like Webalizer offered a straightforward, no-frills approach to understanding web traffic. These rudimentary systems, often lacking the complexity of modern solutions, were a staple in the webmaster's toolkit and provided basic insights in a simple way.

Webalizer analytics, anno 1999.

Thinking back to that time, I can't help wondering how we got to where we are today, where privacy-invasive tracking such as Google Analytics has become the default. Before you say "but the new GA4 is cookieless": it still uses cookies, but thanks to clever marketing everyone assumes it doesn't. It's cookie-less as in fewer cookies, not free of cookies 🍪🫠

Inspecting website cookies after creating an empty page with default GA4 setup.
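If you want to repeat the cookie check from the screenshot without clicking through the browser's dev tools, a few lines of Python can pick the Google Analytics cookies out of a page's Cookie header. This is a minimal sketch: it assumes GA4's usual `_ga` / `_ga_<ID>` cookie naming, and the header value and the function name `google_analytics_cookies` are made up for illustration.

```python
from http.cookies import SimpleCookie


def google_analytics_cookies(cookie_header: str) -> dict:
    """Return the cookies in a Cookie header whose names start with "_ga"."""
    jar = SimpleCookie()
    jar.load(cookie_header)  # parse "name=value; name2=value2" pairs
    return {name: morsel.value for name, morsel in jar.items() if name.startswith("_ga")}


# Hypothetical header copied from dev tools; the IDs and values are invented.
header = "_ga=GA1.1.123456789.1700000000; _ga_ABC123XYZ=GS1.1.1700000000.1.0.1700000000.0.0.0; theme=dark"
print(sorted(google_analytics_cookies(header)))  # ['_ga', '_ga_ABC123XYZ']
```

An empty result on a page that loads GA4 would be the surprising outcome; with a default setup you should expect to see both cookies.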

In this post I'll list some alternatives and show you a couple of privacy-friendly options that you can host on your own servers. They don't have all the bells and whistles of Google Analytics, but in my experience Google Analytics, with its myriad of flaws and issues, can't really be trusted nowadays anyway.

The Rise of Privacy-Focused Alternatives

Recent developments in regulation, such as GDPR and CCPA, along with growing awareness of user privacy, have paved the way for new players in the analytics field. Privacy-friendly website analytics tools such as Fathom, Plausible, Matomo, Umami and Piwik are gaining traction, offering a stark contrast to the traditional, often invasive, methods of data collection.

Some of these tools have gone a step further by embracing the open-source movement, allowing users to self-host their analytics platforms. This shift aligns perfectly with the contemporary ethos of data ownership and user privacy.

Why should you care about privacy-friendly analytics?

In the EU, several legal rulings and regulatory actions over the past year give you good reasons to seriously consider alternatives to Google Analytics. These cases were decided before the new EU-US Data Privacy Framework came into effect, so the outcome might be different today.

I won't go into the details, but here are some links where you can read more about these rulings from my favourite source of all privacy-related news, Rie Aleksandra Walle:

Rie Aleksandra Walle on LinkedIn: #gdpr #googleanalytics #dataprotection
🔥 Swedish DPA orders controllers to STOP USING Google Analytics and fines one of them SEK 12 million, following noyb.eu's 101 transfers campaign. The DPA's…
Rie Aleksandra Walle on LinkedIn: #gdpr #privacy #dataprotection #googleanalytics
🔥 Google Analytics-use yet again found to violate the GDPR, this time in Norway. The DPA Datatilsynet investigated 3 main legal questions: 1. Was personal…

Embracing Self-Hosting in the DevOps Era

Early in my career I was an advocate for self-hosting everything. As the technological landscape shifted towards cloud services and Platform as a Service (PaaS) models, however, I found myself recommending these new options and defaulting to SaaS. Recently, though, my support for and love of self-hosting services has made a comeback.

Over the last decade, outsourcing everything that isn't core to the business has been the norm for SaaS companies. With DevOps practices, containerization, and an increasing focus on user privacy, however, owning your data has become more important than ever, while the drawbacks of self-hosting have diminished enough to make it a viable option for many.

Before diving into self-hosting, consider these factors to weigh the pros and cons of self-hosting versus using a SaaS:

  • The criticality of the service to your business.
  • The maintenance requirements and associated efforts.
  • The sensitivity of the data processed by the system.
  • The due diligence required when sharing data with another processor.

In my experience, tools like Plausible and Umami stand out for their user-friendliness and privacy-centric features. I use Plausible for the analytics on this site. When trying to set up the same for the company I work for, I ran into challenges integrating it with AWS RDS PostgreSQL, which led me to explore Umami as an alternative.

Plausible vs Umami

Umami on the left, Plausible on the right.

Both tools give you a nice, simple overview of views, referrers/sources, visitors' geographic area, and browser/OS/device category. The one thing I miss is more advanced support for filtering data on different criteria: both become a bit too simplistic to make it easy to generate and view reports on specific user behaviour.

I found Plausible's way of navigating and filtering the result set on things such as a specific "utm_source" more user-friendly. The same is possible in Umami, though it wasn't obvious from the interface.

Filtering for UTM Campaign in Umami.
Filtering for UTM Campaign in Plausible.
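Both filters work on the standard `utm_*` query parameters that a campaign link carries to the landing page. The sketch below shows which values would end up in those reports for a given link; the URL and the helper name `utm_params` are hypothetical, and it only illustrates the URL side, not how either tool stores or filters the data.

```python
from urllib.parse import parse_qs, urlsplit

# The five conventional UTM campaign parameters.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def utm_params(url: str) -> dict:
    """Extract the utm_* campaign parameters from a landing-page URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list of values per key; keep the first one for each UTM key present.
    return {key: query[key][0] for key in UTM_KEYS if key in query}


# Hypothetical newsletter link; "ref" is not a UTM parameter and is ignored.
url = "https://example.com/blog/post?utm_source=newsletter&utm_medium=email&utm_campaign=launch&ref=abc"
print(utm_params(url))
# {'utm_source': 'newsletter', 'utm_medium': 'email', 'utm_campaign': 'launch'}
```

Filtering on "utm_source" in either dashboard then amounts to selecting the visits whose landing URL carried `utm_source=newsletter`.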

Where they differ the most is in how they approach Goals/Funnels and events. These features can be very useful when you aim to understand user behaviour and optimize website performance.
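On the Plausible side, custom events (which can then be used as goals) can also be sent server-side through its Events API. This is a minimal sketch, assuming a self-hosted instance at the hypothetical `https://analytics.example.com` and the request shape described in Plausible's Events API docs (`POST /api/event` with a JSON body containing `domain`, `name`, `url` and optional `props`); verify it against the docs for the version you run. The helper `plausible_event_request` is made up for illustration and only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional


def plausible_event_request(api_host: str, domain: str, name: str, url: str,
                            user_agent: str, props: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to a Plausible instance's Events API."""
    body = {"domain": domain, "name": name, "url": url}
    if props:
        # Custom properties let you break the event down further in the dashboard.
        body["props"] = props
    return urllib.request.Request(
        f"{api_host.rstrip('/')}/api/event",
        data=json.dumps(body).encode("utf-8"),
        # Per Plausible's docs, User-Agent (and optionally X-Forwarded-For) feed visitor counting.
        headers={"Content-Type": "application/json", "User-Agent": user_agent},
        method="POST",
    )


# Hypothetical instance and site; to appear under goals, "Signup" needs a matching custom event goal.
req = plausible_event_request(
    "https://analytics.example.com", "example.com", "Signup",
    "https://example.com/pricing", "Mozilla/5.0 (example)", props={"plan": "free"},
)
# urllib.request.urlopen(req) would actually send the event.
```

Umami takes a different approach to events and funnels, so an equivalent there would look different; check its own API documentation rather than reusing this payload.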

So which one would I recommend? It's hard to say. They are very similar, but each has pros and cons that could affect your decision depending on how you use your statistics. If I had to pick one, it would be Plausible at this point, but judging by Umami's roadmap, that might change in the future.

Below are links to the documentation on how to self-host Plausible and Umami. Let me know if you'd like a post about installing and hosting these services on AWS ECS, or with Docker in general, though they should be fairly straightforward to get up and running.

Getting started | Plausible docs

How to install Plausible

Docs: Installation – Umami

How to install Umami