Thoughts on WHAT to track and report
I’ve been thinking about not just HOW to track, but WHAT to track. And these are related. My tracking method will, to some extent, dictate what I can track. For example, a simple pixel image, or a reference to a URL in HTML or CSS, will not be able to send me the URL of the referring page.
And in meeting my goal of not using cookies and not keeping any personally identifiable information, I won’t be able to track users’ paths through a website.
This is perfectly OK for some applications. It’s not OK for everyone, but if you need something closer to that level of detail, we can still report the ratio of conversions to page views.
I wasn’t going to add event tracking, but maybe I’ll add events after all to help with this. This WILL require a JS tracking code to be installed.
We’ll see. Initially I’m happy with views per page over time, browser usage metrics and referring pages/traffic sources. None of these need personally identifiable information. It’s all anonymised and aggregate.
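To make "anonymised and aggregate" a bit more concrete, here is a minimal sketch of counting views per page per day while keeping nothing about the visitor. The names `record_view` and `views_for_page`, and the in-memory `Counter`, are assumptions for illustration only, not a description of how Kownter actually stores data.

```python
from collections import Counter
from datetime import date

# Hypothetical in-memory store: one counter per (day, page path).
# No IP address, cookie, or visitor identifier is kept anywhere.
views: Counter = Counter()


def record_view(day: date, path: str) -> None:
    """Increment the aggregate count for this page on this day."""
    views[(day, path)] += 1


def views_for_page(path: str) -> dict[date, int]:
    """Return the daily view counts for one page, ordered by day."""
    return {day: n for (day, p), n in sorted(views.items()) if p == path}
```

Because only the day and the path are stored, two views from the same person are indistinguishable from two views by different people, which is the point.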
IP addresses are personally identifiable information and will be logged in server logs (unless I turn this off), but I’ve seen it argued that, as long as you’re careful with log rotation and deletion (including backups), there’s a case for keeping this data temporarily without consent.
4 Responses
Hi. I’m the author of the GDPR post you’re linking to and I hope you’ll make Kownter into a great analytics alternative with time!
You can’t answer the “what to track” question without first answering this one: “What is the purpose of the tracking?”
Everyone needs: technical data (for development purposes), page views, time on page, and referral/campaign data.
Some generic technical data can be useful. However, you don’t need the exact resolution of someone’s screen: round down to the nearest 10. You don’t need the exact User-Agent: strip off the component version numbers (e.g. `Mozilla/5.0 (Stuff/2.0) SoAndSo/34.2 Safari/343.13` can be reduced to `Mozilla (Stuff) SoAndSo Safari`). Likewise, you don’t need to combine this data. Store each piece of technical data separately and don’t link them together. Technical data is only relevant for trends like “the average screen sizes are increasing” or “Firefox is becoming more popular”.
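The two reductions above could be sketched roughly like this. The function names `round_down` and `strip_versions` are made up for illustration, and the regular expression only handles the `Name/version` form from the example; real User-Agent strings contain other version-like tokens that it would leave alone.

```python
import re

# Matches a slash followed by a version token, e.g. "/5.0" or "/343.13".
_VERSION = re.compile(r"/[\w.]+")


def round_down(value: int, step: int = 10) -> int:
    """Round a screen dimension down to the nearest multiple of `step`."""
    return (value // step) * step


def strip_versions(user_agent: str) -> str:
    """Drop the "/version" part of each User-Agent component."""
    return _VERSION.sub("", user_agent)


reduced = strip_versions("Mozilla/5.0 (Stuff/2.0) SoAndSo/34.2 Safari/343.13")
# reduced == "Mozilla (Stuff) SoAndSo Safari"
width = round_down(1366)
# width == 1360
```

Each reduced value would then go into its own aggregate count, so that a screen width is never stored next to the browser it came from.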
Referral data is super-useful to web publishers because you can discover who talks about your articles (hi there!). However, you should store it separately as well. Don’t associate it with any data other than the destination page. Some webapps leak personal data in the referral header, but these will usually have low volumes of traffic. You should also delete low-volume referral links; if fewer than X users were referred by the same link, then delete that referral link after 10 days or so. The same goes for campaign/source tracking data.
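As a sketch of that clean-up rule: keep a referral link if it has reached the threshold, or if it is still young enough that it might. `ReferralStat` and `prune_low_volume` are hypothetical names, the threshold X is left as a required parameter because no value is suggested here, and the 10-day default just mirrors the "10 days or so" above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ReferralStat:
    visitors: int         # aggregate count of visits referred by this link
    first_seen: datetime  # when the link was first recorded


def prune_low_volume(
    stats: dict[str, ReferralStat],
    now: datetime,
    min_visitors: int,
    max_age: timedelta = timedelta(days=10),
) -> dict[str, ReferralStat]:
    """Return only the referral links that are popular enough or still recent."""
    return {
        url: stat
        for url, stat in stats.items()
        if stat.visitors >= min_visitors or now - stat.first_seen < max_age
    }
```

Run on a schedule, this means a rarely used link that happens to carry personal data disappears on its own once it passes the age limit.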
Speaking of campaign tracking: please use URL fragments/hashes rather than query parameters to improve privacy and caching (example implementation). There is one known issue with this method which can be resolved by using “?#” instead of “#”.
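To illustrate the fragment approach, here is a small sketch of pulling campaign values out of a URL fragment. The parameter names `source` and `campaign` and the function `campaign_from_fragment` are assumptions for illustration. Note that browsers do not send the fragment to the server, so in practice this parsing would have to happen in the tracking script in the browser; the Python below only shows the shape of it.

```python
from urllib.parse import parse_qs, urlsplit


def campaign_from_fragment(url: str) -> dict[str, str]:
    """Read key=value pairs from the URL fragment, ignoring the query string."""
    fragment = urlsplit(url).fragment
    # parse_qs skips tokens without "=", so a plain anchor like "#intro" yields {}.
    return {key: values[0] for key, values in parse_qs(fragment).items()}


tags = campaign_from_fragment(
    "https://example.com/post?#source=newsletter&campaign=spring"
)
# tags == {"source": "newsletter", "campaign": "spring"}
```

The "?#" form parses the same way as a plain "#": the query string is simply empty and everything after the hash is still the fragment.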
P.S.: Personally I’d like to see working time-on-page tracking.
P.P.S.: Don’t hesitate to email me to discuss things further.
Thanks for such detailed and helpful comments, you really seem to know your stuff! Can I ask if you’ve tried to build something like this before?
I’m not sure this will ever become an alternative for many people but it’s enough for my quite limited needs and the exercise of building it is a very useful one.
I think I’ve probably answered the “what is the purpose” question in my own head but not written the answer down here (yet?). But in this case I’m also just exploring what is technically possible. Once I know what I CAN do I can then work out what I NEED to do. For example, I, personally, am not actually that interested in time on page. But it’s interesting to know that you could collect it because, as you say, others may want it.
But these are all great comments and I appreciate you taking time to share your knowledge here. Thank you!!
I’ve not built anything yet, no. I’ve got years’ worth of notes for when I finally find the time to build something like this. I’ve kind of implemented something to track popular pages on my blog.
I’ve tried using Piwik but it’s just so slow, has so many odd limitations, and it does such a poor job of visualizing data. So I thought I’d try to cheer you on a bit as you build the perfect replacement! (No pressure.)
Time on page is important. It’s not a problem if people spend 5 seconds on your front page or other portal pages. However, if people navigate away from your blog posts in less than a minute, they didn’t even have time to read the first few paragraphs. If this trend affects all your posts, then you need to investigate design issues or try to figure out why people are leaving. If everyone moves away from a particular post in the first five seconds, then that post either didn’t live up to expectations or failed to grab people’s attention. You’d better investigate ways to fix that page too. Do people spend more time reading posts on fruit even though more people visit posts on vegetables? Maybe write more about fruits to attract engaged readers. GAnalytics fails at this completely as they don’t track time-on-last/only-page.
Hah. Thanks. I’m not sure this will be the perfect replacement, but I appreciate being cheered on!
Popular page tracking is something I do quite a lot for clients in WordPress. I could probably make a plugin from it. But being able to have a really simple read-only API for popular pages would be a real boon for me. So that’s another possible motivation for doing this.
I get why time on page could be important. But I just don’t use it. I write what I write and I make what I make. I’m interested in the reach, but I’m not likely to change what I do as a result. It’s mere curiosity. Which is one reason I want a solution that avoids any personal data: because I don’t really have a basis for processing!