[BUG] Check client hints for headless browsers #108

Closed
peterbo opened this issue Feb 19, 2024 · 8 comments

peterbo commented Feb 19, 2024

Option "block headless browsers" is active in TrackingSpamPrevention, but headless browsers are tracked nevertheless:

[Screenshot: block-headless]

snake14 (Contributor) commented Feb 19, 2024

Hi @peterbo. Thank you for taking the time to create this issue. I was unable to reproduce the problem. I tested with Matomo 5.0.2, TrackingSpamPrevention 5.0.0, and Headless Chrome 121.1. With TrackingSpamPrevention enabled, requests from the headless browser weren't tracked; with the plugin disabled, they were.

Could you please provide more background information, like your Matomo/plugin version?

peterbo (Author) commented Feb 20, 2024

Hi @snake14 - we're seeing a lot of those in different instances. The problem seems to be that bots send a "normal" Chrome user agent but identify as headless Chrome in the client hints:

[20/Feb/2024:10:53:13 +0100] "POST /matomo.php?action_name=Welcome&idsite=2&rec=1&r=577211&h=9&m=53&s=13&url=https%3A%2F%2Fwww.example.com%2Fen%2F&_id=&_idn=1&send_image=0&_refts=0&cookie=1&res=1280x720&dimension4=en&dimension8=content&pv_id=8jAVo2&pf_net=500&pf_srv=747&pf_tfr=366&uadata=%7B%22fullVersionList%22%3A%5B%7B%22brand%22%3A%22Not%20A(Brand%22%2C%22version%22%3A%2299.0.0.0%22%7D%2C%7B%22brand%22%3A%22HeadlessChrome%22%2C%22version%22%3A%22121.0.6167.57%22%7D%2C%7B%22brand%22%3A%22Chromium%22%2C%22version%22%3A%22121.0.6167.57%22%7D%5D%2C%22mobile%22%3Afalse%2C%22model%22%3A%22%22%2C%22platform%22%3A%22Linux%22%2C%22platformVersion%22%3A%225.15.0%22%7D HTTP/1.1" 204 0 "https://www.example.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.6167.57 Safari/537.36"

The client hints are evidently reported (in the uadata parameter of the request). Therefore, TrackingSpamPrevention should also check the client hints to see whether the browser identifies as headless, e.g. as in https://github.com/matomo-org/matomo/blob/de2d14a6dda52c3445cfdcce50dcadaa1e87d7da/plugins/DevicesDetection/Columns/BrowserEngine.php#L40
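To make the log line above easier to read, here is a minimal sketch of decoding the uadata parameter from a tracking request and listing the brands it reports. It is written in Python purely for illustration and is not the plugin's code; the function name `brands_from_tracking_request` is hypothetical, and the request is a shortened version of the one in the log.

```python
import json
from urllib.parse import parse_qs, urlsplit


def brands_from_tracking_request(request_target: str) -> list[str]:
    """Return the brand names listed in the `uadata` query parameter, if any."""
    query = urlsplit(request_target).query
    values = parse_qs(query).get("uadata")  # parse_qs also percent-decodes
    if not values:
        return []
    try:
        uadata = json.loads(values[0])
    except json.JSONDecodeError:
        return []  # malformed client hints: treat as absent
    if not isinstance(uadata, dict):
        return []
    entries = uadata.get("fullVersionList") or []
    return [e["brand"] for e in entries if isinstance(e, dict) and "brand" in e]


# Shortened version of the request from the access log (most parameters removed).
request_target = (
    "/matomo.php?idsite=2&rec=1&uadata=%7B%22fullVersionList%22%3A%5B%7B%22brand"
    "%22%3A%22Not%20A(Brand%22%2C%22version%22%3A%2299.0.0.0%22%7D%2C%7B%22brand"
    "%22%3A%22HeadlessChrome%22%2C%22version%22%3A%22121.0.6167.57%22%7D%2C%7B%22"
    "brand%22%3A%22Chromium%22%2C%22version%22%3A%22121.0.6167.57%22%7D%5D%2C%22"
    "mobile%22%3Afalse%2C%22platform%22%3A%22Linux%22%7D"
)

brands = brands_from_tracking_request(request_target)
print(brands)  # ['Not A(Brand', 'HeadlessChrome', 'Chromium']
```

The user agent header in the same request only says Chrome, so the HeadlessChrome brand is visible only after decoding uadata.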

snake14 (Contributor) commented Feb 20, 2024

Thank you @peterbo. That's very helpful. I'll mark this issue for our Product team to review and prioritise.

snake14 added this to the For Prioritisation milestone on Feb 20, 2024
snake14 changed the title from "[BUG] Blocking of headless browsers doesn't seem to be working correctly" to "[BUG] Check client hints for headless browsers" on Feb 27, 2024
matomoto commented Mar 1, 2024

Joining from the Matomo forum ...

The user agent contains no information that identifies the browser as a headless browser:
HTTP/1.1" 204 0 "https://www.example.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.6167.57 Safari/537.36"
So it is not possible to detect the headless browser from the user agent alone.

Matomo uses Device Detector: https://github.com/matomo-org/device-detector/
As far as I understand, the Device Detector creates a key=value pair for the query string of the Matomo tracking URL:
uadata={"fullVersionList":[{"brand":"Not A(Brand","version":"99.0.0.0"},{"brand":"HeadlessChrome","version":"121.0.6167.57"},{"brand":"Chromium","version":"121.0.6167.57"}],"mobile":false,"model":"","platform":"Linux","platformVersion":"5.15.0"}
and one of its entries identifies the browser as a headless browser:
{"brand":"HeadlessChrome","version":"121.0.6167.57"}

More information about uadata: matomo-org/matomo#20128

A possible solution is for the TrackingSpamPrevention plugin to use this information. But I have no idea how uadata gets into the URL query string, or whether this data can be used in the TrackingSpamPrevention plugin.
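The check proposed here could look roughly like the sketch below: test the user agent first, then fall back to the brands in uadata. This is Python for illustration only, not the plugin's implementation; the function name `is_headless` and the case-insensitive substring match on "headless" are assumptions.

```python
from typing import Optional

HEADLESS_MARKER = "headless"  # assumption: case-insensitive substring match


def is_headless(user_agent: str, uadata: Optional[dict] = None) -> bool:
    """Return True if the user agent or the client hints mention a headless browser."""
    if HEADLESS_MARKER in user_agent.lower():
        return True
    if not uadata:
        return False
    # Fall back to the brand list sent in the client hints.
    for entry in uadata.get("fullVersionList") or []:
        if isinstance(entry, dict):
            brand = str(entry.get("brand", ""))
            if HEADLESS_MARKER in brand.lower():
                return True
    return False


# Values taken from the request quoted in this thread.
user_agent = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/121.0.6167.57 Safari/537.36"
)
uadata = {
    "fullVersionList": [
        {"brand": "Not A(Brand", "version": "99.0.0.0"},
        {"brand": "HeadlessChrome", "version": "121.0.6167.57"},
        {"brand": "Chromium", "version": "121.0.6167.57"},
    ],
    "mobile": False,
    "model": "",
    "platform": "Linux",
    "platformVersion": "5.15.0",
}

print(is_headless(user_agent))          # False: the user agent alone looks normal
print(is_headless(user_agent, uadata))  # True: the client hints reveal HeadlessChrome
```

Client hints are supplied by the client just like the user agent, so a bot that also rewrites them would still pass this check.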

atom-box commented

Another user reported this today:

We have tons of garbage Headless Chrome visits in our logs which are not filtered correctly and which we need to remove manually via the GDPR tooling.

atom-box commented

A user reported that many headless browsers still go undetected:

Month end is nearing, so it will soon be time again for me to go into Matomo's GDPR tools, search for all the Headless Chrome visits that sneaked through SpamPrevention, and delete them to get our reports right.
I'd love to get rid of that manual activity. ;)

AltamashShaikh (Contributor) commented

@peterbo @atom-box Created #129 to also check client hints when detecting headless browsers.

AltamashShaikh (Contributor) commented

@peterbo @atom-box Released v5.0.5 of the plugin with the fix 👍
