Hidden text passed to Readability #930

Open
ivanlabsii opened this issue Dec 22, 2024 · 2 comments

Comments

@ivanlabsii

This is the sample page:
https://www.reuters.com/legal/qualcomm-saw-nuvia-buy-chance-save-14-billion-year-arm-fees-ceo-tells-jury-2024-12-18/

It contains the hidden text in this form:
<span style="border: 0px; clip: rect(0px, 0px, 0px, 0px); clip-path: inset(50%); height: 1px; margin: -1px; overflow: hidden; padding: 0px; position: absolute; width: 1px; white-space: nowrap;">, opens new tab</span>

After cleaning, the span is passed to Readability without its style attribute, even though the original styling makes it obvious that the text is hidden:
<span>, opens new tab</span>

@gijsk
Contributor

gijsk commented Dec 31, 2024

Can you elaborate on what you mean by "obvious"? On what basis/heuristic do you think readability should discard text like this?

(Reuters' use of visually-hidden text that is semantically non-hidden is heavily frowned upon from an accessibility perspective... but I suppose that's not stopping them. 😞)

@ivanlabsii
Author

ivanlabsii commented Dec 31, 2024

@gijsk You have just identified the heuristics yourself.

Can you say why you wouldn't use them, aside from "purity"? If browser engine developers were not tolerant of issues like this, we wouldn't have the web in its current form at all. To deliver even remotely acceptable results, any HTML parsing package must work under the assumption that anything can happen; that has been accepted as standard behaviour since the early 90s.

I do agree with your point, but it has nothing to do with my issue. If you have to choose between supporting 99% of users and supporting 1%, it is normal to support the 99%. I also agree that the 1% should be supported; the Readability API should just have something like a VoiceReaderAccessibility boolean parameter that switches its behaviour to properly support that 1% as well. I would be very happy to see that and would use it on day 1 after it is implemented.
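
For illustration, here is a minimal sketch of the kind of heuristic being discussed. It assumes the check runs on the original inline `style` attribute, before that attribute is stripped. The function names (`parseInlineStyle`, `isTinyLength`, `looksVisuallyHidden`, `shouldKeepText`) and the `voiceReaderAccessibility` option are hypothetical and not part of Readability's API.

```javascript
// Parse an inline style string into a { property: value } map (lowercased).
function parseInlineStyle(styleText) {
  const decls = {};
  for (const part of styleText.split(";")) {
    const idx = part.indexOf(":");
    if (idx === -1) continue; // skip empty or malformed declarations
    const prop = part.slice(0, idx).trim().toLowerCase();
    const value = part.slice(idx + 1).trim().toLowerCase();
    if (prop) decls[prop] = value;
  }
  return decls;
}

// True when a length is at most 1px, e.g. "0", "0px" or "1px".
function isTinyLength(value) {
  const m = /^(\d*\.?\d+)(px)?$/.exec(value || "");
  return m !== null && parseFloat(m[1]) <= 1;
}

// Heuristic for the common "visually hidden" pattern: the element is taken
// out of flow and then either shrunk to a clipped 1px box or clipped away.
function looksVisuallyHidden(styleText) {
  const s = parseInlineStyle(styleText);
  const outOfFlow = s["position"] === "absolute" || s["position"] === "fixed";
  const tinyClippedBox =
    s["overflow"] === "hidden" &&
    isTinyLength(s["width"]) &&
    isTinyLength(s["height"]);
  const zeroClip =
    /^rect\(\s*0(px)?[\s,]+0(px)?[\s,]+0(px)?[\s,]+0(px)?\s*\)$/.test(s["clip"] || "");
  const insetClip = /^inset\(\s*(50%|100%)\s*\)$/.test(s["clip-path"] || "");
  return outOfFlow && (tinyClippedBox || zeroClip || insetClip);
}

// Hypothetical opt-in flag, named after the parameter proposed above:
// when set, visually hidden text is kept for screen-reader style output.
function shouldKeepText(styleText, options = {}) {
  if (options.voiceReaderAccessibility) return true;
  return !looksVisuallyHidden(styleText);
}
```

Called with the `style` value from the Reuters span above, `looksVisuallyHidden` returns `true`, so `shouldKeepText` drops the text unless `voiceReaderAccessibility` is set. The sketch only sees inline styles; the same pattern applied through a CSS class would need computed styles, which are not available from the markup alone.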
