Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DomException when simplifying nested elements caused by invalid attribute names #918

Merged
merged 3 commits into from
Jan 2, 2025

Conversation

dbowring
Copy link
Contributor

@dbowring dbowring commented Oct 19, 2024

Copies the same fix for copying invalid property names (such as ") used in _setNodeTag to _simplifyNestedElements.

You can reproduce this error running the following code in node (with jsdom installed):

const { JSDOM } = require("jsdom");
const Readability = require("./Readability.js");

const sampleHTML = '<div "=""><div><div>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</div></div></div>';
const dom = new JSDOM(sampleHTML, { contentType: "text/html" });
const readability = new Readability(dom.window.document);
const parsed = readability.parse();
console.log(parsed.textContent);

Without the fix:

$ node demo.js  

./readability/node_modules/jsdom/lib/jsdom/living/helpers/validate-names.js:10
    throw DOMException.create(globalObject, [`"${name}" did not match the Name production`, "InvalidCharacterError"]);
    ^
[DOMException [InvalidCharacterError]: """ did not match the Name production]

Node.js v20.18.0

With the fix:

$ node demo.js
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

dbowring added a commit to spaceduck-com/readability-cli that referenced this pull request Oct 20, 2024
dbowring added a commit to spaceduck-com/readability-cli that referenced this pull request Oct 20, 2024
@dbowring
Copy link
Contributor Author

dbowring commented Oct 21, 2024

Added the demo from the original comment as a test (70cedd6), but it also requires changes the way attributes are compared for the tests (to ignore invalid attributes, if they are there they are fine, but if we need to remove them that should also be fine), so you may wish to drop that commit if the test changes are not acceptable.

@dbowring dbowring force-pushed the fix-simplify-invalid-attributes branch from 402e5de to f48f5e3 Compare October 21, 2024 00:55
@dbowring dbowring force-pushed the fix-simplify-invalid-attributes branch from f48f5e3 to 70cedd6 Compare October 21, 2024 01:21
Copy link
Contributor

@gijsk gijsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had put up #889 for this but then neglected to update it when it became outdated/conflicted. I've unfortunately not been as active as I'd like due to some health issues so I'm sorry it's taken so long. Anyway, I've sort of merged the two pulls and will approve and merge this into the repo and then close my own older PR.

@gijsk gijsk merged commit 4b22103 into mozilla:main Jan 2, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants