-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Get::html() for all platforms #163
base: master
Are you sure you want to change the base?
Conversation
@complexspaces sorry to bother, but what do you think of the changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for my tardiness reviewing this, I must have missed the email notification when it came in.
The Windows and Linux implementations look good, but I have a question about the HTML parsing implemented for macOS based on some local testing on macOS 15.3.
@@ -347,6 +353,18 @@ fn add_clipboard_exclusions(clipboard: &mut Clipboard, exclude_from_history: boo | |||
} | |||
} | |||
|
|||
fn extract_html(html: String) -> Option<String> { | |||
let start_tag = "<body>"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Are these guaranteed to appear in enough cases? For example in Firefox I see <body>
tags wrapping the content when I copy from example.com
. However if I copy the same domain out of Chrome it directly starts with <h1>
. If I copy a nested block out of Logseq it omits them and starts directly with <ul>
as the top tag too.
As-is, IIUC, this will miss a lot of valid HTML cases from real browsers today because it returns None
if the <body>
wrapper can't be found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually made a PR on @Gae24's fork to remove this function. I also think it's not necessary and should be left to the consuming end, if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely a question worth considering either way. I don't have a Linux system available to test this currently but on Windows I am returned a <div>
directly when copying from Firefox and the same <h1>
as Chrome returns on macOS.
If we wanted to keep the behavior closer between the platforms we could strip out <body>
and anything before it on both sides of the HTML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd guess what's written to the clipboard is entirely application-dependent. Applications other than browsers (Word or similar) probably have a different format as well.
Which leads me to say transforming the clipboard HTML content would have to be left to the consumer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly it was added because some tests were failing. Testing example.com
on Linux while Firefox wraps the content with <meta http-equiv="content-type" content="text/html; charset=utf-8"><div></div>
, Edge that's still Chromium, simply gives <h1>
let start_index = html.find(start_tag)? + start_tag.len(); | ||
// Locate the end index of the </body> tag | ||
let end_index = html.find(end_tag)?; | ||
|
||
Some(html[start_index..end_index].to_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: drop(String::drain(...))
twice may be more efficient for large blocks of HTML since it keeps the string.
@Fruup Please see my comment in #157 (comment). I would prefer not to add more methods to the top-level |
I was unable to run tests, since there was a panic on lib.rs:283, but by running the example I was able to confirm it works on windows and linux.
Closes #130 and #157.