Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Get::html() for all platforms #163

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

Gae24
Copy link

@Gae24 Gae24 commented Oct 12, 2024

I was unable to run tests, since there was a panic on lib.rs:283, but by running the example I was able to confirm it works on windows and linux.
Closes #130 and #157.

@Gae24
Copy link
Author

Gae24 commented Nov 1, 2024

@complexspaces sorry to bother, but what do you think of the changes?

@Fruup
Copy link

Fruup commented Jan 31, 2025

@Gae24 Nice work!

We should also include a method get_html here.


Would also like this to be merged :)

Copy link
Collaborator

@complexspaces complexspaces left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for my tardiness reviewing this, I must have missed the email notification when it came in.

The Windows and Linux implementations look good, but I have a question about the HTML parsing implemented for macOS based on some local testing on macOS 15.3.

@@ -347,6 +353,18 @@ fn add_clipboard_exclusions(clipboard: &mut Clipboard, exclude_from_history: boo
}
}

fn extract_html(html: String) -> Option<String> {
let start_tag = "<body>";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Are these guaranteed to appear in enough cases? For example in Firefox I see <body> tags wrapping the content when I copy from example.com. However if I copy the same domain out of Chrome it directly starts with <h1>. If I copy a nested block out of Logseq it omits them and starts directly with <ul> as the top tag too.

As-is, IIUC, this will miss a lot of valid HTML cases from real browsers today because it returns None if the <body> wrapper can't be found.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually made a PR on @Gae24's fork to remove this function. I also think it's not necessary and should be left to the consuming end, if needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely a question worth considering either way. I don't have a Linux system available to test this currently but on Windows I am returned a <div> directly when copying from Firefox and the same <h1> as Chrome returns on macOS.

If we wanted to keep the behavior closer between the platforms we could strip out <body> and anything before it on both sides of the HTML.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd guess what's written to the clipboard is entirely application-dependent. Applications other than browsers (Word or similar) probably have a different format as well.

Which leads me to say transforming the clipboard HTML content would have to be left to the consumer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly it was added because some tests were failing. Testing example.com on Linux while Firefox wraps the content with <meta http-equiv="content-type" content="text/html; charset=utf-8"><div></div>, Edge that's still Chromium, simply gives <h1>

Comment on lines +361 to +365
let start_index = html.find(start_tag)? + start_tag.len();
// Locate the end index of the </body> tag
let end_index = html.find(end_tag)?;

Some(html[start_index..end_index].to_string())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: drop(String::drain(...)) twice may be more efficient for large blocks of HTML since it keeps the string.

@complexspaces
Copy link
Collaborator

We should also include a method get_html here.

@Fruup Please see my comment in #157 (comment). I would prefer not to add more methods to the top-level Clipboard structure for the time being to keep the API surface smaller.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants