Implement Get::html() for all platforms #163

Gae24 · 2024-10-12T18:07:38Z

I was unable to run the tests, since there was a panic at lib.rs:283, but by running the example I was able to confirm that it works on Windows and Linux.
Closes #130 and #157.

Signed-off-by: Gae24 <[email protected]>

Gae24 · 2024-11-01T19:29:32Z

@complexspaces Sorry to bother you, but what do you think of the changes?

Fruup · 2025-01-31T07:08:34Z

@Gae24 Nice work!

~~We should also include a method get_html here.~~

Would also like this to be merged :)

complexspaces

I apologize for being so late in reviewing this; I must have missed the email notification when it came in.

The Windows and Linux implementations look good, but I have a question about the HTML parsing implemented for macOS, based on some local testing on macOS 15.3.

complexspaces · 2025-01-31T08:57:53Z

src/platform/osx.rs

@@ -347,6 +353,18 @@ fn add_clipboard_exclusions(clipboard: &mut Clipboard, exclude_from_history: boo
 	}
 }

+fn extract_html(html: String) -> Option<String> {
+	let start_tag = "<body>";


Question: Are these tags guaranteed to appear in enough cases? For example, in Firefox I see `<body>` tags wrapping the content when I copy from example.com. However, if I copy from the same domain in Chrome, the HTML starts directly with `<h1>`. If I copy a nested block out of Logseq, it omits them too and starts directly with `<ul>` as the top tag.

As-is, IIUC, this will miss a lot of valid HTML cases from real browsers today, because it returns `None` if the `<body>` wrapper can't be found.
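To make the concern concrete, here is a minimal Rust sketch of the `<body>`-only extraction. It is a hypothetical reconstruction, not the PR's exact code: the excerpt only shows `start_tag`, so `end_tag = "</body>"` is an assumption taken from the inline comment, and `str::get` is used so that out-of-order tags yield `None` instead of panicking.

```rust
/// Hypothetical reconstruction of the reviewed `extract_html`, not the PR's exact code.
/// Returns the text between `<body>` and `</body>`, or `None` if either tag is missing.
fn extract_html(html: String) -> Option<String> {
    let start_tag = "<body>";
    // Assumption: the excerpt's comment refers to `</body>`, but its definition is not shown.
    let end_tag = "</body>";
    // `?` bails out with `None` as soon as a tag is not found.
    let start_index = html.find(start_tag)? + start_tag.len();
    let end_index = html.find(end_tag)?;
    // `get` returns `None` (instead of panicking) if `</body>` appears before `<body>`.
    html.get(start_index..end_index).map(str::to_string)
}
```

With this sketch, a Chrome-style fragment such as `<h1>Example Domain</h1>` or a Logseq-style `<ul>...</ul>` yields `None`, and so does a wrapper with attributes like `<body class="x">`, because the search is for the literal `<body>`.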

I actually made a PR on @Gae24's fork to remove this function. I also think it's unnecessary and should be left to the consuming side, if needed.

It's definitely a question worth considering either way. I don't have a Linux system available to test this right now, but on Windows, copying from Firefox returns a `<div>` directly, and Chrome returns the same `<h1>` it does on macOS.

If we wanted to keep the behavior closer between the platforms, we could strip out `<body>` and everything before it, and `</body>` and everything after it.
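One way to express that suggestion, sketched in Rust under the assumption that a plain substring search is good enough: strip the wrapper when it is present and otherwise return the fragment unchanged, so Chrome- or Logseq-style fragments are not lost. `strip_body_wrapper` is a hypothetical helper, not part of arboard; it is case-sensitive and is not an HTML parser.

```rust
/// Hypothetical helper (not part of arboard): if `html` contains a
/// `<body ...>` ... `</body>` wrapper, return only the content inside it;
/// otherwise return the fragment unchanged. Case-sensitive substring search only.
fn strip_body_wrapper(html: &str) -> &str {
    // Locate the opening tag; attributes such as `<body class="x">` are allowed.
    let Some(open) = html.find("<body") else {
        return html;
    };
    // The content starts right after the `>` that closes the opening tag.
    let Some(content_start) = html[open..].find('>').map(|i| open + i + 1) else {
        return html;
    };
    // The closing tag is expected near the end, so search from the back.
    match html.rfind("</body>") {
        Some(content_end) if content_end >= content_start => &html[content_start..content_end],
        // No usable closing tag: leave the fragment as-is.
        _ => html,
    }
}
```

Returning the input untouched when no wrapper is found is the key design choice here: it never turns valid clipboard HTML into `None`.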

I'd guess what's written to the clipboard is entirely application-dependent. Applications other than browsers (Word or similar) probably have a different format as well.

That leads me to say that transforming the clipboard HTML content should be left to the consumer.

If I remember correctly, it was added because some tests were failing. Testing example.com on Linux, Firefox wraps the content with `<meta http-equiv="content-type" content="text/html; charset=utf-8"><div></div>`, while Edge, which is still Chromium-based, simply gives `<h1>`.

complexspaces · 2025-01-31T09:00:07Z

src/platform/osx.rs

+	let start_index = html.find(start_tag)? + start_tag.len();
+	// Locate the end index of the </body> tag
+	let end_index = html.find(end_tag)?;
+
+	Some(html[start_index..end_index].to_string())


Nit: calling `drop(String::drain(...))` twice may be more efficient for large blocks of HTML, since it reuses the existing string's allocation instead of copying the slice into a new `String`.
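The nit can be sketched as follows, again assuming `end_tag` is `"</body>"`; `extract_html_in_place` is a hypothetical name. Draining the tail first keeps `start_index` valid; the head drain still shifts the remaining bytes down, but no second `String` is allocated.

```rust
/// Hypothetical in-place variant of the extraction: drain away the text after
/// the body and then the text before it, reusing the original `String` buffer.
fn extract_html_in_place(mut html: String) -> Option<String> {
    let start_tag = "<body>";
    let end_tag = "</body>"; // assumed, as in the excerpt's comment
    let start_index = html.find(start_tag)? + start_tag.len();
    let end_index = html.find(end_tag)?;
    // Guard against `</body>` appearing before `<body>`, which would make drain panic.
    if end_index < start_index {
        return None;
    }
    // Remove the tail first so `start_index` still points at the same byte.
    drop(html.drain(end_index..));
    // Remove the head; the remaining content is moved to the start of the buffer.
    drop(html.drain(..start_index));
    Some(html)
}
```

`html.truncate(end_index)` would have the same effect as the first drain.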

complexspaces · 2025-01-31T09:12:43Z

> We should also include a method get_html here.

@Fruup Please see my comment in #157 (comment). I would prefer not to add more methods to the top-level Clipboard structure for the time being to keep the API surface smaller.

Gae24 added 3 commits October 12, 2024 19:12

implement get html operation

d42f5e6

Signed-off-by: Gae24 <[email protected]>

add test for html() get operation

aea5012

cleanup wayland code

326ede7

Gae24 force-pushed the get-html branch from c699be2 to 326ede7 October 13, 2024 10:53

Gae24 added 2 commits October 17, 2024 08:50

fix fmt errors

0958834

fix compilation on macOS

f9b466b

Gae24 force-pushed the get-html branch from 87c5290 to f9b466b October 17, 2024 06:54

macOS: extract html

d445228

Gae24 force-pushed the get-html branch from 63f038e to d445228 October 17, 2024 08:34

fix windows

52ce714

Gae24 mentioned this pull request Jan 2, 2025

Implement Clipboard Event Api servo/servo#33576

Merged


complexspaces requested changes Jan 31, 2025
