FR: `capture-html`, but with `eww-readable` #53

stefan2904 · 2022-12-16T15:08:03Z

As always, thanks for the useful package!

Feature Request / Proposal

I propose adding a capture protocol that behaves like `capture-html` does now but, if nothing is selected, sends the whole page's HTML (i.e., `document.body`) to Emacs and uses `eww-readable` to extract the relevant HTML before converting it with pandoc. Basically, a mix of the two existing protocols.

Use case

Let's say I want to capture a webpage into org. If I only want to capture selected parts of the page, `capture-html` works perfectly.
On the other hand, if I want to capture the full page with the help of `eww-readable`, but from the content I already have open in the browser, this currently seems impossible (unless I misunderstood something).

Background

What I mean: If I understand things correctly,

`org-protocol-capture-html--with-pandoc` (used by the `capture-html` protocol):

- the user selects some text in the browser and clicks the bookmarklet
- the selected text is sent to Emacs via org-protocol
- the function converts the HTML (coming from the browser) to org
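For reference, the browser side of `capture-html` boils down to percent-encoding a few fields into an `org-protocol://` URL. The sketch below assumes a query-string form like `org-protocol://capture-html?template=w&url=...&title=...&body=...`, modeled on the package README's bookmarklet; the template key `w` and the helper name `captureHtmlUrl` are illustrative assumptions, so check them against your own setup.

```javascript
// Sketch of the URL a capture-html bookmarklet sends to Emacs.
// Assumption: the field names and the "w" template key mirror the
// package README's bookmarklet; adjust them to your own capture template.
function captureHtmlUrl({ template = "w", url, title, selectionHtml = "" }) {
  const fields = {
    template,
    url,
    // Fall back to a placeholder title when the page has none.
    title: title || "[untitled page]",
    // Only the selected HTML is sent; Emacs never fetches the page itself.
    body: selectionHtml,
  };
  const query = Object.entries(fields)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `org-protocol://capture-html?${query}`;
}

console.log(
  captureHtmlUrl({
    url: "https://example.com/a?b=1",
    title: "Example",
    selectionHtml: "<p>Hello</p>",
  })
);
// prints org-protocol://capture-html?template=w&url=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1&title=Example&body=%3Cp%3EHello%3C%2Fp%3E
```

In a real bookmarklet, `selectionHtml` would be the serialized current selection, which is exactly why this protocol captures what the browser has already rendered.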

`org-protocol-capture-html--capture-eww-readable` (used by the `capture-eww-readable` protocol):

- the user clicks the bookmarklet (which does not send selected text to Emacs)
- the function uses `org-protocol-capture-html--url-html` to download the HTML directly
- the function then converts the HTML (downloaded by curl) to org

The difference is that with `capture-html` the HTML is retrieved by the browser, while with `capture-eww-readable` it is retrieved by Emacs/curl.

This matters when the captured content is, e.g., behind a paywall or generated dynamically, for example when JavaScript inserts the text into the DOM after the page itself has loaded.


stefan2904 · 2022-12-16T15:11:40Z

So basically (coming from someone who does not know Elisp):

- modify the `capture-html` bookmarklet to send `document.body` if nothing is selected
- copy the `org-protocol-capture-html--with-pandoc` function and protocol
- modify the copy to call `org-protocol-capture-html--eww-readable` on `(plist-get data :body)` (or after the sanitization, not sure)

(There is probably a nicer way with less code duplication.)
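The first step above, falling back to `document.body` when nothing is selected, could look roughly like this on the bookmarklet side. This is a minimal sketch under assumptions: the protocol name `capture-html-readable` and the helper names `chooseHtml` and `readableCaptureUrl` are hypothetical, and the Emacs handler that would run `eww-readable` on the body before pandoc is not shown.

```javascript
// Sketch of the proposed bookmarklet change: send the selection if there
// is one, otherwise the whole page body.
// Assumption: "capture-html-readable" is a hypothetical protocol name; the
// package does not define it.
function chooseHtml(selectionHtml, bodyHtml) {
  // Treat a missing, empty, or whitespace-only selection as "nothing selected".
  const hasSelection =
    typeof selectionHtml === "string" && selectionHtml.trim() !== "";
  return hasSelection ? selectionHtml : bodyHtml;
}

function readableCaptureUrl({ template = "w", url, title, selectionHtml, bodyHtml = "" }) {
  const fields = {
    template,
    url,
    title: title || "[untitled page]",
    body: chooseHtml(selectionHtml, bodyHtml),
  };
  const query = Object.entries(fields)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `org-protocol://capture-html-readable?${query}`;
}

// In a real bookmarklet, bodyHtml would be document.body.innerHTML and
// selectionHtml the serialized current selection.
console.log(chooseHtml("", "<main>Article</main>")); // prints <main>Article</main>
```

Using a separate protocol name leaves the existing `capture-html` behavior unchanged. A whole page body can make the URL very long, which some browsers or OS protocol handlers may not accept.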
