Feature Request / Proposal
I propose adding a capture protocol that behaves like capture-html currently does, but, if nothing is selected, sends the whole page's html (i.e., document.body) to emacs and uses eww-readable to extract the interesting html before converting it with pandoc. So, basically, a mix of the two existing protocols.
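To make the proposed fallback concrete, here is a minimal sketch of the browser side, assuming the bookmarklet decides what to send before building the org-protocol URL. The helper names (selectionToHtml, chooseCapturePayload) and the wholePage flag are hypothetical and not part of the package; the emacs side would still need to run eww-readable on the html when the whole page was sent.

```javascript
// Hypothetical helper: serialize the current selection to an HTML string.
// Takes a window-like object so it does not depend on globals.
// Returns '' when nothing is selected (no ranges, or only a collapsed caret).
function selectionToHtml(win) {
  const selection = win.getSelection();
  const container = win.document.createElement('div');
  for (let i = 0; i < selection.rangeCount; i++) {
    container.appendChild(selection.getRangeAt(i).cloneContents());
  }
  return container.innerHTML;
}

// Hypothetical helper: send the selection if there is one,
// otherwise fall back to the html of the whole page body.
function chooseCapturePayload(selectionHtml, bodyHtml) {
  const hasSelection = selectionHtml.trim().length > 0;
  return {
    html: hasSelection ? selectionHtml : bodyHtml,
    // Assumed flag: tells emacs whether eww-readable should run first.
    wholePage: !hasSelection,
  };
}
```

In a bookmarklet this would be called as chooseCapturePayload(selectionToHtml(window), document.body.innerHTML), so the html always comes from the page as the browser currently shows it.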
Use case
Let's say I want to capture a webpage into org. If I only want to capture selected part(s) of the page, then capture-html works perfectly.
On the other hand, if I want to capture the full webpage using the power of eww-readable, but want to capture the content I already have open in the browser, this is currently not possible. (Or I misunderstood something.)
Background
What I mean, if I understand things correctly:
- org-protocol-capture-html--with-pandoc (used by the capture-html protocol):
  - user selects some text in the browser and clicks on the bookmarklet
  - selected text is sent to emacs via org-protocol
  - function converts the html (coming from the browser) to org
- org-protocol-capture-html--capture-eww-readable (used by the capture-eww-readable protocol):
  - user clicks on the bookmarklet (which does not send selected text to emacs)
  - function uses org-protocol-capture-html--url-html to download the html directly
  - function then converts the html (downloaded by curl) to org
The difference is that in capture-html the html is retrieved by the browser, while in capture-eww-readable it is retrieved by emacs/curl.
This matters when the content being captured is, e.g., a page behind a paywall or dynamically generated content. It also matters when the text is inserted into the DOM by javascript that loads the content after the page itself has loaded.
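The two existing flows differ in what the bookmarklet puts into the org-protocol URL. The following is a rough sketch, not the package's actual bookmarklets: buildOrgProtocolUrl is a hypothetical helper, the page object is made-up sample data, and the parameter names (template, url, title, body) reflect my reading of the README and should be treated as assumptions.

```javascript
// Hypothetical helper: build an org-protocol URL from a protocol name
// and a plain object of fields, URL-encoding every key and value.
function buildOrgProtocolUrl(protocol, fields) {
  const query = Object.entries(fields)
    .map(([key, value]) => encodeURIComponent(key) + '=' + encodeURIComponent(value))
    .join('&');
  return 'org-protocol://' + protocol + '?' + query;
}

// Sample data standing in for location.href, document.title and the page html.
const page = {
  url: 'https://example.com/article',
  title: 'Example',
  html: '<p>Hello</p>',
};

// capture-html style: the browser ships the html itself in the URL.
const captureHtmlUrl = buildOrgProtocolUrl('capture-html', {
  template: 'w',
  url: page.url,
  title: page.title,
  body: page.html,
});

// capture-eww-readable style: only the address is sent;
// emacs/curl has to download the page again on its own.
const captureReadableUrl = buildOrgProtocolUrl('capture-eww-readable', {
  template: 'w',
  url: page.url,
  title: page.title,
});
```

Only the first URL carries the html the browser actually rendered, which is why logged-in or javascript-generated content survives in capture-html but not in capture-eww-readable.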
As always, thanks for the useful package!