Suggestion: consider article:author meta tag as a source of author name metadata #938

Closed
danielnixon opened this issue Dec 29, 2024 · 2 comments · Fixed by #942

The article:author meta tag is "meant" to contain a URL (see https://developers.facebook.com/blog/post/2013/06/19/platform-updates--new-open-graph-tags-for-media-publishers-and-more/).

On many sites it does seem to contain a URL, but on a number of sites I've tested it contains the author's name.

One example is https://www.atlasobscura.com/articles/the-deck-of-cards-that-made-tarot-a-global-phenomenon

On that site, we have:

<meta property="article:author" content="Laura June Topolsky">

On that site, there is no better source for the author's name, so Readability consults the DOM and arrives at an unfortunate author string: "Laura June Topolsky July 10, 2015".

My suggestion:

  1. Consult that meta field when working out the byline (https://github.com/mozilla/readability/blob/main/Readability.js#L1783-L1789)
  2. ... but ignore it if it contains a URL (assume that a non-empty string that isn't a valid URL is probably the author's name)
  3. Prefer that to the messy DOM check, which often results in bylines containing extraneous data (often the publish date)
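
The three steps above could be sketched roughly as follows. This is only an illustration of the suggestion, not Readability's actual code: `isUrl`, `pickByline`, and their parameters are hypothetical names, and treating "parses with `new URL`" as the test for "is a URL" is an assumption.

```javascript
// Hypothetical helper: treat a value as a URL if the WHATWG URL parser
// accepts it as an absolute URL. A plain name such as "Lauren Collins"
// has no scheme, so parsing throws.
function isUrl(value) {
  try {
    new URL(value);
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical byline selection.
//   existingByline: byline from the metadata sources already consulted, if any
//   articleAuthor:  content of the article:author meta tag, if any
//   domByline:      byline scraped from the DOM, used only as a last resort
function pickByline(existingByline, articleAuthor, domByline) {
  if (existingByline) {
    return existingByline;
  }
  const candidate = (articleAuthor || "").trim();
  if (candidate && !isUrl(candidate)) {
    // Step 2: a non-empty, non-URL value is probably the author's name.
    return candidate;
  }
  // Step 3: fall back to the DOM only when the meta tag is unusable.
  return domByline || null;
}

// The Atlas Obscura case from this issue:
console.log(
  pickByline(null, "Laura June Topolsky", "Laura June Topolsky July 10, 2015")
); // prints "Laura June Topolsky"
```

When the tag holds a URL, as the Open Graph guidance intends, the sketch ignores it and the DOM-derived byline is still used.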

danielnixon commented Dec 29, 2024

Another example of that field containing a non-URL: https://www.newyorker.com/magazine/2024/12/16/president-emmanuel-macron-has-plunged-france-into-chaos

<meta property="article:author" content="Lauren Collins">

Also: https://www.msn.com/en-us/news/world/south-korean-president-apologizes-for-declaring-martial-law-as-he-faces-impeachment-vote/ar-AA1vpHO2

<meta property="article:author" content="Stella Kim">

danielnixon added a commit to danielnixon/readability that referenced this issue Jan 1, 2025
@danielnixon

PR: #942

@gijsk gijsk closed this as completed in #942 Jan 2, 2025
gijsk added a commit that referenced this issue Jan 2, 2025
* Handle article:author meta tag. Fixes #938

* Add newly found BBC byline, revert apparently unnecessary regex change.

---------

Co-authored-by: Gijs Kruitbosch <[email protected]>