RSS feed is not crawled when Cloudflare bot detection is in place #3037

jassi0001 · 2025-01-09T08:23:27Z

IMPORTANT

Read and tick the following checkbox after you have created the issue or place an x inside the brackets ;)

I have read the CONTRIBUTING.md and followed the provided tips
I accept that the issue will be closed without comment if I do not check here
I accept that the issue will be closed without comment if I do not fill out all items in the issue template.

Explain the Problem

Cannot add the feed to the News app. Cloudflare bot detection is in place.

Steps to Reproduce

Explain what you did to encounter the issue

Adding the feed https://www.sparbote.de/rss returns HTTP error 403 with a "Just a moment..." page from Cloudflare's bot detection.
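A minimal sketch for reproducing this check outside Nextcloud, assuming Python 3.9+ and only the standard library. The helper names are hypothetical, and treating "HTTP 403 plus the Just a moment... text" as the signature of Cloudflare's challenge is an assumption based on the response quoted above, not a documented Cloudflare contract.

```python
import urllib.error
import urllib.request

# Assumption: the challenge page is recognisable by HTTP 403 plus the
# "Just a moment..." text quoted in this issue; Cloudflare may change this.
CHALLENGE_MARKER = "Just a moment..."


def looks_like_cloudflare_challenge(status: int, body: str) -> bool:
    """Return True if a response resembles the bot-detection page from the report."""
    return status == 403 and CHALLENGE_MARKER in body


def probe_feed(url: str, user_agent: str, timeout: float = 10.0) -> tuple[int, bool]:
    """Fetch url with the given User-Agent and return (status, challenged)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the status and body.
        status = err.code
        body = err.read().decode("utf-8", errors="replace")
    return status, looks_like_cloudflare_challenge(status, body)
```

For example, calling probe_feed("https://www.sparbote.de/rss", "NextCloud-News/25.1.2") sends a request with the User-Agent reported for the News app; the result depends on the site's Cloudflare configuration at the time, and a real News instance may send other headers as well.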

System Information

News app version: 25.1
Nextcloud version: 30.0.2
Cron type: (system cron/python updater/...)
PHP version: 8.3
Database and version: MariaDB
Browser and version: Edge
OS and version: W11 24H2

Contents of nextcloud/data/nextcloud.log

Paste output here

Contents

The problem seems to be the user agent: Cloudflare cannot identify the request as coming from something other than a bot. The problem is also described here: https://www.zenrows.com/blog/curl-bypass-cloudflare#bypass-cloudflare-in-curl. This follows up on my post in issue #2966.

SMillerDev · 2025-01-09T08:28:02Z

This is not a bug, there is nothing News can do to force websites to accept the requests.

jassi0001 · 2025-01-09T08:33:52Z

OK, thanks. I understand, but as described in the ZenRows article, the crawler has to look like a human, and there seems to be a way to do that with the right header information.

Here is the relevant thread from the Cloudflare community:
https://community.cloudflare.com/t/custom-bot-getting-403-from-cloudflare/342129

The linked document leads to this form:
https://docs.google.com/forms/d/e/1FAIpQLSdqYNuULEypMnp4i5pROSc-uP6x65Xub9svD27mb8JChA_-XA/viewform

Can you tell me which User-Agent header the News crawler uses? Then I can try to fill in Cloudflare's form to get it whitelisted. I think it's "userAgent":"CloudNews/1776 CFNetwork/1568.300.101 Darwin/24.2.0", right?

wofferl · 2025-01-09T09:03:01Z

The News app uses NextCloud-News/VERSION (e.g. NextCloud-News/25.1.2) as its user agent.

But as you can see at the bottom of the Cloudflare form, you need some kind of verification of where the crawler comes from, such as an IP list or reverse DNS.
Since News is not a single service running from known addresses (every Nextcloud instance fetches feeds from its own server), it will be hard to get the News app whitelisted.
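The thread does not say how Cloudflare checks the reverse DNS entries submitted through the form. As an illustration only, this sketches the generic forward-confirmed reverse DNS (FCrDNS) check often used to verify crawlers: the IP's PTR name must lie under an allowed domain and must resolve back to the same IP. The function name is hypothetical, and socket.gethostbyname_ex limits the forward lookup to IPv4.

```python
import socket
from typing import Callable, Iterable

# Both resolvers return (hostname, aliases, addresses), like
# socket.gethostbyaddr and socket.gethostbyname_ex.
Resolver = Callable[[str], tuple[str, list[str], list[str]]]


def forward_confirmed_rdns(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Return True if ip reverse-resolves to an allowed domain that resolves back to ip."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False  # no PTR record: the client cannot be verified this way
    hostname = hostname.rstrip(".").lower()
    suffixes = [s.lower().strip(".") for s in allowed_suffixes]
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False  # PTR name is outside the crawler operator's domain
    try:
        _name, _aliases, forward_addrs = forward(hostname)
    except OSError:
        return False
    # Forward confirmation: the PTR name must map back to the same address.
    return ip in forward_addrs
```

A centrally run crawler can pass this because all its addresses reverse-resolve under one domain. A self-hosted News instance fetches from its owner's server, whose PTR name (if any) belongs to that owner, so there is no shared domain to allowlist.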

Here is another good summary of the problem:
https://openrss.org/blog/using-cloudflare-on-your-website-could-be-blocking-rss-users

I think users should report the problem to the news providers and Cloudflare to get this right in the future.

SMillerDev · 2025-01-09T09:43:53Z

Making the crawler look like a human might trick Cloudflare temporarily, but that is not a solution to the problem that the RSS feed can't be fetched by a bot.

jassi0001 added the bug label Jan 9, 2025

SMillerDev closed this as not planned Jan 9, 2025
