RSS feed is not crawled when Cloudflare bot detection is in place #3037
Comments
This is not a bug; there is nothing News can do to force websites to accept the requests.
OK, thanks. I understand, but as described in the ZenRows article, the crawler has to look like a human, and there seems to be a way to do that with the right header information. Here is the info from the Cloudflare community: The linked document leads to this form: Can you tell me which user agent header the News crawler is using? Then I can try to fill in the Cloudflare form to possibly get it whitelisted. I think it's "userAgent":"CloudNews/1776 CFNetwork/1568.300.101 Darwin/24.2.0", right?
The News app uses NextCloud-News/VERSION (e.g. NextCloud-News/25.1.2) as its user agent. But as you can see at the bottom of the Cloudflare form, you need some kind of verification of where the crawler comes from, such as an IP list or reverse DNS. Here is another good summary of the problem: I think users should report the problem to the news providers and Cloudflare to get this right in the future.
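For anyone who wants to check whether a particular feed is affected, here is a minimal Python sketch. It is not code from the News app. It sends a request with a user agent in the NextCloud-News/VERSION format described above and guesses whether Cloudflare blocked it. The function names, the default version string, and the detection heuristic (a `cf-mitigated: challenge` header, or a 403/503 status combined with `Server: cloudflare`) are assumptions made for illustration, and the heuristic can misclassify responses.

```python
import urllib.error
import urllib.request


def build_user_agent(version: str) -> str:
    # Format taken from the comment above; the version value is up to the caller.
    return f"NextCloud-News/{version}"


def looks_like_cloudflare_block(status: int, headers: dict) -> bool:
    # Heuristic only (an assumption, not an official detection method).
    lowered = {key.lower(): str(value).lower() for key, value in headers.items()}
    if lowered.get("cf-mitigated") == "challenge":
        return True
    return status in (403, 503) and lowered.get("server") == "cloudflare"


def check_feed(url: str, version: str = "25.1.2", timeout: float = 10.0) -> str:
    # Fetch the feed once and report whether the response looks like a block.
    request = urllib.request.Request(
        url, headers={"User-Agent": build_user_agent(version)}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            headers = dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # Error statuses (403, 503, ...) arrive as exceptions that carry headers.
        status = err.code
        headers = dict(err.headers.items())
    if looks_like_cloudflare_block(status, headers):
        return "probably blocked by Cloudflare"
    return f"HTTP {status}"
```

Calling `check_feed("https://example.org/feed.xml")` with a placeholder feed URL returns a short status string. A "probably blocked" result would support reporting the feed to the site owner, as suggested above.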
It might be able to temporarily trick Cloudflare if it does, but that's not a solution to the problem that the RSS feed can't be fetched by a bot.
IMPORTANT
Read and tick the following checkbox after you have created the issue or place an x inside the brackets ;)
Explain the Problem
Cannot add a feed to the News app. Cloudflare bot detection is in place.
Steps to Reproduce
Explain what you did to encounter the issue
System Information
Contents of nextcloud/data/nextcloud.log
Paste output here
Contents
The problem seems to be with the user agent, because Cloudflare cannot identify the request as coming from a non-bot. The problem is described here as well: https://www.zenrows.com/blog/curl-bypass-cloudflare#bypass-cloudflare-in-curl. This follows my post in issue #2966.