Network usage #49

Open
Wilaz opened this issue Sep 14, 2024 · 13 comments

Comments

@Wilaz

Wilaz commented Sep 14, 2024

I'm not sure what causes this, but GChan has been using 80+ Mbps of bandwidth while sitting idle doing nothing.
The logs are attached:
GChan.log

@Issung
Owner

Issung commented Oct 17, 2024

Hi @Wilaz, the GChan program has recently been plagued with errors. The high bandwidth use you are seeing comes from requests being sent far too fast and getting repeatedly blocked. These errors come from new rate limiting on the 4chan servers that kicks in when too many requests are sent. I myself was IP banned from 4chan, so even if I had had time to fix the issues I would not have been able to test them. I've finally been unbanned and I now have time to update the program to try and play nicely with these new restrictions.

I'm letting you all know I have this work in progress, which you can see on GitHub or follow in the Discord in #github-alerts. I write this program in my spare time, which is very sparse now compared to when I started the program. Any donations are very much appreciated, no matter the size, as gratitude for the work done and motivation for the work to come. Thank you for listening and sticking around.

Donation options:
https://github.com/sponsors/Issung
https://www.paypal.com/paypalme/Issung
https://ko-fi.com/issung

@Wilaz
Author

Wilaz commented Oct 17, 2024

Do you accept Monero (XMR)?

@Issung
Owner

Issung commented Oct 25, 2024

@Wilaz Yes, thank you :)
474nPEA2498X3Lad1LFfVPJBEhs1USog77UK5AHUWed67CmHd8TCnjM2j9BvYUPT7ZTTfQJgHmsZY7uoxYrG7AKiFDXDVWS

@vlad-patras

Hi @Issung
This commit fixes the issues with bandwidth and thread HTML not being downloaded:
Fix 429 / Too many requests
Cookies or some other state on the web client triggers Cloudflare's DoS protection, so the change is to create a new instance for every request.

Also, unrelated to the issue, all URLs are now https. Setting that instead of http would avoid a redirect (which is handled internally by the web client).
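
The https point can be sketched as a small URL rewrite applied before a request is made. This is a Python illustration rather than GChan's own C# code; `to_https` is a hypothetical helper name and the example host is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Return url with an http scheme upgraded to https; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Requesting https directly avoids the extra round trip of a redirect.
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# Placeholder host, used only to show the rewrite.
print(to_https("http://example.com/vm/thread/1516424"))
```

Doing the rewrite once, when a URL is stored, would mean every later request already uses the final scheme.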

@Issung
Owner

Issung commented Oct 29, 2024

@vlad-patras Hi mate, thanks for the heads up.

A big branch is currently in development here to entirely overhaul the rate limiting to obey the 4chan suggestions, switch from WebClient to HttpClient and perform everything with async.

I will use Fiddler to investigate the difference in behaviour between WebClient and HttpClient, and I'll also verify what you're saying about http/https. The documentation says to use whichever you like, but maybe it's out of date.
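
A minimal sketch of the kind of request spacing and backoff that such a rate-limiting overhaul involves, written in Python for brevity rather than in GChan's C#. The class and function names are hypothetical, and the one-second interval and doubling backoff are assumptions for illustration, not values taken from the branch.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Spaces calls so that at least `min_interval` seconds separate them."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        # Earliest time the next call may proceed; None until the first call.
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            waited = self._next_allowed - now
            self._sleep(waited)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return waited


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based) after a 429 response."""
    return min(cap, base * (2 ** attempt))


limiter = MinIntervalLimiter(min_interval=1.0)  # assumed interval
limiter.wait()  # call before every request so all requests share one schedule
```

Sharing a single limiter across all threads being watched is what keeps the total request rate bounded, rather than each thread pacing itself independently.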

@Issung
Owner

Issung commented Oct 29, 2024

You're right about the HTTPS move, thank you. This will help :)
image

@vlad-patras

I just tried the branch, no issues so far.
It doesn't look like it loads the database from the previous version, but I just copied it over using the copy to clipboard feature.

@vlad-patras

vlad-patras commented Nov 21, 2024

After using the branch a bit more I seem to have found two issues. Both have to do with saving the threads and resuming later.

Issue 1: When starting GChan with saved threads, if there was a change to a thread, all assets will be downloaded again.
This seems to be because SeenAssetIds in Thread.cs is not loaded from the database (even though the field comment says it should be). The application will start with an empty set and upon scraping the thread it will see every result as a new asset.

Issue 2: If downloads are in queue for a thread and the application is closed, it will not resume download after opening (unless the thread has updates).
This seems to be caused by the new functionality that avoids scraping a thread that was not updated. When a thread has no changes (detected through the IfModifiedSince header), the 4chan server responds with 304 Not Modified. This is generally OK since it avoids unnecessary processing. However, when a thread never gets further updates (e.g. due to reaching the post limit), it is never scraped again, so the files that were still in the queue are lost.
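
Issue 1 can be illustrated with a small sketch of how a seen-ID check behaves when the set starts out empty. This is Python for illustration only, not the code in Thread.cs; `find_new_assets` and the sample IDs are hypothetical.

```python
from typing import Iterable, List, Set


def find_new_assets(scraped_ids: Iterable[int], seen_ids: Set[int]) -> List[int]:
    """Return the scraped IDs that are not yet in seen_ids, preserving order."""
    return [asset_id for asset_id in scraped_ids if asset_id not in seen_ids]


scraped = [101, 102, 103]   # IDs found when the thread is scraped
saved_in_db = {101, 102}    # IDs persisted by a previous run

# Seen set restored from storage: only the genuinely new asset is queued.
print(find_new_assets(scraped, saved_in_db))  # prints [103]

# Seen set left empty at startup: every asset looks new again.
print(find_new_assets(scraped, set()))        # prints [101, 102, 103]
```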

@vlad-patras

These changes should fix the issues above: Fix resuming download after app close and re-open

SeenAssetIds is initialized with the saved IDs from the DB, which I assume was intended given the field comment. This avoids re-downloads since there's already a check not to download seen IDs.

IfModifiedSince is disabled if the previously known file count is greater than the seen IDs count. This results in the thread being scraped again and the missing assets being queued.
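
The second rule can be sketched as a small decision function. This is a Python illustration of the described behaviour, not the committed C# change; the function and parameter names are hypothetical.

```python
from typing import Optional


def if_modified_since_value(last_modified: Optional[str],
                            known_file_count: int,
                            seen_id_count: int) -> Optional[str]:
    """Return the header value to send, or None to force a full scrape."""
    if last_modified is None:
        # Nothing recorded yet, so a conditional request is not possible.
        return None
    if known_file_count > seen_id_count:
        # Some known files were never seen or queued: skip the conditional
        # request so the thread is scraped again and they are picked up.
        return None
    return last_modified


stamp = "Wed, 27 Nov 2024 23:34:10 GMT"  # example value only
print(if_modified_since_value(stamp, known_file_count=12, seen_id_count=12))
print(if_modified_since_value(stamp, known_file_count=12, seen_id_count=9))
```

The first call returns the timestamp, so an unchanged thread can still be answered with 304 Not Modified. The second returns None, which forces a normal scrape.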

@Issung
Owner

Issung commented Nov 22, 2024

@vlad-patras I love how familiar you're getting with the code, that you can spot problems I've missed! This is the exact type of thing I was hoping to get out of releasing early beta versions. I'll get that fixed as soon as I get a moment :)

@Issung
Owner

Issung commented Nov 29, 2024

@vlad-patras New release with your fixes here: https://github.com/Issung/GChan/releases/tag/v6.3-beta :)

@vlad-patras

vlad-patras commented Nov 29, 2024

@Issung Thanks! It's great having a project still maintained to keep up with all the 4chan changes.

I got the latest beta changes and it looks to be working fine.

There is another small issue I encountered. When the subject can't be determined, even with the new rules, sanitization fails with a null reference. These threads are quite rare, but here is an example: vm/thread/1516424

2024-11-27 23:34:10.7624 [Error] System.NullReferenceException: Object reference not set to an instance of an object. ...\Thread.cs:126
System.NullReferenceException: Object reference not set to an instance of an object.
at GChan.Utils.RemoveCharactersFromString(String input, Char[] chars) in ....\Utils.cs:line 231

I just added a null check locally so it starts downloading. Maybe the first comment could be trimmed instead of ignored when it is too long. But the "No Subject" doesn't bother me.
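
A minimal sketch of that kind of null check, in Python for illustration rather than the C# in Utils.cs. The function names are hypothetical, and the set of characters removed is an assumption (characters that are invalid in Windows file names), not the list GChan actually uses.

```python
from typing import Iterable, Optional

NO_SUBJECT = "No Subject"


def remove_characters(text: Optional[str], chars: Iterable[str]) -> str:
    """Remove every character in chars from text; a missing text yields ''."""
    if text is None:
        # The guard: a missing subject no longer raises an error.
        return ""
    unwanted = set(chars)
    return "".join(ch for ch in text if ch not in unwanted)


def sanitize_subject(subject: Optional[str],
                     chars: Iterable[str] = '\\/:*?"<>|') -> str:
    """Return a cleaned subject, falling back to NO_SUBJECT when nothing is left."""
    cleaned = remove_characters(subject, chars).strip()
    return cleaned or NO_SUBJECT


print(sanitize_subject(None))       # prints No Subject
print(sanitize_subject("a/b: c?"))  # prints ab c
```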

@Issung
Owner

Issung commented Nov 29, 2024

Too easy, new release with the fix: https://github.com/Issung/GChan/releases/tag/v6.3.1-beta
