Failing to Download Requested ZIP Files #48

Open
blaschke opened this issue Jun 29, 2022 · 1 comment

@blaschke

The download of congressional records fails if the govinfo.gov server cannot deliver the requested ZIP file within two minutes (i.e., four attempts at 30-second intervals). Here's an example from the cr2.log file:

INFO:root:Sending request on 2022-06-28 14:09
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:urllib3.util.retry:Incremented Retry for (url='/content/pkg/CREC-2018-01-10.zip'): Retry(total=2, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /content/pkg/CREC-2018-01-10.zip
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:urllib3.util.retry:Incremented Retry for (url='/content/pkg/CREC-2018-01-10.zip'): Retry(total=1, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /content/pkg/CREC-2018-01-10.zip
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:urllib3.util.retry:Incremented Retry for (url='/content/pkg/CREC-2018-01-10.zip'): Retry(total=0, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /content/pkg/CREC-2018-01-10.zip
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:root:Request headers received with code 503
WARNING:root:Unexpected condition, not continuing: 503
WARNING:root:Failed to download file https://www.govinfo.gov/content/pkg/CREC-2018-01-10.zip
WARNING:root:fdsysDL received report that download for 2018-01-10 did not complete.
INFO:root:No record on this day, not trying to extract
WARNING:root:[Errno 2] No such file or directory: 'output/2018/CREC-2018-01-10/mods.xml', skipping.

The corresponding ZIP file in the above example does indeed exist. It simply takes more than two minutes for the govinfo.gov server to create it on the fly. Please note that once the ZIP file is created, it is available for download (for an unknown amount of time, though at least 24 hours). Running the download code again a couple of minutes after the first failure then gives you the file without trouble.

One possible fix is to first send a request for every congressional record in question, so that the server starts creating the ZIP files, and then make a second pass to collect them once the server has had enough time to build them.
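To make the idea concrete, here is a minimal sketch of such a multi-pass loop. It is not the project's existing download code: `collect_in_passes`, `http_fetch`, the wait time, and the number of rounds are all hypothetical names and values chosen for illustration. The only behavior taken from the log above is that a 503 means "not ready yet".

```python
import time
import urllib.error
import urllib.request


def http_fetch(url):
    """Hypothetical helper: return (status_code, body) for a single GET."""
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; report the status instead of failing.
        return err.code, b""


def collect_in_passes(urls, fetch=http_fetch, wait_seconds=120,
                      max_rounds=3, sleep=time.sleep):
    """Request every URL once, then revisit the ones that answered 503.

    The first pass doubles as the trigger that makes the server start
    building each ZIP. Returns (done, pending): the bodies that were
    downloaded, and the URLs that were still answering 503 at the end.
    """
    pending = list(urls)
    done = {}
    for round_no in range(max_rounds):
        still_pending = []
        for url in pending:
            status, body = fetch(url)
            if status == 200:
                done[url] = body
            elif status == 503:
                # Assumed to mean "ZIP not built yet": try again next round.
                still_pending.append(url)
            # Any other status (e.g. 404) is dropped and not retried.
        pending = still_pending
        if not pending:
            break
        if round_no < max_rounds - 1:
            sleep(wait_seconds)  # give the server time before the next pass
    return done, pending
```

Because all records are requested before any waiting happens, the delay is paid once per round rather than once per file.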

Moreover, I noticed that the code runs through every date within a given range to check whether files are available, which seems a waste of resources. I suggest using the govinfo API (https://api.govinfo.gov/docs/) to check first which congressional records are actually available within a given date range.
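A rough sketch of what that lookup could look like follows. The endpoint path (`/published/{start}/{end}`), the query parameters (`collection`, `offsetMark`, `pageSize`, `api_key`), and the response fields (`packages`, `packageId`, `nextPage`) reflect my reading of the govinfo API docs and should be treated as assumptions to verify there. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.govinfo.gov"


def published_url(start, end, api_key, page_size=100):
    """Build the (assumed) 'published' query for CREC between two ISO dates."""
    query = urllib.parse.urlencode({
        "collection": "CREC",
        "offsetMark": "*",
        "pageSize": page_size,
        "api_key": api_key,
    })
    return f"{API_ROOT}/published/{start}/{end}?{query}"


def http_get_json(url):
    """Hypothetical helper: GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def available_package_ids(start, end, api_key, get_json=http_get_json):
    """Return the package IDs (e.g. 'CREC-2018-01-10') the API lists."""
    url = published_url(start, end, api_key)
    ids = []
    while url:
        page = get_json(url)
        ids.extend(pkg["packageId"] for pkg in page.get("packages", []))
        url = page.get("nextPage")
        # Assumption: the nextPage link may omit the key, so re-attach it.
        if url and "api_key=" not in url:
            sep = "&" if "?" in url else "?"
            url = f"{url}{sep}api_key={api_key}"
    return ids
```

The downloader would then request only the dates that appear in the returned IDs instead of probing every day in the range.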

@nclarkjudd nclarkjudd added the bug label Feb 11, 2024
@nclarkjudd
Contributor

Hi @blaschke --- thanks, belatedly, for your report.

I wasn't in a position to handle anything other than critical bugs when you created this issue, although I should have responded anyway.

If you are still interested, you are welcome to send a MR that implements the changes you suggest. I simply had not noticed how badly this repo needed freshening up, and I am attending to that now. While I may not have the overhead to make this improvement myself, I have spruced things up here so that it should be easier to contribute fixes.
