The download of congressional records fails if the govinfo.gov server cannot deliver the requested ZIP file within about two minutes (i.e., four attempts, the initial request plus three retries, at 30-second intervals). Here's an example from the cr2.log file:
```text
INFO:root:Sending request on 2022-06-28 14:09
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:urllib3.util.retry:Incremented Retry for (url='/content/pkg/CREC-2018-01-10.zip'): Retry(total=2, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /content/pkg/CREC-2018-01-10.zip
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:urllib3.util.retry:Incremented Retry for (url='/content/pkg/CREC-2018-01-10.zip'): Retry(total=1, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /content/pkg/CREC-2018-01-10.zip
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:urllib3.util.retry:Incremented Retry for (url='/content/pkg/CREC-2018-01-10.zip'): Retry(total=0, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Retry: /content/pkg/CREC-2018-01-10.zip
DEBUG:urllib3.connectionpool:https://www.govinfo.gov:443 "GET /content/pkg/CREC-2018-01-10.zip HTTP/1.1" 503 None
DEBUG:root:Request headers received with code 503
WARNING:root:Unexpected condition, not continuing: 503
WARNING:root:Failed to download file https://www.govinfo.gov/content/pkg/CREC-2018-01-10.zip
WARNING:root:fdsysDL received report that download for 2018-01-10 did not complete.
INFO:root:No record on this day, not trying to extract
WARNING:root:[Errno 2] No such file or directory: 'output/2018/CREC-2018-01-10/mods.xml', skipping.
```
The corresponding ZIP file in the above example does exist. It simply takes the govinfo.gov server more than two minutes to create it on the fly. Once the ZIP file has been created, it stays available for download (for an unknown amount of time, but at least 24 hours). Running the download code again a couple of minutes after the first failure then retrieves the file without trouble.
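A minimal sketch of a more patient retry loop, assuming the only thing wrong is that the overall wait is too short. It keeps re-requesting while the server answers 503 and gives up only after a configurable total wait. `download_with_patience` and the `fetch` callable (one GET returning the HTTP status code) are hypothetical names for illustration, not part of the existing downloader; the 600-second default is likewise an assumption.

```python
import time
from typing import Callable


def download_with_patience(
    fetch: Callable[[str], int],
    url: str,
    max_wait: float = 600.0,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-request `url` while the server answers 503, for up to `max_wait` seconds.

    `fetch` is a hypothetical stand-in for one GET request that returns the
    HTTP status code. `sleep` is injectable so the loop can be tested offline.
    """
    waited = 0.0
    while True:
        status = fetch(url)
        if status == 200:
            return True
        if status != 503:
            # Anything other than "not ready yet" is treated as a real failure.
            return False
        if waited + interval > max_wait:
            # Waiting another interval would exceed the overall budget.
            return False
        sleep(interval)
        waited += interval
```

With a ten-minute budget, a ZIP file that needs three or four minutes to build would be fetched on a later attempt instead of being reported as missing.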
One possible solution is a two-pass approach: first send a request for every congressional record in question so that the server starts creating the ZIP files, then come around again to collect them, in the hope that the server has had enough time to create them.
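The two-pass idea can be sketched as follows, assuming that a single request is enough to make the server start building a ZIP file. `two_pass_download`, the `fetch` callable, and the five-minute `pause` are hypothetical choices for illustration; a real implementation would also save the response body when the status is 200.

```python
import time
from typing import Callable, Dict, Iterable, List


def two_pass_download(
    urls: Iterable[str],
    fetch: Callable[[str], int],
    pause: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Request every URL once, wait, then re-request only those that returned 503."""
    results: Dict[str, int] = {}
    pending: List[str] = []
    # Pass 1: touching each URL should prompt the server to start building the ZIP.
    for url in urls:
        status = fetch(url)
        results[url] = status
        if status == 503:
            pending.append(url)
    if pending:
        # Give the server time to finish generating the archives.
        sleep(pause)
        # Pass 2: collect only the archives that were not ready the first time.
        for url in pending:
            results[url] = fetch(url)
    return results
```

Because all archives are requested up front, the server can build them while the client waits, rather than one date at a time.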
Moreover, I noticed that the code runs through every date within a given range to check whether files are available, which seems a waste of resources. I suggest using the govinfo API (https://api.govinfo.gov/docs/) to first check which congressional records are actually available within a given date range.
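A rough sketch of that check, assuming the API's "published" service at `/published/{start}/{end}` accepts `collection=CREC`, `offsetMark`, `pageSize`, and `api_key` query parameters and returns JSON with a `packages` list (each entry carrying a `packageId`) and a `nextPage` link. These endpoint and field names should be verified against https://api.govinfo.gov/docs/. The helper names are hypothetical, and `fetch_json` is injected so the paging logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Optional

API_ROOT = "https://api.govinfo.gov"


def published_url(start: str, end: str, api_key: str, page_size: int = 100) -> str:
    # Assumed layout of the "published" service; check the API docs before relying on it.
    query = urllib.parse.urlencode(
        {"collection": "CREC", "offsetMark": "*", "pageSize": page_size, "api_key": api_key},
        safe="*",
    )
    return f"{API_ROOT}/published/{start}/{end}?{query}"


def with_key(url: str, api_key: str) -> str:
    # Append the key if a nextPage link comes back without it (an assumption).
    if "api_key=" in url:
        return url
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}api_key={api_key}"


def list_crec_packages(url: str, api_key: str, fetch_json: Callable[[str], dict]) -> List[str]:
    """Follow nextPage links and collect the packageId of every returned package."""
    package_ids: List[str] = []
    next_url: Optional[str] = url
    while next_url:
        page = fetch_json(next_url)
        package_ids.extend(p["packageId"] for p in page.get("packages", []))
        next_page = page.get("nextPage")
        next_url = with_key(next_page, api_key) if next_page else None
    return package_ids


def crec_dates(package_ids: List[str]) -> List[str]:
    # "CREC-2018-01-10" -> "2018-01-10"
    return [pid[len("CREC-"):] for pid in package_ids if pid.startswith("CREC-")]


def fetch_json(url: str) -> dict:
    # Real network call; only used when running against the live API.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

The downloader would then request ZIP files only for the dates returned by `crec_dates`, instead of probing every calendar day.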
Hi @blaschke --- thanks, belatedly, for your report.
I wasn't in a position to handle anything other than critical bugs when you created this issue, although I should have responded anyway.
If you are still interested, you are welcome to send a MR that implements the changes you suggest. I simply had not noticed how badly this repo needed freshening up, and I am attending to that now. While I may not have the overhead to make this improvement myself, I have spruced things up here so that it should be easier to contribute fixes.