I am attempting to index a WARC from Archive-It using ipwb from the current master branch [abaa35a](https://github.com/oduwsdl/ipwb/commit/abaa35ab13902dd06948a3e8d7b466ee89bf780f) and am receiving a timeout error. The WARC is about 131 MB and contains 3600 records, as reported by the indexer.
At record 3375, the indexing output stalls and eventually says:
IPFS failed to add, retrying attempt 1/59-ANNUAL-KBAWJW-20110217001046-00000-crawling113.us.archive.org-6682.warc: 3375/3600
(<class 'requests.exceptions.ConnectionError'>, ConnectionError(ReadTimeoutError("HTTPConnectionPool(host='localhost', port=5001): Read timed out.")), <traceback object at 0x1059252c0>)
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ipwb/indexer.py", line 66, in push_to_ipfs
http_header_ipfs_hash = push_bytes_to_ipfs(hstr)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ipwb/indexer.py", line 367, in push_bytes_to_ipfs
res = ipfs_client().add_bytes(bytes_in) # bytes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ipfshttpclient/utils.py", line 195, in wrapper
res = cmd(*args, **kwargs) # type: ty.Dict[str, T]
^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ipfshttpclient/client/base.py", line 229, in wrapper2
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ipfshttpclient/client/__init__.py", line 264, in add_bytes
return self._client.request('/add', decoder='json',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ipfshttpclient/http_common.py", line 594, in request
return stream_decode_full(closables, res,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ipfshttpclient/http_common.py", line 189, in stream_decode_full
result = list(response_iter) # type: ty.List[T_co]
^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/ipfshttpclient/http_common.py", line 131, in __next__
data = next(self._response_iter) # type: bytes
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/requests/models.py", line 822, in generate
raise ConnectionError(e)
Kubo/ipfs 0.28.0
Pressing CTRL-C on the IPFS daemon does not terminate it; the process stays open. Killing the daemon, restarting it, and rerunning the indexer allows the indexing to complete. I am wondering whether the indexer is overloading the daemon by submitting a certain quantity of records before the daemon has responded. If so, the second run may succeed because the records added during the first run do not need to be re-added.
Note that this effect does not seem to occur on WARCs that are slightly smaller (like this one (125 MB) and this one (108 MB) from the same crawl).
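If the overload hypothesis above is right, one possible mitigation is to pause between retries so the daemon has time to drain its queue. The following is only a sketch of that idea, not ipwb's actual retry logic: `add_with_backoff`, its parameters, and the `push` callable are hypothetical names. It catches `OSError` by default on the assumption that the failure surfaces as `requests.exceptions.ConnectionError`, which derives from `OSError`.

```python
import time


def add_with_backoff(push, payload, attempts=5, base_delay=0.5,
                     retry_on=(OSError,), sleep=time.sleep):
    """Call push(payload), retrying with exponential backoff.

    Hypothetical helper: `push` stands in for whatever function sends
    bytes to the IPFS daemon. requests.exceptions.ConnectionError is a
    subclass of OSError, so the default `retry_on` would cover the
    timeout shown in the traceback.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return push(payload)
        except retry_on as err:
            last_error = err
            # Wait longer after each failure, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

With the defaults, the waits between attempts are 0.5 s, 1 s, 2 s, and 4 s. Whether this would have avoided the stall at record 3375 is untested.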
I attempted to reproduce this on a machine that had ipfs 0.18 installed, and the issue did not occur: the large WARC was indexed completely.
EDIT: I then updated my version of kubo to the latest, 0.29.0, and the error still did not occur. Next, I'll try reproducing this with ipfs 0.28.0.
EDIT2: I attempted to replicate this issue with kubo 0.28.0 and was unable to do so. I am closing this ticket and will re-open it if the issue shows up again.