Thanks for such a useful tool! Amazing and important work.
Context: I want to download a very large collection that has 439,642 items. I'd like to do a backup, but do it in phases over multiple days/weeks by running the tool for a few hours each day, picking up where I left off.
Each time I start the ia tool in the download directory on my drive, it properly detects the items it has already downloaded, but it can take quite some time to go through all of those items before it finally 'catches up' to the first un-downloaded file in the collection. It is going to get very painful to restart a download from, say, item 200,000 when the tool has to scan through the first 200k items just to see that they're already present locally.
I notice the tool outputs a line for each download of the form
<file> (n/m): dddd - success
If the tool took its calculated m and allowed the user to define a page size p (or a number of pages P), dividing m by p (or P) accordingly, then the user could abort the tool (say, to stop transfers for the night), restart it the next day, and specify a page number to resume at. The tool could then quickly 'seek' past the large number of items that have already been downloaded, without having to scan from item 1 on every run.
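As a rough sketch of the proposed page math, something like the following is already possible by caching the identifier list from the real `ia search --itemlist` command once, then slicing it per run. The `PAGE`/`PAGESIZE` variables below stand in for the proposed options (they are not existing `ia` flags), and a dummy identifier list replaces the real collection so the snippet runs anywhere:

```shell
# With the real tool, the cached list would come from a single slow scan:
#   ia search "collection:NAME" --itemlist > itemlist.txt
# Here we fake it with the item count from this issue.
seq -f "item%06g" 1 439642 > itemlist.txt

PAGESIZE=1000
PAGE=3                                   # 1-based page to resume at

START=$(( (PAGE - 1) * PAGESIZE + 1 ))   # first item on this page (2001)
END=$(( PAGE * PAGESIZE ))               # last item on this page (3000)

mkdir -p "page_${PAGE}"
sed -n "${START},${END}p" itemlist.txt > "page_${PAGE}/ids.txt"
wc -l < "page_${PAGE}/ids.txt"           # 1000 identifiers for this run
# Each id would then be fetched with something like:
#   ia download "$id" --destdir "page_${PAGE}"
```

Restarting at page 3 this way skips the first 2,000 items entirely instead of rescanning them.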
(Follow-up: some filesystems may also have trouble storing so many items in a single download dir; the collection itself doesn't appear to have any folder structure, so this 'page' feature could also store downloads in automatically named subdirs, e.g. 'page_n_of_m', so that 400k files don't all end up in one download dir.)
(Follow-up #2: paginating a collection download this way would also let many people collaborate, each downloading a portion of a collection and then consolidating their downloads into the entire collection later; e.g. persons A, B, and C agree to split a collection into pages of 1,000 items: A downloads page 1, B downloads page 2, C downloads page 3, and so on.)
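The arithmetic for that split is simple; the sketch below uses the item count from this issue, with A/B/C as the hypothetical collaborators and a round-robin page assignment (no `ia` calls needed for the math itself):

```shell
M=439642                        # total items in the collection
P=1000                          # agreed page size
PAGES=$(( (M + P - 1) / P ))    # ceil(M / P) = 440 pages in total
echo "total pages: $PAGES"

# Round-robin assignment: A takes page 1, B page 2, C page 3, A page 4, ...
page=5
owner=$(printf 'A B C' | cut -d' ' -f $(( (page - 1) % 3 + 1 )))
echo "page $page -> person $owner"    # page 5 -> person B
```

Once everyone's `page_*` directories are merged, the result is the whole collection with no overlap between downloaders.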
Russtopia changed the title to "Consider allowing user to define 'pages' for very large collections, and an option to resume downloads at page w/o scanning from start each run" on Aug 20, 2023.