Thanks for such a useful tool! Amazing and important work.
Context: I want to download a very large collection that has 439,642 items. I'd like to do a backup, but do it in phases over multiple days/weeks by running the tool for a few hours each day, picking up where I left off.
Each time I start the ia tool in the download directory on my drive, it properly detects the items it has already downloaded, but it can take quite some time to go through all of those items before it finally 'catches up' to the first un-downloaded file in the collection. It is going to get very painful to restart a download from, say, item 200,000 when the tool has to scan through the first 200k items just to see that they're already present locally.
I notice the tool outputs a line for each download of the form
<file> (n/m): dddd - success
If the tool took its calculated m and allowed the user to define a page size p (or a number of pages P), dividing m by p (or P) accordingly, then the user could abort the tool (say, to stop transfers for the night), restart it the next day, and specify a page number to resume at. The tool could then quickly 'seek' past the large number of items that have already been downloaded, without having to scan from item 1 on every run.
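As a rough sketch of the proposed page math, something like the following is already possible by caching the identifier list from the real `ia search --itemlist` command once, then slicing it per run. The `PAGE`/`PAGESIZE` variables below stand in for the proposed options (they are not existing `ia` flags), and a dummy identifier list replaces the real collection so the snippet runs anywhere:

```shell
# With the real tool, the cached list would come from a single slow scan:
#   ia search "collection:NAME" --itemlist > itemlist.txt
# Here we fake it with the item count from this issue.
seq -f "item%06g" 1 439642 > itemlist.txt

PAGESIZE=1000
PAGE=3                                   # 1-based page to resume at

START=$(( (PAGE - 1) * PAGESIZE + 1 ))   # first item on this page (2001)
END=$(( PAGE * PAGESIZE ))               # last item on this page (3000)

mkdir -p "page_${PAGE}"
sed -n "${START},${END}p" itemlist.txt > "page_${PAGE}/ids.txt"
wc -l < "page_${PAGE}/ids.txt"           # 1000 identifiers for this run
# Each id would then be fetched with something like:
#   ia download "$id" --destdir "page_${PAGE}"
```

Restarting at page 3 this way skips the first 2,000 items entirely instead of rescanning them.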
(Follow-up: some filesystems may also have trouble storing so many items in a single download dir; the collection itself doesn't appear to have any folder structure, so this 'page' feature could also store downloads in automatically named subdirs, e.g. 'page_n_of_m', so that 400k files don't all end up in one download dir.)
(Follow-up #2: paginating a collection download this way would also let many people collaborate, each downloading a portion of a collection and then consolidating their downloads into the entire collection later; e.g. persons A, B, and C agree to split a collection into pages of 1,000 items: A downloads page 1, B downloads page 2, C downloads page 3, and so on.)
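The arithmetic for that split is simple; the sketch below uses the item count from this issue, with A/B/C as the hypothetical collaborators and a round-robin page assignment (no `ia` calls needed for the math itself):

```shell
M=439642                        # total items in the collection
P=1000                          # agreed page size
PAGES=$(( (M + P - 1) / P ))    # ceil(M / P) = 440 pages in total
echo "total pages: $PAGES"

# Round-robin assignment: A takes page 1, B page 2, C page 3, A page 4, ...
page=5
owner=$(printf 'A B C' | cut -d' ' -f $(( (page - 1) % 3 + 1 )))
echo "page $page -> person $owner"    # page 5 -> person B
```

Once everyone's `page_*` directories are merged, the result is the whole collection with no overlap between downloaders.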
Russtopia changed the title to "Consider allowing user to define 'pages' for very large collections, and an option to resume downloads at page w/o scanning from start each run" on Aug 20, 2023.