Conserve's current format puts blocks into subdirectories with a 3-hex-digit name, taken from the first 12 bits of the hash. So there are up to 1<<12, or 4096, of them. This introduces a blocking mkdir ahead of writing each block file.
The point of this is to reduce the size of any single directory, although that is probably less of a concern on most local filesystems than in years past. It may actually help with rclone/Box, if the client regularly reads whole directories. It may still be a good idea for VFAT USB drives.
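A minimal Rust sketch of the current layout, for illustration only. This is not Conserve's actual `BlockDir` code: the function names are hypothetical, and it assumes the block hash is available as a lowercase hex string and that each block file is named by its full hex hash.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hex digits used for the subdirectory name: 3 hex digits = 12 bits,
/// so there are at most 16^3 = 1 << 12 = 4096 subdirectories.
const PREFIX_HEX_DIGITS: usize = 3;

/// Path of a block file relative to the block directory, e.g.
/// "abc/abcdef0123" for a hash whose hex form starts with "abc".
/// (Assumes the file is named by the full hex hash.)
fn block_relpath(hex_hash: &str) -> PathBuf {
    let prefix = &hex_hash[..PREFIX_HEX_DIGITS];
    Path::new(prefix).join(hex_hash)
}

/// Writes a block, creating its subdirectory first. The `create_dir_all`
/// call is the extra blocking filesystem operation paid on every write.
fn write_block(block_dir: &Path, hex_hash: &str, data: &[u8]) -> io::Result<()> {
    let path = block_dir.join(block_relpath(hex_hash));
    // parent() is Some because block_relpath always has two components.
    fs::create_dir_all(path.parent().unwrap())?;
    fs::write(path, data)
}
```

Every call to `write_block` here pays for a `create_dir_all` before the `fs::write`, even when the subdirectory already exists.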
It's probably a loss on scalable local filesystems? In particular, walking the list of blocks requires reading up to 4096 directories.
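To make the listing cost concrete, here is a hedged sketch of walking the blocks in this layout (the function name is hypothetical, not Conserve's API). It counts directory reads: one for the top level plus one per prefix subdirectory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Lists all block names by reading the top-level block directory and then
/// every prefix subdirectory inside it. Returns the sorted names and the
/// number of directories read (1 + number of prefix subdirectories, so up
/// to 4097 with 3-hex-digit prefixes).
fn list_blocks(block_dir: &Path) -> io::Result<(Vec<String>, usize)> {
    let mut names = Vec::new();
    let mut dirs_read = 1; // the top-level directory
    for entry in fs::read_dir(block_dir)? {
        let entry = entry?;
        if !entry.file_type()?.is_dir() {
            continue; // ignore stray files at the top level
        }
        dirs_read += 1;
        for block in fs::read_dir(entry.path())? {
            let block = block?;
            if let Some(name) = block.file_name().to_str() {
                names.push(name.to_owned());
            }
        }
    }
    names.sort();
    Ok((names, dirs_read))
}
```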
There are several options, listed in order of priority:

1. Remember which subdirectories are known to exist (because we already wrote or saw a block in them), so there's no need to create them again.
2. In addition, at the start of a backup, read the block directory to see which prefixes are present and remember them. This has the added benefit of quickly answering whether a given hash can possibly be present.
3. Make the prefix length tunable so that we can at least experiment with different settings, where 0 means no subdirectories. (It should be stored in some archive metadata. It may not be worth allowing this to be changed once the archive exists.)
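The three options could fit together roughly like this. This is a sketch under assumptions, not a proposed implementation: `BlockDirWriter`, `prefix_len`, and the method names are hypothetical, and hashes are assumed to be lowercase hex strings at least `prefix_len` characters long.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical block-directory writer illustrating the three options:
/// 1. remember prefixes already known to exist, so mkdir is skipped;
/// 2. prime that set from one listing of the block directory at startup;
/// 3. a tunable prefix length, where 0 means no subdirectories.
struct BlockDirWriter {
    root: PathBuf,
    prefix_len: usize,
    known_prefixes: HashSet<String>,
}

impl BlockDirWriter {
    /// Opens the block directory and reads its top level once,
    /// remembering every existing prefix subdirectory.
    fn open(root: &Path, prefix_len: usize) -> io::Result<Self> {
        fs::create_dir_all(root)?;
        let mut known_prefixes = HashSet::new();
        if prefix_len > 0 {
            for entry in fs::read_dir(root)? {
                let entry = entry?;
                if entry.file_type()?.is_dir() {
                    if let Some(name) = entry.file_name().to_str() {
                        known_prefixes.insert(name.to_owned());
                    }
                }
            }
        }
        Ok(BlockDirWriter { root: root.to_owned(), prefix_len, known_prefixes })
    }

    fn prefix<'a>(&self, hex_hash: &'a str) -> &'a str {
        &hex_hash[..self.prefix_len]
    }

    /// Cheap pre-check: if the prefix subdirectory is not known, the block
    /// cannot be present. With prefix_len == 0 this gives no information.
    fn may_contain(&self, hex_hash: &str) -> bool {
        self.prefix_len == 0 || self.known_prefixes.contains(self.prefix(hex_hash))
    }

    /// Writes a block, creating a subdirectory only for prefixes not yet
    /// known. Returns whether a mkdir was issued, to make the saving visible.
    fn write_block(&mut self, hex_hash: &str, data: &[u8]) -> io::Result<bool> {
        let mut made_dir = false;
        let dir = if self.prefix_len == 0 {
            self.root.clone()
        } else {
            let prefix = self.prefix(hex_hash).to_owned();
            let dir = self.root.join(&prefix);
            if !self.known_prefixes.contains(&prefix) {
                fs::create_dir_all(&dir)?;
                self.known_prefixes.insert(prefix);
                made_dir = true;
            }
            dir
        };
        fs::write(dir.join(hex_hash), data)?;
        Ok(made_dir)
    }
}
```

Because `create_dir_all` succeeds when the directory already exists, a prefix missing from the set (for example, one created by another process after startup) costs only a redundant mkdir rather than an error.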
I mention the first two first because they are direct efficiency wins that don't require a format change or guessing what's likely to be optimal in any situation, or making the user guess.
#179 seems pretty interesting, but I don't have time to join the conversation.
After #173, I wanted to focus on performance (I'm into bug hunting) and encryption.
I'll probably respond during the week (I work weekends).
One more thought from #177, cc @road2react and @WolverinDEV.