I would like to propose optimizing block management for uncompressed blocks in DwarFS. Currently, uncompressed blocks are treated the same way as compressed blocks: they are loaded into memory and read from disk sequentially, starting at the beginning of the block. This is inefficient, especially when uncompressed blocks are accessed frequently. If we could access a block at random, without reading everything that precedes the segment we need, or even avoid loading the block into memory at all, we could potentially save a significant amount of private memory.
mmap() could potentially enable efficient random access to uncompressed blocks and possibly eliminate the need to manually load them into memory entirely.
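To illustrate the idea, here is a minimal sketch (not DwarFS code) of reading one segment of an uncompressed block through mmap() on a POSIX system. The function name `read_segment` and its parameters are hypothetical, and the sketch assumes the block's payload is stored verbatim at a known byte offset in the image file. It copies the segment out for simplicity; a real reader could hand out a pointer into the mapping and avoid even that copy.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of DwarFS): copy `len` bytes that start
// `seg_offset` bytes into an uncompressed block whose payload begins at
// `block_offset` in the image file. Only the pages covering the segment
// are mapped; nothing before the segment is read explicitly.
std::vector<std::uint8_t> read_segment(const std::string& path,
                                       std::size_t block_offset,
                                       std::size_t seg_offset,
                                       std::size_t len) {
  if (len == 0) return {};

  int fd = ::open(path.c_str(), O_RDONLY);
  if (fd < 0) throw std::runtime_error("open failed");

  struct stat st;
  const std::size_t start = block_offset + seg_offset;
  if (::fstat(fd, &st) != 0 ||
      start + len > static_cast<std::size_t>(st.st_size)) {
    ::close(fd);
    throw std::runtime_error("segment out of range");
  }

  // mmap() offsets must be page-aligned: round down and keep the slack.
  const std::size_t page = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
  const std::size_t aligned = start - (start % page);
  const std::size_t slack = start - aligned;

  void* map = ::mmap(nullptr, slack + len, PROT_READ, MAP_PRIVATE, fd,
                     static_cast<off_t>(aligned));
  ::close(fd);  // the mapping remains valid after the descriptor is closed
  if (map == MAP_FAILED) throw std::runtime_error("mmap failed");

  const auto* p = static_cast<const std::uint8_t*>(map) + slack;
  std::vector<std::uint8_t> out(p, p + len);
  ::munmap(map, slack + len);
  return out;
}
```

Because the mapping is read-only and file-backed, the touched pages live in the page cache and can be reclaimed by the kernel rather than counting as private memory of the process.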
This feature would also benefit mkdwarfs. If uncompressed blocks do not occupy private memory, they would not need to count toward the --max-lookback-blocks (-B) quota. This would effectively enlarge the deduplication lookback window without increasing the memory footprint. The idea is orthogonal to the proposal in #138, and the two can be combined to further optimize deduplication. Matches in uncompressed blocks can still be extended with byte granularity, since mmap() allows for cheap random access.
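The quota accounting described above could look roughly like the following toy model. It is purely illustrative and does not reflect how mkdwarfs actually tracks lookback blocks: `LookbackWindow`, `LookbackBlock`, and the `mapped` flag are all hypothetical names. Only blocks held in private memory count toward the limit; blocks that can be read through a mapping stay in the window for free.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Hypothetical model, not mkdwarfs code.
struct LookbackBlock {
  std::size_t id;
  bool mapped;  // true if readable via mmap(), so no private copy is held
};

class LookbackWindow {
 public:
  explicit LookbackWindow(std::size_t max_private)
      : max_private_(max_private) {}

  void add(std::size_t id, bool mapped) {
    blocks_.push_back({id, mapped});
    if (!mapped) ++private_count_;
    // Evict the oldest private block until the quota is met again.
    // Mapped blocks are never evicted in this sketch; a real
    // implementation would likely cap them as well.
    while (private_count_ > max_private_) {
      auto it = std::find_if(
          blocks_.begin(), blocks_.end(),
          [](const LookbackBlock& b) { return !b.mapped; });
      blocks_.erase(it);
      --private_count_;
    }
  }

  std::size_t size() const { return blocks_.size(); }
  std::size_t private_count() const { return private_count_; }

 private:
  std::size_t max_private_;
  std::size_t private_count_ = 0;
  std::deque<LookbackBlock> blocks_;
};
```

With a quota of two private blocks, adding three private and two mapped blocks leaves four blocks in the window, so the effective lookback is larger than the quota alone would allow.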
I hope this proposal makes sense and I look forward to hearing your thoughts on its feasibility.
This is a great observation and for the first case, it's trivial to implement. I've got it working in a branch and will push the code once I've got a proper internet connection.