v2: Metadata logging, custom attributes, inline files, and a major version bump #85
Conversation
@geky this is awesome! with inline files, it looks like lfs becomes viable for small fs-es. what about GC/wiping though?
I wasn't planning on it this release. Since we can add a wiping function in a minor release it's less of a priority. My goal is to get this set of changes out of the way before looking at the next steps.
@geky fair enough!
If you configured (in your code) that the max length of a name is (say) 256 bytes, but the file system you mount has just (say) 100 stored in the superblock, would that work? Or do they have to be equal? If they have to be equal, you would not be able to mount two different file systems in the same application.
Yep, this was the specific problem I was trying to solve. So now if you mount a filesystem with 100-byte filenames on a device that uses 256-byte filenames, littlefs will respect the 100-byte filenames and return LFS_ERR_NAMETOOLONG if you try to create a file that exceeds 100 bytes. The only downside is you need to be careful about what settings you use to format. If you format with 256-byte filenames on a PC and then try to mount on the 100-byte filename device, mount will fail with LFS_ERR_INVAL (and a debug message saying file_max is too big).
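As a hedged sketch of how an application might handle those errors (the wrapper function name is made up; the error codes are per littlefs v2):

```c
#include "lfs.h"

// try to create a file on an already-mounted filesystem; if the mounted
// filesystem was formatted with a smaller name_max than our local config,
// littlefs reports names over the on-disk limit with LFS_ERR_NAMETOOLONG
int create_file(lfs_t *lfs, const char *path) {
    lfs_file_t file;
    int err = lfs_file_open(lfs, &file, path, LFS_O_WRONLY | LFS_O_CREAT);
    if (err == LFS_ERR_NAMETOOLONG) {
        // name exceeds the name_max stored in the superblock
        return err;
    } else if (err) {
        return err;
    }
    return lfs_file_close(lfs, &file);
}
```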
The separation of data-structure vs entry type has been implicit for a while now, and even taken advantage of to simplify the traverse logic. Explicitly separating the data-struct and entry types allows us to introduce new data structures (inlined files).
Previously, commits could only come from RAM. This meant any entries had to be buffered in their entirety before they could be moved to a different directory pair. By adding parameters for specifying commits from existing entries stored on disk, we allow entries of any size to be moved between directory pairs at a fixed RAM cost.
This only required adding NULLs where commit statements were not fully initialized. Unfortunately we still need -Wno-missing-field-initializers because of a bug in GCC that persists on Travis. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60784 Found by apmorton
This is to help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worst, we only need to carry over the read-only operations from v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update.

Enabling the migration requires two steps:
1. Define LFS_MIGRATE
2. Call lfs_migrate (only available with the above macro)

Each macro multiplies the number of configurations that need to be tested, so I've been avoiding macro-controlled features while there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included the migration code in the standard build. We can't use the lfs_migrate function for link-time gc because of a dependency between the allocator and v1 data structures.

So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 read-only and building the metadata skeleton needed by v2:
1. For each directory, create a v2 directory.
2. Copy the v1 entries into the v2 directory, including the soft-tail entry.
3. Move the head block of the v2 directory into the unused metadata block in the v1 directory. This results in a v1 and a v2 directory sharing the same metadata pair.
4. Finally, create a new superblock in the unused metadata block of the v1 superblock.

Just like with normal metadata updates, the completion of the write to the second metadata block marks a successful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost or an error occurs.

Note there are several limitations with this solution:
1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem has many files. If the device was created with >~2x the expected storage, it should be fine.
2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possible to work around this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation.
3. Enabling the migration requires additional code size. Currently this looks like roughly 11%, at least on x86.

And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
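As a rough usage sketch of those two steps (assuming LFS_MIGRATE was defined when compiling littlefs and cfg describes the device; the wrapper function name is invented):

```c
#include "lfs.h"

// mount, and if that fails assume the device holds a v1 image and try to
// migrate it to v2 in place before mounting again
int mount_or_migrate(lfs_t *lfs, const struct lfs_config *cfg) {
    int err = lfs_mount(lfs, cfg);
    if (err) {
#ifdef LFS_MIGRATE
        err = lfs_migrate(lfs, cfg);
        if (err) {
            // migration failed, the original v1 filesystem is left untouched
            return err;
        }
        err = lfs_mount(lfs, cfg);
#endif
    }
    return err;
}
```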
I've got a PR in the ESP8266 for a LittleFS filesystem extension, and it's been running great with v1. I upgraded the PR to v2-alpha and just wrote a small MCVE (the only differences between v1 and v2 in the userland are in the cfg setup, as commented). Basically, it seems in v2 if I open a file in O_CREAT mode and write a small amount, then close it, the file is lost completely...
When compiled with GCC I can read the data in V1:
and in V2-alpha the file reopen fails
Have others seen similar behavior, or am I simply doing something wrong here? Thx
Now with graphs! Images are stored on the branch gh-images in an effort to avoid binary bloat in the git history. Also spruced up SPEC.md and README.md and ran a spellchecker over the documentation. Favorite typo so far was dependendent, which is, in fact, not a word.
Also fixed issue where migration would not handle large dirs due to v1 iteration changing the pair of the directory.
Hi @apmorton, sorry about the late response, I didn't realize I never responded to this comment.
This is actually used (abused?) to create signed tags that can be summed with other tags to modify specific fields. This was useful for making the ids in the tag relative offsets. You're right that this may be less safe. It may be better to have a sort of safer alternative. That being said, it did look like it would require a bit of work, so I'm going to consider it low priority until v2 is released. I may circle back around to look at it later. Feel free to create an issue or a PR for a safer approach. @earlephilhower, I'm looking into your issue now, sorry about the long delay. Thanks for the valuable MCVE!
The issue here is how commits handle padding to the nearest program size. This is done by exploiting the size field of the LFS_TYPE_CRC tag that completes the commit. Unfortunately, during development, the size field shrank to make room for more type information, limiting it to 1024. Normally this isn't a problem, as program sizes very rarely exceed 1024 bytes. However, using a simulated block device, user earlephilhower found that exceeding 1024 caused littlefs to crash. To make this corner case behave in a more user-friendly manner, I've modified this situation to treat >1024 program sizes as small commits that don't match the prog size. As a part of this, littlefs also needed to understand that non-matching commits indicate an "unerased" dir block, which would be needed for portability (something which notably lacks testing). This raises the question of whether the tag size field needs to be reconsidered, but changing that at this point would need a new major version. found by earlephilhower
Thanks @earlephilhower for the bug report. I've pushed up a fix, but let me know if there are still issues. Turns out the problem was prog_size > 1024 (the tag size limit). littlefs wasn't handling this properly and crashing. I haven't heard of any storage devices that have program sizes that large, but it doesn't hurt to handle this case better. I've modified this situation so it will treat any prog_size > 1024 as prog_size = block_size. This isn't perfect, but it allows littlefs to work with large prog_sizes. It's possible to use multiple commits to pad the prog_size and avoid the tag size limit, but I will consider that up in the air as a future improvement if we start seeing many devices with >1024 prog_sizes.
lfs_file_sync was not correctly setting the LFS_F_ERRED flag. Fortunately this is a relatively easy fix. LFS_F_ERRED prevents further issues from occurring when cleaning up resources with lfs_file_close. found by TheLoneWolfling
Sorry about the delays, my absence, and completely missing the target goal of this release. From what I've learned, future releases will be very different. Fortunately, I'm not dead, and hopefully I will have more time to work on littlefs, though I have quite a bit of backlog to work through. Sorry if I have yet to get to your open issues. With that out of the way, littlefs v2 is ready to merge! Barring emergencies, I will be releasing v2 tomorrow. Documentation is complete, with an updated DESIGN.md and SPEC.md. As a part of this release, a best-effort migration function is available that can convert a v1 filesystem to a v2 filesystem in most cases. If it fails, an error is returned and the v1 filesystem is left unmodified. More info here, and I will be documenting this as a part of the v2 release notes on GitHub. Also, following the discussion on #127, releases now have three special branches/tags that are generated as a part of CI:
I believe most v2-specific issues that have been raised are now fixed on the v2-alpha branch. Issues that are present in both v1 and v2 can wait until after merging v2. As always, feel free to leave any feedback; it's always useful. And thanks for the support so far.
In v2, the lookahead_buffer was changed from requiring 4-byte alignment to requiring 8-byte alignment. This was not documented as well as it could be, and as FabianInostroza noted, this also implies that lfs_malloc must provide 8-byte alignment. To protect against this, I've also added an assert on the alignment of both the lookahead_size and lookahead_buffer. found by FabianInostroza and amitv87
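For reference, a minimal static-allocation sketch that satisfies the new alignment requirement (field names per the v2 config struct; the 16-byte size is an arbitrary example, and the block device callbacks are omitted):

```c
#include <stdint.h>
#include "lfs.h"

// declaring the buffer as uint64_t guarantees 8-byte alignment without
// relying on lfs_malloc
static uint64_t lookahead_buffer[16 / sizeof(uint64_t)]; // 16 bytes

static const struct lfs_config cfg = {
    // ... block device callbacks and geometry omitted ...
    .lookahead_size   = 16,               // a multiple of 8
    .lookahead_buffer = lookahead_buffer, // 8-byte aligned
};
```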
- Shifting a signed 32-bit value by 31 bits is undefined behaviour. This was an interesting one, as on initial inspection `uint8_t & 1` looks like it will result in an unsigned value. However, because uint8_t is "smaller" than int, integer promotion actually produces a signed int, causing an undefined shift operation.
- Identical inner 'if' condition is always true (outer condition is 'true' and inner condition is 'true'). This was caused by the use of `if (true) {` to avoid "goto bypasses variable initialization" warnings. Using just `{` instead seems to avoid this problem.

found by keck-in-space and armandas
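A small illustration of the promotion issue (not littlefs code, just the pattern):

```c
#include <stdint.h>

uint32_t set_top_bit_bad(uint8_t flag) {
    // (flag & 1) is promoted to a *signed* int, so shifting into bit 31
    // is undefined behaviour
    return (flag & 1) << 31;
}

uint32_t set_top_bit_good(uint8_t flag) {
    // casting to an unsigned 32-bit type first makes the shift well defined
    return (uint32_t)(flag & 1) << 31;
}
```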
This is an experiment to determine which field in the tag structure is the most critical: tag id or tag size. This came from looking at NAND storage and discussions around the behaviour of large prog_sizes. Initial exploration indicates that prog_sizes around 2KiB are not _that_ uncommon, and the 1KiB limitation is surprising. It's possible to increase the lfs_tag size to 12 bits (4096), but at the cost of only 8-bit ids (256).

[---- 32 ----]
a [1|-3-|-- 8 --|-- 10 --|-- 10 --]
b [1|-3-|-- 8 --|-- 8 --|-- 12 --]

This requires more investigation, but in order to allow us to change the tag sizes with minimal impact I've artificially limited the number of file ids to 0xfe (255) different file ids per metadata pair. If 12-bit lengths turn out to be a bad idea, we can remove the artificial limit without backwards-incompatible changes. To avoid breaking users already on v2-alpha, this change will refuse _creating_ file ids > 255, but should read file ids > 255 without issues.
I've thrown together a last-minute comparison of the relative performance of littlefs v1 and v2. This was built by simulating both v1 and v2 locally and measuring the number of bytes read/progged/erased as well as the total size of the filesystem at the end. I then divided this result by the count*size of the files in the test to get a multiplicative cost, which is a bit easier to compare. Smaller numbers are better. As expected, the performance does not change much for reading (and even gets worse on NAND). However, the prog/erase performance is much better. This is desired, as erasing has a much higher runtime penalty than reading, sometimes even ~100x the cost (citation needed). There's also a moderate improvement to storage consumption thanks to inline files, though note that the storage consumption converges as the file size increases. Also note that these are at the relatively small scale of storage. (Graphs: NOR flash, NAND flash, MCU internal flash, SD/eMMC.)
Ok, v2 merge time!
Thanks, @geky. Your fix worked wonders on the 8266 port and now we're back in action. Appreciate your help on it! I will re-check if we might use a smaller program size since it seems I'm doing something a little different from your general use case.
This is the culmination of work that just started as adding custom attributes (#23). And now it's coming in with metadata logging, inline files, xor globals, independent cache sizes, and a bunch of other smaller features piggybacking on what is going to have to be a major version bump. It's been an unexpectedly crazy journey.
Note! There's still a lot to do, mostly around documentation and preparing for a major version bump. But I wanted to get this out earlier for feedback. So! What's new?
Big things:
Metadata logging
You heard it here folks, the metadata pairs can now be updated incrementally. This makes them act like little two-block logs.
Logs have a few challenges. Most notably, you either pay an O(n) RAM cost or you pay an O(n^2) runtime cost for garbage collection (SPIFFS cleverly gets around this by twiddling previously written bits, but this is limited to raw flash). But because these logs are limited to two blocks, the O(n^2) cost isn't that bad. And if your cache size is large enough, the number of disk reads is reduced to O(n). This is similar to the tradeoffs in nvstore, I believe.
There are also several other tricks that I need to document before this gets merged, such as only using one pass for name lookups and using xor-linked lists to enable forward+backward iteration.
This should improve low-level performance immensely, especially on storage with expensive erase operations (cough NOR flash cough). Additionally, entries are trivially resizable, which caused a bit of a hiccup for v1.
Unfortunately, this ended up requiring a change to the structures stored on disk. So this will probably be the major bump to littlefs v2. One fortunate thing is that the changes are only to metadata. And hey, guess what, all metadata blocks are pairs where one block is redundant. So in theory it should be easy to write an upgrade function that runs in place, simply using the backup blocks in the metadata pairs. I'm going to try to implement this before merging v2 in.
Custom attributes
It's now possible to get and set custom attributes on files, directories, and the superblock.
This isn't quite the same as getxattr/setxattr; instead it uses a byte identifier to look up attributes. Additionally, the API is a bit different to support atomic updates alongside file writes.
Most notably, custom attributes take advantage of the config struct added for optional config per file (Added possibility to open multiple files with LFS_NO_MALLOC enabled #58).
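A short, hedged sketch of the path-based side of the API (the attribute type 0x74 and the "hello.txt" path are made-up examples):

```c
#include <stdint.h>
#include "lfs.h"

// attach a small custom attribute to a file and read it back; attributes
// are identified by a single byte rather than a string name
int tag_file(lfs_t *lfs) {
    uint32_t timestamp = 1234;

    int err = lfs_setattr(lfs, "hello.txt", 0x74,
            &timestamp, sizeof(timestamp));
    if (err) {
        return err;
    }

    // returns the attribute's size, or a negative error code
    uint32_t readback = 0;
    lfs_ssize_t res = lfs_getattr(lfs, "hello.txt", 0x74,
            &readback, sizeof(readback));
    return (res < 0) ? (int)res : 0;
}
```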
More info over here: Manage generic attributes #23
Inline files
Now small files (<1KiB) can be inlined directly in the directory block instead of getting their own block, which can waste a lot of space on devices with larger block sizes or fewer blocks.
Note: Because inline files must be entirely stored in a file's cache, a configurable inline_max attribute is written to the superblock at format time. If the device has less RAM and a smaller cache_size, the filesystem won't be able to mount. By default inline_max = cache_size. EDIT: This has been fixed! Now littlefs can read filesystems with any inline size; it's only write time that is limited to cache_size.
Combined with metadata logging, inline files have a lot of potential. The littlefs should no longer be a bad choice for internal flash and NAND as long as you make your inline_max large enough.
Xor globals, move problem solved
So this kinda came out of nowhere. But while implementing inline files, I ran into a bit of a problem. The way littlefs handles moves depends on the uniqueness of a directory entry's contents. But inline files aren't unique at all, and by making two inline files with the same data, everything broke.
So I dusted off an old idea I had about maintaining distributed global state and managed to get it working. This pushed everything back a bit, but I think the end result is a big improvement.
Basically, every metadata pair has a copy of the global state, and during mount they all get xored together. This gives a commit to any metadata pair the opportunity to change the global state atomically.
Toss on an entry for moves and some logic to fix them, and now littlefs can recover from moves in O(1) worst case. This is a big improvement over the O(n) search in v1.
As an added plus, by putting this logic in the dir commit function, littlefs ended up gracefully handling renames within the same directory almost by accident.
This logic also has potential for future improvements (global free list?), but does have a big RAM cost, since each global element has a copy in every directory structure.
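A conceptual sketch of the xor trick (not littlefs internals; the struct and sizes here are invented for illustration): each pair stores a delta, and mounting folds all deltas together.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// hypothetical fixed-size global state
struct gstate {
    uint8_t data[16];
};

// fold one pair's stored delta into the accumulated global state
static void gstate_xor(struct gstate *acc, const struct gstate *delta) {
    for (size_t i = 0; i < sizeof(acc->data); i++) {
        acc->data[i] ^= delta->data[i];
    }
}

// on mount: start from zero and xor in the delta found in every metadata
// pair; committing a new delta to any single pair then atomically moves the
// filesystem-wide state from the old value to the new one
void gstate_mount(struct gstate *acc,
        const struct gstate *deltas, size_t count) {
    memset(acc, 0, sizeof(*acc));
    for (size_t i = 0; i < count; i++) {
        gstate_xor(acc, &deltas[i]);
    }
}
```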
Independent cache sizes
Before, the read_size and prog_size config options doubled as a way to increase the RAM used for read/prog operations. However with metadata logging, this presented a paradox. Log updates benefit from small prog sizes, but other operations benefit from large prog sizes.
Additionally, there were some performance issues floating around with large read sizes: since a large read size previously required the full size to be read, ctz-list traversal became expensive.
So now there's a new cache_size config option independent of read_size and prog_size. read_size and prog_size should be as small as the underlying device allows, and cache_size should be as big as you're willing to give littlefs in RAM.
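A hedged config sketch of what that split looks like (field names per littlefs v2; the values are examples and the block device callbacks are omitted):

```c
#include "lfs.h"

static const struct lfs_config cfg = {
    // ... read/prog/erase/sync callbacks and buffers omitted ...
    .read_size      = 16,   // smallest read the device supports
    .prog_size      = 16,   // smallest program the device supports
    .block_size     = 4096,
    .block_count    = 256,
    .cache_size     = 512,  // independent knob: more RAM, fewer device ops
    .lookahead_size = 16,
};
```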
Expanding superblocks
A minor feature, but riding on the coattails of the breaking changes is the addition of expanding superblocks. Instead of permanently allocating blocks 0 and 1 to the superblock, littlefs now starts off by reusing blocks 0 and 1 as the root directory. However, as soon as it sees more than 8 erases on blocks 0 and 1 (currently an arbitrary number, but may be smarter in the future), littlefs splits the root directory out from the superblock.
But it doesn't stop there: if littlefs again sees more than 8 erases, it will add another superblock. Because each new superblock requires a full lifetime before a modification travels to its parent, the number of erases needed for the next superblock grows exponentially.
This is a benefit for both small devices, which no longer need the extra superblock, and large devices, which could exceed the number of erase cycles on the superblock if the root directory were written 10^10 times.
Small things:
Configurable name max
I added an inline_max configuration option to the superblock, since knowing this is required for portability and there's not really a good value to assume. So why not other configurables? Now the superblock also tracks name max, which means you can push the RAM consumption of the lfs_stat struct down and still be portable.
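A minimal sketch of formatting with a smaller limit (the field name follows the v2 config struct; leaving it at 0 would mean the LFS_NAME_MAX default, and the block device setup is omitted):

```c
#include "lfs.h"

static const struct lfs_config cfg = {
    // ... block device callbacks and geometry omitted ...
    .name_max = 64, // written to the superblock at format time and
                    // respected by any device that later mounts this image
};
```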
Added lfs_fs_size
A function to get a count of the used blocks on the filesystem. This is just a wrapper over lfs_traverse, but should be easier to use. Probably could have come in on a minor release, oh well.
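Usage is a one-liner; a small sketch (the helper name is made up):

```c
#include "lfs.h"

// returns the number of allocated blocks, or a negative error code;
// multiply by block_size to get an estimate in bytes
lfs_ssize_t used_blocks(lfs_t *lfs) {
    return lfs_fs_size(lfs);
}
```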
Dropped global file buffer for local file buffers
Thanks to @dpgeorge's patch for file-level config (Added possibility to open multiple files with LFS_NO_MALLOC enabled #58), we have a better way to provide files with buffers.
Since we're bumping the major version, might as well remove the old way.
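A hedged sketch of the per-file buffer path (in v2 the buffer should match cache_size; the size and names here are examples):

```c
#include <stdint.h>
#include "lfs.h"

// one statically allocated buffer per file that may be open at a time
static uint8_t file_buffer[512]; // sized to match cfg.cache_size

int open_without_malloc(lfs_t *lfs, lfs_file_t *file, const char *path) {
    static const struct lfs_file_config fcfg = {
        .buffer = file_buffer, // caller-provided, so lfs_malloc is never used
    };
    return lfs_file_opencfg(lfs, file, path,
            LFS_O_RDWR | LFS_O_CREAT, &fcfg);
}
```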
Better update tracking
Metadata logging, with resizable entries, means that there are a lot of state changes flying around.
This required a review and ultimately a rewrite of the way littlefs manages multiple open files and dirs. The end result is a pretty decent update system for managing linked data structures.
It's an internal change, but notably files no longer have to rescan their metadata pairs on every write.
Renamed lfs_crc -> lfs_crc32
I've been looking at CRC APIs and this was the most common naming pattern and arguments I found. I figured it would be a good idea to adopt the standard and avoid the indirect reference to crc.
Related issues
All of these changes are to tackle the biggest issues users have raised so far. A big thanks for all the feedback and for those who have had to wait.
TODO
The biggest thing missing right now is documentation. And the commit history is a complete mess of terrible commit practice that I need to clean up. Also, I intend to put together some strategy for upgrading devices with littlefs v1. EDIT: All that remains now is 1. implement the "migration" functionality and testing and 2. update documentation. littlefs v2 is ready to merge!
Thanks for everyone's help and input in getting here.
cc @guillaumerems, @dpgeorge, @rojer, @dannybenor, @davidsaada, @ARMmbed/mbed-os-storage, @kegilbert, @deepikabhavnani