Add custom attributes, inline files, and resizable entries #48

geky · 2018-04-08T23:04:18Z

Minor version bump to v1.4 - API is backwards compatible
Minor disk version bump to v1.2 - Disk structures are backwards compatible (can be upgraded)

This adds internal resizable entries, which enable inline files and custom attributes, two features that have been heavily requested.

Note: Currently this PR needs a lot of testing! ~~And big-endian support needs to be fixed~~.

What's new:

Internal support for resizing entries; this is a building block for new features
Inline files

Now small files (<1024B) can be inlined directly in the directory block instead of each getting its own block, which can waste a lot of space on devices with large block sizes.

Note: Because an inline file must be stored entirely in a file's cache, a configurable inline_size attribute is written to the superblock at format time. If the device has less RAM and a smaller read_size, the filesystem won't be able to mount. By default inline_size = read_size.
Custom attributes

It's now possible to get and set custom attributes on files, directories, and the superblock.

This isn't quite the same as getxattr/setxattr; instead it uses a byte identifier to look up attributes. Additionally, the API is a bit different to avoid the cost of buffering file attributes and to enable atomic updates to files.

More info over here: Manage generic attributes #23
A function to get a count of the used blocks on the filesystem: lfs_fs_size

Note: this is just a wrapper over lfs_traverse, but should be easier to use

Related issues:

Manage generic attributes #23 - custom attributes
implement generic attribute implementation #31 - time as custom attribute (alternative solution)
lfs_disk_entry #22 - alen questions
Suitable for microcontroller onboard flash? #29 - nand devices (inline files can help)
Discussion: littlefs on SPI NAND #11 - internal flash devices (inline files can help)
Available space #45 - available space question

TODO:

The separation of data-structure vs entry type has been implicit for a while now, and even taken advantage of to simplify the traverse logic. Explicitly separating the data-struct and entry types allows us to introduce new data structures (inlined files).

Previously, commits could only come from memory in RAM. This meant any entries had to be buffered in their entirety before they could be moved to a different directory pair. By adding parameters for specifying commits from existing entries stored on disk, we allow any sized entries to be moved between directory pairs with a fixed RAM cost.

Really all this means is that the internal commit function was changed from taking an array of "commit structures" to a linked-list of "commit structures". The benefit of a linked-list is that layers of commit functions can pull off some minor modifications to the description of the commit. Most notably, commit functions can add additional entries that will be atomically written out and CRCed along with the initial commit. Also a minor benefit, this is one less parameter when committing a directory with zero entries.
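As a rough illustration of the linked-list idea, here is a minimal sketch. The `commit_region` struct and both functions are hypothetical names made up for this example, not the actual littlefs internals. It shows how an outer layer can prepend one extra region to the caller's list without copying it, and how a commit with zero entries is just a NULL list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical commit description: one region to write, chained to the next.
struct commit_region {
    uint32_t off;                     // offset inside the directory block
    const void *data;                 // bytes to write at that offset
    uint32_t len;                     // number of bytes to write
    const struct commit_region *next; // next region, or NULL at the end
};

// Walk the list, returning the total number of bytes the commit would write
// and storing the number of regions in *count.
static uint32_t commit_total(const struct commit_region *r, int *count) {
    uint32_t total = 0;
    *count = 0;
    for (; r; r = r->next) {
        total += r->len;
        *count += 1;
    }
    return total;
}

// An outer layer adds one more region (for example an extra entry) in front
// of the caller's list. The caller's regions are neither copied nor modified.
static uint32_t commit_with_extra(const struct commit_region *inner,
        const void *extra, uint32_t extra_len, int *count) {
    struct commit_region head = {0, extra, extra_len, inner};
    return commit_total(&head, count);
}
```

With an array, the outer layer would have needed to allocate a larger array and copy the caller's regions into it; with the list it only needs one struct on its own stack frame.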

Experimental implementation. This opens up the opportunity to use the same commit description for both commits and appends, which effectively do the same thing. This should lead to better code reuse.

Now, with the off, diff, and len parameters in each commit entry, we can build up directory commits that resize entries. This adds complexity but opens up the directory blocks to be much more flexible. The main concern is that resizing entries can push around neighboring entries in surprising ways, such as pushing them into new directory blocks when a directory splits. This can break littlefs's internal logic in how it tracks in-flight entries. The most problematic example being open files. Fortunately, this is helped by a global linked-list of all files and directories opened by the filesystem. As entries change size, the state of open files/dirs may be updated as needed. Note this already needed to exist for the ability to remove files/dirs, which has the same issue.
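The bookkeeping for open files can be sketched roughly like this. The `open_file` struct and `fixup_open_files` are hypothetical names for illustration, and the sketch assumes all entries live in one directory block, so it ignores the harder case where a split pushes an entry into a new block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical record for an open file: the offset of its entry inside the
// directory block, linked into a global list of open files.
struct open_file {
    uint32_t entry_off;
    struct open_file *next;
};

// After the entry at `off` changes size by `diff` bytes (positive when it
// grows, negative when it shrinks), every entry stored after it moves by the
// same amount. Entries at or before `off` keep their offsets.
static void fixup_open_files(struct open_file *list,
        uint32_t off, int32_t diff) {
    for (struct open_file *f = list; f; f = f->next) {
        if (f->entry_off > off) {
            f->entry_off = (uint32_t)((int32_t)f->entry_off + diff);
        }
    }
}
```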

Before, tags were implicitly updated by the dir update functions, which have a strong understanding of the entry struct. However, most of the time the tag was already a part of the entry struct being committed. By making tag updates explicit, this does add cost to commits that now have to pass tag updates explicitly, but it reduces cost where that tag and entry update can be combined into one commit region. It also simplifies the dir update functions.

Now, instead of passing an enum for mem/disk commits, we pass a function pointer that can specify any behaviour. This opens up the possibility of passing any sort of commit logic to the committers, and unused logic can be garbage-collected by the compiler. The downside is that compilers have a harder time optimizing around function pointers than enums, and fitting the state into structs for the callbacks may be costly.
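A minimal sketch of what a callback-based commit source might look like follows. The `commit_read_fn` type, the `disk_src` struct, and the 4-byte chunk size are assumptions for illustration only, and the "disk" is simulated with an array. The committer pulls data through the callback in fixed-size chunks, so its RAM use does not depend on the size of the entry being committed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical commit source: copy `len` bytes starting at `off` into `out`.
typedef int (*commit_read_fn)(void *ctx, uint32_t off,
        void *out, uint32_t len);

// Source 1: the bytes are already in RAM, ctx points directly at them.
static int read_from_mem(void *ctx, uint32_t off, void *out, uint32_t len) {
    memcpy(out, (const uint8_t *)ctx + off, len);
    return 0;
}

// Source 2: the bytes sit in an existing (here simulated) disk block, so the
// callback needs extra state, which is packed into a struct.
struct disk_src {
    const uint8_t *block; // simulated block contents
    uint32_t base;        // where the entry starts inside the block
};

static int read_from_disk(void *ctx, uint32_t off, void *out, uint32_t len) {
    const struct disk_src *src = ctx;
    memcpy(out, src->block + src->base + off, len);
    return 0;
}

// The committer streams through a small fixed buffer, whatever the source.
static int commit_copy(commit_read_fn rd, void *ctx,
        uint32_t len, uint8_t *dst) {
    uint8_t chunk[4];
    for (uint32_t off = 0; off < len; off += (uint32_t)sizeof(chunk)) {
        uint32_t n = len - off;
        if (n > sizeof(chunk)) {
            n = (uint32_t)sizeof(chunk);
        }
        int err = rd(ctx, off, chunk, n);
        if (err) {
            return err;
        }
        memcpy(dst + off, chunk, n);
    }
    return 0;
}
```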

Before, when appending new entries to a directory, we tried to find empty space in the last block of a directory chain. This has a nice side-effect that the order of directory entries is maintained. However, this isn't strictly necessary. We're already scanning the directory chain in order, so other than changes to directory order, there's no downside to taking advantage of any free space we come across.

Tweaked the commit callback to pass the arguments for from-memory commits explicitly, with non-from-memory commits still being able to hijack the opaque data pointer for additional state. The from-memory commits make up the vast majority of commits in littlefs, so this small change has a noticeable impact.

This allows updates to directories without needing to allocate an entry struct for every call.

The size field is redundant, since an entry's size can be determined from nlen+elen+alen+4. However, as you may have guessed from that expression, calculating the size this way is a bit roundabout and inefficient. Despite its redundancy, it's cheaper to store the size in the entry, though with a minor RAM cost. Note, extra care must now be taken to make sure these size and len fields don't fall out of sync.
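To make the trade-off concrete, here is a small sketch. The struct layout and helper names are assumptions for illustration: the four one-byte header fields account for the `+4` in the expression above, and the cached `size` lives only in RAM. Routing every length change through one helper is one way to keep the two from drifting apart.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical 4-byte entry header, using the field names from the text.
struct entry_hdr {
    uint8_t type;
    uint8_t elen; // length of the entry-specific data
    uint8_t alen; // length of the attributes
    uint8_t nlen; // length of the name
};

// Hypothetical in-RAM view of an entry with the redundant cached size.
struct entry {
    struct entry_hdr d;
    uint32_t size;
};

// The roundabout way: recompute the size from the three length fields.
static uint32_t entry_size(const struct entry_hdr *d) {
    return 4u + d->elen + d->alen + d->nlen;
}

// Change a length and refresh the cached size in the same place, so the two
// cannot fall out of sync.
static void entry_set_elen(struct entry *e, uint8_t elen) {
    e->d.elen = elen;
    e->size = entry_size(&e->d);
}
```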

Proof-of-concept implementation of inline files that stores the file's content directly in its parent's directory pair. Inline files are indicated by a different type stored in an entry's struct field, and take advantage of resizable entries. Where a normal file's entry would normally hold the reference to the CTZ skip-list, an inline file's entry contains the contents of the actual file. Unfortunately, storing the inline file on disk is the easy part. We also need to manage inline files in the internals of littlefs and provide the same operations that we do on normal files, all while reusing as much code as possible to avoid a significant increase in code cost. There is a relatively simple, though maybe a bit hacky, solution here. If a file fits entirely in a cache line, the file logic never actually has to go to disk. This means we can just give the file a "pretend" block (hopefully one that would assert if ever written to), and carry out file operations as normal, as long as we catch the file before it exceeds the cache line and write out the file to an actual disk.

Now when a file overflows the max inline file size, it will be correctly written out to a proper block. Additionally, tweaked corner cases around inline files; however, this still needs significant testing. A real neat part that surprised me is that littlefs _already_ contains the logic for writing out inline files: in lfs_file_relocate! With a bit of tweaking, littlefs can pull off both the overflow from inline to normal files _and_ the relocating of bad blocks in files with the same piece of logic.
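The "pretend block" trick and the overflow to a real block can be sketched as below. Everything here is a simplified stand-in: `INLINE_MAX`, `FAKE_BLOCK`, the `file` struct, and the counter-based allocator are hypothetical, and `file_relocate` only marks the switch to a real block, whereas the text says the real work is done by lfs_file_relocate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define INLINE_MAX 8u          // hypothetical inline size limit
#define FAKE_BLOCK 0xfffffffeu // hypothetical "pretend" block number

// Hypothetical open file: while inlined, all data lives in the cache and
// `block` is a placeholder that must never reach the disk.
struct file {
    uint32_t block;
    uint32_t size;
    uint8_t cache[32];
    bool inlined;
};

static uint32_t next_free_block = 7; // stand-in for a block allocator

// Give the file a real block; from here on it is a normal file.
static void file_relocate(struct file *f) {
    f->block = next_free_block++;
    f->inlined = false;
}

// Append to the file, catching it just before it would exceed the inline
// limit so the contents get moved out to a real block first.
static int file_write(struct file *f, const void *buf, uint32_t len) {
    if (f->size + len > sizeof(f->cache)) {
        return -1; // this sketch only models files that fit in the cache
    }
    if (f->inlined && f->size + len > INLINE_MAX) {
        file_relocate(f);
    }
    memcpy(f->cache + f->size, buf, len);
    f->size += len;
    return 0;
}
```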

Making the superblock look like "just another entry" allows us to treat the superblock like "just another entry" and reuse a decent amount of logic that would otherwise only be used at format and mount time. In this case we can use append to write out the superblock as if it were creating a new entry on the filesystem.

It's a relatively simple function but offers some code reuse as well as making the dir entry operations a bit more readable.

…perations This move was surprisingly complex, but offers the ultimate opportunity for code reuse in terms of resizable entries. Instead of needing to provide separate functions for adding and removing entries, adding and removing entries can just be viewed as changing an entry's size to-and-from zero. Unfortunately, it's not _quite_ that simple, since append and remove hide some relatively complex operations for when directory blocks overflow or need to be cleaned up. However, with enough shoehorning, and a new committer type that allows specifying recursive commit lists (is this now a push-down automaton?), it does seem to be possible to shove all of the entry update logic into a single function. Sidenote, I switched back to an enum-based DSL, since the addition of a recursive region opcode breaks the consistency of what needs to be passed to the DSL callback functions. It's much simpler to handle each opcode explicitly inside a recursive lfs_commit_region function.
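The "size to-and-from zero" view can be illustrated with a toy directory held in a byte buffer. The `dir` struct, `DIR_CAP`, and the three functions are hypothetical, and the sketch simply fails when the buffer is full, which is exactly the overflow case that the real code has to handle by splitting directory blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIR_CAP 64u // hypothetical capacity of one directory block

// Toy directory: entries packed back to back in a byte buffer.
struct dir {
    uint8_t data[DIR_CAP];
    uint32_t used;
};

// Replace the `oldlen` bytes at `off` with `newlen` bytes from `buf`,
// shifting everything stored after them. Returns -1 if the range is invalid
// or the result would not fit.
static int dir_resize(struct dir *d, uint32_t off, uint32_t oldlen,
        const void *buf, uint32_t newlen) {
    if (off + oldlen > d->used || d->used - oldlen + newlen > DIR_CAP) {
        return -1;
    }
    memmove(d->data + off + newlen, d->data + off + oldlen,
            d->used - off - oldlen);
    if (newlen > 0) {
        memcpy(d->data + off, buf, newlen);
    }
    d->used = d->used - oldlen + newlen;
    return 0;
}

// Append is a resize from zero bytes at the end of the directory.
static int dir_append(struct dir *d, const void *buf, uint32_t len) {
    return dir_resize(d, d->used, 0, buf, len);
}

// Remove is a resize down to zero bytes.
static int dir_remove(struct dir *d, uint32_t off, uint32_t len) {
    return dir_resize(d, off, len, NULL, 0);
}
```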

Being a portable, microcontroller-scale embedded filesystem, littlefs is presented with a relatively unique challenge. The amount of RAM available is on completely different scales from machine to machine, and what is normally a reasonable RAM assumption may break completely on an embedded system. A great example of this is file names. On almost every PC these days, the limit for a file name is 255 bytes. It's a very convenient limit for a number of reasons. However, on microcontrollers, allocating 255 bytes of RAM to do a file search can be unreasonable. The simplest solution (and one that has existed in littlefs for a while), is to let this limit be redefined to a smaller value on devices that need to save RAM. However, this presents an interesting portability issue. If these devices are plugged into a PC with relatively infinite RAM, nothing stops the PC from writing files with full 255-byte file names, which can't be read on the small device. One solution here is to store this limit on the superblock during format time. When mounting a disk, the filesystem implementation is responsible for checking this limit in the superblock. If it's larger than what can be read, raise an error. If it's smaller, respect the limit on the superblock and raise an error if the user attempts to exceed it. In this commit, this strategy is adopted for file names, inline files, and the size of all attributes, since these could impact the memory consumption of the filesystem. (Recording the attribute's limit is iffy, but is the only other arbitrary limit and could be used for disabling support of custom attributes). Note! This change makes it very important to configure littlefs correctly at format time. If littlefs is formatted on a PC without changing the limits appropriately, it will be rejected by a smaller device.
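The mount-time check described above might look roughly like this. The `limits` struct, its field names, and the error code are hypothetical. If any limit recorded on disk is larger than what this build can handle, the mount is rejected; otherwise the on-disk (possibly smaller) limits are adopted so later writes stay readable by the device that formatted the disk.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical set of limits, stored in the superblock at format time and
// also configured in RAM for the running build.
struct limits {
    uint32_t name_max;   // longest file name
    uint32_t inline_max; // largest inline file
    uint32_t attr_max;   // total size of attributes on one entry
};

enum { LIMITS_OK = 0, LIMITS_TOO_LARGE = -1 };

// Compare the superblock's limits against this build's configuration.
static int mount_check_limits(const struct limits *disk, struct limits *cfg) {
    if (disk->name_max > cfg->name_max
            || disk->inline_max > cfg->inline_max
            || disk->attr_max > cfg->attr_max) {
        // the disk may contain names or files this device cannot buffer
        return LIMITS_TOO_LARGE;
    }
    // respect the limits chosen at format time from now on
    *cfg = *disk;
    return LIMITS_OK;
}
```

In this sketch a disk formatted with PC-sized limits is refused by a small device, while a disk formatted with small limits mounts on a PC and the PC then enforces the small limits.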

One of the big benefits of inline files is that small files no longer need to take up a full block. This opens up an opportunity to provide much better support for storage devices with only a handful of very large blocks, such as the internal flash found on most microcontrollers. After investigating some use cases for a filesystem on internal flash, it has become apparent that the 255-byte limit is going to be too restrictive to be useful in many cases. Most uses I found needed files ~4-64 bytes in size, but it wasn't uncommon to find files ~512 bytes in length. To try to remedy this, I've pushed the 255-byte limit up to 1023 bytes, by stealing some bits from the previously-unused attributes' size. Unfortunately this limits attributes to 63 bytes in total and has a minor code cost, but I'm not sure even 1023 bytes will be sufficient for a lot of cases. The littlefs will probably never be as efficient with internal flash as other filesystems such as SPIFFS; it just wasn't designed for this sort of limited geometry. However, this feature has been heavily requested, even with limitations, because of the opportunity for code reuse on microcontrollers with both internal and external flash.
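The arithmetic behind the 1023 and 63 byte limits is a 10-bit and a 6-bit length sharing 16 bits. A sketch of one possible packing follows. The bit positions and function names are assumptions for illustration and are not taken from the on-disk format.

```c
#include <assert.h>
#include <stdint.h>

#define ELEN_BITS 10u
#define ALEN_BITS 6u
#define ELEN_MAX ((1u << ELEN_BITS) - 1u) // 1023
#define ALEN_MAX ((1u << ALEN_BITS) - 1u) // 63

// Pack the entry-data length into the low 10 bits and the attribute length
// into the high 6 bits (an assumed layout). Returns -1 if either length does
// not fit in its field.
static int pack_lens(uint32_t elen, uint32_t alen, uint16_t *out) {
    if (elen > ELEN_MAX || alen > ALEN_MAX) {
        return -1;
    }
    *out = (uint16_t)((alen << ELEN_BITS) | elen);
    return 0;
}

static uint32_t unpack_elen(uint16_t packed) {
    return (uint32_t)packed & ELEN_MAX;
}

static uint32_t unpack_alen(uint16_t packed) {
    return (uint32_t)packed >> ELEN_BITS;
}
```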

A much requested feature (mostly because of littlefs's notable lack of timestamps), this commit adds support for user-specified custom attributes. Planned (though underestimated) since v1, custom attributes provide a route for OSs and applications to provide their own metadata in littlefs, without limiting portability. However, unlike custom attributes that can be found on much more powerful PC filesystems, these custom attributes are very limited, intended for only a handful of bytes for very important metadata. Each attribute has only a single byte to identify the attribute, and the size of all attributes attached to a file is limited to 64 bytes. Custom attributes can be accessed through the lfs_getattr, lfs_setattr, and lfs_removeattr functions.

Although it's simple and probably what most users expect, the previous custom attributes API suffered from one problem: the inability to update attributes atomically.

If we consider our timestamp use case, updating a file would require:

1. Update the file
2. Update the timestamp

If a power loss occurs during this sequence of updates, we could end up with a file with an incorrect timestamp. Is this a big deal? Probably not, but it could be a surprise only found after a power loss. And littlefs was developed _specifically_ to avoid surprises during power loss.

The littlefs is perfectly capable of bundling multiple attribute updates in a single directory commit. That's kind of what it was designed to do. So all we need is a new committer opcode for a list of attributes, and then to poke that list of attributes through the API.

We could provide the single-attribute functions, but don't, because fewer functions make for a smaller codebase, and these are already the more advanced functions so we can expect more from users. This also changes semantics about what happens when we don't find an attribute, since erroring would throw away all of the other attributes we're processing.

To atomically commit both custom attributes and file updates, we need a new API, lfs_file_setattr. Unfortunately the semantics are a bit more confusing than lfs_setattr, since the attributes aren't written out immediately.
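The all-or-nothing behaviour of a batched attribute update can be sketched in memory as follows. This is not the littlefs API: the `attr` descriptor, the slot-based store, and both functions are hypothetical, and treating a missing attribute as zero-filled rather than as an error is an assumption about the changed semantics mentioned above. The batch is staged in a copy and only swapped in once every attribute has been accepted, standing in for a single directory commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ATTR_SLOTS 4 // hypothetical number of attributes per entry
#define ATTR_MAX 8   // hypothetical max size of one attribute

// Hypothetical descriptor for one attribute in a batch.
struct attr {
    uint8_t type;       // single-byte identifier
    const void *buffer; // attribute contents
    uint32_t size;      // size of the contents in bytes
};

struct stored_attr {
    uint8_t used;
    uint8_t type;
    uint8_t size;
    uint8_t data[ATTR_MAX];
};

struct entry_attrs {
    struct stored_attr slot[ATTR_SLOTS];
};

// Apply a whole batch or nothing: work on a staged copy and publish it only
// if every attribute in the batch could be stored.
static int attrs_set(struct entry_attrs *e,
        const struct attr *attrs, int count) {
    struct entry_attrs staged = *e;
    for (int i = 0; i < count; i++) {
        if (attrs[i].size > ATTR_MAX) {
            return -1;
        }
        struct stored_attr *dst = NULL;
        for (int j = 0; j < ATTR_SLOTS; j++) {
            struct stored_attr *s = &staged.slot[j];
            if (s->used && s->type == attrs[i].type) {
                dst = s; // overwrite the existing attribute
                break;
            }
            if (!s->used && !dst) {
                dst = s; // remember the first free slot
            }
        }
        if (!dst) {
            return -1;
        }
        dst->used = 1;
        dst->type = attrs[i].type;
        dst->size = (uint8_t)attrs[i].size;
        memcpy(dst->data, attrs[i].buffer, attrs[i].size);
    }
    *e = staged;
    return 0;
}

// Read one attribute. A missing attribute zero-fills the buffer and returns
// 0 instead of failing (assumed semantics).
static int attrs_get(const struct entry_attrs *e, uint8_t type,
        void *buffer, uint32_t size) {
    for (int j = 0; j < ATTR_SLOTS; j++) {
        const struct stored_attr *s = &e->slot[j];
        if (s->used && s->type == type) {
            uint32_t n = s->size < size ? s->size : size;
            memcpy(buffer, s->data, n);
            return (int)n;
        }
    }
    memset(buffer, 0, size);
    return 0;
}
```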

Mostly just removed LFS_FROM_DROP and changed the DSL grammar a bit to allow drops to occur naturally through oldsize -> newsize diff expressed in the region struct. This prevents us from having to add a drop every time we want to update an entry in-place.

In the form of lfs_file_setattr, lfs_file_getattr, lfs_fs_setattr, lfs_fs_getattr. This enables atomic updates of custom attributes as described in 6c754c8, and provides a custom attribute API that allows custom attributes to be stored on the filesystem itself.

This has existed for some time in the form of the lfs_traverse function, through which a user could provide a simple callback that would just count the number of blocks lfs_traverse finds. However, this approach is relatively unconventional and has proven to be confusing for most users.
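For reference, the callback-counting approach looks roughly like the sketch below. To keep it self-contained, `fs_traverse` is a stand-in that walks an array of block numbers. It is only assumed to resemble the callback shape of lfs_traverse, and `fs_size` is a hypothetical name for the wrapper.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t block_t;

// Stand-in traversal: call `cb` once for every block in use, stopping at the
// first error. Here the "filesystem" is just an array of block numbers.
static int fs_traverse(const block_t *blocks, int n,
        int (*cb)(void *data, block_t block), void *data) {
    for (int i = 0; i < n; i++) {
        int err = cb(data, blocks[i]);
        if (err) {
            return err;
        }
    }
    return 0;
}

// The callback users previously had to write themselves: ignore the block
// number and bump a counter.
static int count_cb(void *data, block_t block) {
    (void)block;
    *(uint32_t *)data += 1;
    return 0;
}

// The convenience wrapper: hide the callback and return the count directly,
// or a negative error code from the traversal.
static int32_t fs_size(const block_t *blocks, int n) {
    uint32_t count = 0;
    int err = fs_traverse(blocks, n, count_cb, &count);
    return err ? err : (int32_t)count;
}
```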

This is what I get for not running CI on a local development branch.

Also found some bugs. Should now have a good amount of confidence in these features.

For better compatibility with GPL v2. With permissions from: aldot, Sim4n6, jrast

This was causing code sizes to be reported with several of the logging functions still built in. A useful number, but not the minimum achievable code size.

geky · 2019-04-11T19:57:08Z

Closing as this has been superseded by v2 (#85)

This was referenced Apr 8, 2018

Manage generic attributes #23

Open

Available space #45

Closed

geky force-pushed the custom-attributes branch 3 times, most recently from f4f1955 to c832311 Compare April 16, 2018 07:37

This was referenced Apr 18, 2018

Is LFS_NAME_MAX 255 bytes a little larger? #49

Open

Lookahead bug? #46

Closed

This was referenced Jul 2, 2018

Added possibility to open multiple files with LFS_NO_MALLOC enabled #58

Merged

Small files / minimum allocation size issue #66

Open

geky mentioned this pull request Aug 5, 2018

v2: Metadata logging, custom attributes, inline files, and a major version bump #85

Merged

8 tasks

geky force-pushed the master branch 2 times, most recently from eb7b7c7 to cb62bf2 Compare September 27, 2018 19:46

geky added 18 commits October 9, 2018 23:02

Changed dir append to mirror commit DSL

e3daee2

Experimental implementation. This opens up the opportunity to use the same commit description for both commits and appends, which effectively do the same thing. This should lead to better code reuse.

Separated out version of dir remove/append for non-entries

03b262b

This allows updates to directories without needing to allocate an entry struct for every call.

Fixed big-endian support for entry structures

fb23044

Fixed a handful of bugs as result of testing

701e4fa

Added internal lfs_dir_get to consolidate logic for reading dir entries

ad74825

It's a relatively simple function but offers some code reuse as well as making the dir entry operations a bit more readable.

geky added 13 commits October 9, 2018 23:02

Bumped versions, cleaned up some TODOs and missing comments

65ea6b3

Added test coverage for filesystems with no inline files

2a8277b

Fixed big-endian support again

ea4ded4

This is what I get for not running CI on a local development branch.

Added tests for resizable entries and custom attributes

61f454b

Also found some bugs. Should now have a good amount of confidence in these features.

Changed license to BSD-3-Clause

1f8c509

For better compatibility with GPL v2. With permissions from: aldot, Sim4n6, jrast

Fixed script issue with bash expansion inside makefile parameter

c23481c

This was causing code sizes to be reported with several of the logging functions still built in. A useful number, but not the minimum achievable code size.

geky force-pushed the custom-attributes branch from 72c671c to c23481c Compare October 10, 2018 04:05

geky closed this Apr 11, 2019

geky deleted the custom-attributes branch August 5, 2019 00:19
