
Minor release: v2.5 #669

Merged
geky merged 43 commits into master on Apr 14, 2022
Conversation

@geky (Member) commented Apr 11, 2022

Thanks to everyone who contributed and provided feedback!

Bringing in the changes below. A draft of the release notes follows:

What's new?

  • Removed all recursion (#658)

    On paper, littlefs promises O(1) RAM consumption, however in practice, this hasn't quite been true due to the use of recursion in several parts of the implementation. 2 instances of recursion were bounded at runtime, but 1 instance could be unbounded, though this would require many block relocations to all occur simultaneously.

    These pieces have now been rewritten to strictly avoid recursion, at the cost of some code size (<=5.1%). This means littlefs now lives up to its promise of bounded RAM, and the total RAM usage can be easily evaluated using static tooling.

  • Added several new scripts for measuring RAM usage, including detection of future recursion (#658)

    These have also been integrated into CI, so data, stack, and struct usage will now be reported with all CI jobs.

  • Several corner-case performance improvements thanks to @invoxiaamo and @robekras:

    • lfs_dir_traverse no longer exhibits a nasty O(n^3) performance spike when a rename occurs at the same time a directory is being compacted (#666)

      @invoxiaamo notes a reduction in lfs_dir_traverse calls by 97.8% (3248 vs 148393), with a number of other proposals to improve this further with additional RAM caching in #621.

      This change also reduces stack consumption of lfs_dir_traverse by ~25%.

    • lfs_file_seek now avoids flushing when seeking inside the cache while reading (#632)

      @robekras notes a ~90% (2 sec vs 20 sec) speedup when loading a sparse binary font file into LVGL.

  • Several improvements to API robustness thanks to @colin-foster-in-advantage, @nnayo, @Johnxjj, and @lmapii:

    • Attempting to mount a littlefs image with a block_size or block_count that does not match the configuration will now error (#584)

    • lfs_file_open is now omitted when building with LFS_NO_MALLOC; previously this would always be a runtime error (#614)

    • Seeking to a negative offset is now an error, instead of underflowing (#630)

    • lfs_file_open now returns LFS_ERR_NAMETOOLONG if it can't fit the name + file metadata into a metadata block; previously this would return LFS_ERR_NOSPC (#638)

  • Thanks to @m8ddin, lfs_filebd now works on Windows (#643)

  • The copyright notice has been updated to reflect ownership changing to "the littlefs authors" (#657)

colin-foster-in-advantage and others added 30 commits July 21, 2021 08:56
When the on-disk block size doesn't match the config block size, it is
possible to get file corruption. For instance, if the number of blocks was
0x200 and we re-mount with 0x100, files could be corrupted.

If we re-mount with a larger number of blocks things should be safer,
but could be handled with a resize option or perhaps a mount flag to
ignore this parameter.
With the previous commit, fail if the superblock block_size doesn't
match the config block_size.
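
Roughly, the new mount-time check behaves like the sketch below. This is a
minimal sketch of the idea, not the actual littlefs code; the helper name and
the stand-in error value are illustrative (littlefs reports mount errors
through its own LFS_ERR_* codes):

    #include <stdint.h>

    #define ERR_INVAL (-22)  // stand-in for littlefs's LFS_ERR_INVAL

    // Reject an image whose on-disk geometry disagrees with the caller's
    // configuration, instead of silently continuing and corrupting files.
    static int check_superblock_geometry(uint32_t disk_block_size,
            uint32_t disk_block_count,
            uint32_t cfg_block_size,
            uint32_t cfg_block_count) {
        if (disk_block_size != cfg_block_size) {
            return ERR_INVAL;  // block size mismatch
        }
        if (disk_block_count != cfg_block_count) {
            return ERR_INVAL;  // block count mismatch
        }
        return 0;
    }
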
Now with -s/--sort and -S/--reverse-sort for sorting the functions by
size.

You may wonder why add reverse-sort, since its utility doesn't seem
worth the cost to implement (these are just helper scripts after all).
The reason is that reverse-sort is quite useful on the command-line,
where scrollback may be truncated, and you only care about the larger
entries.

Outside of the command-line, normal sort is preferred.

Fortunately the difference is just the sign in the sort key.

Note this conflicts with the short --summary flag, so that has been
removed.
This is to avoid unexpected script behavior even though data.py should
always return 0 bytes for littlefs. Maybe a check for this should be
added to CI?
scripts/coverage.py was missed originally because it's not run as often
as the others. Since it requires run-time info, it's usually only used
in CI.
Note this detects loops (recursion), and renders these as infinity.
Currently littlefs does have a single recursive function, and you can see
how this infects the full call graph. Eventually this should be removed.
- size  -> code_size
- size  -> data_size
- frame -> stack_frame
- limit -> stack_limit
- hits  -> coverage_hits
- count -> coverage_count
Note this does include internal structs, so this should probably
be limited to informative purposes.
This required a patch to the --diff flag for the scripts to ignore
a missing file. This enables a useful one-liner for making comparisons
with potentially missing previous versions:

    ./scripts/code.py lfs.o -d lfs.o.code.csv -o lfs.o.code.csv

    function (0 added, 0 removed)            old     new    diff
    TOTAL                                  25476   25476      +0

One downside: these previous files are easy to delete as part of make
clean, which limits their usefulness for comparing configuration
changes...
- Added -L/--depth argument to show dependencies for scripts/stack.py,
  this replaces calls.py
- Additional internal restructuring to avoid repeated code
- Removed incorrect diff percentage when there is no actual size
- Consistent percentage rendering in test.py
A full summary of static measurements (code size, stack usage, etc) can now
be found with:

    make summary

This is done through the combination of a new ./scripts/summary.py
script and the ability of existing scripts to merge into existing csv
files, allowing multiple results to be merged either in a pipeline, or
in parallel with a single ./scripts/summary.py call.

The ./scripts/summary.py script can also be used to quickly compare
different builds or configurations. This is a proper implementation
of a similar but hacky shell script that has already been very useful
for making optimization decisions:

    $ ./scripts/structs.py new.csv -d old.csv --summary
    name (2 added, 0 removed)               code             stack            structs
    TOTAL                                  28648 (-2.7%)      2448               1012

Also some other small tweaks to scripts:

- Removed state saving diff rules. This isn't the most useful way to
  handle comparing changes.

- Added short flags for --summary (-Y) and --files (-F), since these
  are quite often used.
- Changed `make summary` to show a one line summary
- Added `make lfs.csv` rule, which is useful for finding more info with
  other scripts
- Fixed small issue in ./scripts/summary.py
- Added *.ci (callgraph) and *.csv (script output) to CI
- Added to GitHub statuses (61 results)

- Reworked generated release table to include these (16 results, only thumb)

These also required a surprisingly large number of other changes:

- Bumped CI Ubuntu version 18.04 -> 20.04; 22.04 is already on the
  horizon but not usable in GitHub yet

- Manually upgraded to GCC v10; this is required for the -fcallgraph-info
  flag that scripts/stack.py uses.

- Increased paginated status queries to 100 per page. If we have more
  statuses than this, the status diffs may get much more complicated...

- Forced whitespace in the generated release table to always be nbsp. GitHub
  tables get scrunched rather ugly without this, preferring margins over
  readable tables.

- Added limited support for "∞" results, since this is returned by
  ./scripts/stack.py for recursive functions.

As a side-note, this increases the number of statuses reported
per-commit from 6 to 61, so hopefully that doesn't cause any problems...
Using errors=replace in python utf-8 decoding makes these scripts more
resilient to underlying errors, rather than just throwing an unhelpfully
generic decode error.
GCC is a bit frustrating here, it really wants to generate every file in
a single command, which _is_ more efficient if our build system could
leverage this. But -fcallgraph-info is a rather novel flag, so we can't
really rely on it for generally compiling and testing littlefs.

The multi-file output gets in the way when we want an explicitly
separate rule for callgraph-info generation. We can't generate the
callgraph-info without generating the objects files.

This becomes a surprising issue when parallel building (make -j) is used!
Suddenly we might end up with both the .o and .ci rules writing to .o
files, which creates a really difficult-to-track-down issue of corrupted
.o files.

The temporary solution is to use an order-only prerequisite. This still
ends up building the .o files twice, but it's an acceptable tradeoff for
not requiring -fcallgraph-info for all builds.
This requires parsing an additional section of the dwarfinfo (--dwarf=rawlines)
to get the declaration file info.

---

Interpreting the results of ./scripts/structs.py is a bit more
complicated than for the other scripts: structs aren't used in a
consistent manner, so the cost of a large struct depends on the context
in which it is used.

But that being said, there really isn't much reason to report
internal-only structs. These structs really only exist for type-checking
in internal algorithms, and their cost will end up reflected in other RAM
measurements, either stack, heap, or other.
…ntu snaps

Introduced when updating CI to Ubuntu 20.04, Ubuntu snaps consume
loop devices, which conflicts with our assumption that /dev/loop0
will always be unused. Changed to request a dynamic loop device from
losetup, though it would have been nice if Ubuntu snaps allocated
from the last device or something.
Avoids redundant counting of structs shared in multiple .c files, which
is very common. This is different from the other scripts,
code.py/data.py/stack.py, but this difference makes sense as struct
declarations have a very different lifetime.
As noted in Python's subprocess library:

> This will deadlock when using stdout=PIPE and/or stderr=PIPE and the
> child process generates enough output to a pipe such that it blocks
> waiting for the OS pipe buffer to accept more data.

Curiously, this only became a problem when updating to Ubuntu 20.04
in CI (python3.6 -> python3.8).
lfs_dir_commit originally relied heavily on tail-recursion, though at
least one path (through relocations) was not tail-recursive, and could
cause unbounded stack usage in extreme cases of bad blocks. (Keep in
mind even extreme cases of bad blocks should be in scope for littlefs).

In order to remove recursion from this code path, several changes were
required:

- The lfs_dir_compact logic had to be somewhat inverted. Instead of
  first compacting and then resolving issues such as relocations and
  orphans, the overarching lfs_dir_commit now contains a state-machine
  which after committing or compacting handles the extra changes to the
  filesystem in a single, non-recursive loop.

- Instead of fixing all relocations recursively, >1 relocation requires
  deferring to a full deorphan step. This step is unfortunately an
  additional n^2 process. It also required some changes to lfs_deorphan
  in order to ignore intentional orphans created as an intermediary in
  lfs_mkdir. Maybe in the future we should remove these.

- Tail recursion normally found in lfs_fs_deorphan had to be rewritten
  as a loop which restarts any time a new commit causes a relocation.
  This does show that the algorithm may not terminate, but only if every
  block is bad, which will eventually cause littlefs to run out of
  blocks to write to.
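
A compilable sketch of the loop structure this describes follows; the state
names and helper functions are illustrative stand-ins, not littlefs's actual
internals:

    #include <stdbool.h>

    enum commit_state {COMMIT, COMPACT, RELOCATE, DEORPHAN, DONE};

    // hypothetical stand-ins for the real commit/compact/fixup steps
    static bool try_commit(void)     { return true; }  // false = block full
    static bool try_compact(void)    { return true; }  // false = relocated
    static int  relocations(void)    { return 0; }     // pending relocations
    static void fix_relocation(void) {}
    static void deorphan(void)       {}

    static void dir_commit(void) {
        enum commit_state state = COMMIT;
        while (state != DONE) {  // one flat loop replaces the old tail calls
            switch (state) {
            case COMMIT:
                // try to append; on overflow fall back to compaction
                state = try_commit() ? DONE : COMPACT;
                break;
            case COMPACT:
                // a failed compact means the block went bad and was relocated
                state = try_compact() ? DONE : RELOCATE;
                break;
            case RELOCATE:
                if (relocations() > 1) {
                    state = DEORPHAN;   // >1 relocation defers to deorphan
                } else if (relocations() == 1) {
                    fix_relocation();
                    state = COMMIT;     // the fix is itself another commit
                } else {
                    state = DONE;
                }
                break;
            case DEORPHAN:
                deorphan();
                state = DONE;
                break;
            case DONE:
                break;
            }
        }
    }
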
This mostly just required separate functions for "lfs_file_rawwrite" and
"lfs_file_flushedwrite", since lfs_file_flush recursively invokes
lfs_file_rawread and lfs_file_rawwrite.

This comes at a code cost, but gives us bounded and measurable RAM usage
on this code path.
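
The resulting call structure looks roughly like this sketch; only the
function names come from the commit above, the signatures and bodies are
simplified:

    // flushedwrite assumes any pending state has already been flushed
    static int lfs_file_flushedwrite(void) {
        // ... perform the actual write with bounded stack ...
        return 0;
    }

    // flush now calls the "flushed" variants directly, so the recursive
    // cycle rawwrite -> flush -> rawwrite no longer exists
    static int lfs_file_flush(void) {
        return lfs_file_flushedwrite();
    }

    static int lfs_file_rawwrite(void) {
        int err = lfs_file_flush();  // flush exactly once, up front
        if (err) {
            return err;
        }
        return lfs_file_flushedwrite();  // then do the bounded write
    }
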
lfs_dir_traverse is a bit unpleasant in that it is inherently a
recursive function, though with a strict bound of 4 calls (commit -> filter ->
move -> filter), and efforts to unroll the recursion come at a
significant code cost.

It turns out the best solution I've found so far is to simply create an
explicit stack with an explicit bound of 4 calls (or more accurately,
3 pushed frames).

---

This actually highlights one of the bigger flaws in littlefs right now,
which is that this function, lfs_dir_traverse, takes O(n^2) disk reads
to traverse.

Note that LFS_FROM_MOVE can only occur once per commit, which is why
this code is O(n^2) and not O(n^4).
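
The explicit-stack idea, sketched below; the frame contents are illustrative
(the real frames save tags, callbacks, and iteration state), but the fixed
depth of 3 is the point:

    #include <stdbool.h>
    #include <stddef.h>

    struct traverse_frame {
        int resume_phase;  // where the parent traversal picks back up
    };

    struct traverse_stack {
        struct traverse_frame frames[3];  // commit -> move -> filter
        size_t depth;
    };

    // instead of calling dir_traverse recursively, save the current
    // position and restart the main loop on the nested traversal
    static bool push(struct traverse_stack *s, int phase) {
        if (s->depth >= 3) {
            return false;  // impossible if the bound analysis holds
        }
        s->frames[s->depth].resume_phase = phase;
        s->depth += 1;
        return true;
    }

    // when a nested traversal finishes, pop and resume the parent
    static bool pop(struct traverse_stack *s, int *phase) {
        if (s->depth == 0) {
            return false;  // outermost traversal finished
        }
        s->depth -= 1;
        *phase = s->frames[s->depth].resume_phase;
        return true;
    }
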
This should fix the performance issue if a new seek position belongs to currently cached data.
This avoids unnecessary rereads of file data.
The basic idea is simple, if we seek to a position in the currently
loaded cache, don't flush the cache. Notably this ensures that seek is
always as fast or faster than just reading the data.

This is a bit tricky since we need to check that our new block and
offset match the cache, fortunately we can skip the block check by
reevaluating the block index for both the current and new positions.

Note this only works when reading; for writing we need to always flush
the cache, or else we will lose the pending write data.
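
The gist of the check, as a sketch; the field and helper names are
illustrative, and the real code compares block indices rather than a flat
offset range:

    #include <stdbool.h>
    #include <stdint.h>

    struct file_cache {
        uint32_t off;    // file offset of the cached data
        uint32_t size;   // number of bytes currently cached
        bool reading;    // cache holds read data, not pending writes
    };

    // returns true if a seek to newpos can skip the flush entirely
    static bool seek_hits_cache(const struct file_cache *c, uint32_t newpos) {
        // writing must always flush, or pending write data would be lost
        if (!c->reading) {
            return false;
        }
        // the new position falls inside the currently cached range
        return newpos >= c->off && newpos < c->off + c->size;
    }
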
- nit: Moving brace to end of if statement line for consistency
- mount: add more debug info per CR
- Fix compiler error from extra parentheses
- Fix superblock typo
lmapii and others added 13 commits April 10, 2022 13:00
Rename can be VERY time consuming. One of the reasons is the 4-level
recursion depth of lfs_dir_traverse() seen if a compaction happens during
the rename.

lfs_dir_compact()
  size computation
    [1] lfs_dir_traverse(cb=lfs_dir_commit_size)
         - do 'duplicates and tag update'
       [2] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1])
           - Reaching a LFS_FROM_MOVE tag (here)
         [3] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1]) <= on 'from' dir
             - do 'duplicates and tag update'
           [4] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[3])
  followed by the compaction itself:
    [1] lfs_dir_traverse(cb=lfs_dir_commit_commit)
         - do 'duplicates and tag update'
       [2] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1])
           - Reaching a LFS_FROM_MOVE tag (here)
         [3] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[1]) <= on 'from' dir
             - do 'duplicates and tag update'
           [4] lfs_dir_traverse(cb=lfs_dir_traverse_filter, data=tag[3])

Yet, analysis shows that levels [3] and [4] don't perform anything
if the callback is lfs_dir_traverse_filter...

A practical example:

- format and mount a 4KB block FS
- create 100 files of 256 Bytes named "/dummy_%d"
- create a 1024 Byte file "/test"
- rename "/test" "/test_rename"
- create a 1024 Byte file "/test"
- rename "/test" "/test_rename"
This triggers a compaction where lfs_dir_traverse was called 148393 times,
generating 25e6+ lfs_bd_read calls (~100 MB+ of data).

With the optimization, lfs_dir_traverse is now called 3248 times
(589e3 lfs_bd_read calls, ~2.3 MB of data).

=> a 43x improvement...
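
The gist of the optimization is a short-circuit at the top of
lfs_dir_traverse, sketched below; everything except the
lfs_dir_traverse/lfs_dir_traverse_filter names is illustrative:

    typedef int (*traverse_cb)(void *data, int tag);

    // drops outdated/duplicate tags; does no work worth recursing for
    static int lfs_dir_traverse_filter(void *data, int tag) {
        (void)data; (void)tag;
        return 0;
    }

    static int lfs_dir_traverse(traverse_cb cb, void *data,
            int tag, int tag_is_move) {
        if (tag_is_move && cb == lfs_dir_traverse_filter) {
            // filtering on behalf of a filter is provably a no-op, so
            // skip levels [3] and [4] instead of recursing into them
            return 0;
        }
        // ... otherwise traverse normally, recursing for move tags ...
        return cb(data, tag);
    }
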
…nt_fail

Fail mount when the block size changes
don't use lfs_file_open() when LFS_NO_MALLOC is set
add the limit, the cursor cannot be set to a negative number
Fix lfs_file_rawseek performance issue
Removed invalid overwrite for return value.
There are two issues when using the file-based block device emulation
on Windows platforms:
1. There is no fsync implementation available. This needs to be mapped
   to a Windows-specific FlushFileBuffers system call.
2. The block device file needs to be opened as a binary file (O_BINARY).
   The corresponding flag is not required for Linux.
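
A sketch of both fixes; the Win32 calls (FlushFileBuffers, _get_osfhandle)
and O_BINARY are real, while the surrounding names are illustrative rather
than lfs_filebd's actual code:

    #include <fcntl.h>

    #ifdef _WIN32
    #include <windows.h>
    #include <io.h>

    // map fsync onto the Win32 equivalent
    static int bd_sync(int fd) {
        HANDLE h = (HANDLE)_get_osfhandle(fd);
        return FlushFileBuffers(h) ? 0 : -1;
    }

    // Windows needs O_BINARY to suppress CRLF translation
    #define BD_OPEN_FLAGS (O_RDWR | O_CREAT | O_BINARY)
    #else
    #include <unistd.h>

    static int bd_sync(int fd) {
        return fsync(fd);
    }

    #define BD_OPEN_FLAGS (O_RDWR | O_CREAT)
    #endif
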
Fixes to use lfs_filebd on windows platforms
Optimization of the rename case.
Restructure littlefs to not use recursion, measure stack usage
This is possible thanks to invoxiaamo's optimization of compacting
renames to avoid the O(n^3) nested filters. Not only does this
significantly reduce the runtime cost of that operation, but it
reduces the maximum possible depth of recursion to 3 frames.

Deepest lfs_dir_traverse before:

traverse with commit
'-> traverse with filter
    '-> traverse with move
        '-> traverse with filter

Deepest lfs_dir_traverse after:

traverse with commit
'-> traverse with move
    '-> traverse with filter
@geky geky added the v2.5 label Apr 11, 2022
@geky geky added this to the v2.5 milestone Apr 11, 2022
@geky (Member Author) commented Apr 11, 2022

And here are the new CI results, for those who are curious:

| Configuration | Code             | Stack         | Structs       | Coverage                    |
|---------------|------------------|---------------|---------------|-----------------------------|
| Default       | 15722 B (+12.7%) | 1312 B (-∞%)  | 772 B (+0.0%) | 92.7% of 2388 lines (-2.0%) |
| Readonly      | 5742 B (+7.4%)   | 352 B (+4.7%) | 772 B (+0.0%) |                             |
| Threadsafe    | 16516 B (+10.9%) | 1312 B (-∞%)  | 780 B (+0.0%) |                             |
| Migrate       | 17414 B (+11.5%) | 1648 B (-∞%)  | 776 B (+0.0%) |                             |
| Error-asserts | 16326 B (+12.3%) | 1304 B (-∞%)  | 772 B (+0.0%) |                             |

@geky geky merged commit 5609535 into master Apr 14, 2022
geky added a commit that referenced this pull request Apr 14, 2022
@Johnxjj commented Jun 8, 2022

@geky I would like to know if v2.5 fixes these problems I encountered before?
#327
#75
#376

@geky (Member Author) commented May 1, 2023

Hi @Johnxjj, sorry I just saw this. The answer is mostly no, there's still quite a bit of outstanding work around performance.

Some quick answers:

  • #327 - This is significantly improved with @invoxiaamo's lfs_dir_traverse optimization.
  • #75 - This is unaffected.
  • #376 - I think this is also unaffected, though the metadata_max config introduced in #502 may offer a workaround at a large storage cost.
