
Overhead calculator #95

Open
X-Ryl669 opened this issue Sep 3, 2018 · 11 comments

@X-Ryl669

X-Ryl669 commented Sep 3, 2018

Can you provide a formula to figure out the overhead used by the filesystem compared to the data stored?

Something with page size and block size as inputs, and then Y * (number of files of size XXX)?

Thanks

@geky
Member

geky commented Sep 19, 2018

Hi @X-Ryl669, sorry for the late response, it's taken me a bit to catch up on open issues.

These are the current minimums:

  • 2 blocks per superblock
  • 2 blocks per directory
  • 1 block per file

So if you had 10 files stored in root, that would take:
2 for the superblock + 2 for directories (root is a directory) + 10 for files = 14 blocks at minimum

This is useful for most uses of embedded filesystems when files are smaller than the block size.
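If it helps to have this as code, here's a quick sketch of that minimum (plain Python, the function name is just illustrative):

```python
def v1_min_blocks(num_dirs, num_files):
    # 2 blocks for the superblock, 2 blocks per directory (the root counts
    # as a directory), and at least 1 block per file
    return 2 + 2*num_dirs + num_files

print(v1_min_blocks(num_dirs=1, num_files=10))  # 14
```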

The full formula that handles when directory blocks/file blocks overflow is a bit more complicated. The full details can be found in the SPEC.md. But if you want a direct formula it would be something like this:

$$\mathtt{total\_blocks} \approx 2 + 2 \times \mathtt{dir\_count} + \sum{\left\lceil\frac{\mathtt{file\_size}}{\mathtt{block\_size}-8}\right\rceil}$$
Currently this isn't great when there are very few blocks (for example, internal flash), but it works OK for most forms of external flash. Most notably, small files always take up a full block at minimum.

As a sidenote: I'm currently working on this as a part of the 2.0 work which will get rid of the 2-block superblock requirement and allow small files to be inlined in their parent's directory blocks: #85

@X-Ryl669
Author

Thanks a lot!

@perigoso

@geky for v2, could you show what formulas one could use to calculate the overhead?

@ExtremeGTX

@geky is it still the same overhead for v2.x?

@geky geky added the question label Jun 15, 2023
@geky
Member

geky commented Jun 15, 2023

Hi @ExtremeGTX, @perigoso, sorry about missing this earlier.

This has changed in v2; it's improved in some ways but has also become a bit more complicated:

  • There is no longer a 2-block superblock requirement, though write-heavy applications can end up with multiple superblocks.
  • Inline files mean files no longer need a full block at minimum.
  • Metadata has gotten more complicated and more expensive to store; all metadata has a 4x overhead.

The file data structure is the same, so if $\mathtt{file\_size} > \mathtt{cache\_size}$:

$$\mathtt{file\_blocks} = \left\lceil\frac{\mathtt{file\_size}}{\mathtt{block\_size}-8}\right\rceil$$

Metadata gets a bit more complicated. Each directory still needs at minimum one pair of metadata blocks, and each piece of metadata has a 4x overhead (2x for block pairs, 2x to avoid exponential runoff):

$$\mathtt{dir\_blocks} = 2 \times \left\lceil\frac{2 \times \left(\mathtt{dir\_metadata} + \sum{\mathtt{file\_metadata}}\right)}{\mathtt{block\_size}}\right\rceil$$

$$\mathtt{total\_blocks} = \sum{\mathtt{dir\_blocks}} + \sum{\mathtt{file\_blocks}}$$

Where $\mathtt{dir\_metadata}$ includes some bookkeeping things: a 4-byte revision count + ~40 bytes according to this comment. Though this is subject to change in updates.

The $\mathtt{file\_metadata}$ includes the filename, inline data or an 8-byte pointer to the file data structure, and any custom attributes. Each attribute includes a 4-byte tag, so filename + inline data + 3 attributes would need an extra 5x4 bytes to store everything.

So, at the time of writing, this is roughly:

$$\mathtt{dir\_metadata} \approx 44$$

$$\begin{aligned} \mathtt{file\_metadata} &\approx 4 + \mathtt{name\_size} \\ & + 4 + \begin{cases} 8 & \mathtt{file\_size} > \mathtt{cache\_size} \\ \mathtt{file\_size} & \mathtt{file\_size} \le \mathtt{cache\_size} \end{cases} \\ & + \sum{\left(4 + \mathtt{uattr\_size}\right)} \end{aligned}$$
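Putting these formulas together, here's a rough estimator (a sketch in Python; the names, the ~44-byte dir_metadata constant, and the 8-byte skip-list overhead are just the approximations from this thread, not the littlefs API, and may change between versions):

```python
import math

DIR_METADATA = 44   # ~4-byte revision count + ~40 bytes of bookkeeping
WORD = 4            # 32-bit tags and pointers

def file_blocks(file_size, block_size, cache_size):
    # data blocks used by one file; inlined files use no data blocks
    if file_size <= cache_size:
        return 0
    return math.ceil(file_size / (block_size - 2*WORD))

def file_metadata(name_size, file_size, cache_size, uattr_sizes=()):
    # bytes of metadata a file contributes to its parent directory
    inline = file_size if file_size <= cache_size else 2*WORD  # 8-byte CTZ pointer
    return (WORD + name_size) + (WORD + inline) + sum(WORD + s for s in uattr_sizes)

def dir_blocks(files, block_size, cache_size):
    # metadata blocks for one directory, always a multiple of 2
    metadata = DIR_METADATA + sum(
        file_metadata(n, s, cache_size, a) for n, s, a in files)
    return 2 * math.ceil(2*metadata / block_size)

# example: 10 files of 100 bytes with 8-character names in the root,
# block_size=4096, cache_size=256 -> everything inlines into one metadata pair
files = [(8, 100, ()) for _ in range(10)]
total = (dir_blocks(files, 4096, 256)
         + sum(file_blocks(s, 4096, 256) for _, s, _ in files))
print(total)  # 2
```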

Some other things to note:

  1. Files are copy-on-write. This means when you write to a file, the old contents remain reserved until you call lfs_file_sync. So you effectively need to pay ~2x the cost for open files!

  2. If dynamic wear-leveling is enabled: Directories are "copy-on-bounded-write", which means worst case you may have a copy-on-write copy of the directory tree. So worst case ~2x the path from the root to your file. This only occurs during metadata updates, and is fixed immediately, but can be a problem for a near-full filesystem.

  3. If dynamic wear-leveling is enabled: Superblocks are inserted into the filesystem if too many writes propagate to the root. This becomes exponentially unlikely as the filesystem grows, so it's unlikely for there to ever be more than one superblock (2 blocks). littlefs avoids inserting a superblock if the filesystem is >1/2 full, but this can still be a problem in a write-heavy application since inserted superblocks are never reclaimed.

For these reasons I would suggest some amount of extra storage to avoid near-full issues, something in the range of 1.5x-2x. These extra blocks will also contribute to dynamic wear-leveling, extending the lifetime of the device, so they're not really wasted in a sense.

Hopefully this info helps.

@kyrreaa

kyrreaa commented Oct 18, 2023

What about the variable skip-list section of each block? It varies with the block number and basically grows with the size of the file?

@geky
Member

geky commented Oct 18, 2023

@kyrreaa, great question! It's super unintuitive, but because the variable number of pointers forms a perfectly balanced binary tree, on average the overhead never exceeds $2w$ per block.

Since our word size $w$ is 32 bits, or 4 bytes, this is where the 8 comes from in the $\mathtt{file\_blocks}$ equation:

$$\mathtt{file\_blocks} = \left\lceil\frac{\mathtt{file\_size}}{\mathtt{block\_size}-8}\right\rceil$$

We can work through a couple examples to see this is true:

   0        1        2        3        4        5        6        7        8
.-----.  .-----.  .-----.  .-----.  .-----.  .-----.  .-----.  .-----.  .-----.
|     |<---ptr |<---ptr |<---ptr |<---ptr |<---ptr |<---ptr |<---ptr |<---ptr |
|     |<-|     |----ptr |<-|     |----ptr |<-|     |----ptr |<-|     |----ptr |
|     |<-|     |--|     |--|     |----ptr |<-|     |--|     |--|     |----ptr |
|     |<-|     |--|     |--|     |--|     |--|     |--|     |--|     |----ptr |
'-----'  '-----'  '-----'  '-----'  '-----'  '-----'  '-----'  '-----'  '-----'
 0 ptrs   1 ptrs   2 ptrs   1 ptrs   3 ptrs   1 ptrs   2 ptrs   1 ptrs   4 ptrs
'-----------.-----------'
     3 ptrs in 3 blocks
          3 < 6 (3*2)
'--------------------.--------------------'
              7 ptrs in 5 blocks
                   7 < 10 (5*2)
'--------------------------------------.--------------------------------------'
                               15 ptrs in 9 blocks
                                    15 < 18 (9*2)

One way to prove this is to look at each row of pointers. Ignoring the first block, the first row has $n$ pointers, the second has $\frac{n}{2}$ pointers, the third has $\frac{n}{4}$, then $\frac{n}{8}$ etc:

$$n + \frac{n}{2} + \frac{n}{4} +\frac{n}{8} + \cdots$$

This gives us a geometric series that converges to $2n$, or an average of $2$ pointers per block:

$$n\cdot\sum_{i=0}^{\infty}{\left(\frac{1}{2}\right)^i} = n \cdot 2$$
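As a quick sanity check, here's a small sketch (Python, just for illustration) that counts pointers the way the diagram does, with block $k$ holding $\mathrm{ctz}(k)+1$ pointers, and confirms the totals stay under $2n$:

```python
def ptrs(k):
    # block 0 holds no pointers; block k >= 1 holds ctz(k)+1 pointers
    return 0 if k == 0 else (k & -k).bit_length()

for n in (3, 5, 9):
    total = sum(ptrs(k) for k in range(n))
    print(n, total, total < 2*n)
# prints: 3 3 True / 5 7 True / 9 15 True
```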

@kyrreaa

kyrreaa commented Oct 18, 2023

Yeah, I figured out the number of pointers from the block number by counting the trailing zeroes in the block number and adding one. Since it is a bit convoluted, the cost of an "actual size on disk" stat would be too high, I guess.
The reason I ask is that I spent a few days hunting down why my superblocks disappeared. The problem went away when I started using a properly sized lookahead, after finding your comment on a previous issue that a lookahead much bigger than the number of blocks in the fs may break something. Seems you were right. (I was writing 20-50 MB files until I got a no-more-space error, but not all files were present when listed. Upon reboot the filesystem was unmountable and the superblocks were gone. My wish was to see if stopping before it was actually full would help.)

@shahawi-sumup

Hi @geky
I want to understand the following case, please:
I have a filesystem (LFS v2.4) used only for logs, which will store only 2 files.

File system Size = 65536 bytes
Block size = 4096 bytes
Total blocks = 16

Available_space = 64K - 2*4K (overhead(1x SuperBlock + 1x RootDir)) = 56K
Max_Size_per_log_file = 56K / 2 = 28672 bytes
Available_size_per_file = 28672 - 1*4KB (overhead (file_metadata)) = 24576
LogFileName = log.0000 (8 bytes)

But this calculation gives me an error (no space left on the filesystem) unless I subtract 28 more bytes from the file size, making it (24576 - 28) = 24548 bytes.

I assume these 28 bytes are (8 bytes (filename) + 0 bytes (inline data) + 5x4 (attributes)), but shouldn't this already be part of the 1 block subtracted above for (file_metadata)?

Thank you.

@geky
Member

geky commented Feb 28, 2024

Hi @shahawi-sumup, those 28 bytes come from the CTZ skip-list overhead:

$$\mathtt{file\_blocks} = \left\lceil\frac{\mathtt{file\_size}}{\mathtt{block\_size}-8}\right\rceil$$

littlefs carves out 8 bytes for each block for its own bookkeeping (the details are a bit more complicated, see above).

That being said, littlefs doesn't reserve a block for each file's metadata; the file's metadata is stored inside the parent directory. So you shouldn't need the - 1*4KB (overhead (file_metadata)).

Two things come to mind:

  1. littlefs is trying to create copies of the metadata blocks to wear-level.

Consider setting block_cycles=-1 to disable block-level wear-leveling. At 16 blocks this may not get you much. littlefs still wear-levels metadata blocks internally, and data blocks are still dynamically wear-leveled via copy-on-write.

  2. The extra blocks are needed for copies of the file's tail block when appending.

    It's rare for files to be perfectly block-size aligned, so sometimes littlefs needs to create a copy of the last block in a file in order to perform a write.

In general I would avoid trying to fill up littlefs completely. The copy-on-write system is kind of complex and can result in ENOSPC in strange situations like what you're seeing. I would save at least ~10% of storage for copy-on-write operations. Though in your case, at 16 blocks, saving 2 blocks (or maybe even just 1 if block_cycles=-1?) is probably sufficient.

@geky
Member

geky commented Apr 29, 2024

This is being referenced externally, and a bit incorrectly (unfortunately the exact constraints are complicated), so I just wanted to clarify:

littlefs can fit in 2 blocks, as of v2.

Iff all files are "inlineable", which requires:

  1. they fit in RAM (cache_size specifically)
  2. they fit in 1/4 block_size

Inline files are more expensive, using ~4x the storage, but this is better than storing small files in full blocks.

If you have a small amount of storage, say, 4 blocks, you may also want to consider using a larger block size (e.g. doubling block_size) to allow for larger inline files and better storage utilization.
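For reference, a minimal sketch of that inlineability check (illustrative Python, not the littlefs API; the cache_size and 1/4 block_size limits are the constraints listed above):

```python
def is_inlineable(file_size, block_size, cache_size):
    # inline files must fit in RAM (cache_size) and in 1/4 of a block
    return file_size <= cache_size and file_size <= block_size // 4

print(is_inlineable(200,  block_size=4096, cache_size=256))   # True
print(is_inlineable(2000, block_size=4096, cache_size=256))   # False: larger than cache_size
```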
