"Blockless" IndexedDB VFS #46

rhashimoto · 2022-04-26T17:21:22Z

rhashimoto
Apr 26, 2022
Maintainer

There are sometimes conflicting goals in writing VFS samples: one is to provide a good starting point for others to write their own, and another is to explore interesting approaches and techniques. I've been concerned that IDBVersionedVFS (originally IndexedDbVFS) is becoming too complicated and intimidating for someone to use as a model. So I thought I might write an additional VFS for IndexedDB where the logic was as simple as I could make it.

The normal way to implement a filesystem is to segment file data into fixed-size blocks. Even though I've done this a number of times, the part I hate writing is the data blocking and unblocking. It's fiddly to get right with lots of opportunities for off-by-one errors. And if writes can be unaligned then that requires read-modify-write, and that makes you want to cache those partially written blocks which is more state and more code... So I was trying to come up with a way to bypass some of these block-related complications and maybe even be faster as a result.

I found a different approach that doesn't use fixed-size blocking, but instead stores one IndexedDB object per write with whatever number of bytes the write contains. This makes writing really simple provided that whenever SQLite overwrites data it always exactly overlays and replaces a previous write, i.e. the resulting file extents of IndexedDB objects should never overlap. SQLite does in fact behave this way except for one special case, which I will detail later.

So how do we find the correct IndexedDB object to satisfy a random-access read with arbitrarily sized blocks? We can take advantage of the fact that our storage "device" is actually a database itself so we can look things up with a key. The trick is to make the key for each object include the negative of its offset, e.g. the key for an object at offset 1024 includes -1024. Then to read at a particular offset (which doesn't have to be equal to any object offset), we fetch the first object with a key greater than or equal to the negative of the read offset (negation is necessary because IndexedDB only searches in one direction).

It turns out that SQLite almost never requests a read that crosses the boundary of an earlier write. I will detail the exception for that later as well, but otherwise we only need to fetch one IndexedDB object for each read.

To summarize, we're assuming these constraints on SQLite's behavior:

Any overwritten data uses the same write offset and size. That is, if SQLite writes new data to replace old data in a file, it has to use the same write parameters (other than the actual bytes).
Every read requests data from only one write. That is, no SQLite read crosses a write boundary.

And the payoff for those assumptions is that for every SQLite write, we put one IndexedDB object, and for every SQLite read, we get one IndexedDB object. I don't think things can be simpler than that. I implemented this in a new example VFS, IDBMinimalVFS, and it works (choose "IDBMinimal" in the demo)!

Now about those exceptions. SQLite will violate constraint 1 if you change the page size of the database file in-place, e.g. with:

PRAGMA page_size=8192;
VACUUM;

SQLite will violate constraint 2 if a memory journal is spilled to the VFS and then read with a rollback. I have only seen this when batch atomic writes are enabled, so this might not happen with the IDBMinimalVFS xDeviceCharacteristics configuration.

Some applications may simply be able to avoid these special cases. See IDBBatchAtomicVFS for a more complicated example class that uses the same "blockless" idea and handles the above situations.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"Blockless" IndexedDB VFS #46

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 0 comments

Select a reply

"Blockless" IndexedDB VFS #46

rhashimoto Apr 26, 2022 Maintainer

Replies: 0 comments

rhashimoto
Apr 26, 2022
Maintainer