Optimising/adapting IDBBatchAtomicVFS for performance on large tables #109

AntonOfTheWoods · 2023-08-12T14:57:07Z

I have a large table (400k+ rows) with an INTEGER PRIMARY KEY and need to join it against various other (much smaller) tables. For example, a join against a table of ~5k rows, PK to PK, takes about 150ms with AccessHandlePoolVFS. The exact same query on a db created with exactly the same SQL in exactly the same browser takes around 6000ms with IDBBatchAtomicVFS. Even a count(0) on this table takes around 5000ms. The 400k table has 10 columns, 3 of which contain quite a lot of text data, though the value I am selecting in this query is only around 2-6 bytes of UTF-8 (this particular column has a max of about 6 bytes). Because of the (great) performance with AccessHandlePoolVFS, it took me a while to think of creating a unique index with just the PK and the single small text column. This reduces the query time to around 650ms and the count to around 400ms. I'm guessing the PK is stored with the rest of the data, so the entire table has to be read without the index?
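A minimal sketch of the narrow-index idea described above, using Python's built-in `sqlite3` module as a stand-in for wa-sqlite (the SQL is the same, but timings and VFS behaviour are not reproduced here). The table and index names (`big`, `big_id_short`) and the column layout are hypothetical. In a rowid table the INTEGER PRIMARY KEY is the key of the table's own b-tree, whose leaves hold the full rows, so a scan without a narrower index has to walk the wide table pages.

```python
import sqlite3

# Hypothetical schema standing in for the wide 400k-row table described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big (
        id INTEGER PRIMARY KEY,
        short_text TEXT,
        long_text_a TEXT,
        long_text_b TEXT,
        long_text_c TEXT
    );
    -- Narrow index holding only the PK and the small text column.
    CREATE UNIQUE INDEX big_id_short ON big (id, short_text);
""")
conn.executemany(
    "INSERT INTO big (short_text, long_text_a, long_text_b, long_text_c) "
    "VALUES (?, ?, ?, ?)",
    [(f"w{i}", "a" * 500, "b" * 500, "c" * 500) for i in range(1000)],
)
conn.commit()

row_count = conn.execute("SELECT count(*) FROM big").fetchone()[0]

# Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail text.
plan_details = [
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN SELECT count(*) FROM big")
]
print(row_count, plan_details)
```

If the plan detail names the narrow index, the count is being answered from the index pages rather than from the wide table rows.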

Are there any rough guidelines on how to create and manage reasonably large tables so that they work well with the particulars of IDBBatchAtomicVFS? For example, would it definitely be preferable to split this table into several? Or is it just a matter of testing everywhere?

rhashimoto · 2023-08-12T19:13:52Z

Maintainer

I'm not a SQL expert, so questions about how best to structure a query or index, or how to optimize a schema, are better addressed elsewhere, for example on the SQLite forum.

All I/O is going to be slower with IDBBatchAtomicVFS than AccessHandlePoolVFS, but the overhead for each read and write call is also a factor. I would see if increasing the database page size (with PRAGMA page_size) helps. My guess is that it will but I don't know by how much.
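A minimal sketch of changing the page size, again using Python's built-in `sqlite3` as a stand-in rather than wa-sqlite, and assuming the same PRAGMA semantics carry over to IDBBatchAtomicVFS (not verified here). In SQLite, `PRAGMA page_size` takes effect when set before the database is first written, or on the next `VACUUM` for an existing non-WAL database. The file name is hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical throwaway database file.
db_path = os.path.join(tempfile.mkdtemp(), "page_size_demo.db")

# isolation_level=None keeps the connection in autocommit mode so that
# VACUUM is never issued inside an open transaction.
conn = sqlite3.connect(db_path, isolation_level=None)

# New database: set the page size before anything is written.
conn.execute("PRAGMA page_size = 65536")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, body TEXT)")
new_db_page_size = conn.execute("PRAGMA page_size").fetchone()[0]

# Existing database: request a new size, then rebuild the file with VACUUM.
conn.execute("PRAGMA page_size = 8192")
conn.execute("VACUUM")
rebuilt_page_size = conn.execute("PRAGMA page_size").fetchone()[0]

print(new_db_page_size, rebuilt_page_size)
conn.close()
```

Setting the pragma on an already-populated database without a following `VACUUM` leaves the stored page size unchanged, which is easy to miss when experimenting.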

You can also try to reduce the number of VFS calls by increasing the cache size (with PRAGMA cache_size). I'm not sure if that will help at all since you're having to read your entire large table, but perhaps it will if it keeps indexes and/or the smaller joined tables in memory.
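When picking a value for `PRAGMA cache_size`, the sign matters: per the SQLite documentation, a positive value is a number of pages, while a negative value is a memory budget in KiB (approximately `abs(N) * 1024` bytes) regardless of page size. The helper below is a hypothetical illustration of that arithmetic; the sample values are only examples.

```python
def approx_cache_bytes(cache_size: int, page_size: int) -> int:
    """Approximate memory implied by a PRAGMA cache_size setting.

    Positive cache_size: number of pages, so bytes = pages * page_size.
    Negative cache_size: budget in KiB, so bytes = abs(cache_size) * 1024.
    """
    if cache_size >= 0:
        return cache_size * page_size
    return -cache_size * 1024


# Illustrative values only.
pages_at_64k = approx_cache_bytes(15000, 65536)   # 15000 pages of 64 KiB
pages_at_4k = approx_cache_bytes(15000, 4096)     # 15000 pages of 4 KiB
kib_budget = approx_cache_bytes(-15000, 65536)    # 15000 KiB, page size ignored

print(pages_at_64k, pages_at_4k, kib_budget)
```

The same absolute number can therefore mean roughly 15 MB or close to 1 GB depending on its sign and the page size in use.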

5 replies

AntonOfTheWoods Aug 13, 2023
Author

Thanks for your input. With the index, setting page_size to 65536 and cache_size to -15000, I was able to get the query time down to around 250ms. All three changes seem to contribute to reducing the final query time. Thanks!

AntonOfTheWoods Aug 13, 2023
Author

And as a gotcha, if the page_size is not increased, it looks like setting a cache_size over around 15000 causes db corruption, at least with the IDBBatchAtomicVFS. I have no idea why, but it's quite easy to reproduce, at least with a large db.

rhashimoto Aug 13, 2023
Maintainer

That's a very large cache at nearly 1 GB. You may just be putting your entire database in memory (but maybe not since it's still not as fast as your initial AccessHandlePoolVFS numbers). That's fine if your database won't get any larger, but if it does you might see performance drop off sharply again at some point.

I'm not sure what would cause database corruption when pushing the cache size even higher. What browser are you using, and can you try with a wa-sqlite debug build to see if that provides any more information? It's possible SQLite is simply running out of memory, but I wouldn't expect that to happen until a bit further, so it could also be a bug, e.g. with addresses over 2 GB.

AntonOfTheWoods Aug 13, 2023
Author

The use case for IDBBatchAtomicVFS is a Chrome extension (hence no web worker), and until FF comes out with mobile extensions again at the end of the year, I'm not super bothered about the cache size. Performance is key though, so this fits the bill. In terms of everything being in the cache, I suspect that the indexes I created may be entirely in memory, though probably not the entire db.

I'm not sure what memory it would be running out of - the corruption happens if I don't increase the page_size but do increase the cache_size. Or do I misunderstand?

Actually, the entire reason for embarking on this migration to sqlite was that Safari on iOS has such ridiculously low memory available and was crashing on several features that were just too slow unless everything was in memory. With the AccessHandlePoolVFS everything on Safari is insanely fast without touching page_size or cache_size, so while a little hacky, it still fits the bill nicely.

rhashimoto Aug 13, 2023
Maintainer

It's the 4 GB (on Chrome) WASM memory limit I'm wondering might be the issue. I don't think that should be the case - seems a little early with a 1 GB cache - but I don't know all the details of how WASM memory is managed so I can't rule it out.

It's also possible there is a 2 GB limit somewhere because some piece of software is treating a 32-bit C pointer as a signed integer. I would say this is more likely but I can't think of anywhere I would be doing that, and if Emscripten did it you'd think that would have been found long ago.
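To illustrate the general failure mode being speculated about here (not a diagnosis of wa-sqlite or Emscripten), this sketch reinterprets a 32-bit address as a signed integer. Addresses at or above 2 GB have the top bit set, so they turn negative, which is how a hidden 2 GB ceiling can appear. The helper name is hypothetical.

```python
import struct


def as_signed_32(address: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", address))[0]


just_below_2gb = as_signed_32(0x7FFFFFFF)  # top bit clear: stays positive
exactly_2gb = as_signed_32(0x80000000)     # top bit set: becomes negative

print(just_below_2gb, exactly_2gb)
```

Any bounds check or offset arithmetic written against the signed value would misbehave once allocations cross that boundary.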

AntonOfTheWoods · 2023-08-18T15:49:15Z

Author

A small note for myself and others who might also be starting out with sqlite: it sometimes doesn't pick the right index, and checking the query planner, and potentially forcing it to use a specific index, can make a massive difference. By increasing page_size and making sure I was using the right indexes, I have been able to get great performance with large tables with this VFS too.
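A minimal sketch of checking the planner and pinning an index, using Python's built-in `sqlite3` as a stand-in for wa-sqlite. `EXPLAIN QUERY PLAN` and the `INDEXED BY` clause are standard SQLite; the table, columns, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, lang TEXT, word TEXT, notes TEXT);
    CREATE INDEX words_lang ON words (lang);
    CREATE INDEX words_word ON words (word);
""")


def plan(sql: str) -> str:
    """Return the detail column of EXPLAIN QUERY PLAN as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Let the planner choose between the two candidate indexes.
default_plan = plan("SELECT id FROM words WHERE lang = 'en' AND word = 'tree'")

# Force a specific index; SQLite raises an error if it cannot be used.
forced_plan = plan(
    "SELECT id FROM words INDEXED BY words_word "
    "WHERE lang = 'en' AND word = 'tree'"
)

print(default_plan)
print(forced_plan)
```

Running `ANALYZE` after loading representative data is another option, since it gives the planner statistics to choose with, whereas `INDEXED BY` hard-codes the choice into the query.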

But I guess this is just normal query optimisation advice!

0 replies
