A future for SQL on the web

James Long
August 12, 2021

I discovered something absurd recently and I’m very excited to tell you about it.

The end result is absurd-sql, and it’s a persistent backend for SQLite on the web. That means it doesn’t have to load the whole db into memory, and writes persist. In this post I will explain the absurdities of the web’s storage APIs (mainly IndexedDB), show how SQLite provides a 10x perf improvement, explain all the cool tricks that make it work, and explain the locking/transactional semantics that make it robust.

If you are writing a web app today, you’ll probably choose IndexedDB to store data. It’s the only option for something database-like that works across all browsers.

As you attempt to build a local app, you will quickly find that it’s a poor database to build an entire app around. Sure, it might be fine for small bits of functionality. But if we really want webapps to be impressive (I do), we need a more powerful way to work with data.

IndexedDB is slow. What’s even worse, in my testing Chrome, which is by far the most used browser, has the slowest implementation. Simple operations against the database take ~10ms (and I’m being charitable!), while I commonly profile SQLite at ~.01ms. This makes a massive difference for what kinds of apps you can write against it.

You’re on your own if you want to query data in IndexedDB. The only function it provides is count, and the rest of the APIs just return a range of items. You’ll have to construct your own query functionality by wiring up indexes and structuring your data in specific ways.
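To get a sense of what that wiring looks like, here’s a hypothetical range query against raw IndexedDB (the database, store, and index names are made up). What would be a one-line WHERE clause in SQL becomes an index, a key range, and a cursor you walk yourself:

// Find items priced between 10 and 100, assuming an 'items' store with a 'by-price' index
let req = indexedDB.open('mydb', 1);
req.onsuccess = () => {
  let idb = req.result;
  let index = idb.transaction('items', 'readonly').objectStore('items').index('by-price');
  let results = [];

  index.openCursor(IDBKeyRange.bound(10, 100)).onsuccess = (e) => {
    let cursor = e.target.result;
    if (cursor) {
      results.push(cursor.value);
      cursor.continue();
    } else {
      console.log(results);
    }
  };
};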

Heck, you can’t even add a new “object store”, which is sort of like a table, at any point in time. You can only do it when opening the database, and doing so forces all other tabs to kill their database connection!

Maybe IDB was supposed to be low-level and you’re supposed to reach for a library to help provide better support for these features. Every library I looked at was messy and made performance even worse (one of the most popular “fast” ones I looked at took ~45ms to just get one item!?).

I say abstract away the whole thing. I have something for you that you should reach for next time instead of all those other libraries. And it’s going to massively improve your life as a developer.

Introducing absurd-sql

SQL is a great way to build apps. Especially small local web apps. Key/value stores may have their place in large distributed systems, but wow wouldn’t it be great if we could use SQLite on the web?

I’m excited to announce absurd-sql, which makes this possible. absurd-sql is a filesystem backend for sql.js that allows SQLite to read/write from IndexedDB in small blocks, just like it would a disk. I ported my app to use it, and you can try it here.

This whole situation, and how well this project ended up working out, really is absurd. Why? In all browsers except Chrome, IndexedDB is implemented using SQLite. Anyway…

A huge thanks to phiresky for writing the article Hosting SQLite databases on GitHub Pages, which inspired me to do this. It’s a very clever technique and it gave me the idea to research this.

sql.js is already a great project for using SQLite on the web. It compiles SQLite to WebAssembly, and lets you read databases and run queries. The major problem is that you can’t persist any writes. It loads the entire db into memory, and only changes the in-memory data. Once you refresh the page, all your changes are lost.

While in-memory databases have their uses, it kneecaps SQLite into something far less useful. To build any kind of app with it, we need the ability to write and persist.

absurd-sql solves this, and it works by intercepting read/write requests from SQLite and fetching and persisting them into IndexedDB (or any other persistent backend). I wrote a whole filesystem layer that is aware of how SQLite reads and writes blocks, and it efficiently performs the operations correctly.

What this means is it never loads the database into memory because it only loads whatever SQLite asks for, and writes always persist.

We use sql.js because it already has a large community and is by far the most common way to use SQL on the web. Now, all you have to do is install absurd-sql and add some lines of code to hook it up. It looks like this:

import initSqlJs from '@jlongster/sql.js';
import { SQLiteFS } from 'absurd-sql';
import IndexedDBBackend from 'absurd-sql/dist/indexeddb-backend';

let SQL = await initSqlJs();
let sqlFS = new SQLiteFS(SQL.FS, new IndexedDBBackend());
// This is temporary for now
SQL.register_for_idb(sqlFS);

SQL.FS.mkdir('/sql');
SQL.FS.mount(sqlFS, {}, '/sql');

let db = new SQL.Database('/sql/db.sqlite', { filename: true });

A few notes: this currently requires my fork of sql.js, but hopefully these changes will be merged in… This API is likely to change. The reason for register_for_idb is that Emscripten doesn’t properly hook up the file locking API, so we need to manually forward SQLite’s lock requests. Lastly, this code must run in a web worker. The APIs we need are only available there, but that’s fine because you shouldn’t be blocking the main thread.

Use db like you would normally, except now it efficiently reads and persists writes to IndexedDB! See the README for more information, and look at the sql.js docs for the database API. You can also use this example project to get started quickly.

let stmt = db.prepare('INSERT INTO kv (key, value) VALUES (?, ?)');
stmt.run(['item-id-00001', 35725.29]);

stmt = db.prepare('SELECT SUM(value) FROM kv');
stmt.step();
console.log(stmt.getAsObject());

We’re bringing WebSQL back, baby!

While I wouldn’t recommend this for production quite yet, I ported my app Actual to use it and it works! Click “try demo” to see it in action, and look at your IndexedDB storage.

More than just another database

I talk a lot about IndexedDB in this post because it’s the only persistent storage that works across all browsers today. The problem is it couples two needs into one: databases and persistent storage.

What happens when you’ve built your entire app around IndexedDB, and then a brand new super fast storage layer comes to the web? You’ve locked yourself in.

Using sql.js + absurd-sql, we’ve abstracted away the storage layer. I’ll talk about how much faster absurd-sql is than IndexedDB below, but don’t be fooled: we are still far away from native performance. We are about 50-100x slower than native SQLite because we have no way to make faster writes. (but don’t let that scare you; it’s still fast enough for many use cases and SQLite’s caching helps a ton)

IndexedDB is just one backend for absurd-sql. I’ve already experimented with a webkitFileSystem backend, but I haven’t been able to match IDB’s performance yet. I must be missing how to do bulk reads and writes, because it’s terribly slow.

There is a new proposal called Storage Foundation API which is built exactly for this use case. I’m talking with the authors and I will experiment with a backend using that. (It’s missing some critical things like locking, but hopefully solvable)

There is potential for another 1 or 2 orders of magnitude increase in performance in the future. When that happens, all you’ll have to do is switch a couple lines of code!

A note about IndexedDB durability

Let’s hope a new storage API comes along, because it’s worth mentioning that your IndexedDB data is not guaranteed to live forever. Browsers are allowed to delete your IndexedDB database under certain conditions, such as storage pressure.

While this very rarely happens (I haven’t seen it yet), the fact that it’s possible puts a major downer on using it for persistent storage. For now, you can use it for anything that is also backed up in the cloud — but I wouldn’t use it for something that only lives locally and is meant to last for the rest of your life. Hopefully we’ll get a better storage API for that.

Ok — off to the fun stuff!

Tradeoffs? What tradeoffs?

SQLite, even though it’s running on top of IndexedDB here, easily beats raw IndexedDB in every single performance metric. The absurdity!

I wrote a benchmarking app to test many different scenarios: https://priceless-keller-d097e5.netlify.app. It runs a couple of different queries, lets you configure SQLite in various ways, and also lets you run them against a raw IndexedDB implementation.

All data is in this spreadsheet, and the benchmark code is here.

Reads

These are the results of a query like SELECT SUM(value) FROM kv, which forces the db to read all items (2015 MacBook Pro). For IndexedDB, it was implemented by opening a cursor and summing each item.
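For reference, the IndexedDB side of that read looks roughly like this (a sketch, not the exact benchmark code, which is linked above):

// Open a cursor over the whole 'kv' store and sum every value
function sumAllValues(idb) {
  return new Promise((resolve, reject) => {
    let store = idb.transaction('kv', 'readonly').objectStore('kv');
    let sum = 0;

    store.openCursor().onsuccess = (e) => {
      let cursor = e.target.result;
      if (cursor) {
        sum += cursor.value;
        cursor.continue();
      } else {
        resolve(sum);
      }
    };
    store.transaction.onerror = () => reject(store.transaction.error);
  });
}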

Firefox’s data looks similar, but its IDB is twice as fast (look at the Y axis). For the sake of simplicity, I’ll focus on Chrome.

Writes

Here are the results of doing a bunch of inserts like INSERT INTO kv (key, value) VALUES ("large-uuid-string-0001", 53234.22), compared against a bulk put of the same data in IndexedDB (waiting only for transaction success):

I realize that writes are difficult to measure; I tried to do a fair comparison. I’m doing a bulk insert in IndexedDB, which should be as fast as it can be. I’m also not doing anything tricky like telling SQLite not to wait for writes to be flushed. Both use the exact same flow.

Again, Firefox’s data is over twice as fast.

My original target was 1,000,000 items, and absurd-sql handles that just fine. It’s a little slow: writing takes 4-6 seconds and reading takes 2-3 seconds. But I wasn’t even patient enough to wait for IndexedDB to finish. It eventually did, but writes were on the order of 2 or 3 minutes and reads were somewhere in the 1 minute range, I think.

IndexedDB seems to get worse the more items you throw at it; it’s not exactly linear, and there’s a point where it’s just not feasible to use anymore.

What’s the catch?

Surely there’s a catch somewhere. Where are we paying the cost of those wins?

There is one downside: you need to download a 1MB WebAssembly file (actually I forgot about gzipping! it’s only 409KB after that). But that’s it. That’s the only catch. Everything else is an upside, and with WASM streaming compilation, the cost of parsing/compiling is relatively low. For real apps it’s a no-brainer, especially since it’s so cacheable (it never changes).

Not only do you get fantastic performance, you also get all the features of a great database.

Here’s a demo of full text search. Click “load data” to load some data in, and then start typing in the input. If you refresh the page the data will still be there.

No really — how is this possible?

Whenever something looks too good to be true, I like to sit back and make sure I’m not missing something. Is there some catch I’m missing?

How can we beat IndexedDB so easily anyway? If this works, why don’t we do the same thing natively and chop up a SQLite db into blocks stored inside another SQLite db?

The reason it works is because IndexedDB is so slow. The simple act of batching reads/writes provides such drastic performance benefits that no amount of CPU processing is going to close the gap. Because that’s the thing — we are doing a lot more CPU work! Shouldn’t that count against us?

No — because we save so much time avoiding IndexedDB reads/writes that the CPU hit is negligible. You don’t even see it unless you look really close. Avoiding those reads/writes gives us plenty of time to do everything SQLite does and still be 5x (or much more) faster.

I’m really hoping for a better storage API like this one. Let’s hope it happens.

Not fast enough? Caching to the rescue!

As I mentioned earlier in the post, while these results are extremely promising, a look at native SQLite performance is sobering. It’s an order of magnitude or two faster.

If you are doing a lot of reads, the current performance is not sufficient to support a real-world app. But there’s a very easy fix for it: use SQLite caching!

SQLite automatically uses a 2MB page cache, so you might not even notice any perf problems. It will cache the results of read requests and evict them whenever a write happens. For the usual use case of an app with many reads and occasional writes, this works very well.

If you are working with larger data, you can try to bump the page cache up. Even something like 10MB would be fine. We’re basically lazily loading a lot of the db into memory, but we let SQLite manage it so it knows when to evict.
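For example, with the sql.js db from earlier, something like this bumps the cache to roughly 10MB (a negative cache_size is interpreted by SQLite as kibibytes rather than pages):

// Give SQLite ~10MB of page cache instead of the default ~2MB
db.exec(`PRAGMA cache_size=-10240;`);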

Increasing the page size is another way to reduce reads, which I explain below.

Page sizes

If you loaded data in the demo, open your developer tools and look at your IndexedDB data (in Chrome, it’s under the “Application” tab). You’ll see something like this:

Note how it stores data in blocks of 4096 bytes, which is the default page size of SQLite. The smaller the page size, the more reads it has to do, and originally I thought this would be way too small to work. However, with the optimizations below it works fine!

However, you are free to bump up the page size. It must be a power of two, so you can try 8192. Doing this cuts reads in half and will increase performance. It’s not magical though; larger block sizes mean it might read or write a lot of data that it doesn’t need. Profile and see what works for you. All of the perf benchmarks use a page size of 4096, meaning they use a less performant config to try to be fair. It can run even faster!

How do you change the page size? Just use SQLite:

db.exec(`
  PRAGMA page_size=8192;
  VACUUM;
`);

You must VACUUM for it to take effect; SQLite must restructure the whole db to change the page size. If you do this, we automatically change how we store data:

Note how it’s blocks of 8192 now. We read the required page size directly from the bytes of the file and respect it.

My app Actual (try the demo) uses a 4096 block size. Eventually I’ll make it larger, but the same file is shared across platforms and I don’t want to affect mobile/desktop.

How it works

There are several tricks required to get this level of performance. Because IndexedDB is so slow, if we don’t do things right we could actually end up much slower than raw IndexedDB, for example by recreating transactions for every read.

The goal is to do whatever you would expect to happen: if SQLite is making a bunch of sequential reads, we should open a cursor and iterate through the data. We should not open a new transaction for every read. This is difficult because we only have a single read function that takes an offset and length. However, with some tricks we’re able to make it work.

Another goal is to maintain as little state as possible. We shouldn’t be tracking lock state, our own change counter, or page size. We read directly from the SQLite file when needed, and leverage IndexedDB’s transaction semantics for locking. The only state we store is the size of the file, and that’s fine. This makes it far more reliable.

The big problem: sync APIs

The biggest problem is that when SQLite does a read or write, the API is totally synchronous because it’s based on the C API. Accessing IndexedDB is always async, so how do we get around that?

We spawn a worker process and give it a SharedArrayBuffer and then use the Atomics API to communicate via the buffer. For example, our backend writes a read request into the shared buffer, and the worker reads it, performs the read async, and then writes the result back.

I wrote a small channel abstraction to send different types of data across a SharedArrayBuffer.

The real magic is the Atomics.wait API. It’s a beautiful thing. When you call it, it completely blocks JS until the condition is met. You use it to wait on some data in the SharedArrayBuffer, and this is what enables us to turn the async read/write into a sync one. The backend calls it to wait on the result from the worker and blocks until it’s done.
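Here’s a stripped-down sketch of that pattern (not absurd-sql’s actual channel code, just the shape of it). One side writes the request arguments into a SharedArrayBuffer and blocks; the other does the async IndexedDB work and wakes it up:

// Both workers share:
//   let sab = new SharedArrayBuffer(4096);
//   let int32 = new Int32Array(sab);

// SQLite side: issue a read and block this thread until it's answered.
function readSync(int32, offset, length) {
  Atomics.store(int32, 1, offset);   // request arguments
  Atomics.store(int32, 2, length);
  Atomics.store(int32, 0, 0);        // 0 means "waiting for a response"
  // ...signal the I/O worker that a request is pending (the real channel
  // handles this signaling and the data framing itself)
  Atomics.wait(int32, 0, 0);         // block until slot 0 is no longer 0
  // the requested bytes now live elsewhere in the shared buffer
}

// I/O worker side: do the async IndexedDB read, then wake the waiter.
async function handleRead(int32, readFromIdb) {
  let offset = Atomics.load(int32, 1);
  let length = Atomics.load(int32, 2);
  let bytes = await readFromIdb(offset, length); // async IndexedDB get(s)
  // ...copy `bytes` into the shared buffer
  Atomics.store(int32, 0, 1);        // mark the response as ready
  Atomics.notify(int32, 0);          // unblock Atomics.wait on the other side
}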

Another option would be to use Asyncify, an Emscripten technique for turning sync C calls into async ones. However, there’s no way I’m going to touch that. It works by transforming the entire code to unwind/rewind stacks, causing a massive perf hit and bloating the binary size. It’s just too invasive and would force SQLite to have an async API.

Atomics.wait unlocks some key performance improvements anyway. It’s good.

Long-lived IndexedDB transactions

IndexedDB has an awful behavior where it auto-commits transactions once the event loop is done processing. This makes it impossible to use a transaction over time, and requires you to create a new one if you are doing many reads over time. Creating a transaction is super slow and this is a massive perf hit.

However, Atomics.wait is so great. We also use it in the worker to block execution, which keeps transactions alive. We open a transaction, and then block the whole thread so the system can’t take it away from me. I love it!

That means while processing requests from SQLite, we can reuse a transaction for all of them. If 1000 reads come through, we will use the same readonly transaction for all of them, which is a massive speedup.

This is glossing over details: SQLite always locks files when reading and writing, and we only keep the transaction open for the scope of a lock.

Iterating with cursors

It’s faster to iterate over data with cursors. We only get separate read requests though; how can we use cursors?

Because we keep a single transaction open for reads over time, we can detect when sequential reads are happening and open a cursor. There are a lot of interesting tradeoffs here, because opening a cursor is actually super slow in some browsers, but iterating is a lot faster than many get requests. The backend detects when several sequential reads happen and automatically switches to using a cursor.
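In code, the heuristic is roughly this shape (a sketch with made-up names, not the backend’s actual internals):

// Count consecutive sequential block reads; past a small threshold, serve the
// following reads from a cursor instead of individual get() calls.
let lastKey = null;
let run = 0;

function shouldUseCursor(key) {
  run = lastKey !== null && key === lastKey + 1 ? run + 1 : 0;
  lastKey = key;
  return run >= 4; // the threshold here is a guess; the real backend tunes this
}

function readRangeWithCursor(store, startKey, onBlock) {
  // Stream every block from startKey onward on the already-open transaction
  store.openCursor(IDBKeyRange.lowerBound(startKey)).onsuccess = (e) => {
    let cursor = e.target.result;
    if (cursor) {
      onBlock(cursor.key, cursor.value);
      cursor.continue();
    }
  };
}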

Interestingly, in Firefox (and I think Safari) using a cursor is a massive speedup. In Chrome it’s only slightly faster.

File locking and transactional semantics

This one is for the database nerds. It’s probably the most important thing to get right, because if you don’t, your database will get corrupted. Performance doesn’t matter with a corrupted database.

How do we make sure we’re always applying writes correctly? Unsurprisingly, it’s complicated! But I believe the algorithm implemented is robust.

Atomic commits

The first thing to understand is the assumptions that SQLite makes. We need to meet these assumptions 100% or otherwise risk database corruption.

Atomic Commit In SQLite is a great description of how writes work. I also studied the source code a lot to understand any other requirements. It comes down to this:

The first requirement we must meet is to provide an fsync method that flushes out writes atomically.

Here’s the thing: if we’re using IndexedDB, we don’t have to worry about any of this. IDB already provides transactional semantics; if we open a transaction and do a bunch of writes, either they all succeed or none do. That takes the weight off of us and we can rely on that.

We don’t really even need a journal file. We could tell SQLite to avoid using one with journal_mode=OFF; however, it still needs a journal for rolling back transactions (ROLLBACK). So you should use journal_mode=MEMORY, which keeps the journal in memory — that’s much faster than writing it to disk.
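With the sql.js database from earlier, that’s one line:

// Keep the rollback journal in memory instead of persisting it through IndexedDB
db.exec(`PRAGMA journal_mode=MEMORY;`);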

There’s also write-ahead logging, or “WAL mode”. This can be more efficient when working with a real disk. However, to achieve its performance it requires things like shared memory via mmap. I don’t think we can reliably map those semantics onto IndexedDB, and I don’t even see the point: WAL potentially adds more overhead if we’re just writing everything to IDB anyway. The structure of the writes isn’t as important to us.

File locking

SQLite uses advisory locks to coordinate between db connections. How locks are used is described in detail in Atomic Commit In SQLite.

It’s super important that we get this right. We need to lock our data in the same way SQLite expects, otherwise connections could write over each other. Even though we depend on IndexedDB transactions, that doesn’t mean they are ordered correctly.

We can leverage transactions, however; we just need to do some extra work. IndexedDB readonly transactions can run in parallel, while readwrite transactions will only run one at a time. We can use this to implement locking — once a readwrite transaction is running, the file is effectively locked.

Here’s our method that implements locking, and the comment above it is helpful.

When a SHARED lock is requested, we open a readonly transaction. It’s fine if many SHARED locks exist at once, and that works with parallel readonly transactions. However when a write lock is requested, we open a readwrite transaction. Once that transaction is running, we know we control the database because of IDB semantics.

The only problem is that another thread might have written data down between the time that we requested a readwrite lock and got it. SQLite already gives us this information! Once we’ve grabbed a readwrite transaction, we can read some bytes from the file which represent SQLite’s “change counter”. If that counter is the same as what we had when we requested a write, it’s safe to write!
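For the curious, the change counter lives in the SQLite file header: a 4-byte big-endian integer at offset 24 within the first 100 bytes of the file. A sketch of reading it (not the actual absurd-sql code):

// Pull the change counter out of the first block of the file
function readChangeCounter(firstBlock /* Uint8Array */) {
  let view = new DataView(firstBlock.buffer, firstBlock.byteOffset, firstBlock.byteLength);
  return view.getUint32(24); // big-endian, which is DataView's default
}

// During the lock upgrade: if the counter changed since we took our read lock,
// someone else wrote in between and it's not safe to proceed.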

(If you’re familiar with how SQLite upgrades/downgrades locks, we implement the same flow and you can view the source for upgrading here)

SQLite locks a file for writing whenever a transaction starts, like BEGIN TRANSACTION, and then unlocks it when the transaction ends. In a way we’re mapping SQLite transactions onto IndexedDB readwrite transactions, but I wouldn’t think of it that way because we don’t translate ROLLBACK to abort() in IndexedDB. We’re leveraging IDB transactions to ensure safe atomic writes, and we write down whatever SQLite tells us to.

This is only possible because we have long-lived IDB transactions, since we can open a readwrite transaction once and make sure we have it over the course of the write lock. If we had to reopen it, it would destroy our ability to safely write to the db and it would get easily corrupted.

Fallback mode without SharedArrayBuffer

While most browsers support SharedArrayBuffer, Safari has not enabled it yet. I’m not too worried because it will be enabled in the future; however, we need to provide some kind of support for Safari today.

absurd-sql provides a fallback mode when SharedArrayBuffer is not available. The way it works is it reads all the data in at the beginning so that reads can happen synchronously.

Writes are more difficult: it assumes that writes are always successful, but actually queues them up to run in the background. We implement the same transactional semantics as before (only write when it’s safe), but they happen in the background.

The major problem with this is that when a write is not successful, the whole database is stale. We can never write to the file again. We could potentially restart the db and read the entire file again, but that feels too complicated.

The end result is that if you have two tabs open in Safari, only one of them can actually perform writes. If both of them try to, one of them is going to detect that something has changed underneath it and never write to the db again. It will continue to work in memory, but when you refresh, changes will be lost. The app should notify the user when this happens so they don’t lose data.

Come help!

I would love your help in testing this and improving it! Come over to the repo, and I hope you have fun SQLing.

- James