Developing ACID Transaction Support in StarbaseDB

Wednesday October 2, 2024


How we implemented transactions in SQLite with Cloudflare Durable Objects.

⋅.˳˳.⋅ॱ˙˙ॱ⋅.˳˳.⋅ॱ˙˙ॱ⋅.˳˳.⋅.

If you don’t know what StarbaseDB is yet, it’s an open-source, scale-to-zero SQLite database with an HTTP interface, built directly on top of Cloudflare Durable Objects.

I intend these blog posts to be an opportunity for everyone to follow along in my journey – to spectate (and opine!) as I reason through and implement the key decisions that shape how StarbaseDB actually works.

Contributions are open to anyone and all ideas, requests, and suggestions are welcome as we build this out with the community.

Overview

Do you know what ACID stands for when we talk about transactions within a database? It’s the most important concept to wrangle and critical for any database to implement. Let’s dive into a group review of what each letter in the acronym represents and how each piece can be achieved in a database system. Then we’ll switch over and talk through our implementation and why we made the decisions we did for StarbaseDB.

A is for Atomic

When you execute a single statement like a SELECT against a database, we refer to that as a query – a statement that retrieves data from a database without modifying it. Queries are simple to reason about. Operations such as INSERT, UPDATE, and DELETE, on the other hand, are considered data manipulation statements (or DML statements) rather than queries. These are operations that modify data within the database, and we need to ensure these actions are both atomic and consistent.

Atomicity ensures that a transaction is treated as a single, indivisible unit: all operations within a transaction must succeed, or if one fails then they all fail together. Transactions in database systems allow us to pass in multiple statements to be executed collectively.

Let’s take the following classic example showcasing atomicity. Suppose a bank transfer is being made where Jane Doe is transferring $100 from her account to John Doe’s account (John bought too many domains this month and needs a little bankrolling). Two actions need to succeed for the entire transaction to succeed:

  1. Deduct $100 from Jane’s account

  2. Deposit $100 into John’s account

If the first part of this transaction succeeds in removing $100 from Jane’s account, but the second part fails to deposit $100 into John’s account, then we need to make sure we safely roll back the entire transaction – otherwise Jane has lost $100 and nobody has gained $100… except perhaps the bank.
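
Expressed in plain SQL against a hypothetical accounts table, the transfer would look like the following, where a failure anywhere between BEGIN and COMMIT must undo both updates:

BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE owner = 'Jane';
    UPDATE accounts SET balance = balance + 100 WHERE owner = 'John';
COMMIT;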

Atomic means everything succeeds together, or everything fails together.

C is for Consistency

Consistency is the guarantee that all data within a database complies with the rules defined by that database. Extending the example above where we deduct $100 from Jane’s account, a rule would likely exist stating that an account can’t withdraw more money than it holds. Simply put, an account balance cannot be negative.

If Jane only had $50 in her account prior to the transaction being executed, then the consistency check should catch that deducting $100 from $50 would put the account balance at negative $50, which violates the constraint on that column. When this rule is violated, the entire transaction should be aborted and the data should remain intact in its state from before the transaction began.
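
In SQLite, such a rule can be expressed as a CHECK constraint (shown here on the same hypothetical accounts table), so the engine itself rejects any statement that would leave a balance negative:

CREATE TABLE accounts (
    owner TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
);

-- Fails with a constraint violation if Jane's balance is only $50.
UPDATE accounts SET balance = balance - 100 WHERE owner = 'Jane';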

Consistency is the compliance of rules set within a database.

I is for Isolation

Of the four letters in ACID, Isolation may be the one with the most depth to it, because its complexity and implementation details vary between database systems. At its most basic level, Isolation ensures that no two transactions interfere with one another.

Continuing on with our classic banking scenario, let’s say we have one database transaction in flight, deducting $100 from Jane’s account and depositing it into John’s. While that transaction is occurring, another query comes in attempting to read the balance of both John’s and Jane’s accounts. Isolation ensures that the second query does not see an intermediate state, such as $0 in both accounts (after the $100 has been deducted from Jane but not yet deposited to John). Instead, because the first transaction has not been fully committed, the read of the account balances will see the state of the database as it was before the first transaction started – $100 in Jane’s account and $0 in John’s.

Isolation ensures in-flight transactions are invisible to other transactions until they have completed.

D is for Durability

After a transaction has completed successfully, we need to ensure that the manipulated data is committed and stored on durable media, not just held in memory. Even if a power outage occurs, we need to be confident that the data from the successful transaction persists.

Finishing off our example of how Jane graciously helped John out with his domain-purchasing addiction, let’s see how durability plays a role. Our transfer of $100 from Jane’s account to John’s has succeeded, and the database now needs to ensure that both accounts reflect the correct amounts. If our system crashes immediately after the transaction was committed and the database is restarted, the transaction log can be used to replay what we know processed successfully – so if we query the account balances of our friends again, they will be properly reflected even after the reboot.

Durability is the ability to not lose data of successful transactions during unforeseen events such as power outages.

The Problem

The review lesson is officially over. We’re all up to speed on what ACID is and why it’s so important for database systems to get right. Now let’s talk through the details of how we have approached it in StarbaseDB.

As you might already know, Durable Objects are inherently single-threaded. Each Durable Object also has its own private storage on disk, which for our purposes serves as our SQLite database. These are two key notes to keep in mind moving forward.
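
For context, the snippets throughout this post live inside a Durable Object class along these lines – a hypothetical skeleton, not StarbaseDB’s exact code:

import { DurableObject } from 'cloudflare:workers';

export class StarbaseDBObject extends DurableObject {
    // `this.ctx.storage` is this object's private storage, and
    // `this.ctx.storage.sql` is its synchronous SQLite interface.
    async fetch(request: Request): Promise<Response> {
        // The real router would dispatch /query requests to queryRoute(),
        // which we walk through later in this post.
        return new Response('OK');
    }
}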

Traditionally in SQL you might be familiar with having the ability to construct transaction queries and send them to the database. However, the SQLite engine on Durable Objects does not support the following without throwing an authorization error:

BEGIN TRANSACTION;
    INSERT INTO table_name (id, name) VALUES (1, 'John');
COMMIT;

So what do we do, given that’s not possible? First, we have to construct our own way for transactions to be defined; second, we have to architect our Durable Object so that it manages the success (and failure) of a sequence of DML statements. This includes rolling back to a valid state when a transaction fails.

Supporting Transactions

Before we can get into the implementation details of how we handle the queueing and management of requests, we first need a way for users or systems to even send us transaction blocks of queries. As we mentioned above, writing them how we normally would in SQL isn’t an option available to us at this time.

Prior to allowing transactions, we only supported passing a single QueryRequest object into our HTTP endpoint, with the following structure.

export type QueryRequest = {
    sql: string;
    params?: any[];
};

The above type supports a single statement well, but not an array of statements to be executed together as a transaction. For this use case we have introduced the following type in addition to our QueryRequest.

export type QueryTransactionRequest = {
    transaction?: QueryRequest[];
};

With the two types defined for what we expect to be passed to us via HTTP, we can now go into our route handler, detect which one is being presented, and fork the logic accordingly.


async queryRoute(request: Request): Promise<Response> {
    const { sql, params, transaction } = await request.json() as QueryRequest & QueryTransactionRequest;

    // If `transaction` is in our network request body
    if (Array.isArray(transaction) && transaction.length) {
        const queries = transaction.map((queryObj: QueryRequest) => {
            const { sql, params } = queryObj;
            return { sql, params };
        });

        const response = await this.enqueueOperation(queries, true);
        return response;
    }

    // Otherwise if not a transaction, treat as a single query
    const queries = [{ sql, params }];
    const response = await this.enqueueOperation(queries, false);
    return response;
}

What we have defined above are the two types handling our two possible network request bodies: those with a transaction array in the body, and the default with a single object holding two keys, sql and params. Based on which we receive, we can now handle each in the way required to stay atomic and consistent.

Single Query cURL

curl --location --request POST 'https://starbasedb.YOUR-ID-HERE.workers.dev/query' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ABC123' \
--data-raw '{
    "sql": "SELECT * FROM artist WHERE artistid=$1;",
    "params": [123]
}'

Transaction Query cURL

curl --location --request POST 'https://starbasedb.YOUR-ID-HERE.workers.dev/query' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ABC123' \
--data-raw '{
    "transaction": [
        {
            "sql": "SELECT * FROM artist WHERE artistid=$1;",
            "params": [123]
        },
        {
            "sql": "SELECT * FROM artist;",
            "params": []
        }
    ]
}'

Above are two cURL examples of how we anticipate a user passing either a single query or a transaction block of queries into our system via HTTP.
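
If you’d rather hit the endpoint from code than from cURL, the equivalent transaction request in TypeScript might look like this (the URL and bearer token are placeholders, just as in the cURL examples):

const response = await fetch('https://starbasedb.YOUR-ID-HERE.workers.dev/query', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ABC123',
    },
    // Matches the QueryTransactionRequest shape defined earlier.
    body: JSON.stringify({
        transaction: [
            { sql: 'SELECT * FROM artist WHERE artistid=$1;', params: [123] },
            { sql: 'SELECT * FROM artist;', params: [] },
        ],
    }),
});
const result = await response.json();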

Implementing ACID Transactions

Now to the good stuff. We need to make sure that requests are handled in the order they are received (this sounds like we’re operating a call center) and that we only move on to the next request when the one before it finishes. With that come two problems in search of a solution: queueing requests, and rolling back failing transactions.

Remember! Our goal is to be ACID compliant, so we need to make sure we are atomic with our queries/DML, consistent by respecting defined database constraints, isolated so no two queries affect one another, and durable so data persists through sessions and reboots. Durable Objects do scale to zero, so durability is key – thankfully “durable” is in the name.

Queueing Requests

Durable Objects allow queries to be executed in a synchronous manner. The disk is directly accessible, and we can treat the data in the SQL database as if it were an L5 caching layer (read more on their blog here). We can expect results to be returned to us quickly, sometimes quicker than asynchronous requests, since we don’t have to pay the costs of async event loops.

For the sake of atomicity and consistency in particular, we want to make sure our queries are executed in the order they were received, so we can guarantee transactions are committed correctly and no two concurrent requests are vying to mutate the same information. Because our Durable Object can run SQL queries synchronously, and we can be confident the responses will be quick, we can simply queue up queries and transactions (which are arrays of queries) alike to be executed one after another.

In our code we have an operationQueue into which we insert each incoming request, and from which requests are picked up one at a time. We also maintain a processingOperation boolean flag to indicate whether we are currently processing a request.

private operationQueue: Array<{
    queries: { sql: string; params?: any[] }[];
    isTransaction: boolean;
    resolve: (value: Response) => void;
    reject: (reason?: any) => void;
}> = [];
private processingOperation: boolean = false;

Time to dive into our enqueueing function and how we actually handle executing each request. You caught a glimpse of the enqueueOperation function call in our routes above, but when we dig into the guts of it, we see it simply pushes the incoming queries into an array. That’s it. It just places what we want to execute in a line.

async enqueueOperation(
    queries: { sql: string; params?: any[] }[],
    isTransaction: boolean
): Promise<Response> {
    const MAX_WAIT_TIME = 25000;

    return new Promise((resolve, reject) => {
        const timeout = setTimeout(() => {
            reject(createResponse(undefined, 'Operation timed out.', 503));
        }, MAX_WAIT_TIME);

        this.operationQueue.push({
            queries,
            isTransaction,
            resolve: (value) => {
                clearTimeout(timeout);
                resolve(value);
            },
            reject: (reason) => {
                clearTimeout(timeout);
                reject(reason);
            }
        });

        this.processNextOperation().catch((err) => {
            console.error('Error processing operation queue:', err);
        });
    });
}

Okay, so we do a bit more than just push it into an array. The rest is some help from promises for passing status messages around, as well as declaring a maximum execution timeout per request.
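
The snippets in this post also lean on a small createResponse helper we haven’t shown. A minimal sketch of what it might look like (the real helper lives in the StarbaseDB repo) is:

function createResponse(result: unknown, error: string | undefined, status: number): Response {
    // Wrap either a result payload or an error message in a JSON HTTP response.
    return new Response(JSON.stringify({ result, error }), {
        status,
        headers: { 'Content-Type': 'application/json' },
    });
}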

After the request is pushed into the array, we immediately call another function, processNextOperation(). The intent of this function is to take the first request in the queue’s line and kick off the execution of its SQL statement(s). Once a response is available, we surface that response to the end user.

async processNextOperation() {
    if (this.processingOperation) {
        // Already processing an operation
        return;
    }

    if (this.operationQueue.length === 0) {
        // No operations remaining to process
        return;
    }

    this.processingOperation = true;

    const { queries, isTransaction, resolve, reject } = this.operationQueue.shift()!;

    try {
        let result;

        if (isTransaction) {
            result = await this.executeTransaction(queries);
        } else {
            const { sql, params } = queries[0];
            result = this.executeQuery(sql, params);
        }

        resolve(createResponse(result, undefined, 200));
    } catch (error: any) {
        console.error('Operation Execution Error:', error);
        reject(createResponse(undefined, error.message || 'Operation failed.', 500));
    } finally {
        this.processingOperation = false;
        await this.processNextOperation();
    }
}

There really isn’t too much going on above. If the request is a transaction (an array of queries coming in), we ensure they all succeed, or roll back if any fail. Otherwise, if the request is a single query, we don’t worry about rollbacks and just execute it directly.

After each request set from our queue is finally finished, we recursively call the same processNextOperation to check whether any other requests are pending execution.
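
Both paths above rely on an executeQuery function we haven’t shown here. A minimal sketch of it (an assumption on our part – the repo holds the real implementation), built on the Durable Object’s synchronous storage.sql API, could look like:

executeQuery(sql: string, params?: any[]): any[] {
    // Run the statement synchronously against this object's private SQLite
    // storage and collect the resulting rows.
    const cursor = this.ctx.storage.sql.exec(sql, ...(params ?? []));
    return cursor.toArray();
}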

Transaction Rollbacks

We mentioned rollbacks above, but we didn’t take a moment to double-click into them. Transaction HTTP requests arrive at our router with a body of QueryTransactionRequest, which consists of an array of QueryRequest objects (reference the cURL example above).

So, as we’ve talked about perhaps at more length than we intended, we need transactions to either all succeed, or if a single one of them fails, to roll back to the database’s starting state. But how?

For any transaction’s array of queries, before we attempt to execute a single one of them, we take a snapshot of the database at its current point in time. In our code below we store that as transactionBookmark.

With a snapshot in hand, we can begin looping over our synchronous query execution function, throwing an error if any query fails. If a failure is detected, we latch back onto the transactionBookmark we captured before the first query, tell our Durable Object to call onNextSessionRestoreBookmark, and abort so it restarts itself – meaning the next request will see the database state as it was just before this transaction kicked off.

async executeTransaction(queries: { sql: string; params?: any[] }[]): Promise<any[]> {
    const results = [];
    let transactionBookmark: any | null = null;

    try {
        // Create a storage bookmark before starting the transaction.
        transactionBookmark = await this.ctx.storage.getCurrentBookmark();

        for (const queryObj of queries) {
            const { sql, params } = queryObj;
            const result = this.executeQuery(sql, params);
            results.push(result);
        }

        transactionBookmark = null;
        return results;
    } catch (error) {
        console.error('Transaction Execution Error:', error);

        /**
         * If an error occurs during the transaction, we can restore the storage to the state
         * before the transaction began by using the bookmark we created before starting the
         * transaction.
         */
        if (transactionBookmark) {
            await this.ctx.storage.onNextSessionRestoreBookmark(transactionBookmark);
            await this.ctx.abort();
        }

        throw error;
    }
}

ACID Adherence

All of the implementation details above are great, but are we adhering to ACID? Let’s take a look at each one.

Atomic. When any query within a transaction block of queries fails, the state of the database is restored back to its original state before the next operation is initiated. Check.

Consistency. Queries and mutations adhere to the rulesets defined by the database. If a query in a transaction fails due to a constraint violation, it throws an error and we roll back the transaction to its original state. Check.

Isolation. Due to the nature of our queue and query executions, everything is synchronous, so no two queries can be run concurrently, making them isolated by default. Check.

Durability. Cloudflare Durable Objects provide durable storage on disk by default. When a transaction completes, the SQLite state is immediately updated on disk. Double check.

Considerations

Every feature implementation comes with its own list of pros and cons, and this implementation strategy in its current form has its share of downsides. An experienced database engineer can probably already point one or two out.

One limitation of Durable Objects concerns CPU execution time: each new network request made to the instance resets its remaining CPU time back up to the 30-second maximum, but only when the request arrives. Suppose our instance gets two requests at the same time – the CPU execution time is reset to 30 seconds once. If the first request took 29 seconds to execute and return, the second request would have 1 second left to complete all of its execution before our Durable Object shut down. Clearly that is less than ideal for our users.

We will need to return to our solution in the future to ensure each request in the queue gets completed and our Durable Object stays alive long enough to fulfill every request. Options exist, such as scaling StarbaseDB instances horizontally, which Durable Objects do well, but we would need to find a way to keep manipulated data consistently centralized and distributed (oy vey, database problems). More deep-space exploration on that later!

Pull Requests

Want to see the code that was contributed alongside this article? Check the contributions out below!

https://github.com/Brayden/starbasedb/pull/4 (Support added)

https://github.com/Brayden/starbasedb/pull/5 (Code refactored)

Outerbase

While you’re down here hanging out at the bottom of my blog post, visit Outerbase for the best experience interacting with all of your databases. I’m the co-founder & CTO of Outerbase, and I’m loving every minute of database exploration and building on all fronts.

https://www.outerbase.com

And check me out on Twitter:

https://twitter.com/BraydenWilmoth