Event Sourcing: Read Models to the rescue

This is part three of my event sourcing series. In the first two posts, I showed you approaches that rely solely on projections. Now it’s time to introduce read models to support all your query needs and, maybe, solve some performance issues as well. Of course, I’ll discuss the downsides of read models, too.

To finish this post, I’ll do a deep dive into a code example.

Read models solve three problems in event sourcing: querying projections, repeated transformations, and performance problems with long event streams.

What a read model is

Hey Copilot, what is a read model?

Copilot:
A read model is a projection of events into a structure that’s fast and convenient to query. Because the event store only contains the history (the facts), you build one or more read models to answer questions like:

  • “Show me the current balance.”
  • “List all open orders.”
  • “Give me the dashboard summary.”

The read model is derived, not authoritative — the events are the source of truth.

Thanks Copilot.

The events are projected and stored in one or more read models, so we can query them instead of the event streams.
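To make this concrete, here is a minimal sketch in Python (not the F# we use in production). The event types, the `ExpenseView` shape, and the dictionary standing in for the read model are all made up for illustration.

```python
from dataclasses import dataclass, replace

# Hypothetical events for an expense; the names are illustrative only.
@dataclass(frozen=True)
class ExpenseSubmitted:
    expense_id: str
    amount_cents: int

@dataclass(frozen=True)
class ExpenseAccepted:
    expense_id: str

# Hypothetical shape of one read model entry.
@dataclass(frozen=True)
class ExpenseView:
    expense_id: str
    amount_cents: int
    state: str

def apply(view, event):
    """Fold a single event into the current projection."""
    if isinstance(event, ExpenseSubmitted):
        return ExpenseView(event.expense_id, event.amount_cents, "Submitted")
    if isinstance(event, ExpenseAccepted):
        return replace(view, state="Accepted")
    return view  # unknown events leave the projection unchanged

def project(events):
    """Project a whole event stream, starting from nothing."""
    view = None
    for event in events:
        view = apply(view, event)
    return view

# The event stream stays the source of truth ...
stream = [ExpenseSubmitted("e-1", 4200), ExpenseAccepted("e-1")]

# ... and the read model is a derived copy that queries can hit directly.
read_model = {}
view = project(stream)
read_model[view.expense_id] = view
```

Queries then read `read_model["e-1"]` instead of replaying the stream.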

Query anything

Without read models*, we can only query event streams by their IDs (see post 1), or by data we extend the events with (see post 2). With read models, we can add whatever we need to them to achieve high-performance queries, including search and computed columns, thereby simplifying queries.

* There are, of course, other possible solutions, like some kind of index to find the right event stream(s) to project to get the projection(s) you need. However, I think read models offer greater benefits with less effort.
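As a rough illustration of "query anything", the following Python sketch uses an in-memory SQLite table as the read model. The table name, the columns, and the precomputed `search_text` column are assumptions made for this example, not our actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE expense_read_model (
        expense_id   TEXT PRIMARY KEY,
        employee     TEXT NOT NULL,
        state        TEXT NOT NULL,
        amount_cents INTEGER NOT NULL,
        search_text  TEXT NOT NULL  -- computed when the entry is written
    )
""")
# An index on a column that no event stream ID could ever give us.
conn.execute("CREATE INDEX ix_expense_state ON expense_read_model (state)")

rows = [
    ("e-1", "Ada", "Accepted", 4200, "ada train ticket zurich"),
    ("e-2", "Bob", "Processed", 1500, "bob lunch"),
    ("e-3", "Ada", "Accepted", 9900, "ada hotel bern"),
]
conn.executemany("INSERT INTO expense_read_model VALUES (?, ?, ?, ?, ?)", rows)

def accepted_expenses_matching(term):
    """Find accepted expenses by free text, without touching any event stream."""
    cursor = conn.execute(
        "SELECT expense_id FROM expense_read_model "
        "WHERE state = 'Accepted' AND search_text LIKE ? "
        "ORDER BY expense_id",
        (f"%{term}%",),
    )
    return [row[0] for row in cursor]
```

Neither the state filter nor the text search would be possible if all we could do was load a stream by its ID.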

Get rid of repeating transformations

UIs and APIs often need data in a specific format. This means we often have to transform the raw data we get from projections into the required form. If this transformation isn’t trivial, we can run into performance issues or high CPU load.

A read model solves this problem by storing the transformed data in it. The transformation has to be calculated only when we update the read model.
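A small Python sketch of the idea, with a made-up transformation and a counter that only exists to show how often the transformation runs:

```python
stats = {"transform_calls": 0}

def to_api_shape(projection):
    """Stand-in for a non-trivial transformation (hypothetical)."""
    stats["transform_calls"] += 1
    amount = projection["amount_cents"] / 100
    return {
        "id": projection["expense_id"],
        "label": f'{projection["employee"]}: {amount:.2f}',
    }

read_model = {}

def update_read_model(projection):
    # The transformation runs here, once per change ...
    read_model[projection["expense_id"]] = to_api_shape(projection)

def query(expense_id):
    # ... and never on the read path.
    return read_model[expense_id]

update_read_model({"expense_id": "e-1", "employee": "Ada", "amount_cents": 4200})
for _ in range(1000):
    query("e-1")
```

After one write and a thousand reads, the transformation has run exactly once.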

If there are several clients with different needs, we can implement multiple read models that match their individual needs. Of course, this makes updating the read models more expensive.

Projecting long event streams only on changes

And finally, read models solve the problem of repeated projections of long event streams as well. These long event streams only have to be projected when changes happen to update the read model, not on every query.

The downsides of read models

As with everything in software engineering, read models are a trade-off. They offer better query capabilities and improved read performance, but they also come with downsides.

We need more storage space because we are introducing additional representations of the data we already store. In most cases, this is negligible because the events need much more space anyway, but you should keep this in mind.

More difficult is updating the read models. Depending on the approach chosen, it introduces different problems into our codebase.

Consistent Updates

With the first approach, called consistent updates, we update the read model(s) immediately when we add the new event(s) to the event stream – typically in the same database transaction.

The benefit of this solution is that it is straightforward:

  1. get the old projection
  2. apply the new event(s) to it
  3. store the event to the event stream and the updated projection to the read model
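The three steps above can be sketched like this, in Python with an in-memory SQLite database. The table layout and the trivial `apply` function are assumptions for the example; the point is only that the event and the read model entry are written in one transaction.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        stream_id TEXT NOT NULL,
        data      TEXT NOT NULL
    );
    CREATE TABLE read_model (
        stream_id TEXT PRIMARY KEY,
        state     TEXT NOT NULL
    );
""")

def apply(state, event):
    # Hypothetical projection: the state is named after the last event.
    return event["type"]

def handle(stream_id, event):
    # 1. Get the old projection.
    row = conn.execute(
        "SELECT state FROM read_model WHERE stream_id = ?", (stream_id,)
    ).fetchone()
    old_state = row[0] if row else None

    # 2. Apply the new event to it.
    new_state = apply(old_state, event)

    # 3. Store the event and the updated projection together.
    #    `with conn` commits on success and rolls back if anything raises.
    with conn:
        conn.execute(
            "INSERT INTO events (stream_id, data) VALUES (?, ?)",
            (stream_id, json.dumps(event)),
        )
        conn.execute(
            "INSERT OR REPLACE INTO read_model (stream_id, state) VALUES (?, ?)",
            (stream_id, new_state),
        )

handle("e-1", {"type": "Submitted"})
handle("e-1", {"type": "Accepted"})
```

Note that nothing in this sketch protects step 1 from reading a projection that another command is about to replace.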

The problem with consistent updates is concurrency. When multiple commands add events to the same event stream at the same time, we can lose changes in the read model: one command overwrites the changes made by another because it read the projection before the other command’s event was applied. Even more confusing for a user is that a later update would then pick up all events again. Although the system is self-correcting, the user may already have taken actions based on data missing an event.

There are three solutions to this problem:

  1. Ignore it because the domain makes it (almost) impossible for concurrent events on the same event stream to occur.
  2. Detect concurrency problems by making sure that no events were skipped. For example, the read model stores the number of the last event it was built from and only accepts an update when the new projection is based on the event with the next number. Otherwise, we reject the update, read the event stream again, project all events (there should now be an additional one), and store the read model entry.
  3. Switch to eventually consistent read models (see below).
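The second option could look roughly like the following Python sketch. Everything here is simplified and hypothetical: the stores are plain dictionaries, the two "concurrent" commands are interleaved by hand, and in a real database the check and the write would have to happen atomically (for example with a conditional update).

```python
event_store = {"e-1": ["Submitted"]}
# stream_id -> (number of the last event applied, projection)
read_model = {"e-1": (1, ("Submitted",))}

def project(stream):
    # Hypothetical projection: just the tuple of event names seen so far.
    return tuple(stream)

def try_store(stream_id, number, projection):
    """Accept the entry only if it is exactly one event ahead of the stored one."""
    stored_number, _ = read_model.get(stream_id, (0, ()))
    if number != stored_number + 1:
        return False
    read_model[stream_id] = (number, projection)
    return True

def run_command(stream_id, snapshot, event):
    """`snapshot` is the (number, projection) pair the command read earlier."""
    number, projection = snapshot
    event_store[stream_id].append(event)
    if try_store(stream_id, number + 1, projection + (event,)):
        return "stored"
    # Denied: another command got in between. Read again and project everything.
    stream = event_store[stream_id]
    read_model[stream_id] = (len(stream), project(stream))
    return "reprojected"

# Two commands read the same snapshot before either of them writes.
snapshot_a = read_model["e-1"]
snapshot_b = read_model["e-1"]
result_a = run_command("e-1", snapshot_a, "Accepted")
result_b = run_command("e-1", snapshot_b, "Commented")
```

Without the check, the second command would have stored a projection that is missing the first command’s event. With it, the stale update is rejected and the read model ends up containing all three events.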

We still choose the consistent updates approach when it is very unlikely that multiple events for the same event stream can occur concurrently, due to its simplicity. In addition, our users expect a consistent user experience across many parts of the application, which this approach provides.

If we learn that concurrent events occur, we have the option to introduce concurrency checks or switch to eventually consistent updates.

Eventually consistent updates

Instead of updating the read model immediately after appending the new event to the event stream, the command appends the new event only. Then, an asynchronously running listener/projector reacts to the new event by projecting the event stream and updating the read model.

Benefits of eventually consistent updates

If the listener/projector synchronises projections using the event stream ID, then we are guaranteed never to have concurrency issues that result in incorrect projections. The projector always picks up all available events to update the read model and will execute again once new events become available.
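Here is a minimal Python sketch of such a projector, assuming in-memory stores and a single worker thread fed by a queue. A single worker trivially serialises all projections, which is a stronger guarantee than synchronising per event stream ID, but it keeps the example short. All names are hypothetical.

```python
import queue
import threading

event_store = {}   # stream_id -> list of events
read_model = {}    # stream_id -> projection
store_lock = threading.Lock()
notifications = queue.Queue()

def project(stream):
    # Hypothetical projection of a whole event stream.
    return {"event_count": len(stream), "last": stream[-1]}

def append_event(stream_id, event):
    """Command side: append the event only, then signal the projector."""
    with store_lock:
        event_store.setdefault(stream_id, []).append(event)
    notifications.put(stream_id)

def projector():
    """Runs asynchronously and always projects all events available so far."""
    while True:
        stream_id = notifications.get()
        try:
            if stream_id is None:  # stop signal
                return
            with store_lock:
                stream = list(event_store[stream_id])
            read_model[stream_id] = project(stream)
        finally:
            notifications.task_done()

worker = threading.Thread(target=projector, daemon=True)
worker.start()

append_event("e-1", "Submitted")
append_event("e-1", "Accepted")

notifications.join()     # wait until the projector has caught up
notifications.put(None)  # then stop the worker
worker.join()
```

Because the projector reads the whole stream each time it runs, it doesn’t matter whether it sees one or both events on the first notification. The last run always produces the complete projection.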

The commands run faster because they no longer have to update the read model.

Disadvantages of eventually consistent updates

The main disadvantage is that eventually consistent read model updates make it hard to provide a consistent user experience. That only matters, of course, if you need a strongly consistent user experience, and I’d say most applications don’t. Alternatively, you can use session consistency: users see their own changes immediately, but not necessarily changes made by other users. For that, the client needs to “wait” for the read model to catch up after executing a command. Waiting means polling, long-polling, or signals/subscriptions. Choose your poison.
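The simplest flavour of waiting is polling. The following Python sketch assumes that the command returns the number of the event it appended and that the client can ask the read model which event number it has reached. Both are assumptions for this example, as are the function and parameter names.

```python
import time

def wait_for_read_model(get_version, expected_version, timeout=2.0, interval=0.05):
    """Poll until the read model has caught up, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if get_version() >= expected_version:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Simulated read model that reaches version 2 on the third poll.
versions = iter([1, 1, 2])
caught_up = wait_for_read_model(lambda: next(versions), expected_version=2, interval=0.01)
```

If the function returns `False`, the client has to decide what to show: stale data, a spinner, or an error.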

Summary

Read models are great when you can’t query event streams directly, either because you can’t find the needed event streams with the query data you have available, or because the projections would take too long. In addition, read models perform well when there are many more reads than writes.

When deciding whether to use consistent or eventually consistent updates, start with the domain. How likely are concurrency issues? Do you need strong consistency?

However, if you can get away with a projection-only approach, go with that because it’s simpler.

Deep Dive

Continuing the expenses example from the first post, the following code shows the command that processes an expense (typically meaning it was paid). I won’t explain the details. Please ask if you have questions 😊 (on BlueSky, Mastodon, LinkedIn, or here in the comments section).

First, we need some data for the command (for historical reasons that I can’t recall, we named our commands operations):

The library FSharp.UMX allows us to attach measures to GUIDs – awesome!

The command/operation needs to check whether the current user (the requester) is allowed to process the expense:

Here is the command:

First, we retrieve the current user (processor) from the metadata generated by the request data. If none, we return an error.

Then we load the settings and the expense that should be processed. Note that let! ... and! ... lets us load both in parallel. parallelAsyncResult comes from the fantastic library FsToolkit.ErrorHandling. If we can’t find the expense, we return an error.

Next, we validate that the current user is allowed to perform this operation and that the expense is in the Accepted state. Only accepted expenses can be processed. If not accepted, we return an error.

Then, we create the new processed event and apply it to the current expense. As you can see, we went with consistent updates without any concurrency checks. The reason is that it is very unlikely that multiple changes to the same expense occur within the timeframe of this operation’s execution.

Finally, we use the OperationRunner to persist the event to the event stream and the updated expense to the read model. It also sends a notification that this workday has changed, which triggers a couple of validations and recomputations. The OperationRunner also performs logging and will compensate the operation in case it fails. But that is stuff for another blog post.

The function persisting the event and read model entry looks like this:

We use an interface (ExpenseStorage) to provide all persistence functions/methods because we can run our application with a real SQL Server database or a hand-written in-memory fake. With the interface, it’s easy to switch between them.

We wrote some helper functions in the Sql module to help with database interactions. These use Dapper on the inside. Sql.ExecuteInTransaction executes several SQL commands in a single transaction, so that we update both the event stream and the read model, or neither.

The command inserts the event into the event stream, deletes the obsolete entries in the read model, and inserts the new read model entries (if any).

mapToRow maps an event to a record representing the data row in SQL Server:

CLIMutable is required for Dapper to write values. We only use primitive types in this record and map our domain types to them ourselves, because that works better for us than registering mappers in Dapper (no type surprises).

We store the event data as JSON, so we need the Version field to make sure we can read old versions of the events.
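To illustrate why the version matters, here is a small Python sketch of reading versioned JSON events. The `Data` column name, the payload fields, and the difference between version 1 and version 2 are invented for this example. Only the existence of a Version field comes from our actual row.

```python
import json

def upcast(version, data):
    """Bring an old payload up to the current shape (version 2 in this sketch)."""
    if version == 1:
        # Hypothetical change: version 1 had no 'processedBy' field.
        data = {**data, "processedBy": None}
        version = 2
    if version == 2:
        return data
    raise ValueError(f"unknown event version {version}")

def read_event(row):
    """`row` stands in for a database row with a Version and a JSON Data column."""
    return upcast(row["Version"], json.loads(row["Data"]))

old_event = read_event({"Version": 1, "Data": '{"expenseId": "e-1"}'})
new_event = read_event({"Version": 2, "Data": '{"expenseId": "e-2", "processedBy": "u-7"}'})
```

Both events come out in the current shape, so the projection code only ever has to deal with one version.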

We’ll look at CompensatedAt in a later blog post (we all love cliffhangers, don’t we?).

About the author

Urs Enzler
