Event Sourcing: trade-offs of event versioning and migration approaches

Event-sourced systems must evolve, and sooner or later, events need to change. In the sixth part of my series on event sourcing, we look at versioning events (their schemas) and at whether to migrate old events or keep all old versions. Of course, there are trade-offs involved regarding long-term maintainability and overall simplicity.

Why is versioning even needed?

Events state immutable facts – so goes the theory. But what they need to tell us may change.

Sometimes we need to add data to an event that we didn’t know we’d need back when we designed it. The data already existed in the real world, but we didn’t capture it because our application didn’t need it.

Sometimes, we find a better way to model our events. We still store the same data, but with differently modelled events. Maybe the code gets simpler, or we can support new use cases this way.

Approaches for versioning events

There are at least four ways to version events:

  1. Introduce a new event (with a new name/type) and keep the old one(s)
  2. Introduce a new version of an existing event, keep the old version, and handle multiple versions when projecting events.
    • Projection logic per version
    • Ad-hoc migration
  3. Introduce a new version of an existing event and migrate all previous events to it.
  4. A combination of 2 and 3: temporarily support two versions and migrate without downtime in production.

There are likely more, but these are the ones we have used and that I have experience with.

1. Introduce a new event

A simple approach to adding additional data to an event is to introduce a new event type that includes the new data and treat it as its own dedicated event.

The downsides are that you may end up with duplicate projection logic and that the code becomes harder to understand the more “versions” of an event there are. Our code should probably always use the latest “version” of the event, but that gets cumbersome when there are many to choose from.
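To make the duplication concrete, here is a minimal Python sketch (not the author's code). The event names, the `position_id` and `name` fields, and the extra `limit` field are hypothetical; the point is that the projection needs a separate, largely duplicated branch for each event type.

```python
from dataclasses import dataclass


# Hypothetical original event type.
@dataclass(frozen=True)
class ExpensePositionAdded:
    position_id: str
    name: str


# Hypothetical second event type that carries the newly captured data.
@dataclass(frozen=True)
class ExpensePositionAddedWithLimit:
    position_id: str
    name: str
    limit: int  # invented field, for illustration only


def project(events):
    """Build a dict of positions; every event type needs its own branch."""
    positions = {}
    for event in events:
        if isinstance(event, ExpensePositionAdded):
            # Old events carry no limit, so the projection supplies a default.
            positions[event.position_id] = {"name": event.name, "limit": None}
        elif isinstance(event, ExpensePositionAddedWithLimit):
            # Nearly the same logic again for the newer event type.
            positions[event.position_id] = {"name": event.name, "limit": event.limit}
        else:
            raise TypeError(f"unknown event {event!r}")
    return positions
```

Every additional "version" adds another branch like these, which is where the duplication comes from.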

2. Introduce a new version

Instead of creating a new event type for each variant of the same event, we keep a single event type and add a version number to it. When we read an event, we then know which version it is and can handle it accordingly.

We either keep the projection logic per version, so we have code for every version, or we perform an ad hoc migration of the event to the latest version and project that.

Choosing between these two sub-options depends on how easy the ad hoc migration is. Generally, I’d prefer the ad hoc migration because it probably reduces code duplication: the projection logic only has to handle the latest version.
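A minimal sketch of the ad hoc migration variant, often called upcasting, assuming events are stored as `(version, data)` pairs and that a hypothetical version 2 adds a `limit` field. Each upcaster lifts the data by exactly one version, so the projection only ever sees the latest shape.

```python
CURRENT_VERSION = 2  # hypothetical: version 2 added a "limit" field


def upcast_v1_to_v2(data: dict) -> dict:
    # Version 1 had no limit; default it so old events behave as before.
    return {**data, "limit": None}


# One upcaster per version step: key = version the upcaster starts from.
UPCASTERS = {1: upcast_v1_to_v2}


def to_latest(version: int, data: dict) -> dict:
    """Ad hoc migration: apply upcasters step by step until the data is current."""
    while version < CURRENT_VERSION:
        data = UPCASTERS[version](data)
        version += 1
    return data


def project(stored_events):
    """The projection only knows the latest version of the event."""
    positions = {}
    for version, data in stored_events:
        latest = to_latest(version, data)
        positions[latest["position_id"]] = {"name": latest["name"], "limit": latest["limit"]}
    return positions
```

With projection logic per version instead, `project` would need one branch per version, much like the separate-event approach.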

3. Migrate all previous events

With this approach, we don’t introduce versions of events. We migrate the affected existing event streams by replacing old events with new events.

This approach yields the cleanest code because we don’t care about different versions. There is a single way to project an event, and no ad-hoc migrations are in place. Simple and fast.
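As an illustration, here is a sketch of rewriting a stream, using hypothetical dictionary-shaped events and the same invented `limit` field. After the migration, the projection handles exactly one shape and contains no version checks.

```python
def migrate_stream(stream):
    """Return a copy of the stream with every ExpensePositionAdded event in the new shape."""
    migrated = []
    for event in stream:
        if event["type"] == "ExpensePositionAdded" and "limit" not in event["data"]:
            # Replace the old event with its new shape; the default keeps old behaviour.
            event = {**event, "data": {**event["data"], "limit": None}}
        migrated.append(event)
    return migrated


def project(stream):
    # After the migration, there is exactly one shape to handle.
    return {
        e["data"]["name"]: e["data"]["limit"]
        for e in stream
        if e["type"] == "ExpensePositionAdded"
    }
```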

However, migrating event streams can quickly become time-consuming. Since the new code can only read the new event format, the migration has to happen together with the release. You probably need maintenance windows to perform the migration, and releases are restricted to them. This makes release management tricky.

4. No downtime migrations

Approach 4 – no downtime migrations – is a combination of approaches 2 and 3. We temporarily support two versions of an event.

  1. We introduce a new version of an event; we write only events in the new version, but we can still read the old version. We release the software.
  2. We run a migration that migrates events from the old to the new version. This is best done in batches to keep the production system responsive.
  3. Once all old events are migrated, we can remove the code for the old version and release the cleaned-up software.
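The three steps can be sketched with a toy in-memory store. This is an illustration under simplifying assumptions (rows are `(version, json)` tuples, the new version adds a hypothetical `limit` field, and batching is left out), not a production implementation.

```python
import json


class EventStore:
    """Toy in-memory store holding (version, json) rows."""

    def __init__(self):
        self.rows = []

    def append(self, event: dict):
        # Step 1: the new code writes only version 2.
        self.rows.append((2, json.dumps(event)))

    def read_all(self):
        # Step 1: ...but it still reads version 1 until the migration is done.
        for version, data in self.rows:
            event = json.loads(data)
            if version == 1:
                event = {**event, "limit": None}  # hypothetical new field
            yield event

    def migrate(self):
        # Step 2: rewrite old rows to the new version (batching omitted).
        self.rows = [
            (2, json.dumps({**json.loads(d), "limit": None})) if v == 1 else (v, d)
            for v, d in self.rows
        ]

    def old_rows_left(self) -> int:
        # Step 3 (removing the version 1 branch) is safe once this returns 0.
        return sum(1 for v, _ in self.rows if v == 1)
```

In a real system, steps 1 and 3 are separate releases; here they correspond to adding and later deleting the `version == 1` branch in `read_all`.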

The advantage of this approach is that we can keep the code simple by maintaining a single version of each event (outside of the migration phase), with no downtime during migration.

The downside is that it needs a multi-step release process.

Summary

When we work on an event-sourced system, events will have to change as new learnings emerge. So either we deal with multiple versions of events, or we migrate them – or we use a combination.

That’s it for today – unless you are in for a deep dive…

Deep dive

The deep dive shows the details of our code. I don’t expect you to understand everything, but it gives you some insights into our real production code. I hope you like it.

Let me walk you through the steps of an event migration I did recently.

In earlier posts, we used expenses as an example, and I’ll continue to do so here. I’ll take you through an update of an expense settings event. Expense settings define which kinds of expenses can be entered.

An expense settings event looks like this:

With Data specifying the details of an event:

The change we have to make is inside the ExpensePositionAdded case:

We need to change the ExpensePosition from

to

by adding the last two fields.

So we need to change the code responsible for persisting expense settings events:

The ExpenseSettingsEventRow matches the SQL Server table. As you can see, it has a Version field. Typically, only the content of the Data field changes, so the version describes the format of the Data content.

We typically use approach 4 (zero-downtime migrations) when bumping an event version.

To be able to read the old version, we introduce a copy of the old version of the data:

And we need to specify the function used to deserialise this version of data:

We implemented a MigrationConverter that takes the old ExpensePosition as input and returns the new one. It initialises the two new fields with false, so the system behaves for old data exactly as before. In other words, we use ad hoc migrations for old events. Note that we only need to provide a migration converter for the changed type, not for the entire object graph read from storage (the entire JSON or all the types in the JSON – I’m not sure what to call it).
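The following Python sketch mimics the idea of converting only the changed type; it is not the author's production code. The type names mirror the text, but the fields (`settings_id`, `name`, and the placeholder flags `flag_a` and `flag_b` standing in for the two new fields) are hypothetical. The parser for the surrounding event is shared, and only the parser for the position differs between versions.

```python
import json
from dataclasses import dataclass


@dataclass
class ExpensePosition:
    name: str
    flag_a: bool  # hypothetical placeholder for the first new field
    flag_b: bool  # hypothetical placeholder for the second new field


@dataclass
class ExpensePositionV1:
    # Copy of the old version of the changed type.
    name: str


def convert_position(old: ExpensePositionV1) -> ExpensePosition:
    # Initialise the new fields with False so old data behaves as before.
    return ExpensePosition(name=old.name, flag_a=False, flag_b=False)


def parse_position_v1(d: dict) -> ExpensePosition:
    return convert_position(ExpensePositionV1(**d))


def parse_position_v2(d: dict) -> ExpensePosition:
    return ExpensePosition(**d)


def parse_added_event(payload: str, parse_position) -> dict:
    """Shared parser for the rest of the event; only the position parser varies."""
    d = json.loads(payload)
    return {"settings_id": d["settings_id"], "position": parse_position(d["position"])}


def deserialize_v1(payload: str) -> dict:
    return parse_added_event(payload, parse_position_v1)


def deserialize_v2(payload: str) -> dict:
    return parse_added_event(payload, parse_position_v2)
```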

The serialise and deserialise functions for the current version stay untouched and still look like this:

When we want to store an event, we need to map it to the ExpenseSettingsEventRow type. We use the latest version there:
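A sketch of the write side, assuming a hypothetical row layout with identity columns plus `version` and `data`; the real `ExpenseSettingsEventRow` mirrors a SQL Server table whose exact columns may differ. New events always get the current version number.

```python
import json
from dataclasses import dataclass

CURRENT_VERSION = 2  # hypothetical: the version introduced by this change


@dataclass(frozen=True)
class ExpenseSettingsEventRow:
    # Hypothetical mirror of the table row: identity columns plus Version and Data.
    stream_id: str
    sequence: int
    event_type: str
    version: int
    data: str


def to_row(stream_id: str, sequence: int, event_type: str, payload: dict) -> ExpenseSettingsEventRow:
    # New events are always serialised with the latest version.
    return ExpenseSettingsEventRow(
        stream_id, sequence, event_type, CURRENT_VERSION, json.dumps(payload, sort_keys=True)
    )
```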

When we read data from the storage, we map the row back to our event type:

Depending on the version, we use the corresponding deserialisation function, so we can now read both old and new events.
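The version-based dispatch on the read side can be sketched as a lookup table from version number to deserialiser. The fields `flag_a` and `flag_b` are hypothetical stand-ins for the two new fields, which the version 1 deserialiser initialises with `False`.

```python
import json


def deserialize_v1(data: str) -> dict:
    old = json.loads(data)
    # Ad hoc migration of the old layout: the new fields default to False.
    return {**old, "flag_a": False, "flag_b": False}


def deserialize_v2(data: str) -> dict:
    return json.loads(data)


DESERIALIZERS = {1: deserialize_v1, 2: deserialize_v2}


def from_row(version: int, data: str) -> dict:
    """Pick the deserialiser that matches the version stored in the row."""
    try:
        deserialize = DESERIALIZERS[version]
    except KeyError:
        raise ValueError(f"unknown event version {version}") from None
    return deserialize(data)
```

Removing the old version later amounts to deleting `deserialize_v1` and its entry in the table.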

After this code is released, we can migrate the events from the old to the new version with a console application running on my computer.

The program defines a generic migration function that reads old-version event data from storage in batches, deserialises it, serialises it into the new version, and writes it back to storage.

We can define the connection string, the batch size, and which events we want to migrate (schema, table, old version, and new version):

Then we run this program, and after a while, all events are on the new version. We use batches because we don’t want to lock the table for too long. This way, the production system stays responsive for our users.
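A generic batched migration could look roughly like this sketch. It swaps SQL Server for Python's built-in `sqlite3` so that it runs on its own, assumes a hypothetical table with `id`, `version`, and `data` columns (no schema), and takes the conversion as a function parameter. Each batch is committed in its own short transaction, which keeps locks short.

```python
import json
import sqlite3


def migrate_in_batches(conn: sqlite3.Connection, table: str, old_version: int,
                       new_version: int, convert, batch_size: int = 500) -> int:
    """Rewrite rows of old_version to new_version, one short transaction per batch."""
    total = 0
    while True:
        rows = conn.execute(
            f"SELECT id, data FROM {table} WHERE version = ? ORDER BY id LIMIT ?",
            (old_version, batch_size),
        ).fetchall()
        if not rows:
            return total  # no old rows left
        with conn:  # commits this batch when the block exits without error
            conn.executemany(
                f"UPDATE {table} SET version = ?, data = ? WHERE id = ? AND version = ?",
                [
                    (new_version, json.dumps(convert(json.loads(data))), row_id, old_version)
                    for row_id, data in rows
                ],
            )
        total += len(rows)
```

Because migrated rows no longer match `version = old_version`, the loop ends once no old rows remain. The table name is interpolated into the SQL, so it must come from trusted configuration, never from user input.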

Finally, we can remove the old version from the code and release the cleaned-up code.

About the author

Urs Enzler
