Making changes to your database

If you have a database, you probably have data with errors, columns whose types or names are less than ideal, or data that needs to change over time.

But how do you actually upgrade your data?

If your database is small enough, you could make the data updates in your database migrations. Once your database grows large enough, though, this approach can start causing locking or performance problems while the migrations are running.

Also, if you upgrade the data during the database migrations, the old code will still be running, possibly causing race conditions.

In https://medium.com/@jakeginnivan/how-to-do-database-migrations-7f98104e9e53 I touched on how to rename a column without downtime; it required three deployments. That is one scenario; another is incorrect data which needs updating in place.

There are three issues at play when fixing either of these scenarios: performing any required schema changes, making the changes required to the data stored in that schema, and handling the intermediate states after the migration has run but the old code is still running.

The breaking schema change is the more obvious issue and the easier one to deal with, but the second and third can be a real pain.

Due to the complexity of fixing these issues, we typically only fix the most critical ones and don’t bother refactoring our database, or we resort to implementing workarounds to deal with bad data.

We wanted to solve this in a more cross-cutting way, one which ensured we wouldn’t hurt our database performance by updating a million rows at the same time to fix a small data issue in many of them.

There are a few moving parts to our solution, and you may have one or more of the concepts already in place. If you don’t, each of these sections has value on its own, but together they give us much more flexibility in the way we interact with our database.

A query object encapsulates a database query, giving us named queries which can be tested, rather than ad-hoc queries spread across our codebase.
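Here is a minimal sketch of what that can look like, assuming knex; the getPublishedArticles name, the articles table and the ArticleRow type are made up for illustration, not our actual code:

import { Knex } from 'knex'

// Hypothetical schema type for the articles table, for illustration only
interface ArticleRow {
  id: number
  headline: string
  content: string | null
  published_at: Date | null
}

// A query object: one named, testable query instead of an ad-hoc
// knex call scattered through the codebase
export function getPublishedArticles(db: Knex, limit = 20) {
  return db<ArticleRow>('articles')
    .whereNotNull('published_at')
    .orderBy('published_at', 'desc')
    .limit(limit)
}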

This is a pretty simple example, but it scales really well. You can add arguments for small variations, or create a whole new query object when a query is getting too complex.

Using this pattern we collapsed ~600 ad-hoc queries into ~50 query objects, each with tests covering them.

The centralisation of our queries also enabled the next few steps, because we can enforce conventions in our queries which are much harder to enforce when using knex directly.

Being a news site, we don’t have a domain that is rich in business logic, but having a wrapper around the raw database objects still has a few advantages.

So what does this look like?
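Roughly like this sketch: a thin class wrapping the raw row (the Article name and the ArticleRow type above are illustrative, not our exact code).

// A domain object: a thin wrapper around the raw schema row
class Article {
  constructor(private row: ArticleRow) {}

  get id() {
    return this.row.id
  }

  get headline() {
    return this.row.headline
  }

  get content() {
    return this.row.content
  }
}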

Then we can update all our query objects to return domain objects instead of schema objects.
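For example, the query object sketched earlier could map its rows before returning them:

export async function getPublishedArticles(db: Knex, limit = 20): Promise<Article[]> {
  const rows = await db<ArticleRow>('articles')
    .whereNotNull('published_at')
    .orderBy('published_at', 'desc')
    .limit(limit)
  // Callers now receive domain objects rather than raw schema rows
  return rows.map(row => new Article(row))
}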

While this first level of abstraction may not seem useful, suppose we want to change the content column to a jsonb column type. We can handle the code side of that migration entirely in our domain object, rather than having to search for every usage of that table and make sure the code is using the new column.

We have added a content2 column, a jsonb column, to our database and to the type representing the article table. Notice both content columns are nullable; this allows new inserts to write only to the new column. Then we modify the content getter to use the new value if it’s there, otherwise fall back to the old value.
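As a sketch, assuming the old content column held a JSON string and content2 is the new jsonb column (both added to the hypothetical ArticleRow type as nullable fields):

class Article {
  constructor(private row: ArticleRow) {}

  get content() {
    // Prefer the new jsonb column; fall back to the old text column
    if (this.row.content2 !== null) {
      return this.row.content2
    }
    return this.row.content !== null ? JSON.parse(this.row.content) : null
  }
}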

This first level is just the start of making use of our domain objects; next we will ensure updates go through them.

We can now use this to perform simple updates:

article.update(() => ({ headline: 'New headline' }))

But how do we persist that to the database? The answer is creating a new query object.
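One way this might look, as a sketch: the domain object collects pending changes in memory, and a hypothetical updateArticle query object persists them.

class Article {
  private changes: Partial<ArticleRow> = {}

  constructor(private row: ArticleRow) {}

  get id() {
    return this.row.id
  }

  // Collects the change in memory; nothing is written yet
  update(change: () => Partial<ArticleRow>) {
    this.changes = { ...this.changes, ...change() }
  }

  get pendingChanges() {
    return this.changes
  }
}

// The query object which persists whatever the domain object has collected
export function updateArticle(db: Knex, article: Article) {
  return db('articles')
    .where('id', article.id)
    .update(article.pendingChanges)
}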

Now you might be asking yourself at this point: how does this help me do data upgrades?

Now that we have an abstraction over our database, we are in a position to solve data upgrades.

You might be seeing where I am going with this now. If we go back to the beginning of this post, part of the issue was performing the data upgrade in our migration.

All the above infrastructure is to allow us to easily perform the upgrades in code.

Our domain object takes care of tracking which upgrades have run, and of not re-running upgrades that have already been applied.
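A sketch of how that tracking could work; the upgrade name and the applied_upgrades column are illustrative, not the exact mechanism we use:

// Each data upgrade is a named, idempotent function over the row
const articleUpgrades = {
  'move-content-to-jsonb': (row: ArticleRow): Partial<ArticleRow> => ({
    content2: row.content2 ?? (row.content ? JSON.parse(row.content) : null),
  }),
}

class Article {
  constructor(private row: ArticleRow) {
    // Apply any upgrade not yet recorded against this row, assuming the
    // row carries an applied_upgrades: string[] column used for tracking
    for (const [name, upgrade] of Object.entries(articleUpgrades)) {
      if (this.row.applied_upgrades.includes(name)) continue
      this.row = {
        ...this.row,
        ...upgrade(this.row),
        applied_upgrades: [...this.row.applied_upgrades, name],
      }
    }
  }
}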

IMPORTANT NOTE: unless you were going to perform an update anyway, don’t flush these upgrades back to the database. It could put additional unexpected load on your database and cause you to open and commit changes in an API endpoint you expect to be read-only.

The final piece of the puzzle is to upgrade all the data in the database which has not yet been upgraded. To do this we just start a background process which looks roughly like this.
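The exact shape depends on your setup; here is an illustrative sketch that walks the rows still missing the new content2 value in small batches, reusing the domain object so the upgrade logic stays in one place:

async function upgradeAllArticles(db: Knex) {
  const batchSize = 100
  while (true) {
    // Fetch a small batch of rows which still need the upgrade
    const rows = await db<ArticleRow>('articles')
      .whereNull('content2')
      .whereNotNull('content')
      .limit(batchSize)
    if (rows.length === 0) break

    for (const row of rows) {
      // Load through the domain object so the same upgrade logic runs,
      // then persist the upgraded value back
      const article = new Article(row)
      await db('articles')
        .where('id', row.id)
        .update({ content2: JSON.stringify(article.content) })
    }

    // Pause between batches so the upgrade doesn't hog the database
    await new Promise(resolve => setTimeout(resolve, 1000))
  }
}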

Now when we deploy, it will just start upgrading all our data over time. Once this process is done, you can deploy a new version of the code which drops the column and deletes that data upgrade. We are safe to drop the column before the new code has gone out because all data upgrades are complete, and the data upgrade was the only place where the old column was referenced.

Using this approach we have managed to reduce our deploys from three to two, and fixed any potential performance problems from updating a large number of rows at once.

Hopefully this approach is useful to you. Since it has been in place we have fixed a large number of small data issues and refactored our database schema, which was only possible because we established a way to do it that is safe to deploy to production without introducing downtime or temporary errors.

When we need to change data, we don’t want to risk production stability, so we update it as a background process. The tl;dr of our approach is: encapsulate queries in named, testable query objects; wrap raw schema rows in domain objects; express data upgrades in code, applied through the domain objects as rows are read; run a background process to upgrade the remaining rows in small batches; and once every row is upgraded, drop the old column and delete the upgrade.
