Dealing with partitions in Postgres 11+

Photo by Tobias Adam on Unsplash

One of the main features of PostgreSQL 10 was support for declarative partitioning. And PostgreSQL 11 brought many improvements to it.

But does that solve all the issues, and what other issues you may get from using partitions in your project?

Let’s start with the table we’d like to partition:

We assume that we get tens of millions of objects per day, which are uniquely identified by external_id

Due to the fact that we mostly care about today’s data and we also like to scrap object that are older than a week, we decided to partition by created_at

Makes sense, right?

Well, not really:

Ok, let’s move primary key definition:

Well?

So, both the primary key and unique keys need to include the partition key.

Let’s try upsert query on that table:

Not good:

So, we need to include the partition key in our upserts too:

Another error:

Right, we didn’t create any partition yet. But there’s a quick fix for that since Postgres 11 — defult partitions.

Now, let’s try inserting object with same external_id twice. We expect to have only one row, with status = 2

But we have two:

Reason is quite obvious when we look back at our unique key. It includes created_at , since that’s the partition key. But created_at is usually set automatically, rendering our unique key useless for upserts.

Are we stuck? Not yet.

Let’s go back to our table definition and add a new column, called day:

Now, this column, and not created_at, becomes our partition key.

And now we get the expected behaviour from our upserts:

I know what you’re thinking. Having this additional column may seem ugly.

So, maybe using expressions as our partition key would help us?

Let’s try it:

Doesn’t seem so:

Ok, maybe if we just get rid of the primary key it will work fine?

Maybe if we can get rid of the primary key it would work?

Nope:

So, you can use expression as your partition key, but then you must forfeit unique keys.

And in some cases, it may be a valid choice.

Consider, for example, a situation where we never mutate our data, and use only inserts. It becomes then the application decision which row is the correct one. You may decide it’s the row with the greatest created_at , or with greatest status , or probably you’ll have some even more complicated business logic to resolve that.

The last, rather trivial part that’s left for us to do is to create the partitions themselves.

We can create hundred of them ahead of time:

Let’s take a look at the execution plan now:

Note that I decided to select ±1 days

As you can see, PostgreSQL successfully prunes the partitions, scanning only the relevant ones, then uses the unique index we created on the parent table to fetch the particular row.

Conclusions

You should make good use of partitions in PostgreSQL.

But consider carefully the requirements:

  • Both the primary key and all unique keys must include partition key in their definition. Make sure that the unique key still makes sense
  • Partitions are not created automatically, so make sure you have a default partition
  • Expressions can be used as partition key, but then you cannot use neither primary key nor unique keys

Solutions Architect @Depop, author of “Hands-on Design Patterns with Kotlin” book and “Web Development with Kotlin” course

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store