PostgreSQL 10 was released at the start of October. With it came the official Table Partitioning support. But if you’re still stuck with PostgreSQL 9.6 or below, you can have partitions “the old way”.
During my career I had to implement partitioned solutions over Postgres a number of times. Main problem with examples I’ve found was that they covered only the basic cases. Usually, the examples speak about partitioning by a single column, and maybe even a static set of values. In this article, I’ll cover a much more complex case, dynamically partitioning by two columns, while also dealing with problems introduced by timezones.
Our table looks something like this:
This table may contain millions of rows for each tenant each month. So, we would like to partition it, to improve query performance.
All of the queries must specify
TENANT_ID . Most of them also supply range by
CREATED_AT. Those two columns are an excellent candidates for partitioning. Let’s do it now.
The way partitions are implemented in Postgres below 10 is by table inheritance and checked conditions.
Child tables have the same structure as their parent table. When parent table is queried, all of its children are queried too. Checked conditions provide Postgres engine with means to query only specific children of the parent table, effectively implementing partitions.
What we’ll first do is create a function that will delegate the insert into parent table to one of the child tables:
TG_TABLE_NAME is one of many special variables that are set by Postgres before calling your trigger. It holds the name of the table trigger was invoked on. Very useful in case you specify same trigger on more than one table.
PERFORM calls a function that doesn’t return anything. You won’t see it often, since most of the examples just do everything in the same function. But if you like to keep your code clean, it’s definitely useful.
$1.* will explode all the specified values.
USING NEW binds values from the original insert statement to our new statement.
This will create the partition if it doesn’t exist yet:
CHECK to specify which rows will end up in this table, and use
INHERITS to bind this child table to its parent.
RAISE NOTICE will print results into the console.
Partitions will be created without indexes. We should create them manually:
You don’t need indexes on the parent table, since we’ll keep it empty. But it’s very important to have indexes on child tables.
What’s left is calling the procedure by introducing a trigger:
Dealing with timezones and UTC introduces a problem. Take special note of how checked conditions on
CREATED_AT and indexes are defined. If you don’t handle timezones correctly, your application will break horribly. As it happened to my application a few times.
If you’re using some kind of ORM framework, like ActiveRecord in Ruby, together with partitioning, you may get exceptions or null values. The reason is most ORMs rely on PostgreSQL
RETURNING statement. This is a neat trick to avoid selecting the row you just created. But with partitions, this may cause some trouble.
If you don’t use ORM, you can simply return
NULL instead of
NEW from your method. Otherwise, you’ll have to provide another trigger, that will delete the duplicate from the parent table, keeping it empty.
You can use
TG_TABLE_NAME trick from before, in case you don’t want to hardcode the table name.
Let’s populate the table using the following script:
It randomly creates 10K entries for each tenant for this year. Run it until you reach desired DB size.
That’s it. What’s left to see is that optimizer indeed scans correct partitions:
explain select *
where tenant_id = 'tenant_a'
and created_at AT TIME ZONE 'UTC' > '20171102'
and created_at AT TIME ZONE 'UTC' <= '20171103';
Note that you must use
AT TIME ZONE in order for your partitions and indexes to work correctly. I omitted using it on the parameters only for the brevity of the example.
Success! Although we have thousands of rows for each tenant, we hit only a small portion of them, thanks to proper partitioning.
Hope you enjoyed this read, and learned a few PostgreSQL tricks.
PostgreSQL: Documentation: 9.6: Partitioning
Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed…