working with partitioned tables in rails

You might have to deal with a large amount of data, this is where table partitioning comes in handy. But integrating it in a rails project isn't that straightforward as it might seem. Here's my take on it.

working with partitioned tables in rails

I had to deal with PostgreSQL partitioned tables in rails and the documentation for this isn't very optimal. So here's what I found out about it.

Existing Gems

I stumbled upon two gems, pg_partition_manager which has a post about it on the honeybadger blog but that doesn't deal with migration at all.
The second one which seems to have more options and features which will also deal with migrations is pg_party which in theory works but at least with PostgreSQL 13 has issues creating the migration properly due to a primary key issue giving an error like this

ERROR: insufficient columns in PRIMARY KEY constraint definition  
PRIMARY KEY constraint on table "my\_table" lacks column "created_at" which is part of the partition key.

This is mentioned in Alexey Soshin's blog post Dealing with partitions in Postgres 11+.
So here is what I ended up with.

My setup with rails 6

I went with the pg_party gem, although so far I didn't really use any features of the gem.

Switch to struct instead of schema

First, you will have to switch from schema.rb to struct.sql by adding this config to config/application.rb

config.active_record.schema_format = :sql

configure pg_party

Create an initializer for pg_party in congin/initializers/pg_party.rb

PgParty.configure do |c|
  c.caching_ttl = 60
  c.schema_exclude_partitions = false
  c.include_subpartitions_in_partition_list = true
  # Postgres 11+ users starting fresh may consider the below options to rely on Postgres' native features instead of
  # this gem's template tables feature.
  c.create_template_tables = false
  c.create_with_primary_key = true
end

it's just copied & pasted configuration from the pg_party GitHub page.

Create migrations

I couldn't get the migrations to work the way described in the documentation so I went with custom SQL for the migrations to solve this.

example:

class CreateCdrs < ActiveRecord::Migration[6.1]
  def up
  execute <<-SQL
      CREATE TABLE public.cdrs (
          id integer NOT NULL,
          hostname character varying(256) DEFAULT 'unknown'::character varying NOT NULL,
          start_stamp timestamp(6) with time zone NOT NULL,
          created_at timestamp(6) without time zone NOT NULL,
          updated_at timestamp(6) without time zone NOT NULL,
          primary key (id, start_stamp)
      ) PARTITION BY RANGE (start_stamp);
  end

The primary key (id, start_stamp) was the part needed to get this migration to work.
I however stumbled upon an issue of failing specs because of the ID key as ActiveRecord complaints that it doesn't support the Composite primary key.

WARNING: Active Record does not support composite primary key.

cdrs has composite primary key. Composite primary key is ignored.

To solve this, just add to the model which the primary key should be.

in app/models/cdr.rb

  range_partition_by :start_stamp
  self.primary_key = 'id'

and while at it add the pg_party configuration.

Creating partitions

Here you could use features of pg_party as described in their docs

class Log < ApplicationRecord
  range_partition_by { '(created_at::date)' }

  def self.maintenance
    partitions = [Date.today.prev_month, Date.today, Date.today.next_month]

    partitions.each do |day|
      name = Log.partition_name_for(day)
      next if ActiveRecord::Base.connection.table_exists?(name)
      Log.create_partition(
        name: name,
        start_range: day.beginning_of_month,
        end_range: day.next_month.beginning_of_month
      )
    end
  end

  def self.partition_name_for(day)
    "logs_y#{day.year}_m#{day.month}"
  end
end

and also deal with truncating old data, but I kind of went the more automated way when creating data and deal with the truncation later (which probably won't ever happen)
What I do is creating a before_validation call back to create the partition if needed.

  before_validation :check_or_create_partition
  
  def check_or_create_partition
    name = Cdr.partition_name_for(start_stamp)
    return if ActiveRecord::Base.connection.table_exists?(name)

    Cdr.create_partition(
      name: name,
      start_range: Cdr.partition_start(start_stamp),
      end_range: Cdr.partition_end(start_stamp)
    )
  end

First I check if the table already exists with ActiveRecord::Base.connection.table_exists?(name) and if not create it.
It might seem "expensive" to have this in a before_validation but doing some benchmarks seems that it's not quite that bad.

n = 100000
Benchmark.bm do |x|
  x.report { for i in 1..n; ActiveRecord::Base.connection.table_exists?(name) ; end }
end
 
       user     system      total        real
  10.341165   6.479261  16.820426 ( 39.597046)

That's 40 seconds for 100k checks on my MacBook. I can live with that.
Here is the complete model.

class Cdr < ApplicationRecord
  range_partition_by :start_stamp
  self.primary_key = 'id'

  before_validation :check_or_create_partition

  def check_or_create_partition
    name = Cdr.partition_name_for(start_stamp)
    return if ActiveRecord::Base.connection.table_exists?(name)

    Cdr.create_partition(
      name: name,
      start_range: Cdr.partition_start(start_stamp),
      end_range: Cdr.partition_end(start_stamp)
    )
  end

  def self.partition_name_for(day)
    "cdrs_y#{day.year}_m#{day.month}_d#{day.day}"
  end

  def self.partition_start(day)
    "#{day.year}-#{day.month}-#{day.day}"
  end

  def self.partition_end(day)
    tomorrow = day.next_day
    "#{tomorrow.year}-#{tomorrow.month}-#{tomorrow.day}"
  end
end

And this is the complete magic to my partitioning setup. Please be aware that I just now started reading into PostgreSQL partitioning. You have been warned.

You should go ahead and read the Postgresql documentation on partitioning