working with partitioned tables in rails
You might have to deal with a large amount of data, this is where table partitioning comes in handy. But integrating it in a rails project isn't that straightforward as it might seem. Here's my take on it.
I had to deal with PostgreSQL partitioned tables in rails and the documentation for this isn't very optimal. So here's what I found out about it.
Existing Gems
I stumbled upon two gems, pg_partition_manager which has a post about it on the honeybadger blog but that doesn't deal with migration at all.
The second one which seems to have more options and features which will also deal with migrations is pg_party which in theory works but at least with PostgreSQL 13 has issues creating the migration properly due to a primary key issue giving an error like this
ERROR: insufficient columns in PRIMARY KEY constraint definition
PRIMARY KEY constraint on table "my\_table" lacks column "created_at" which is part of the partition key.
This is mentioned in Alexey Soshin's blog post Dealing with partitions in Postgres 11+.
So here is what I ended up with.
My setup with rails 6
I went with the pg_party gem, although so far I didn't really use any features of the gem.
Switch to struct instead of schema
First, you will have to switch from schema.rb to struct.sql by adding this config to config/application.rb
config.active_record.schema_format = :sql
configure pg_party
Create an initializer for pg_party in congin/initializers/pg_party.rb
PgParty.configure do |c|
c.caching_ttl = 60
c.schema_exclude_partitions = false
c.include_subpartitions_in_partition_list = true
# Postgres 11+ users starting fresh may consider the below options to rely on Postgres' native features instead of
# this gem's template tables feature.
c.create_template_tables = false
c.create_with_primary_key = true
end
it's just copied & pasted configuration from the pg_party GitHub page.
Create migrations
I couldn't get the migrations to work the way described in the documentation so I went with custom SQL for the migrations to solve this.
example:
class CreateCdrs < ActiveRecord::Migration[6.1]
def up
execute <<-SQL
CREATE TABLE public.cdrs (
id integer NOT NULL,
hostname character varying(256) DEFAULT 'unknown'::character varying NOT NULL,
start_stamp timestamp(6) with time zone NOT NULL,
created_at timestamp(6) without time zone NOT NULL,
updated_at timestamp(6) without time zone NOT NULL,
primary key (id, start_stamp)
) PARTITION BY RANGE (start_stamp);
end
The primary key (id, start_stamp)
was the part needed to get this migration to work.
I however stumbled upon an issue of failing specs because of the ID key as ActiveRecord complaints that it doesn't support the Composite primary key.
WARNING: Active Record does not support composite primary key.
cdrs has composite primary key. Composite primary key is ignored.
To solve this, just add to the model which the primary key should be.
in app/models/cdr.rb
range_partition_by :start_stamp
self.primary_key = 'id'
and while at it add the pg_party configuration.
Creating partitions
Here you could use features of pg_party as described in their docs
class Log < ApplicationRecord
range_partition_by { '(created_at::date)' }
def self.maintenance
partitions = [Date.today.prev_month, Date.today, Date.today.next_month]
partitions.each do |day|
name = Log.partition_name_for(day)
next if ActiveRecord::Base.connection.table_exists?(name)
Log.create_partition(
name: name,
start_range: day.beginning_of_month,
end_range: day.next_month.beginning_of_month
)
end
end
def self.partition_name_for(day)
"logs_y#{day.year}_m#{day.month}"
end
end
and also deal with truncating old data, but I kind of went the more automated way when creating data and deal with the truncation later (which probably won't ever happen)
What I do is creating a before_validation call back to create the partition if needed.
before_validation :check_or_create_partition
def check_or_create_partition
name = Cdr.partition_name_for(start_stamp)
return if ActiveRecord::Base.connection.table_exists?(name)
Cdr.create_partition(
name: name,
start_range: Cdr.partition_start(start_stamp),
end_range: Cdr.partition_end(start_stamp)
)
end
First I check if the table already exists with ActiveRecord::Base.connection.table_exists?(name)
and if not create it.
It might seem "expensive" to have this in a before_validation but doing some benchmarks seems that it's not quite that bad.
n = 100000
Benchmark.bm do |x|
x.report { for i in 1..n; ActiveRecord::Base.connection.table_exists?(name) ; end }
end
user system total real
10.341165 6.479261 16.820426 ( 39.597046)
That's 40 seconds for 100k checks on my MacBook. I can live with that.
Here is the complete model.
class Cdr < ApplicationRecord
range_partition_by :start_stamp
self.primary_key = 'id'
before_validation :check_or_create_partition
def check_or_create_partition
name = Cdr.partition_name_for(start_stamp)
return if ActiveRecord::Base.connection.table_exists?(name)
Cdr.create_partition(
name: name,
start_range: Cdr.partition_start(start_stamp),
end_range: Cdr.partition_end(start_stamp)
)
end
def self.partition_name_for(day)
"cdrs_y#{day.year}_m#{day.month}_d#{day.day}"
end
def self.partition_start(day)
"#{day.year}-#{day.month}-#{day.day}"
end
def self.partition_end(day)
tomorrow = day.next_day
"#{tomorrow.year}-#{tomorrow.month}-#{tomorrow.day}"
end
end
And this is the complete magic to my partitioning setup. Please be aware that I just now started reading into PostgreSQL partitioning. You have been warned.
You should go ahead and read the Postgresql documentation on partitioning