Rare Database Entries and Partial Indexes
Today I discovered that PostgreSQL supports partial indexes and I’ll share my scenario for when a partial index has a number of space/speed advantages over a traditional index.
My application consists of many grid-like maps onto which users can place objects. Some objects don’t have to be placed right away; these appear in a staging area, which will become important later. The map creation process is collaborative and one user might make a change, save it and hand it off to another user to complete. Eventually when all the changes have been made, we attempt to finalize the modifications. Part of this process is to ensure that there are no objects left in the staging areas.
The staging area happens to have a special coordinate (-1, -1), so my query becomes:
SELECT object_pk FROM objects WHERE col=-1 AND row=-1;
It’s a simple query and there will typically only ever be a handful of results, but even so it has trouble scaling when there are hundreds of millions of objects in the objects table. We could try adding indexes to the “col” and “row” columns:
CREATE INDEX staging_col_idx ON objects USING btree(col);
CREATE INDEX staging_row_idx ON objects USING btree(row);
But since we only care about one specific lookup (-1, -1), nearly all of the index entries would be completely useless. Preliminary tests showed that it would result in several GBs of unnecessary index records and might slow down inserts/updates too.
This is where the partial index comes in. Partial indexes can be very useful in some specialized scenarios. They provide a way to avoid indexing common “uninteresting” values, thereby reducing the size of the index and speeding up queries that do use the index. Our partial index looks like this:
CREATE INDEX staging_idx ON objects USING btree(col, row)
WHERE col=-1 AND row=-1;
Only entries that satisfy the conditional will appear in the resulting index. I still get the same blazing fast lookups, but now my index will be a tiny fraction of the size and won’t need to be updated for the vast majority of insert or update operations on the table.