Postgres Indexing for Rails at Scale

Indexing is one of those topics where Rails developers know just enough to be dangerous. The add_index migration is one line; the Rails guides tell you to index your foreign keys; and so the default mental model becomes “an index is a thing you sprinkle on columns to make them fast.” That model holds until a table crosses a few million rows. Then it breaks, and the breakage is subtle: a query that was instant in staging takes two seconds in production, the index you added does not get used, and the fix is not “add another index” — it is understanding what the planner is actually doing.

This is a tour of the four indexing concepts that pay for themselves on large Rails tables: reading EXPLAIN ANALYZE, composite-index column order, partial indexes, and knowing when to stop indexing and reach for a counter cache instead.

EXPLAIN ANALYZE is the only opinion that counts

Stop guessing whether an index is used. Ask Postgres. EXPLAIN shows the planner’s chosen plan and its cost estimate; EXPLAIN ANALYZE actually runs the query and shows real timings and real row counts. The gap between estimated and actual rows is where most bad plans hide.

From Rails you do not even need to drop to psql — Active Record relations respond to .explain, and on recent Rails you can pass options:

Order.where(status: "pending", merchant_id: 42)
     .order(created_at: :desc)
     .limit(25)
     .explain(:analyze, :buffers)

The thing to look for first is the scan node. Seq Scan on a large table inside a hot query is almost always the smell. You want to see Index Scan or Index Only Scan. But do not stop there — read the row estimates. A plan that says (rows=1) in the estimate but (actual rows=84000) means the planner’s statistics are stale or the predicate is not as selective as it thinks, and it may have chosen a nested loop that is catastrophic at the real row count. The cure for stale stats is ANALYZE the_table; (or letting autovacuum catch up). The cure for a genuinely non-selective predicate is a different index, or a partial one.

Index Only Scan is the prize: it means Postgres answered the query entirely from the index without touching the heap, because every column the query needed was in the index. That is the payoff behind covering indexes, which Postgres supports via INCLUDE.

Composite index column order is not arbitrary

The most common large-table mistake is a multi-column index whose columns are in the wrong order. Postgres can use a composite index for a query as long as the query constrains a left-to-right prefix of the index columns. An index on (merchant_id, status, created_at) can serve a query filtering on merchant_id alone, or merchant_id + status, or all three. It cannot efficiently serve a query that filters only on status — that is the second column, and there is no usable prefix.

The rule of thumb that gets you most of the way there: equality columns first, then the range or sort column last. For the query above — equality on merchant_id and status, ordered by created_at — the index that lets Postgres satisfy the filter and the sort from a single index walk is:

class AddOrdersMerchantStatusCreatedIndex < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def change
    add_index :orders, [:merchant_id, :status, :created_at],
      order: { created_at: :desc },
      algorithm: :concurrently
  end
end

Two production details in that migration. algorithm: :concurrently tells Postgres to build the index without taking the ACCESS EXCLUSIVE lock that a plain CREATE INDEX takes — which on a large, busy table would block writes for the entire build. concurrently requires disable_ddl_transaction! because it cannot run inside a transaction. Forgetting this pair is how an “add an index” migration turns into an outage. Always build indexes concurrently on tables that take live traffic.

The matching order: { created_at: :desc } lets the ORDER BY ... DESC LIMIT 25 read straight off the end of the index — no separate sort node, which is what makes keyset/top-N pagination cheap.

Partial indexes: index only the rows you query

A partial index covers only the rows matching a WHERE clause. This is one of Postgres’s sharpest tools and Rails developers underuse it badly. The classic case: a status column that is wildly skewed. Suppose 98% of your orders are completed and you only ever query for the small set that are pending or failed. A full index on status indexes all those millions of completed rows you never search for — wasted space, wasted write amplification on every insert and update.

add_index :orders, [:merchant_id, :created_at],
  where: "status IN ('pending', 'failed')",
  name: "index_orders_actionable",
  algorithm: :concurrently

This index is a tiny fraction of the size of a full one, and it stays out of the write path for the 98% of rows that are completed. For Postgres to use it, the query’s WHERE clause must logically imply the index predicate — so the Active Record scope needs to actually filter on those statuses:

scope :actionable, -> { where(status: %w[pending failed]) }
# Order.actionable.where(merchant_id: 42).order(:created_at) => uses the partial index

The other great partial-index use is the NOT NULL / soft-delete case: where: "deleted_at IS NULL" indexes only live records. If 30% of a big table is soft-deleted, that is 30% of the index you never needed.

Partial indexes also do double duty as a way to enforce conditional uniqueness — for example, a unique index on (account_id) where: "primary_contact = true" guarantees one primary contact per account at the database level, which no application validation can match for correctness under concurrency.

When indexing is the wrong answer: counter caches

Some queries cannot be indexed away because the cost is in the aggregation, not the lookup. SELECT COUNT(*) FROM comments WHERE post_id = $1 can use an index on post_id to find the rows, but it still has to count them — and on a hot list page rendering 50 posts, that is 50 counts per request, each one scanning a slice of the index. No index makes counting free.

This is what counter caches are for. Rails maintains a denormalized count column on the parent and updates it inside the same transaction as the child insert/delete:

class Comment < ApplicationRecord
  belongs_to :post, counter_cache: true
end
# reads post.comments_count — a column, zero extra queries

You add a comments_count integer column to posts and backfill it once. After that, post.comments_count is a column read, not an aggregate. The trade is real and worth stating plainly: every comment insert and delete now also writes the parent row, which is more write contention on hot parents and a foot-gun if you ever modify children outside Active Record (bulk SQL, delete_all) without keeping the counter in sync. For most read-heavy list views the trade is overwhelmingly worth it. For an extremely write-heavy hot row, consider an approximate count or a periodically-reconciled counter instead, to avoid serializing all those writes on one parent.

A workable process

Indexing well is not memorization; it is a loop:

Find the query. Use pg_stat_statements to rank queries by total time, or your APM’s slow-query view. Optimize the queries that actually cost you, not the ones that feel slow.
Run EXPLAIN ANALYZE with realistic data. A plan on 1,000 staging rows tells you nothing about behavior on 10 million. Test against a production-sized copy.
Design the narrowest index that serves it — right column order, partial where the data is skewed, covering with INCLUDE if an index-only scan is achievable.
Build it CONCURRENTLY and re-run EXPLAIN ANALYZE to confirm the planner actually chose it.
Watch for the index you no longer need. Unused and redundant indexes (a single-column index made redundant by a composite that has it as a prefix) cost write performance for nothing. pg_stat_user_indexes shows scan counts; drop the indexes with zero scans.

The takeaway

The default Rails instinct — index the foreign key and move on — is correct right up until a table gets large, and then it quietly stops being correct. At scale, indexing is about column order matching your access pattern, partial indexes that skip the rows you never query, and recognizing the queries (aggregates) where an index is the wrong tool entirely. The discipline that ties it together is refusing to guess: EXPLAIN ANALYZE on production-sized data is the only authority, and every index you add should be one you have watched the planner choose.

We do Postgres-at-scale work — index audits, query-plan triage, schema design for tables that have outgrown memory — on Rails engagements. If a page got slow as your data grew, that is usually a one-afternoon conversation worth having.