
Database Optimization Template for PMs

Optimize database performance with query analysis, indexing strategy, connection pool tuning, and capacity projections.

By Tim Adair • Last updated 2026-03-05

What This Template Is For

The database is the performance bottleneck in most web applications. A single missing index can turn a 5ms query into a 3-second full table scan. An N+1 query pattern hidden inside an ORM can generate 500 database round trips where 1 would suffice. A connection pool sized too small starves the application of database access during traffic spikes.

These problems compound. As data grows, unoptimized queries degrade linearly or worse. A query that ran in 20ms at 100K rows can take 800ms at 10M rows if it is doing a sequential scan instead of using an index. By the time users complain, the fix requires understanding query plans, index selectivity, and connection pool behavior under load.

This template provides a structured approach to database optimization. It covers slow query identification, indexing strategy, connection pool tuning, and capacity projections. The goal is to move database performance from reactive firefighting to proactive management.

Use this alongside the API Performance Template to connect database improvements to user-facing latency. The Monitoring and Alerting Template covers alerting on database-specific metrics. For schema-level changes, the Database Migration Template provides a safe execution plan. The Technical PM Handbook explains how PMs should evaluate the product impact of database performance investments. See also the glossary entry on technical debt for context on why unoptimized databases accumulate cost over time.


When to Use This Template

  • User-facing latency has increased and the root cause traces to database queries
  • A specific query (or set of queries) consistently appears in slow query logs
  • You are approaching a data milestone (10M rows, 100M rows) and need to verify query performance scales
  • Cloud database costs are increasing and you suspect query inefficiency is a factor
  • A new feature will add significant read or write load to existing tables
  • Connection pool errors (timeouts, exhaustion) are occurring during peak traffic

How to Use This Template

  1. Start with the Slow Query Audit. Pull the top offenders from your slow query log and analyze each one.
  2. Build the Indexing Strategy. For each slow query, determine whether an index would help and what columns to index.
  3. Tune the Connection Pool. Review current settings against actual usage patterns.
  4. Estimate Capacity. Project when your current database configuration will no longer support traffic growth.
  5. Prioritize and schedule. Not every optimization is equally impactful. Rank by user-facing latency improvement.
  6. Measure after each change. Compare before/after for every optimization. If the numbers do not improve, revert.
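For step 6, a minimal way to capture before/after numbers on Postgres (a sketch: it assumes the pg_stat_statements extension is enabled, and the column names shown are for Postgres 13+):

```sql
-- Snapshot the target query's stats before making a change.
-- The ILIKE fragment ('%events%') is a placeholder for part of your query text.
SELECT queryid, calls, round(mean_exec_time::numeric, 1) AS avg_ms
FROM pg_stat_statements
WHERE query ILIKE '%events%'
ORDER BY total_exec_time DESC;

-- Optionally reset counters so post-change measurements start clean.
SELECT pg_stat_statements_reset();
```

Rerun the same SELECT after deploying the optimization and compare avg_ms under comparable traffic.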

The Template

Database Overview

| Field | Details |
| --- | --- |
| Engine | [Postgres 16 / MySQL 8 / etc.] |
| Instance Type | [e.g., db.r6g.xlarge / 4 vCPU, 32GB RAM] |
| Storage | [X GB used / Y GB provisioned, GP3/IO1] |
| Read Replicas | [X replicas, regions] |
| Largest Tables | [List top 5 by row count] |
| Total Connections | [Max configured / Avg in use / Peak in use] |
| Slow Query Threshold | [X ms] |
| Backup Schedule | [Daily automated, X day retention] |
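On Postgres, most of the overview fields can be pulled from system views (a sketch; row counts from pg_stat_user_tables are planner estimates, not exact counts):

```sql
-- Top 5 tables by approximate live row count.
SELECT relname AS table_name, n_live_tup AS approx_rows
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 5;

-- Configured connection ceiling and connections currently open.
SHOW max_connections;
SELECT count(*) AS in_use FROM pg_stat_activity;
```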

Slow Query Audit

| Query | Avg Duration | Calls/Day | Total Time/Day | Table(s) | Scan Type | Index Used? |
| --- | --- | --- | --- | --- | --- | --- |
| [Query description or fingerprint] | [X ms] | [X] | [X min] | [tables] | [Seq Scan / Index Scan / Bitmap] | [Yes: which / No] |

How to find slow queries.

  • Postgres: pg_stat_statements extension (tracks all queries with timing), or log_min_duration_statement in config.
  • MySQL: slow_query_log with long_query_time threshold.
  • Application APM: Datadog, New Relic, or similar will attribute latency to specific queries.
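For the Postgres route, a sketch of the pg_stat_statements query (Postgres 13+ column names). Ranking by total time surfaces frequent-but-fast queries as well as slow ones:

```sql
-- Top 20 queries by total database time consumed.
SELECT query,
       calls,
       round(mean_exec_time::numeric, 1)          AS avg_ms,
       round(total_exec_time::numeric / 60000, 1) AS total_min
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```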

Query Analysis

For each slow query, document the following:

Query #1: [Description]

-- Original query
[Paste the full SQL]

EXPLAIN ANALYZE output.

[Paste the EXPLAIN ANALYZE output]

Analysis.

  • Scan type. [Sequential scan / Index scan / Index-only scan / Bitmap heap scan]
  • Rows estimated vs actual. [Estimated X, Actual Y. If >10x off, statistics may be stale.]
  • Bottleneck. [Where is time being spent? Sort? Filter? Join? Disk I/O?]
  • Proposed fix. [Add index / Rewrite query / Add caching / Partition table]
  • Estimated improvement. [X ms to Y ms]
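To fill in this analysis on Postgres, run the query under EXPLAIN (the orders table here is illustrative):

```sql
-- ANALYZE executes the query for real row counts and timings;
-- BUFFERS shows pages served from cache vs read from disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE status = 'shipped' AND created_at > NOW() - INTERVAL '7 days';

-- If estimated vs actual rows diverge by 10x or more, refresh planner statistics.
ANALYZE orders;
```

Note that EXPLAIN ANALYZE actually executes the statement, so wrap any destructive query (UPDATE/DELETE) in a transaction you roll back.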

Indexing Strategy

| Table | Proposed Index | Columns | Type | Queries Benefiting | Est. Size | Write Impact |
| --- | --- | --- | --- | --- | --- | --- |
| [table] | [idx_name] | [col1, col2] | [B-tree / GIN / GiST / Hash] | [List queries] | [X MB] | [X% write slowdown] |

Index design guidelines.

  • Composite indexes. Order columns by selectivity (most selective first) and by query filter order. An index on (user_id, created_at) serves queries filtering on user_id alone, but not queries filtering on created_at alone.
  • Partial indexes. If a query always filters on a condition (e.g., WHERE deleted_at IS NULL), a partial index is smaller and faster.
  • Covering indexes (INCLUDE). Add non-indexed columns with INCLUDE to enable index-only scans (Postgres 11+). Avoids heap lookups.
  • Index size. Indexes consume disk and memory. A 500MB index on a table with 100M rows is normal. A 5GB index suggests over-indexing.
  • Write impact. Every index slows down INSERT, UPDATE, and DELETE operations. Add indexes selectively on write-heavy tables.
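The guidelines above in Postgres syntax (table and index names are illustrative; CONCURRENTLY builds the index without blocking writes, at the cost of a slower build):

```sql
-- Composite: serves WHERE user_id = ? and WHERE user_id = ? AND created_at > ?,
-- but not a filter on created_at alone.
CREATE INDEX CONCURRENTLY idx_events_user_created
    ON events (user_id, created_at);

-- Partial: indexes only non-deleted rows, so it is smaller and stays hotter in cache.
CREATE INDEX CONCURRENTLY idx_orders_active
    ON orders (created_at) WHERE deleted_at IS NULL;

-- Covering (Postgres 11+): INCLUDE columns let the planner answer the query
-- from the index alone, skipping heap lookups.
CREATE INDEX CONCURRENTLY idx_orders_status_created
    ON orders (status, created_at) INCLUDE (total, user_id);
```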

Connection Pool Configuration

| Setting | Current | Recommended | Rationale |
| --- | --- | --- | --- |
| Pool size (per app instance) | [X] | [X] | [Rule: pool_size = (num_cores * 2) + effective_spindle_count] |
| Max overflow | [X] | [X] | [Burst connections beyond pool size] |
| Connection timeout | [X seconds] | [X seconds] | [How long a request waits for a connection] |
| Idle timeout | [X seconds] | [X seconds] | [How long an idle connection stays open] |
| Max lifetime | [X minutes] | [X minutes] | [Force recycle to handle failover] |
| Validation query | [None / SELECT 1] | [SELECT 1] | [Detect stale connections after failover] |

Total connections check. [App instances] x [Pool size + Max overflow] = [Total max connections]. This must be less than the database's max_connections setting, with headroom for admin connections and monitoring.
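On Postgres, a quick way to check both sides of that inequality (a sketch; note that 'idle' connections still count against the limit):

```sql
-- The database-side ceiling.
SHOW max_connections;

-- Connections currently open, grouped by state.
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;
```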

Capacity Projections

| Metric | Current | 3 Months | 6 Months | 12 Months | Action Threshold |
| --- | --- | --- | --- | --- | --- |
| Largest table row count | [X M] | [X M] | [X M] | [X M] | [Table partitioning at X M rows] |
| Total database size | [X GB] | [X GB] | [X GB] | [X GB] | [Storage upgrade at X GB] |
| Peak connections | [X] | [X] | [X] | [X] | [Connection pooler at X connections] |
| Avg query latency | [X ms] | [X ms] | [X ms] | [X ms] | [Optimization sprint at X ms] |
| IOPS utilization | [X%] | [X%] | [X%] | [X%] | [Instance upgrade at 80%] |
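To seed the current-state column on Postgres (a sketch; sizes include heap, indexes, and TOAST):

```sql
-- Total database size.
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size;

-- Largest tables by total on-disk size, largest first.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 5;
```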

Optimization Backlog

| Query/Issue | Impact (latency saved) | Effort | Priority | Status |
| --- | --- | --- | --- | --- |
| [Description] | [p95 -X ms, Y queries/day] | [X days] | [P0/P1/P2] | [Planned / In Progress / Done] |

Filled Example: Postgres-Backed SaaS Application

Database Overview

| Field | Details |
| --- | --- |
| Engine | Postgres 16.1 |
| Instance | AWS RDS db.r6g.xlarge (4 vCPU, 32GB RAM) |
| Storage | 180 GB used / 500 GB GP3, 3000 IOPS |
| Read Replicas | 1 replica (us-east-1b), used for analytics queries |
| Largest Tables | events (85M), orders (12M), users (800K), line_items (45M), audit_log (120M) |
| Connections | Max 200 / Avg 45 / Peak 120 |
| Slow Query Threshold | 100ms |

Slow Query Audit

| Query | Avg | Calls/Day | Total/Day | Table | Scan | Index? |
| --- | --- | --- | --- | --- | --- | --- |
| Dashboard event count by user | 840ms | 12,000 | 168 min | events | Seq Scan | No |
| Order search by date range + status | 320ms | 8,500 | 45 min | orders, line_items | Bitmap Heap | Partial (orders only) |
| User lookup by email | 2ms | 95,000 | 3.2 min | users | Index Scan | Yes (no action) |
| Audit log recent by entity | 1,200ms | 3,000 | 60 min | audit_log | Seq Scan | No |

Query Analysis: Dashboard Event Count

SELECT date_trunc('day', created_at) as day, COUNT(*)
FROM events
WHERE user_id = $1 AND created_at > NOW() - INTERVAL '30 days'
GROUP BY day ORDER BY day;

EXPLAIN ANALYZE. Sequential scan on events (85M rows). Estimated 85M rows, actual 85M. Filter removes 99.97% of rows. Sort on day.

Bottleneck. No index on (user_id, created_at). The planner scans the entire 85M row table to find ~25K matching rows.

Fix. Add a composite index: `CREATE INDEX CONCURRENTLY idx_events_user_created ON events (user_id, created_at);`

Estimated improvement. 840ms to 8ms (index scan on ~25K rows instead of seq scan on 85M).

Indexing Strategy

| Table | Index | Columns | Type | Queries | Est. Size | Write Impact |
| --- | --- | --- | --- | --- | --- | --- |
| events | idx_events_user_created | (user_id, created_at) | B-tree | Dashboard event count | 1.8 GB | +3% on event inserts |
| orders | idx_orders_status_created | (status, created_at) INCLUDE (total, user_id) | B-tree | Order search | 280 MB | +1% on order writes |
| audit_log | idx_audit_entity | (entity_type, entity_id, created_at DESC) | B-tree | Audit log recent | 2.4 GB | +2% on audit inserts |

Connection Pool (Current Issues)

The application runs 8 instances with pool_size=25 each. 8 x 25 = 200, which equals max_connections. No headroom. During deploys (rolling restart), new instances open connections before old ones close, causing connection exhaustion errors.

Fix. Reduce pool_size to 20 per instance (160 total). Add PgBouncer as a connection pooler in front of RDS with max 180 server connections and 1,000 client connections.
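A minimal pgbouncer.ini sketch for this fix (hostname, database name, and auth setup are placeholders; transaction pooling assumes the app does not rely on session state such as session-level prepared statements or advisory locks):

```ini
[databases]
appdb = host=your-rds-endpoint.rds.amazonaws.com port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction     ; multiplex many client connections onto few server connections
max_client_conn = 1000      ; client-side ceiling from the fix above
default_pool_size = 180     ; server connections held open to the database
```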


Key Takeaways

  • Fix the top 3 slow queries first. In most applications, 3-5 queries account for 80% of total database time. Use pg_stat_statements (Postgres) or equivalent to find them.
  • Add indexes based on EXPLAIN ANALYZE, not guesswork. An index that the query planner does not use wastes disk and slows writes. Always verify with EXPLAIN ANALYZE after adding an index.
  • Size your connection pool deliberately. Too small and requests queue for connections. Too large and the database spends time context-switching between connections. The formula (cores * 2) + spindles is a reasonable starting point.
  • Project capacity before you hit limits. If your largest table doubles every 6 months, you need to plan for partitioning or archiving before query performance degrades.
  • Measure before and after every optimization. Without before/after data, you cannot verify that the change helped and you cannot prioritize future optimizations.

Frequently Asked Questions

How do I find the most expensive queries in Postgres?
Enable the `pg_stat_statements` extension and query it: `SELECT query, mean_exec_time, calls, total_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20`. This shows the queries that consume the most total database time (frequency x duration). A query that runs in 5ms but executes 1M times per day may cost more total time than a query that runs in 2 seconds but executes 100 times per day.
When should I add a read replica vs optimize queries?
Optimize queries first. A read replica adds complexity (replication lag, connection routing, consistency concerns) and cost. If the slow queries are due to missing indexes or inefficient SQL, fixing them is cheaper and more effective. Add a read replica when you have already optimized queries and the primary cannot handle the read volume, or when you need to isolate analytics queries from transactional workload.
How do I know if an index is being used?
In Postgres: `SELECT indexrelname, idx_scan, idx_tup_read FROM pg_stat_user_indexes WHERE schemaname = 'public' ORDER BY idx_scan ASC`. Indexes with zero scans over a meaningful period are unused and can be dropped to save disk and write overhead. Check over at least 30 days of production traffic before dropping.
What is the right connection pool size?
Start with `(CPU cores * 2) + number of disks` per application instance. For a 4-core database server and SSD storage, that is roughly 9 connections per app instance. Multiply by the number of app instances and ensure the total is below the database's `max_connections` with 20% headroom. If you need more client connections than the database can handle, add a connection pooler like PgBouncer.
Should PMs track database performance metrics?
PMs should track the user-facing impact, not the database metrics directly. If the engineering team reports that "the events query went from 840ms to 8ms," the PM should ask how that translates to dashboard load time for users. Include database optimization work in sprint capacity planning and celebrate it as user experience investment, not just infrastructure maintenance.
