Monitoring · April 14, 2026 · 9 min read

API Response Time Monitoring: What to Measure and When to Alert

Slow APIs hurt users before they cause outages. Here is how to measure API response times properly, set meaningful thresholds, and catch degradation early.

Response time is the metric your users feel before they can articulate it. A 200ms API response is seamless. An 800ms response introduces noticeable lag in the UI. A 3-second response makes users wonder if something broke. By the time your API is responding in 6 seconds, you are losing users — even if every request technically succeeds.

Uptime monitoring tells you when your API is completely unavailable. Response time monitoring tells you when your API is degrading — which often precedes outages, and causes user-visible problems even when it does not.

This guide covers how to measure API response times properly, how to set thresholds that mean something, and how to catch degradation before it turns into downtime.


What Response Time Actually Measures

When a monitoring tool reports a response time of 250ms, that number covers the time from when the HTTP request was sent to when the full response was received. It encompasses several sub-stages:

  • DNS lookup: Time to resolve the domain name to an IP address
  • TCP connection: Time to establish the TCP connection (3-way handshake)
  • TLS handshake: Time to negotiate the SSL/TLS connection (for HTTPS)
  • Time to first byte (TTFB): Time from sending the request to receiving the first byte of the response
  • Content download: Time to receive the complete response body

For small API responses, content download is negligible. TTFB is the most meaningful indicator of server-side performance, because it is dominated by the time your server spends processing the request before it starts sending a response.

A monitoring tool that only reports total response time is useful. A monitoring tool that reports per-stage timing is much more useful for diagnosing why response time is elevated.
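As a rough illustration of those stages, here is a minimal Python sketch that times each one for a single HTTPS request using only the standard library. Real monitoring tools (and libraries such as pycurl) capture this timing far more robustly; treat this as a demonstration of where the stage boundaries fall, not production code:

```python
import socket
import ssl
import time

STAGES = ["dns", "tcp", "tls", "ttfb", "download"]

def stage_durations(marks):
    """Convert ordered timestamps (seconds) into per-stage durations in ms."""
    return {name: round((marks[i + 1] - marks[i]) * 1000, 1)
            for i, name in enumerate(STAGES)}

def timed_get(host, path="/", port=443, timeout=10):
    """Time each stage of one HTTPS GET: DNS, TCP, TLS, TTFB, download."""
    marks = [time.perf_counter()]
    addr = socket.getaddrinfo(host, port)[0][4][0]            # DNS lookup
    marks.append(time.perf_counter())
    sock = socket.create_connection((addr, port), timeout)    # TCP handshake
    marks.append(time.perf_counter())
    ctx = ssl.create_default_context()
    conn = ctx.wrap_socket(sock, server_hostname=host)        # TLS handshake
    marks.append(time.perf_counter())
    conn.sendall((f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                  "Connection: close\r\n\r\n").encode())
    body = conn.recv(1)                                       # first byte => TTFB
    marks.append(time.perf_counter())
    while chunk := conn.recv(65536):                          # rest of the body
        body += chunk
    marks.append(time.perf_counter())
    conn.close()
    return stage_durations(marks)
```

Splitting the measurement this way is what lets you tell a slow resolver apart from a slow application.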


Setting Response Time Thresholds

The most common mistake in response time monitoring is setting arbitrary thresholds — "alert if response time exceeds 1000ms" — without grounding them in actual behavior.

An endpoint that typically responds in 800ms should alert at a different threshold than an endpoint that typically responds in 50ms. A flat 1000ms threshold is too noisy for the first and too lenient for the second.

Start with a Baseline

Before setting thresholds, run monitoring for 7–14 days and observe the actual response time distribution for each endpoint. Look at:

  • p50 (median): What does a typical response look like?
  • p95: What does a response look like on a bad-but-not-catastrophic day?
  • p99: What does a response look like at the worst end of normal?
  • Max: What is the absolute worst you have seen under normal conditions?

Your alert threshold should be set above p99 but significantly below the point where user experience degrades meaningfully. A reasonable starting heuristic: alert at 3–4x your p95 response time.

If your p95 response time is 180ms, an alert threshold of 600–700ms gives you buffer for normal variance while catching genuine degradation early.
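The heuristic above is easy to automate once you have baseline samples. A small sketch, assuming you can export 7–14 days of per-endpoint response times as a list of milliseconds (the function name and multiplier default are illustrative):

```python
import statistics

def suggest_threshold(samples_ms, multiplier=3.5):
    """Suggest an alert threshold: 3-4x the observed p95 (heuristic from the text).

    samples_ms: observed response times in ms for one endpoint over the
    baseline period. Needs at least two samples.
    """
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
    p95 = statistics.quantiles(samples_ms, n=100)[94]
    return round(p95 * multiplier)
```

With a p95 of 180ms this suggests roughly 630ms, consistent with the 600–700ms range above.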

Adjust Per Endpoint Category

Different endpoint types have different reasonable response time expectations:

Endpoint type            | Typical p95 range | Starting threshold
Simple reads (cached)    | 20–80ms           | 300ms
Simple reads (database)  | 50–200ms          | 600ms
Complex queries          | 200–800ms         | 2000ms
File uploads             | 500–3000ms        | 8000ms
Background job dispatch  | 20–100ms          | 400ms
Third-party API calls    | 200–1500ms        | 5000ms

These are starting points, not absolutes. Calibrate to your actual data.
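If your monitoring setup is configured as code, the table above translates naturally into a starting config. A sketch as a Python mapping; the category keys are illustrative names, not part of any real API:

```python
# Starting alert thresholds per endpoint category, taken from the table above.
# These are starting points -- calibrate each one to your own baseline data.
STARTING_THRESHOLDS_MS = {
    "simple_read_cached": 300,
    "simple_read_database": 600,
    "complex_query": 2000,
    "file_upload": 8000,
    "background_job_dispatch": 400,
    "third_party_api_call": 5000,
}
```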


Types of Response Time Problems

Understanding the categories of response time degradation helps you diagnose faster when thresholds fire.

Gradual Degradation

Response time trends upward over days or weeks. This is often a capacity or growth problem: the database is processing more data, the server is handling more connections, memory pressure is increasing.

Gradual degradation is easy to miss if you only look at whether checks are failing. It is visible in response time trend graphs. When your p95 response time has doubled over 3 weeks, that is worth investigating — even if nothing has technically "failed."

Common causes: Unindexed queries that grow slower as the table grows, accumulated technical debt in code paths, gradual memory or connection leaks, steadily increasing traffic without corresponding infrastructure scaling.

Sudden Degradation

Response time spikes sharply at a specific point in time. This is almost always correlated with an event: a deployment, a traffic spike, a third-party service slowdown, a cron job that runs at a predictable time.

Common causes: A slow database migration running in the background, a new code path with an N+1 query problem, a scheduled job competing for database connections at peak time, a downstream dependency that started performing poorly.

Time-of-Day Patterns

Response time is consistently elevated at predictable times — during business hours, during your heaviest traffic period, or at the exact time a scheduled job runs.

This is a capacity problem: your infrastructure handles normal load fine but struggles under peak conditions. The pattern in response time monitoring data makes it obvious.

Regional Asymmetry

Response time checks from US East are fine, but checks from US West are consistently 400ms slower. This points to network routing, CDN configuration, or geographic distribution of your infrastructure rather than application performance.

Multi-region response time monitoring makes this category of problem immediately visible.


What Slow Response Time Usually Means

When response time degrades, there is a root cause somewhere in the request chain. The per-stage timing data narrows it down quickly:

High DNS time: DNS resolution is taking longer than expected. Possible causes: a slow DNS provider, DNS propagation issues, or a poorly performing resolver at the monitoring location. Usually not something you control directly, but worth noting if it persists.

High TCP connection time: Network latency between the monitoring location and your server is high. Check whether server load or network congestion is contributing.

High TLS handshake time: SSL/TLS negotiation is slow. This can indicate CPU pressure on the server (TLS is CPU-intensive), an older TLS configuration that negotiates more slowly, or certificate chain issues.

High TTFB: Your server received the request and is taking a long time to start responding. This is usually the most actionable metric — it corresponds to time spent in your application code, database queries, external API calls, or memory pressure. This is where to focus debugging effort for application performance issues.

High content download: The response body is large and taking time to transmit. Less common for APIs with small JSON payloads, but significant for file download endpoints or endpoints returning large datasets.
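The triage logic above is mechanical enough to sketch in code. A minimal version, assuming per-stage timings as a dict of milliseconds (the cause strings condense the text; names are illustrative):

```python
# Rough triage map condensed from the stage breakdown above.
LIKELY_CAUSES = {
    "dns": "slow DNS provider, propagation issues, or a slow resolver",
    "tcp": "network latency or congestion between probe and server",
    "tls": "CPU pressure, slow TLS configuration, or certificate chain issues",
    "ttfb": "application code, database queries, or downstream API calls",
    "download": "large response body",
}

def dominant_stage(stage_ms):
    """Return the stage consuming the most time -- the first place to look."""
    return max(stage_ms, key=stage_ms.get)
```

For example, a breakdown where TTFB dwarfs the other stages points you at application code and database queries rather than the network.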


Monitoring Response Time Percentiles, Not Just Averages

Average response time is misleading. A set of response times like [100ms, 90ms, 110ms, 95ms, 3000ms] has an average of 679ms, yet 80% of those requests completed in 110ms or less. The average is dominated by a single outlier.

Percentiles tell a more accurate story: p50 = 100ms with p95 near 3000ms means most requests are fast but a few are very slow.
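Running the article's numbers through Python's standard library makes the gap concrete:

```python
import statistics

samples = [100, 90, 110, 95, 3000]      # the response times from the text, in ms
avg = statistics.mean(samples)           # pulled up by the single 3000ms outlier
p50 = statistics.median(samples)         # what a typical request actually saw
print(avg, p50)                          # the average is nearly 7x the median
```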

Most monitoring tools report average response time, which is a reasonable quick reference. For production API performance work, you want p95 and p99 tracked over time. Some monitoring tools report these; if yours does not, supplementing with application performance monitoring (Sentry, New Relic, or OpenTelemetry) gives you the full picture.


Response Time and Your SLA

If your API has a public or contractual SLA (Service Level Agreement) that includes response time commitments, your monitoring thresholds should align with those commitments.

A common SLA pattern: "99.9% of requests will complete in under 500ms." This means your monitoring should track p99.9 response time against a 500ms threshold, and your SLA compliance reporting should come from that data.

If you do not have a formal SLA, it is still worth defining an informal internal standard: "We consider this API healthy when p95 response time is under 300ms." Having a defined standard makes threshold-setting straightforward and incident classification unambiguous.
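Checking a response-time SLA of the form "X% of requests complete in under Y ms" reduces to a one-line fraction. A sketch, assuming you have the raw per-request timings (function name is illustrative):

```python
def sla_compliance(samples_ms, limit_ms=500):
    """Fraction of sampled requests completing under the SLA limit."""
    return sum(s < limit_ms for s in samples_ms) / len(samples_ms)
```

For the "99.9% under 500ms" pattern above, 999 fast requests and one slow one out of 1000 lands exactly on the target.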


Alerting on Response Time Degradation

Response time alerts should not behave identically to availability alerts. A 500ms response time does not require the same urgency as a 503 error.

A practical tiered alert model for response time:

Warning (no page, Slack notification): Response time exceeds 2x normal for 3+ consecutive checks. Investigate during business hours.

Alert (Slack @channel): Response time exceeds 4x normal for 5+ consecutive checks. Investigate within the hour.

Critical (PagerDuty/phone): Response time is so high that the endpoint is functionally unusable (approaching or exceeding timeout), sustained for multiple checks.

This tiering prevents response time issues from causing unnecessary 3am pages while ensuring persistent degradation gets timely attention.
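The tier model above can be sketched as a classifier over recent check results. The 2x/4x multipliers and consecutive-check counts come from the text; the 80%-of-timeout cutoff for "functionally unusable" is an assumption you should tune:

```python
def alert_tier(recent_ms, baseline_ms, timeout_ms=10_000):
    """Classify sustained response-time degradation into the article's tiers.

    recent_ms: response times from the most recent checks, oldest first.
    baseline_ms: this endpoint's normal response time (e.g. its p50).
    """
    # Critical: approaching or exceeding timeout for multiple checks
    # (the 0.8 * timeout cutoff is an assumption, not from the text).
    if len(recent_ms) >= 3 and all(s >= 0.8 * timeout_ms for s in recent_ms[-3:]):
        return "critical"   # PagerDuty/phone
    # Alert: > 4x normal for 5+ consecutive checks
    if len(recent_ms) >= 5 and all(s > 4 * baseline_ms for s in recent_ms[-5:]):
        return "alert"      # Slack @channel, investigate within the hour
    # Warning: > 2x normal for 3+ consecutive checks
    if len(recent_ms) >= 3 and all(s > 2 * baseline_ms for s in recent_ms[-3:]):
        return "warning"    # Slack notification, business hours
    return "ok"
```

Requiring consecutive slow checks, rather than reacting to a single sample, is what keeps transient spikes from paging anyone.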


Tools That Track Response Time Properly

Not all monitoring tools track and surface response time equally well. When evaluating tools, look for:

  • Per-stage timing (DNS, TCP, TLS, TTFB, download) rather than just total response time
  • Historical response time graphs showing trends over time
  • Response time as an alert condition (configurable threshold, not just a dashboard metric)
  • Per-region response time so geographic asymmetries are visible
  • Response time data included in alert notifications — the alert should tell you what the response time was, not just that a threshold was crossed

PulseAPI captures per-stage timing on every check and includes response time data in incident alerts, so you have context the moment an alert fires rather than having to log into a dashboard to find it.


Getting Started with Response Time Thresholds

If you are starting from scratch, here is a practical approach:

  1. Add your critical endpoints to your monitoring tool with generous initial thresholds (5 seconds)
  2. Let monitoring run for 7–14 days to establish baselines
  3. Review the data: what is the p95 response time for each endpoint?
  4. Set thresholds at 3–4x p95 for each endpoint
  5. Review and adjust after 30 days based on what you observe

This approach grounds your thresholds in real data rather than arbitrary numbers, which means fewer false positives and more meaningful alerts when they do fire.


PulseAPI tracks response time with per-stage timing from multiple regions and includes response time data in every incident alert. Start monitoring free →

Ready to Monitor Your APIs Intelligently?

Join developers running production APIs. Free for up to 10 endpoints.

Start Monitoring Free

No credit card  ·  10 free endpoints  ·  Cancel anytime