2026-01-27 - 2026-01-27-DDIA_4
|
1 min read
chp 2 cont'd:
- users care about response time, that's what they see
- since throughput is the domain of the system, it's something users can't see
- throughput is what the hardware can handle; increasing throughput is "scalability"
- what's in a response time?
- network latency from the user's machine to ours, and from ours back to the user's
- queuing delays - could occur at multiple points along the request path
- service time - our machine actually processing the request
- head-of-line blocking: slow requests at the front of the queue force faster requests behind them to wait longer, giving them long response times even though their service times are short
- jitter: variation in network delays
- average (arithmetic mean): helpful for estimating throughput limits, not good at describing the typical response time
- median/p50: a better measure of the typical user's experience
- tail latencies: p95, p99, p999 - just because these aren't the typical users doesn't mean they aren't important to look at; they tend to be the heaviest users of the system. p9999 is out of scope since there is too much randomness that can't be accounted for at that point.
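The mean vs. median point above can be seen directly in a simulation - a minimal sketch with made-up exponential response times (the numbers are illustrative, not from the book):

```python
# Sketch: mean vs. percentiles over a window of simulated response times.
# The distribution and values here are illustrative assumptions.
import random
import statistics

random.seed(0)
# simulate 1000 response times in ms: mostly fast, with a slow tail
samples = [random.expovariate(1 / 50) for _ in range(1000)]

p50 = statistics.median(samples)
cuts = statistics.quantiles(samples, n=100)  # 99 cut points
p95, p99 = cuts[94], cuts[98]

print(f"p50={p50:.1f}ms  p95={p95:.1f}ms  p99={p99:.1f}ms")
# the slow tail drags the mean above the median, so the average
# overstates what the typical user experiences
print(f"mean={statistics.fmean(samples):.1f}ms")
```

`statistics.quantiles` with `n=100` splits the data at 99 cut points, so index 94 is the p95 boundary and index 98 is p99.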
- tail latency amplification: 1 user request fans out into parallel requests to backend services; even if 9 out of 10 are fast, if 1 is extra slow the entire request has a long response time
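The amplification effect falls out of basic probability - if each backend call is slow with probability p (say, the 1% beyond its p99), a request waiting on N parallel calls is slow whenever any one of them is. A quick sketch (the 1% figure is an illustrative assumption):

```python
# Sketch: tail latency amplification under fan-out.
# If each backend call exceeds its p99 with probability 1%, a user
# request that waits on N parallel calls is slow if ANY call is slow:
#   P(slow) = 1 - (1 - p)^N
def p_request_slow(n_calls: int, p_slow_per_call: float = 0.01) -> float:
    return 1 - (1 - p_slow_per_call) ** n_calls

for n in (1, 10, 100):
    print(f"{n:>3} parallel calls -> {p_request_slow(n):.1%} of user requests are slow")
```

So at 100 parallel backend calls, well over half of user requests hit at least one call's p99 tail, even though each individual call is slow only 1% of the time.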
- SLO: service level objectives
- SLA: service level agreements
- open source libraries that can help with estimating and plotting response times: HdrHistogram, t-digest, OpenHistogram, DDSketch
- also you gotta histogram it
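The core idea those libraries share: count samples into buckets instead of storing every value, then read percentiles off the counts. A toy sketch of that idea with power-of-two buckets (the real libraries use much cleverer bucket layouts and error bounds; the sample values are made up):

```python
# Toy sketch of histogram-based latency tracking: bucket counts, not raw values.
from collections import Counter

def bucket(ms: float) -> int:
    # power-of-two buckets: [0,1)ms, [1,2)ms, [2,4)ms, [4,8)ms, ...
    b = 0
    while ms >= 2 ** b:
        b += 1
    return b

latencies_ms = [3, 5, 7, 12, 18, 45, 300]  # illustrative samples
counts = Counter(bucket(ms) for ms in latencies_ms)

for b in sorted(counts):
    lo = 0 if b == 0 else 2 ** (b - 1)
    print(f"{lo:>4}-{2 ** b:<4}ms: {'#' * counts[b]}")
```

Bucketing keeps memory constant no matter how many samples arrive, which is why these structures are cheap enough to run on every request.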