2026-01-27 - 2026-01-27-DDIA_4
|
1 min read
chp 2 cont'd:
- users care about response time, that's what they see
- since throughput is the domain of the system, it's something users can't see
- throughput is what the hardware can handle; increasing throughput is "scalability"
- what's in a response time?
- network latency from the user's machine to ours, and from ours back to the user's
- queuing delays - could occur at multiple points along the request path
- service time - our machine actually processing the request
- head-of-line blocking: slow requests at the front of the queue force faster requests behind them to wait longer, giving them long response times even though their service times are short
- jitter: variation in network delays
- average (arithmetic mean): helpful for estimating throughput limits, not good at describing the typical response time
- median/p50: a better measure of the typical user's experience
- tail latencies: p95, p99, p999 - just because these aren't the typical users doesn't mean they aren't important to look at; they tend to be the heaviest users of the system. p9999 is out of scope since there is too much randomness that can't be accounted for at that point.
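The mean vs. median point above can be seen directly in a simulation - a minimal sketch with made-up exponential response times (the numbers are illustrative, not from the book):

```python
# Sketch: mean vs. percentiles over a window of simulated response times.
# The distribution and values here are illustrative assumptions.
import random
import statistics

random.seed(0)
# simulate 1000 response times in ms: mostly fast, with a slow tail
samples = [random.expovariate(1 / 50) for _ in range(1000)]

p50 = statistics.median(samples)
cuts = statistics.quantiles(samples, n=100)  # 99 cut points
p95, p99 = cuts[94], cuts[98]

print(f"p50={p50:.1f}ms  p95={p95:.1f}ms  p99={p99:.1f}ms")
# the slow tail drags the mean above the median, so the average
# overstates what the typical user experiences
print(f"mean={statistics.fmean(samples):.1f}ms")
```

`statistics.quantiles` with `n=100` splits the data at 99 cut points, so index 94 is the p95 boundary and index 98 is p99.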
- tail latency amplification: 1 user request fans out into parallel requests to backend services; even if 9 out of 10 are fast, if 1 is extra slow the entire request has a long response time
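The amplification effect falls out of basic probability - if each backend call is slow with probability p (say, the 1% beyond its p99), a request waiting on N parallel calls is slow whenever any one of them is. A quick sketch (the 1% figure is an illustrative assumption):

```python
# Sketch: tail latency amplification under fan-out.
# If each backend call exceeds its p99 with probability 1%, a user
# request that waits on N parallel calls is slow if ANY call is slow:
#   P(slow) = 1 - (1 - p)^N
def p_request_slow(n_calls: int, p_slow_per_call: float = 0.01) -> float:
    return 1 - (1 - p_slow_per_call) ** n_calls

for n in (1, 10, 100):
    print(f"{n:>3} parallel calls -> {p_request_slow(n):.1%} of user requests are slow")
```

So at 100 parallel backend calls, well over half of user requests hit at least one call's p99 tail, even though each individual call is slow only 1% of the time.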
- SLO: service level objectives
- SLA: service level agreements
- open source libraries that can help with estimating and plotting response times: HdrHistogram, t-digest, OpenHistogram, DDSketch
- also you gotta histogram it
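The core idea those libraries share: count samples into buckets instead of storing every value, then read percentiles off the counts. A toy sketch of that idea with power-of-two buckets (the real libraries use much cleverer bucket layouts and error bounds; the sample values are made up):

```python
# Toy sketch of histogram-based latency tracking: bucket counts, not raw values.
from collections import Counter

def bucket(ms: float) -> int:
    # power-of-two buckets: [0,1)ms, [1,2)ms, [2,4)ms, [4,8)ms, ...
    b = 0
    while ms >= 2 ** b:
        b += 1
    return b

latencies_ms = [3, 5, 7, 12, 18, 45, 300]  # illustrative samples
counts = Counter(bucket(ms) for ms in latencies_ms)

for b in sorted(counts):
    lo = 0 if b == 0 else 2 ** (b - 1)
    print(f"{lo:>4}-{2 ** b:<4}ms: {'#' * counts[b]}")
```

Bucketing keeps memory constant no matter how many samples arrive, which is why these structures are cheap enough to run on every request.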