Continuously samples event-loop delay via perf_hooks.monitorEventLoopDelay()
(C++-backed histogram; cheap). Each window persists mean/p50/p99/max to a new
event_loop_lag table and recomputes a coarse load band (normal/elevated/critical)
from the window p99. Standalone value: current lag is exposed on /api/status and
band changes are logged, so site lag is diagnosable independent of throttling.
The band feeds the #142 reconnect throttle (next commit) but ships first as its
own subsystem.
- event_loop_lag is bounded from day one: indexed on sampled_at + scheduled prune
(LAG_TELEMETRY_RETENTION_DAYS, small default) modeled on the play_logs prune.
Deliberately NOT another unbounded-growth table.
- Band transitions are asymmetric: jump up immediately (tighten fast), release one
level at a time after N calm samples below a deadband (release slow, no flap).
Pure nextBand() function, unit-tested deterministically.
- config: LAG_SAMPLE_INTERVAL_MS, LAG_RESOLUTION_MS, LAG_TELEMETRY_RETENTION_DAYS,
LAG_PRUNE_INTERVAL_MS, LAG_ELEVATED_MS, LAG_CRITICAL_MS, LAG_RELEASE_SAMPLES.
- tests: band-transition unit tests; integration proves sampling persists, stays
bounded under the prune, and surfaces on /api/status.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>