screentinker/server
ScreenTinker dbac699854 fix(#143): content-ack flood control — per-device rate budget + loop-lag valve
#142's content-ack dedup is insufficient: a device cycling 2-4 content IDs makes
every ack look unique so dedup never fires, while aggregate volume from ~30 devices
saturates the event loop (the #142 reconnect throttle kept the server responsive,
which is how this was even observable).

Adds ONE control on the content-ack path, in lib/content-ack-limiter.js (no
competing limiters; reconnect-throttle.js untouched):
- Step 1 — per-device RATE budget: caps TOTAL non-duplicate acks per device per
  window regardless of differing content_id (the case dedup misses). Over budget =
  DROP silently (the per-ack log+emit is the cost); log ONCE per device per window
  when shedding starts. Keeps the #142 dedup (dedup'd repeats don't consume budget).
  Per-device, in-memory, resets on restart (modeled on lastPlayLogAt; does NOT reuse
  reconnect-throttle's ban-semantics bucket).
  Env (TUNING GUESSES, validate vs Bold's fleet): CONTENT_ACK_MAX_PER_WINDOW=20,
  CONTENT_ACK_RATE_WINDOW_MS=10000 (20 acks per 10 s = 2/s: above the legit rate of
  roughly <=1/s, below the flood).
- Step 2 — global pressure valve: reuses the #142 loop-lag band (+ its hysteresis,
  no second control loop). Under CRITICAL band, shed content-acks even for an
  in-budget device; reconnects + dashboard/HTTP are ALWAYS processed; a healthy
  device in a non-critical band is never touched by the valve. Valve open/close
  logged once at the band edge in services/loop-lag.js (not per shed message).
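
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of lib/content-ack-limiter.js: the factory name `createContentAckLimiter`, the `check` signature, the result strings, and the band names are all hypothetical. It also assumes a fixed (non-sliding) window, a dedup rule of "same content_id as the device's last accepted ack", and that valve-shed acks do not consume budget; the commit message confirms none of these.

```javascript
// Hypothetical sketch: createContentAckLimiter and its return values are
// illustrative assumptions, not the real lib/content-ack-limiter.js API.
function createContentAckLimiter({
  maxPerWindow = 20,   // default taken from CONTENT_ACK_MAX_PER_WINDOW above
  windowMs = 10000,    // default taken from CONTENT_ACK_RATE_WINDOW_MS above
  now = Date.now,      // injectable clock so tests can control time
  log = () => {},      // called once per device per window when shedding starts
} = {}) {
  const devices = new Map(); // deviceId -> state; in-memory, lost on restart

  return function check(deviceId, contentId, band = 'normal') {
    const t = now();
    let state = devices.get(deviceId);

    // Assumed fixed window: start a fresh budget once windowMs has elapsed.
    if (!state || t - state.windowStart >= windowMs) {
      state = {
        windowStart: t,
        count: 0,
        shedLogged: false,
        lastContentId: state ? state.lastContentId : undefined,
      };
      devices.set(deviceId, state);
    }

    // Assumed dedup rule: a repeat of the last accepted content_id is a
    // duplicate and does not consume budget.
    if (state.lastContentId !== undefined && contentId === state.lastContentId) {
      return 'duplicate';
    }

    // Step 1: per-device rate budget, independent of content_id.
    // Checked before the valve, so rate shedding takes precedence.
    if (state.count >= maxPerWindow) {
      if (!state.shedLogged) {
        state.shedLogged = true;
        log(`content-ack shedding started for device ${deviceId}`);
      }
      return 'shed_rate';
    }

    // Step 2: global pressure valve, shedding even in-budget devices
    // while the loop-lag band is critical.
    if (band === 'critical') return 'shed_valve';

    state.count += 1;
    state.lastContentId = contentId;
    return 'accept';
  };
}
```

In this sketch a caller would run the per-ack log and emit only when `check` returns `'accept'`, and would pass in the current loop-lag band from whatever services/loop-lag.js exposes. Reconnects and HTTP requests never go through `check`, which is what keeps them unaffected by the valve.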

Tests (unique ports 3985/3986, not the 3982/3983/3984 set):
- unit: the #143 regression (cycling ids evading dedup IS rate-limited), under/over
  budget, dedup still works + doesn't consume budget, valve sheds in-budget under
  critical while normal is untouched, rate precedence, window reset, per-device
  isolation.
- integration: socket flood is capped to budget with a single shed-start log;
  under-budget passes every ack; valve OPEN sheds content-acks while a reconnect +
  /api/status still succeed.
Full suite green serial AND parallel (208 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 22:21:57 -05:00
config PiP overlay MVP: push image/web overlays to a device or group (#109) (#127) 2026-06-18 14:54:44 -05:00
db fix(#142): global device_status_log retention sweep + STATUS_LOG_RETENTION_DAYS 2026-06-27 19:34:19 -05:00
lib fix(#143): content-ack flood control — per-device rate budget + loop-lag valve 2026-06-27 22:21:57 -05:00
middleware feat(api): per-agency-token auto-publish (#73) 2026-06-14 13:48:17 -05:00
player fix(player): composite multi-zone layouts in screenshot/stream capture 2026-06-22 23:22:12 -05:00
routes feat(#142): event-loop lag telemetry (perf_hooks) + bounded storage 2026-06-27 19:01:08 -05:00
scripts feat(scheduling): per-item schedule blocks (#74 dayparting, #75 auto-expire) 2026-06-11 15:46:41 -05:00
services fix(#143): content-ack flood control — per-device rate budget + loop-lag valve 2026-06-27 22:21:57 -05:00
test fix(#143): content-ack flood control — per-device rate budget + loop-lag valve 2026-06-27 22:21:57 -05:00
ws fix(#143): content-ack flood control — per-device rate budget + loop-lag valve 2026-06-27 22:21:57 -05:00
.gitignore feat(email): Microsoft Graph send + alert spam protection + preferences UI 2026-05-12 18:16:40 -05:00
config.js fix(#143): content-ack flood control — per-device rate budget + loop-lag valve 2026-06-27 22:21:57 -05:00
package-lock.json chore(release): v1.9.2-beta1 2026-06-27 19:59:34 -05:00
package.json chore(release): v1.9.2-beta1 2026-06-27 19:59:34 -05:00
server.js feat(#142): event-loop lag telemetry (perf_hooks) + bounded storage 2026-06-27 19:01:08 -05:00
version.js chore(version): single-source VERSION, env-configurable data paths, bump tooling 2026-06-10 12:56:03 -05:00