screentinker/server/lib
ScreenTinker 101f086204 fix(#142): load-aware per-device reconnect throttle (the outage fix)
Gates genuine reconnects PER DEVICE before the heavy register work (DB writes +
playlist build) runs, so a single flapping device can no longer saturate the
event loop and take down the server.

- Actuator is per-device, keyed on device_id (modeled on lastPlayLogAt). A device
  is flagged only when it exceeds reconnectBaseMax genuine reconnects per window.
  Same-socket playlist refreshes (isPlaylistRefresh) are exempt.
- Load-awareness is BANDED (normal/elevated/critical from the step-2 lag signal),
  not a continuous controller. The band only MULTIPLIES an already-flagged
  device's backoff; global lag never gates a healthy device.
- Hysteresis: escalate immediately while storming (tighten fast); decay one level
  per reconnectReleaseMs of calm (release slow).
- HARD CEILING per device, independent of band and warm-up — a slow-ramp attacker
  can't train through it.
- COLD START: for reconnectWarmupMs after boot, force the normal band and apply
  only the hard ceiling, so a full-fleet reconnect after a deploy doesn't throttle
  healthy screens. State is in-memory, resets on restart.
- Observability: every throttle engagement logs device, band, observed vs allowed
  rate, and backoff. A throttled device is sent device:throttled, followed by a
  deferred disconnect.
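
The rules above can be sketched roughly as follows. This is a minimal
illustration under stated assumptions, not the contents of reconnect-throttle.js:
createReconnectThrottle, reconnectWindowMs, reconnectHardMax,
reconnectBaseBackoffMs, bandMultiplier, the doubling-per-level backoff, and every
default value are hypothetical. Only reconnectBaseMax, reconnectReleaseMs,
reconnectWarmupMs, isPlaylistRefresh, and the three band names come from the
description.

```javascript
// Illustrative sketch only. Names marked "assumed" and all defaults are hypothetical.
function createReconnectThrottle(options = {}) {
  const cfg = {
    reconnectWindowMs: 10000,      // assumed: length of the counting window
    reconnectBaseMax: 5,           // genuine reconnects allowed per window
    reconnectHardMax: 20,          // assumed name: hard ceiling per window
    reconnectBaseBackoffMs: 1000,  // assumed: backoff at escalation level 1
    reconnectReleaseMs: 30000,     // calm time needed to decay one level
    reconnectWarmupMs: 60000,      // lenient period after boot
    bandMultiplier: { normal: 1, elevated: 2, critical: 4 }, // assumed values
    now: Date.now,                 // injectable clock, handy in tests
    log: console.warn,
    ...options,
  };
  const bootAt = cfg.now();
  const devices = new Map(); // device_id -> { hits, level, lastStormAt }; in-memory only

  return function check(deviceId, { band = 'normal', isPlaylistRefresh = false } = {}) {
    // Same-socket playlist refreshes are exempt and are not counted.
    if (isPlaylistRefresh) return { allowed: true };

    const t = cfg.now();
    let s = devices.get(deviceId);
    if (!s) {
      s = { hits: [], level: 0, lastStormAt: t };
      devices.set(deviceId, s);
    }

    // Release slow: drop one level per full reconnectReleaseMs of calm.
    const calmSteps = Math.floor((t - s.lastStormAt) / cfg.reconnectReleaseMs);
    if (s.level > 0 && calmSteps > 0) {
      s.level = Math.max(0, s.level - calmSteps);
      s.lastStormAt += calmSteps * cfg.reconnectReleaseMs;
    }

    // Count genuine reconnects inside the sliding window.
    s.hits = s.hits.filter((h) => t - h < cfg.reconnectWindowMs);
    s.hits.push(t);
    const observed = s.hits.length;

    // Cold start: force the normal band and apply only the hard ceiling.
    const warmingUp = t - bootAt < cfg.reconnectWarmupMs;
    const effectiveBand = warmingUp ? 'normal' : band;
    const allowedMax = warmingUp
      ? cfg.reconnectHardMax
      : Math.min(cfg.reconnectBaseMax, cfg.reconnectHardMax);
    if (observed <= allowedMax) return { allowed: true, level: s.level };

    // Tighten fast: escalate on every over-limit reconnect. The band only
    // multiplies the backoff of a device that is already flagged.
    s.level += 1;
    s.lastStormAt = t;
    const multiplier = cfg.bandMultiplier[effectiveBand] || 1;
    const backoffMs = cfg.reconnectBaseBackoffMs * 2 ** (s.level - 1) * multiplier;
    cfg.log(`[throttle] device=${deviceId} band=${effectiveBand} ` +
      `observed=${observed} allowed=${allowedMax} backoffMs=${backoffMs}`);
    return { allowed: false, level: s.level, band: effectiveBand, observed, allowedMax, backoffMs };
  };
}
```

In this sketch the caller would run check(device_id, { band }) before the heavy
register work and, on allowed === false, emit device:throttled and schedule the
disconnect itself; enforcing the returned backoffMs is left to the caller.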

Tests (api.test.js style):
- unit: healthy-never-throttled, storm-throttled-with-growing-backoff, band
  multiplies backoff, hard-ceiling-even-in-warmup, warm-up leniency, neighbor
  isolation, slow release.
- integration GATE (the required one): full-fleet reconnect right after restart
  throttles NO healthy device; a single device storming IS throttled; a neighbor
  stays unaffected while another storms.
- also fixes test PORT collisions: the new integration test files clashed with
  totp.test.js (port 3979) and totp-keyrotation.test.js (port 3980), so they
  moved to ports 3982/3983; full suite now green serially AND in parallel.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 19:18:00 -05:00
agency-layouts.js feat: agency zone-grant issuance UI + reactive placement card (#73) 2026-06-14 15:12:55 -05:00
agency-targets.js feat: full-screen-only guardrail for agency designations (#73) 2026-06-14 17:36:30 -05:00
branding.js fix(security): patch quick-win findings from the codebase review 2026-06-08 19:02:19 -05:00
command-queue.js feat(socket): delivery queue for offline-device emits 2026-05-14 13:06:43 -05:00
content-ingest.js refactor(content): extract the upload ingest into a shared lib (#73) 2026-06-13 22:48:42 -05:00
device-sanitize.js fix(security): patch quick-win findings from the codebase review 2026-06-08 19:02:19 -05:00
image-gen.js feat(ai): generate background + foreground images for signs (#41 Phase 2) 2026-06-09 13:40:14 -05:00
pair-lockout.js fix(api): harden device pairing against brute-force (#87) 2026-06-12 20:16:12 -05:00
permissions.js feat(roles): add cross-org platform_operator staff role (#13) 2026-06-05 10:30:21 -05:00
reconnect-throttle.js fix(#142): load-aware per-device reconnect throttle (the outage fix) 2026-06-27 19:18:00 -05:00
schedule-eval.js feat(scheduling): per-item schedule blocks (#74 dayparting, #75 auto-expire) 2026-06-11 15:46:41 -05:00
schema-check.js fix(db): observable migrations + fail-fast schema verification (#37) 2026-06-09 09:31:52 -05:00
secretbox.js feat(ai): AI content design in the Designer, BYO endpoint (#41 Phase 1) 2026-06-09 12:23:55 -05:00
socket-rooms.js feat(socket): Phase 2.3 workspace-scoped dashboard socket rooms + per-command permission gates. Dashboard namespace was previously a flat broadcast - every connected dashboard received every device's status/screenshot/playback events platform-wide (foreign device names + IPs included). Inbound socket commands gated by a legacy admin/superadmin role check that was dead code post-Phase-1 rename. 2026-05-12 11:34:24 -05:00
tenancy.js feat(roles): add cross-org platform_operator staff role (#13) 2026-06-05 10:30:21 -05:00
tenant-cascade-migration.js fix(db): cascade tenant resources on workspace/org delete (#18 follow-up) 2026-06-08 16:01:52 -05:00
totp-lockout.js feat(server): TOTP primitives - encrypted secret, hashed recovery codes, verify lockout (#100) 2026-06-13 20:48:55 -05:00
totp.js feat(server): TOTP primitives - encrypted secret, hashed recovery codes, verify lockout (#100) 2026-06-13 20:48:55 -05:00
user-deletion.js feat(admin): Delete Organization + Workspace with cascade (#36) 2026-06-09 09:22:21 -05:00
zone-validate.js fix: per-item mute round-trip + multi-zone orphan-zone fallback & warnings 2026-06-22 23:16:29 -05:00