screentinker/server/lib
ScreenTinker 289d6b6f95 fix(#144): OTA update-check circuit-breaker + phantom guard + per-device keying
/api/update/check offered the update whenever client !== latest (raw string
inequality, not semver) with no backoff. A device that can't APPLY the update
(broken OTA client 1.7.12, signing/Fire OS) keeps reporting the same version and is
told update_available=true on every poll; a fast poll loop saturates the event loop
(prod loop-lag 49s). All requests share one NAT IP, so IP-keying is useless.

server-only breaker (lib/ota-breaker.js), two independent axes:
- RATE breaker (primary, immediate): a key checking >THRESHOLD (3) times within
  WINDOW (60s) is looping -> throttle update_available with exponential backoff
  (30s->2m->8m->cap 30m). Healthy devices poll ~12 min and never approach this, so
  rollout/stragglers are inherently safe -- NO grace-for-flood timer; slow == safe.
- PHANTOM guard (immediate): unrecognized version, or a prerelease of an OLDER core
  (superseded old-minor beta e.g. 1.9.1-beta4), gets no-offer on the first check. A
  RECENT real older version (beta3 vs latest beta4; stable 1.7.12) stays offerable.
- Never offers a downgrade (client >= latest -> no offer).

KEYING (#144 option 3): keyed on device_id when present, else reported version.
- server.js:581 accepts + logs ?device_id=, passes it to the breaker.
- UpdateChecker.kt:122 appends &device_id=<config.deviceId> (existing registered id;
  omitted until provisioned). One-line client change.
beta4+ clients get precise per-device throttling; stuck legacy clients sending only
?version= are caught by the version-keyed + rate + phantom logic. Response gains
additive `reason` + `retry_after_seconds` (old clients ignore).

BOUNDED STATE: a periodic sweep (startSweep, wired in server.js) evicts buckets idle
> IDLE_RESET_MS so the keyed Map can't grow unbounded (churned device_ids); not
reset-on-access only.

SCOPE (deliberate): this targets the FAST flood + phantoms. The slow #144 drip
(stable 1.7.12 polling ~every 12 min, ~20/hr) stays below >3/60s and is NOT
throttled -- catching it needs #144 option-3 "skip-this-version after N cycles",
which is intentionally NOT in this build.

NOTE: carries a CLIENT/APK change -> versionCode must increment at the beta4 bump and
the release keystore is required for the APK. The device_id path only helps devices
that can install beta4+; the stuck legacy fleet is covered by the version-keyed path.

Tests: unit (lib/ota-breaker, injected time) a-f + comparator + escalation + sweep +
slow-drip-scope; HTTP integration (real endpoint, device_id passthrough). Full suite
green serial AND parallel (234). OTA-only delta -- reconnect/reclaim/shed/content-ack/
block untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:36:52 -05:00
..
agency-layouts.js feat: agency zone-grant issuance UI + reactive placement card (#73) 2026-06-14 15:12:55 -05:00
agency-targets.js feat: full-screen-only guardrail for agency designations (#73) 2026-06-14 17:36:30 -05:00
branding.js fix(security): patch quick-win findings from the codebase review 2026-06-08 19:02:19 -05:00
command-queue.js feat(socket): delivery queue for offline-device emits 2026-05-14 13:06:43 -05:00
content-ack-limiter.js fix(#143): content-ack flood control — per-device rate budget + loop-lag valve 2026-06-27 22:21:57 -05:00
content-ingest.js refactor(content): extract the upload ingest into a shared lib (#73) 2026-06-13 22:48:42 -05:00
device-sanitize.js fix(security): patch quick-win findings from the codebase review 2026-06-08 19:02:19 -05:00
image-gen.js feat(ai): generate background + foreground images for signs (#41 Phase 2) 2026-06-09 13:40:14 -05:00
ota-breaker.js fix(#144): OTA update-check circuit-breaker + phantom guard + per-device keying 2026-06-28 23:36:52 -05:00
pair-lockout.js fix(api): harden device pairing against brute-force (#87) 2026-06-12 20:16:12 -05:00
permissions.js feat(roles): add cross-org platform_operator staff role (#13) 2026-06-05 10:30:21 -05:00
reconnect-throttle.js fix(#142): load-aware per-device reconnect throttle (the outage fix) 2026-06-27 19:18:00 -05:00
schedule-eval.js feat(scheduling): per-item schedule blocks (#74 dayparting, #75 auto-expire) 2026-06-11 15:46:41 -05:00
schema-check.js fix(db): observable migrations + fail-fast schema verification (#37) 2026-06-09 09:31:52 -05:00
secretbox.js feat(ai): AI content design in the Designer, BYO endpoint (#41 Phase 1) 2026-06-09 12:23:55 -05:00
socket-rooms.js feat(socket): Phase 2.3 workspace-scoped dashboard socket rooms + per-command permission gates. Dashboard namespace was previously a flat broadcast - every connected dashboard received every device's status/screenshot/playback events platform-wide (foreign device names + IPs included). Inbound socket commands gated by a legacy admin/superadmin role check that was dead code post-Phase-1 rename. 2026-05-12 11:34:24 -05:00
tenancy.js feat(roles): add cross-org platform_operator staff role (#13) 2026-06-05 10:30:21 -05:00
tenant-cascade-migration.js fix(db): cascade tenant resources on workspace/org delete (#18 follow-up) 2026-06-08 16:01:52 -05:00
totp-lockout.js feat(server): TOTP primitives - encrypted secret, hashed recovery codes, verify lockout (#100) 2026-06-13 20:48:55 -05:00
totp.js feat(server): TOTP primitives - encrypted secret, hashed recovery codes, verify lockout (#100) 2026-06-13 20:48:55 -05:00
user-deletion.js feat(admin): Delete Organization + Workspace with cascade (#36) 2026-06-09 09:22:21 -05:00
zone-validate.js fix: per-item mute round-trip + multi-zone orphan-zone fallback & warnings 2026-06-22 23:16:29 -05:00