Commit graph

28 commits

Author SHA1 Message Date
ScreenTinker 8d37c7f5ff fix(#143): notify a screen it's paired on reconnect (recovery-critical)
Bold: screens sit on the Connect page showing the server URL = paired server-side
but never told, so the app never starts playing.

Flow / gap (Step A):
- CLIENT leaves the Connect page ONLY on the 'device:paired' event — web player
  (player/index.html) hides the setup screen; Android ProvisioningActivity.onPaired
  launches MainActivity + finish(). That event is the sole signal.
- SERVER pushes 'device:paired' to the device's room from POST /api/provision/pair
  (server.js) at pair time — but ONLY reaches a LIVE socket then. The normal
  device_id reconnect path emitted device:registered + device:playlist-update but
  NOT device:paired. So a screen paired while disconnected, or that reconnects after
  pairing (exactly the screens cycling on the Connect page), is paired server-side
  (user_id set, receiving playlists) yet never gets device:paired -> stuck on Connect.

Fix (server-only, uses the EXISTING client listener — no client update needed, which
matters because we can't push a client update to stuck screens): on the device_id
reconnect, if the device is paired (user_id set), re-emit 'device:paired'
{device_id, name}. Push-on-pair (server.js) already covers the live-at-pair-time
case; this covers paired-then-reconnect. A paired screen now leaves Connect and
plays on its next reconnect with no client change and no manual re-pair.

Tests (port 3989, real flow): provision -> pair via /api/provision/pair (socket
closed) -> reconnect RECEIVES device:paired (+name +playlist) — the stuck-screen
repro; an unpaired device gets NO device:paired (stays on the pairing flow); the fix
reuses the existing device:paired event (no new protocol). Full suite green serial
AND parallel (220); dbac699 / 404c330 / e734281 intact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 23:52:30 -05:00
ScreenTinker e73428182d fix(#143): fingerprint-reclaim stuck loop — reclaim by runtime liveness, throttle log
Bold beta1: three devices spam "Fingerprint reclaim rejected ... device active
(status=offline, ~2500s since heartbeat, liveConn=false)" twice/~2s indefinitely —
contradictory: gone by every signal yet treated as active.

Root cause (NOT a missing clear — corrected the hypothesis). The reject condition
was `liveConn || status==='online' || secondsSince < RECLAIM_GRACE_SECONDS(24h)`.
For the observed devices liveConn=false and status=offline, so the ONLY true term
is `secondsSince < 24h` — an effective 24h CALENDAR grace, not a stale flag. Audited
the clears: liveConn (deviceConnections) is removed on the debounced disconnect
(heartbeat.removeConnection) AND the offline_timeout sweep (deviceConnections.delete);
status is set 'offline' on both. liveConn=false + status=offline PROVE the clears
ran — there is nothing stale to clear. The 24h time gate (mislabeled "device active")
blocked a legitimately-gone device from reclaiming for up to 24h, so it retried
every ~2s forever-in-practice. The "twice per ~2s" is two reclaim ATTEMPTS per cycle
(client reconnect + re-pair-on-auth-error), each hitting the single console.warn —
not double-logging in one attempt.

Fix:
- Decide "still alive" from RUNTIME signals: `!!liveConn || secondsSince <
  reclaimSettleSeconds`. A device with no live socket and a heartbeat older than the
  settle window is gone -> reclaimable. A live (or just-seen) device is still
  rejected, so reclaim-abuse protection holds. NOT just ignoring "active" — it fixes
  WHY it was stuck (the 24h gate). RECLAIM_SETTLE_SECONDS default 300 (was 24h).
  SECURITY TRADEOFF flagged in config: shortens the anti-fingerprint-theft window;
  raise to re-tighten. Tuning guess to validate vs Bold.
- Log throttle: the deferral logs at most once per device per RECLAIM_REJECT_LOG_
  WINDOW_MS (default 60s) — collapses the double-log + the per-2s flood (same
  discipline as the content-ack shed log). Cleared when a reclaim proceeds.

Recovery of the 3 wedged devices (2febcaa9, 1984694c, 139159eb): they SELF-HEAL on
their next reclaim attempt (~2s) once this ships — their heartbeats are ~2500s stale
(>300s settle) and liveConn=false, so the reclaim now succeeds. No operator SQL needed.

Tests (port 3988): gone device reclaims; live device still rejected; clear-on-leave
(disconnect clears liveConn -> stale device reclaims); deferral log <=1 per window.
Full suite green serial+parallel (217). reconnect-throttle.js, the dbac699 content-ack
limiter, and the 404c330 block/auth code untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 22:56:48 -05:00
ScreenTinker 404c3301dd fix(#143): enforceable device block + fix the null-token auth short-circuit
Highest-priority #143 item (operator finding from Bold): nulling a device's token
did NOT lock it out — device 75c2a08a immediately reconnected and saturated the
loop. Two distinct defects:

1. Auth short-circuit (the cause). device:register used
     if (device.device_token && !validateDeviceToken(...)) { reject }
   so a NULL/empty STORED token made the guard falsy -> validation SKIPPED, and the
   next block even MINTED a fresh token and persisted it. Nulling a token thus
   RE-PROVISIONED the device instead of locking it out. Fix: drop the
   `device.device_token &&` guard -> `if (!validateDeviceToken(device_id, device_token))`
   (validateDeviceToken already returns false for null-stored/missing/mismatch), and
   remove the legacy "mint a token for a null-token device" path (the re-provision
   vector). An already-provisioned device (every row, incl. 'provisioning', is created
   WITH a token) presenting null/empty/invalid is now REJECTED + disconnected.
   The first-pairing seam is unaffected: a brand-new device has NO device_id and goes
   through the pairing_code branch (which mints id+token) — a different code path.

2. No server-side kill switch. Added a `blocked` column (devices.blocked INTEGER
   NOT NULL DEFAULT 0; schema.sql + a database.js migration). The block is the FIRST
   gate at the top of device:register — before the fingerprint block, the reconnect
   throttle, any DB writes, or playlist build — so a blocked device's socket is
   refused immediately (auth-error 'Device blocked' + disconnect, zero further work).
   It does NOT rely on null-token (the thing that failed). The row is re-read every
   register, so a DIRECT SQLite edit takes effect on the device's NEXT reconnect with
   NO server restart. Operator statements (dashboard-down, hand-edit):
     block:   UPDATE devices SET blocked = 1 WHERE id = '<device_id>';
     unblock: UPDATE devices SET blocked = 0 WHERE id = '<device_id>';

Tests (port 3987): nulled-token provisioned device is REJECTED (75c2a08a repro);
blocked=1 refused at the first gate (no register/playlist); unblock reconnects;
first-pairing still works; normal valid-token device unaffected. Full suite green
serial AND parallel (213); reconnect-throttle.js + the dbac699 content-ack limiter
untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 22:40:30 -05:00
ScreenTinker dbac699854 fix(#143): content-ack flood control — per-device rate budget + loop-lag valve
#142's content-ack dedup is insufficient: a device cycling 2-4 content IDs makes
every ack look unique so dedup never fires, while aggregate volume from ~30 devices
saturates the event loop (the #142 reconnect throttle kept the server responsive,
which is how this was even observable).

Folded ONE control on the content-ack path (no competing limiters; reconnect-
throttle.js untouched) in lib/content-ack-limiter.js:
- Step 1 — per-device RATE budget: caps TOTAL non-duplicate acks per device per
  window regardless of differing content_id (the case dedup misses). Over budget =
  DROP silently (the per-ack log+emit is the cost); log ONCE per device per window
  when shedding starts. Keeps the #142 dedup (dedup'd repeats don't consume budget).
  Per-device, in-memory, resets on restart (modeled on lastPlayLogAt; does NOT reuse
  reconnect-throttle's ban-semantics bucket).
  Env (TUNING GUESSES, validate vs Bold's fleet): CONTENT_ACK_MAX_PER_WINDOW=20,
  CONTENT_ACK_RATE_WINDOW_MS=10000 (=2/s, above legit ~<=1/s, below the flood).
- Step 2 — global pressure valve: reuses the #142 loop-lag band (+ its hysteresis,
  no second control loop). Under CRITICAL band, shed content-acks even for an
  in-budget device; reconnects + dashboard/HTTP are ALWAYS processed; a healthy
  device in a non-critical band is never touched by the valve. Valve open/close
  logged once at the band edge in services/loop-lag.js (not per shed message).

Tests (unique ports 3985/3986, not the 3982/3983/3984 set):
- unit: the #143 regression (cycling ids evading dedup IS rate-limited), under/over
  budget, dedup still works + doesn't consume budget, valve sheds in-budget under
  critical while normal is untouched, rate precedence, window reset, per-device
  isolation.
- integration: socket flood is capped to budget with a single shed-start log;
  under-budget passes every ack; valve OPEN sheds content-acks while a reconnect +
  /api/status still succeed.
Full suite green serial AND parallel (208 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 22:21:57 -05:00
ScreenTinker 15448d1c5d fix(#142): dedup repeated content-ack reports (secondary load)
device:content-ack logged + emitted every message, so a device repeatedly
reporting the same "content <id>: ready" (observed from an older app version)
added avoidable load per message.

- Suppress identical (device_id, content_id, status) reports within
  config.contentAckDedupMs (default 10s), modeled on the lastPlayLogAt throttle.
  A status change has a different key and passes immediately; a fresh report after
  the window passes too. In-memory, resets on restart. The handler does no DB
  writes, so this is purely shedding redundant log+emit work.

test: integration over a real authenticated device socket — a burst of identical
"ready" collapses to one log/emit, a "ready" after the window passes, and a status
change is never deduped. Unique PORT (3984).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 19:35:04 -05:00
ScreenTinker 101f086204 fix(#142): load-aware per-device reconnect throttle (the outage fix)
Gates genuine reconnects PER DEVICE before the heavy register work (DB writes +
playlist build) runs, so a single flapping device can no longer saturate the
event loop and take down the server.

- Actuator is per-device, keyed on device_id (modeled on lastPlayLogAt). A device
  is flagged only when it exceeds reconnectBaseMax genuine reconnects per window.
  Same-socket playlist refreshes (isPlaylistRefresh) are exempt.
- Load-awareness is BANDED (normal/elevated/critical from the step-2 lag signal),
  not a continuous controller. The band only MULTIPLIES an already-flagged
  device's backoff; global lag never gates a healthy device.
- Hysteresis: escalate immediately while storming (tighten fast); decay one level
  per reconnectReleaseMs of calm (release slow).
- HARD CEILING per device, independent of band and warm-up — a slow-ramp attacker
  can't train through it.
- COLD START: for reconnectWarmupMs after boot, force the normal band and apply
  only the hard ceiling, so a full-fleet reconnect after a deploy doesn't throttle
  healthy screens. State is in-memory, resets on restart.
- Observability: every throttle engagement logs device, band, observed vs allowed
  rate, and backoff. Throttled device gets device:throttled + a deferred disconnect.

Tests (api.test.js style):
- unit: healthy-never-throttled, storm-throttled-with-growing-backoff, band
  multiplies backoff, hard-ceiling-even-in-warmup, warm-up leniency, neighbor
  isolation, slow release.
- integration GATE (the required one): full-fleet reconnect right after restart
  throttles NO healthy device; a single device storming IS throttled; a neighbor
  stays unaffected while another storms.
- also fixes pre-existing test PORT collisions (my new integration files clashed
  with totp.test.js:3979 and totp-keyrotation.test.js:3980 -> moved to 3982/3983);
  full suite now green serially AND in parallel.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 19:18:00 -05:00
ScreenTinker 0c0a8dd68a fix(ota): surface stuck OTA on dashboard + read APK signer correctly on API 28/29 (#139)
Follow-up to the cache/backoff loop fix (aa23cf0): make a device that can't
self-install visible to operators, and fix the signature-verify bug that kept the
whole #139 fix from engaging on the actual Fire OS target.

Dashboard surface (Phase 2):
- devices gains ota_status / ota_target_version / ota_attempts / ota_updated_at
  via the idempotent ALTER TABLE ADD COLUMN migration (non-destructive,
  default-backfilled, idempotent on re-run).
- The device reports ota_status (OtaThrottle.statusFor -> none | pending |
  manual_update_required) in device_info; the server persists it on register
  (the reconnect backstop). devices d.* already surfaces it to the dashboard.
- Dashboard shows a non-blocking amber badge when manual_update_required
  ("Update available (vX) - install failed N times, manual update required");
  i18n key in en.js (non-en inherits via the en fallback). Server suite +1 test.

Event-driven status (Option B):
- New device:ota-status WS message, emitted on STATE TRANSITIONS only
  (enter-backoff -> manual_update_required, clear -> none), so the badge updates
  promptly without waiting for a reconnect and without per-poll/heartbeat chatter.
  Server handler persists the same fields; an unknown/forged device_id is a safe
  no-op. The register-path persist stays as the reconnect backstop.

Signature-verify fix (the critical piece):
verifyApkSignature read the downloaded APK's signer via
getPackageArchiveInfo(GET_SIGNING_CERTIFICATES).signingInfo, but that field is
null for ARCHIVE files on API 28/29 (populated only from API 30). On Fire OS 8
(Android 9 / API 28) - the actual deployment target - this returned 0 certs from
a correctly-signed APK, so every OTA was refused as "tampered," the cache was
deleted, and the full APK re-downloaded every check cycle. This was the real
cause of the #139 re-download loop, NOT a silent-install failure: the cache and
backoff added in this branch sit behind this verify gate and never engaged on
the target.

Fix: below API 30, read the archive's signer via the legacy GET_SIGNATURES +
.signatures (its v1/JAR cert, which IS populated on 28/29). Keep
GET_SIGNING_CERTIFICATES + signingInfo for API >= 30 and for the installed-app
read (which works on 28+). The archive's signer is still extracted and compared
to the installed app's signer; a mismatch or zero-cert APK is still rejected.
This reads the cert correctly on old APIs - it does not weaken verification.

Verified on emulators:
- API 28: verify now passes for a legit APK (was: 0 certs, refused). Full backoff
  then engages - 8.5MB pulled once, cache-hit on retries, backoff after 3,
  manual_update_required emitted once; clears on successful update.
- API 28 negative: a re-signed (different-key) APK is still refused on cert
  MISMATCH - no hole opened.
- API 30: unchanged path still passes (no regression).
- server suite 173/173, OtaThrottleTest 7/7.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 22:49:01 -05:00
screentinker 1f2e923005
fix(#134): quiet false "reconnect" log + report HDMI output and UI render resolution (#136)
Two device-REPORTING fixes from the #134 investigation (the PiP rendering itself
was #135).

1) "Device reconnects every ~45s" was a logging artifact, not instability. The
   player re-emits a full device:register on the SAME socket every ~45-60s
   (requestPlaylistRefresh) to pull a fresh playlist; the server logged
   "Device reconnected" for every register of a known device. The attached 4-day
   log showed 1415 "reconnected" vs 30 real socket connects and 0 heartbeat
   timeouts — the socket never dropped, so #134's "PiP lost between reconnects"
   was a misdiagnosis. Fix: only log a genuine reconnect (new socket); a
   same-socket re-register is a refresh (currentDeviceId === device_id) and stays
   quiet. The playlist still refreshes.

2) Device reported 720p while the monitor showed a 1080 signal. DeviceInfo
   reported getRealMetrics() — the UI RENDER SURFACE — but TV boxes render the UI
   at 720p and upscale to a 1080p HDMI signal. Now report BOTH: screen_width/height
   = the output mode (Display.Mode.physicalWidth/Height), render_width/height =
   the render surface (getRealMetrics). Two new nullable devices columns, stored on
   pairing INSERT + reconnect UPDATE, exposed via the device API, shown on the
   dashboard as "1920x1080 (UI 1280x720)" when they differ.

Backward compatible (required + verified on emulator): a device that omits
render_* — or sends no device_info at all — still registers, with render_* = null,
on both the INSERT and UPDATE paths. New columns nullable; stores use
`?? null` / `|| null`. All 167 server tests pass.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 15:32:11 -05:00
ScreenTinker 1c748b8d3b feat(preview): draft-aware device-free playlist preview via player reuse (#104)
Replaces the broken/fragmented preview with a single surface that renders a
DRAFT playlist exactly as a device does, by reusing the player's renderer in a
same-origin iframe. Fixes "not all items load" (one renderer, full type union)
and inherits the player's YouTube correctness (YT.Player handshake).

Server:
- deviceSocket: extract assemblePayload() (zone-reset + canonical shape) from
  buildPlaylistPayload so the device path and preview can't drift. Pure refactor
  (all 149 tests green).
- playlists: GET /:id/preview-payload (requirePlaylistRead, workspace-scoped).
  Draft-aware via buildSnapshotItems (live items, not published_snapshot);
  derivePreviewLayout() resolves layout from the playlist's own zone-bound items
  (0 zoned -> fullscreen; 1 -> use it; >1 -> dominant + ambiguous flag, never
  crashes). orientation validated/passthrough; wall_config/timezone null.

Player (renderer UNTOUCHED):
- ?preview=1&playlist=ID boot branch: fetch preview-payload (same-origin Bearer
  token) and call handlePlaylistUpdate(). Gated before the pairing/socket path
  so the unpaired auto-connect never fires. All socket emits already guarded.
- Webpage widgets: always-visible honest note (no auto-detection — an XFO
  refusal is provably indistinguishable client-side from a working embed).

Dashboard:
- playlists: Preview button + player-iframe modal with landscape/portrait toggle.
- widgets: same honest note on the existing widget preview modal (the surface the
  bug was reported on).
- i18n x6 (en/es/fr/de/it/pt) + player i18n x5.

Validated end-to-end (headless Chrome + CDP): preview boots, webpage note
renders, 3-zone layout derives+renders, shape parity with device snapshot proven
on real data, auth gate returns 401. The world-readable /uploads finding is
tracked separately as #107 (not a #104 concern — same path the device uses).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 14:11:05 -05:00
ScreenTinker 2ccf3264a9 feat(scheduling): per-item schedule blocks (#74 dayparting, #75 auto-expire)
Some checks are pending
CI / Unit tests (node --test) (push) Waiting to run
CI / Android unit tests (Kotlin schedule evaluator vectors) (push) Waiting to run
CI / Boot smoke + version check (push) Waiting to run
Each playlist item can carry schedule blocks (active days, start/end
time-of-day, optional start/end dates). An item plays when the screen's
local "now" matches at least one block; an item with no blocks always
plays. #74 covers time-of-day/day-of-week windows including overnight
wrap; #75 covers inclusive date ranges (auto-expiry). Evaluation is
on-device, so dayparting and expiry work offline.

- Shared evaluator contract: shared/schedule-vectors.json (39 vectors —
  DST US+AU, overnight-wrap anchoring, timezone correctness, date
  boundaries). Canonical JS evaluator in server/lib/schedule-eval.js;
  Kotlin and Tizen ports kept in lockstep by drift guards (Tizen byte-diff
  test, Kotlin JUnit reads the shared JSON, new android-test CI job).
- All three players (web, Android, Tizen) filter by schedule against their
  own clock, idle with a "Nothing scheduled" message + 30s re-check when
  everything is filtered, and fail open on any evaluator error.
- Editor: per-item schedule modal + row badge in the playlist editor;
  client validation mirrors the server; editing marks the playlist draft.
- Part B (behaviour change): device/group schedule overrides now evaluate
  in each device's effective timezone instead of server-local time.
- Device detail shows the reported timezone + a clock-skew warning.
- i18n for en/es/fr/de/pt across all new strings (namespaced itemsched.*
  to avoid colliding with the device-schedule calendar's schedule.*).
- CHANGELOG documents the feature, the Part B change, the fail-open
  guarantee, and the scheduled-single-video re-render tradeoff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 15:46:41 -05:00
ScreenTinker c8e664e66c fix(ws): guard fingerprint insert against stale device_id (FK violation noise)
A socket reconnecting with a device_id that no longer exists in `devices`
(e.g. the row was deleted server-side) hit the device_fingerprints insert
with an unknown foreign key. INSERT OR IGNORE does NOT suppress FOREIGN KEY
violations, so it threw a caught-but-noisy "Fingerprint tracking error" on
every such reconnect. Null out an unknown device_id before the insert; a
genuinely fresh device sends no device_id and was always fine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 15:46:41 -05:00
ScreenTinker 73912d5f58 feat(debug): live per-device debug logging toggle on the device screen
Checkbox on the device-detail page streams the Android player's player/zone logs
live (no adb). Transient (off on reconnect), not persisted.

- Android: DebugLog util (logcat + optional socket emit); 'set_debug' command wires
  the sink + flag; key player/zone decisions (layout mode, playItem, per-zone
  render) emit through it.
- Server: relay device:log -> dashboard workspace room as dashboard:device-log.
- Dashboard: 'Debug logging' checkbox sends set_debug; live log panel streams lines
  (rendered via textContent; capped at 500).
2026-06-08 21:49:03 -05:00
ScreenTinker 50d7dbe222 fix(player): zone reset on multi->single layout switch + don't blank multi-zone
- Server (deviceSocket buildPlaylistPayload): when a device's layout has <2 zones
  (single or none), strip leftover zone_id from assignments. After switching a
  device from multi-zone back to fullscreen, content was stuck bound to a gone
  left/right zone_id and never played; nulling it lets both players fall back to
  the default fullscreen renderer.
- Web player: render multi-zone zones BEFORE the single-item 'renderable?' bail,
  so an empty/placeholder current rotation item can't blank the whole screen.
2026-06-08 21:31:27 -05:00
ScreenTinker 890ec5790f fix(proof-of-play): throttle play_logs writes to prevent runaway bloat
A player stuck in a tight loop (playlist with 0-second item durations)
fires device:play-event 'play_start' ~3x/sec, inserting a play_logs row
each time. Three web players doing this generated ~909k rows (99.9% with
duration_sec=0) and grew the prod DB to 265 MB.

Throttle proof-of-play inserts to at most one per device per 2s (in-memory
lastPlayLogAt map). Skipped cycles create no row; the live dashboard
progress event still fires every time, so the UI is unaffected. The
play_end UPDATE only closes open rows, so throttling play_start is safe.

(Junk rows already pruned in prod: 909k deleted, DB 265 MB -> 9.8 MB.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 09:52:22 -05:00
ScreenTinker f5ca26ae2d fix(socket): offline debounce + truthful single-device command feedback
Two dashboard-accuracy improvements for issue #3.

Disconnect debounce (5s):
- Brief transient flaps (Engine.IO ping miss, eviction-then-reconnect,
  Wi-Fi blip) no longer immediately flip the device to offline in the
  dashboard. Disconnect handler now defers the offline transition;
  register handlers cancel the pending timer if reconnect lands in
  window.
- Existing stale-disconnect guard kept as fast-path for the eviction
  case (no timer scheduled at all when the active heartbeat conn is
  already a different socket).
- Re-check at timer fire compares socketIds: aborts only if a
  GENUINELY DIFFERENT socket reclaimed the device. Just the closing
  socket's own (not-yet-cleaned-up) entry is treated as stale and
  proceeds with offline transition.
- Server-restart mid-grace is handled by the heartbeat checker safety
  net (existing component): any 'online' row with last_heartbeat
  older than heartbeatTimeout gets marked offline on next sweep.

Truthful single-device command feedback:
- dashboard:device-command handler now checks deviceNs.adapter.rooms
  for an active socket before emitting (matches the group-command
  route's pattern).
- If room is empty, falls through to commandQueue.queueCommand (lazy
  require - if commit C is reverted, MODULE_NOT_FOUND is cached and
  every subsequent call gets consistent queued=false behavior).
- Returns three-state ack to caller: { delivered, queued, reason }.
- Server log line was misleading - now logs 'Command delivered to
  device X' vs 'Command for offline device X (queued=true/false)'.

Frontend:
- sendCommand() takes optional callback. Without one, fires-and-forgets
  (no behavior change for non-wired callers). With one, uses Socket.IO
  .timeout(5000).emit so the callback always fires (ack or no_ack).
- Six device-detail command buttons wired to three-state toasts:
  reboot, shutdown, screen_off, screen_on, launch, update.
  - delivered: green/success toast (existing localized message)
  - queued: amber/warning toast (new generic message)
  - no_ack: red/error toast
  - fallback: red/error toast
- Two callers intentionally left fire-and-forget:
  - window._sendCmd (generic remote-overlay keypress/touch helper)
  - enable_system_capture (has its own visual state machine; out of
    scope for this commit)

Three new i18n keys (en.js only; other locales follow later):
- device.toast.command_queued
- device.toast.command_undeliverable
- device.toast.command_no_ack

Refs #3

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 13:11:40 -05:00
ScreenTinker 742d8c4b09 feat(socket): delivery queue for offline-device emits
Short-lived per-device queue covers the TV-flap window (issue #3):
when a device is mid-reconnect, prior code emitted to an empty room
and the event vanished. Now playlist-updates and commands targeting
an offline device are queued and flushed in order on the next
device:register for that device_id.

server/lib/command-queue.js (new):
- pendingPlaylistUpdate: per-device marker (rebuild via builder on
  flush -> always fresh DB state, no stale snapshots)
- pendingCommands: per-device Map<type, payload> with last-of-type
  dedup (most recent screen_off wins)
- TTL via COMMAND_QUEUE_TTL_MS env (default 30000)
- Active sweep every 30s prunes expired entries

Memory bounds: ~6 entries per device worst case (1 playlist marker
+ 5 command types), unref'd sweep timer.

Wired emit sites (8 total; the four direct socket.emit calls in
deviceSocket register handlers are intentionally NOT queued because
the socket is alive by definition at those points):
- server/routes/video-walls.js   (pushWallPayloadToDevice)
- server/routes/device-groups.js (pushPlaylistToDevice)
- server/routes/content.js       (content-delete fan-out)
- server/routes/playlists.js     (pushToDevices + assign)
- server/services/scheduler.js   (scheduled rotations)
- server/ws/deviceSocket.js x2   (wall leader reclaim/reassign)

server/ws/deviceSocket.js register paths now call flushQueue after
heartbeat.registerConnection + socket.join. Existing
socket.emit('device:playlist-update', ...) lines kept - they send
the initial state on register; the flush replays any queued events.
Player's handlePlaylistUpdate fingerprint check dedupes the
overlap.

Refs #3

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 13:06:43 -05:00
ScreenTinker fc29843035 feat(socket): Phase 2.3 workspace-scoped dashboard socket rooms + per-command permission gates. Dashboard namespace was previously a flat broadcast - every connected dashboard received every device's status/screenshot/playback events platform-wide (foreign device names + IPs included). Inbound socket commands gated by a legacy admin/superadmin role check that was dead code post-Phase-1 rename.
Fix: at connect, enumerate the user's accessible workspace_ids (direct workspace_members + org_owner/admin paths + platform_admin 'all') via new accessibleWorkspaceIds() helper in lib/tenancy.js; socket.join one room per workspace. All 12 dashboardNs.emit sites across deviceSocket / heartbeat / server.js / devices route / video-walls route now route via dashboardNs.to(workspaceRoom(...)).emit() with the workspace looked up from the relevant device or wall. New lib/socket-rooms.js holds the helpers and breaks a circular dependency (dashboardSocket already requires heartbeat, so heartbeat can't require dashboardSocket).

Inbound 6 commands rewired to canActOnDevice(socket, deviceId, tier): request-screenshot is read tier (workspace_viewer+); remote-touch/key/start/stop and device-command are write tier (workspace_editor+). Platform_admin and org_owner/admin always pass via actingAs. Legacy admin/superadmin branch dropped.

Lifecycle note: workspace-switch already calls window.location.reload (Phase 3 switcher), which forces a fresh socket with updated memberships - no per-emit re-evaluation needed.

Smoke tested with 3 simultaneous socket.io-client connections (switcher-test, swninja, dw5304 platform_admin) + direct canActOnDevice invocation for 6 user/device/tier combinations. All 9 outbound isolation cells and all 6 permission gates pass. Fixture mutation: switcher-test's Field Crew membership flipped from workspace_editor to workspace_viewer to exercise the read/write tier split in one login.
2026-05-12 11:34:24 -05:00
ScreenTinker 2068bc8833 Video walls: free-form canvas editor, leader-driven sync, group dissolve, progress bars
Wall editor: replaces the small grid with a Figma-style pan/zoom canvas. Each
display is a rectangle that can be dragged/resized to match its physical
arrangement; a separate semi-transparent player rect overlays the screens and
defines what content plays where. Drag empty space to pan, wheel to zoom,
"Center" button auto-fits content. Per-rect numeric x/y/w/h panel; arrow keys
nudge by 1px (10px with shift). Negative coordinates supported for screens
offset above/left of the origin. Coords rounded to integers on save.

Wall rendering: each device receives screen_rect + player_rect, maps the
player into its viewport with vw/vh and object-fit:fill so vertical position
of every source pixel is identical across devices that share viewport height.
Leader emits wall:sync at 4Hz with sent_at timestamp; followers apply
latency-adjusted target and use playbackRate ±3% for sub-300ms drift,
hard-seek for >300ms. Followers stay muted; leader unmutes via gesture with
AudioContext priming and pause+play retry to bypass Firefox autoplay.
"Tap to enable audio" overlay as a final fallback.

Reconnect handling: server re-evaluates leader on device:register so the
top-left tile reclaims leadership when it returns. Followers emit
wall:sync-request on entering wall mode (incl. reconnect) so they snap to
position immediately instead of drifting until the next periodic tick.

Group dissolve: removing a device from its last group clears its playlist
to mirror wall-leave semantics. Leaving a group with playlists on remaining
groups inherits the next group's playlist.

Dashboard: walls render as their own card section (hidden the device cards
they contain). Multi-select checkboxes on cards + "Create Video Wall" toolbar
action that creates the wall, removes devices from groups, and opens the
editor. dashboard:wall-changed broadcast triggers live re-render. Per-card
playback progress bar driven by play_start events forwarded from devices.

Security: PUT /walls/:id/devices verifies caller owns each device (or has
team-owner access via the widgets pattern), preventing cross-tenant device
takeover. wall:sync and wall:sync-request validate that the sending device
is a member of the named wall; relay re-stamps device_id with currentDeviceId
so clients can't spoof or shadow-exclude peers.

Schema: video_walls += player_x/y/width/height, playlist_id;
video_wall_devices += canvas_x/y/width/height. All idempotent migrations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:11:16 -05:00
ScreenTinker c105a5941e Security: fix IDORs, XSS, rate limits, SSRF validation
HIGH 1 (teams IDOR): POST/DELETE /api/teams/:id/devices now require the
caller to own the device before assigning or detaching it. Without this
check, any team member could pull any device into their team via UUID
guess and gain remote-control access.

HIGH 2 (schedules IDOR): PUT /api/schedules/:id now re-verifies
ownership of every changed target field — device_id, group_id,
content_id, widget_id, layout_id, playlist_id. Previously only the
schedule owner was checked, letting users fire arbitrary content on
victim devices via update.

HIGH 3 (filename XSS): file.originalname captured by multer bypassed
sanitizeBody. New safeFilename() wraps every INSERT path (multipart
upload, remote URL, YouTube). Frontend sinks now go through esc() in
content-library.js, device-detail.js, video-wall.js. Web player gets
an inline escHtml helper for its info overlay where filenames, device
name, and serverUrl land in innerHTML.

HIGH 4 (kiosk public XSS): config.idleTimeout is now coerced via the
existing safeNumber() helper at both interpolation sites. A crafted
value with a newline can no longer escape the JS line comment to
inject arbitrary code into the public render endpoint.

HIGH 5 (folder DoS): POST /api/folders enforces a per-user cap of 100
folders (429 on overflow). Superadmin exempt.

MED 1 (SSRF): ImageLoader.decodeUrl rejects any URL scheme other than
http(s) so a malicious remote_url can't read local files via file://.
On the server, validateRemoteUrl() is extracted and now also runs on
PUT /api/content/:id remote_url updates — previously the SSRF check
only fired on POST.

MED 2 (fingerprint takeover): the WS device:register fingerprint
reclaim path now rejects takeover while the target device is online or
within 24h of its last heartbeat. A leaked fingerprint can no longer
hijack an active display.

MED 3 (npm audit): bumped uuid 9.x -> 14.0.0 (v3/v5/v6 buffer bounds
CVE; we only use v4 so not exploitable, but clears the audit). path-
to-regexp resolved to 0.1.13 via npm audit fix. 0 vulns remaining.

MED 4 (folder admin consistency): ownedFolder() and the content.js
folder_id move check now both treat only superadmin as privileged,
matching GET /api/folders. Previously a plain "admin" could rename
or delete folders they couldn't see, and could move content into
folders they couldn't list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 14:37:18 -05:00
ScreenTinker ee6888e737 Fix display duplication on WebSocket reconnect
Server-side: when a device reconnects on a fresh socket while the old
TCP zombie is still around, the old socket's eventual disconnect handler
flipped the device offline and removed the new heartbeat entry. Now we
proactively evict any prior socket on register and ignore disconnects
from sockets that are no longer the registered one for that device_id.

Frontend: dedupe devices by id from the API response and only render
each device in the first group it belongs to (multi-group membership
is still tracked for the Manage modal).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 10:13:00 -05:00
ScreenTinker b4ac2fb821 Fix broken service worker + device auth rejection on playlist refresh
Bug 1 (SW): Rewrote service worker fetch handler:
- Skip range requests (video seeking) to avoid caching partial responses
- Skip non-GET requests entirely
- Use ignoreSearch on cache match to avoid query-param misses
- Don't cache opaque cross-origin responses
- Outer catch on Cache API failures
- Don't intercept catch-all requests (let browser handle natively)
- Bump cache version to v4 to purge broken cached responses

Bug 2 (auth): Playlist refresh register was missing device_token,
causing auth rejection every 5 minutes. Fixed by including token
in the refresh-register emit. Added diagnostic logging on both
client and server for token validation failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 22:18:08 -05:00
ScreenTinker dc7450b6a7 Offline resilience: persist playlist cache for cold-start recovery
Web player:
- Cache playlist JSON to localStorage on every update
- Restore and start playing immediately on boot before connecting
- Clear cache on unpair/reset

Android app:
- Cache playlist JSON to EncryptedSharedPreferences on every update
- Restore cached playlist on cold-start, play from disk-cached content
- Update cache on content deletion, clear on unpair

Server (device socket):
- Fingerprint reconnect: issue fresh token instead of rejecting
- Send device:paired on fingerprint recovery for claimed devices
- Add status logging and dashboard notification on fingerprint reconnect

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 21:49:45 -05:00
ScreenTinker 436a3be7f6 Phase 3: playlist publish/draft state with auto-publish from device detail
Schema: add status and published_snapshot columns to playlists table.
Migration snapshots all existing playlists as published (idempotent via schema_migrations).

Devices always receive the published_snapshot, not live playlist_items.
Edits from device-detail/groups auto-publish immediately (display updates instantly).
Edits from playlist detail page go to draft (requires explicit publish).
POST /playlists/:id/publish snapshots and pushes to all devices.
POST /playlists/:id/discard reverts playlist_items from published snapshot.
Content deletion scrubs references from all published snapshots.

Frontend: draft badge in playlist list, prominent yellow banner with publish/discard
buttons on playlist detail and device detail pages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 20:52:29 -05:00
ScreenTinker afbe113acf Security audit remediation: auth, IDOR, XSS, hardening
- Device WebSocket authentication: devices get a device_token on
  registration, must present it on reconnect. All WS events require
  prior auth. Timing-safe token comparison.
- IDOR fixes: ownership checks on schedules (device, week), layouts
  (all CRUD, zones, duplicate, device assign), video-walls (content,
  device-config).
- XSS prevention: shared esc() helper in utils.js, fixed 13 innerHTML
  injection points across 9 frontend files.
- OAuth hardening: no longer silently overwrites auth_provider on
  accounts with local passwords (returns 409).
- JWT pinned to HS256 for sign and verify.
- Password policy: change endpoint now requires 8 chars (was 6).
- HSTS header enabled (max-age 1 year, includeSubDomains).
- Stripe webhook rejects unsigned payloads when no secret configured.
- Screenshot size validation (max 2MB base64).
- Rate limiting on exports, imports, content operations.
- Content file serving checks playlist_items instead of old assignments.
- Content ownership verified in device-groups assign-content.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 22:48:07 -05:00
ScreenTinker 6d4d39c2d8 Phase 2: buildPlaylistPayload reads from playlist_items via device.playlist_id
Replaces the assignments-table query with a playlist_items query keyed on
device.playlist_id. Also eliminates the duplicate payload builder in
scheduler.js — it now calls the shared buildPlaylistPayload.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 22:04:26 -05:00
ScreenTinker faa437881f Add device groups UI, group commands, proxy IP fix, and web player detection
- Dashboard now organizes devices by group with colored section headers
- Group command endpoint (POST /groups/:id/command) sends to all members
- Manage modal with multi-group confirmation prompt
- Destructive commands (reboot/shutdown) require confirmation
- Ungrouped devices shown separately at bottom
- trust proxy + X-Forwarded-For for real client IPs behind Nginx
- Hide Android-only telemetry (battery/storage/RAM/CPU/WiFi) for web players

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 22:03:44 -05:00
ScreenTinker e2879fff58 Instant playlist push, fix YouTube looping, auto-fetch video titles
- Push playlist updates to devices instantly via WebSocket on all
  assignment mutations (add, update, delete, reorder, copy)
- Fix YouTube videos skipping early: remove duration_sec timeout (was
  defaulting to 10s), use generation counter to ignore stale player
  callbacks, disable YouTube loop param for multi-item playlists
- Auto-fetch YouTube video title via oEmbed API when no name provided
- Show actual video duration in M:SS format in playlist instead of
  misleading assignment duration_sec
- Pre-fill server URL from origin on web player setup
- Bump playlist poll interval to 5min (fallback only, push is primary)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 15:42:41 -05:00
ScreenTinker 1594a9d4a4 Initial open source release
ScreenTinker - open source digital signage management software.
MIT License, all features included, no license gates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 12:14:53 -05:00