Bold: screens sit on the Connect page showing the server URL. They are paired
server-side but never told, so the app never starts playing.
Flow / gap (Step A):
- CLIENT leaves the Connect page ONLY on the 'device:paired' event — web player
(player/index.html) hides the setup screen; Android ProvisioningActivity.onPaired
launches MainActivity + finish(). That event is the sole signal.
- SERVER pushes 'device:paired' to the device's room from POST /api/provision/pair
(server.js) at pair time — but ONLY reaches a LIVE socket then. The normal
device_id reconnect path emitted device:registered + device:playlist-update but
NOT device:paired. So a screen paired while disconnected, or that reconnects after
pairing (exactly the screens cycling on the Connect page), is paired server-side
(user_id set, receiving playlists) yet never gets device:paired -> stuck on Connect.
Fix (server-only, uses the EXISTING client listener — no client update needed, which
matters because we can't push a client update to stuck screens): on the device_id
reconnect, if the device is paired (user_id set), re-emit 'device:paired'
{device_id, name}. Push-on-pair (server.js) already covers the live-at-pair-time
case; this covers paired-then-reconnect. A paired screen now leaves Connect and
plays on its next reconnect with no client change and no manual re-pair.
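A minimal sketch of the reconnect decision above, assuming a device row shaped
like {id, name, user_id}. `eventsOnReconnect` is a hypothetical helper, not the
actual handler in server.js, and the device:registered / device:playlist-update
payloads are placeholders.

```javascript
// Hypothetical helper: which events a device_id reconnect should emit.
// Assumes a device row { id, name, user_id }; user_id set means "paired".
function eventsOnReconnect(device) {
  const events = [
    { event: 'device:registered', payload: { device_id: device.id } }, // placeholder payload
    { event: 'device:playlist-update', payload: {} },                  // payload elided
  ];
  if (device.user_id) {
    // Re-emit the EXISTING event so a screen stuck on Connect can leave it.
    events.push({
      event: 'device:paired',
      payload: { device_id: device.id, name: device.name },
    });
  }
  return events;
}
```

An unpaired device (user_id null) gets no device:paired and stays on the pairing flow.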
Tests (port 3989, real flow): provision -> pair via /api/provision/pair (socket
closed) -> reconnect RECEIVES device:paired (+name +playlist) — the stuck-screen
repro; an unpaired device gets NO device:paired (stays on the pairing flow); the fix
reuses the existing device:paired event (no new protocol). Full suite green serial
AND parallel (220); dbac699 / 404c330 / e734281 intact.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bold beta1: three devices spam "Fingerprint reclaim rejected ... device active
(status=offline, ~2500s since heartbeat, liveConn=false)" twice/~2s indefinitely —
contradictory: gone by every signal yet treated as active.
Root cause (NOT a missing clear — corrected the hypothesis). The reject condition
was `liveConn || status==='online' || secondsSince < RECLAIM_GRACE_SECONDS(24h)`.
For the observed devices liveConn=false and status=offline, so the ONLY true term
is `secondsSince < 24h` — an effective 24h CALENDAR grace, not a stale flag. Audited
the clears: liveConn (deviceConnections) is removed on the debounced disconnect
(heartbeat.removeConnection) AND the offline_timeout sweep (deviceConnections.delete);
status is set 'offline' on both. liveConn=false + status=offline PROVE the clears
ran — there is nothing stale to clear. The 24h time gate (mislabeled "device active")
blocked a legitimately-gone device from reclaiming for up to 24h, so it retried
every ~2s forever-in-practice. The "twice per ~2s" is two reclaim ATTEMPTS per cycle
(client reconnect + re-pair-on-auth-error), each hitting the single console.warn —
not double-logging in one attempt.
Fix:
- Decide "still alive" from RUNTIME signals: `!!liveConn || secondsSince <
reclaimSettleSeconds`. A device with no live socket and a heartbeat older than the
settle window is gone -> reclaimable. A live (or just-seen) device is still
rejected, so reclaim-abuse protection holds. NOT just ignoring "active" — it fixes
WHY it was stuck (the 24h gate). RECLAIM_SETTLE_SECONDS default 300 (was 24h).
SECURITY TRADEOFF flagged in config: shortens the anti-fingerprint-theft window;
raise to re-tighten. Tuning guess to validate vs Bold.
- Log throttle: the deferral logs at most once per device per
RECLAIM_REJECT_LOG_WINDOW_MS (default 60s), which collapses the double-log and the
per-2s flood (same discipline as the content-ack shed log). Cleared when a reclaim
proceeds.
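For comparison, the old and new reject conditions side by side. This is a sketch:
the function names are illustrative, and only the two expressions and the 24h /
300s values come from the notes above.

```javascript
const RECLAIM_GRACE_SECONDS = 24 * 60 * 60; // old 24h time gate
const RECLAIM_SETTLE_SECONDS = 300;         // new default settle window

// Old condition: any one term true means "device active", reclaim rejected.
function oldRejects(liveConn, status, secondsSince) {
  return !!liveConn || status === 'online' || secondsSince < RECLAIM_GRACE_SECONDS;
}

// New condition: runtime signals only (live socket or a recent heartbeat).
function newRejects(liveConn, secondsSince, settleSeconds = RECLAIM_SETTLE_SECONDS) {
  return !!liveConn || secondsSince < settleSeconds;
}
```

For the wedged devices (liveConn=false, status=offline, ~2500s stale), oldRejects
is true via the 24h term alone while newRejects is false, so the reclaim proceeds.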
Recovery of the 3 wedged devices (2febcaa9, 1984694c, 139159eb): they SELF-HEAL on
their next reclaim attempt (~2s) once this ships — their heartbeats are ~2500s stale
(>300s settle) and liveConn=false, so the reclaim now succeeds. No operator SQL needed.
Tests (port 3988): gone device reclaims; live device still rejected; clear-on-leave
(disconnect clears liveConn -> stale device reclaims); deferral log <=1 per window.
Full suite green serial+parallel (217). reconnect-throttle.js, the dbac699 content-ack
limiter, and the 404c330 block/auth code untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Highest-priority #143 item (operator finding from Bold): nulling a device's token
did NOT lock it out — device 75c2a08a immediately reconnected and saturated the
loop. Two distinct defects:
1. Auth short-circuit (the cause). device:register used
if (device.device_token && !validateDeviceToken(...)) { reject }
so a NULL/empty STORED token made the guard falsy -> validation SKIPPED, and the
next block even MINTED a fresh token and persisted it. Nulling a token thus
RE-PROVISIONED the device instead of locking it out. Fix: drop the
`device.device_token &&` guard -> `if (!validateDeviceToken(device_id, device_token))`
(validateDeviceToken already returns false for null-stored/missing/mismatch), and
remove the legacy "mint a token for a null-token device" path (the re-provision
vector). An already-provisioned device (every row, incl. 'provisioning', is created
WITH a token) presenting null/empty/invalid is now REJECTED + disconnected.
The first-pairing seam is unaffected: a brand-new device has NO device_id and goes
through the pairing_code branch (which mints id+token) — a different code path.
2. No server-side kill switch. Added a `blocked` column (devices.blocked INTEGER
NOT NULL DEFAULT 0; schema.sql + a database.js migration). The block is the FIRST
gate at the top of device:register — before the fingerprint block, the reconnect
throttle, any DB writes, or playlist build — so a blocked device's socket is
refused immediately (auth-error 'Device blocked' + disconnect, zero further work).
It does NOT rely on null-token (the thing that failed). The row is re-read every
register, so a DIRECT SQLite edit takes effect on the device's NEXT reconnect with
NO server restart. Operator statements (dashboard-down, hand-edit):
block: UPDATE devices SET blocked = 1 WHERE id = '<device_id>';
unblock: UPDATE devices SET blocked = 0 WHERE id = '<device_id>';
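A sketch of why the old guard skipped validation and how the gates are ordered
now. `validateToken` is a simplified stand-in for validateDeviceToken (it takes the
stored token directly instead of looking it up by device_id, and uses a plain
comparison), and the 'Invalid token' string is an assumption; only 'Device blocked'
is from this commit.

```javascript
// Stand-in: false for null-stored, missing, or mismatched tokens.
function validateToken(storedToken, presentedToken) {
  if (!storedToken || !presentedToken) return false;
  return storedToken === presentedToken;
}

// Old guard: a null stored token makes the left operand falsy, so
// validation is never reached and the device is not rejected.
function oldGuardRejects(storedToken, presentedToken) {
  return Boolean(storedToken && !validateToken(storedToken, presentedToken));
}

// New guard: always validate.
function newGuardRejects(storedToken, presentedToken) {
  return !validateToken(storedToken, presentedToken);
}

// Gate order at the top of register: block first, then token.
function registerGate(device, presentedToken) {
  if (device.blocked) return { ok: false, error: 'Device blocked' };
  if (newGuardRejects(device.device_token, presentedToken)) {
    return { ok: false, error: 'Invalid token' }; // hypothetical message
  }
  return { ok: true };
}
```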
Tests (port 3987): nulled-token provisioned device is REJECTED (75c2a08a repro);
blocked=1 refused at the first gate (no register/playlist); unblock reconnects;
first-pairing still works; normal valid-token device unaffected. Full suite green
serial AND parallel (213); reconnect-throttle.js + the dbac699 content-ack limiter
untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#142's content-ack dedup is insufficient: a device cycling 2-4 content IDs makes
every ack look unique so dedup never fires, while aggregate volume from ~30 devices
saturates the event loop (the #142 reconnect throttle kept the server responsive,
which is how this was even observable).
Folded ONE control on the content-ack path (no competing limiters;
reconnect-throttle.js untouched) in lib/content-ack-limiter.js:
- Step 1 — per-device RATE budget: caps TOTAL non-duplicate acks per device per
window regardless of differing content_id (the case dedup misses). Over budget =
DROP silently (the per-ack log+emit is the cost); log ONCE per device per window
when shedding starts. Keeps the #142 dedup (dedup'd repeats don't consume budget).
Per-device, in-memory, resets on restart (modeled on lastPlayLogAt; does NOT reuse
reconnect-throttle's ban-semantics bucket).
Env (TUNING GUESSES, validate vs Bold's fleet): CONTENT_ACK_MAX_PER_WINDOW=20,
CONTENT_ACK_RATE_WINDOW_MS=10000 (=2/s, above legit ~<=1/s, below the flood).
- Step 2 — global pressure valve: reuses the #142 loop-lag band (+ its hysteresis,
no second control loop). Under CRITICAL band, shed content-acks even for an
in-budget device; reconnects + dashboard/HTTP are ALWAYS processed; a healthy
device in a non-critical band is never touched by the valve. Valve open/close
logged once at the band edge in services/loop-lag.js (not per shed message).
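The two steps can be sketched as one admission function. This is an illustration
under assumptions, not the code in lib/content-ack-limiter.js: a fixed window is
assumed, the rate check is assumed to run before the valve, and valve-shed acks
are assumed not to consume budget. `admitContentAck` and `budgets` are made-up
names; dedup is out of scope here.

```javascript
const CONTENT_ACK_MAX_PER_WINDOW = 20;
const CONTENT_ACK_RATE_WINDOW_MS = 10000;

const budgets = new Map(); // device_id -> { windowStart, count, shedLogged }

// band is 'normal' | 'elevated' | 'critical' (from the loop-lag signal).
// Returns { accept, logShedStart }.
function admitContentAck(deviceId, nowMs, band) {
  let b = budgets.get(deviceId);
  if (!b || nowMs - b.windowStart >= CONTENT_ACK_RATE_WINDOW_MS) {
    b = { windowStart: nowMs, count: 0, shedLogged: false };
    budgets.set(deviceId, b);
  }
  if (b.count >= CONTENT_ACK_MAX_PER_WINDOW) {
    const logShedStart = !b.shedLogged; // log once per device per window
    b.shedLogged = true;
    return { accept: false, logShedStart };
  }
  if (band === 'critical') {
    // Valve: shed even in-budget acks; open/close is logged at the band edge elsewhere.
    return { accept: false, logShedStart: false };
  }
  b.count += 1;
  return { accept: true, logShedStart: false };
}
```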
Tests (unique ports 3985/3986, not the 3982/3983/3984 set):
- unit: the #143 regression (cycling ids evading dedup IS rate-limited), under/over
budget, dedup still works + doesn't consume budget, valve sheds in-budget under
critical while normal is untouched, rate precedence, window reset, per-device
isolation.
- integration: socket flood is capped to budget with a single shed-start log;
under-budget passes every ack; valve OPEN sheds content-acks while a reconnect +
/api/status still succeed.
Full suite green serial AND parallel (208 tests).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
device:content-ack logged + emitted every message, so a device repeatedly
reporting the same "content <id>: ready" (observed from an older app version)
added avoidable load per message.
- Suppress identical (device_id, content_id, status) reports within
config.contentAckDedupMs (default 10s), modeled on the lastPlayLogAt throttle.
A status change has a different key and passes immediately; a fresh report after
the window passes too. In-memory, resets on restart. The handler does no DB
writes, so this is purely shedding redundant log+emit work.
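A minimal sketch of the dedup key, assuming the window is measured from the last
report that was let through (suppressed repeats do not extend it). `isDuplicateAck`
and `lastAckAt` are illustrative names; the 10s value is the documented default.

```javascript
const CONTENT_ACK_DEDUP_MS = 10000; // config.contentAckDedupMs default
const lastAckAt = new Map();        // "device|content|status" -> last passed timestamp

function isDuplicateAck(deviceId, contentId, status, nowMs) {
  const key = `${deviceId}|${contentId}|${status}`;
  const last = lastAckAt.get(key);
  if (last !== undefined && nowMs - last < CONTENT_ACK_DEDUP_MS) return true;
  lastAckAt.set(key, nowMs); // first report, or a fresh one after the window
  return false;
}
```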
test: integration over a real authenticated device socket — a burst of identical
"ready" collapses to one log/emit, a "ready" after the window passes, and a status
change is never deduped. Unique PORT (3984).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Gates genuine reconnects PER DEVICE before the heavy register work (DB writes +
playlist build) runs, so a single flapping device can no longer saturate the
event loop and take down the server.
- Actuator is per-device, keyed on device_id (modeled on lastPlayLogAt). A device
is flagged only when it exceeds reconnectBaseMax genuine reconnects per window.
Same-socket playlist refreshes (isPlaylistRefresh) are exempt.
- Load-awareness is BANDED (normal/elevated/critical from the step-2 lag signal),
not a continuous controller. The band only MULTIPLIES an already-flagged
device's backoff; global lag never gates a healthy device.
- Hysteresis: escalate immediately while storming (tighten fast); decay one level
per reconnectReleaseMs of calm (release slow).
- HARD CEILING per device, independent of band and warm-up — a slow-ramp attacker
can't train through it.
- COLD START: for reconnectWarmupMs after boot, force the normal band and apply
only the hard ceiling, so a full-fleet reconnect after a deploy doesn't throttle
healthy screens. State is in-memory, resets on restart.
- Observability: every throttle engagement logs device, band, observed vs allowed
rate, and backoff. Throttled device gets device:throttled + a deferred disconnect.
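A rough sketch of the hysteresis and band-multiplier rules above. Every number
here (base backoff, multipliers, level cap) is an illustrative placeholder, not
the shipped config, and the hard ceiling is left out. The function names are
hypothetical.

```javascript
const BAND_MULTIPLIER = { normal: 1, elevated: 2, critical: 4 }; // placeholder values
const BASE_BACKOFF_MS = 1000;                                    // placeholder value
const MAX_LEVEL = 5;                                             // placeholder value

// Hysteresis: escalate at once while storming; otherwise decay one level
// per releaseMs of calm, counted from the level held when calm began.
function nextLevel(levelAtCalmStart, storming, calmMs, releaseMs) {
  if (storming) return Math.min(levelAtCalmStart + 1, MAX_LEVEL);
  const decay = Math.floor(calmMs / releaseMs);
  return Math.max(levelAtCalmStart - decay, 0);
}

// The band only multiplies the backoff of an already-flagged device (level > 0).
// During warm-up the band is forced to normal.
function backoffMs(level, band, inWarmup) {
  if (level === 0) return 0; // healthy device: global lag never gates it
  const effectiveBand = inWarmup ? 'normal' : band;
  return BASE_BACKOFF_MS * 2 ** (level - 1) * BAND_MULTIPLIER[effectiveBand];
}
```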
Tests (api.test.js style):
- unit: healthy-never-throttled, storm-throttled-with-growing-backoff, band
multiplies backoff, hard-ceiling-even-in-warmup, warm-up leniency, neighbor
isolation, slow release.
- integration GATE (the required one): full-fleet reconnect right after restart
throttles NO healthy device; a single device storming IS throttled; a neighbor
stays unaffected while another storms.
- also fixes pre-existing test PORT collisions (my new integration files clashed
with totp.test.js:3979 and totp-keyrotation.test.js:3980 -> moved to 3982/3983);
full suite now green serially AND in parallel.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the cache/backoff loop fix (aa23cf0): make a device that can't
self-install visible to operators, and fix the signature-verify bug that kept the
whole #139 fix from engaging on the actual Fire OS target.
Dashboard surface (Phase 2):
- devices gains ota_status / ota_target_version / ota_attempts / ota_updated_at
via the idempotent ALTER TABLE ADD COLUMN migration (non-destructive,
default-backfilled, idempotent on re-run).
- The device reports ota_status (OtaThrottle.statusFor -> none | pending |
manual_update_required) in device_info; the server persists it on register
(the reconnect backstop). devices d.* already surfaces it to the dashboard.
- Dashboard shows a non-blocking amber badge when manual_update_required
("Update available (vX) - install failed N times, manual update required");
i18n key in en.js (non-en inherits via the en fallback). Server suite +1 test.
Event-driven status (Option B):
- New device:ota-status WS message, emitted on STATE TRANSITIONS only
(enter-backoff -> manual_update_required, clear -> none), so the badge updates
promptly without waiting for a reconnect and without per-poll/heartbeat chatter.
Server handler persists the same fields; an unknown/forged device_id is a safe
no-op. The register-path persist stays as the reconnect backstop.
Signature-verify fix (the critical piece):
verifyApkSignature read the downloaded APK's signer via
getPackageArchiveInfo(GET_SIGNING_CERTIFICATES).signingInfo, but that field is
null for ARCHIVE files on API 28/29 (populated only from API 30). On Fire OS 8
(Android 9 / API 28) - the actual deployment target - this returned 0 certs from
a correctly-signed APK, so every OTA was refused as "tampered," the cache was
deleted, and the full APK re-downloaded every check cycle. This was the real
cause of the #139 re-download loop, NOT a silent-install failure: the cache and
backoff added in this branch sit behind this verify gate and never engaged on
the target.
Fix: below API 30, read the archive's signer via the legacy GET_SIGNATURES +
.signatures (its v1/JAR cert, which IS populated on 28/29). Keep
GET_SIGNING_CERTIFICATES + signingInfo for API >= 30 and for the installed-app
read (which works on 28+). The archive's signer is still extracted and compared
to the installed app's signer; a mismatch or zero-cert APK is still rejected.
This reads the cert correctly on old APIs - it does not weaken verification.
Verified on emulators:
- API 28: verify now passes for a legit APK (was: 0 certs, refused). Full backoff
then engages - 8.5MB pulled once, cache-hit on retries, backoff after 3,
manual_update_required emitted once; clears on successful update.
- API 28 negative: a re-signed (different-key) APK is still refused on cert
MISMATCH - no hole opened.
- API 30: unchanged path still passes (no regression).
- server suite 173/173, OtaThrottleTest 7/7.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two device-REPORTING fixes from the #134 investigation (the PiP rendering itself
was #135).
1) "Device reconnects every ~45s" was a logging artifact, not instability. The
player re-emits a full device:register on the SAME socket every ~45-60s
(requestPlaylistRefresh) to pull a fresh playlist; the server logged
"Device reconnected" for every register of a known device. The attached 4-day
log showed 1415 "reconnected" vs 30 real socket connects and 0 heartbeat
timeouts — the socket never dropped, so #134's "PiP lost between reconnects"
was a misdiagnosis. Fix: only log a genuine reconnect (new socket); a
same-socket re-register is a refresh (currentDeviceId === device_id) and stays
quiet. The playlist still refreshes.
2) Device reported 720p while the monitor showed a 1080 signal. DeviceInfo
reported getRealMetrics() — the UI RENDER SURFACE — but TV boxes render the UI
at 720p and upscale to a 1080p HDMI signal. Now report BOTH: screen_width/height
= the output mode (Display.Mode.physicalWidth/Height), render_width/height =
the render surface (getRealMetrics). Two new nullable devices columns, stored on
pairing INSERT + reconnect UPDATE, exposed via the device API, shown on the
dashboard as "1920x1080 (UI 1280x720)" when they differ.
Backward compatible (required + verified on emulator): a device that omits
render_* — or sends no device_info at all — still registers, with render_* = null,
on both the INSERT and UPDATE paths. New columns nullable; stores use
`?? null` / `|| null`. All 167 server tests pass.
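The dashboard label rule can be sketched as below. `resolutionLabel` is a
hypothetical helper; the column names and the "1920x1080 (UI 1280x720)" format are
from the notes above, and returning an empty string for a missing output mode is
an assumption.

```javascript
function resolutionLabel(d) {
  if (d.screen_width == null || d.screen_height == null) return '';
  const screen = `${d.screen_width}x${d.screen_height}`;
  const hasRender = d.render_width != null && d.render_height != null;
  const differs = hasRender &&
    (d.render_width !== d.screen_width || d.render_height !== d.screen_height);
  // Older devices send no render_*: fall back to the output mode alone.
  return differs ? `${screen} (UI ${d.render_width}x${d.render_height})` : screen;
}
```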
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the broken/fragmented preview with a single surface that renders a
DRAFT playlist exactly as a device does, by reusing the player's renderer in a
same-origin iframe. Fixes "not all items load" (one renderer, full type union)
and inherits the player's YouTube correctness (YT.Player handshake).
Server:
- deviceSocket: extract assemblePayload() (zone-reset + canonical shape) from
buildPlaylistPayload so the device path and preview can't drift. Pure refactor
(all 149 tests green).
- playlists: GET /:id/preview-payload (requirePlaylistRead, workspace-scoped).
Draft-aware via buildSnapshotItems (live items, not published_snapshot);
derivePreviewLayout() resolves layout from the playlist's own zone-bound items
(0 zoned -> fullscreen; 1 -> use it; >1 -> dominant + ambiguous flag, never
crashes). orientation validated/passthrough; wall_config/timezone null.
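One way to read the layout-derivation rule, as a sketch only. It assumes each
zone-bound item can be mapped to a layout id (the `layout_id` field on items is an
assumption for illustration), that "dominant" means the layout with the most
zone-bound items, and that ties go to the first layout seen. The real
derivePreviewLayout() may differ.

```javascript
// items: [{ zone_id: string|null, layout_id: string|null }] (hypothetical shape)
function derivePreviewLayoutSketch(items) {
  const counts = new Map();
  for (const item of items) {
    if (item.zone_id == null || item.layout_id == null) continue; // not zone-bound
    counts.set(item.layout_id, (counts.get(item.layout_id) || 0) + 1);
  }
  if (counts.size === 0) {
    return { layout_id: null, fullscreen: true, ambiguous: false };
  }
  let dominant = null;
  let best = -1;
  for (const [id, n] of counts) {
    if (n > best) { dominant = id; best = n; } // strict >: first seen wins ties
  }
  return { layout_id: dominant, fullscreen: false, ambiguous: counts.size > 1 };
}
```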
Player (renderer UNTOUCHED):
- ?preview=1&playlist=ID boot branch: fetch preview-payload (same-origin Bearer
token) and call handlePlaylistUpdate(). Gated before the pairing/socket path
so the unpaired auto-connect never fires. All socket emits already guarded.
- Webpage widgets: always-visible honest note (no auto-detection — an XFO
refusal is provably indistinguishable client-side from a working embed).
Dashboard:
- playlists: Preview button + player-iframe modal with landscape/portrait toggle.
- widgets: same honest note on the existing widget preview modal (the surface the
bug was reported on).
- i18n x6 (en/es/fr/de/it/pt) + player i18n x5.
Validated end-to-end (headless Chrome + CDP): preview boots, webpage note
renders, 3-zone layout derives+renders, shape parity with device snapshot proven
on real data, auth gate returns 401. The world-readable /uploads finding is
tracked separately as #107 (not a #104 concern — same path the device uses).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Each playlist item can carry schedule blocks (active days, start/end
time-of-day, optional start/end dates). An item plays when the screen's
local "now" matches at least one block; an item with no blocks always
plays. #74 covers time-of-day/day-of-week windows including overnight
wrap; #75 covers inclusive date ranges (auto-expiry). Evaluation is
on-device, so dayparting and expiry work offline.
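A simplified sketch of the matching rule, not the canonical evaluator in
server/lib/schedule-eval.js. Assumptions: "now" is already resolved to the
screen's local day/minute/date, the end time is exclusive, start == end matches
nothing, the early-morning half of an overnight block is anchored to the previous
day, and date ranges are checked against the current date rather than the anchor
date. Field names are hypothetical.

```javascript
// now:   { day: 0-6 (0 = Sunday), minutes: 0-1439, date: 'YYYY-MM-DD' }
// block: { days: number[], start: minutes, end: minutes, startDate?, endDate? }
function blockMatches(block, now) {
  // ISO dates compare correctly as strings; both bounds are inclusive.
  if (block.startDate && now.date < block.startDate) return false;
  if (block.endDate && now.date > block.endDate) return false;
  const { start, end } = block;
  if (start <= end) {
    return block.days.includes(now.day) && now.minutes >= start && now.minutes < end;
  }
  // Overnight wrap: the late part belongs to today's block,
  // the early part to the block that started yesterday.
  if (now.minutes >= start) return block.days.includes(now.day);
  if (now.minutes < end) return block.days.includes((now.day + 6) % 7);
  return false;
}

function itemPlays(blocks, now) {
  if (!blocks || blocks.length === 0) return true; // no blocks: always plays
  return blocks.some((b) => blockMatches(b, now));
}
```

For example, a Friday 22:00-02:00 block matches Saturday 01:00 but not Saturday 23:00.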
- Shared evaluator contract: shared/schedule-vectors.json (39 vectors —
DST US+AU, overnight-wrap anchoring, timezone correctness, date
boundaries). Canonical JS evaluator in server/lib/schedule-eval.js;
Kotlin and Tizen ports kept in lockstep by drift guards (Tizen byte-diff
test, Kotlin JUnit reads the shared JSON, new android-test CI job).
- All three players (web, Android, Tizen) filter by schedule against their
own clock, idle with a "Nothing scheduled" message + 30s re-check when
everything is filtered, and fail open on any evaluator error.
- Editor: per-item schedule modal + row badge in the playlist editor;
client validation mirrors the server; editing marks the playlist draft.
- Part B (behaviour change): device/group schedule overrides now evaluate
in each device's effective timezone instead of server-local time.
- Device detail shows the reported timezone + a clock-skew warning.
- i18n for en/es/fr/de/pt across all new strings (namespaced itemsched.*
to avoid colliding with the device-schedule calendar's schedule.*).
- CHANGELOG documents the feature, the Part B change, the fail-open
guarantee, and the scheduled-single-video re-render tradeoff.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A socket reconnecting with a device_id that no longer exists in `devices`
(e.g. the row was deleted server-side) hit the device_fingerprints insert
with an unknown foreign key. INSERT OR IGNORE does NOT suppress FOREIGN KEY
violations, so it threw a caught-but-noisy "Fingerprint tracking error" on
every such reconnect. Null out an unknown device_id before the insert; a
genuinely fresh device sends no device_id and was always fine.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Server (deviceSocket buildPlaylistPayload): when a device's layout has <2 zones
(single or none), strip leftover zone_id from assignments. After switching a
device from multi-zone back to fullscreen, content was stuck bound to a gone
left/right zone_id and never played; nulling it lets both players fall back to
the default fullscreen renderer.
- Web player: render multi-zone zones BEFORE the single-item 'renderable?' bail,
so an empty/placeholder current rotation item can't blank the whole screen.
A player stuck in a tight loop (playlist with 0-second item durations)
fires device:play-event 'play_start' ~3x/sec, inserting a play_logs row
each time. Three web players doing this generated ~909k rows (99.9% with
duration_sec=0) and grew the prod DB to 265 MB.
Throttle proof-of-play inserts to at most one per device per 2s (in-memory
lastPlayLogAt map). Skipped cycles create no row; the live dashboard
progress event still fires every time, so the UI is unaffected. The
play_end UPDATE only closes open rows, so throttling play_start is safe.
(Junk rows already pruned in prod: 909k deleted, DB 265 MB -> 9.8 MB.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two dashboard-accuracy improvements for issue #3.
Disconnect debounce (5s):
- Brief transient flaps (Engine.IO ping miss, eviction-then-reconnect,
Wi-Fi blip) no longer immediately flip the device to offline in the
dashboard. Disconnect handler now defers the offline transition;
register handlers cancel the pending timer if reconnect lands in
window.
- Existing stale-disconnect guard kept as fast-path for the eviction
case (no timer scheduled at all when the active heartbeat conn is
already a different socket).
- Re-check at timer fire compares socketIds: aborts only if a
GENUINELY DIFFERENT socket reclaimed the device. Just the closing
socket's own (not-yet-cleaned-up) entry is treated as stale and
proceeds with offline transition.
- Server-restart mid-grace is handled by the heartbeat checker safety
net (existing component): any 'online' row with last_heartbeat
older than heartbeatTimeout gets marked offline on next sweep.
Truthful single-device command feedback:
- dashboard:device-command handler now checks deviceNs.adapter.rooms
for an active socket before emitting (matches the group-command
route's pattern).
- If room is empty, falls through to commandQueue.queueCommand (lazy
require - if commit C is reverted, MODULE_NOT_FOUND is cached and
every subsequent call gets consistent queued=false behavior).
- Returns three-state ack to caller: { delivered, queued, reason }.
- Server log line was misleading - now logs 'Command delivered to
device X' vs 'Command for offline device X (queued=true/false)'.
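A sketch of the three-state ack, with the room lookup and the emit injected so it
runs standalone. `dispatchCommand` and the reason strings are hypothetical; only
the { delivered, queued, reason } shape is from this commit.

```javascript
// roomSize: number of live sockets in the device's room.
// emit: function that sends the command to the room.
// queueCommand: optional function returning true if the command was queued
// (absent when the queue module is unavailable).
function dispatchCommand(roomSize, emit, queueCommand) {
  if (roomSize > 0) {
    emit();
    return { delivered: true, queued: false, reason: null };
  }
  const queued = typeof queueCommand === 'function' ? Boolean(queueCommand()) : false;
  return {
    delivered: false,
    queued,
    reason: queued ? 'queued_offline' : 'device_offline', // hypothetical reason codes
  };
}
```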
Frontend:
- sendCommand() takes optional callback. Without one, fires-and-forgets
(no behavior change for non-wired callers). With one, uses Socket.IO
.timeout(5000).emit so the callback always fires (ack or no_ack).
- Six device-detail command buttons wired to three-state toasts:
reboot, shutdown, screen_off, screen_on, launch, update.
- delivered: green/success toast (existing localized message)
- queued: amber/warning toast (new generic message)
- no_ack: red/error toast
- fallback: red/error toast
- Two callers intentionally left fire-and-forget:
- window._sendCmd (generic remote-overlay keypress/touch helper)
- enable_system_capture (has its own visual state machine; out of
scope for this commit)
Three new i18n keys (en.js only; other locales follow later):
- device.toast.command_queued
- device.toast.command_undeliverable
- device.toast.command_no_ack
Refs #3
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Short-lived per-device queue covers the TV-flap window (issue #3):
when a device is mid-reconnect, prior code emitted to an empty room
and the event vanished. Now playlist-updates and commands targeting
an offline device are queued and flushed in order on the next
device:register for that device_id.
server/lib/command-queue.js (new):
- pendingPlaylistUpdate: per-device marker (rebuild via builder on
flush -> always fresh DB state, no stale snapshots)
- pendingCommands: per-device Map<type, payload> with last-of-type
dedup (most recent screen_off wins)
- TTL via COMMAND_QUEUE_TTL_MS env (default 30000)
- Active sweep every 30s prunes expired entries
Memory bounds: ~6 entries per device worst case (1 playlist marker
+ 5 command types), unref'd sweep timer.
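The last-of-type dedup and TTL can be sketched with a nested Map. This is an
illustration, not server/lib/command-queue.js: the playlist marker and the sweep
timer are omitted, timestamps are passed in for testability, and the flush order
(by first queue time of each type) is an assumption.

```javascript
const COMMAND_QUEUE_TTL_MS = 30000; // default from COMMAND_QUEUE_TTL_MS
const pendingCommands = new Map();  // device_id -> Map<type, { payload, queuedAt }>

function queueCommand(deviceId, type, payload, nowMs) {
  if (!pendingCommands.has(deviceId)) pendingCommands.set(deviceId, new Map());
  // Setting an existing key overwrites it: the most recent command of a type wins.
  pendingCommands.get(deviceId).set(type, { payload, queuedAt: nowMs });
}

// Called on device:register; returns the unexpired commands and empties the queue.
function flushQueue(deviceId, nowMs) {
  const perDevice = pendingCommands.get(deviceId);
  pendingCommands.delete(deviceId);
  if (!perDevice) return [];
  const out = [];
  for (const [type, entry] of perDevice) {
    if (nowMs - entry.queuedAt < COMMAND_QUEUE_TTL_MS) {
      out.push({ type, payload: entry.payload });
    }
  }
  return out;
}
```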
Wired emit sites (8 total; the four direct socket.emit calls in
deviceSocket register handlers are intentionally NOT queued because
the socket is alive by definition at those points):
- server/routes/video-walls.js (pushWallPayloadToDevice)
- server/routes/device-groups.js (pushPlaylistToDevice)
- server/routes/content.js (content-delete fan-out)
- server/routes/playlists.js (pushToDevices + assign)
- server/services/scheduler.js (scheduled rotations)
- server/ws/deviceSocket.js x2 (wall leader reclaim/reassign)
server/ws/deviceSocket.js register paths now call flushQueue after
heartbeat.registerConnection + socket.join. Existing
socket.emit('device:playlist-update', ...) lines kept - they send
the initial state on register; the flush replays any queued events.
Player's handlePlaylistUpdate fingerprint check dedupes the
overlap.
Refs #3
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix: at connect, enumerate the user's accessible workspace_ids (direct workspace_members + org_owner/admin paths + platform_admin 'all') via new accessibleWorkspaceIds() helper in lib/tenancy.js; socket.join one room per workspace. All 12 dashboardNs.emit sites across deviceSocket / heartbeat / server.js / devices route / video-walls route now route via dashboardNs.to(workspaceRoom(...)).emit() with the workspace looked up from the relevant device or wall. New lib/socket-rooms.js holds the helpers and breaks a circular dependency (dashboardSocket already requires heartbeat, so heartbeat can't require dashboardSocket).
Inbound 6 commands rewired to canActOnDevice(socket, deviceId, tier): request-screenshot is read tier (workspace_viewer+); remote-touch/key/start/stop and device-command are write tier (workspace_editor+). Platform_admin and org_owner/admin always pass via actingAs. Legacy admin/superadmin branch dropped.
Lifecycle note: workspace-switch already calls window.location.reload (Phase 3 switcher), which forces a fresh socket with updated memberships - no per-emit re-evaluation needed.
Smoke tested with 3 simultaneous socket.io-client connections (switcher-test, swninja, dw5304 platform_admin) + direct canActOnDevice invocation for 6 user/device/tier combinations. All 9 outbound isolation cells and all 6 permission gates pass. Fixture mutation: switcher-test's Field Crew membership flipped from workspace_editor to workspace_viewer to exercise the read/write tier split in one login.
Wall editor: replaces the small grid with a Figma-style pan/zoom canvas. Each
display is a rectangle that can be dragged/resized to match its physical
arrangement; a separate semi-transparent player rect overlays the screens and
defines what content plays where. Drag empty space to pan, wheel to zoom,
"Center" button auto-fits content. Per-rect numeric x/y/w/h panel; arrow keys
nudge by 1px (10px with shift). Negative coordinates supported for screens
offset above/left of the origin. Coords rounded to integers on save.
Wall rendering: each device receives screen_rect + player_rect, maps the
player into its viewport with vw/vh and object-fit:fill so vertical position
of every source pixel is identical across devices that share viewport height.
Leader emits wall:sync at 4Hz with sent_at timestamp; followers apply
latency-adjusted target and use playbackRate ±3% for sub-300ms drift,
hard-seek for >300ms. Followers stay muted; leader unmutes via gesture with
AudioContext priming and pause+play retry to bypass Firefox autoplay.
"Tap to enable audio" overlay as a final fallback.
Reconnect handling: server re-evaluates leader on device:register so the
top-left tile reclaims leadership when it returns. Followers emit
wall:sync-request on entering wall mode (incl. reconnect) so they snap to
position immediately instead of drifting until the next periodic tick.
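The follower-side correction from the wall rendering notes (±3% playbackRate for
sub-300ms drift, hard-seek beyond) can be sketched as a pure function. It assumes
sent_at and the follower's clock are comparable, treats exactly zero drift as
"no change", and omits any deadband the player may apply. `syncCorrection` is a
hypothetical name.

```javascript
// leaderPos / followerPos in seconds, timestamps in milliseconds.
function syncCorrection(leaderPos, sentAtMs, nowMs, followerPos) {
  const target = leaderPos + (nowMs - sentAtMs) / 1000; // latency-adjusted target
  const driftMs = (target - followerPos) * 1000;        // > 0: follower is behind
  if (Math.abs(driftMs) > 300) {
    return { action: 'seek', seekTo: target, playbackRate: 1 };
  }
  if (driftMs === 0) {
    return { action: 'none', seekTo: null, playbackRate: 1 };
  }
  // Small drift: speed up or slow down by 3% instead of a visible jump.
  return { action: 'rate', seekTo: null, playbackRate: driftMs > 0 ? 1.03 : 0.97 };
}
```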
Group dissolve: removing a device from its last group clears its playlist
to mirror wall-leave semantics. Leaving a group with playlists on remaining
groups inherits the next group's playlist.
Dashboard: walls render as their own card section (hiding the device cards
they contain). Multi-select checkboxes on cards + "Create Video Wall" toolbar
action that creates the wall, removes devices from groups, and opens the
editor. dashboard:wall-changed broadcast triggers live re-render. Per-card
playback progress bar driven by play_start events forwarded from devices.
Security: PUT /walls/:id/devices verifies caller owns each device (or has
team-owner access via the widgets pattern), preventing cross-tenant device
takeover. wall:sync and wall:sync-request validate that the sending device
is a member of the named wall; relay re-stamps device_id with currentDeviceId
so clients can't spoof or shadow-exclude peers.
Schema: video_walls += player_x/y/width/height, playlist_id;
video_wall_devices += canvas_x/y/width/height. All idempotent migrations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HIGH 1 (teams IDOR): POST/DELETE /api/teams/:id/devices now require the
caller to own the device before assigning or detaching it. Without this
check, any team member could pull any device into their team via UUID
guess and gain remote-control access.
HIGH 2 (schedules IDOR): PUT /api/schedules/:id now re-verifies
ownership of every changed target field — device_id, group_id,
content_id, widget_id, layout_id, playlist_id. Previously only the
schedule owner was checked, letting users fire arbitrary content on
victim devices via update.
HIGH 3 (filename XSS): file.originalname captured by multer bypassed
sanitizeBody. New safeFilename() wraps every INSERT path (multipart
upload, remote URL, YouTube). Frontend sinks now go through esc() in
content-library.js, device-detail.js, video-wall.js. Web player gets
an inline escHtml helper for its info overlay where filenames, device
name, and serverUrl land in innerHTML.
HIGH 4 (kiosk public XSS): config.idleTimeout is now coerced via the
existing safeNumber() helper at both interpolation sites. A crafted
value with a newline can no longer escape the JS line comment to
inject arbitrary code into the public render endpoint.
HIGH 5 (folder DoS): POST /api/folders enforces a per-user cap of 100
folders (429 on overflow). Superadmin exempt.
MED 1 (SSRF): ImageLoader.decodeUrl rejects any URL scheme other than
http(s) so a malicious remote_url can't read local files via file://.
On the server, validateRemoteUrl() is extracted and now also runs on
PUT /api/content/:id remote_url updates — previously the SSRF check
only fired on POST.
MED 2 (fingerprint takeover): the WS device:register fingerprint
reclaim path now rejects takeover while the target device is online or
within 24h of its last heartbeat. A leaked fingerprint can no longer
hijack an active display.
MED 3 (npm audit): bumped uuid 9.x -> 14.0.0 (v3/v5/v6 buffer bounds
CVE; we only use v4 so not exploitable, but clears the audit).
path-to-regexp resolved to 0.1.13 via npm audit fix. 0 vulns remaining.
MED 4 (folder admin consistency): ownedFolder() and the content.js
folder_id move check now both treat only superadmin as privileged,
matching GET /api/folders. Previously a plain "admin" could rename
or delete folders they couldn't see, and could move content into
folders they couldn't list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server-side: when a device reconnects on a fresh socket while the old
TCP zombie is still around, the old socket's eventual disconnect handler
flipped the device offline and removed the new heartbeat entry. Now we
proactively evict any prior socket on register and ignore disconnects
from sockets that are no longer the registered one for that device_id.
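A minimal sketch of the evict-and-ignore logic, assuming an in-memory registry and a socket object that exposes disconnect(). The names activeSockets, onRegister, and onDisconnect are illustrative, not the server's actual identifiers.

```javascript
// deviceId -> currently registered socket (hypothetical in-memory registry).
const activeSockets = new Map();

function onRegister(deviceId, socket) {
  const prior = activeSockets.get(deviceId);
  // Proactively evict the zombie so it cannot linger as the registered socket.
  if (prior && prior !== socket) prior.disconnect(true);
  activeSockets.set(deviceId, socket);
}

// Returns true only when the disconnect should flip the device offline.
function onDisconnect(deviceId, socket) {
  // A late disconnect from an evicted socket must not touch the new session.
  if (activeSockets.get(deviceId) !== socket) return false;
  activeSockets.delete(deviceId);
  return true;
}
```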
Frontend: dedupe devices by id from the API response and only render
each device in the first group it belongs to (multi-group membership
is still tracked for the Manage modal).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug 1 (SW): Rewrote service worker fetch handler:
- Skip range requests (video seeking) to avoid caching partial responses
- Skip non-GET requests entirely
- Use ignoreSearch on cache match to avoid query-param misses
- Don't cache opaque cross-origin responses
- Add an outer catch for Cache API failures
- Don't intercept catch-all requests (let browser handle natively)
- Bump cache version to v4 to purge broken cached responses
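The skip rules above can be sketched as two pure predicates. Function names are hypothetical, and the objects are duck-typed so the logic can be exercised outside a service worker; a real Headers object matches names case-insensitively.

```javascript
// Should the fetch handler consult the cache for this request?
// `request` is duck-typed: { method, headers: { get(name) } }.
function shouldUseCache(request) {
  if (request.method !== 'GET') return false;     // skip non-GET entirely
  if (request.headers.get('range')) return false; // skip video seeking
  return true;
}

// Is this response safe to store in the cache?
// Partial (206) and opaque cross-origin responses are never stored.
function shouldStore(response) {
  return response.ok && response.status === 200 && response.type !== 'opaque';
}
```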
Bug 2 (auth): Playlist refresh register was missing device_token,
causing auth rejection every 5 minutes. Fixed by including token
in the refresh-register emit. Added diagnostic logging on both
client and server for token validation failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Web player:
- Cache playlist JSON to localStorage on every update
- Restore and start playing immediately on boot before connecting
- Clear cache on unpair/reset
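The web player cache can be sketched as below, assuming a storage object with the localStorage interface (getItem/setItem/removeItem). The key name and the factory function are hypothetical.

```javascript
const CACHE_KEY = 'cachedPlaylist'; // hypothetical key name

// `storage` is anything with the Web Storage interface, e.g. localStorage.
function createPlaylistCache(storage) {
  return {
    // Called on every playlist update.
    save(playlist) {
      storage.setItem(CACHE_KEY, JSON.stringify(playlist));
    },
    // Called on boot, before connecting; returns null when nothing usable is cached.
    restore() {
      const raw = storage.getItem(CACHE_KEY);
      if (raw == null) return null;
      try {
        return JSON.parse(raw);
      } catch {
        return null; // corrupt cache entry: treat as empty
      }
    },
    // Called on unpair/reset.
    clear() {
      storage.removeItem(CACHE_KEY);
    },
  };
}
```

In the browser this would be created as createPlaylistCache(localStorage); injecting the storage keeps the logic testable without a DOM.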
Android app:
- Cache playlist JSON to EncryptedSharedPreferences on every update
- Restore cached playlist on cold-start, play from disk-cached content
- Update cache on content deletion, clear on unpair
Server (device socket):
- Fingerprint reconnect: issue fresh token instead of rejecting
- Send device:paired on fingerprint recovery for claimed devices
- Add status logging and dashboard notification on fingerprint reconnect
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Schema: add status and published_snapshot columns to playlists table.
Migration snapshots all existing playlists as published (idempotent via schema_migrations).
Devices always receive the published_snapshot, not live playlist_items.
Edits from device-detail/groups auto-publish immediately (display updates instantly).
Edits from playlist detail page go to draft (requires explicit publish).
POST /playlists/:id/publish snapshots and pushes to all devices.
POST /playlists/:id/discard reverts playlist_items from published snapshot.
Content deletion scrubs references from all published snapshots.
Frontend: draft badge in playlist list, prominent yellow banner with publish/discard
buttons on playlist detail and device detail pages.
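The draft/publish flow can be modeled in memory as follows. This is a sketch only: the real data lives in the playlists and playlist_items tables, and the 'draft'/'published' status values and function names are assumptions.

```javascript
// In-memory stand-in for one playlists row plus its playlist_items.
function editDraft(playlist, items) {
  playlist.items = items;
  playlist.status = 'draft'; // live items now differ from the snapshot
}

function publish(playlist) {
  playlist.published_snapshot = JSON.stringify(playlist.items);
  playlist.status = 'published';
}

function discard(playlist) {
  // Revert live items from the last published snapshot.
  playlist.items = JSON.parse(playlist.published_snapshot);
  playlist.status = 'published';
}

// Devices always receive the snapshot, never the live items.
function payloadForDevice(playlist) {
  return JSON.parse(playlist.published_snapshot);
}
```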
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces the assignments-table query with a playlist_items query keyed on
device.playlist_id. Also eliminates the duplicate payload builder in
scheduler.js — it now calls the shared buildPlaylistPayload.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Dashboard now organizes devices by group with colored section headers
- Group command endpoint (POST /groups/:id/command) sends to all members
- Manage modal with multi-group confirmation prompt
- Destructive commands (reboot/shutdown) require confirmation
- Ungrouped devices shown separately at bottom
- Enable trust proxy and read X-Forwarded-For to get real client IPs behind Nginx
- Hide Android-only telemetry (battery/storage/RAM/CPU/WiFi) for web players
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Push playlist updates to devices instantly via WebSocket on all
assignment mutations (add, update, delete, reorder, copy)
- Fix YouTube videos skipping early: remove duration_sec timeout (was
defaulting to 10s), use generation counter to ignore stale player
callbacks, disable YouTube loop param for multi-item playlists
- Auto-fetch YouTube video title via oEmbed API when no name provided
- Show actual video duration in M:SS format in playlist instead of
misleading assignment duration_sec
- Pre-fill server URL from origin on web player setup
- Bump playlist poll interval to 5min (fallback only, push is primary)
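The generation-counter technique mentioned in the YouTube fix can be sketched like this. createPlayerSession and its API are hypothetical names, not the web player's actual code.

```javascript
// Each call to start() bumps the generation; callbacks wrapped for an older
// generation become no-ops, so a late "ended" event from the previous video
// cannot advance the playlist.
function createPlayerSession() {
  let generation = 0;
  return {
    start() {
      const mine = ++generation;
      // Returns a wrapper that guards a handler against staleness.
      return (handler) => (...args) => {
        if (mine !== generation) return undefined; // stale callback: ignore
        return handler(...args);
      };
    },
  };
}
```

A caller would wrap each player callback with the guard returned by start() at the moment the item begins playing.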
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ScreenTinker - open source digital signage management software.
MIT License, all features included, no license gates.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>