/api/update/check offered the update whenever client !== latest (raw string
inequality, not semver) with no backoff. A device that can't APPLY the update
(broken OTA client 1.7.12, signing/Fire OS) keeps reporting the same version and is
told update_available=true on every poll; a fast poll loop saturates the event loop
(prod loop-lag 49s). All requests share one NAT IP, so IP-keying is useless.
server-only breaker (lib/ota-breaker.js), two independent axes:
- RATE breaker (primary, immediate): a key checking >THRESHOLD (3) times within
WINDOW (60s) is looping -> throttle update_available with exponential backoff
(30s->2m->8m->cap 30m). Healthy devices poll ~12 min and never approach this, so
rollout/stragglers are inherently safe -- NO grace-for-flood timer; slow == safe.
- PHANTOM guard (immediate): unrecognized version, or a prerelease of an OLDER core
(superseded old-minor beta e.g. 1.9.1-beta4), gets no-offer on the first check. A
RECENT real older version (beta3 vs latest beta4; stable 1.7.12) stays offerable.
- Never offers a downgrade (client >= latest -> no offer).
KEYING (#144 option 3): keyed on device_id when present, else reported version.
- server.js:581 accepts + logs ?device_id=, passes it to the breaker.
- UpdateChecker.kt:122 appends &device_id=<config.deviceId> (existing registered id;
omitted until provisioned). One-line client change.
beta4+ clients get precise per-device throttling; stuck legacy clients sending only
?version= are caught by the version-keyed + rate + phantom logic. Response gains
additive `reason` + `retry_after_seconds` (old clients ignore).
BOUNDED STATE: a periodic sweep (startSweep, wired in server.js) evicts buckets idle
> IDLE_RESET_MS so the keyed Map can't grow unbounded (churned device_ids); not
reset-on-access only.
SCOPE (deliberate): this targets the FAST flood + phantoms. The slow #144 drip
(stable 1.7.12 polling ~every 12 min, ~20/hr) stays below >3/60s and is NOT
throttled -- catching it needs #144 option-3 "skip-this-version after N cycles",
which is intentionally NOT in this build.
NOTE: carries a CLIENT/APK change -> versionCode must increment at the beta4 bump and
the release keystore is required for the APK. The device_id path only helps devices
that can install beta4+; the stuck legacy fleet is covered by the version-keyed path.
Tests: unit (lib/ota-breaker, injected time) a-f + comparator + escalation + sweep +
slow-drip-scope; HTTP integration (real endpoint, device_id passthrough). Full suite
green serial AND parallel (234). OTA-only delta -- reconnect/reclaim/shed/content-ack/
block untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bold: screens sit on the Connect page showing the server URL = paired server-side
but never told, so the app never starts playing.
Flow / gap (Step A):
- CLIENT leaves the Connect page ONLY on the 'device:paired' event — web player
(player/index.html) hides the setup screen; Android ProvisioningActivity.onPaired
launches MainActivity + finish(). That event is the sole signal.
- SERVER pushes 'device:paired' to the device's room from POST /api/provision/pair
(server.js) at pair time — but ONLY reaches a LIVE socket then. The normal
device_id reconnect path emitted device:registered + device:playlist-update but
NOT device:paired. So a screen paired while disconnected, or that reconnects after
pairing (exactly the screens cycling on the Connect page), is paired server-side
(user_id set, receiving playlists) yet never gets device:paired -> stuck on Connect.
Fix (server-only, uses the EXISTING client listener — no client update needed, which
matters because we can't push a client update to stuck screens): on the device_id
reconnect, if the device is paired (user_id set), re-emit 'device:paired'
{device_id, name}. Push-on-pair (server.js) already covers the live-at-pair-time
case; this covers paired-then-reconnect. A paired screen now leaves Connect and
plays on its next reconnect with no client change and no manual re-pair.
Tests (port 3989, real flow): provision -> pair via /api/provision/pair (socket
closed) -> reconnect RECEIVES device:paired (+name +playlist) — the stuck-screen
repro; an unpaired device gets NO device:paired (stays on the pairing flow); the fix
reuses the existing device:paired event (no new protocol). Full suite green serial
AND parallel (220); dbac699 / 404c330 / e734281 intact.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bold beta1: three devices spam "Fingerprint reclaim rejected ... device active
(status=offline, ~2500s since heartbeat, liveConn=false)" twice/~2s indefinitely —
contradictory: gone by every signal yet treated as active.
Root cause (NOT a missing clear — corrected the hypothesis). The reject condition
was `liveConn || status==='online' || secondsSince < RECLAIM_GRACE_SECONDS(24h)`.
For the observed devices liveConn=false and status=offline, so the ONLY true term
is `secondsSince < 24h` — an effective 24h CALENDAR grace, not a stale flag. Audited
the clears: liveConn (deviceConnections) is removed on the debounced disconnect
(heartbeat.removeConnection) AND the offline_timeout sweep (deviceConnections.delete);
status is set 'offline' on both. liveConn=false + status=offline PROVE the clears
ran — there is nothing stale to clear. The 24h time gate (mislabeled "device active")
blocked a legitimately-gone device from reclaiming for up to 24h, so it retried
every ~2s forever-in-practice. The "twice per ~2s" is two reclaim ATTEMPTS per cycle
(client reconnect + re-pair-on-auth-error), each hitting the single console.warn —
not double-logging in one attempt.
Fix:
- Decide "still alive" from RUNTIME signals: `!!liveConn || secondsSince <
reclaimSettleSeconds`. A device with no live socket and a heartbeat older than the
settle window is gone -> reclaimable. A live (or just-seen) device is still
rejected, so reclaim-abuse protection holds. NOT just ignoring "active" — it fixes
WHY it was stuck (the 24h gate). RECLAIM_SETTLE_SECONDS default 300 (was 24h).
SECURITY TRADEOFF flagged in config: shortens the anti-fingerprint-theft window;
raise to re-tighten. Tuning guess to validate vs Bold.
- Log throttle: the deferral logs at most once per device per RECLAIM_REJECT_LOG_
WINDOW_MS (default 60s) — collapses the double-log + the per-2s flood (same
discipline as the content-ack shed log). Cleared when a reclaim proceeds.
Recovery of the 3 wedged devices (2febcaa9, 1984694c, 139159eb): they SELF-HEAL on
their next reclaim attempt (~2s) once this ships — their heartbeats are ~2500s stale
(>300s settle) and liveConn=false, so the reclaim now succeeds. No operator SQL needed.
Tests (port 3988): gone device reclaims; live device still rejected; clear-on-leave
(disconnect clears liveConn -> stale device reclaims); deferral log <=1 per window.
Full suite green serial+parallel (217). reconnect-throttle.js, the dbac699 content-ack
limiter, and the 404c330 block/auth code untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Highest-priority #143 item (operator finding from Bold): nulling a device's token
did NOT lock it out — device 75c2a08a immediately reconnected and saturated the
loop. Two distinct defects:
1. Auth short-circuit (the cause). device:register used
if (device.device_token && !validateDeviceToken(...)) { reject }
so a NULL/empty STORED token made the guard falsy -> validation SKIPPED, and the
next block even MINTED a fresh token and persisted it. Nulling a token thus
RE-PROVISIONED the device instead of locking it out. Fix: drop the
`device.device_token &&` guard -> `if (!validateDeviceToken(device_id, device_token))`
(validateDeviceToken already returns false for null-stored/missing/mismatch), and
remove the legacy "mint a token for a null-token device" path (the re-provision
vector). An already-provisioned device (every row, incl. 'provisioning', is created
WITH a token) presenting null/empty/invalid is now REJECTED + disconnected.
The first-pairing seam is unaffected: a brand-new device has NO device_id and goes
through the pairing_code branch (which mints id+token) — a different code path.
2. No server-side kill switch. Added a `blocked` column (devices.blocked INTEGER
NOT NULL DEFAULT 0; schema.sql + a database.js migration). The block is the FIRST
gate at the top of device:register — before the fingerprint block, the reconnect
throttle, any DB writes, or playlist build — so a blocked device's socket is
refused immediately (auth-error 'Device blocked' + disconnect, zero further work).
It does NOT rely on null-token (the thing that failed). The row is re-read every
register, so a DIRECT SQLite edit takes effect on the device's NEXT reconnect with
NO server restart. Operator statements (dashboard-down, hand-edit):
block: UPDATE devices SET blocked = 1 WHERE id = '<device_id>';
unblock: UPDATE devices SET blocked = 0 WHERE id = '<device_id>';
Tests (port 3987): nulled-token provisioned device is REJECTED (75c2a08a repro);
blocked=1 refused at the first gate (no register/playlist); unblock reconnects;
first-pairing still works; normal valid-token device unaffected. Full suite green
serial AND parallel (213); reconnect-throttle.js + the dbac699 content-ack limiter
untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#142's content-ack dedup is insufficient: a device cycling 2-4 content IDs makes
every ack look unique so dedup never fires, while aggregate volume from ~30 devices
saturates the event loop (the #142 reconnect throttle kept the server responsive,
which is how this was even observable).
Folded ONE control on the content-ack path (no competing limiters; reconnect-
throttle.js untouched) in lib/content-ack-limiter.js:
- Step 1 — per-device RATE budget: caps TOTAL non-duplicate acks per device per
window regardless of differing content_id (the case dedup misses). Over budget =
DROP silently (the per-ack log+emit is the cost); log ONCE per device per window
when shedding starts. Keeps the #142 dedup (dedup'd repeats don't consume budget).
Per-device, in-memory, resets on restart (modeled on lastPlayLogAt; does NOT reuse
reconnect-throttle's ban-semantics bucket).
Env (TUNING GUESSES, validate vs Bold's fleet): CONTENT_ACK_MAX_PER_WINDOW=20,
CONTENT_ACK_RATE_WINDOW_MS=10000 (=2/s, above legit ~<=1/s, below the flood).
- Step 2 — global pressure valve: reuses the #142 loop-lag band (+ its hysteresis,
no second control loop). Under CRITICAL band, shed content-acks even for an
in-budget device; reconnects + dashboard/HTTP are ALWAYS processed; a healthy
device in a non-critical band is never touched by the valve. Valve open/close
logged once at the band edge in services/loop-lag.js (not per shed message).
Tests (unique ports 3985/3986, not the 3982/3983/3984 set):
- unit: the #143 regression (cycling ids evading dedup IS rate-limited), under/over
budget, dedup still works + doesn't consume budget, valve sheds in-budget under
critical while normal is untouched, rate precedence, window reset, per-device
isolation.
- integration: socket flood is capped to budget with a single shed-start log;
under-budget passes every ack; valve OPEN sheds content-acks while a reconnect +
/api/status still succeed.
Full suite green serial AND parallel (208 tests).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brings the full #142 stack onto main on top of the 1.9.1 stable cut:
- device_status_log index + de-dupe
- event-loop lag telemetry (bounded)
- load-aware per-device reconnect throttle (the outage fix)
- global device_status_log retention sweep (STATUS_LOG_RETENTION_DAYS)
- content-ack dedup
- provisioning-row cleanup window 365d -> 24h
services/heartbeat.js deleted unclaimed provisioning devices with
created_at < now - (365 * 86400) — a YEAR — while its own comment said "older
than 24 hours". So socket-register pairing junk lingered ~365x longer than
intended. Change the window to 24 * 3600 to match the comment.
Correctness fix only — does NOT touch the pre-auth register path or add a rate
limiter (that pre-auth hardening is a separate security issue, out of this cut).
Extracted the sweep into pruneProvisioningDevices() (still in heartbeat.js, called
from the same interval) so it is unit-testable. Test asserts a >24h unclaimed
provisioning row is swept while a <24h row, an imported row (user_id set), and a
non-provisioning row are kept.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
device:content-ack logged + emitted every message, so a device repeatedly
reporting the same "content <id>: ready" (observed from an older app version)
added avoidable load per message.
- Suppress identical (device_id, content_id, status) reports within
config.contentAckDedupMs (default 10s), modeled on the lastPlayLogAt throttle.
A status change has a different key and passes immediately; a fresh report after
the window passes too. In-memory, resets on restart. The handler does no DB
writes, so this is purely shedding redundant log+emit work.
test: integration over a real authenticated device socket — a burst of identical
"ready" collapses to one log/emit, a "ready" after the window passes, and a status
change is never deduped. Unique PORT (3984).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The per-device insert-time prune (deviceSocket.js) only ever touches a device
that is actively inserting, so it misses two paths: removed/idle devices whose
rows linger forever, and heartbeat.js's offline_timeout insert that bypasses
logDeviceStatus entirely. The reporter's 1.2M-row bloat accumulated UNDER a 7-day
per-device prune for exactly this reason.
- pruneStatusLog() (db/database.js): a GLOBAL time-range sweep across ALL devices,
modeled on the play_logs prune. Run once on startup (recovers a bloated table
right after deploy) and on the heartbeat interval (services/heartbeat.js).
- STATUS_LOG_RETENTION_DAYS env, default 3 (lower than the old hardcoded 7d; the
dashboard only shows a 24h uptime window, so 2-3d is ample for diagnostics).
- Deliberately NO per-device row cap: Step 3's throttle already bounds how fast a
storming device can generate status rows, so a cap would add sweep complexity
for little gain (noted for later if needed).
- NO VACUUM / auto_vacuum here (kept off the hot path); space reclaim is left as a
separate decision (see report).
test: deterministic in-process unit test proves the sweep deletes over-retention
rows across all devices — including a device absent from the devices table and an
offline_timeout row — while keeping recent rows; idempotent on an empty table.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Gates genuine reconnects PER DEVICE before the heavy register work (DB writes +
playlist build) runs, so a single flapping device can no longer saturate the
event loop and take down the server.
- Actuator is per-device, keyed on device_id (modeled on lastPlayLogAt). A device
is flagged only when it exceeds reconnectBaseMax genuine reconnects per window.
Same-socket playlist refreshes (isPlaylistRefresh) are exempt.
- Load-awareness is BANDED (normal/elevated/critical from the step-2 lag signal),
not a continuous controller. The band only MULTIPLIES an already-flagged
device's backoff; global lag never gates a healthy device.
- Hysteresis: escalate immediately while storming (tighten fast); decay one level
per reconnectReleaseMs of calm (release slow).
- HARD CEILING per device, independent of band and warm-up — a slow-ramp attacker
can't train through it.
- COLD START: for reconnectWarmupMs after boot, force the normal band and apply
only the hard ceiling, so a full-fleet reconnect after a deploy doesn't throttle
healthy screens. State is in-memory, resets on restart.
- Observability: every throttle engagement logs device, band, observed vs allowed
rate, and backoff. Throttled device gets device:throttled + a deferred disconnect.
Tests (api.test.js style):
- unit: healthy-never-throttled, storm-throttled-with-growing-backoff, band
multiplies backoff, hard-ceiling-even-in-warmup, warm-up leniency, neighbor
isolation, slow release.
- integration GATE (the required one): full-fleet reconnect right after restart
throttles NO healthy device; a single device storming IS throttled; a neighbor
stays unaffected while another storms.
- also fixes pre-existing test PORT collisions (my new integration files clashed
with totp.test.js:3979 and totp-keyrotation.test.js:3980 -> moved to 3982/3983);
full suite now green serially AND in parallel.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Continuously samples event-loop delay via perf_hooks.monitorEventLoopDelay()
(C++-backed histogram; cheap). Each window persists mean/p50/p99/max to a new
event_loop_lag table and recomputes a coarse load band (normal/elevated/critical)
from the window p99. Standalone value: current lag is exposed on /api/status and
band changes are logged, so site lag is diagnosable independent of throttling.
The band feeds the #142 reconnect throttle (next commit) but ships first as its
own subsystem.
- event_loop_lag is bounded from day one: indexed on sampled_at + scheduled prune
(LAG_TELEMETRY_RETENTION_DAYS, small default) modeled on the play_logs prune.
Deliberately NOT another unbounded-growth table.
- Band transitions are asymmetric: jump up immediately (tighten fast), release one
level at a time after N calm samples below a deadband (release slow, no flap).
Pure nextBand() function, unit-tested deterministically.
- config: LAG_SAMPLE_INTERVAL_MS, LAG_RESOLUTION_MS, LAG_TELEMETRY_RETENTION_DAYS,
LAG_PRUNE_INTERVAL_MS, LAG_ELEVATED_MS, LAG_CRITICAL_MS, LAG_RELEASE_SAMPLES.
- tests: band-transition unit tests; integration proves sampling persists, stays
bounded under the prune, and surfaces on /api/status.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The dashboard uptime query (WHERE device_id=? AND timestamp>?) and the
per-device retention prune (WHERE device_id=? AND timestamp<?) were both full
table scans. At 1M+ rows (the outage report) this was the dashboard-degradation
cause that persisted even after the reconnect storm stopped.
- schema.sql: add idx_device_status_log_device_ts(device_id, timestamp); both
queries now SEARCH ... USING INDEX instead of SCAN (verified via EXPLAIN).
- database.js: same index as a migration for existing DBs (idempotent).
- schema.sql defined device_status_log twice; drop the duplicate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A mute toggle wrote the draft playlist_items + emitted a live device:mute-changed but only markDraft()'d — it never updated playlists.published_snapshot, the copy the device actually plays. So the device's item.muted stayed 0 and every loop/reload re-applied full volume: dashboard icon red but audio kept playing (Android; web's native <video> loop masked it). emitMuteChanged now surgically patches the matching item's muted (0/1) inside the published_snapshot and re-pushes the playlist, so loops re-apply the correct flag. Surgical patch (not publishPlaylist) so a mute toggle can't prematurely publish other draft edits or flip publish state. Adds a regression test that fails without the patch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the cache/backoff loop fix (aa23cf0): make a device that can't
self-install visible to operators, and fix the signature-verify bug that kept the
whole #139 fix from engaging on the actual Fire OS target.
Dashboard surface (Phase 2):
- devices gains ota_status / ota_target_version / ota_attempts / ota_updated_at
via the idempotent ALTER TABLE ADD COLUMN migration (non-destructive,
default-backfilled, idempotent on re-run).
- The device reports ota_status (OtaThrottle.statusFor -> none | pending |
manual_update_required) in device_info; the server persists it on register
(the reconnect backstop). devices d.* already surfaces it to the dashboard.
- Dashboard shows a non-blocking amber badge when manual_update_required
("Update available (vX) - install failed N times, manual update required");
i18n key in en.js (non-en inherits via the en fallback). Server suite +1 test.
Event-driven status (Option B):
- New device:ota-status WS message, emitted on STATE TRANSITIONS only
(enter-backoff -> manual_update_required, clear -> none), so the badge updates
promptly without waiting for a reconnect and without per-poll/heartbeat chatter.
Server handler persists the same fields; an unknown/forged device_id is a safe
no-op. The register-path persist stays as the reconnect backstop.
Signature-verify fix (the critical piece):
verifyApkSignature read the downloaded APK's signer via
getPackageArchiveInfo(GET_SIGNING_CERTIFICATES).signingInfo, but that field is
null for ARCHIVE files on API 28/29 (populated only from API 30). On Fire OS 8
(Android 9 / API 28) - the actual deployment target - this returned 0 certs from
a correctly-signed APK, so every OTA was refused as "tampered," the cache was
deleted, and the full APK re-downloaded every check cycle. This was the real
cause of the #139 re-download loop, NOT a silent-install failure: the cache and
backoff added in this branch sit behind this verify gate and never engaged on
the target.
Fix: below API 30, read the archive's signer via the legacy GET_SIGNATURES +
.signatures (its v1/JAR cert, which IS populated on 28/29). Keep
GET_SIGNING_CERTIFICATES + signingInfo for API >= 30 and for the installed-app
read (which works on 28+). The archive's signer is still extracted and compared
to the installed app's signer; a mismatch or zero-cert APK is still rejected.
This reads the cert correctly on old APIs - it does not weaken verification.
Verified on emulators:
- API 28: verify now passes for a legit APK (was: 0 certs, refused). Full backoff
then engages - 8.5MB pulled once, cache-hit on retries, backoff after 3,
manual_update_required emitted once; clears on successful update.
- API 28 negative: a re-signed (different-key) APK is still refused on cert
MISMATCH - no hole opened.
- API 30: unchanged path still passes (no regression).
- server suite 173/173, OtaThrottleTest 7/7.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Devices that download an OTA APK but cannot silently install it (Fire TV: no
device-owner path) re-downloaded the full APK every check cycle indefinitely -
install never completes, version never advances, next check re-triggers.
Client (UpdateChecker.kt, ServerConfig.kt, OtaThrottle.kt):
- Reuse a cached, signature-verified APK instead of re-downloading every cycle;
delete leftover invalid files; keep the verified APK on disk as the
manual-install artifact.
- Persisted per-version attempt budget (EncryptedSharedPreferences) so it
survives the Fire OS app restarts that drive the loop. An attempt is counted
only when an install is launched - a download/verify failure does not consume
the budget, so a transient network problem cannot park a healthy device in
backoff. After 3 failed installs, back off to one retry per 24h.
- Clear OTA state and caches when a check returns update_available=false while
state is pending (app relaunched as the new version).
- Report OTA status to the dashboard via device:log (tag ota) on state
transitions only (enter-backoff, clear) to avoid flooding the channel.
- Extract throttle decision logic into a pure OtaThrottle object (no Android
deps) with JUnit coverage (OtaThrottleTest) for the state transitions.
Server (server.js):
- Reword /download/apk log from "OTA update in progress" to "APK served" and
rate-limit to once per IP / 10 min so a looping device cannot flood the log.
Note: client-cooperative fix - prevents the loop in cohorts running this APK.
Currently-stuck beta4 devices still require a one-time manual update.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Version bump for the beta6 strobe artifact: VERSION, server package + lock, and
the Android client (versionName 1.9.1-beta6, versionCode 26 so OTA sees it as an
upgrade over beta5/25). Also removes two leftover untracked PiP test scratch files
(frontend/alert-overlay.html, frontend/overlay.js) from a prior session.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
captureAndSend() grabbed a single querySelector('video'|'img') and stretched it
across a fixed 960x540 canvas, so multi-zone Now-Playing screenshots and the 1fps
remote stream showed one zone stretched fullscreen instead of the actual layout.
- Multi-zone layouts now composite each zone from its REAL rendered geometry
(getBoundingClientRect relative to the container, scaled proportionally onto the
canvas), so positions/sizes stay true to the layout.
- Canvas height derives from the container aspect (not a hardcoded 540); media is
drawn honouring its object-fit (cover/contain/fill) instead of being stretched.
- Cross-origin / iframe zones (YouTube, widgets) can't be read back without
CORS-tainting the whole canvas (which makes toDataURL throw and kills the entire
capture). They now get a deliberate, labelled placeholder ("YouTube"/"Widget"/
"Video") so the shot still shows the layout structure with that zone marked,
instead of a transparent hole or a failed capture.
- Split rendering into renderCaptureCanvas() (socket-free, headlessly verifiable)
and captureAndSend() (encode + emit). One full-quality path serves BOTH the
on-demand screenshot and the 1fps stream — the composite is only a few drawImage
calls over already-decoded media, so no separate low-quality stream path.
Web player only; Android (view.draw already composites correctly) untouched.
Verified headlessly on a 3-zone device: red/green image zones render in their
correct positions, the YouTube zone shows a labelled placeholder, and the capture
succeeds with no CORS taint.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two independent multi-zone bugs, plus operator-facing warnings, i18n, and
regression tests guarding the data contracts.
Bug 1 — per-item mute was a no-op end to end:
- GET /api/devices/:id dropped the `muted` column from its assignments SELECT,
so the dashboard toggle never reflected state (the muted=false case in
particular). Column restored to the device payload.
- Android player now honours the per-item mute flag for YouTube (initial state
+ live via the IFrame JS API).
Bug 2 — items whose zone_id belongs to a different layout were silently dropped:
- Player fallback (web + Android): an orphaned zone_id is recovered into the
largest zone instead of vanishing, with telemetry.
- server/lib/zone-validate.js is the single source of truth for the orphan rule
(zone not in the device's active layout); used by the device payload
(per-item `orphan` flag + `active_layout_zones`) and the device list
(`orphan_count`).
- Assign-time hardening: a stale zone_id (not in the device's active layout) is
cleared to null on POST/PUT rather than persisted as a new orphan.
- scripts/find-orphan-zone-items.js: read-only sweep for existing orphans.
Dashboard warnings (operator-facing, never on the live player):
- Per-item badge + reassign affordance, device-list glance, preview banner.
- Graceful degradation: the zone selector falls back to /api/layouts/:id so it
can't vanish on a stale payload.
i18n: orphan-zone strings added to en/es/fr/de/pt/it (hi falls back by design;
count strings interpolate through tn()).
Tests: server/test/device-zone-contract.test.js adds 5 regression tests for the
data contracts above (muted true/false round-trip, active_layout_zones, orphan
flag + count, orphan-clears-on-reassign, assign-time clearing). 172/172 pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two device-REPORTING fixes from the #134 investigation (the PiP rendering itself
was #135).
1) "Device reconnects every ~45s" was a logging artifact, not instability. The
player re-emits a full device:register on the SAME socket every ~45-60s
(requestPlaylistRefresh) to pull a fresh playlist; the server logged
"Device reconnected" for every register of a known device. The attached 4-day
log showed 1415 "reconnected" vs 30 real socket connects and 0 heartbeat
timeouts — the socket never dropped, so #134's "PiP lost between reconnects"
was a misdiagnosis. Fix: only log a genuine reconnect (new socket); a
same-socket re-register is a refresh (currentDeviceId === device_id) and stays
quiet. The playlist still refreshes.
2) Device reported 720p while the monitor showed a 1080 signal. DeviceInfo
reported getRealMetrics() — the UI RENDER SURFACE — but TV boxes render the UI
at 720p and upscale to a 1080p HDMI signal. Now report BOTH: screen_width/height
= the output mode (Display.Mode.physicalWidth/Height), render_width/height =
the render surface (getRealMetrics). Two new nullable devices columns, stored on
pairing INSERT + reconnect UPDATE, exposed via the device API, shown on the
dashboard as "1920x1080 (UI 1280x720)" when they differ.
Backward compatible (required + verified on emulator): a device that omits
render_* — or sends no device_info at all — still registers, with render_* = null,
on both the INSERT and UPDATE paths. New columns nullable; stores use
`?? null` / `|| null`. All 167 server tests pass.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(#109): render Android PiP overlay above the YouTube WebView video plane
The PiP overlay (#109) returned sent:1 and showed its title in `uiautomator
dump`, but nothing painted on screen while YouTube was playing. By elimination
(YouTube-specific, landscape so no off-screen transform, real on-screen bounds
in the dump) the cause is surface occlusion: pipLayout sat as the last child of
rootLayout — the SAME compositing band as R.id.youtubeWebView — so the playing
video surface drew over it.
Fix (task option 1a): reparent pipLayout out of rootLayout to the window
content (android.R.id.content) as a top-level sibling drawn after rootLayout, so
it composites above the WebView. MainActivity.mirrorTransformToPip() copies
rootView's orientation/wall transform onto it so corner positions still track
the rotated content (web/Tizen parity). show() also bringToFront()+
requestLayout()+invalidate() on attach (covers the cause-3 measure/visibility
path). Remote-view screenshots now capture the content root so the PiP is still
included.
Instrumentation (Phase 1, default OFF): PipOverlay.pipDebug paints a solid
magenta box + border with media on top (box paints even if media never loads)
and logs box/pipLayout/rootView/youtubeWebView geometry over device:log tag
"pip"; loadImageInto also logs on success. Toggled via device:command
{type:"pip_debug"} (routed through MainActivity.onCommand).
Server: POST /api/pip and the clear handler log one concise [pip] dispatch line
(target + sent/offline) so journalctl shows PiP activity.
Validated end-to-end on an emulator (pixel10/API34) paired to an isolated local
server with YouTube playing: no crash, the PiP box composites above the live
video frame (center + top-right), clear removes it, and the portrait transform
mirror rotates the overlay with the stage (no off-screen). The Fire TV
hardware-overlay punch-through still needs real hardware (emulator composites
video inline); pipDebug + docs/109-android-pip-visibility.md cover that.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(#109): image PiPs never painted — set slot token before decode
Emulator e2e of an image PiP (a QR PNG) found the image area always blank (box
background + title only). Pre-existing defect, also on main, independent of the
occlusion reparent.
Root cause in PipOverlay.show(): teardown() clears `current` to null, then
loadImageInto() captured `token = current` (null) as its drop-if-replaced guard,
but `current` was set to the new pip_id AFTER the media was built. The image
decode finishes on a background thread and posts back after show() returns, so
`token != current` (null != pip_id) was always true and every decoded bitmap was
dropped. Web PiPs and the box/title were unaffected, which masked it.
Fix: set `current = pip_id` before building media so loadImageInto's token
matches. Verified on emulator — a QR image PiP now renders over both a static
image and live YouTube (hardware screencap + the app's software view.draw
capture both show it).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(#109): record web PiP (HTML+JS) verification on emulator
Web PiP type loads its WebView and executes JS (a page stamping JS OK · <time>
rendered over live YouTube). No code change — web PiPs don't use the image path
that had the token bug. Completes the image/web/box content-type verification.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(#109): implement PiP close_button on Android (was a documented no-op)
The server forwarded close_button (routes/pip.js) and it's in openapi.yaml, but
no player rendered it — Tizen deferred "close-button focus" as non-MVP, the web
player has none, and Android's PipOverlay never read the flag. So the documented
field did nothing on any device.
Implement it on Android: when close_button:true, a tappable ✕ floats at the box's
top-right in a FrameLayout wrapper that is a SIBLING of the box — so it isn't
clipped by the box outline or dimmed by the overlay opacity. Tapping it clears
THIS overlay (id-matched via the captured token). Only the ✕ is clickable; the
rest of the full-screen pipLayout stays touch-transparent, so taps elsewhere
fall through to the playing content (no input regression).
Verified on the emulator over live YouTube: the ✕ renders at the corner, and
tapping it removes the overlay while the video keeps playing.
Parity note: web/Tizen players still don't implement close_button; D-pad focus
of the ✕ on non-touch TV hardware is intentionally not wired (MVP = touch/pointer,
matching the Tizen focus deferral).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The PiP endpoints and the per-item mute field shipped without OpenAPI coverage.
- openapi.yaml: add POST /pip (show), DELETE /pip + POST /pip/clear (clear), all
x-required-scope: full; add the `muted` boolean to PUT /assignments/{id}; add a `pip` tag.
- openapi-contract.test.js: the scope heuristic only treated `command` paths as full-scope,
so a full-scope non-command route (/pip) would fail it — extend it to recognize /pip.
Docs-only as far as the running build goes (no route/behavior change). Lands on main; not
in the frozen v1.9.1-beta4 tag — ships in the next tag.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(server): proxy remote YouTube thumbnails instead of ENOENT on a local path
YouTube content stores thumbnail_path as a REMOTE URL
(https://img.youtube.com/vi/<id>/hqdefault.jpg), but the thumbnail-serving route
path.resolve'd it into contentDir -> a local file that never existed -> ENOENT logged
a few times a minute (the tester-log spam). Recreating content didn't help (new rows
store the same remote URL).
- GET /api/content/:id/thumbnail now proxies a remote http(s) thumbnail_path
server-side (same-origin, so dashboard CSP img-src is unaffected) via a non-throwing
helper: upstream 404 -> 404, other failure/timeout -> 502, image/* only (modest SSRF
hardening; the URL is server-set at ingest). Local thumbnails keep the sendFile path;
the playlist/widget/workspace access gating is unchanged for both branches.
- routes/widgets.js inlineUserContent skips the disk read for a remote thumbnail and
leaves the /api/content/:id/thumbnail reference in place (the proxy serves it).
- routes/content.js ingest unchanged; a comment notes the future download-at-ingest +
backfill option for CDN independence.
- New test/thumbnail-proxy.test.js: local sendFile still works; a remote thumbnail is
proxied (mock upstream, no local read, no ENOENT); upstream 404 -> clean 404. Full
server suite 164/164.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(server): boot banner shows the real version, not a hardcoded v1.2.0
The startup ASCII banner printed "ScreenTinker Server v1.2.0". Use the already-imported
VERSION (require('./version'), the single source of truth that reads the root VERSION
file) in a fixed-width field (VERSION.padEnd(22).slice(0, 22) — the same padEnd
discipline the port line uses) so the fixed-width box border stays aligned for any
version length. No other behavior changes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(server): persist + ship + real-time per-item mute (#129)
The dashboard mute toggle was a no-op end to end. The active model is playlist_items
(the device payload is its published_snapshot); the legacy `assignments` table the bug
report cited is unused for devices. Three breaks:
- PUT /api/assignments/:id silently dropped `muted` (only read sort_order/duration_sec/
zone_id). It now accepts muted (coerced 0/1) and ITEM_SELECT returns it, so the toggle
persists and its on/off state sticks.
- playlist_items had no `muted` column — added (schema + idempotent migration).
- buildSnapshotItems didn't select muted, so it never reached the published_snapshot /
device payload — now included.
Real-time: on a mute change, emit device:mute-changed { content_id, widget_id, muted } to
every device on that playlist so the player toggles the matching item's volume live,
decoupled from publish (the value is also in the next snapshot, so it persists). Adds a
[mute] log line (the report noted zero mute log entries).
Test: test/mute.test.js — PUT persists + returns muted, it reaches the published
snapshot, and a non-mute update doesn't reset it. Server suite 164/164.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(player): apply per-item mute live on Android + web (#129)
Honor the new per-item mute from the server, both in real time and on reload.
Android:
- WebSocketService: onMuteChanged callback + main-thread device:mute-changed handler.
- MediaPlayerManager.setVideoMuted(): flips the live ExoPlayer volume on the current
video (YouTube autoplays muted; images/widgets are silent).
- MainActivity: on device:mute-changed, apply immediately if the toggled item is the
one playing now.
- PlaylistController.sig(): include muted so a published mute change re-renders/persists
instead of being de-duped.
Web player (server/player/index.html):
- device:mute-changed handler toggles the current <video>; the video mount now also
honors item.muted so a published mute sticks across reloads.
Tizen intentionally not included: its player mutes ALL video for autoplay, so per-item
unmute isn't achievable there.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* PiP overlay MVP: push image/web overlays to a device or group (#109)
Implements the #109 MVP from docs proposal: a floating overlay PUSHED to a device or
group in real time, rendered above the playlist without disturbing it. Scope is the
MVP only — video/RTSP, MQTT, offline-queue, and the priority/stacking system are
deferred to follow-up PRs as the proposal specifies.
Protocol (/device socket, player-agnostic):
- device:pip-show { pip_id, type:image|web, uri, position, width, height, duration,
title?, title_color?, background_color?, opacity?, border_radius?, close_button? }
- device:pip-clear { pip_id? }
The player fetches uri itself (same trust model as remote_url content; server never
proxies). type:web is full-trust by design, hence the 'full' token scope.
Server (server/routes/pip.js, new; mounted in config/api-surface.js PUBLIC_ROUTERS):
- POST /api/pip and POST /api/pip/clear + DELETE /api/pip, all requireScope('full').
- Resolves device_id to a device OR a group, expands a group to members, and emits
per-device — reusing the group command route's room-size online check and
{device_id, name, status: sent|offline} result shape. Generates pip_id.
- Validates type/position allowlists, uri http(s), numeric bounds on
width/height/duration/opacity/border_radius, colors via the existing VALID_COLOR
(#RRGGBB; transparency is the separate opacity field).
- Workspace-isolated: every target query is scoped to req.workspaceId, so a token
bound to workspace A can't address workspace B (404). Offline devices are reported,
never queued (PiP is ephemeral).
Player overlay layer (Tizen; tizen/js/pip-overlay.js, new):
- A #pip sibling ABOVE #stage that PlaylistPlayer/ZoneRenderer never touch.
- applyOrientation now applies the SAME transform to #pip as #stage, so corner
positions track the visible CONTENT in all four orientations.
- image -> <img>, web -> <iframe> (muted by default: empty allow= denies autoplay),
sized/positioned/styled per payload, optional title bar.
- Single overlay slot, last-show-wins; duration timer (0 = until cleared); pip-clear
(id-aware) or timer tears down; teardown wrapped so a malformed payload can't wedge
the layer. Reports show/clear over device:log (tag 'pip').
Dashboard: a minimal "Send overlay" / "Clear overlay" tester on the device-detail
controls (device/group via the open device, type, uri, position, duration), calling
POST /api/pip through the api helper.
Tests (server suite green, 161/161):
- api.test.js: PiP tier — authz (read/write 403, full passes), workspace isolation
(wsA token -> wsB device 404), payload validation, device + group targeting, clear;
plus the PUBLIC_ROUTERS snapshot-firewall updated for /api/pip.
- pip-overlay.test.js: loads the real player.js + pip-overlay.js in a vm with a DOM
shim; proves the overlay shows, auto-dismisses on the duration timer, and never
changes the playlist signature / touches #stage; web->iframe, last-show-wins,
id-aware clear, malformed-payload safety.
Not in this PR (intentional):
- Android player overlay — fast-follow. Protocol + server are player-agnostic; the
Android layer (an overlay View above the player, orientation-matched to MainActivity's
rootView rotation) is the same shape and lands next.
- OpenAPI docs for POST /api/pip — the contract test's scope heuristic only treats
'command' paths as full-scope, so documenting a full-scope non-command route there
needs that heuristic extended first; deferred with the docs item (proposal §8.6).
- video/rtsp types, MQTT, offline queue-on-reconnect, priority/stacking, arbitrary
(x,y)/selector positioning (proposal §6).
Refs #109
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* PiP overlay: add Android + web players (#109)
Extends the #109 PiP MVP to the other two players so the protocol (device:pip-show /
device:pip-clear) is honored fleet-wide, not just on Tizen. No server/protocol changes —
the route and socket messages are player-agnostic; these are the two missing surfaces.
Web player (server/player/index.html):
- New #pipContainer layer above #playerContainer, pointer-transparent, that the playlist
render never touches. The same orientation transform is applied to it as to
#playerContainer (extended to also reset width/height on landscape so a
portrait->landscape switch realigns), so corner positions track the visible content.
- Inline PiP logic mirroring tizen/js/pip-overlay.js: image -> <img>, web -> <iframe>
(muted by default via empty allow=), position/size/bg/opacity/radius/title, single slot
last-show-wins, duration timer (0 = until cleared), id-aware clear, wrapped teardown.
- device:pip-show/clear handlers; reports show/clear over device:log (tag "pip").
Android player:
- activity_main.xml: a pipLayout FrameLayout as the LAST child of rootLayout — it draws
above the content AND inherits rootView's orientation rotation/translation, so corner
positioning is orientation-matched for free.
- PipOverlay.kt (new): builds the overlay box into pipLayout. image -> ImageView (decoded
off-thread via ImageLoader, dropped if torn down mid-decode); web -> WebView with
mediaPlaybackRequiresUserGesture=true (mute-by-default). Gravity-based corner/center
placement with a 4% inset, GradientDrawable bg + corner radius, alpha=opacity, optional
title bar. Single slot last-show-wins; duration timer; id-aware clear; teardown wrapped
and also run on activity destroy (WebView cleanup).
- WebSocketService: onPipShow/onPipClear callbacks + safeOn handlers posted to the main
thread (they build Views) + a sendLog(tag, level, message) emitter for device:log.
- MainActivity: instantiate PipOverlay (log -> wsService.sendLog("pip", ...)), wire the
callbacks, tear down on destroy.
Verified: Android assembleDebug builds clean; web player inline JS parses; server suite
still 161/161 (no server changes this commit). Not yet validated on real hardware —
four-orientation corner positioning mirrors the player container/rootView transform but
should be eyeballed on a panel.
Refs #109
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Opt-in, default-off UI gate (per strobe's spec; verified his file refs first).
When set, hides the Subscription sidebar item + billing view and bounces
#/billing to the dashboard. Billing shown by default -> existing deployments
unchanged. UI-only: /api/subscription/* untouched (internal usage reads stay).
- config.js: config.hideBilling from HIDE_BILLING (mirrors selfHosted).
- auth.js: surface hide_billing on GET /api/auth/me (client already fetches it
at boot, stored on the user object).
- index.html: id="billingNavItem" on the Subscription <li> (mirrors adminNavItem).
- app.js: toggle billingNavItem in updateSidebarUser (next to the admin toggle);
guard #/billing -> history.replaceState('#/') + render dashboard (replaceState
so the back button doesn't loop into the guard).
- .env.example + README documented.
Spec assumptions verified against code: adminNavItem toggle pattern exists;
/me is fetched at boot and updateSidebarUser runs both at boot (cached user)
and post-/me, so no-flash holds on warm loads (one-time flash possible on the
first load after the flag flips — same as the admin nav, minor); route dispatch
is an if/else chain. Nav label is static (no data-i18n) so no i18n change.
Validated (headless Chrome, both states):
- flag unset -> Subscription tab present, #/billing renders (backward-compat).
- HIDE_BILLING=true -> tab hidden, #/billing redirects to #/.
- config maps HIDE_BILLING both ways; live /me default hide_billing=false.
- 149 server tests green. Default-off = zero change for existing deployments.
Known cosmetic (harmless): after the redirect the billing nav LINK keeps its
'active' class, but the nav item is display:none so it's never visible.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A FK constraint violation crashed the whole process on 1.9.1-beta2 with a
bare "FOREIGN KEY constraint failed" and NO stack — so it couldn't be root-
caused. better-sqlite3 is synchronous, so such a throw inside a socket.io
handler (no local try/catch) propagates to uncaughtException, and with no
handler Node exits traceless.
Add a small top-of-server.js net that logs the FULL err.stack (file:line of
the offending write) + timestamp, best-effort closes the DB (WAL flush), then
exits(1) so systemd restarts fresh. NOT catch-and-continue — after an uncaught
throw the process state is undefined, so we never keep serving. This is the
investigation tool the root-cause fix is blocked on, plus the fleet-wide-crash
net #114 asked for.
Verified (not assumed):
- A synthetic synchronous FK throw inside a real socket.io handler IS caught by
uncaughtException, logs the full stack incl. the throwing file:line, exits 1.
- Non-over-reach: a FK throw in an Express route -> Express handles it (500), a
throw in a local try/catch -> caught (200); the global net does NOT fire and
the process stays alive. Last resort, not a catch-all.
- 149 server tests green; server boots clean (net doesn't trip on startup).
The root-cause FK fix is SEPARATE and waits on the stack trace this produces.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Option A: tile-on-tile (same section) reorders; tile-on-section / cross-
section stays group-assign (existing behavior untouched). Ordering is
cosmetic (dashboard only — nothing the device/player reads).
Backend:
- Migration: devices.sort_order column (idempotent ALTER; default 0).
- GET /api/devices ordering: sort_order ASC, created_at ASC (was created_at).
- POST /api/devices/reorder — ordered id array -> transactional
UPDATE sort_order=index, scoped WHERE workspace_id = caller's workspace
(forged cross-workspace ids are no-ops). Write-gated (viewer read-only).
Mirrors the playlist items reorder.
Frontend (the collision):
- Card-level dragover/drop: reorder ONLY when target is another card in the
SAME section; otherwise no-op so the event bubbles to the section's
group-assign handler. stopPropagation on the same-section drop prevents
the section handler also firing. Drop indicator (inset box-shadow).
Native HTML5 DnD; no library.
Validated (headless Chrome, synthetic DnD + a section-level drop spy):
- SAME-section reorder: section drop suppressed (sectionDrops=0), POST
/devices/reorder fires, NO group call, sort_order persists in DB.
- CROSS-section: section drop fires (sectionDrops=1), POST /groups/:id/
devices fires and membership actually changes — group-assign unbroken.
- The 0-vs-1 contrast proves stopPropagation disambiguates the shared gesture.
- 149 server tests green; migration applies clean on the prod-copy DB.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes #104's two surfaces by reusing the now-generalized player preview
for devices, seam-safe (device-bound layout, NOT playlist-derived).
Server:
- GET /api/devices/:id/preview-payload returns buildPlaylistPayload(deviceId)
— the device's OWN layout/orientation (device row) + its published items —
with wall_config forced null (v1: wall members preview full-frame; a
socket-free follower would otherwise freeze waiting for leader wall:sync).
Device-READ gate (mirrors GET /:id, viewers allowed); NOT requirePlaylistRead.
Player (generalized, shared seam):
- Boot dispatch now accepts ?preview=1 with EITHER playlist=ID OR device=ID.
- bootPreview(qs) builds the right URL; shared body factored into
renderPreviewFromUrl(url) used by both. Renderer still UNTOUCHED.
- derivePreviewLayout stays PLAYLIST-only; never touches the device path.
Dashboard:
- Device manager gets a Preview button -> /player?preview=1&device=ID
(modal iframe, aspect from device orientation). Playlist-view button as-is.
- i18n x6 (device.preview_btn).
Validated (not just tests): 149 server tests green (generalization didn't
break the playlist path); device preview renders socket-free in headless
Chrome; layout proven device-bound on real data (device playlist has 0 zoned
items -> playlist-derivation would give NULL, but payload returns the device
row's "Vertical Full HD"); wall-member device previews full-frame (inWallMode
false) without freezing; auth gate outsider->403, no-token->401; playlist
path still renders the webpage note post-refactor.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Duplicate and Replace per-item actions, both leaning on the normalized
playlist_items schema (only content_id/widget_id/zone_id/sort_order/
duration_sec; type-specific fields are JOINed at snapshot time).
- Replace: extend PUT /:id/items/:itemId to accept a content_id/widget_id
swap. Clean FK swap across ANY content type (image<->video<->youtube<->
widget) — sets one, nulls the other, preserving zone_id/duration/
sort_order/schedule rows. Only acts when content_id|widget_id is present,
so partial PUTs are unaffected. Workspace-validated; markDraft.
- Duplicate: new POST /:id/items/:itemId/duplicate — copies the row +
its schedule blocks (new ids) in one transaction, appended (sort_order
MAX+1). markDraft.
- Frontend: Replace + Duplicate icon buttons per item; Replace reuses the
add-item picker in a replaceItemId mode (PUT instead of POST). i18n x6.
Validated end-to-end against the live API: duplicate (incl. schedule copy
with distinct ids), replace same-type and cross-type both directions,
preservation of duration/schedule/zone, and validation (both->400,
missing->404). 149 server tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the broken/fragmented preview with a single surface that renders a
DRAFT playlist exactly as a device does, by reusing the player's renderer in a
same-origin iframe. Fixes "not all items load" (one renderer, full type union)
and inherits the player's YouTube correctness (YT.Player handshake).
Server:
- deviceSocket: extract assemblePayload() (zone-reset + canonical shape) from
buildPlaylistPayload so the device path and preview can't drift. Pure refactor
(all 149 tests green).
- playlists: GET /:id/preview-payload (requirePlaylistRead, workspace-scoped).
Draft-aware via buildSnapshotItems (live items, not published_snapshot);
derivePreviewLayout() resolves layout from the playlist's own zone-bound items
(0 zoned -> fullscreen; 1 -> use it; >1 -> dominant + ambiguous flag, never
crashes). orientation validated/passthrough; wall_config/timezone null.
Player (renderer UNTOUCHED):
- ?preview=1&playlist=ID boot branch: fetch preview-payload (same-origin Bearer
token) and call handlePlaylistUpdate(). Gated before the pairing/socket path
so the unpaired auto-connect never fires. All socket emits already guarded.
- Webpage widgets: always-visible honest note (no auto-detection — an XFO
refusal is provably indistinguishable client-side from a working embed).
Dashboard:
- playlists: Preview button + player-iframe modal with landscape/portrait toggle.
- widgets: same honest note on the existing widget preview modal (the surface the
bug was reported on).
- i18n x6 (en/es/fr/de/it/pt) + player i18n x5.
Validated end-to-end (headless Chrome + CDP): preview boots, webpage note
renders, 3-zone layout derives+renders, shape parity with device snapshot proven
on real data, auth gate returns 401. The world-readable /uploads finding is
tracked separately as #107 (not a #104 concern — same path the device uses).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agencies can only be designated FULL-SCREEN playlists (no item with zone_id) - a full-screen
agency upload can't safely target a zone, so the ambiguous case is excluded rather than
solved. Checked at THREE points:
- Designation (tokens.js create + PUT /:id/targets) -> 400: reject a zoned target.
- Upload (agency.js item-add) -> 409: block if the playlist BECAME zoned after designation.
MANDATORY because auto-publish has no draft net - a full-screen playlist designated to an
auto-publish token, then zone-assigned, would otherwise auto-publish a full-screen upload
into a zoned playlist. The upload check is the only thing that catches it.
- Picker (settings.js): zoned playlists greyed/disabled with the reason (GET /playlists now
returns a zoned flag); backend reject is the guard if the UI is bypassed. i18n x5.
isZonedPlaylist = EXISTS(playlist_items WHERE zone_id IS NOT NULL). Pure restriction - no
zone structure, no api_token_target_zones.
Bite-test (the exact sequence) GREEN and re-proven to bite: full-screen -> designate to an
auto-publish token -> zone-assign the playlist -> agency upload is BLOCKED (409), not
auto-published; neutralizing the upload check makes it go red. 149 suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Settings → API Tokens: each agency token gets an "Edit playlists" control that opens the
playlist picker pre-checked with the token's CURRENT designations (from the list GET's
tok.targets), lets the admin add/remove, and calls the existing PUT /:id/targets to
atomically re-designate. Reuses the creation picker pattern; common.save/cancel reused;
edit_targets + targets_updated i18n across all 5 locales. No security-model change - the
endpoint was already proven.
Test (integration): PUT /:id/targets re-designates (add + remove) and the confinement
follows the NEW set - a re-designated token reaches only its new playlists (router.param
403s the removed one). 148 suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Investigation found zone placement is a DEVICE property (device.layout_id), not a playlist
property: a normal playlist has no derivable layout (zone_id is NULL unless set in the
device-assignment flow), so a playlist-scoped zone grant can't reach the normal flow. The
right model: placement belongs to the device (same playlist can be full-screen on one screen,
a zone on another); the agency just gets whole-playlist grants + size-guidance.
Removed the zone-grant machinery (security-adjacent dead surface is a liability, not dormant
convenience): api_token_target_zones (schema + a DROP migration for the dev DB where the
short-lived CREATE ran), resolveGrantedZone, grantableZoneIds, buildZoneGrantRows, the
create/PUT zone validation, GET /api/playlists/:id/zones, getPlaylistZones, the settings
zone-picker + its i18n, and the zone-grant bite-test.
KEPT (model-agnostic, good): the reactive per-playlist size-guidance card - GET
/api/agency/playlists/:playlistId/layout (router.param-confined) now reports the zones the
playlist actually feeds (where/what-size content lands), or full-screen when it has no layout.
Whole-playlist grants = today's working model. 147 suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Issuance (on the proven seam):
- tokens.js create + PUT /:id/targets accept per-playlist zone grants (target_zones), inserted
into api_token_target_zones inside the same transaction as the playlist grants (FK requires
the parent, so order matters and is correct).
- Issuance validation (the mirror of runtime confinement): grantableZoneIds() - can grant ONLY
a zone the playlist's layout actually feeds; can't grant one it doesn't have or one from
another playlist's layout. Bite-tested. PUT re-designate stays atomic: delete parent rows ->
zone grants cascade out (no manual child delete).
- settings.js: checking a designated playlist reveals its grantable zones (GET
/api/playlists/:id/zones, JWT); leave unchecked = whole-playlist. i18n across all 5 locales.
Card:
- GET /api/agency/playlists/:playlistId/layout (rides router.param - confined; a non-
designated playlist -> 403, asserted). "Your zone" = the GRANTED zones. Retired the
token-wide /layouts (the per-playlist card replaces the disconnected lump).
- Portal card reacts to the playlist selector: pick a playlist -> its layout renders, the
granted zone highlighted with px size, siblings as context.
Full suite + agency bite-suite green (154).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Placement-as-grant, replacing the inferred auto-place idea. api_token_target_zones is an
ADDITIVE second table (does NOT touch the proven api_token_targets), structurally anchored:
a composite FK to api_token_targets(token_id, playlist_id) makes a zone grant orphan-
impossible and cascade away when the playlist grant is revoked - "narrow" is structural, not
conventional. zone_id FK -> layout_zones cascades on zone/layout delete.
Confinement (lib/agency-targets.resolveGrantedZone, called in the item-add): grants exist ->
the item MUST land in a granted zone (a body zone_id picks among grants, never escapes them);
none -> whole-playlist/full-screen as before. The item-add stamps the granted zone_id.
Bite-tested (6, all proven incl. neutralize->red on the confinement): granted YES; non-
granted/cross-playlist/ambiguous blocked; orphan-grant rejected by the FK; cascade on
playlist-grant revoke, on playlist delete, on zone/layout delete; and foreign_keys=ON
asserted (a cascade that no-ops because FKs are off is the trap). 153 suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reuses the existing scheduler + sendEmail infra (no new scheduler). The agency endpoint
enqueues one agency_notifications row per item added; a 15-min flush groups unsent rows per
token+playlist+action and sends ONE digest per group to the workspace owner/admins + the
playlist owner (deduped via UNION). Draft -> "added N items, awaiting approval"; published ->
"updated <playlist>".
Two robustness rules, both tested:
- Queue never balloons when SMTP is off: the endpoint skips enqueue when !isConfigured(),
and the flush drains-and-discards unsent rows as a backstop.
- sent_at is stamped ONLY after a successful send, so a failed send retries next cycle
instead of silently dropping.
Wired into boot via startAgencyDigest(). 147 suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
So the agency can size/place content: returns the canvas size + zone positions/sizes for the
layouts its designated playlists feed, marking which zones are theirs. DEVICE-FREE BY
CONSTRUCTION - the query path is playlist_items.zone_id -> layout_zones -> layouts and never
touches devices/groups/schedules, so device names/locations/IPs/topology are structurally
absent, not filtered. Geometry only - no sibling-zone content. layout.name included (admin's
canvas name); thumbnail_data omitted (could render other zones' content).
Confinement query in lib/agency-layouts.js, bite-tested: own layout YES, a non-designated
playlist's layout NO, response has NO device fields (asserted on a db where a location-named
device exists), and neutralizing the t.token_id filter goes red. 142 suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>