Short-lived per-device queue covers the TV-flap window (issue #3):
when a device is mid-reconnect, prior code emitted to an empty room
and the event vanished. Now playlist-updates and commands targeting
an offline device are queued and flushed in order on the next
device:register for that device_id.
server/lib/command-queue.js (new):
- pendingPlaylistUpdate: per-device marker (rebuild via builder on
flush -> always fresh DB state, no stale snapshots)
- pendingCommands: per-device Map<type, payload> with last-of-type
dedup (most recent screen_off wins)
- TTL via COMMAND_QUEUE_TTL_MS env (default 30000)
- Active sweep every 30s prunes expired entries
Memory bounds: ~6 entries per device worst case (1 playlist marker
+ 5 command types), unref'd sweep timer.
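Rough shape of the module (illustrative only; the flush signature and the
exact entry records are assumptions, not the shipped code):

    // server/lib/command-queue.js (sketch)
    const TTL_MS = Number(process.env.COMMAND_QUEUE_TTL_MS || 30000);

    const pendingPlaylistUpdate = new Map(); // deviceId -> { queuedAt }
    const pendingCommands = new Map();       // deviceId -> Map<type, { payload, queuedAt }>

    function queuePlaylistUpdate(deviceId) {
      pendingPlaylistUpdate.set(deviceId, { queuedAt: Date.now() });
    }

    function queueCommand(deviceId, type, payload) {
      if (!pendingCommands.has(deviceId)) pendingCommands.set(deviceId, new Map());
      // last-of-type dedup: the newest payload for a given type replaces the old one
      pendingCommands.get(deviceId).set(type, { payload, queuedAt: Date.now() });
    }

    async function flushQueue(deviceId, socket, buildPlaylistPayload) {
      const marker = pendingPlaylistUpdate.get(deviceId);
      pendingPlaylistUpdate.delete(deviceId);
      if (marker && Date.now() - marker.queuedAt < TTL_MS) {
        // rebuilt at flush time -> always fresh DB state, never a stale snapshot
        socket.emit('device:playlist-update', await buildPlaylistPayload(deviceId));
      }
      const cmds = pendingCommands.get(deviceId);
      pendingCommands.delete(deviceId);
      for (const [type, { payload, queuedAt }] of cmds || []) {
        if (Date.now() - queuedAt < TTL_MS) socket.emit(type, payload);
      }
    }

    // active sweep: prune anything that expired while its device stayed offline
    setInterval(() => {
      const cutoff = Date.now() - TTL_MS;
      for (const [id, m] of pendingPlaylistUpdate)
        if (m.queuedAt < cutoff) pendingPlaylistUpdate.delete(id);
      for (const [id, cmds] of pendingCommands) {
        for (const [type, c] of cmds) if (c.queuedAt < cutoff) cmds.delete(type);
        if (cmds.size === 0) pendingCommands.delete(id);
      }
    }, 30000).unref();

    module.exports = { queuePlaylistUpdate, queueCommand, flushQueue };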
Wired emit sites (8 total; the four direct socket.emit calls in
deviceSocket register handlers are intentionally NOT queued because
the socket is alive by definition at those points):
- server/routes/video-walls.js (pushWallPayloadToDevice)
- server/routes/device-groups.js (pushPlaylistToDevice)
- server/routes/content.js (content-delete fan-out)
- server/routes/playlists.js (pushToDevices + assign)
- server/services/scheduler.js (scheduled rotations)
- server/ws/deviceSocket.js x2 (wall leader reclaim/reassign)
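Each wired site now checks connectivity before emitting; roughly (the room
naming and the connected-check are assumptions):

    // e.g. pushPlaylistToDevice (sketch)
    function pushPlaylistToDevice(deviceId, payload) {
      const room = deviceNs.adapter.rooms.get(`device:${deviceId}`);
      if (room && room.size > 0) {
        deviceNs.to(`device:${deviceId}`).emit('device:playlist-update', payload);
      } else {
        queuePlaylistUpdate(deviceId); // replayed on the next device:register
      }
    }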
server/ws/deviceSocket.js register paths now call flushQueue after
heartbeat.registerConnection + socket.join. Existing
socket.emit('device:playlist-update', ...) lines kept - they send
the initial state on register; the flush replays any queued events.
Player's handlePlaylistUpdate fingerprint check dedupes the
overlap.
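Register-path ordering, roughly (handler and payload-builder names are
approximations):

    socket.on('device:register', async ({ device_id }) => {
      heartbeat.registerConnection(device_id, socket);
      socket.join(`device:${device_id}`);
      // existing initial-state emit, kept as-is
      socket.emit('device:playlist-update', await buildPlaylistPayload(device_id));
      // then replay anything queued while the device was offline
      await flushQueue(device_id, socket, buildPlaylistPayload);
    });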
Refs #3
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously sendEmail() only logged on error/suppression paths; success
was silent. After the prod deploy of c71c401 it was unclear whether the
first alert tick had actually delivered its email - the answer was yes,
but it had to be derived from 'no error log + recipient query showed a
matching device'. Add a log line on success so future observability
doesn't require detective work.
Replaces the unused EMAIL_WEBHOOK_URL stub with a real Microsoft Graph
Mail.Send pipeline via @azure/msal-node client-credentials flow. Prior
state on prod: every alert email was logged to journalctl and never
sent (21 fallback log lines per hour for the chronic-offline devices).
Four coordinated changes shipped as one commit since they're all part
of making email delivery actually work responsibly:
1. services/email.js (NEW): Graph send via plain HTTPS (no SDK), in-memory
MSAL token cache (refresh 60s pre-expiry), graceful stdout fallback
when GRAPH_* env vars absent. Drop-in replacement for the old webhook.
2. services/alerts.js refactored: sequential await around sendEmail (was
parallel fire-and-forget; first run hit Graph's MailboxConcurrency 429
ApplicationThrottled on a 30-device backlog). Sequential at ~250ms per
send takes 5-8s for the full backlog, well within the 60s tick. Also:
24h long-offline cutoff to stop nagging about chronic-offline devices
(the 20,000+ minute ones); 2-hour dedup window (was 1h) via a generic
shouldSendAlert(type, id, windowMs) helper that future alert types
(payment_failed, plan_limit_hit, etc.) can reuse.
3. Preferences UI: single checkbox in settings.js Account section bound
to users.email_alerts. Saved via the existing Save Profile button. PUT
/api/auth/me extended to accept email_alerts. requireAuth middleware
SELECT now includes email_alerts so it propagates via req.user.
4. Dev safety net: GRAPH_DEV_RESTRICT_TO env var as an allow-list. When
set, only listed recipients reach Graph; everyone else is suppressed
with a log line. Prevents local dev (which often runs against fresh
prod DB copies) from accidentally emailing real prod users. UNSET on
prod systemd unit so production fans out normally.
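For orientation, a sketch of the shape of services/email.js - the token
flow and Graph call are as described above, but every GRAPH_* variable
name except GRAPH_DEV_RESTRICT_TO, plus the log strings, are assumptions:

    // server/services/email.js (sketch)
    const { ConfidentialClientApplication } = require('@azure/msal-node');

    const configured = process.env.GRAPH_CLIENT_ID && process.env.GRAPH_TENANT_ID;
    const cca = configured && new ConfidentialClientApplication({
      auth: {
        clientId: process.env.GRAPH_CLIENT_ID,
        clientSecret: process.env.GRAPH_CLIENT_SECRET,
        authority: `https://login.microsoftonline.com/${process.env.GRAPH_TENANT_ID}`,
      },
    });

    let cached = null; // { token, expiresAt } -- in-memory, refreshed 60s pre-expiry

    async function getToken() {
      if (cached && cached.expiresAt - 60000 > Date.now()) return cached.token;
      const r = await cca.acquireTokenByClientCredential({
        scopes: ['https://graph.microsoft.com/.default'],
      });
      cached = { token: r.accessToken, expiresAt: r.expiresOn.getTime() };
      return cached.token;
    }

    async function sendEmail(to, subject, html) {
      if (!configured) {                         // GRAPH_* absent -> stdout fallback
        console.log(`[email fallback] to=${to} subject="${subject}"`);
        return;
      }
      const allow = process.env.GRAPH_DEV_RESTRICT_TO;
      if (allow && !allow.split(',').includes(to)) {
        console.log(`[email suppressed] ${to} not in GRAPH_DEV_RESTRICT_TO`);
        return;
      }
      const res = await fetch(
        `https://graph.microsoft.com/v1.0/users/${process.env.GRAPH_SENDER}/sendMail`,
        {
          method: 'POST',
          headers: {
            Authorization: `Bearer ${await getToken()}`,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            message: {
              subject,
              body: { contentType: 'HTML', content: html },
              toRecipients: [{ emailAddress: { address: to } }],
            },
            saveToSentItems: false,
          }),
        }
      );
      if (!res.ok) throw new Error(`Graph sendMail ${res.status}`);
      console.log(`[email sent] to=${to} subject="${subject}"`);
    }

    module.exports = { sendEmail };

The alerts.js side of (2) is then just a plain for...of loop that awaits
sendEmail() per recipient, which is what keeps the sends sequential.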
Also: package.json scripts use --env-file-if-exists=.env so local dev
picks up .env automatically (built-in Node flag, no dotenv dep). Prod
runs via systemd ExecStart and is unaffected. server/.gitignore added
to keep .env out of git.
Smoke verified end-to-end:
- Sequential send pattern verified (a prior parallel-send tick had hit
Graph's MailboxConcurrency 429 on 30 simultaneous sends; sequential
at ~250ms each completes the same backlog without throttling)
- 24h cutoff silenced 20/21 prod devices on the next tick
- Dev restrict suppressed the 1 within-24h send
- User-preference toggle flipped via UI -> DB; the alert path then
  silently skipped (continued past) that user before it even reached
  the suppression log
Fix: at connect, enumerate the user's accessible workspace_ids (direct
workspace_members + org_owner/admin paths + platform_admin 'all') via the
new accessibleWorkspaceIds() helper in lib/tenancy.js; socket.join one
room per workspace. All 12 dashboardNs.emit sites across deviceSocket /
heartbeat / server.js / devices route / video-walls route now route via
dashboardNs.to(workspaceRoom(...)).emit(), with the workspace looked up
from the relevant device or wall. New lib/socket-rooms.js holds the
helpers and breaks a circular dependency (dashboardSocket already
requires heartbeat, so heartbeat can't require dashboardSocket).
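In code terms, roughly (helper bodies plus the event and field names in
the last line are illustrative):

    // server/lib/socket-rooms.js -- no requires on dashboardSocket or
    // heartbeat, which is what breaks the circular dependency
    function workspaceRoom(workspaceId) {
      return `workspace:${workspaceId}`;
    }
    module.exports = { workspaceRoom };

    // dashboardSocket connection handler (sketch)
    dashboardNs.on('connection', async (socket) => {
      const ids = await accessibleWorkspaceIds(socket.user); // members + org owner/admin + platform_admin 'all'
      for (const id of ids) socket.join(workspaceRoom(id));
    });

    // every former dashboardNs.emit site now scopes to the owning workspace, e.g.
    dashboardNs.to(workspaceRoom(device.workspace_id)).emit('device:status', payload);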
Inbound: the 6 device commands are rewired to canActOnDevice(socket,
deviceId, tier). request-screenshot is read tier (workspace_viewer+);
remote-touch/key/start/stop and device-command are write tier
(workspace_editor+). Platform_admin and org_owner/admin always pass via
actingAs. The legacy admin/superadmin branch is dropped.
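Gate shape on the inbound side (the tier literals and handler bodies are
assumed):

    // inbound handlers (sketch)
    socket.on('request-screenshot', async ({ deviceId }) => {
      if (!(await canActOnDevice(socket, deviceId, 'read'))) return;  // workspace_viewer+
      // ...forward the screenshot request...
    });

    socket.on('device-command', async ({ deviceId, command }) => {
      if (!(await canActOnDevice(socket, deviceId, 'write'))) return; // workspace_editor+
      // platform_admin / org_owner / org_admin pass inside canActOnDevice via actingAs
      // ...forward the command to the device...
    });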
Lifecycle note: workspace-switch already calls window.location.reload
(Phase 3 switcher), which forces a fresh socket with updated memberships
- no per-emit re-evaluation needed.
Smoke tested with 3 simultaneous socket.io-client connections
(switcher-test, swninja, dw5304 platform_admin) plus direct
canActOnDevice invocation for 6 user/device/tier combinations. All 9
outbound isolation cells and all 6 permission gates pass. Fixture
mutation: switcher-test's Field Crew membership flipped from
workspace_editor to workspace_viewer to exercise the read/write tier
split in one login.
KNOWN REGRESSION (Phase 3 fix): platform_admin / superadmin no longer
has a cross-workspace 'see everything' view. Every route migrated
tonight (2.2a-2.2m) deliberately removed the role-based bypass per the
design doc - cross-workspace visibility will come via dedicated admin
endpoints in Phase 3, not magic role bypasses. Until Phase 3 ships,
platform admins must switch workspaces to see other workspaces' data.
Express's req.ip was resolving to a Cloudflare edge address (e.g.
172.70.x.x) for any request fronted by Cloudflare, because trust proxy
was set to '1' — that trusts the immediate hop, which IS Cloudflare.
All activity_log rows from API paths captured the proxy, not the
client. The WebSocket path was unaffected and recorded the real IP.
Two layers of defense:
1. trust proxy now lists Cloudflare's published v4 + v6 ranges plus
loopback / linklocal / uniquelocal (config/cloudflareIps.js). With
this list req.ip resolves to the original client when fronted by
CF, and X-Forwarded-For from any non-trusted source is ignored —
so the value can't be spoofed.
2. New getClientIp(req) helper in services/activity.js prefers the
CF-Connecting-IP header but only honors it when the immediate TCP
peer is itself a trusted address. Same gate as trust proxy, so a
visitor who hits the origin directly with a forged header is
logged at their real address.
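Sketch of both layers (the proxy-addr usage is one way to express the
'trusted immediate peer' gate and is an assumption, not necessarily the
shipped code):

    // app setup -- layer 1
    const CLOUDFLARE_IPS = require('./config/cloudflareIps'); // published v4 + v6 ranges
    app.set('trust proxy', [...CLOUDFLARE_IPS, 'loopback', 'linklocal', 'uniquelocal']);

    // services/activity.js -- layer 2
    const proxyaddr = require('proxy-addr');
    const trusted = proxyaddr.compile(
      [...CLOUDFLARE_IPS, 'loopback', 'linklocal', 'uniquelocal']
    );

    function getClientIp(req) {
      const cfIp = req.headers['cf-connecting-ip'];
      // honor the header only when the immediate TCP peer is itself trusted
      if (cfIp && trusted(req.socket.remoteAddress)) return cfIp;
      return req.ip; // already client-resolved thanks to the trust proxy list
    }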
Routed all five activity-log call sites (auth login success/failure,
admin password reset, generic activityLogger middleware, and the
in-memory rate-limiter key) through the helper.
Logging-only change. No schema changes. Existing rows are not
modified — fix applies to new entries going forward.
Verified locally:
- Bare loopback hit logs 127.0.0.1 (not a proxy address).
- Helper unit cases pass, including an untrusted peer (203.0.113.7)
  sending a forged CF-Connecting-IP, which correctly falls back to the
  real peer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 group scheduling: schema migration adds group_id to schedules with
CHECK constraint, scheduler evaluates group+device schedules with priority,
group deletion converts schedules to per-device copies. Dashboard gets
playlist assignment dropdown and current playlist label on group headers.
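The group-deletion conversion is roughly the following (the DB helper and
column names are assumptions):

    // device-groups route (sketch): on group delete, fan group schedules
    // out to per-device copies before removing the group
    async function convertGroupSchedules(groupId) {
      const schedules = await db.all('SELECT * FROM schedules WHERE group_id = ?', [groupId]);
      const devices = await db.all('SELECT id FROM devices WHERE group_id = ?', [groupId]);
      for (const s of schedules) {
        for (const d of devices) {
          await db.run(
            `INSERT INTO schedules (device_id, playlist_id, start_time, end_time, days_of_week)
             VALUES (?, ?, ?, ?, ?)`,
            [d.id, s.playlist_id, s.start_time, s.end_time, s.days_of_week]
          );
        }
        await db.run('DELETE FROM schedules WHERE id = ?', [s.id]);
      }
    }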
Player persists audio unlock state in localStorage so version reloads
don't lose audio on unattended displays.
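The persistence itself is small (storage key and the post-unlock hook
name are hypothetical):

    // player (sketch): remember the one-time audio unlock across version reloads
    const AUDIO_UNLOCK_KEY = 'audioUnlocked';

    function markAudioUnlocked() {
      try { localStorage.setItem(AUDIO_UNLOCK_KEY, '1'); } catch (_) { /* storage unavailable */ }
    }

    function isAudioUnlocked() {
      try { return localStorage.getItem(AUDIO_UNLOCK_KEY) === '1'; } catch (_) { return false; }
    }

    // on boot: skip the tap-to-unmute gate if a prior session already unlocked audio
    if (isAudioUnlocked()) enableAudio(); // enableAudio() = whatever the player already does post-unlock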
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Schedule CRUD now includes a playlist_id field. List queries join the
playlist name.
Scheduler tracks active overrides per device and reverts to original
playlist/layout when no schedule is active.
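Override tracking is essentially a per-device 'what was live before' map
(names illustrative):

    // scheduler.js (sketch)
    const activeOverrides = new Map(); // deviceId -> state captured before the schedule took over

    function applySchedule(device, schedule) {
      if (!activeOverrides.has(device.id)) {
        activeOverrides.set(device.id, { playlist_id: device.playlist_id });
      }
      pushPlaylistToDevice(device.id, schedule.playlist_id);
    }

    function revertIfIdle(device) {
      const original = activeOverrides.get(device.id);
      if (!original) return;
      pushPlaylistToDevice(device.id, original.playlist_id); // back to the pre-schedule playlist/layout
      activeOverrides.delete(device.id);
    }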
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces the assignments-table query with a playlist_items query keyed on
device.playlist_id. Also eliminates the duplicate payload builder in
scheduler.js — it now calls the shared buildPlaylistPayload.
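So the shared builder is roughly (DB helper and column names assumed):

    // shared payload builder (sketch), used by routes and scheduler.js alike
    async function buildPlaylistPayload(deviceId) {
      const device = await db.get('SELECT playlist_id FROM devices WHERE id = ?', [deviceId]);
      if (!device || !device.playlist_id) return { playlist_id: null, items: [] };
      const items = await db.all(
        'SELECT * FROM playlist_items WHERE playlist_id = ? ORDER BY position',
        [device.playlist_id]
      );
      return { playlist_id: device.playlist_id, items };
    }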
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ScreenTinker - open source digital signage management software.
MIT License, all features included, no license gates.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>