Five low-risk, high-value fixes surfaced by the security review:
#3 Branding lockdown — `custom_domain`/`custom_css` (which feed the PUBLIC,
pre-auth branding resolver and the login-page <style>) are now settable only
by platform admins; a workspace_admin can no longer hijack the platform login
page by claiming its domain. The public /api/branding (+ /domain) now return
only presentational fields via publicBranding() (no id/user_id/workspace_id/
custom_domain/timestamps leak).
#6 Strip device_token — the device WS auth secret (validated with
timingSafeEqual) was returned in device list/get/update + pairing responses
(SELECT d.* / *). New lib/device-sanitize.js strips it everywhere; prevents
device impersonation by any workspace user.
#7 must_change_password enforced server-side — was a frontend-only redirect, so
a provisioned temp password worked indefinitely via the API. requireAuth now
403s every route except GET/PUT /api/auth/me (the password change, which
clears the flag) and logout while the flag is set.
#8 XSS — escape user data interpolated into innerHTML in teams.js, kiosk.js,
layout-editor.js (team/page/layout/zone names, member name/email, kiosk
config fields). scriptSrcAttr 'unsafe-inline' made these exploitable via
injected event handlers, not just markup.
#9 Thumbnail IDOR — /api/content/:id/thumbnail had no auth/scope gate (any UUID
served any tenant's thumbnail). Now mirrors the /file route's playlist/widget
workspace-scoped reference check.
Tests: new test/security-fixes.test.js (device strip, publicBranding field
allowlist, must_change_password gate). Full suite 41/41. Verified live against a
prod-data copy: device_token absent from /api/devices, /api/branding trimmed.
Not addressed here (tracked for follow-up): Android OTA signature verification
(Critical), public widget-render XSS, token revocation/logout, pairing-code
strength, validateRemoteUrl hardening, import quota.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
White-label is stored per-workspace (white_labels.workspace_id); unbranded and
new workspaces - and the login page - fell back to hardcoded ScreenTinker. Add a
single platform default that everything inherits beneath the per-workspace layer.
Resolution (lib/branding.js): workspace row -> custom-domain match -> platform
default -> hardcoded ScreenTinker. Row-level override: a workspace with its own
row keeps it (current behavior); only row-less workspaces inherit the default,
so editing the default propagates instantly (no row-copying at creation).
The platform default is a white_labels row with a FIXED id ('platform-default'),
not a "workspace_id IS NULL" sentinel - legacy pre-multitenancy rows can also
have a null workspace_id, which would be ambiguous.
- routes/admin.js: GET/PUT /api/admin/branding (requirePlatformAdmin) to read/
upsert the single platform-default row; audit-logged.
- server.js: public GET /api/branding (domain match -> platform default ->
hardcoded) for pre-login/pre-workspace contexts.
- routes/white-label.js: authed GET now falls back to the platform default
(was hardcoded) for row-less workspaces.
- Frontend: login page resolves + applies branding (logo, name, colors, favicon,
custom CSS) pre-auth; Admin page gets a "Default branding" form.
Tests: resolver order incl. legacy null-ws safety; admin GET/PUT (single row,
upsert, platform-admin-only 403). Full suite 37/37. Verified end-to-end:
public + authed + login-page all inherit the platform default; per-workspace
override preserved.
Closes#15.
The #18 user-delete bug was the first symptom of a broader gap: 13 tables
reference workspaces(id) (and activity_log also organizations(id)) with NO
ACTION, so deleting a workspace or organization fails the same FK wall once it
holds any content. SQLite can't ALTER an FK action, so this migration rebuilds
each table (the create-copy-rename pattern the assignments/schedules migrations
already use), changing only the tenant FK clause:
workspace_id -> ON DELETE CASCADE (resources belong to the workspace)
activity_log.workspace_id / organization_id -> ON DELETE SET NULL (keep audit)
user_id FKs are intentionally left as-is - user deletion stays handled app-side
by lib/user-deletion.js (the #18 fix).
- lib/tenant-cascade-migration.js: pure, idempotent core (table-existence
guarded; transforms the stored CREATE text, copies rows verbatim, recreates
indexes; fixes activity_log's AUTOINCREMENT sequence; baseline-vs-after
foreign_key_check so pre-existing orphan rows don't abort it but a botched
rebuild does).
- db/database.js: boot wrapper owns the pre-migration snapshot + process.exit
on failure, matching the other heavy migrations.
Tests (node:test): reproduces the workspace-delete FK failure, applies the
migration, verifies FK actions (CASCADE / SET NULL), index recreation, data
preserved, and that workspace/org delete now cascades (activity_log preserved).
Full suite 27/27. Verified on a copy of a real DB: 13 tables rebuilt,
integrity_check ok, workspace delete cascades, no new FK violations.
DELETE /api/auth/users/:id ran a bare `DELETE FROM users`, but 23 columns
reference users(id) and only 4 cascade, so with foreign_keys=ON the delete
fails the moment the user is referenced anywhere - and a real user always is
(owns an org, created a workspace, has login activity). Reproduces on a fresh
DB, exactly as reported.
The schema also lacks cascades from workspaces -> tenant resources, so the DB
can't clean up on its own. New lib/user-deletion.js resolves every reference in
one transaction (defer_foreign_keys=ON for forgiving order; table-existence
guard for resilience):
- Refuse (409) if the user OWNS an organization that has other members -
don't nuke a shared tenant; transfer ownership first.
- Hard-delete the organizations they SOLELY own (workspaces + all contents).
- In orgs they don't own, PRESERVE resources: SET NULL the nullable
creator/inviter columns, and reassign the NOT NULL legacy creator user_id to
the resource's org owner (fallback: the acting admin).
- Memberships (organization_members/workspace_members/team_members/
content_folders) cascade on the user delete; pending invites they sent and
legacy teams they own are removed.
The handler now 404s an unknown id and 409s the shared-org case.
Tests (node:test): reproduces the FK failure, then verifies provisioned-member
delete (resources preserved + unlinked/reassigned), solo-org-owner cascade,
shared-org refusal (409), self-delete 400, non-superadmin 403, unknown 404.
Full suite 22/22. Verified end-to-end on a copy of a real DB: deleted a user
owning 2 solo orgs, foreign_key_check clean.
Closes#18.
platform_operator is cross-org STAFF: it can see and act-as into every
org and read/write workspace-scoped resources (content, playlists,
layouts, schedules, devices, widgets, kiosk) anywhere - but holds NO
owner-level power.
Design is deny-by-default: operator is NEVER added to PLATFORM_ROLES /
isPlatformRole, so every owner capability (billing, org/workspace
deletion, user/role management, shared & template asset curation,
branding, workspace member mgmt/rename) stays denied, and any NEW owner
endpoint added later inherits that denial automatically.
Operator gets power from exactly two levers:
- middleware/auth.js: new PLATFORM_STAFF set + isPlatformStaff(); owner
guards (PLATFORM_ROLES, requireAdmin, requireSuperAdmin) unchanged.
- tenancy.js: accessContext + resolveTenancy treat staff as act-as
capable; new req.isPlatformStaff / req.isPlatformOperator (req.isPlatformAdmin
stays owner-only); accessibleWorkspaceIds + switch-workspace guard use staff.
- permissions.js: canRead/canWrite + canAccessWorkspace (read) grant staff;
canAdmin / canAdminWorkspace / isOrgAdmin / isOrgOwner stay owner-gated.
Read-only edges (per review): operator may VIEW workspace member lists
(canAccessWorkspace) and the unassigned device pool (devices.js), but
cannot mutate either.
Frontend: platform role dropdown adds "Platform operator"; the user-mgmt
view stays isPlatformAdmin-gated so operators can't open it. EN i18n only.
Behaviour identical under HOSTED_INSTANCE set or unset.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The legacy /api/auth/users dropdown could write 'superadmin' and 'admin'
role strings that not every code path recognized. Some checks matched only
'platform_admin' (tenancy accessContext/resolveTenancy), so a 'superadmin'
user could list orgs but not act-as into them.
Normalize to the current two-tier platform model (users.role holds the
PLATFORM role only; org/workspace roles live in the membership tables):
- Migration (idempotent, exact-string): superadmin -> platform_admin,
admin -> user. No-ops on rows already in the current model.
- Add isPlatformRole() helper in middleware/auth.js; route the two
superadmin-excluding checks in tenancy.js through it so a stray
'superadmin' is never treated as lower-privileged (fixes act-as).
- Remove the dead/stricter requirePlatformAdmin in permissions.js (bare
=== 'platform_admin'); the single guard is the one in middleware/auth.js.
- Recovery-token default role admin -> platform_admin so emergency
recovery keeps full access once 'admin' no longer implies elevation.
- PUT /api/auth/users/:id/role whitelist -> ['user','platform_admin'];
self-demote guard retargeted via isPlatformRole.
- Frontend: platform user-management dropdown now offers User / Platform
admin only; owner-delete guard and settings highlight use isPlatformAdmin.
EN i18n: add admin.role.platform_admin.
Behaviour is identical under HOSTED_INSTANCE set or unset; the migration
only touches exact legacy strings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Slice 1 + 3 of the user-management feature from the May 12 plan.
Backend-only - no UI yet (slice 2 ships separately). Backend +
accept-handler together so the email accept link is functional
from day one without a half-state.
Endpoints added:
- GET /api/workspaces/:id/members (any member; via_org=true
for org-level entries,
read-only from ws context)
- GET /api/workspaces/:id/invites (workspace_admin)
- POST /api/workspaces/:id/invites (workspace_admin)
- DELETE /api/workspaces/:id/invites/:inviteId (workspace_admin)
- PUT /api/workspaces/:id/members/:userId (workspace_admin)
- DELETE /api/workspaces/:id/members/:userId (workspace_admin)
- POST /api/auth/accept-invite/:inviteId (requireAuth +
case-insensitive
email match)
Permission gating:
- canAdminWorkspace (existing) for admin-gated endpoints
- canAccessWorkspace (new helper in lib/permissions.js) for the
members read endpoint - mirrors canAdminWorkspace shape but
admits any workspace_members role plus org/platform paths
Security additions vs the original plan:
- Transaction-bounded collision check on POST /invites closes the
TOCTOU race between simultaneous duplicate POSTs (no UNIQUE
constraint on workspace_invites(workspace_id, email))
- Per-(inviter, workspace), hour-window rate limit on POST /invites
to prevent abuse / cost runaway. Env-configurable via
INVITE_RATE_LIMIT_PER_HOUR with conservative 50/hour default.
429 response is generic - does not echo the configured value.
- Invite expiry env-configurable via INVITE_EXPIRY_DAYS (default 7)
- PUBLIC_URL env var (optional) pins the accept-URL origin in prod;
falls back to request-derived for local dev
Rollback rule on email send: only graph_error (real send attempt
failed at Graph) deletes the row and returns 502. not_configured
and dev_restricted are intentional non-sends - keep the row, count
against rate limit, allow local accept-invite testing to proceed.
Other safety blocks:
- Cannot demote/remove the last workspace_admin (409)
- Cannot remove the parent-org's org_owner via workspace path (403)
- Accept-invite is idempotent if user already a member
- Expired invites delete-on-read and return 410
- Wrong-account accept returns 403 without touching the invite
Expired-invite cleanup added to services/heartbeat.js mirroring
the team_invites sweep pattern.
Verification: 9-case curl-driven E2E against the dev DB fixture
(switcher-test + invitee-existing + invitee-new mid-flow register).
All 9 pass: create / collision-409 / second-create / rate-limit-429 /
existing-user-accept / register-then-accept / wrong-account-403 /
expired-410 / viewer-cannot-invite-403.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The non-admin branch of /me's accessible_workspaces query drove
from workspace_members, so users with org_owner or org_admin on
an organization but no direct workspace_members row were missing
those workspaces from their /me response - and therefore from the
switcher dropdown. Mirrors the access logic in
accessibleWorkspaceIds() (lib/tenancy.js) while keeping the
full-row SELECT shape /me needs.
Verified end-to-end with switcher-test@local.test acting as
org_owner of Acme Studios with no workspace_members row on
Studio B - Studio B now appears in /me's accessible_workspaces
with workspace_role: null, can_admin: true.
Also updates the stale TODO comment in tenancy.js that flagged
this exact gap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Short-lived per-device queue covers the TV-flap window (issue #3):
when a device is mid-reconnect, prior code emitted to an empty room
and the event vanished. Now playlist-updates and commands targeting
an offline device are queued and flushed in order on the next
device:register for that device_id.
server/lib/command-queue.js (new):
- pendingPlaylistUpdate: per-device marker (rebuild via builder on
flush -> always fresh DB state, no stale snapshots)
- pendingCommands: per-device Map<type, payload> with last-of-type
dedup (most recent screen_off wins)
- TTL via COMMAND_QUEUE_TTL_MS env (default 30000)
- Active sweep every 30s prunes expired entries
Memory bounds: ~6 entries per device worst case (1 playlist marker
+ 5 command types), unref'd sweep timer.
Wired emit sites (8 total; the four direct socket.emit calls in
deviceSocket register handlers are intentionally NOT queued because
the socket is alive by definition at those points):
- server/routes/video-walls.js (pushWallPayloadToDevice)
- server/routes/device-groups.js (pushPlaylistToDevice)
- server/routes/content.js (content-delete fan-out)
- server/routes/playlists.js (pushToDevices + assign)
- server/services/scheduler.js (scheduled rotations)
- server/ws/deviceSocket.js x2 (wall leader reclaim/reassign)
server/ws/deviceSocket.js register paths now call flushQueue after
heartbeat.registerConnection + socket.join. Existing
socket.emit('device:playlist-update', ...) lines kept - they send
the initial state on register; the flush replays any queued events.
Player's handlePlaylistUpdate fingerprint check dedupes the
overlap.
Refs #3
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix: at connect, enumerate the user's accessible workspace_ids (direct workspace_members + org_owner/admin paths + platform_admin 'all') via new accessibleWorkspaceIds() helper in lib/tenancy.js; socket.join one room per workspace. All 12 dashboardNs.emit sites across deviceSocket / heartbeat / server.js / devices route / video-walls route now route via dashboardNs.to(workspaceRoom(...)).emit() with the workspace looked up from the relevant device or wall. New lib/socket-rooms.js holds the helpers and breaks a circular dependency (dashboardSocket already requires heartbeat, so heartbeat can't require dashboardSocket).
Inbound 6 commands rewired to canActOnDevice(socket, deviceId, tier): request-screenshot is read tier (workspace_viewer+); remote-touch/key/start/stop and device-command are write tier (workspace_editor+). Platform_admin and org_owner/admin always pass via actingAs. Legacy admin/superadmin branch dropped.
Lifecycle note: workspace-switch already calls window.location.reload (Phase 3 switcher), which forces a fresh socket with updated memberships - no per-emit re-evaluation needed.
Smoke tested with 3 simultaneous socket.io-client connections (switcher-test, swninja, dw5304 platform_admin) + direct canActOnDevice invocation for 6 user/device/tier combinations. All 9 outbound isolation cells and all 6 permission gates pass. Fixture mutation: switcher-test's Field Crew membership flipped from workspace_editor to workspace_viewer to exercise the read/write tier split in one login.