Archive/screentinker - Chris' Git

mirror of https://github.com/screentinker/screentinker.git synced 2026-06-29 09:23:16 -06:00

Author	SHA1	Message	Date
ScreenTinker	e73428182d	fix(#143): fingerprint-reclaim stuck loop: reclaim by runtime liveness, throttle log. Bold beta1: three devices spam "Fingerprint reclaim rejected ... device active (status=offline, ~2500s since heartbeat, liveConn=false)" twice every ~2s indefinitely. That is contradictory: the devices are gone by every signal yet treated as active. Root cause (NOT a missing clear; this corrects the earlier hypothesis): the reject condition was `liveConn || status==='online' || secondsSince < RECLAIM_GRACE_SECONDS(24h)`. For the observed devices liveConn=false and status=offline, so the ONLY true term is `secondsSince < 24h`, i.e. an effective 24h CALENDAR grace, not a stale flag. Audited the clears: liveConn (deviceConnections) is removed on the debounced disconnect (heartbeat.removeConnection) AND by the offline_timeout sweep (deviceConnections.delete); status is set to 'offline' on both paths. liveConn=false + status=offline PROVE the clears ran, so there is nothing stale to clear. The 24h time gate (mislabeled "device active") blocked a legitimately gone device from reclaiming for up to 24h, so it retried every ~2s, effectively forever. The "twice per ~2s" is two reclaim ATTEMPTS per cycle (client reconnect + re-pair-on-auth-error), each hitting the single console.warn, not double-logging within one attempt. Fix: - Decide "still alive" from RUNTIME signals: `!!liveConn || secondsSince < reclaimSettleSeconds`. A device with no live socket and a heartbeat older than the settle window is gone -> reclaimable. A live (or just-seen) device is still rejected, so reclaim-abuse protection holds. This does NOT just ignore "active"; it fixes WHY the device was stuck (the 24h gate). RECLAIM_SETTLE_SECONDS defaults to 300 (was 24h). SECURITY TRADEOFF flagged in config: this shortens the anti-fingerprint-theft window; raise it to re-tighten. The default is a tuning guess to validate against Bold's fleet. 
- Log throttle: the deferral logs at most once per device per RECLAIM_REJECT_LOG_WINDOW_MS (default 60s), collapsing both the double-log and the per-2s flood (same discipline as the content-ack shed log). The throttle state is cleared when a reclaim proceeds. Recovery of the 3 wedged devices (2febcaa9, 1984694c, 139159eb): they SELF-HEAL on their next reclaim attempt (~2s) once this ships; their heartbeats are ~2500s stale (>300s settle) and liveConn=false, so the reclaim now succeeds. No operator SQL needed. Tests (port 3988): gone device reclaims; live device still rejected; clear-on-leave (disconnect clears liveConn -> stale device reclaims); deferral log <=1 per window. Full suite green serial + parallel (217 tests). reconnect-throttle.js, the `dbac699` content-ack limiter, and the `404c330` block/auth code are untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 22:56:48 -05:00
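The reclaim decision and the deferral-log throttle described above can be sketched in JavaScript as follows. This is a minimal sketch, not the actual handler: it assumes a device record carrying hypothetical `liveConn` and `lastHeartbeatMs` fields, and `isStillAlive`, `shouldLogDeferral`, and `tryReclaim` are illustrative names.

```javascript
// Hypothetical sketch of the #143 reclaim decision; names are illustrative,
// not the actual ScreenTinker code.
const RECLAIM_SETTLE_SECONDS = 300;         // default stated in the commit
const RECLAIM_REJECT_LOG_WINDOW_MS = 60000; // default stated in the commit

// "Still alive" comes only from runtime signals: a live socket, or a heartbeat
// newer than the settle window. The old status==='online' term and the 24h
// calendar grace are deliberately absent.
function isStillAlive({ liveConn, lastHeartbeatMs }, nowMs, settleSeconds = RECLAIM_SETTLE_SECONDS) {
  const secondsSince = (nowMs - lastHeartbeatMs) / 1000;
  return !!liveConn || secondsSince < settleSeconds;
}

// Per-device log throttle: at most one "deferred" line per window.
const lastRejectLogAt = new Map(); // deviceId -> ms timestamp of last log

function shouldLogDeferral(deviceId, nowMs, windowMs = RECLAIM_REJECT_LOG_WINDOW_MS) {
  const last = lastRejectLogAt.get(deviceId);
  if (last !== undefined && nowMs - last < windowMs) return false;
  lastRejectLogAt.set(deviceId, nowMs);
  return true;
}

// Returns true if the reclaim may proceed.
function tryReclaim(deviceId, device, nowMs, log = console.warn) {
  if (isStillAlive(device, nowMs)) {
    if (shouldLogDeferral(deviceId, nowMs)) {
      log(`Fingerprint reclaim deferred for ${deviceId}: device still alive`);
    }
    return false;
  }
  lastRejectLogAt.delete(deviceId); // clear throttle state once a reclaim proceeds
  return true;
}
```

A device with no socket and a heartbeat older than the settle window is now reclaimable, which is what lets the wedged devices self-heal on their next attempt.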
ScreenTinker	dbac699854	fix(#143): content-ack flood control: per-device rate budget + loop-lag valve. #142's content-ack dedup is insufficient: a device cycling through 2-4 content IDs makes every ack look unique, so dedup never fires, while the aggregate volume from ~30 devices saturates the event loop (the #142 reconnect throttle kept the server responsive, which is how this became observable at all). Folded ONE control into the content-ack path in lib/content-ack-limiter.js (no competing limiters; reconnect-throttle.js untouched): - Step 1, per-device RATE budget: caps TOTAL non-duplicate acks per device per window regardless of content_id (the case dedup misses). Over budget = DROP silently (the per-ack log+emit is the cost); log ONCE per device per window when shedding starts. Keeps the #142 dedup (deduplicated repeats don't consume budget). Per-device, in-memory, resets on restart (modeled on lastPlayLogAt; does NOT reuse reconnect-throttle's ban-semantics bucket). Env (TUNING GUESSES, validate against Bold's fleet): CONTENT_ACK_MAX_PER_WINDOW=20, CONTENT_ACK_RATE_WINDOW_MS=10000 (= 2/s, above the legitimate rate of ~<=1/s and below the flood). - Step 2, global pressure valve: reuses the #142 loop-lag band (and its hysteresis; no second control loop). In the CRITICAL band, content-acks are shed even for an in-budget device; reconnects and dashboard/HTTP traffic are ALWAYS processed; a healthy device in a non-critical band is never touched by the valve. Valve open/close is logged once at the band edge in services/loop-lag.js (not per shed message). Tests (unique ports 3985/3986, not the 3982/3983/3984 set): - unit: the #143 regression (cycling IDs that evade dedup ARE rate-limited), under/over budget, dedup still works and doesn't consume budget, valve sheds in-budget acks under critical while normal is untouched, rate precedence, window reset, per-device isolation. 
- integration: socket flood is capped to budget with a single shed-start log; under-budget passes every ack; valve OPEN sheds content-acks while a reconnect + /api/status still succeed. Full suite green serial AND parallel (208 tests). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 22:21:57 -05:00
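A minimal sketch of the two-step content-ack control, assuming the #142 dedup runs before it (so exact duplicates never reach the limiter) and that the current loop-lag band can be read through a caller-supplied `getBand()` function. `createContentAckLimiter` and its option names are hypothetical; the defaults mirror CONTENT_ACK_MAX_PER_WINDOW=20 and CONTENT_ACK_RATE_WINDOW_MS=10000.

```javascript
// Hypothetical sketch of the #143 content-ack limiter (not lib/content-ack-limiter.js itself).
function createContentAckLimiter({
  maxPerWindow = 20,
  windowMs = 10000,
  getBand = () => "normal",
  log = console.warn,
} = {}) {
  const buckets = new Map(); // deviceId -> { windowStart, count, shedLogged }

  // Returns true if the ack should be processed (logged + emitted), false to drop it.
  function allow(deviceId, nowMs) {
    let b = buckets.get(deviceId);
    if (!b || nowMs - b.windowStart >= windowMs) {
      b = { windowStart: nowMs, count: 0, shedLogged: false }; // fresh window
      buckets.set(deviceId, b);
    }
    // Step 1: per-device rate budget, independent of content_id.
    if (b.count >= maxPerWindow) {
      if (!b.shedLogged) {
        log(`content-ack shedding for ${deviceId}: over ${maxPerWindow} per ${windowMs}ms`);
        b.shedLogged = true; // log once per device per window
      }
      return false;
    }
    // Step 2: global valve; only the critical band sheds in-budget acks.
    if (getBand() === "critical") return false;
    b.count += 1;
    return true;
  }

  return { allow };
}
```

The rate check runs before the valve, so an over-budget device is shed (and logged once) in any band, while an in-budget device is affected only in the critical band.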
ScreenTinker	15448d1c5d	fix(#142): dedup repeated content-ack reports (secondary load). device:content-ack was logged and emitted on every message, so a device repeatedly reporting the same "content <id>: ready" (observed from an older app version) added avoidable load per message. - Suppress identical (device_id, content_id, status) reports within config.contentAckDedupMs (default 10s), modeled on the lastPlayLogAt throttle. A status change has a different key and passes immediately; a fresh report after the window also passes. In-memory, resets on restart. The handler does no DB writes, so this purely sheds redundant log+emit work. test: integration over a real authenticated device socket: a burst of identical "ready" reports collapses to one log/emit, a "ready" after the window passes, and a status change is never deduped. Unique PORT (3984). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 19:35:04 -05:00
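The dedup rule can be sketched as a map keyed on the (device_id, content_id, status) triple. This is an illustrative sketch, not the server's handler. `createAckDeduper` is a hypothetical name, and measuring the window from the last *accepted* report is one reading of the commit's description.

```javascript
// Hypothetical sketch of the #142 content-ack dedup; names are illustrative.
function createAckDeduper(dedupMs = 10000) {
  const lastAccepted = new Map(); // key -> ms timestamp of last accepted report

  // True if this report should be logged/emitted; false if it repeats an
  // identical (device_id, content_id, status) report within dedupMs.
  return function accept(deviceId, contentId, status, nowMs) {
    const key = JSON.stringify([deviceId, contentId, status]); // collision-safe key
    const last = lastAccepted.get(key);
    if (last !== undefined && nowMs - last < dedupMs) return false;
    lastAccepted.set(key, nowMs);
    return true;
  };
}
```

Like the commit's in-memory state, the map is lost on restart. A long-running process would also want to evict stale keys, which this sketch omits.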
ScreenTinker	29a8896aa8	fix(#142): global device_status_log retention sweep + STATUS_LOG_RETENTION_DAYS. The per-device insert-time prune (deviceSocket.js) only ever touches a device that is actively inserting, so it misses two paths: removed/idle devices whose rows linger forever, and heartbeat.js's offline_timeout insert, which bypasses logDeviceStatus entirely. The reporter's 1.2M-row bloat accumulated UNDER a 7-day per-device prune for exactly this reason. - pruneStatusLog() (db/database.js): a GLOBAL time-range sweep across ALL devices, modeled on the play_logs prune. Runs once on startup (recovering a bloated table right after deploy) and on the heartbeat interval (services/heartbeat.js). - STATUS_LOG_RETENTION_DAYS env, default 3 (lower than the old hardcoded 7d; the dashboard only shows a 24h uptime window, so 2-3d is ample for diagnostics). - Deliberately NO per-device row cap: Step 3's throttle already bounds how fast a storming device can generate status rows, so a cap would add sweep complexity for little gain (noted for later if needed). - NO VACUUM / auto_vacuum here (kept off the hot path); space reclamation is left as a separate decision (see report). test: a deterministic in-process unit test proves the sweep deletes over-retention rows across all devices, including a device absent from the devices table and an offline_timeout row, while keeping recent rows; it is idempotent on an empty table. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 19:34:19 -05:00
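The key property of the global sweep is that it is a single time-range DELETE with no device filter. A minimal sketch, assuming a hypothetical `created_at` column that stores ISO-8601 UTC timestamps (string comparison is only correct if every row uses that same format). The real schema and `pruneStatusLog()` body may differ.

```javascript
// Hypothetical sketch: assumes device_status_log has a `created_at` column holding
// ISO-8601 UTC timestamps; the real schema and column name may differ.
function retentionDays(env = process.env) {
  const days = parseInt(env.STATUS_LOG_RETENTION_DAYS, 10);
  return Number.isFinite(days) && days > 0 ? days : 3; // default 3 per the commit
}

// ONE time-range DELETE with no device filter, so rows from removed/idle devices
// and offline_timeout inserts are swept along with everything else.
function buildStatusLogPrune(nowMs, days) {
  const cutoff = new Date(nowMs - days * 24 * 60 * 60 * 1000).toISOString();
  return {
    sql: "DELETE FROM device_status_log WHERE created_at < ?",
    params: [cutoff],
  };
}
```

With a synchronous SQLite driver such as better-sqlite3, the statement would run as `db.prepare(sql).run(...params)`. Whether ScreenTinker uses that driver is not stated here.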
ScreenTinker	101f086204	fix(#142): load-aware per-device reconnect throttle (the outage fix). Gates genuine reconnects PER DEVICE before the heavy register work (DB writes + playlist build) runs, so a single flapping device can no longer saturate the event loop and take down the server. - The actuator is per-device, keyed on device_id (modeled on lastPlayLogAt). A device is flagged only when it exceeds reconnectBaseMax genuine reconnects per window. Same-socket playlist refreshes (isPlaylistRefresh) are exempt. - Load-awareness is BANDED (normal/elevated/critical from the step-2 lag signal), not a continuous controller. The band only MULTIPLIES an already-flagged device's backoff; global lag never gates a healthy device. - Hysteresis: escalate immediately while storming (tighten fast); decay one level per reconnectReleaseMs of calm (release slow). - HARD CEILING per device, independent of band and warm-up, so a slow-ramp attacker can't train through it. - COLD START: for reconnectWarmupMs after boot, force the normal band and apply only the hard ceiling, so a full-fleet reconnect after a deploy doesn't throttle healthy screens. State is in-memory and resets on restart. - Observability: every throttle engagement logs device, band, observed vs allowed rate, and backoff. A throttled device gets device:throttled + a deferred disconnect. Tests (api.test.js style): - unit: healthy-never-throttled, storm-throttled-with-growing-backoff, band multiplies backoff, hard-ceiling-even-in-warmup, warm-up leniency, neighbor isolation, slow release. - integration GATE (the required one): a full-fleet reconnect right after restart throttles NO healthy device; a single storming device IS throttled; a neighbor stays unaffected while another storms. - also fixes pre-existing test PORT collisions (my new integration files clashed with totp.test.js:3979 and totp-keyrotation.test.js:3980 -> moved to 3982/3983); the full suite is now green serially AND in parallel. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 19:18:00 -05:00
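The throttle's moving parts (per-device window, fast escalation, slow release, band multiplier, hard ceiling, and warm-up) fit in one small state machine. This is a sketch under stated assumptions, not the real implementation: the option names, the doubling backoff, and the `BAND_MULTIPLIER` values are hypothetical.

```javascript
// Hypothetical sketch of the #142 reconnect throttle; option names are illustrative.
const BAND_MULTIPLIER = { normal: 1, elevated: 2, critical: 4 }; // assumed values

function createReconnectThrottle({
  baseMax = 5, windowMs = 60000, hardCeiling = 20,
  releaseMs = 30000, warmupMs = 120000, baseBackoffMs = 1000, bootAt = Date.now(),
} = {}) {
  const devices = new Map(); // deviceId -> { stamps, level, lastStormAt }

  function check(deviceId, nowMs, band = "normal") {
    let s = devices.get(deviceId);
    if (!s) { s = { stamps: [], level: 0, lastStormAt: 0 }; devices.set(deviceId, s); }

    // Release slow: drop one escalation level per full releaseMs of calm.
    if (s.level > 0) {
      const calmSteps = Math.floor((nowMs - s.lastStormAt) / releaseMs);
      if (calmSteps > 0) {
        s.level = Math.max(0, s.level - calmSteps);
        s.lastStormAt += calmSteps * releaseMs;
      }
    }

    // Sliding window of genuine reconnects for this device only.
    s.stamps = s.stamps.filter((t) => nowMs - t < windowMs);
    s.stamps.push(nowMs);
    const observed = s.stamps.length;

    // Cold start: force the normal band and enforce only the hard ceiling.
    const warmingUp = nowMs - bootAt < warmupMs;
    const effBand = warmingUp ? "normal" : band;
    const allowed = warmingUp ? hardCeiling : Math.min(baseMax, hardCeiling);
    if (observed <= allowed) return { throttled: false };

    // Tighten fast: every reconnect over the limit escalates immediately.
    // The band only multiplies the backoff of a device that is already over its limit.
    s.level += 1;
    s.lastStormAt = nowMs;
    const backoffMs = baseBackoffMs * 2 ** (s.level - 1) * BAND_MULTIPLIER[effBand];
    return { throttled: true, band: effBand, observed, allowed, backoffMs };
  }

  return { check, level: (id) => devices.get(id)?.level ?? 0 };
}
```

Because the band multiplier is applied only after a device exceeds its own limit, a healthy neighbor is never throttled just because global lag is critical.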
ScreenTinker	ed3cf72b82	feat(#142): event-loop lag telemetry (perf_hooks) + bounded storage. Continuously samples event-loop delay via perf_hooks.monitorEventLoopDelay() (C++-backed histogram; cheap). Each window persists mean/p50/p99/max to a new event_loop_lag table and recomputes a coarse load band (normal/elevated/critical) from the window p99. Standalone value: current lag is exposed on /api/status and band changes are logged, so site lag is diagnosable independently of throttling. The band feeds the #142 reconnect throttle (next commit) but ships first as its own subsystem. - event_loop_lag is bounded from day one: indexed on sampled_at + a scheduled prune (LAG_TELEMETRY_RETENTION_DAYS, small default) modeled on the play_logs prune. Deliberately NOT another unbounded-growth table. - Band transitions are asymmetric: jump up immediately (tighten fast), release one level at a time after N calm samples below a deadband (release slow, no flapping). Pure nextBand() function, unit-tested deterministically. - config: LAG_SAMPLE_INTERVAL_MS, LAG_RESOLUTION_MS, LAG_TELEMETRY_RETENTION_DAYS, LAG_PRUNE_INTERVAL_MS, LAG_ELEVATED_MS, LAG_CRITICAL_MS, LAG_RELEASE_SAMPLES. - tests: band-transition unit tests; integration proves sampling persists, stays bounded under the prune, and surfaces on /api/status. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 19:01:08 -05:00
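A pure `nextBand()` with the asymmetric behavior described above might look like this. The signature, the `{ band, calm }` state shape, and the 0.8 deadband factor are assumptions. `readWindow` reads an `IntervalHistogram` as returned by `perf_hooks.monitorEventLoopDelay()`, whose `mean`, `max`, and `percentile()` values are in nanoseconds.

```javascript
// Hypothetical sketch of a pure nextBand() with asymmetric transitions.
const BANDS = ["normal", "elevated", "critical"];

function nextBand(prev, p99Ms, { elevatedMs, criticalMs, releaseSamples, deadband = 0.8 }) {
  const cur = BANDS.indexOf(prev.band);
  const target = p99Ms >= criticalMs ? 2 : p99Ms >= elevatedMs ? 1 : 0;

  // Tighten fast: jump straight to a higher band on a single bad window.
  if (target > cur) return { band: BANDS[target], calm: 0 };
  if (cur === 0) return { band: "normal", calm: 0 };

  // Release slow: count consecutive samples below a deadband under the current
  // band's entry threshold; any sample inside the deadband resets the count.
  const entryMs = cur === 2 ? criticalMs : elevatedMs;
  if (p99Ms >= entryMs * deadband) return { band: prev.band, calm: 0 };
  const calm = prev.calm + 1;
  // Step down exactly one level once enough calm samples accumulate.
  return calm >= releaseSamples ? { band: BANDS[cur - 1], calm: 0 } : { band: prev.band, calm };
}

// Window stats from a perf_hooks IntervalHistogram (raw values are nanoseconds).
function readWindow(histogram) {
  const toMs = (ns) => ns / 1e6;
  const stats = {
    meanMs: toMs(histogram.mean),
    p50Ms: toMs(histogram.percentile(50)),
    p99Ms: toMs(histogram.percentile(99)),
    maxMs: toMs(histogram.max),
  };
  histogram.reset(); // start a fresh window
  return stats;
}
```

In use, `h = monitorEventLoopDelay({ resolution })` would be started once with `h.enable()`, and on each sample interval the server would call `readWindow(h)` and feed `p99Ms` into `nextBand`.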
ScreenTinker	674a34ba45	feat(config): HIDE_BILLING flag to hide the Subscription/billing UI (#116). Opt-in, default-off UI gate (per strobe's spec; verified his file refs first). When set, it hides the Subscription sidebar item + billing view and bounces #/billing to the dashboard. Billing is shown by default -> existing deployments unchanged. UI-only: /api/subscription/* is untouched (internal usage reads stay). - config.js: config.hideBilling from HIDE_BILLING (mirrors selfHosted). - auth.js: surface hide_billing on GET /api/auth/me (the client already fetches it at boot and stores it on the user object). - index.html: id="billingNavItem" on the Subscription <li> (mirrors adminNavItem). - app.js: toggle billingNavItem in updateSidebarUser (next to the admin toggle); guard #/billing -> history.replaceState('#/') + render dashboard (replaceState so the back button doesn't loop into the guard). - .env.example + README documented. Spec assumptions verified against code: the adminNavItem toggle pattern exists; /me is fetched at boot and updateSidebarUser runs both at boot (cached user) and post-/me, so no-flash holds on warm loads (a one-time flash is possible on the first load after the flag flips; same as the admin nav, minor); route dispatch is an if/else chain. The nav label is static (no data-i18n), so no i18n change is needed. Validated (headless Chrome, both states): - flag unset -> Subscription tab present, #/billing renders (backward-compat). - HIDE_BILLING=true -> tab hidden, #/billing redirects to #/. - config maps HIDE_BILLING both ways; live /me default hide_billing=false. - 149 server tests green. Default-off = zero change for existing deployments. Known cosmetic (harmless): after the redirect the billing nav LINK keeps its 'active' class, but the nav item is display:none so it's never visible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 09:19:24 -05:00
ScreenTinker	52b10408be	chore(version): single-source VERSION, env-configurable data paths, bump tooling - server/version.js: shared version helper that reads the root VERSION file once (fallback 0.0.0). Replaces the stale hardcoded 1.2.0 / 1.5.1 / 1.0.0 fallbacks in /api/version, /api/update/check, and /api/status. - config.js: DATA_DIR / DB_PATH / UPLOADS_DIR / CERTS_DIR env overrides for the db, uploads, and certs/jwt-secret locations. Unset resolves to exactly the legacy in-repo paths, so existing installs (including production) are byte-for-byte unchanged. Guarded by test/config-paths.test.js. - package.json: rename remote-display-server -> screentinker (+ lockfile name). - scripts/bump-version.sh: one-shot bump across VERSION, package.json (+lock), android (versionName and versionCode + 1), and the tizen widget version; makes one commit plus an annotated tag; prints the push command, never pushes. - .gitignore: global .db / .db-wal / .db-shm / .db.* so no database file (including .db.devbak backups, at any path) can be committed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 12:56:03 -05:00
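A read-once VERSION helper with the `0.0.0` fallback can be sketched as below. The default file location relative to `__dirname` is an assumption about the repository layout, and `getVersion` is a hypothetical name for whatever server/version.js exports.

```javascript
// Hypothetical sketch of a read-once VERSION helper; path handling is an assumption.
const fs = require("fs");
const path = require("path");

let cachedVersion; // the file is read at most once per process

function getVersion(versionFile = path.join(__dirname, "..", "VERSION")) {
  if (cachedVersion === undefined) {
    try {
      cachedVersion = fs.readFileSync(versionFile, "utf8").trim() || "0.0.0";
    } catch {
      cachedVersion = "0.0.0"; // fallback when VERSION is missing or unreadable
    }
  }
  return cachedVersion;
}
```

Caching the first read means /api/version, /api/update/check, and /api/status all report the same value for the life of the process.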
ScreenTinker	54549420e7	feat(signup): optional org-on-create for self-service signups (#12). MSP-style deployments want self-service signups created WITHOUT a personal org, so an admin/operator can assign them to an existing customer org afterward. - config.autoCreateOrgOnSignup (AUTO_CREATE_ORG_ON_SIGNUP env), default true; single-tenant and the hosted self-service flow are unchanged. - ensureDefaultOrgForUser gains { allowCreate }: an existing membership is always returned (idempotent); only the MINT path is gated. allowCreate=false + no membership -> returns null (user created org-less). - register accepts a per-request createOrg flag that overrides the deployment default; the first-ever user is always given an org (never headless). login / Google / Microsoft pass allowCreate from the global config, so an org-less user is not silently given an org on the next sign-in. Edge case: a non-platform user with zero workspaces now lands on a "no workspaces yet" empty state (new no-workspace view) instead of being bounced into onboarding (whose pairing step needs a workspace). route() redirects them there, and refreshCurrentUser() redirects once /me reveals zero accessible_workspaces (covering the first-load race). The workspace switcher already rendered an empty placeholder and resource routes already return [] for a null workspace, so nothing crashes in between. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:16:27 -05:00
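The gating rule for `ensureDefaultOrgForUser({ allowCreate })` can be illustrated with an in-memory store. This sketch is hypothetical in its data model (`store.memberships`, `org_N` IDs) and in the `allowCreateForSignup` helper; only the decision logic follows the commit.

```javascript
// Hypothetical in-memory sketch of ensureDefaultOrgForUser({ allowCreate }).
function ensureDefaultOrgForUser(store, userId, { allowCreate = true } = {}) {
  // An existing membership is always returned, regardless of allowCreate (idempotent).
  const existing = store.memberships.get(userId);
  if (existing) return existing;
  // Only the mint path is gated.
  if (!allowCreate) return null; // user stays org-less until an operator assigns one
  const orgId = `org_${store.nextId++}`;
  store.memberships.set(userId, orgId);
  return orgId;
}

// The first-ever user always gets an org; otherwise a per-request createOrg flag
// overrides the deployment default (AUTO_CREATE_ORG_ON_SIGNUP).
function allowCreateForSignup({ isFirstUser, requestCreateOrg, autoCreateOrgOnSignup }) {
  if (isFirstUser) return true;
  return typeof requestCreateOrg === "boolean" ? requestCreateOrg : autoCreateOrgOnSignup;
}
```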
ScreenTinker	742d8c4b09	feat(socket): delivery queue for offline-device emits Short-lived per-device queue covers the TV-flap window (issue #3): when a device is mid-reconnect, prior code emitted to an empty room and the event vanished. Now playlist-updates and commands targeting an offline device are queued and flushed in order on the next device:register for that device_id. server/lib/command-queue.js (new): - pendingPlaylistUpdate: per-device marker (rebuild via builder on flush -> always fresh DB state, no stale snapshots) - pendingCommands: per-device Map<type, payload> with last-of-type dedup (most recent screen_off wins) - TTL via COMMAND_QUEUE_TTL_MS env (default 30000) - Active sweep every 30s prunes expired entries Memory bounds: ~6 entries per device worst case (1 playlist marker + 5 command types), unref'd sweep timer. Wired emit sites (8 total; the four direct socket.emit calls in deviceSocket register handlers are intentionally NOT queued because the socket is alive by definition at those points): - server/routes/video-walls.js (pushWallPayloadToDevice) - server/routes/device-groups.js (pushPlaylistToDevice) - server/routes/content.js (content-delete fan-out) - server/routes/playlists.js (pushToDevices + assign) - server/services/scheduler.js (scheduled rotations) - server/ws/deviceSocket.js x2 (wall leader reclaim/reassign) server/ws/deviceSocket.js register paths now call flushQueue after heartbeat.registerConnection + socket.join. Existing socket.emit('device:playlist-update', ...) lines kept - they send the initial state on register; the flush replays any queued events. Player's handlePlaylistUpdate fingerprint check dedupes the overlap. Refs #3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 13:06:43 -05:00
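A compact sketch of the queue semantics: a payload-free playlist marker that is rebuilt at flush time, last-of-type command dedup, a TTL, and an active sweep. It is not server/lib/command-queue.js. The function names are hypothetical, commands are emitted under their own type name purely for illustration, and flushing the playlist before commands is an assumed ordering.

```javascript
// Hypothetical sketch of a per-device delivery queue (not server/lib/command-queue.js itself).
function createCommandQueue({ ttlMs = 30000 } = {}) {
  const pendingPlaylist = new Map(); // deviceId -> queuedAt (marker only, no payload)
  const pendingCommands = new Map(); // deviceId -> Map<type, { payload, queuedAt }>

  function queuePlaylistUpdate(deviceId, nowMs) {
    pendingPlaylist.set(deviceId, nowMs);
  }

  function queueCommand(deviceId, type, payload, nowMs) {
    if (!pendingCommands.has(deviceId)) pendingCommands.set(deviceId, new Map());
    const byType = pendingCommands.get(deviceId);
    byType.delete(type); // last-of-type wins; re-insert so it sorts as most recent
    byType.set(type, { payload, queuedAt: nowMs });
  }

  // On device:register: rebuild the playlist from current state (no stale
  // snapshot), then replay unexpired commands in insertion order.
  function flush(deviceId, nowMs, emit, buildPlaylist) {
    const playlistAt = pendingPlaylist.get(deviceId);
    pendingPlaylist.delete(deviceId);
    if (playlistAt !== undefined && nowMs - playlistAt < ttlMs) {
      emit("device:playlist-update", buildPlaylist(deviceId));
    }
    const byType = pendingCommands.get(deviceId);
    pendingCommands.delete(deviceId);
    for (const [type, { payload, queuedAt }] of byType ?? []) {
      if (nowMs - queuedAt < ttlMs) emit(type, payload);
    }
  }

  // Periodic sweep so entries for devices that never return do not accumulate.
  function sweep(nowMs) {
    for (const [id, at] of pendingPlaylist) if (nowMs - at >= ttlMs) pendingPlaylist.delete(id);
    for (const [id, byType] of pendingCommands) {
      for (const [type, e] of byType) if (nowMs - e.queuedAt >= ttlMs) byType.delete(type);
      if (byType.size === 0) pendingCommands.delete(id);
    }
  }

  return { queuePlaylistUpdate, queueCommand, flush, sweep };
}
```

Re-inserting a command type on each call moves it to the end of the Map's insertion order, so the replay order reflects when each surviving command was last requested.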
ScreenTinker	3da49ec79c	chore(config): env-configurable heartbeat timing. Make HEARTBEAT_INTERVAL and HEARTBEAT_TIMEOUT env-tunable so self-hosters with slow/jittery networks don't have to edit config.js (the issue #3 reporter did exactly this to confirm the diagnosis). Defaults are unchanged at 10000ms / 45000ms, so existing deployments keep current behavior. Same parseInt(env) || default pattern as PORT/HTTPS_PORT/PING_*. README env table extended. Refs #3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 13:03:02 -05:00
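The `parseInt(env) || default` pattern looks like this for the two heartbeat settings. The function and property names are hypothetical; the defaults are the ones stated in the commit.

```javascript
// Sketch of the parseInt(env) || default pattern; names are hypothetical.
function loadHeartbeatConfig(env = process.env) {
  return {
    heartbeatInterval: parseInt(env.HEARTBEAT_INTERVAL, 10) || 10000, // ms
    heartbeatTimeout: parseInt(env.HEARTBEAT_TIMEOUT, 10) || 45000,   // ms
  };
}
```

Because `NaN` and `0` are both falsy, an unset, empty, non-numeric, or `0` value falls back to the default. `parseInt` also stops at the first non-digit, so a value like `"30s"` silently becomes `30` (thirty milliseconds).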
ScreenTinker	1aee4f2d5b	fix(socket): raise Engine.IO ping/pong + prefer WebSocket transport Connection-stability layer for issue #3. LG webOS WebKit (and other TV-grade clients) miss Engine.IO pongs under decode load with the Socket.IO defaults of 25s ping / 20s timeout, causing spurious transport drops and a connect/reconnect/evict/disconnect loop on the device. Default polling-first transport adds another fragility layer via the polling->WebSocket upgrade dance. - pingInterval / pingTimeout default to 30000 / 30000 (worst-case dead-socket detection 60s, up from ~45s). Both env-configurable via PING_INTERVAL / PING_TIMEOUT. - Player Socket.IO client: transports: ['websocket', 'polling']. Tries WebSocket first; falls back to polling on the same connect attempt if WebSocket fails. Polling fallback preserved for firewall-restricted networks. App-level heartbeat checker is unchanged and remains the safety net for clients that miss the transport-level ping/pong window. Tradeoffs documented in inline comments. README env table extended with PING_INTERVAL and PING_TIMEOUT rows. Refs #3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 13:02:34 -05:00
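The env-driven options can be sketched as plain objects. `pingInterval`, `pingTimeout`, and `transports` are genuine Socket.IO option names; the helper functions are hypothetical. The arithmetic behind the commit's numbers is that worst-case transport-level detection is roughly `pingInterval + pingTimeout`: 25s + 20s = 45s with the Socket.IO defaults, and 30s + 30s = 60s after this change.

```javascript
// Sketch: derive Socket.IO options from env. The option names are real Socket.IO
// options; these helper functions are hypothetical.
function socketServerOptions(env = process.env) {
  return {
    pingInterval: parseInt(env.PING_INTERVAL, 10) || 30000,
    pingTimeout: parseInt(env.PING_TIMEOUT, 10) || 30000,
  };
}

// Player client: WebSocket listed first, HTTP long-polling kept as the alternative.
const playerClientOptions = { transports: ["websocket", "polling"] };

// Approximate worst-case time to detect a dead socket at the transport level.
function worstCaseDetectionMs({ pingInterval, pingTimeout }) {
  return pingInterval + pingTimeout;
}
```

On the server these options would be passed as `new Server(httpServer, socketServerOptions())` from `socket.io`, and on the player as `io(url, playerClientOptions)` from `socket.io-client`.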
ScreenTinker	3dfec5d2f9	feat(config): DISABLE_HOMEPAGE env var to redirect / to the app. Self-hosters running internal-only deployments don't need the marketing homepage. With DISABLE_HOMEPAGE=true, requests to / 302-redirect to /app instead of serving the landing page. Unset/false preserves current behavior. Requested via Discord feedback.	2026-05-14 12:03:29 -05:00
ScreenTinker	c71c4016ca	feat(email): Microsoft Graph send + alert spam protection + preferences UI Replaces the unused EMAIL_WEBHOOK_URL stub with a real Microsoft Graph Mail.Send pipeline via @azure/msal-node client-credentials flow. Prior state on prod: every alert email was logged to journalctl and never sent (21 fallback log lines per hour for the chronic-offline devices). Four coordinated changes shipped as one commit since they're all part of making email delivery actually work responsibly: 1. services/email.js (NEW): Graph send via plain HTTPS (no SDK), in-memory MSAL token cache (refresh 60s pre-expiry), graceful stdout fallback when GRAPH_* env vars absent. Drop-in replacement for the old webhook. 2. services/alerts.js refactored: sequential await around sendEmail (was parallel fire-and-forget; first run hit Graph's MailboxConcurrency 429 ApplicationThrottled on a 30-device backlog). Sequential at ~250ms per send takes 5-8s for the full backlog, well within the 60s tick. Also: 24h long-offline cutoff to stop nagging about chronic-offline devices (the 20,000+ minute ones); 2-hour dedup window (was 1h) via a generic shouldSendAlert(type, id, windowMs) helper that future alert types (payment_failed, plan_limit_hit, etc.) can reuse. 3. Preferences UI: single checkbox in settings.js Account section bound to users.email_alerts. Saved via the existing Save Profile button. PUT /api/auth/me extended to accept email_alerts. requireAuth middleware SELECT now includes email_alerts so it propagates via req.user. 4. Dev safety net: GRAPH_DEV_RESTRICT_TO env var as an allow-list. When set, only listed recipients reach Graph; everyone else is suppressed with a log line. Prevents local dev (which often runs against fresh prod DB copies) from accidentally emailing real prod users. UNSET on prod systemd unit so production fans out normally. Also: package.json scripts use --env-file-if-exists=.env so local dev picks up .env automatically (Node 20.6+ built-in, no dotenv dep). 
Prod runs via systemd ExecStart and is unaffected. server/.gitignore added to keep .env out of git. Smoke verified end-to-end: - Sequential send pattern verified (a prior parallel-send tick had hit Graph's MailboxConcurrency 429 on 30 simultaneous sends; sequential at ~250ms each completes the same backlog without throttling) - 24h cutoff silenced 20/21 prod devices on the next tick - Dev restrict suppressed the 1 within-24h send - User-preference toggle flipped via UI -> DB -> alert path silently continued before reaching even the suppression log	2026-05-12 18:16:40 -05:00
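The dedup helper and the sequential send loop can be sketched as follows. `shouldSendAlert(type, id, windowMs)` is named in the commit, but its body here, the extra `nowMs` parameter, the `device_offline` type string, and `sendOfflineAlerts` are assumptions; `sendEmail` stands in for the Graph-backed sender.

```javascript
// Hypothetical sketch of the alert path; only the shouldSendAlert name comes from the commit.
const lastSent = new Map(); // `${type}:${id}` -> ms timestamp of last send

// Generic dedup: at most one alert of a given type per target per window.
function shouldSendAlert(type, id, windowMs, nowMs = Date.now()) {
  const key = `${type}:${id}`;
  const last = lastSent.get(key);
  if (last !== undefined && nowMs - last < windowMs) return false;
  lastSent.set(key, nowMs);
  return true;
}

const TWO_HOURS_MS = 2 * 60 * 60 * 1000;
const LONG_OFFLINE_CUTOFF_MS = 24 * 60 * 60 * 1000;

// Awaits each send in turn, so at most one Graph request is in flight.
async function sendOfflineAlerts(devices, sendEmail, nowMs = Date.now()) {
  let sent = 0;
  for (const d of devices) {
    if (nowMs - d.lastSeenMs >= LONG_OFFLINE_CUTOFF_MS) continue; // chronic-offline: stop nagging
    if (!shouldSendAlert("device_offline", d.id, TWO_HOURS_MS, nowMs)) continue;
    await sendEmail(d);
    sent += 1;
  }
  return sent;
}
```

Awaiting inside the loop is what keeps the sends sequential; replacing it with `Promise.all` would reintroduce the parallel burst that triggered the 429.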
ScreenTinker	4392bb460a	Add DISABLE_REGISTRATION env var to block public sign-ups. When DISABLE_REGISTRATION=true (or 1), POST /api/auth/register returns 403 with a clear error. The OAuth endpoints (/google, /microsoft) also refuse to auto-create new accounts; existing OAuth users can still sign in. First-user setup (empty users table) is always allowed so a fresh install can still be initialized. GET /api/auth/config now returns registration_enabled so the login view can hide the "Create Account" button and the trial banner when registration is off. Absence of the flag is treated as enabled for backward compatibility with older servers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 19:35:32 -05:00
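The registration gate reduces to a few small predicates. This is a hypothetical sketch (names are illustrative) of the rules in the commit: only the values `true` and `1` disable registration, an empty users table always permits setup, and a client treats a missing `registration_enabled` field as enabled.

```javascript
// Hypothetical sketch of the DISABLE_REGISTRATION gate.
function registrationDisabled(env = process.env) {
  const v = env.DISABLE_REGISTRATION;
  return v === "true" || v === "1";
}

// Server side: first-user setup is always allowed so a fresh install can be initialized.
function canSelfRegister(env, existingUserCount) {
  if (existingUserCount === 0) return true;
  return !registrationDisabled(env);
}

// Client side: older servers omit registration_enabled, so absence means enabled.
function registrationEnabledFromConfig(cfg) {
  return cfg.registration_enabled !== false;
}
```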
ScreenTinker	1594a9d4a4	Initial open source release ScreenTinker - open source digital signage management software. MIT License, all features included, no license gates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 12:14:53 -05:00