docs(#142): 1.9.2-beta1 changelog + device_status_log VACUUM maintenance note

Documents the #142 changes and tells operators with an already-bloated
device_status_log to reclaim space with a one-time manual VACUUM in a maintenance
window (retention now bounds further growth). Explains why auto-VACUUM is not
enabled. New doc: docs/maintenance-device-status-log.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
ScreenTinker 2026-06-27 19:59:17 -05:00
parent f206537fed
commit ce78d0dde4
2 changed files with 75 additions and 0 deletions

View file

@ -1,5 +1,36 @@
# Changelog
## 1.9.2-beta1 — unreleased
### Fixed — server resilience (#142)
- **A single flapping device can no longer saturate the event loop.** A new
load-aware, per-device reconnect throttle (`lib/reconnect-throttle.js`) gates
genuine reconnects *before* the heavy register work (DB writes + playlist build).
The verdict is per-device; global event-loop lag only multiplies an
already-flagged device's backoff and never throttles a healthy one. Hard ceiling
+ cold-start warm-up so a full-fleet reconnect after a deploy is never throttled.
- **`device_status_log` growth is bounded.** Added
`idx_device_status_log_device_ts`, a global retention sweep (`pruneStatusLog`,
`STATUS_LOG_RETENTION_DAYS` default 3) covering removed/idle devices and the
`offline_timeout` path, and de-duplicated the table's `CREATE TABLE`.
- **`content-ack` spam de-duplicated.** Repeated identical
`(device_id, content_id, status)` reports are suppressed within
`CONTENT_ACK_DEDUP_MS` (default 10s).
- **Provisioning cleanup window corrected.** Unclaimed provisioning devices are now
swept after 24h (the code used `365 * 86400` — a year — contradicting its own
comment).
### Added — observability (#142)
- **Event-loop lag telemetry** via `perf_hooks.monitorEventLoopDelay()`. Sampled to
a bounded `event_loop_lag` table (indexed + pruned, `LAG_TELEMETRY_RETENTION_DAYS`)
and surfaced on `/api/status` as `loop_lag` (mean/p50/p99/max + band).
### Maintenance
- Operators whose `device_status_log` is already bloated from a pre-1.9.2 deployment
should reclaim disk with a **one-time manual `VACUUM`** in a maintenance window;
retention now bounds further growth. Auto-VACUUM is intentionally not enabled.
See [`docs/maintenance-device-status-log.md`](docs/maintenance-device-status-log.md).
## 1.9.1-beta3 — unreleased
### Fixed — Tizen player

View file

@ -0,0 +1,44 @@
# Maintenance: `device_status_log` growth & space reclaim (#142)
## What changed in 1.9.2-beta1
`device_status_log` previously grew without an effective bound (the per-device
insert-time prune missed removed/idle devices and the heartbeat `offline_timeout`
insert). In one deployment it reached ~1.2M rows / ~119 MB over ~23 days and
degraded dashboard performance.
1.9.2-beta1 bounds further growth:
- **Index** `idx_device_status_log_device_ts(device_id, timestamp)` — the dashboard
uptime query and the prunes now use an index instead of a full scan.
- **Global retention sweep** (`pruneStatusLog()`), run on startup and on the
heartbeat interval, deletes rows older than **`STATUS_LOG_RETENTION_DAYS`**
(default **3**) across *all* devices — including removed/idle devices and the
`offline_timeout` rows the per-device prune never revisited.
## Reclaiming space on an already-bloated database
> **Operator action — only needed once, only if your `device_status_log` is already
> bloated from a pre-1.9.2 deployment.**
Retention bounds *future* growth, but SQLite does **not** return freed pages to the
filesystem on `DELETE` — the file stays at its high-water mark until a `VACUUM`.
After upgrading (which prunes the old rows), reclaim the disk with a **one-time
manual `VACUUM` in a maintenance window**:
```sh
# stop the server (or do this during a low-traffic window — VACUUM takes a global
# write lock and rewrites the whole DB file; the app cannot write during it)
sqlite3 /opt/screentinker/server/db/remote_display.db 'VACUUM;'
```
In the reference incident this took the DB from **119 MB → 39 MB**.
### Why VACUUM is not automatic
`VACUUM` locks the database and rewrites the entire file — unacceptable on the hot
path. `PRAGMA auto_vacuum=INCREMENTAL` is **not** enabled either: it only takes
effect on a freshly-created database (set before the first table) or after a
one-time full `VACUUM` to convert an existing DB, so enabling it would be a no-op on
existing installs and a silent behavior change on new ones. Space reclaim is left as
a deliberate operator decision; ongoing growth is already bounded by retention.