mirror of
https://github.com/screentinker/screentinker.git
synced 2026-06-29 09:23:16 -06:00
docs(#142): 1.9.2-beta1 changelog + device_status_log VACUUM maintenance note
Documents the #142 changes and tells operators with an already-bloated device_status_log to reclaim space with a one-time manual VACUUM in a maintenance window (retention now bounds further growth). Explains why auto-VACUUM is not enabled. New doc: docs/maintenance-device-status-log.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
f206537fed
commit
ce78d0dde4
31
CHANGELOG.md
31
CHANGELOG.md
|
|
@ -1,5 +1,36 @@
|
|||
# Changelog
|
||||
|
||||
## 1.9.2-beta1 — unreleased
|
||||
|
||||
### Fixed — server resilience (#142)
|
||||
- **A single flapping device can no longer saturate the event loop.** A new
|
||||
load-aware, per-device reconnect throttle (`lib/reconnect-throttle.js`) gates
|
||||
genuine reconnects *before* the heavy register work (DB writes + playlist build).
|
||||
The verdict is per-device; global event-loop lag only multiplies an
|
||||
already-flagged device's backoff and never throttles a healthy one. Hard ceiling
|
||||
+ cold-start warm-up so a full-fleet reconnect after a deploy is never throttled.
|
||||
- **`device_status_log` growth is bounded.** Added
|
||||
`idx_device_status_log_device_ts`, a global retention sweep (`pruneStatusLog`,
|
||||
`STATUS_LOG_RETENTION_DAYS` default 3) covering removed/idle devices and the
|
||||
`offline_timeout` path, and de-duplicated the table's `CREATE TABLE`.
|
||||
- **`content-ack` spam de-duplicated.** Repeated identical
|
||||
`(device_id, content_id, status)` reports are suppressed within
|
||||
`CONTENT_ACK_DEDUP_MS` (default 10s).
|
||||
- **Provisioning cleanup window corrected.** Unclaimed provisioning devices are now
|
||||
swept after 24h (the code used `365 * 86400` — a year — contradicting its own
|
||||
comment).
|
||||
|
||||
### Added — observability (#142)
|
||||
- **Event-loop lag telemetry** via `perf_hooks.monitorEventLoopDelay()`. Sampled to
|
||||
a bounded `event_loop_lag` table (indexed + pruned, `LAG_TELEMETRY_RETENTION_DAYS`)
|
||||
and surfaced on `/api/status` as `loop_lag` (mean/p50/p99/max + band).
|
||||
|
||||
### Maintenance
|
||||
- Operators whose `device_status_log` is already bloated from a pre-1.9.2 deployment
|
||||
should reclaim disk with a **one-time manual `VACUUM`** in a maintenance window;
|
||||
retention now bounds further growth. Auto-VACUUM is intentionally not enabled.
|
||||
See [`docs/maintenance-device-status-log.md`](docs/maintenance-device-status-log.md).
|
||||
|
||||
## 1.9.1-beta3 — unreleased
|
||||
|
||||
### Fixed — Tizen player
|
||||
|
|
|
|||
44
docs/maintenance-device-status-log.md
Normal file
44
docs/maintenance-device-status-log.md
Normal file
|
|
@ -0,0 +1,44 @@
|
|||
# Maintenance: `device_status_log` growth & space reclaim (#142)
|
||||
|
||||
## What changed in 1.9.2-beta1
|
||||
|
||||
`device_status_log` previously grew without an effective bound (the per-device
|
||||
insert-time prune missed removed/idle devices and the heartbeat `offline_timeout`
|
||||
insert). In one deployment it reached ~1.2M rows / ~119 MB over ~23 days and
|
||||
degraded dashboard performance.
|
||||
|
||||
1.9.2-beta1 bounds further growth:
|
||||
|
||||
- **Index** `idx_device_status_log_device_ts(device_id, timestamp)` — the dashboard
|
||||
uptime query and the prunes now use an index instead of a full scan.
|
||||
- **Global retention sweep** (`pruneStatusLog()`), run on startup and on the
|
||||
heartbeat interval, deletes rows older than **`STATUS_LOG_RETENTION_DAYS`**
|
||||
(default **3**) across *all* devices — including removed/idle devices and the
|
||||
`offline_timeout` rows the per-device prune never revisited.
|
||||
|
||||
## Reclaiming space on an already-bloated database
|
||||
|
||||
> **Operator action — only needed once, only if your `device_status_log` is already
|
||||
> bloated from a pre-1.9.2 deployment.**
|
||||
|
||||
Retention bounds *future* growth, but SQLite does **not** return freed pages to the
|
||||
filesystem on `DELETE` — the file stays at its high-water mark until a `VACUUM`.
|
||||
After upgrading (which prunes the old rows), reclaim the disk with a **one-time
|
||||
manual `VACUUM` in a maintenance window**:
|
||||
|
||||
```sh
|
||||
# stop the server (or do this during a low-traffic window — VACUUM takes a global
|
||||
# write lock and rewrites the whole DB file; the app cannot write during it)
|
||||
sqlite3 /opt/screentinker/server/db/remote_display.db 'VACUUM;'
|
||||
```
|
||||
|
||||
In the reference incident this took the DB from **119 MB → 39 MB**.
|
||||
|
||||
### Why VACUUM is not automatic
|
||||
|
||||
`VACUUM` locks the database and rewrites the entire file — unacceptable on the hot
|
||||
path. `PRAGMA auto_vacuum=INCREMENTAL` is **not** enabled either: it only takes
|
||||
effect on a freshly-created database (set before the first table) or after a
|
||||
one-time full `VACUUM` to convert an existing DB, so enabling it would be a no-op on
|
||||
existing installs and a silent behavior change on new ones. Space reclaim is left as
|
||||
a deliberate operator decision; ongoing growth is already bounded by retention.
|
||||
Loading…
Reference in a new issue