From ad5384ecc102041ad9b74efda2af44ec2a267ac7 Mon Sep 17 00:00:00 2001
From: Copilot
Date: Wed, 3 Jun 2026 07:10:18 -0500
Subject: [PATCH] docs(scrape): add salvage and lock operator playbook

Document scrape-lock-status, reclaim-stale, and salvage-before flags in
operator checklist, merge-readiness, and GUI bridge guide.
---
 docs/gui-zip-recurring-scrape-bridge.md       | 34 +++++++++++++++-
 ...ocs-salvage-lock-operator-playbook-plan.md | 39 +++++++++++++++++++
 docs/recurring-scrape-merge-readiness.md      | 24 +++++++++---
 docs/recurring-scrape-operator-checklist.md   | 28 +++++++++++++
 4 files changed, 118 insertions(+), 7 deletions(-)
 create mode 100644 docs/plans/2026-06-04-060-docs-salvage-lock-operator-playbook-plan.md

diff --git a/docs/gui-zip-recurring-scrape-bridge.md b/docs/gui-zip-recurring-scrape-bridge.md
index b8f9cd03..ebd6162d 100644
--- a/docs/gui-zip-recurring-scrape-bridge.md
+++ b/docs/gui-zip-recurring-scrape-bridge.md
@@ -27,14 +27,46 @@ Optional integrity tools:

 ```bash
 ./scripts/audit-archive-json.sh
+./scripts/scrape-lock-status.sh  # show archive-root scrape lock
+./scripts/scrape-lock-status.sh --reclaim-stale  # clear dead-holder lock artifacts
 # ./scripts/salvage-truncated-export.sh path/to/export.json
 ```

+### Stuck or crashed export (partial `.dce-temp`)
+
+After stopping a long run, merge quiescent partial exports before re-downloading history:
+
+```bash
+./scripts/scrape-lock-status.sh
+./scripts/scrape-lock-status.sh --reclaim-stale  # when state is stale
+
+# Merge partial temps only (no Discord)
+./scripts/operator-handoff.sh --salvage-only --target KotOR_discord_msgs --channel 221726893064454144
+
+# Salvage then incremental catch-up (with audit + log)
+DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh \
+  --salvage-before-scrape \
+  --target KotOR_discord_msgs \
+  --channel 221726893064454144 \
+  --log-file logs/kotor-yes-general-$(date -u +%Y%m%d-%H%M%S).log
+```
+
+Or direct documents scrape:
+
+```bash
+./scripts/run-documents-scrape.sh \
+  --salvage-before-scrape \
+  --target KotOR_discord_msgs \
+  --channel 221726893064454144
+```
+
+If a temp is still being written, stop the export first. To merge an active temp after confirming nothing is writing: `DCE_SALVAGE_ACTIVE_TEMPS=1`.
+
 Archives: `config/scrape-targets.json` (typically `~/Documents/*` per target `output_dir`).

 **Disk:** Free several GiB on `/home` and archive roots before large scrapes (`DCE_MIN_FREE_MB`, default 1024).

-**Validate scripts:** `DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh` (19 offline smokes)
+**Validate scripts:** `DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh` (21 offline smokes)

 **Podman (Fedora):** install `podman-compose` when `docker compose` cannot reach the socket; scripts auto-prefer it.

diff --git a/docs/plans/2026-06-04-060-docs-salvage-lock-operator-playbook-plan.md b/docs/plans/2026-06-04-060-docs-salvage-lock-operator-playbook-plan.md
new file mode 100644
index 00000000..40b1ee44
--- /dev/null
+++ b/docs/plans/2026-06-04-060-docs-salvage-lock-operator-playbook-plan.md
@@ -0,0 +1,39 @@
+---
+title: "docs: Salvage and lock operator playbook"
+type: docs
+status: active
+date: 2026-06-04
+origin: /lfg — plans 054–059 landed salvage/lock tooling; operator docs still show 19 smokes and omit catch-up playbook
+---
+
+# docs: Salvage and lock operator playbook
+
+## Summary
+
+Refresh operator-facing docs with scrape lock diagnostics, salvage flags, and KotOR yes_general catch-up playbook. Sync GUI bridge copy.
+
+## Requirements
+
+| ID | Requirement |
+|----|-------------|
+| R1 | `gui-zip-recurring-scrape-bridge.md` documents lock status, reclaim, salvage-only, salvage-before |
+| R2 | `recurring-scrape-operator-checklist.md` adds stuck-channel / partial temp section |
+| R3 | `recurring-scrape-merge-readiness.md` reflects 21 smokes and plans 054–059 |
+| R4 | Run `sync-gui-bridge-doc.sh` when sibling GUI zip path exists |
+
+## Implementation Units
+
+### U1. Operator doc updates
+
+**Files:** `docs/gui-zip-recurring-scrape-bridge.md`, `docs/recurring-scrape-operator-checklist.md`, `docs/recurring-scrape-merge-readiness.md`
+
+### U2. GUI bridge sync
+
+**Command:** `./scripts/sync-gui-bridge-doc.sh`
+
+## Scope Boundaries
+
+### Deferred
+
+- Live KotOR catch-up execution on host
+- `.docs/Recurring-Scrape-Setup.md` full rewrite
diff --git a/docs/recurring-scrape-merge-readiness.md b/docs/recurring-scrape-merge-readiness.md
index 506953b7..12019cbf 100644
--- a/docs/recurring-scrape-merge-readiness.md
+++ b/docs/recurring-scrape-merge-readiness.md
@@ -1,18 +1,18 @@
 # Recurring scrape — merge readiness

-## Branch status (2026-05-30)
+## Branch status (2026-06-04)

 | Gate | Status |
 |------|--------|
-| Offline smokes (`run-all-smokes.sh`) | 19/19 pass (includes abort exit 134 skip regression) |
+| Offline smokes (`run-all-smokes.sh`) | 21/21 pass |
 | Live proof (`run-operator-proof.sh --sync-gui --target eod_discord`) | Passed on maintainer host |
 | Monthly cron (`setup-cron.sh`) | Installed (`00 04 1 * *`); dry-run preflight OK for all enabled targets |
 | Upstream CI (fork PR) | `action_required` until Tyrrrz approves workflow runs |

-**Merge-ready** for upstream review. Further feature work should use a new branch; avoid additional `/lfg` passes unless scope changes.
-
 Fork branch `feat/recurring-cli-scrape` adds append-only, Docker-based incremental exports with optional monthly cron. Intended for personal archive trees under a configurable `archive_root` (for example `~/Documents/*`).

+**Recent operator tooling (plans 054–059):** `salvage` subcommand, archive-root scrape lock + `scrape-lock-status.sh`, `--salvage-only` / `--salvage-before-scrape` on validation/documents/handoff/proof, lock gate before scrape, `--reclaim-stale` for dead holders.
+
 GUI zip users: [docs/gui-zip-recurring-scrape-bridge.md](gui-zip-recurring-scrape-bridge.md).

 ## What ships
@@ -22,7 +22,7 @@ GUI zip users: [docs/gui-zip-recurring-scrape-bridge.md](gui-zip-recurring-scrap
 - **Host:** `scripts/run-discord-scrape-host.sh`, `scripts/run-documents-scrape.sh`, `scripts/bootstrap-recurring-scrape.sh`
 - **Auth:** `scrape.env`, `scripts/setup-scrape-auth.sh`, `scripts/sync-token-from-gui.sh`
 - **Cron:** `scripts/setup-cron.sh` (`--interval monthly` default)
-- **Integrity:** `scripts/audit-archive-json.sh`, `scripts/salvage-truncated-export.sh`, `scripts/prove-incremental-append.sh`
+- **Integrity:** `scripts/audit-archive-json.sh`, `scripts/salvage-truncated-export.sh`, `scripts/prove-incremental-append.sh`, `scripts/scrape-lock-status.sh`
 - **CI:** `.github/workflows/main.yml` job `recurring-scrape-smoke` runs `./scripts/run-all-smokes.sh`

 ## Validate before merge
@@ -63,6 +63,16 @@ Full validation with log (GUI token sync + scrape + audit):
 ./scripts/run-operator-validation.sh --sync-gui --target eod_discord
 ./scripts/run-operator-validation.sh --sync-gui --per-target --continue-on-error
 ./scripts/run-operator-validation.sh --dry-run
+./scripts/run-operator-validation.sh --salvage-only --target KotOR_discord_msgs --channel 221726893064454144
+./scripts/run-operator-validation.sh --salvage-before-scrape --target KotOR_discord_msgs --channel 221726893064454144
+```
+
+Lock and salvage helpers:
+
+```bash
+./scripts/scrape-lock-status.sh
+./scripts/scrape-lock-status.sh --reclaim-stale
+./scripts/run-documents-scrape.sh --salvage-before-scrape --target NAME [--channel ID]
 ```

 Detail: [.docs/Recurring-Scrape-Setup.md](../.docs/Recurring-Scrape-Setup.md) · [operator checklist](recurring-scrape-operator-checklist.md) · [troubleshooting](../.docs/Recurring-Scrape-Troubleshooting.md)
@@ -121,7 +131,9 @@ DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh --sync-gui --per-target -

 **Plan 045 (2026-06-04):** `audit-archive-json.sh` and `verify-documents-archives.sh` skip `*/.dce-temp/*` (in-progress partial exports). Salvage run 2026-06-03: 7 merged, 17 unchanged, 3 skipped (+5404 messages); yes_general OOM-skipped with partial temps preserved for next salvage.

-**Plan 044 (2026-06-04):** Offline smoke asserts partial temp preserved on OOM skip (channel 134). Host wrapper prefers `DISCORD_TOKEN_FILE` over inherited shell tokens. `run-all-smokes.sh` → 19/19 pass.
+**Plan 044 (2026-06-04):** Offline smoke asserts partial temp preserved on OOM skip (channel 134). Host wrapper prefers `DISCORD_TOKEN_FILE` over inherited shell tokens.
+
+**Plans 054–059 (2026-06-04):** Salvage-only subcommand; archive-root lock with meta sidecar; operator validation/proof/handoff salvage flags; `scrape-lock-status.sh` + `--reclaim-stale`; documents scrape lock gate + `--salvage-before-scrape`. `run-all-smokes.sh` → 21/21 pass.

 **KotOR / yes_general (plan 040–043):** Incremental `--after` works for all channels; most return `UNCHANGED` in seconds. `yes_general` archive last message was **2021-01-17** — the first catch-up legitimately fetches years of history. Prior bug: OOM skip **deleted** partial temp exports, causing re-download loops. Plan 043 preserves partial temps and salvages on next run.

diff --git a/docs/recurring-scrape-operator-checklist.md b/docs/recurring-scrape-operator-checklist.md
index 3d70dd92..87b4812d 100644
--- a/docs/recurring-scrape-operator-checklist.md
+++ b/docs/recurring-scrape-operator-checklist.md
@@ -29,9 +29,37 @@ Installed jobs are marked `# BEGIN discord-scrape` in `crontab -l`. Logs append

 ```bash
 ./scripts/run-documents-scrape.sh --target KotOR_discord_msgs
+./scripts/run-documents-scrape.sh --target KotOR_discord_msgs --channel CHANNEL_ID
 ./scripts/setup-cron.sh --target KotOR_discord_msgs --channel CHANNEL_ID
 ```

+## Scrape lock and salvage
+
+Only one scrape should run per `archive_root`. Lock file: `{archive_root}/.dce-scrape.lock`.
+
+```bash
+./scripts/scrape-lock-status.sh
+./scripts/scrape-lock-status.sh --reclaim-stale  # after crashed run; only when stale/free
+```
+
+Salvage partial exports under `output_dir/.dce-temp/` without calling Discord:
+
+```bash
+./scripts/operator-handoff.sh --salvage-only --target NAME [--channel ID]
+./scripts/run-documents-scrape.sh --salvage-only --target NAME [--channel ID]
+./scripts/run-operator-validation.sh --salvage-only --target NAME [--channel ID] --log-file logs/salvage.log
+```
+
+Salvage then incremental scrape:
+
+```bash
+./scripts/run-documents-scrape.sh --salvage-before-scrape --target NAME [--channel ID]
+./scripts/run-operator-validation.sh --salvage-before-scrape --target NAME [--channel ID] --log-file logs/scrape.log
+./scripts/run-operator-proof.sh --salvage-before-scrape --sync-gui --target NAME
+```
+
+**KotOR yes_general** (`221726893064454144`): first catch-up after a 2021 archive cursor can take hours and may OOM; salvage the preserved partials before retrying. Stop duplicate validation processes (MyBook vs Downloads checkouts share the same lock).
+
 ## GUI zip only

 See [gui-zip-recurring-scrape-bridge.md](gui-zip-recurring-scrape-bridge.md), run `./scripts/sync-gui-bridge-doc.sh`, or use `../DiscordChatExporter.linux-x64/bootstrap-recurring-scrape.sh`.