mirror of
https://github.com/Tyrrrz/DiscordChatExporter.git
synced 2026-06-09 15:52:37 -06:00
docs(scrape): add salvage and lock operator playbook
Document scrape-lock-status, reclaim-stale, and salvage-before flags in operator checklist, merge-readiness, and GUI bridge guide.
parent
e82007a2c5
commit
ad5384ecc1
|
|
@ -27,14 +27,46 @@ Optional integrity tools:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
./scripts/audit-archive-json.sh
|
./scripts/audit-archive-json.sh
|
||||||
|
./scripts/scrape-lock-status.sh # show archive-root scrape lock
|
||||||
|
./scripts/scrape-lock-status.sh --reclaim-stale # clear dead-holder lock artifacts
|
||||||
# ./scripts/salvage-truncated-export.sh path/to/export.json
|
# ./scripts/salvage-truncated-export.sh path/to/export.json
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Stuck or crashed export (partial `.dce-temp`)
|
||||||
|
|
||||||
|
After stopping a long run, merge quiescent partial exports before re-downloading history:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
./scripts/scrape-lock-status.sh
|
||||||
|
./scripts/scrape-lock-status.sh --reclaim-stale # when state is stale
|
||||||
|
|
||||||
|
# Merge partial temps only (no Discord)
|
||||||
|
./scripts/operator-handoff.sh --salvage-only --target KotOR_discord_msgs --channel 221726893064454144
|
||||||
|
|
||||||
|
# Salvage then incremental catch-up (with audit + log)
|
||||||
|
DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh \
|
||||||
|
--salvage-before-scrape \
|
||||||
|
--target KotOR_discord_msgs \
|
||||||
|
--channel 221726893064454144 \
|
||||||
|
--log-file logs/kotor-yes-general-$(date -u +%Y%m%d-%H%M%S).log
|
||||||
|
```
|
||||||
|
|
||||||
|
Or run the documents scrape directly:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
./scripts/run-documents-scrape.sh \
|
||||||
|
--salvage-before-scrape \
|
||||||
|
--target KotOR_discord_msgs \
|
||||||
|
--channel 221726893064454144
|
||||||
|
```
|
||||||
|
|
||||||
|
If a temp is still being written, stop the export first. To merge an active temp after confirming that nothing is writing to it, set `DCE_SALVAGE_ACTIVE_TEMPS=1`.
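
One way to confirm that nothing is writing is to look for recently modified files under the temp directory. The sketch below is an assumption, not part of the repository scripts: `recent_temp_files` is a hypothetical helper, the five-minute window is arbitrary, and `find -mmin` is a GNU/BSD extension rather than POSIX.

```bash
# Hypothetical helper (not shipped in scripts/): list partial-export files
# under OUTPUT_DIR/.dce-temp that were modified in the last MINUTES minutes.
# Empty output suggests the temps are quiescent; it does not prove it.
recent_temp_files() {
  output_dir=$1
  minutes=${2:-5}
  temp_dir="$output_dir/.dce-temp"

  # No temp directory means there is nothing to salvage or to wait for.
  [ -d "$temp_dir" ] || return 0

  # -mmin -N matches files modified less than N minutes ago.
  find "$temp_dir" -type f -mmin "-$minutes"
}
```

Calling `recent_temp_files ~/Documents/KotOR_discord_msgs` (an illustrative path) and getting no output is a reasonable hint that the export has stopped. Checking the running processes is still the safer confirmation.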
|
||||||
|
|
||||||
Archives: `config/scrape-targets.json` (typically `~/Documents/*` per target `output_dir`).
|
Archives: `config/scrape-targets.json` (typically `~/Documents/*` per target `output_dir`).
|
||||||
|
|
||||||
**Disk:** Free several GiB on `/home` and archive roots before large scrapes (`DCE_MIN_FREE_MB`, default 1024).
|
**Disk:** Free several GiB on `/home` and archive roots before large scrapes (`DCE_MIN_FREE_MB`, default 1024).
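
A minimal sketch of a free-space pre-flight check built around the same `DCE_MIN_FREE_MB` threshold. The helper names `free_mb` and `has_min_free` are hypothetical, and the real scripts may measure free space differently.

```bash
# Hypothetical pre-flight check; not the implementation used by the scripts.

# Print the free space, in MiB, of the filesystem that holds PATH.
free_mb() {
  # df -P prints one stable line per filesystem; -k reports 1024-byte blocks.
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed when PATH has at least DCE_MIN_FREE_MB MiB free (default 1024).
has_min_free() {
  path=$1
  min_mb=${DCE_MIN_FREE_MB:-1024}
  [ "$(free_mb "$path")" -ge "$min_mb" ]
}
```

For example, `has_min_free "$HOME" || echo "low disk"` would warn before a large scrape. Setting `DCE_MIN_FREE_MB=0` makes the check always pass, which matches how the smoke command above uses it.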
|
||||||
|
|
||||||
**Validate scripts:** `DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh` (19 offline smokes)
|
**Validate scripts:** `DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh` (21 offline smokes)
|
||||||
|
|
||||||
**Podman (Fedora):** install `podman-compose` when `docker compose` cannot reach the socket; scripts auto-prefer it.
|
**Podman (Fedora):** install `podman-compose` when `docker compose` cannot reach the socket; scripts auto-prefer it.
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,39 @@
|
||||||
|
---
|
||||||
|
title: "docs: Salvage and lock operator playbook"
|
||||||
|
type: docs
|
||||||
|
status: active
|
||||||
|
date: 2026-06-04
|
||||||
|
origin: /lfg — plans 054–059 landed salvage/lock tooling; operator docs still show 19 smokes and omit catch-up playbook
|
||||||
|
---
|
||||||
|
|
||||||
|
# docs: Salvage and lock operator playbook
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
Refresh operator-facing docs with scrape lock diagnostics, salvage flags, and the KotOR yes_general catch-up playbook. Sync the GUI bridge copy.
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
| ID | Requirement |
|----|-------------|
| R1 | `gui-zip-recurring-scrape-bridge.md` documents lock status, reclaim, salvage-only, salvage-before |
| R2 | `recurring-scrape-operator-checklist.md` adds stuck-channel / partial temp section |
| R3 | `recurring-scrape-merge-readiness.md` reflects 21 smokes and plans 054–059 |
| R4 | Run `sync-gui-bridge-doc.sh` when sibling GUI zip path exists |
||||||
|
|
||||||
|
## Implementation Units
|
||||||
|
|
||||||
|
### U1. Operator doc updates
|
||||||
|
|
||||||
|
**Files:** `docs/gui-zip-recurring-scrape-bridge.md`, `docs/recurring-scrape-operator-checklist.md`, `docs/recurring-scrape-merge-readiness.md`
|
||||||
|
|
||||||
|
### U2. GUI bridge sync
|
||||||
|
|
||||||
|
**Command:** `./scripts/sync-gui-bridge-doc.sh`
|
||||||
|
|
||||||
|
## Scope Boundaries
|
||||||
|
|
||||||
|
### Deferred
|
||||||
|
|
||||||
|
- Live KotOR catch-up execution on host
|
||||||
|
- `.docs/Recurring-Scrape-Setup.md` full rewrite
|
||||||
|
|
@ -1,18 +1,18 @@
|
||||||
# Recurring scrape — merge readiness
|
# Recurring scrape — merge readiness
|
||||||
|
|
||||||
## Branch status (2026-05-30)
|
## Branch status (2026-06-04)
|
||||||
|
|
||||||
| Gate | Status |
|
| Gate | Status |
|
||||||
|------|--------|
|
|------|--------|
|
||||||
| Offline smokes (`run-all-smokes.sh`) | 19/19 pass (includes abort exit 134 skip regression) |
|
| Offline smokes (`run-all-smokes.sh`) | 21/21 pass |
|
||||||
| Live proof (`run-operator-proof.sh --sync-gui --target eod_discord`) | Passed on maintainer host |
|
| Live proof (`run-operator-proof.sh --sync-gui --target eod_discord`) | Passed on maintainer host |
|
||||||
| Monthly cron (`setup-cron.sh`) | Installed (`00 04 1 * *`); dry-run preflight OK for all enabled targets |
|
| Monthly cron (`setup-cron.sh`) | Installed (`00 04 1 * *`); dry-run preflight OK for all enabled targets |
|
||||||
| Upstream CI (fork PR) | `action_required` until Tyrrrz approves workflow runs |
|
| Upstream CI (fork PR) | `action_required` until Tyrrrz approves workflow runs |
|
||||||
|
|
||||||
**Merge-ready** for upstream review. Further feature work should use a new branch; avoid additional `/lfg` passes unless scope changes.
|
|
||||||
|
|
||||||
Fork branch `feat/recurring-cli-scrape` adds append-only, Docker-based incremental exports with optional monthly cron. Intended for personal archive trees under a configurable `archive_root` (for example `~/Documents/*`).
|
Fork branch `feat/recurring-cli-scrape` adds append-only, Docker-based incremental exports with optional monthly cron. Intended for personal archive trees under a configurable `archive_root` (for example `~/Documents/*`).
|
||||||
|
|
||||||
|
**Recent operator tooling (plans 054–059):** `salvage` subcommand, archive-root scrape lock + `scrape-lock-status.sh`, `--salvage-only` / `--salvage-before-scrape` on validation/documents/handoff/proof, lock gate before scrape, `--reclaim-stale` for dead holders.
|
||||||
|
|
||||||
GUI zip users: [docs/gui-zip-recurring-scrape-bridge.md](gui-zip-recurring-scrape-bridge.md).
|
GUI zip users: [docs/gui-zip-recurring-scrape-bridge.md](gui-zip-recurring-scrape-bridge.md).
|
||||||
|
|
||||||
## What ships
|
## What ships
|
||||||
|
|
@ -22,7 +22,7 @@ GUI zip users: [docs/gui-zip-recurring-scrape-bridge.md](gui-zip-recurring-scrap
|
||||||
- **Host:** `scripts/run-discord-scrape-host.sh`, `scripts/run-documents-scrape.sh`, `scripts/bootstrap-recurring-scrape.sh`
|
- **Host:** `scripts/run-discord-scrape-host.sh`, `scripts/run-documents-scrape.sh`, `scripts/bootstrap-recurring-scrape.sh`
|
||||||
- **Auth:** `scrape.env`, `scripts/setup-scrape-auth.sh`, `scripts/sync-token-from-gui.sh`
|
- **Auth:** `scrape.env`, `scripts/setup-scrape-auth.sh`, `scripts/sync-token-from-gui.sh`
|
||||||
- **Cron:** `scripts/setup-cron.sh` (`--interval monthly` default)
|
- **Cron:** `scripts/setup-cron.sh` (`--interval monthly` default)
|
||||||
- **Integrity:** `scripts/audit-archive-json.sh`, `scripts/salvage-truncated-export.sh`, `scripts/prove-incremental-append.sh`
|
- **Integrity:** `scripts/audit-archive-json.sh`, `scripts/salvage-truncated-export.sh`, `scripts/prove-incremental-append.sh`, `scripts/scrape-lock-status.sh`
|
||||||
- **CI:** `.github/workflows/main.yml` job `recurring-scrape-smoke` runs `./scripts/run-all-smokes.sh`
|
- **CI:** `.github/workflows/main.yml` job `recurring-scrape-smoke` runs `./scripts/run-all-smokes.sh`
|
||||||
|
|
||||||
## Validate before merge
|
## Validate before merge
|
||||||
|
|
@ -63,6 +63,16 @@ Full validation with log (GUI token sync + scrape + audit):
|
||||||
./scripts/run-operator-validation.sh --sync-gui --target eod_discord
|
./scripts/run-operator-validation.sh --sync-gui --target eod_discord
|
||||||
./scripts/run-operator-validation.sh --sync-gui --per-target --continue-on-error
|
./scripts/run-operator-validation.sh --sync-gui --per-target --continue-on-error
|
||||||
./scripts/run-operator-validation.sh --dry-run
|
./scripts/run-operator-validation.sh --dry-run
|
||||||
|
./scripts/run-operator-validation.sh --salvage-only --target KotOR_discord_msgs --channel 221726893064454144
|
||||||
|
./scripts/run-operator-validation.sh --salvage-before-scrape --target KotOR_discord_msgs --channel 221726893064454144
|
||||||
|
```
|
||||||
|
|
||||||
|
Lock and salvage helpers:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
./scripts/scrape-lock-status.sh
|
||||||
|
./scripts/scrape-lock-status.sh --reclaim-stale
|
||||||
|
./scripts/run-documents-scrape.sh --salvage-before-scrape --target NAME [--channel ID]
|
||||||
```
|
```
|
||||||
|
|
||||||
Detail: [.docs/Recurring-Scrape-Setup.md](../.docs/Recurring-Scrape-Setup.md) · [operator checklist](recurring-scrape-operator-checklist.md) · [troubleshooting](../.docs/Recurring-Scrape-Troubleshooting.md)
|
Detail: [.docs/Recurring-Scrape-Setup.md](../.docs/Recurring-Scrape-Setup.md) · [operator checklist](recurring-scrape-operator-checklist.md) · [troubleshooting](../.docs/Recurring-Scrape-Troubleshooting.md)
|
||||||
|
|
@ -121,7 +131,9 @@ DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh --sync-gui --per-target -
|
||||||
|
|
||||||
**Plan 045 (2026-06-04):** `audit-archive-json.sh` and `verify-documents-archives.sh` skip `*/.dce-temp/*` (in-progress partial exports). Salvage run 2026-06-03: 7 merged, 17 unchanged, 3 skipped (+5404 messages); yes_general OOM-skipped with partial temps preserved for next salvage.
|
**Plan 045 (2026-06-04):** `audit-archive-json.sh` and `verify-documents-archives.sh` skip `*/.dce-temp/*` (in-progress partial exports). Salvage run 2026-06-03: 7 merged, 17 unchanged, 3 skipped (+5404 messages); yes_general OOM-skipped with partial temps preserved for next salvage.
|
||||||
|
|
||||||
**Plan 044 (2026-06-04):** Offline smoke asserts partial temp preserved on OOM skip (channel 134). Host wrapper prefers `DISCORD_TOKEN_FILE` over inherited shell tokens. `run-all-smokes.sh` → 19/19 pass.
|
**Plan 044 (2026-06-04):** Offline smoke asserts partial temp preserved on OOM skip (channel 134). Host wrapper prefers `DISCORD_TOKEN_FILE` over inherited shell tokens.
|
||||||
|
|
||||||
|
**Plans 054–059 (2026-06-04):** Salvage-only subcommand; archive-root lock with meta sidecar; operator validation/proof/handoff salvage flags; `scrape-lock-status.sh` + `--reclaim-stale`; documents scrape lock gate + `--salvage-before-scrape`. `run-all-smokes.sh` → 21/21 pass.
|
||||||
|
|
||||||
**KotOR / yes_general (plan 040–043):** Incremental `--after` works for all channels; most return `UNCHANGED` in seconds. `yes_general` archive last message was **2021-01-17** — the first catch-up legitimately fetches years of history. Prior bug: OOM skip **deleted** partial temp exports, causing re-download loops. Plan 043 preserves partial temps and salvages on next run.
|
**KotOR / yes_general (plan 040–043):** Incremental `--after` works for all channels; most return `UNCHANGED` in seconds. `yes_general` archive last message was **2021-01-17** — the first catch-up legitimately fetches years of history. Prior bug: OOM skip **deleted** partial temp exports, causing re-download loops. Plan 043 preserves partial temps and salvages on next run.
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -29,9 +29,37 @@ Installed jobs are marked `# BEGIN discord-scrape` in `crontab -l`. Logs append
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
./scripts/run-documents-scrape.sh --target KotOR_discord_msgs
|
./scripts/run-documents-scrape.sh --target KotOR_discord_msgs
|
||||||
|
./scripts/run-documents-scrape.sh --target KotOR_discord_msgs --channel CHANNEL_ID
|
||||||
./scripts/setup-cron.sh --target KotOR_discord_msgs --channel CHANNEL_ID
|
./scripts/setup-cron.sh --target KotOR_discord_msgs --channel CHANNEL_ID
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Scrape lock and salvage
|
||||||
|
|
||||||
|
Only one scrape should run per `archive_root`. Lock file: `{archive_root}/.dce-scrape.lock`.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
./scripts/scrape-lock-status.sh
|
||||||
|
./scripts/scrape-lock-status.sh --reclaim-stale # after crashed run; only when stale/free
|
||||||
|
```
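
For a quick manual look, the lock path above can be tested directly. This is only a sketch under stated assumptions: `lock_file_state` is a hypothetical helper, and the presence of the file says nothing about whether its holder is still alive, so `scrape-lock-status.sh` remains the check to trust.

```bash
# Hypothetical helper; scrape-lock-status.sh is the authoritative check.
# Reports only whether {archive_root}/.dce-scrape.lock exists.
lock_file_state() {
  archive_root=$1
  lock="$archive_root/.dce-scrape.lock"

  if [ -e "$lock" ]; then
    echo "present"   # may be held by a live scrape or left by a dead one
  else
    echo "absent"    # no lock file under this archive root
  fi
}
```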
|
||||||
|
|
||||||
|
Salvage partial exports under `output_dir/.dce-temp/` without calling Discord:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
./scripts/operator-handoff.sh --salvage-only --target NAME [--channel ID]
|
||||||
|
./scripts/run-documents-scrape.sh --salvage-only --target NAME [--channel ID]
|
||||||
|
./scripts/run-operator-validation.sh --salvage-only --target NAME [--channel ID] --log-file logs/salvage.log
|
||||||
|
```
|
||||||
|
|
||||||
|
Salvage then incremental scrape:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
./scripts/run-documents-scrape.sh --salvage-before-scrape --target NAME [--channel ID]
|
||||||
|
./scripts/run-operator-validation.sh --salvage-before-scrape --target NAME [--channel ID] --log-file logs/scrape.log
|
||||||
|
./scripts/run-operator-proof.sh --salvage-before-scrape --sync-gui --target NAME
|
||||||
|
```
|
||||||
|
|
||||||
|
**KotOR yes_general** (`221726893064454144`): the first catch-up from a 2021 archive cursor can take hours and may OOM; salvage the preserved partials before retrying. Stop any duplicate validation processes first (the MyBook and Downloads checkouts share the same lock).
|
||||||
|
|
||||||
## GUI zip only
|
## GUI zip only
|
||||||
|
|
||||||
See [gui-zip-recurring-scrape-bridge.md](gui-zip-recurring-scrape-bridge.md), run `./scripts/sync-gui-bridge-doc.sh`, or use `../DiscordChatExporter.linux-x64/bootstrap-recurring-scrape.sh`.
|
See [gui-zip-recurring-scrape-bridge.md](gui-zip-recurring-scrape-bridge.md), run `./scripts/sync-gui-bridge-doc.sh`, or use `../DiscordChatExporter.linux-x64/bootstrap-recurring-scrape.sh`.
|
||||||
|
|
|
||||||