DiscordChatExporter/docs/gui-zip-recurring-scrape-bridge.md
Copilot 27e133f692 docs(scrape): sync KotOR wrapper across GUI bridge docs (plan 084)
GUI bridge and troubleshooting lead with run-kotor-yes-general-catchup.sh;
merge-readiness HEAD updated; bridge sync smoke asserts wrapper and 24/24 gate.
2026-06-03 12:02:03 -05:00

Recurring scrape — GUI zip users

The DiscordChatExporter.linux-x64 (or similar) GUI zip is only the desktop app. Docker, cron, and append-only JSON merges live in the source repo checked out beside the zip:

DiscordChatExporter.linux-x64/     ← GUI (this folder)
DiscordChatExporter/               ← source repo (scripts, Docker, config)

To copy this document into the GUI zip folder, run ./scripts/sync-gui-bridge-doc.sh from the source repo; by default it writes ../DiscordChatExporter.linux-x64/RECURRING-SCRAPE.md.
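
The commands in this document assume the two folders sit side by side. A minimal sketch of a layout check, assuming the folder names shown above; check_sibling_layout is a hypothetical helper, not a script shipped in the repo:

```shell
# Hypothetical helper: succeed only if the GUI zip folder and the source repo
# (with its scripts/ directory) both exist under the same parent directory.
# Folder names are the ones used in this document; adjust for your platform.
check_sibling_layout() {
  parent_dir="$1"
  [ -d "$parent_dir/DiscordChatExporter.linux-x64" ] &&
    [ -d "$parent_dir/DiscordChatExporter/scripts" ]
}

# Usage (from inside the GUI zip folder):
#   check_sibling_layout .. && echo "layout ok"
```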

Quick start (run from source repo)

cd ../DiscordChatExporter   # or your clone path
./scripts/operator-handoff.sh              # disk + verify + archive dry-run
./scripts/sync-token-from-gui.sh --force   # token from GUI Settings.dat
# or: cp scrape.env.example scrape.env and set DISCORD_TOKEN (never commit)
./scripts/bootstrap-recurring-scrape.sh    # verify, build image, preflight
./scripts/run-documents-scrape.sh          # incremental append-only scrape
./scripts/run-operator-proof.sh --sync-gui --target eod_discord   # one-target handoff + scrape + grow-only proof
./scripts/setup-cron.sh --dry-run          # preview monthly job (default 1st @ 04:00)
./scripts/setup-cron.sh                    # install cron after handoff passes
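
For reference, the default schedule mentioned above (1st of the month at 04:00) corresponds to the cron fields 0 4 1 * *. The sketch below only prints an illustrative crontab line. The repo path and the scheduled command are assumptions, and the entry that setup-cron.sh actually installs may differ, so rely on --dry-run to see the real one.

```shell
# Illustrative only: build a crontab line for "1st of each month at 04:00".
# The repo path and the scheduled command are assumptions, not read from setup-cron.sh.
monthly_cron_line() {
  repo_dir="$1"
  # Field order: minute hour day-of-month month day-of-week
  printf '0 4 1 * * cd %s && ./scripts/run-documents-scrape.sh\n' "$repo_dir"
}

monthly_cron_line "$HOME/DiscordChatExporter"
```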

Optional integrity tools:

./scripts/audit-archive-json.sh
./scripts/scrape-lock-status.sh              # show archive-root scrape lock
./scripts/scrape-lock-status.sh --reclaim-stale   # clear dead-holder lock artifacts
# ./scripts/salvage-truncated-export.sh path/to/export.json

Stuck or crashed export (partial .dce-temp)

After stopping a long run, merge quiescent partial exports before re-downloading history:

./scripts/scrape-lock-status.sh
./scripts/scrape-lock-status.sh --reclaim-stale   # when state is stale

# Merge partial temps only (no Discord)
./scripts/operator-handoff.sh --salvage-only --target KotOR_discord_msgs --channel 221726893064454144

# Salvage then incremental catch-up (one command; 8g cap via scrape-targets.json for KotOR)
./scripts/run-kotor-yes-general-catchup.sh
# Or: --dry-run | --salvage-only | --validation | --prove
# Inspect: ./scripts/print-scrape-summary.sh logs/kotor-yes-general.summary.json

# Manual equivalent (validation with audit):
# DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh \
#   --salvage-before-scrape --target KotOR_discord_msgs \
#   --channel 221726893064454144 --log-file logs/kotor-yes-general.log

Or direct documents scrape:

./scripts/run-documents-scrape.sh \
  --salvage-before-scrape \
  --target KotOR_discord_msgs \
  --channel 221726893064454144

If a temp is still being written, stop the export first. To merge a temp that is still treated as active, confirm that nothing is writing to it, then set DCE_SALVAGE_ACTIVE_TEMPS=1 for the salvage run.
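
One way to check that nothing is writing is to look at how recently the temp was modified. The sketch below is a heuristic, not the check the repo scripts perform: temp_is_quiescent is a hypothetical helper, it assumes the temp is a regular file, and the 5-minute default is arbitrary.

```shell
# Heuristic sketch: treat a partial export as quiescent when its mtime is older
# than a threshold. temp_is_quiescent is a hypothetical helper, not a repo script.
# Returns 0 if quiescent, 1 if recently modified, 2 if the file does not exist.
temp_is_quiescent() {
  temp_path="$1"
  min_age_minutes="${2:-5}"
  [ -f "$temp_path" ] || return 2
  # find prints the path only if it was last modified more than N minutes ago.
  [ -n "$(find "$temp_path" -mmin "+$min_age_minutes" -print)" ]
}

# Usage (the temp path is a placeholder; which script honors the variable is an assumption):
#   if temp_is_quiescent path/to/partial-temp 10; then
#     DCE_SALVAGE_ACTIVE_TEMPS=1 ./scripts/operator-handoff.sh --salvage-only \
#       --target KotOR_discord_msgs --channel 221726893064454144
#   fi
```

An old modification time does not prove that no process holds the file open, so stopping the export remains the safer first step.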

Archives: locations are set in config/scrape-targets.json (typically ~/Documents/*, one per target via its output_dir).

Disk: free several GiB on /home and on the archive roots before large scrapes. The minimum free space the scripts require is set by DCE_MIN_FREE_MB (default 1024).
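
To see ahead of time whether a path clears that threshold, something like the following can be used. It is a sketch under assumptions: free_mb and has_min_free are hypothetical helpers, and the way the repo scripts measure free space is not shown in this document.

```shell
# Sketch of a free-space gate; free_mb and has_min_free are hypothetical helpers.
# Prints the free space on the filesystem holding $1, in MiB.
# Uses POSIX df output (-P) in 1024-byte blocks (-k); column 4 is "Available".
free_mb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeeds when free space is at least DCE_MIN_FREE_MB (default 1024).
has_min_free() {
  [ "$(free_mb "$1")" -ge "${DCE_MIN_FREE_MB:-1024}" ]
}

# Usage:
#   has_min_free "$HOME/Documents" || echo "not enough free space" >&2
```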

Validate scripts: DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh (expect 24/24 offline smokes to pass; --include-container is optional).

Podman (Fedora): install podman-compose when docker compose cannot reach the socket; scripts auto-prefer it.
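
The preference order can be pictured roughly as below. This is only a sketch: pick_compose is a hypothetical helper, and the detection logic inside the repo scripts may differ.

```shell
# Sketch: choose a compose command, preferring podman-compose when it is installed.
# pick_compose is a hypothetical helper; the repo scripts may decide differently.
pick_compose() {
  if command -v podman-compose >/dev/null 2>&1; then
    echo "podman-compose"
  elif command -v docker >/dev/null 2>&1 && docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  else
    echo "no compose tool found" >&2
    return 1
  fi
}

# Usage:
#   compose_cmd=$(pick_compose) || exit 1
#   echo "using: $compose_cmd"
```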

Wrappers in the GUI zip folder

From DiscordChatExporter.linux-x64/:

./bootstrap-recurring-scrape.sh --dry-run

That script delegates to ../DiscordChatExporter/scripts/bootstrap-recurring-scrape.sh.

More detail