Mirror of https://github.com/Tyrrrz/DiscordChatExporter.git (synced 2026-06-09 15:52:37 -06:00)
feat(scrape): add KotOR yes_general catch-up wrapper (plan 083)
One-command operator entry for channel 221726893064454144 with salvage-before-scrape, log/summary defaults, and dry-run/validation/prove modes. Smoke suite now 24/24.
Parent: `65b12ff174`; commit: `7171d7b6e3`
````diff
@@ -336,7 +336,7 @@ Space requirements:
 
 ## Smoke test validation
 
-Run the full offline suite from the repo root (requires `jq`). **23 offline smokes** run by default; add `--include-container` for a 24th local-only check:
+Run the full offline suite from the repo root (requires `jq`). **24 offline smokes** run by default; add `--include-container` for a 25th local-only check:
 
 ```bash
 ./scripts/run-all-smokes.sh
````
````diff
@@ -366,6 +366,7 @@ With Docker/Podman, include the container smoke:
 | `end-to-end-preflight-smoke.sh` | yes | Preflight wiring |
 | `error-path-smoke.sh` | yes | Failure paths |
 | `gh-approve-pr-runs-smoke.sh` | yes | Fork PR workflow helper |
+| `kotor-yes-general-catchup-smoke.sh` | yes | KotOR yes_general wrapper dry-run |
 | `operator-handoff-smoke.sh` | yes | Operator handoff dry-run |
 | `print-scrape-summary-smoke.sh` | yes | JSON summary pretty-print CLI |
 | `prove-incremental-append-smoke.sh` | yes | Offline prove snapshot/compare |
````
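The suite totals in this commit (24 default smokes, 25 with `--include-container`) are pass counts over the scripts listed in the smoke table. The sketch below shows how such a tally can work. It assumes each smoke is an executable `*-smoke.sh` that exits 0 on success. `run_smokes` is a hypothetical helper and does not reproduce the logic of `run-all-smokes.sh`.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical tally helper: runs every executable *-smoke.sh in a directory,
# reports failures on stderr, prints "passed/total" on stdout, and returns
# non-zero unless every smoke passed. Not the real run-all-smokes.sh logic.
run_smokes() {
  local dir=$1 passed=0 total=0 smoke
  shopt -s nullglob            # an empty directory yields 0/0, not a literal glob
  for smoke in "$dir"/*-smoke.sh; do
    total=$((total + 1))
    if "$smoke" >/dev/null 2>&1; then
      passed=$((passed + 1))
    else
      printf 'FAIL: %s\n' "$(basename "$smoke")" >&2
    fi
  done
  shopt -u nullglob
  printf '%d/%d\n' "$passed" "$total"
  (( passed == total ))        # status 1 when any smoke failed
}
```

The function returns non-zero when any smoke fails. Under `set -e`, call it as `run_smokes scripts/tests || exit 1` so that a failure is handled explicitly rather than aborting mid-report.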
@@ -0,0 +1,51 @@

````markdown
---
title: "feat: KotOR yes_general catch-up wrapper"
type: feat
status: complete
date: 2026-06-04
origin: /lfg — plan 082 deferred live KotOR catch-up; encode operator path as one script
---

# feat: KotOR yes_general catch-up wrapper

## Summary

Add `scripts/run-kotor-yes-general-catchup.sh` — thin wrapper for channel `221726893064454144` with default log/summary paths, `--salvage-before-scrape`, and subcommands for validation/prove/salvage-only/dry-run.

## Problem Frame

KotOR yes_general catch-up is documented in five places with long flag chains. Operators need one entry point; LFG still cannot run live Discord scrape in CI.

## Requirements

| ID | Requirement |
|----|-------------|
| R1 | Default live run: `--salvage-before-scrape` + documents scrape + `--log-file logs/kotor-yes-general.log` |
| R2 | `--dry-run`, `--salvage-only`, `--validation`, `--prove` modes |
| R3 | `--log-file` and `--config` overrides |
| R4 | Prints summary inspect hint after live scrape |
| R5 | `kotor-yes-general-catchup-smoke.sh` dry-run passes offline |
| R6 | Docs updated; smoke count **24/24** in setup doc + merge-readiness |
| R7 | `DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh` → 24/24 |

## Implementation Units

### U1. Wrapper script

**Files:** `scripts/run-kotor-yes-general-catchup.sh`, `scripts/tests/kotor-yes-general-catchup-smoke.sh`

### U2. Docs

**Files:** `docs/recurring-scrape-merge-readiness.md`, `docs/recurring-scrape-operator-checklist.md`, `.docs/Recurring-Scrape-Setup.md`

## Verification

```bash
DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh
```

## Scope Boundaries

### Deferred

- Running live KotOR catch-up inside LFG/CI (operator host only)
````
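R2 lists four alternative modes. The wrapper rejects any combination of more than one of them, and it does so before reaching any `exec`. The guard is pulled out below into a standalone helper for illustration only. `count_modes` is a hypothetical name; the wrapper itself does the count inline.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical helper: each argument is a 0/1 mode flag (dry_run, salvage_only,
# validation, prove). Prints how many are set; fails if more than one is.
count_modes() {
  local flag modes=0
  for flag in "$@"; do
    if (( flag == 1 )); then
      modes=$((modes + 1))
    fi
  done
  if (( modes > 1 )); then
    printf 'ERROR: Use only one of --dry-run, --salvage-only, --validation, or --prove.\n' >&2
    return 1
  fi
  printf '%d\n' "$modes"
}
```

The test is written as `if (( ... ))` rather than `(( ... )) && ...`. That way a false comparison can never become a command's exit status, which matters under `set -e` whenever such a line ends up last in a function.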
````diff
@@ -4,7 +4,7 @@
 
 | Gate | Status |
 |------|--------|
-| Offline smokes (`run-all-smokes.sh`) | 23/23 pass |
+| Offline smokes (`run-all-smokes.sh`) | 24/24 pass |
 | Branch HEAD (fork) | `18a22a6` — PR #1538 pruned stale Latest blocks (plan 082) |
 | Live proof (`run-operator-proof.sh --sync-gui --target eod_discord`) | Passed on maintainer host |
 | Monthly cron (`setup-cron.sh`) | Installed (`00 04 1 * *`); dry-run preflight OK for all enabled targets |
````
````diff
@@ -147,6 +147,13 @@ DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh --target KotOR_discord_ms
 
 Large `yes_general` may still skip without a higher container cap; `KotOR_discord_msgs` sets `container_memory: "8g"` in `scrape-targets.json` for single-target runs (override globally with `DCE_CONTAINER_MEMORY` in `scrape.env`):
 
+```bash
+./scripts/run-kotor-yes-general-catchup.sh
+# writes logs/kotor-yes-general.log + .summary.json; --dry-run | --validation | --prove
+```
+
+Manual equivalent:
+
 ```bash
 DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh \
 --salvage-before-scrape --target KotOR_discord_msgs \
````
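The `# writes logs/kotor-yes-general.log + .summary.json` comment relies on a naming convention. The wrapper's closing hint strips a trailing `.log` from the log path and appends `.summary.json` (`${LOG_FILE%.log}`). Here is a standalone sketch of that derivation; `summary_path_for` is a hypothetical name.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail

# Derive the JSON summary path from a log path the same way the wrapper's
# "Inspect summary" hint does: strip one trailing ".log", append ".summary.json".
summary_path_for() {
  local log_file=$1
  printf '%s.summary.json\n' "${log_file%.log}"
}

summary_path_for logs/kotor-yes-general.log
# → logs/kotor-yes-general.summary.json
```

A `--log-file` value without a `.log` suffix keeps its full name, so `custom.txt` maps to `custom.txt.summary.json`. This commit does not say whether the documents scrape writes its summary under that exact name for non-`.log` paths.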
````diff
@@ -195,6 +202,8 @@ DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh \
 
 **Plan 082 (2026-06-04):** PR #1538 pruned 30+ stale `Latest` blocks; single plans 070–081 operator delta remains.
 
+**Plan 083 (2026-06-04):** `run-kotor-yes-general-catchup.sh` — one-command yes_general path (salvage-before, log, summary hint).
+
 **Disk:** ~65 GiB free on `/home` (2026-05-30); large channel merges still need headroom.
 
 ## CI note (fork PRs)
````
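The disk note matters because the smoke and validation commands in this commit set `DCE_MIN_FREE_MB=0`, which points to a free-space preflight tuned by that variable. How the real scripts read it is not shown here. The sketch below is a hypothetical version of such a check. It assumes the value is in MiB and that `0` means "skip"; the 1024 MiB default is invented for the example.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical preflight: fail when free space under PATH is below
# DCE_MIN_FREE_MB (MiB). The 1024 default and "0 disables" are assumptions.
check_free_space() {
  local path=$1
  local min_mb=${DCE_MIN_FREE_MB:-1024}
  if (( min_mb == 0 )); then
    return 0
  fi
  local free_mb
  # df -P gives one line per filesystem; -k reports 1024-byte blocks, and the
  # 4th column of the data line is the available space.
  free_mb=$(df -Pk "$path" | awk 'NR == 2 { print int($4 / 1024) }')
  if (( free_mb < min_mb )); then
    printf 'ERROR: %s has %d MiB free; need at least %d MiB\n' \
      "$path" "$free_mb" "$min_mb" >&2
    return 1
  fi
  printf 'OK: %s has %d MiB free\n' "$path" "$free_mb"
}
```

Using `df -Pk` rather than `-m` keeps the output format within what POSIX specifies for `df`.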
````diff
@@ -60,7 +60,14 @@ Salvage then incremental scrape:
 # Live documents scrape auto-tees to logs/documents-scrape-<UTC>.log (or --log-file); summary at <log-basename>.summary.json
 ```
 
-**KotOR yes_general** (`221726893064454144`): first catch-up after a 2021 archive cursor can take hours and may OOM; salvage preserved partials before retrying. Stop duplicate validation processes (MyBook vs Downloads checkouts share the same lock). `KotOR_discord_msgs` sets `container_memory: "8g"` in `scrape-targets.json` for single-target runs; override globally with `DCE_CONTAINER_MEMORY` in `scrape.env` if needed. Channel-scoped proof:
+**KotOR yes_general** (`221726893064454144`): first catch-up after a 2021 archive cursor can take hours and may OOM; salvage preserved partials before retrying. One-command path:
+
+```bash
+./scripts/run-kotor-yes-general-catchup.sh
+# Or: --dry-run | --salvage-only | --validation | --prove
+```
+
+Manual equivalent:
 
 ```bash
 ./scripts/run-operator-validation.sh --salvage-before-scrape \
````
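The removed sentence about duplicate validation processes notes that two checkouts (MyBook and Downloads) share one lock. The locking mechanism itself is not shown in this commit. As an illustrative sketch only, a single-instance guard can rely on `mkdir` being atomic. `acquire_lock` and `release_lock` are hypothetical helpers.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical single-instance guard. mkdir either creates the directory or
# fails atomically, so only one process can hold the lock at a time; a second
# checkout pointing at the same lock path fails fast instead of racing.
acquire_lock() {
  local lock_dir=$1
  if mkdir "$lock_dir" 2>/dev/null; then
    printf '%s\n' "$$" >"$lock_dir/pid"   # record the holder for diagnostics
    return 0
  fi
  printf 'ERROR: another run holds %s (pid %s)\n' \
    "$lock_dir" "$(cat "$lock_dir/pid" 2>/dev/null || echo '?')" >&2
  return 1
}

# Remove the lock directory (and its pid file).
release_lock() {
  rm -rf "$1"
}
```

A caller would run `acquire_lock "$LOCK_DIR" || exit 1` and then `trap 'release_lock "$LOCK_DIR"' EXIT`, so the lock is released however the run ends.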
`scripts/run-kotor-yes-general-catchup.sh` (new executable file, 122 lines)
@@ -0,0 +1,122 @@

```bash
#!/usr/bin/env bash

set -Eeuo pipefail

SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)
REPO_ROOT="${DCE_REPO_ROOT:-$(cd "$SCRIPT_DIR/.." && pwd -P)}"
CONFIG_PATH="${DCE_CONFIG_FILE:-$REPO_ROOT/config/scrape-targets.json}"
LOG_DIR="${DCE_LOG_DIR:-$REPO_ROOT/logs}"
LOG_FILE="${DCE_KOTOR_LOG_FILE:-$LOG_DIR/kotor-yes-general.log}"
DOCUMENTS="$REPO_ROOT/scripts/run-documents-scrape.sh"
VALIDATION="$REPO_ROOT/scripts/run-operator-validation.sh"
PROVE="$REPO_ROOT/scripts/prove-incremental-append.sh"
PRINT_SUMMARY="$REPO_ROOT/scripts/print-scrape-summary.sh"

TARGET=KotOR_discord_msgs
CHANNEL=221726893064454144

usage() {
  cat <<EOF
Usage:
  $(basename "$0") [options]

Focused operator entry for KotOR yes_general (channel $CHANNEL):
  salvage-before-scrape → incremental documents scrape → optional summary hint

Options:
  --dry-run          Archive verify only (no Discord)
  --salvage-only     Merge stale .dce-temp exports only
  --validation       Run operator-validation (salvage-before-scrape + audit)
  --prove            Run prove-incremental-append for this channel only
  --log-file PATH    Log for scrape/validation (default: logs/kotor-yes-general.log)
  --config PATH      Targets JSON (default: config/scrape-targets.json)
  --help             Show this help text

Live scrape (default):
  $(basename "$0")

After a live run, inspect:
  ./scripts/print-scrape-summary.sh logs/kotor-yes-general.summary.json
EOF
}

die() {
  printf 'ERROR: %s\n' "$*" >&2
  exit 1
}

main() {
  local dry_run=0 salvage_only=0 validation=0 prove=0

  while (($#)); do
    case "$1" in
      --dry-run)
        dry_run=1
        shift
        ;;
      --salvage-only)
        salvage_only=1
        shift
        ;;
      --validation)
        validation=1
        shift
        ;;
      --prove)
        prove=1
        shift
        ;;
      --log-file)
        [[ $# -ge 2 ]] || die "Missing value for --log-file."
        LOG_FILE=$2
        shift 2
        ;;
      --config)
        [[ $# -ge 2 ]] || die "Missing value for --config."
        CONFIG_PATH=$2
        shift 2
        ;;
      --help|-h)
        usage
        exit 0
        ;;
      *)
        die "Unknown option: $1"
        ;;
    esac
  done

  local modes=0
  (( dry_run == 1 )) && modes=$((modes + 1))
  (( salvage_only == 1 )) && modes=$((modes + 1))
  (( validation == 1 )) && modes=$((modes + 1))
  (( prove == 1 )) && modes=$((modes + 1))
  (( modes > 1 )) && die "Use only one of --dry-run, --salvage-only, --validation, or --prove."

  local -a common=(--config "$CONFIG_PATH" --target "$TARGET" --channel "$CHANNEL")

  if (( prove == 1 )); then
    exec "$PROVE" "${common[@]}"
  fi

  if (( validation == 1 )); then
    exec "$VALIDATION" --salvage-before-scrape "${common[@]}" --log-file "$LOG_FILE"
  fi

  if (( dry_run == 1 )); then
    exec "$DOCUMENTS" --dry-run "${common[@]}"
  fi

  if (( salvage_only == 1 )); then
    exec "$DOCUMENTS" --salvage-only "${common[@]}"
  fi

  printf 'KotOR yes_general catch-up: target=%s channel=%s\n' "$TARGET" "$CHANNEL"
  printf 'Log file: %s\n' "$LOG_FILE"
  local st=0
  "$DOCUMENTS" --salvage-before-scrape "${common[@]}" --log-file "$LOG_FILE" || st=$?  # capture failure without tripping set -e, so the hint below still prints
  printf 'Inspect summary: %s %s.summary.json\n' "$PRINT_SUMMARY" "${LOG_FILE%.log}"
  exit "$st"
}

main "$@"
```
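The live path of this script captures the scrape's exit status, prints the summary hint, and only then exits with that status. Under `set -e` this works only if the status is captured in the same list as the command (`cmd || st=$?`). A separate `st=$?` line is never reached after a failure, because errexit has already aborted the function. The example below is a standalone demonstration; `flaky` and `run_then_hint` are hypothetical names.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical stand-in for "$DOCUMENTS": returns the status it is given.
flaky() {
  return "$1"
}

# Run a command, always print a follow-up line, then propagate the command's
# original status (the pattern the catch-up wrapper relies on).
run_then_hint() {
  local st=0
  "$@" || st=$?              # a failure here is handled, so set -e does not fire
  printf 'exit status: %d\n' "$st"
  return "$st"
}
```

When the command on the left of `||` is a shell function, bash also suspends `set -e` inside that function for the duration of the call. The wrapper invokes an external script, so errexit behaviour inside the scrape is whatever that script sets for itself.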
`scripts/tests/kotor-yes-general-catchup-smoke.sh` (new executable file, 61 lines)
@@ -0,0 +1,61 @@

```bash
#!/usr/bin/env bash

set -Eeuo pipefail

REPO_ROOT=$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd -P)
RUNNER="$REPO_ROOT/scripts/run-kotor-yes-general-catchup.sh"
TMP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/dce-kotor-catchup-smoke.XXXXXX")
ARCHIVE="$TMP_DIR/archive/kotor"
CONFIG_PATH="$TMP_DIR/config.json"

cleanup() {
  rm -rf "$TMP_DIR"
}
trap cleanup EXIT

mkdir -p "$ARCHIVE"
printf '{"messages":[{"id":"1"}],"channel":{"id":"221726893064454144"}}\n' \
  >"$ARCHIVE/Guild - yes_general [221726893064454144].json"

cat >"$CONFIG_PATH" <<JSON
{
  "archive_root": "$TMP_DIR/archive",
  "targets": [
    {
      "name": "KotOR_discord_msgs",
      "kind": "guild",
      "output_dir": "$ARCHIVE",
      "enabled": true
    }
  ]
}
JSON

OUT="$TMP_DIR/dry-run.log"
"$RUNNER" --dry-run --config "$CONFIG_PATH" >"$OUT" 2>&1

grep -q 'KotOR_discord_msgs' "$OUT" || {
  printf 'ERROR: dry-run missing target in plan output\n' >&2
  cat "$OUT" >&2
  exit 1
}
grep -q '221726893064454144' "$RUNNER" || {
  printf 'ERROR: wrapper missing yes_general channel id\n' >&2
  exit 1
}
grep -q 'Documents scrape run plan' "$OUT" || {
  printf 'ERROR: dry-run missing documents scrape plan\n' >&2
  exit 1
}
grep -q 'JSON summary file:' "$OUT" && {
  printf 'ERROR: dry-run should not enable JSON summary\n' >&2
  exit 1
}

HELP=$("$RUNNER" --help 2>&1)
grep -q 'yes_general' <<<"$HELP" || {
  printf 'ERROR: help missing yes_general reference\n' >&2
  exit 1
}

printf 'kotor-yes-general-catchup-smoke: ok\n'
```
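The smoke fixture names its export `Guild - yes_general [221726893064454144].json`, with the channel ID in square brackets just before the extension. When archive files need to be mapped back to channel IDs, the ID can be extracted as in the sketch below, which assumes that naming pattern. `channel_id_from_export` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical helper: print the bracketed numeric channel ID from an export
# file name such as "Guild - yes_general [221726893064454144].json".
# Assumes the ID is the "[digits]" group immediately before the extension.
channel_id_from_export() {
  local name=${1##*/}                       # drop any directory part
  local re='\[([0-9]+)]\.[A-Za-z0-9]+$'     # ERE: "[digits]" then ".ext" at the end
  if [[ $name =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
    return 0
  fi
  return 1
}
```

Keeping the regex in a variable, rather than inline after `=~`, avoids bash's quoting rules for the right-hand side of `=~` turning the brackets into literal-string matches.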