diff --git a/docs/plans/2026-06-04-076-feat-per-target-json-summary-plan.md b/docs/plans/2026-06-04-076-feat-per-target-json-summary-plan.md
new file mode 100644
index 00000000..69b720bf
--- /dev/null
+++ b/docs/plans/2026-06-04-076-feat-per-target-json-summary-plan.md
@@ -0,0 +1,63 @@
+---
+title: "feat: Per-target JSON summaries in multi-target loops"
+type: feat
+status: complete
+date: 2026-06-04
+origin: /lfg — plan 075 deferred per-target separate summary files in validation/proof loops
+---
+
+# feat: Per-target JSON summaries in multi-target loops
+
+## Summary
+
+When operator validation runs `--per-target` (all enabled targets) or operator proof scrapes multiple targets, pass `--summary-file` per target so each scrape writes `logs/--.summary.json` instead of overwriting a single combined path.
+
+## Problem Frame
+
+Plans 070–075 auto-export JSON summaries for single-target and documents-scrape runs. Multi-target loops still set one global `DCE_RUN_SUMMARY_FILE` tied to the teed log basename — only the last target's scrape wins on disk, and recovery from the combined log cannot disambiguate targets.
+
+## Requirements
+
+| ID | Requirement |
+|----|-------------|
+| R1 | `scripts/lib/scrape-summary-json.sh` exposes `per_target_summary_file LOG_DIR PREFIX TARGET` with sanitized target slug |
+| R2 | `run-operator-validation.sh --per-target` (no `--target`) skips global `DCE_RUN_SUMMARY_FILE`; each live scrape passes `--summary-file` |
+| R3 | Validation logs `Per-target JSON summary: ` before each live scrape |
+| R4 | `run-operator-proof.sh` with 2+ targets uses per-target `--summary-file`; single-target keeps log-basename summary |
+| R5 | Proof logs per-target summary path in the target loop when exporting JSON |
+| R6 | End-of-run log recovery skipped when per-target mode (files written directly by scrape) |
+| R7 | `scrape-summary-json-smoke.sh` asserts helper output shape |
+| R8 | `run-operator-validation-smoke.sh` multi-target dry-run still passes; optional fake-docker live per-target asserts two distinct `--summary-file` paths in subprocess output |
+| R9 | `DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh` → 23/23 |
+
+## Implementation Units
+
+### U1. Shared helper
+
+**Files:** `scripts/lib/scrape-summary-json.sh`, `scripts/tests/scrape-summary-json-smoke.sh`
+
+### U2. Operator validation
+
+**Files:** `scripts/run-operator-validation.sh`
+
+### U3. Operator proof
+
+**Files:** `scripts/run-operator-proof.sh`
+
+### U4. Docs
+
+**Files:** `docs/recurring-scrape-merge-readiness.md`, `docs/recurring-scrape-operator-checklist.md`
+
+## Verification
+
+```bash
+DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh
+```
+
+## Scope Boundaries
+
+### Deferred
+
+- Live KotOR catch-up on host
+- Tee full documents-scrape stdout to persistent log
+- Refresh PR #1538 body with plans 070–076 stamps
diff --git a/docs/recurring-scrape-merge-readiness.md b/docs/recurring-scrape-merge-readiness.md
index 1bdd3e91..a8a3add6 100644
--- a/docs/recurring-scrape-merge-readiness.md
+++ b/docs/recurring-scrape-merge-readiness.md
@@ -180,6 +180,8 @@ DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh \
 
 **Plan 075 (2026-06-04):** `run-documents-scrape.sh` auto-writes `logs/documents-scrape-.summary.json` on live scrapes.
 
+**Plan 076 (2026-06-04):** Multi-target validation (`--per-target`) and proof loops write separate `logs/operator-*--.summary.json` per scrape.
+
 **Disk:** ~65 GiB free on `/home` (2026-05-30); large channel merges still need headroom.
 
 ## CI note (fork PRs)
diff --git a/docs/recurring-scrape-operator-checklist.md b/docs/recurring-scrape-operator-checklist.md
index e398acda..61f941ef 100644
--- a/docs/recurring-scrape-operator-checklist.md
+++ b/docs/recurring-scrape-operator-checklist.md
@@ -56,7 +56,8 @@ Salvage then incremental scrape:
 ./scripts/run-documents-scrape.sh --salvage-before-scrape --target NAME [--channel ID]
 ./scripts/run-operator-validation.sh --salvage-before-scrape --target NAME [--channel ID] --log-file logs/scrape.log
 ./scripts/run-operator-proof.sh --salvage-before-scrape --sync-gui --target NAME
-# When scraping, also writes logs/operator-proof-.summary.json beside the proof log
+# When scraping one target, also writes logs/operator-proof-.summary.json beside the proof log
+# All enabled targets: each gets logs/operator-proof--.summary.json
 ```
 **KotOR yes_general** (`221726893064454144`): first catch-up after a 2021 archive cursor can take hours and may OOM; salvage preserved partials before retrying. Stop duplicate validation processes (MyBook vs Downloads checkouts share the same lock). `KotOR_discord_msgs` sets `container_memory: "8g"` in `scrape-targets.json` for single-target runs; override globally with `DCE_CONTAINER_MEMORY` in `scrape.env` if needed.
 Channel-scoped proof:
diff --git a/scripts/lib/scrape-summary-json.sh b/scripts/lib/scrape-summary-json.sh
index b8e7b8af..e6b80b2c 100644
--- a/scripts/lib/scrape-summary-json.sh
+++ b/scripts/lib/scrape-summary-json.sh
@@ -33,3 +33,19 @@ recover_json_summary_if_missing() {
   [[ -s "$dest_file" ]] && return 1
   extract_json_summary_from_log "$run_log" "$dest_file"
 }
+
+sanitize_target_slug() {
+  local raw=$1
+  printf '%s' "$raw" | sed 's/[^A-Za-z0-9._-]/_/g'
+}
+
+per_target_summary_file() {
+  local log_dir=$1
+  local prefix=$2
+  local target=$3
+  local slug
+
+  [[ -n "$log_dir" && -n "$prefix" && -n "$target" ]] || return 1
+  slug=$(sanitize_target_slug "$target")
+  printf '%s/%s-%s-%s.summary.json' "$log_dir" "$prefix" "$slug" "$(date -u +%Y%m%dT%H%M%SZ)"
+}
diff --git a/scripts/run-operator-proof.sh b/scripts/run-operator-proof.sh
index c3ad9d61..7cdb0108 100755
--- a/scripts/run-operator-proof.sh
+++ b/scripts/run-operator-proof.sh
@@ -9,9 +9,11 @@ HANDOFF="$REPO_ROOT/scripts/operator-handoff.sh"
 DOCUMENTS="$REPO_ROOT/scripts/run-documents-scrape.sh"
 PROVE="$REPO_ROOT/scripts/prove-incremental-append.sh"
 SYNC_GUI="$REPO_ROOT/scripts/sync-token-from-gui.sh"
-LOG_DIR="$REPO_ROOT/logs"
+LOG_DIR="${DCE_LOG_DIR:-$REPO_ROOT/logs}"
 # shellcheck source=lib/scrape-run-plan.sh
 source "$SCRIPT_DIR/lib/scrape-run-plan.sh"
+# shellcheck source=lib/scrape-summary-json.sh
+source "$SCRIPT_DIR/lib/scrape-summary-json.sh"
 
 TARGET=""
 SYNC_GUI_FLAG=0
@@ -36,8 +38,9 @@ When --target is omitted, all enabled targets in the config are processed.
   --salvage-before-scrape Merge stale .dce-temp exports before incremental scrape
   --log-file PATH Append output to this file (default: logs/operator-proof-UTC.log)
 
-Logs append to logs/operator-proof-.log (or --log-file). When scraping, also writes
-.summary.json unless DCE_RUN_SUMMARY_FILE is already set.
+Logs append to logs/operator-proof-.log (or --log-file). When scraping one target, also writes
+.summary.json unless DCE_RUN_SUMMARY_FILE is already set. Multiple targets each get
+logs/operator-proof--.summary.json.
 EOF
 }
 
@@ -121,11 +124,17 @@ main() {
   fi
 
   local export_json_summary=0
+  local per_target_summaries=0
+  if ((${#targets[@]} > 1)); then
+    per_target_summaries=1
+  fi
   if (( DRY_RUN == 0 && SALVAGE_ONLY == 0 )); then
     export_json_summary=1
     export DCE_RUN_SUMMARY_JSON=1
-    if [[ -z "${DCE_RUN_SUMMARY_FILE:-}" ]]; then
-      export DCE_RUN_SUMMARY_FILE="${log_file%.log}.summary.json"
+    if (( per_target_summaries == 0 )); then
+      if [[ -z "${DCE_RUN_SUMMARY_FILE:-}" ]]; then
+        export DCE_RUN_SUMMARY_FILE="${log_file%.log}.summary.json"
+      fi
     fi
   fi
 
@@ -140,7 +149,11 @@ main() {
     printf 'config: %s\n' "$CONFIG_PATH"
     print_scrape_config_plan "$CONFIG_PATH" "Operator proof" "${targets[@]}"
     if (( export_json_summary )); then
-      printf 'JSON summary file: %s\n' "${DCE_RUN_SUMMARY_FILE:-}"
+      if (( per_target_summaries )); then
+        printf 'JSON summaries: per-target under %s\n' "$(dirname "$log_file")"
+      else
+        printf 'JSON summary file: %s\n' "${DCE_RUN_SUMMARY_FILE:-}"
+      fi
     fi
     printf 'started: %s\n\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
 
@@ -168,6 +181,12 @@ main() {
       printf '\n--- Target: %s ---\n' "$name"
       local -a scrape_args=(--config "$CONFIG_PATH" --target "$name")
       scrape_args+=("${CHANNEL_ARGS[@]}")
+      if (( export_json_summary && per_target_summaries )); then
+        local summary_file
+        summary_file=$(per_target_summary_file "$(dirname "$log_file")" operator-proof "$name")
+        printf 'JSON summary file: %s\n' "$summary_file"
+        scrape_args+=(--summary-file "$summary_file")
+      fi
       if (( SALVAGE_BEFORE )); then
         if ! "$DOCUMENTS" "${scrape_args[@]}" --salvage-only; then
           failed=$((failed + 1))
@@ -189,7 +208,7 @@ main() {
     (( failed == 0 )) || exit 1
   } 2>&1 | tee "$log_file"
 
-  if (( export_json_summary )) && [[ -n "${DCE_RUN_SUMMARY_FILE:-}" ]]; then
+  if (( export_json_summary )) && (( per_target_summaries == 0 )) && [[ -n "${DCE_RUN_SUMMARY_FILE:-}" ]]; then
     # shellcheck source=lib/scrape-summary-json.sh
     source "$SCRIPT_DIR/lib/scrape-summary-json.sh"
     if recover_json_summary_if_missing "$log_file" "$DCE_RUN_SUMMARY_FILE"; then
diff --git a/scripts/run-operator-validation.sh b/scripts/run-operator-validation.sh
index 884aa5a1..dedc330a 100755
--- a/scripts/run-operator-validation.sh
+++ b/scripts/run-operator-validation.sh
@@ -13,6 +13,8 @@ AUDIT_JSON="$REPO_ROOT/scripts/audit-archive-json.sh"
 LOCK_STATUS="$REPO_ROOT/scripts/scrape-lock-status.sh"
 # shellcheck source=lib/scrape-lock.sh
 source "$SCRIPT_DIR/lib/scrape-lock.sh"
+# shellcheck source=lib/scrape-summary-json.sh
+source "$SCRIPT_DIR/lib/scrape-summary-json.sh"
 
 DRY_RUN=0
 SKIP_SCRAPE=0
@@ -167,7 +169,10 @@ scrape_per_target() {
         continue
       fi
     fi
-    if ! run_step "run-documents-scrape ($name)" "$DOCUMENTS_SCRAPE" "${per_args[@]}"; then
+    local summary_file
+    summary_file=$(per_target_summary_file "$LOG_DIR" operator-validation "$name")
+    log_step "Per-target JSON summary: $summary_file"
+    if ! run_step "run-documents-scrape ($name)" "$DOCUMENTS_SCRAPE" "${per_args[@]}" --summary-file "$summary_file"; then
       log_step "Per-target failed: $name (scrape)"
       failures=$((failures + 1))
       if (( CONTINUE_ON_ERROR == 0 )); then
@@ -269,11 +274,17 @@ main() {
   fi
 
   local export_json_summary=0
+  local per_target_summaries=0
+  if (( PER_TARGET )) && [[ -z "$TARGET" ]]; then
+    per_target_summaries=1
+  fi
   if (( DRY_RUN == 0 && SKIP_SCRAPE == 0 && SALVAGE_ONLY == 0 )); then
     export_json_summary=1
     export DCE_RUN_SUMMARY_JSON=1
-    if [[ -z "${DCE_RUN_SUMMARY_FILE:-}" ]]; then
-      export DCE_RUN_SUMMARY_FILE="${LOG_FILE%.log}.summary.json"
+    if (( per_target_summaries == 0 )); then
+      if [[ -z "${DCE_RUN_SUMMARY_FILE:-}" ]]; then
+        export DCE_RUN_SUMMARY_FILE="${LOG_FILE%.log}.summary.json"
+      fi
     fi
   fi
 
@@ -291,7 +302,11 @@ main() {
       log_step "Enabled targets: $(enabled_targets | paste -sd, -)"
     fi
     if (( export_json_summary )); then
-      log_step "JSON summary file: ${DCE_RUN_SUMMARY_FILE:-}"
+      if (( per_target_summaries )); then
+        log_step "JSON summaries: per-target under $LOG_DIR"
+      else
+        log_step "JSON summary file: ${DCE_RUN_SUMMARY_FILE:-}"
+      fi
     fi
     if (( SYNC_GUI_FLAG )); then
       run_step "sync-token-from-gui" "$SYNC_GUI" --force || failures=$((failures + 1))
@@ -326,7 +341,7 @@ main() {
   } 2>&1 | tee -a "$LOG_FILE"
   local pipeline_status=${PIPESTATUS[0]}
 
-  if (( export_json_summary )) && [[ -n "${DCE_RUN_SUMMARY_FILE:-}" ]]; then
+  if (( export_json_summary )) && (( per_target_summaries == 0 )) && [[ -n "${DCE_RUN_SUMMARY_FILE:-}" ]]; then
     # shellcheck source=lib/scrape-summary-json.sh
     source "$SCRIPT_DIR/lib/scrape-summary-json.sh"
     if recover_json_summary_if_missing "$LOG_FILE" "$DCE_RUN_SUMMARY_FILE"; then
diff --git a/scripts/tests/scrape-summary-json-smoke.sh b/scripts/tests/scrape-summary-json-smoke.sh
index 0d2550bf..c65dfbb5 100755
--- a/scripts/tests/scrape-summary-json-smoke.sh
+++ b/scripts/tests/scrape-summary-json-smoke.sh
@@ -77,4 +77,16 @@ if extract_json_summary_from_log "$TMP_DIR/bad.log" "$OUT_FILE" 2>/dev/null; the
   exit 1
 fi
 
+path=$(per_target_summary_file "$TMP_DIR" operator-validation 'KotOR_discord_msgs')
+[[ "$path" == "$TMP_DIR/operator-validation-KotOR_discord_msgs-"*.summary.json ]] || {
+  printf 'ERROR: unexpected per_target_summary_file path: %s\n' "$path" >&2
+  exit 1
+}
+
+slug_path=$(per_target_summary_file "$TMP_DIR" operator-proof 'weird name!')
+[[ "$slug_path" == "$TMP_DIR/operator-proof-weird_name_-"*.summary.json ]] || {
+  printf 'ERROR: expected sanitized slug in path: %s\n' "$slug_path" >&2
+  exit 1
+}
+
 printf 'scrape-summary-json-smoke: ok\n'