feat(scrape): run plan, channel ledger, and all-target proof

Log scrape plan/summary with per-file message deltas in the core script.
Host wrappers and operator entrypoints print target lists; operator-proof
defaults to all enabled targets when --target is omitted.
This commit is contained in:
Copilot 2026-05-29 20:34:22 -05:00
parent 0b242ddfc4
commit 71a443267e
10 changed files with 406 additions and 13 deletions

View file

@ -0,0 +1,97 @@
---
title: "feat: Scrape logging, run summary, and default-all-targets"
type: feat
status: complete
date: 2026-05-29
origin: /lfg — operator scripts need explicit server/file/message visibility and sane defaults
---
# feat: Scrape logging, run summary, and default-all-targets
## Summary
Make recurring scrape scripts print a upfront run plan (which guilds/servers, which output folders), per-channel file I/O with message deltas, and a final change summary. Operator entrypoints default to all enabled targets from `config/scrape-targets.json` without requiring repeated `--target` flags.
## Problem Frame
Operators cannot tell from current logs which Discord server was scraped, which archive files were touched, or how many messages were appended vs unchanged. `run-operator-proof.sh` still hardcodes `eod_discord`. The core engine (`run-discord-scrape.sh`) logs channel IDs but not guild names, paths, or before/after counts in one place.
## Requirements
| ID | Requirement |
|----|-------------|
| R1 | Before scrape/preflight, log config path and every selected target with `name`, resolved guild id/name(s), and `output_dir` |
| R2 | For each channel processed, log destination file path, action (`CREATED`, `MERGED`, `UNCHANGED`, `SKIPPED`), and message counts before → after (plus fetched batch size when merged) |
| R3 | After all targets complete, print a consolidated run summary with per-file deltas and totals |
| R4 | `run-documents-scrape.sh` and host wrapper print the same run-plan header before invoking the container |
| R5 | `run-operator-proof.sh` defaults to all enabled targets (loop handoff → scrape → prove) when `--target` is omitted |
| R6 | Offline smokes pass; scrape smoke asserts summary markers exist |
## Key Technical Decisions
- **KTD1: Ledger in `run-discord-scrape.sh`:** Keep summary state in bash arrays inside the core script rather than a new shared library — host wrappers only need jq-based target listing; the container owns channel-level detail.
- **KTD2: Guild labels from cache + export metadata:** Resolve guild names from `load_guild_cache` at target start; enrich per-channel lines from export JSON when available.
- **KTD3: No behavior change to merge semantics:** Logging only; append-only merge and skip rules stay unchanged.
## Implementation Units
### U1. Core scrape ledger and summary
**Goal:** Operator-visible run plan, per-channel I/O lines, and final summary in `run-discord-scrape.sh`.
**Requirements:** R1, R2, R3
**Files:** `scripts/run-discord-scrape.sh`, `scripts/tests/run-discord-scrape-smoke.sh`
**Approach:** Add `SCRAPE_SUMMARY_ENTRIES`, `guild_name_for_id`, `describe_target_resolution`, `log_run_plan`, `record_channel_result`, `print_scrape_summary`. Call from `run_target_mode` and `scrape_target`. Preflight reuses run plan header.
**Test scenarios:**
- Happy path: smoke run shows `Scrape run plan`, `MERGED`/`CREATED`/`UNCHANGED` lines, and `Scrape run summary`
- Edge: skipped channel appears as `SKIPPED` in summary
- Error path: failure before summary still leaves partial ledger in stderr
**Verification:** `./scripts/tests/run-discord-scrape-smoke.sh` passes with grep for summary markers.
### U2. Host and documents wrapper banners
**Goal:** Host-side run plan before container execution.
**Requirements:** R4
**Files:** `scripts/run-discord-scrape-host.sh`, `scripts/run-documents-scrape.sh`, `scripts/operator-handoff.sh`
**Approach:** Shared helper pattern: jq list enabled/selected targets with output_dir; print subcommand and config paths. `operator-handoff` lists enabled targets in handoff header.
**Test scenarios:**
- Happy path: documents-scrape dry-run output includes target list
- Integration: host smoke unchanged (no regression)
**Verification:** `./scripts/tests/documents-scrape-smoke.sh`, `./scripts/tests/run-discord-scrape-host-smoke.sh`
### U3. Operator proof defaults to all enabled targets
**Goal:** Remove hardcoded `eod_discord`; loop all enabled targets when `--target` omitted.
**Requirements:** R5
**Files:** `scripts/run-operator-proof.sh`, `scripts/tests/run-operator-proof-smoke.sh` (if present)
**Approach:** When `TARGET` empty, `mapfile` enabled names from config and run handoff once then scrape+prove per target; print per-target summary at end.
**Test scenarios:**
- Happy path: smoke with fake scripts verifies multi-target loop
- Edge: single `--target` still runs one target only
**Verification:** operator-proof smoke or documents smoke + manual grep.
## Scope Boundaries
### Deferred to Follow-Up Work
- Structured JSON run logs for machine parsing
- Changing `prove-incremental-append.sh` to require optional `--target`
### Out of scope
- Discord API or merge algorithm changes
- New CLI flags beyond existing `--target` narrowing

View file

@ -0,0 +1,37 @@
#!/usr/bin/env bash
# Host-side helpers: list selected/enabled targets before container runs.
print_scrape_config_plan() {
local config_path=$1
local action_label=$2
shift 2
local -a requested_targets=("$@")
local target_names_json line name output_dir enabled_count
[[ -f "$config_path" ]] || return 0
printf '%s\n' "=== $action_label run plan ==="
printf 'Config: %s\n' "$config_path"
if (( ${#requested_targets[@]} > 0 )); then
printf 'Targets (%s selected):\n' "${#requested_targets[@]}"
for name in "${requested_targets[@]}"; do
output_dir=$(jq -r --arg name "$name" '.targets[] | select(.name == $name) | .output_dir' "$config_path")
printf ' - %s → %s\n' "$name" "$output_dir"
done
return 0
fi
enabled_count=$(jq -r '[.targets[] | select(.enabled != false)] | length' "$config_path")
printf 'Targets (%s enabled, all will run):\n' "$enabled_count"
while IFS=$'\t' read -r name output_dir; do
[[ -n "$name" ]] || continue
printf ' - %s → %s\n' "$name" "$output_dir"
done < <(jq -r '.targets[] | select(.enabled != false) | [.name, .output_dir] | @tsv' "$config_path")
}
enabled_target_names() {
local config_path=$1
jq -r '.targets[] | select(.enabled != false) | .name' "$config_path"
}

View file

@ -4,6 +4,8 @@ set -Eeuo pipefail
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P) SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)
REPO_ROOT="${DCE_REPO_ROOT:-$(cd "$SCRIPT_DIR/.." && pwd -P)}" REPO_ROOT="${DCE_REPO_ROOT:-$(cd "$SCRIPT_DIR/.." && pwd -P)}"
# shellcheck source=lib/scrape-run-plan.sh
source "$SCRIPT_DIR/lib/scrape-run-plan.sh"
CONFIG_PATH="${DCE_CONFIG_FILE:-$REPO_ROOT/config/scrape-targets.json}" CONFIG_PATH="${DCE_CONFIG_FILE:-$REPO_ROOT/config/scrape-targets.json}"
VERIFY_READY="$REPO_ROOT/scripts/verify-operator-ready.sh" VERIFY_READY="$REPO_ROOT/scripts/verify-operator-ready.sh"
DOCUMENTS_SCRAPE="$REPO_ROOT/scripts/run-documents-scrape.sh" DOCUMENTS_SCRAPE="$REPO_ROOT/scripts/run-documents-scrape.sh"
@ -76,6 +78,8 @@ main() {
printf 'Operator handoff\n' printf 'Operator handoff\n'
printf '================\n' printf '================\n'
printf 'config: %s\n\n' "$CONFIG_PATH" printf 'config: %s\n\n' "$CONFIG_PATH"
print_scrape_config_plan "$CONFIG_PATH" "Operator handoff"
printf '\n'
if (( SKIP_DF == 0 )); then if (( SKIP_DF == 0 )); then
print_disk_summary print_disk_summary

View file

@ -4,6 +4,8 @@ set -Eeuo pipefail
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P) SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)
REPO_ROOT="${DCE_REPO_ROOT:-$(cd "$SCRIPT_DIR/.." && pwd -P)}" REPO_ROOT="${DCE_REPO_ROOT:-$(cd "$SCRIPT_DIR/.." && pwd -P)}"
# shellcheck source=lib/scrape-run-plan.sh
source "$SCRIPT_DIR/lib/scrape-run-plan.sh"
COMPOSE_FILE="${DCE_COMPOSE_FILE:-$REPO_ROOT/docker-compose.yml}" COMPOSE_FILE="${DCE_COMPOSE_FILE:-$REPO_ROOT/docker-compose.yml}"
ENV_FILE="${DCE_ENV_FILE:-$REPO_ROOT/scrape.env}" ENV_FILE="${DCE_ENV_FILE:-$REPO_ROOT/scrape.env}"
DOCKER_BIN="${DCE_DOCKER_BIN:-docker}" DOCKER_BIN="${DCE_DOCKER_BIN:-docker}"
@ -445,6 +447,18 @@ main() {
REAUTH_COMMAND="${DCE_REAUTH_COMMAND:-}" REAUTH_COMMAND="${DCE_REAUTH_COMMAND:-}"
run_disk_preflight_if_enabled "${passthrough_args[@]}" run_disk_preflight_if_enabled "${passthrough_args[@]}"
local host_config host_targets=() arg_idx=0
host_config=$(resolve_host_config_path "${passthrough_args[@]}")
while (( arg_idx < ${#passthrough_args[@]} )); do
if [[ "${passthrough_args[arg_idx]}" == "--target" ]]; then
host_targets+=("${passthrough_args[arg_idx + 1]:-}")
arg_idx=$((arg_idx + 2))
continue
fi
arg_idx=$((arg_idx + 1))
done
print_scrape_config_plan "$host_config" "Host $subcommand" "${host_targets[@]}"
case "$subcommand" in case "$subcommand" in
preflight|scrape) preflight|scrape)
run_subcommand_with_retry "$subcommand" "${passthrough_args[@]}" run_subcommand_with_retry "$subcommand" "${passthrough_args[@]}"

View file

@ -46,6 +46,163 @@ log() {
printf '[%s] %s\n' "$(timestamp)" "$*" >&2 printf '[%s] %s\n' "$(timestamp)" "$*" >&2
} }
SCRAPE_SUMMARY_ENTRIES=()
reset_scrape_summary() {
SCRAPE_SUMMARY_ENTRIES=()
}
record_channel_result() {
local target_name=$1 channel_id=$2 guild_label=$3 file_path=$4 action=$5
local before_count=$6 fetched_count=$7 after_count=$8
SCRAPE_SUMMARY_ENTRIES+=(
"$target_name"$'\t'"$channel_id"$'\t'"$guild_label"$'\t'"$file_path"$'\t'"$action"$'\t'"$before_count"$'\t'"$fetched_count"$'\t'"$after_count"
)
}
guild_name_for_id() {
local guild_id=$1
local cached_id cached_name
while IFS=$'\t' read -r cached_id cached_name; do
[[ "$cached_id" == "$guild_id" ]] || continue
printf '%s\n' "$cached_name"
return 0
done < <(load_guild_cache)
printf '%s\n' "$guild_id"
}
guild_label_from_export() {
local export_path=$1
local guild_id guild_name
[[ -n "$export_path" && -f "$export_path" ]] || {
printf 'unknown guild\n'
return 0
}
guild_id=$(jq -r '.guild.id // empty' "$export_path")
guild_name=$(jq -r '.guild.name // empty' "$export_path")
if [[ -n "$guild_id" && -n "$guild_name" ]]; then
printf '%s [%s]\n' "$guild_name" "$guild_id"
return 0
fi
if [[ -n "$guild_id" ]]; then
printf '%s [%s]\n' "$(guild_name_for_id "$guild_id")" "$guild_id"
return 0
fi
printf 'unknown guild\n'
}
describe_target_resolution() {
local target_json=$1
local kind target_name output_dir
local -a configured_channel_ids configured_guild_ids seeded_channel_ids guild_labels
local guild_id guild_name channel_count
target_name=$(jq -r '.name' <<<"$target_json")
output_dir=$(jq -r '.output_dir' <<<"$target_json")
kind=$(jq -r '.kind // "guild"' <<<"$target_json")
if [[ "$kind" == "dms" ]]; then
channel_count=$(load_dm_channel_cache | wc -l | tr -d ' ')
printf 'DM target (%s channel(s))' "${channel_count:-0}"
return 0
fi
mapfile -t configured_channel_ids < <(jq -r '.channel_ids[]? | tostring' <<<"$target_json")
if (( ${#configured_channel_ids[@]} > 0 )); then
printf '%s explicit channel id(s)' "${#configured_channel_ids[@]}"
return 0
fi
mapfile -t seeded_channel_ids < <(load_archive_seed_channel_ids "$output_dir" | sort -u)
if (( ${#seeded_channel_ids[@]} > 0 )); then
printf 'archive-seeded (%s channel file(s))' "${#seeded_channel_ids[@]}"
return 0
fi
mapfile -t configured_guild_ids < <(resolve_configured_guilds "$target_json")
if (( ${#configured_guild_ids[@]} == 0 )); then
printf 'no guild/channel resolution (check config or token)'
return 0
fi
for guild_id in "${configured_guild_ids[@]}"; do
guild_name=$(guild_name_for_id "$guild_id")
guild_labels+=("$guild_name [$guild_id]")
done
(IFS='; '; printf '%s' "${guild_labels[*]}")
}
log_run_plan() {
local mode=$1 config_path=$2
shift 2
local -a selected_targets=("$@")
local target_json target_name output_dir resolution
log '=== Scrape run plan ==='
log "Config: $config_path"
log "Mode: $mode"
for target_json in "${selected_targets[@]}"; do
target_name=$(jq -r '.name' <<<"$target_json")
output_dir=$(jq -r '.output_dir' <<<"$target_json")
resolution=$(describe_target_resolution "$target_json")
log " Target '$target_name' → $output_dir"
log " Server scope: $resolution"
done
}
print_scrape_summary() {
local entry target_name channel_id guild_label file_path action
local before_count fetched_count after_count delta appended=0
local created=0 merged=0 unchanged=0 skipped=0
log '=== Scrape run summary ==='
if (( ${#SCRAPE_SUMMARY_ENTRIES[@]} == 0 )); then
log ' No channel activity recorded.'
return 0
fi
for entry in "${SCRAPE_SUMMARY_ENTRIES[@]}"; do
IFS=$'\t' read -r target_name channel_id guild_label file_path action before_count fetched_count after_count <<<"$entry"
case "$action" in
CREATED)
created=$((created + 1))
delta=$((after_count - before_count))
appended=$((appended + delta))
log " CREATED $file_path +$delta messages (0 → $after_count) channel $channel_id $guild_label"
;;
MERGED)
merged=$((merged + 1))
delta=$((after_count - before_count))
appended=$((appended + delta))
log " MERGED $file_path +$delta messages ($before_count$after_count, fetched $fetched_count) channel $channel_id $guild_label"
;;
UNCHANGED)
unchanged=$((unchanged + 1))
log " UNCHANGED $file_path $after_count messages channel $channel_id $guild_label"
;;
SKIPPED)
skipped=$((skipped + 1))
log " SKIPPED channel $channel_id $guild_label (inaccessible or non-fatal export error)"
;;
*)
log " $action $file_path channel $channel_id $guild_label"
;;
esac
done
log "Totals: $created created, $merged merged, $unchanged unchanged, $skipped skipped; +$appended messages appended"
}
die() { die() {
log "ERROR: $*" log "ERROR: $*"
exit 1 exit 1
@ -761,7 +918,7 @@ scrape_target() {
local target_json=$1 local target_json=$1
local defaults_json=$2 local defaults_json=$2
local target_name output_dir destination_path after_id temp_dir temp_export temp_merged local target_name output_dir destination_path after_id temp_dir temp_export temp_merged
local latest_batch_count local latest_batch_count guild_label before_count after_count
local -a channel_ids local -a channel_ids
local export_status=0 local export_status=0
@ -776,11 +933,14 @@ scrape_target() {
fi fi
log "Target '$target_name': processing ${#channel_ids[@]} channel(s) into $output_dir." log "Target '$target_name': processing ${#channel_ids[@]} channel(s) into $output_dir."
log " Server scope: $(describe_target_resolution "$target_json")"
local channel_id local channel_id
local skipped_channels=0 local skipped_channels=0
for channel_id in "${channel_ids[@]}"; do for channel_id in "${channel_ids[@]}"; do
destination_path=$(resolve_destination_path "$output_dir" "$channel_id") destination_path=$(resolve_destination_path "$output_dir" "$channel_id")
before_count=0
guild_label="unknown guild"
if [[ -n "$destination_path" ]]; then if [[ -n "$destination_path" ]]; then
mkdir -p "$(dirname "$destination_path")" mkdir -p "$(dirname "$destination_path")"
fi fi
@ -788,6 +948,8 @@ scrape_target() {
if [[ -n "$destination_path" && -f "$destination_path" ]]; then if [[ -n "$destination_path" && -f "$destination_path" ]]; then
jq empty "$destination_path" >/dev/null 2>&1 || die "Existing export is not valid JSON: $destination_path" jq empty "$destination_path" >/dev/null 2>&1 || die "Existing export is not valid JSON: $destination_path"
assert_export_channel_identity "$destination_path" "$channel_id" assert_export_channel_identity "$destination_path" "$channel_id"
before_count=$(message_count "$destination_path")
guild_label=$(guild_label_from_export "$destination_path")
fi fi
after_id=$(last_message_id "$destination_path") after_id=$(last_message_id "$destination_path")
@ -796,7 +958,7 @@ scrape_target() {
temp_export="$temp_dir/export.json" temp_export="$temp_dir/export.json"
temp_merged="$temp_dir/merged.json" temp_merged="$temp_dir/merged.json"
log "Exporting channel $channel_id for target '$target_name'${after_id:+ after message $after_id}." log "Exporting channel $channel_id for target '$target_name'${after_id:+ after message $after_id}${destination_path:+ → $destination_path}."
export_status=0 export_status=0
export_channel_incremental "$channel_id" "$temp_export" "$after_id" || export_status=$? export_channel_incremental "$channel_id" "$temp_export" "$after_id" || export_status=$?
@ -805,6 +967,7 @@ scrape_target() {
2) 2)
rm -rf "$temp_dir" rm -rf "$temp_dir"
skipped_channels=$((skipped_channels + 1)) skipped_channels=$((skipped_channels + 1))
record_channel_result "$target_name" "$channel_id" "$guild_label" "${destination_path:-n/a}" SKIPPED "$before_count" 0 "$before_count"
continue continue
;; ;;
*) *)
@ -815,6 +978,7 @@ scrape_target() {
jq empty "$temp_export" >/dev/null 2>&1 || die "Incremental export is not valid JSON: $temp_export" jq empty "$temp_export" >/dev/null 2>&1 || die "Incremental export is not valid JSON: $temp_export"
assert_export_channel_identity "$temp_export" "$channel_id" assert_export_channel_identity "$temp_export" "$channel_id"
guild_label=$(guild_label_from_export "$temp_export")
if [[ -z "$destination_path" ]]; then if [[ -z "$destination_path" ]]; then
destination_path=$(resolve_destination_path "$output_dir" "$channel_id" "$temp_export") destination_path=$(resolve_destination_path "$output_dir" "$channel_id" "$temp_export")
@ -824,11 +988,16 @@ scrape_target() {
latest_batch_count=$(message_count "$temp_export") latest_batch_count=$(message_count "$temp_export")
if [[ ! -f "$destination_path" ]]; then if [[ ! -f "$destination_path" ]]; then
mv "$temp_export" "$destination_path" mv "$temp_export" "$destination_path"
after_count=$(message_count "$destination_path")
record_channel_result "$target_name" "$channel_id" "$guild_label" "$destination_path" CREATED 0 "$after_count" "$after_count"
log " CREATED $destination_path (+$after_count messages, new archive)"
rm -rf "$temp_dir" rm -rf "$temp_dir"
continue continue
fi fi
if (( latest_batch_count == 0 )); then if (( latest_batch_count == 0 )); then
record_channel_result "$target_name" "$channel_id" "$guild_label" "$destination_path" UNCHANGED "$before_count" 0 "$before_count"
log " UNCHANGED $destination_path ($before_count messages, no new export data)"
rm -rf "$temp_dir" rm -rf "$temp_dir"
continue continue
fi fi
@ -838,6 +1007,9 @@ scrape_target() {
jq empty "$temp_merged" >/dev/null 2>&1 || die "Merged export is not valid JSON: $temp_merged" jq empty "$temp_merged" >/dev/null 2>&1 || die "Merged export is not valid JSON: $temp_merged"
assert_export_channel_identity "$temp_merged" "$channel_id" assert_export_channel_identity "$temp_merged" "$channel_id"
commit_merged_export "$destination_path" "$temp_merged" commit_merged_export "$destination_path" "$temp_merged"
after_count=$(message_count "$destination_path")
record_channel_result "$target_name" "$channel_id" "$guild_label" "$destination_path" MERGED "$before_count" "$latest_batch_count" "$after_count"
log " MERGED $destination_path (+$((after_count - before_count)) messages, $before_count$after_count, fetched $latest_batch_count)"
rm -rf "$temp_dir" rm -rf "$temp_dir"
done done
@ -945,6 +1117,9 @@ run_target_mode() {
CACHE_ROOT=$(mktemp -d "${TMPDIR:-/tmp}/dce-scrape.XXXXXX") CACHE_ROOT=$(mktemp -d "${TMPDIR:-/tmp}/dce-scrape.XXXXXX")
trap 'rm -rf "$CACHE_ROOT"' EXIT trap 'rm -rf "$CACHE_ROOT"' EXIT
reset_scrape_summary
log_run_plan "$mode" "$config_path" "${selected_targets[@]}"
local target_json local target_json
for target_json in "${selected_targets[@]}"; do for target_json in "${selected_targets[@]}"; do
if [[ "$mode" == "preflight" ]]; then if [[ "$mode" == "preflight" ]]; then
@ -953,6 +1128,10 @@ run_target_mode() {
scrape_target "$target_json" "$defaults_json" scrape_target "$target_json" "$defaults_json"
fi fi
done done
if [[ "$mode" == "scrape" ]]; then
print_scrape_summary
fi
} }
main() { main() {

View file

@ -11,6 +11,8 @@ DISCOVER_TOKEN="$REPO_ROOT/scripts/discover-discord-token.sh"
VERIFY_SCRIPT="$REPO_ROOT/scripts/verify-documents-archives.sh" VERIFY_SCRIPT="$REPO_ROOT/scripts/verify-documents-archives.sh"
VERIFY_READY="$REPO_ROOT/scripts/verify-operator-ready.sh" VERIFY_READY="$REPO_ROOT/scripts/verify-operator-ready.sh"
SETUP_AUTH="$REPO_ROOT/scripts/setup-scrape-auth.sh" SETUP_AUTH="$REPO_ROOT/scripts/setup-scrape-auth.sh"
# shellcheck source=lib/scrape-run-plan.sh
source "$SCRIPT_DIR/lib/scrape-run-plan.sh"
usage() { usage() {
cat <<EOF cat <<EOF
@ -70,6 +72,12 @@ main() {
"$VERIFY_SCRIPT" --config "$CONFIG_PATH" "$VERIFY_SCRIPT" --config "$CONFIG_PATH"
local -a plan_targets=()
if [[ -n "$target" ]]; then
plan_targets=("$target")
fi
print_scrape_config_plan "$CONFIG_PATH" "Documents scrape" "${plan_targets[@]}"
if (( dry_run == 1 )); then if (( dry_run == 1 )); then
printf 'Dry run complete: archive paths verified. Export DISCORD_TOKEN or create a token file, then rerun without --dry-run.\n' printf 'Dry run complete: archive paths verified. Export DISCORD_TOKEN or create a token file, then rerun without --dry-run.\n'
exit 0 exit 0

View file

@ -10,8 +10,10 @@ DOCUMENTS="$REPO_ROOT/scripts/run-documents-scrape.sh"
PROVE="$REPO_ROOT/scripts/prove-incremental-append.sh" PROVE="$REPO_ROOT/scripts/prove-incremental-append.sh"
SYNC_GUI="$REPO_ROOT/scripts/sync-token-from-gui.sh" SYNC_GUI="$REPO_ROOT/scripts/sync-token-from-gui.sh"
LOG_DIR="$REPO_ROOT/logs" LOG_DIR="$REPO_ROOT/logs"
# shellcheck source=lib/scrape-run-plan.sh
source "$SCRIPT_DIR/lib/scrape-run-plan.sh"
TARGET="eod_discord" TARGET=""
SYNC_GUI_FLAG=0 SYNC_GUI_FLAG=0
DRY_RUN=0 DRY_RUN=0
@ -20,9 +22,11 @@ usage() {
Usage: Usage:
$(basename "$0") [--target NAME] [--config PATH] [--sync-gui] [--dry-run] $(basename "$0") [--target NAME] [--config PATH] [--sync-gui] [--dry-run]
End-to-end operator proof for one target: End-to-end operator proof:
operator-handoff → incremental scrape → prove-incremental-append operator-handoff → incremental scrape → prove-incremental-append
When --target is omitted, all enabled targets in the config are processed.
Logs append to logs/operator-proof-<timestamp>.log Logs append to logs/operator-proof-<timestamp>.log
EOF EOF
} }
@ -63,13 +67,30 @@ main() {
esac esac
done done
[[ -f "$CONFIG_PATH" ]] || die "Missing config: $CONFIG_PATH"
local -a targets=()
if [[ -n "$TARGET" ]]; then
targets=("$TARGET")
else
mapfile -t targets < <(enabled_target_names "$CONFIG_PATH")
((${#targets[@]} > 0)) || die "No enabled targets in $CONFIG_PATH"
fi
mkdir -p "$LOG_DIR" mkdir -p "$LOG_DIR"
local log_file local log_file
log_file="$LOG_DIR/operator-proof-$(date -u +%Y%m%dT%H%M%SZ).log" log_file="$LOG_DIR/operator-proof-$(date -u +%Y%m%dT%H%M%SZ).log"
local failed=0 succeeded=0 name
{ {
printf 'Operator proof for target %s\n' "$TARGET" if [[ -n "$TARGET" ]]; then
printf 'Operator proof for target %s\n' "$TARGET"
else
printf 'Operator proof for %s enabled target(s)\n' "${#targets[@]}"
fi
printf 'config: %s\n' "$CONFIG_PATH" printf 'config: %s\n' "$CONFIG_PATH"
print_scrape_config_plan "$CONFIG_PATH" "Operator proof" "${targets[@]}"
printf 'started: %s\n\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" printf 'started: %s\n\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
if (( SYNC_GUI_FLAG == 1 )); then if (( SYNC_GUI_FLAG == 1 )); then
@ -77,17 +98,27 @@ main() {
"$SYNC_GUI" --force "$SYNC_GUI" --force
fi fi
"$HANDOFF" --config "$CONFIG_PATH"
if (( DRY_RUN == 1 )); then if (( DRY_RUN == 1 )); then
"$HANDOFF" --config "$CONFIG_PATH"
printf '\nDry run complete (no Discord scrape).\n' printf '\nDry run complete (no Discord scrape).\n'
exit 0 exit 0
fi fi
"$HANDOFF" --config "$CONFIG_PATH" for name in "${targets[@]}"; do
"$DOCUMENTS" --config "$CONFIG_PATH" --target "$TARGET" printf '\n--- Target: %s ---\n' "$name"
"$PROVE" --config "$CONFIG_PATH" --target "$TARGET" if "$DOCUMENTS" --config "$CONFIG_PATH" --target "$name" && "$PROVE" --config "$CONFIG_PATH" --target "$name"; then
succeeded=$((succeeded + 1))
printf 'Operator proof passed for %s\n' "$name"
else
failed=$((failed + 1))
printf 'Operator proof FAILED for %s\n' "$name" >&2
fi
done
printf '\nOperator proof succeeded for %s\n' "$TARGET" printf '\nOperator proof summary: %s succeeded, %s failed (of %s)\n' \
"$succeeded" "$failed" "${#targets[@]}"
(( failed == 0 )) || exit 1
} 2>&1 | tee "$log_file" } 2>&1 | tee "$log_file"
printf 'Log: %s\n' "$log_file" printf 'Log: %s\n' "$log_file"

View file

@ -11,8 +11,9 @@ cleanup() {
trap cleanup EXIT trap cleanup EXIT
FAKE_REPO="$TMP_DIR/fake-repo" FAKE_REPO="$TMP_DIR/fake-repo"
mkdir -p "$FAKE_REPO/scripts" mkdir -p "$FAKE_REPO/scripts/lib"
cp "$REPO_ROOT/scripts/run-discord-scrape-host.sh" "$FAKE_REPO/scripts/" cp "$REPO_ROOT/scripts/run-discord-scrape-host.sh" "$FAKE_REPO/scripts/"
cp "$REPO_ROOT/scripts/lib/scrape-run-plan.sh" "$FAKE_REPO/scripts/lib/"
chmod +x "$FAKE_REPO/scripts/run-discord-scrape-host.sh" chmod +x "$FAKE_REPO/scripts/run-discord-scrape-host.sh"
COMPOSE_FILE="$TMP_DIR/docker-compose.yml" COMPOSE_FILE="$TMP_DIR/docker-compose.yml"
@ -73,7 +74,12 @@ DCE_REPO_ROOT="$REPO_ROOT" \
DISCORD_TOKEN=dummy \ DISCORD_TOKEN=dummy \
"$PROVE" --config "$TMP_DIR/config.json" --target demo >/dev/null "$PROVE" --config "$TMP_DIR/config.json" --target demo >/dev/null
"$REPO_ROOT/scripts/run-documents-scrape.sh" --dry-run --config "$TMP_DIR/config.json" >/dev/null DOC_OUT="$TMP_DIR/documents-dry-run.log"
"$REPO_ROOT/scripts/run-documents-scrape.sh" --dry-run --config "$TMP_DIR/config.json" >"$DOC_OUT" 2>&1
grep -q 'Documents scrape run plan' "$DOC_OUT" || {
echo "expected Documents scrape run plan in dry-run output" >&2
exit 1
}
DCE_MIN_FREE_MB=0 DCE_CONFIG_FILE="$TMP_DIR/config.json" \ DCE_MIN_FREE_MB=0 DCE_CONFIG_FILE="$TMP_DIR/config.json" \
"$REPO_ROOT/scripts/verify-operator-ready.sh" --disk-only --config "$TMP_DIR/config.json" \ "$REPO_ROOT/scripts/verify-operator-ready.sh" --disk-only --config "$TMP_DIR/config.json" \

View file

@ -210,7 +210,20 @@ run_wrapper() {
"$REPO_ROOT/scripts/run-discord-scrape.sh" scrape --target "$target_name" "$REPO_ROOT/scripts/run-discord-scrape.sh" scrape --target "$target_name"
} }
run_wrapper demo initial SCRAPE_LOG="$TMP_DIR/scrape.log"
run_wrapper demo initial 2>"$SCRAPE_LOG"
grep -q 'Scrape run plan' "$SCRAPE_LOG" || {
echo "expected Scrape run plan in scrape output" >&2
exit 1
}
grep -q 'Scrape run summary' "$SCRAPE_LOG" || {
echo "expected Scrape run summary in scrape output" >&2
exit 1
}
grep -qE 'CREATED|MERGED|UNCHANGED' "$SCRAPE_LOG" || {
echo "expected channel result line in scrape output" >&2
exit 1
}
DEST="$ARCHIVE_ROOT/demo/$DEFAULT_FILE_NAME" DEST="$ARCHIVE_ROOT/demo/$DEFAULT_FILE_NAME"
[[ -f "$DEST" ]] || { echo "expected destination archive missing" >&2; exit 1; } [[ -f "$DEST" ]] || { echo "expected destination archive missing" >&2; exit 1; }

View file

@ -49,5 +49,9 @@ if [[ "$status" -ne 0 ]] || ! grep -q 'Dry run complete' <<<"$output"; then
printf '%s\n' "$output" >&2 printf '%s\n' "$output" >&2
exit 1 exit 1
fi fi
grep -q 'Operator proof run plan' <<<"$output" || {
echo "expected Operator proof run plan in dry-run output" >&2
exit 1
}
printf 'run-operator-proof-smoke: ok\n' printf 'run-operator-proof-smoke: ok\n'