Commit graph

74 commits

Author SHA1 Message Date
Copilot c8ed19d26b feat(scrape): per-target JSON summaries in multi-target loops
Validation --per-target and multi-target proof now pass --summary-file
per scrape so each target gets its own operator-*-<target>-UTC summary.
2026-06-03 11:08:44 -05:00
Copilot 8c36fdbdda feat(scrape): auto JSON summary on documents scrape runs
Enable DCE_RUN_SUMMARY_JSON by default for live run-documents-scrape
paths with optional --summary-file override; skip on dry-run/salvage-only.
2026-06-03 10:57:32 -05:00
Copilot a929be48e8 feat(scrape): add print-scrape-summary CLI for JSON artifacts
Pretty-print version-1 scrape summary files with totals table, --oom-only
filter, and stdin support for operator validation/proof outputs.
2026-06-03 10:45:55 -05:00
Copilot dbc887d81c feat(scrape): JSON summary export for operator proof runs
Auto-enable DCE_RUN_SUMMARY_* when a proof run scrapes, support --log-file,
and recover summary JSON from the teed proof log when file write fails.
2026-06-03 10:35:48 -05:00
Copilot 35a7416d8f feat(scrape): recover JSON summary from host compose run log
Reuse shared recover helper before deleting the temp compose log when
DCE_RUN_SUMMARY_FILE is missing after a successful host scrape.
2026-06-03 10:30:14 -05:00
Copilot fcea842fe3 feat(scrape): recover JSON summary from teed validation log
When DCE_RUN_SUMMARY_FILE is missing after operator validation, extract
the last DCE_JSON_SUMMARY line from the log. Refresh KotOR operator docs.
2026-06-03 10:25:23 -05:00
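The log-recovery idea in the entry above can be sketched as follows. This is a minimal illustration, not the repository's helper: the marker name DCE_JSON_SUMMARY comes from the commit message, while the function name `recover_summary`, the assumption that the JSON object follows the marker on the same line, and the `totals`/`added` fields in the usage note are hypothetical.

```python
import json
from typing import Optional

# Marker name taken from the commit message; the layout of the rest of the
# line (a JSON object after the marker) is an assumption of this sketch.
MARKER = "DCE_JSON_SUMMARY"


def recover_summary(log_text: str) -> Optional[dict]:
    """Return the JSON object from the last parseable marker line, or None."""
    # Walk the log backwards so the most recent summary line is tried first.
    for line in reversed(log_text.splitlines()):
        marker_pos = line.find(MARKER)
        if marker_pos == -1:
            continue
        json_start = line.find("{", marker_pos)
        if json_start == -1:
            continue
        try:
            return json.loads(line[json_start:])
        except json.JSONDecodeError:
            # A truncated summary line is ignored; an earlier one may be valid.
            continue
    return None
```

Given a teed log containing two marker lines, the call returns the later one, so `recover_summary(log)["totals"]["added"]` would reflect the final summary under the hypothetical field names used here.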
Copilot 5cfb2ed144 feat(scrape): host compose passthrough for JSON summary
Mount logs/ in compose, map DCE_RUN_SUMMARY_FILE to /logs, and auto-enable
JSON summary beside operator-validation log files when scraping.
2026-06-03 10:18:33 -05:00
Copilot 1dda40ae1b feat(scrape): optional JSON run summary for automation
Emit DCE_JSON_SUMMARY log line and/or write DCE_RUN_SUMMARY_FILE
with per-channel actions and totals after scrape completes.
2026-06-03 10:08:44 -05:00
Copilot aa85fe50fa feat(verify): show per-target container_memory in operator checks
Archive verify table adds MEM column; verify-operator-ready lists
config target memory when global DCE_CONTAINER_MEMORY is unset.
2026-06-03 10:00:27 -05:00
Copilot 8ca55f299b feat(scrape): per-target container_memory in scrape config
Single --target runs apply optional container_memory from
scrape-targets.json when global DCE_CONTAINER_MEMORY is unset.
KotOR_discord_msgs defaults to 8g; scrape.env still overrides.
2026-06-03 09:55:33 -05:00
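The precedence described in the entry above (global DCE_CONTAINER_MEMORY wins, otherwise the per-target container_memory applies) might look like this in outline. The function name and the treatment of an empty variable as unset are assumptions of this sketch; only the variable name, the container_memory key, and the 8g default for KotOR_discord_msgs come from the log.

```python
from typing import Mapping, Optional


def resolve_container_memory(
    env: Mapping[str, str], target: Mapping[str, object]
) -> Optional[str]:
    """Pick the memory cap for one target; None means no explicit cap."""
    # Assumption: an empty or whitespace-only value counts as unset.
    global_value = env.get("DCE_CONTAINER_MEMORY", "").strip()
    if global_value:
        return global_value
    per_target = target.get("container_memory")
    return str(per_target) if per_target else None
```

With an empty environment and a target entry carrying `"container_memory": "8g"`, the result is `"8g"`; setting DCE_CONTAINER_MEMORY to any non-empty value overrides it.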
Copilot 3e96514f3e feat(prove): filter incremental snapshots by --channel
Channel-scoped proof runs snapshot and compare only selected archives,
so yes_general-focused validation ignores unrelated KotOR channels.
Smoke covers filtered snapshot-only mode; exclude .dce-temp from find.
2026-06-03 09:44:33 -05:00
Copilot a827e6b9bc feat(scrape): label OOM skips and hint container memory
Classify aborted/OOM export skips as SKIPPED (OOM/aborted) in the run
summary with salvage/memory guidance; verify-operator-ready shows
configured DCE_CONTAINER_MEMORY.
2026-06-03 09:38:45 -05:00
Copilot 69ce1ca539 feat(scrape): optional DCE_CONTAINER_MEMORY compose mem_limit
Operators can raise the scrape container memory cap for large channel
catch-up (e.g. yes_general) via scrape.env without changing default runs.
2026-06-03 09:23:37 -05:00
Copilot a88cd815f4 refactor(scrape): extract shared scrape-lock library
Centralize archive-root lock path, held checks, holder formatting, and
reclaim helpers in scripts/lib/scrape-lock.sh. Source it from the host
runner, lock status script, and operator wrappers to remove duplicated
logic. Update documents-scrape smoke fake repo to include the new lib.
2026-06-03 07:14:22 -05:00
Copilot e82007a2c5 feat(scrape): reclaim stale lock and proof salvage-before smoke
Add --reclaim-stale to scrape-lock-status.sh for dead-holder cleanup and
extend operator-proof smoke to cover --salvage-before-scrape end-to-end.
2026-06-03 07:08:45 -05:00
Copilot b883943e3a feat(scrape): lock gate and salvage-before on documents scrape
Add archive-root lock preflight and --salvage-before-scrape to
run-documents-scrape.sh so direct operator invocations match validation
safety and KotOR catch-up workflow.
2026-06-03 07:04:56 -05:00
Copilot 363749231d feat(scrape): handoff salvage-only mode and validation lock gate
Wire --salvage-only through operator-handoff and run-operator-proof, and
refuse run-operator-validation scrape steps while the archive-root lock
is actively held.
2026-06-03 07:01:33 -05:00
Copilot 682094c348 feat(scrape): add scrape-lock-status diagnostic for archive-root lock
Introduce read-only scrape-lock-status.sh and surface lock state during
operator-handoff so shared Documents archives show active or stale holders
before starting another scrape.
2026-06-03 06:56:56 -05:00
Copilot 22915770e6 feat(scrape): wire salvage flags through operator validation and proof
Expose --salvage-only and --salvage-before-scrape on run-operator-validation.sh
and --salvage-before-scrape on run-operator-proof.sh so operators can merge stale
.dce-temp exports before or instead of incremental Discord scrapes.
2026-06-03 06:52:53 -05:00
Copilot 8468e34e37 feat(scrape): add salvage-only mode for stale temp exports
Expose run-discord-scrape.sh salvage and run-documents-scrape.sh
--salvage-only so operators can merge quiescent .dce-temp partials without
re-downloading from Discord after stopping a long or OOM-aborted run.
2026-06-03 06:48:39 -05:00
Copilot 2c01b3a7b9 fix(scrape): flock on archive_root with lock holder diagnostics
Serialize scrapes across repo checkouts that share the same Documents
archive_root, write lock meta with pid/cmd, and reclaim when the holder
process is dead.
2026-06-03 06:39:46 -05:00
Copilot 8057a4443a feat(operators): pass --channel through handoff, prove, and proof
Complete the operator CLI chain so focused yes_general catch-up can run
end-to-end with --target and --channel on handoff dry-run, prove scrape,
and operator-proof orchestration.
2026-06-03 06:36:02 -05:00
Copilot ae120c916f fix(scrape): skip active stale temps and retry salvage merge
Avoid salvaging export.json while a channel export is still writing
(default: skip temps modified within 120s). Retry truncate+merge once
when merge fails on a quiescent partial temp. Adds active-skip smoke
and ages stale fixture mtime so salvage tests stay deterministic.
2026-06-03 06:31:02 -05:00
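The active-temp check in the entry above comes down to an mtime comparison. A minimal sketch, assuming the 120-second default from the commit message; the function names are hypothetical, and whether the real script treats an age of exactly 120 seconds as active is not stated in the log.

```python
import os
import time
from typing import Optional

ACTIVE_WINDOW_SECONDS = 120  # default window named in the commit message


def should_skip_active_temp(
    mtime: float, now: float, window: float = ACTIVE_WINDOW_SECONDS
) -> bool:
    """True when the temp was modified within the window and may still be written."""
    return (now - mtime) < window


def temp_is_active(path: str, now: Optional[float] = None) -> bool:
    """Filesystem wrapper: compare the file's mtime against the current time."""
    current = time.time() if now is None else now
    return should_skip_active_temp(os.path.getmtime(path), current)
```

Keeping the comparison in a pure function makes it testable without touching the filesystem, which is the same reason the commit ages the stale fixture's mtime to keep its smoke deterministic.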
Copilot 14796e9c09 fix(host): stop forcing compose -T so operator logs stream live
podman-compose and docker compose allocate a pseudo-TTY by default;
always passing -T left export progress block-buffered. Omit -T for operator
runs and set DCE_COMPOSE_TTY=0 only for cron log append. Adds compose
TTY smokes and cron job env assertion.
2026-06-03 06:23:12 -05:00
Copilot d8742c5c7b feat(operator): pass --channel through documents and validation wrappers
Operators can narrow KotOR yes_general catch-up to a single channel ID
without re-scraping every guild channel. Adds dry-run and compose arg
capture smokes for the passthrough path.
2026-06-03 06:18:08 -05:00
Copilot 5820f67caf feat(host): stream container scrape output during long runs
Use tee in run_subcommand_with_retry so validation logs show live export
progress instead of buffering until compose exits. Add streaming smoke
assertion and skip scrape lock in documents-scrape smoke when prove runs
against the real repo root during parallel validation.
2026-06-03 06:13:58 -05:00
Copilot ee62078f5b fix(scrape): skip SIGTERM/SIGINT export aborts like OOM
Stopping validation with kill/Ctrl+C returned exit 143/130 and failed
the whole target instead of marking it SKIPPED and preserving the partial. Added smoke for
exit 143; gitignore .dce-scrape.lock.
2026-06-03 06:06:15 -05:00
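The exit codes named in this log (143/130 here, 134/137/139 in the earlier OOM/abort fix) follow the shell convention of 128 plus the signal number. A small sketch of that classification, using Linux signal numbers; the function names and the three result labels are illustrative, not the script's actual output.

```python
# Exit statuses this log treats as skippable rather than fatal.
SKIPPABLE_EXIT_CODES = {130, 134, 137, 139, 143}

# Linux signal numbers for the signals behind those statuses.
SIGNAL_NAMES = {2: "SIGINT", 6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def signal_name(exit_code: int) -> str:
    """Decode a 128+N exit status into a signal name, or 'none' if it is not one."""
    return SIGNAL_NAMES.get(exit_code - 128, "none")


def classify_export_exit(exit_code: int) -> str:
    """Return 'ok', 'skipped', or 'failed' for one channel export."""
    if exit_code == 0:
        return "ok"
    if exit_code in SKIPPABLE_EXIT_CODES:
        return "skipped"
    return "failed"
```

For example, 137 decodes to SIGKILL, the signal typically sent by the kernel OOM killer, and classifies as skipped, while an ordinary exit status of 1 still fails the target.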
Copilot b9bb4bbe64 fix(host): flock scrape lock prevents concurrent container exports
Overlapping run-operator-validation invocations spawned twin yes_general
exports and repeated OOM skips. Host scrape now holds .dce-scrape.lock;
smokes bypass via DCE_SKIP_SCRAPE_LOCK. Added lock smoke (20/20 pass).
2026-06-03 06:03:47 -05:00
Copilot 928c0ef682 fix(audit): exclude .dce-temp partial exports from JSON audit
Operator validation failed when yes_general OOM left truncated exports
under .dce-temp. Audit and archive verification now skip in-progress temps;
smoke covers the partial-temp case. KotOR audit passes with temps present.
2026-06-03 05:59:54 -05:00
Copilot 8b54b6a498 test(scrape): preserve-partial smoke; fix host token-file precedence
Add offline regression for OOM skip preserving partial export temps.
Host wrapper now prefers DISCORD_TOKEN_FILE over inherited shell tokens
and always writes explicit compose env for auth-retry. All 19 smokes pass.
2026-06-03 05:52:39 -05:00
Copilot 87537eb8b0 fix(scrape): preserve partial temps on OOM; large-file salvage merge
OOM/aborted channel exports no longer delete partial temp downloads.
Salvage uses grep boundary repair with python merge/validate for files
over 64 MiB. Retain stale temps when merge fails instead of discarding.
2026-06-03 05:35:22 -05:00
Copilot c13c4167be fix(scrape): salvage stale temp exports before re-downloading
When a previous export crashes (OOM, abort, kill), the partially-
downloaded temp export under .dce-temp/ was orphaned. Subsequent
runs started the incremental from the archive's last message ID,
re-downloading everything the failed run had already fetched.

Now scrape_target() checks for orphaned temp exports before each
channel export, salvages truncated JSON (same marker-based repair
as salvage-truncated-export.sh), merges recovered messages into
the archive, and cleans up stale temp dirs. The incremental then
starts from the truly latest message.

Adds salvage-stale smoke test with truncated fixture.
2026-06-03 01:11:28 -05:00
Copilot 87284816d0 test(scrape): add abort exit 134 skip smoke; plan 041 closure
Extend run-discord-scrape-smoke with skip-abort target so OOM/abort
channel skip from plan 040 has offline regression coverage. Update
merge-readiness for 2026-05-30 and KotOR validation retry in progress.
2026-06-03 00:57:11 -05:00
Copilot 1608e7cfb0 fix(scrape): skip channels on OOM/abort export exit codes
Treat CLI exit 134/137/139 and abort/OOM log patterns as skippable
so KotOR yes_general core dump does not fail the entire target scrape.
2026-06-03 00:44:06 -05:00
Copilot bc1f727907 feat(scrape): complete validation resume (8/9 targets)
Resume per-target validation for five remaining servers; clarify
validation log labels (begin/done/failed). Document 8/9 pass in
merge-readiness; KotOR_discord_msgs fails on yes_general export.
2026-05-29 23:35:35 -05:00
Copilot 71a443267e feat(scrape): run plan, channel ledger, and all-target proof
Log scrape plan/summary with per-file message deltas in the core script.
Host wrappers and operator entrypoints print target lists; operator-proof
defaults to all enabled targets when --target is omitted.
2026-05-29 20:34:22 -05:00
Boden 1e35761dbb test(scrape): lock mixed-length snowflake cursor selection
Add cursor-mixed-length smoke where string max_by would pick the wrong
--after value; padded sort_by in last_message_id already picks the max.
2026-05-29 16:33:00 -05:00
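The failure mode this smoke locks in is easy to reproduce: comparing decimal IDs as strings ranks a shorter ID starting with 9 above a longer ID starting with 1. The sketch below shows a padded comparison in Python; the log's max_by and sort_by read like jq functions, so this is an analogue under that assumption rather than the script's code, and the sample IDs are made up.

```python
from typing import Optional, Sequence


def last_message_id(ids: Sequence[str]) -> Optional[str]:
    """Return the numerically largest decimal ID, compared as zero-padded strings."""
    if not ids:
        return None
    width = max(len(i) for i in ids)
    # Left-padding to a common width makes string order match numeric order.
    return max(ids, key=lambda s: s.zfill(width))


# Made-up IDs of different lengths: 18 digits versus 19 digits.
sample = ["999999999999999999", "1000000000000000000"]
naive = max(sample)                # plain string comparison picks the shorter ID
correct = last_message_id(sample)  # padded comparison picks the longer, larger ID
```

Here `naive` is the 18-digit ID and `correct` is the 19-digit one, which is the value an incremental --after cursor should use.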
Boden 65c9fb2206 feat(scrape): operator proof script and podman-compose smoke fix
Add run-operator-proof for one-target handoff/scrape/prove flows.
Prefer podman-compose on Podman hosts but honor DCE_DOCKER_BIN overrides
so offline smokes keep using fake compose shims.
2026-05-29 16:20:25 -05:00
Boden 3fd42d924e feat(scrape): sync GUI bridge doc into linux-x64 folder
Add sync-gui-bridge-doc.sh to copy the versioned bridge markdown beside
the GUI zip; includes offline smoke test (18 total via run-all-smokes).
2026-05-29 16:10:17 -05:00
Boden c4371c9f7d docs(scrape): LFG closure — operator handoff in setup guides
Point README and Recurring-Scrape-Setup at operator-handoff.sh, document
disk preflight, and harden operator-handoff smoke against pipefail flakes.
2026-05-29 16:05:42 -05:00
Boden c0818715a8 feat(scrape): add operator-handoff verification script
Single entrypoint runs disk summary, verify-operator-ready, and
run-documents-scrape --dry-run before cron or full scrapes.
2026-05-29 16:03:22 -05:00
Boden 44eadee634 feat(scrape): disk preflight on host runner for cron jobs
run-discord-scrape-host.sh runs verify --disk-only before preflight/scrape
so setup-cron monthly jobs fail fast when archive roots are low on space.
Harden bootstrap smoke to surface failures when dry-run fails.
2026-05-29 16:00:11 -05:00
Boden 32b7f47d45 feat(scrape): disk-only verify flag and documents scrape preflight
run-documents-scrape.sh checks archive free space before live Discord
calls; verify-operator-ready.sh adds --disk-only for lightweight checks.
2026-05-29 15:54:08 -05:00
Boden 1142e376b5 fix(scrape): disk preflight before compose and skippable disk errors
Fail fast when archive or repo paths lack free space (DCE_MIN_FREE_MB),
treat disk-full export failures as skippable channels, and add an offline
disk-space smoke. Smokes default DCE_MIN_FREE_MB=0 so CI stays portable.
2026-05-29 15:27:39 -05:00
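A disk preflight of the kind described above can be sketched with the standard library. The threshold variable DCE_MIN_FREE_MB and the convention that 0 disables the check for smokes come from the commit message; the function names and return shape are hypothetical.

```python
import shutil
from typing import Iterable, List, Tuple


def free_mb(path: str) -> int:
    """Free space on the filesystem holding path, in whole MiB."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def disk_preflight(paths: Iterable[str], min_free_mb: int) -> List[Tuple[str, int]]:
    """Return (path, free MiB) for every path below the threshold."""
    if min_free_mb <= 0:
        # A threshold of 0 turns the check off, as the smokes do.
        return []
    low = []
    for path in paths:
        available = free_mb(path)
        if available < min_free_mb:
            low.append((path, available))
    return low
```

A caller would fail fast when the returned list is non-empty, before any compose invocation or live Discord call is made.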
Boden a4fd78275d fix(scrape): avoid stdin drain in per-target validation loop
Docker compose was consuming the while-read target list after the first
server. Load enabled target names with mapfile before any scrape runs.
2026-05-29 14:23:30 -05:00
Boden 76b4231d7a feat(scrape): per-target validation with continue-on-error
Run scrape and audit per enabled server independently; log summary
counts. Full host validation started via --per-target --continue-on-error.
2026-05-29 14:20:37 -05:00
Boden 1742a9d41e feat(scrape): add run-operator-validation orchestrator
Sync GUI token, verify readiness, run documents scrape, and audit JSON
with timestamped logs. Live eod_discord validation passed on host.
2026-05-29 14:19:04 -05:00
Boden 00bcbc5b21 feat(scrape): add verify-operator-ready host checks
One command validates compose, auth, config, and seeded archives before
bootstrap or cron. Includes offline smoke test (14 smokes total).
2026-05-29 14:16:10 -05:00
Boden 89091d76ef feat(scrape): add run-all-smokes.sh and align CI docs
Single entrypoint for offline smokes; CI uses it instead of listing each
script. Update setup guide and operator checklist with full smoke table.
2026-05-29 14:13:31 -05:00
Boden 10cd2a534d feat(scrape): offline prove smoke and snapshot-only mode
Add --snapshot-only and --compare-snapshots to prove-incremental-append,
fix EXIT trap status, wire prove smoke into CI, and document LFG closure plan.
2026-05-29 14:12:18 -05:00