Commit graph

84 commits

Author SHA1 Message Date
Copilot 18a22a6fcf docs(scrape): plan 082 PR body prune for stale Latest blocks
Record merge-readiness stamp before rebuilding PR #1538 body.
2026-06-03 11:39:13 -05:00
Copilot 3d65c0e8e5 feat(scrape): cron opt-in salvage-before-scrape
setup-cron.sh forwards --salvage-before-scrape to documents scrape for
operators recovering from OOM partials on scheduled runs.
2026-06-03 11:35:50 -05:00
Copilot 8684138363 docs(scrape): plan 080 PR body sync for plans 070-079
Record branch HEAD and offline gate in merge-readiness before updating
PR #1538 with the compact operator delta block.
2026-06-03 11:31:27 -05:00
Copilot b71c697530 feat(scrape): cron uses documents scrape with --log-file
Monthly cron now runs the unified documents workflow with teed logs
and paired JSON summaries instead of host scrape shell redirect.
2026-06-03 11:27:12 -05:00
Copilot 759e33efe9 feat(scrape): add --log-file tee to documents scrape
Live runs auto-write logs/documents-scrape-UTC.log and pair JSON
summary with the log basename; optional --log-file overrides the path.
2026-06-03 11:21:59 -05:00
Copilot 33faba74d6 docs(scrape): sync smoke inventory to 23 offline tests
Add print-scrape-summary and scrape-summary-json smokes to the setup
table and update merge-readiness gate count from 21/21 to 23/23.
2026-06-03 11:15:17 -05:00
Copilot c8ed19d26b feat(scrape): per-target JSON summaries in multi-target loops
Validation --per-target and multi-target proof now pass --summary-file
per scrape so each target gets its own operator-*-<target>-UTC summary.
2026-06-03 11:08:44 -05:00
Copilot 8c36fdbdda feat(scrape): auto JSON summary on documents scrape runs
Enable DCE_RUN_SUMMARY_JSON by default for live run-documents-scrape
paths with optional --summary-file override; skip on dry-run/salvage-only.
2026-06-03 10:57:32 -05:00
Copilot a929be48e8 feat(scrape): add print-scrape-summary CLI for JSON artifacts
Pretty-print version-1 scrape summary files with totals table, --oom-only
filter, and stdin support for operator validation/proof outputs.
2026-06-03 10:45:55 -05:00
Copilot dbc887d81c feat(scrape): JSON summary export for operator proof runs
Auto-enable DCE_RUN_SUMMARY_* when a proof run scrapes, support --log-file,
and recover summary JSON from the teed proof log when the file write fails.
2026-06-03 10:35:48 -05:00
Copilot 35a7416d8f feat(scrape): recover JSON summary from host compose run log
Reuse shared recover helper before deleting the temp compose log when
DCE_RUN_SUMMARY_FILE is missing after a successful host scrape.
2026-06-03 10:30:14 -05:00
Copilot fcea842fe3 feat(scrape): recover JSON summary from teed validation log
When DCE_RUN_SUMMARY_FILE is missing after operator validation, extract
the last DCE_JSON_SUMMARY line from the log. Refresh KotOR operator docs.
2026-06-03 10:25:23 -05:00
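The recovery step described in this commit (take the last DCE_JSON_SUMMARY line from the log) might look roughly like the Python sketch below. Only the marker name comes from the commit; the line layout (marker followed by a JSON object on the same line), the function name, and the fallback to earlier marker lines when the last one does not parse are assumptions made for illustration, not the repository's helper.

```python
import json
from typing import Optional

MARKER = "DCE_JSON_SUMMARY"  # marker name from the commit; the line layout is assumed


def recover_last_summary(log_text: str) -> Optional[dict]:
    """Return the JSON object from the last parseable marker line, or None."""
    # Walk the log backwards so the most recent summary wins.
    for line in reversed(log_text.splitlines()):
        _, found, tail = line.partition(MARKER)
        if not found:
            continue
        start = tail.find("{")
        if start == -1:
            continue
        try:
            return json.loads(tail[start:])
        except json.JSONDecodeError:
            # Assumption: a truncated or malformed line is skipped, not fatal.
            continue
    return None
```

Scanning from the end keeps the sketch cheap on long logs and means a retried scrape that logs two summaries yields the later one.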
Copilot 5cfb2ed144 feat(scrape): host compose passthrough for JSON summary
Mount logs/ in compose, map DCE_RUN_SUMMARY_FILE to /logs, and auto-enable
JSON summary beside operator-validation log files when scraping.
2026-06-03 10:18:33 -05:00
Copilot 1dda40ae1b feat(scrape): optional JSON run summary for automation
Emit DCE_JSON_SUMMARY log line and/or write DCE_RUN_SUMMARY_FILE
with per-channel actions and totals after scrape completes.
2026-06-03 10:08:44 -05:00
Copilot aa85fe50fa feat(verify): show per-target container_memory in operator checks
Archive verify table adds MEM column; verify-operator-ready lists
config target memory when global DCE_CONTAINER_MEMORY is unset.
2026-06-03 10:00:27 -05:00
Copilot 8ca55f299b feat(scrape): per-target container_memory in scrape config
Single --target runs apply optional container_memory from
scrape-targets.json when global DCE_CONTAINER_MEMORY is unset.
KotOR_discord_msgs defaults to 8g; scrape.env still overrides.
2026-06-03 09:55:33 -05:00
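The precedence stated in this commit (global DCE_CONTAINER_MEMORY from scrape.env wins, otherwise the per-target container_memory applies) can be sketched as below. The function name, the flat shape of the target entry, and treating an empty value as unset are assumptions; the real scrape-targets.json layout is not shown in the log.

```python
from typing import Mapping, Optional


def resolve_container_memory(
    env: Mapping[str, str], target_config: Mapping[str, str]
) -> Optional[str]:
    """Pick the memory cap for one target, or None to keep the default run."""
    # A global override always wins (assumption: an empty string counts as unset).
    global_value = env.get("DCE_CONTAINER_MEMORY", "").strip()
    if global_value:
        return global_value
    # Otherwise fall back to the optional per-target setting.
    return target_config.get("container_memory") or None
```

With a target entry of `{"container_memory": "8g"}` and no global value, the sketch returns `"8g"`; setting the global variable to `"12g"` returns `"12g"` instead.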
Copilot 3e96514f3e feat(prove): filter incremental snapshots by --channel
Channel-scoped proof runs snapshot and compare only selected archives,
so yes_general-focused validation ignores unrelated KotOR channels.
Smoke covers filtered snapshot-only mode; exclude .dce-temp from find.
2026-06-03 09:44:33 -05:00
Copilot a827e6b9bc feat(scrape): label OOM skips and hint container memory
Classify aborted/OOM export skips as SKIPPED (OOM/aborted) in the run
summary with salvage/memory guidance; verify-operator-ready shows
configured DCE_CONTAINER_MEMORY.
2026-06-03 09:38:45 -05:00
Copilot e9a3fea9d1 docs(scrape): add OOM, lock, and salvage troubleshooting
Document container OOM skips, scrape-lock contention, partial temp
salvage, and DCE_CONTAINER_MEMORY in the troubleshooting guide and
GUI bridge quick-start.
2026-06-03 09:32:31 -05:00
Copilot 69ce1ca539 feat(scrape): optional DCE_CONTAINER_MEMORY compose mem_limit
Operators can raise the scrape container memory cap for large channel
catch-up (e.g. yes_general) via scrape.env without changing default runs.
2026-06-03 09:23:37 -05:00
Copilot 88267c835c docs(scrape): complete offline smoke inventory in setup guide
Align Recurring-Scrape-Setup smoke table with all 21 offline scripts
and note plan 061 shared scrape-lock library in merge-readiness.
2026-06-03 09:14:04 -05:00
Copilot a88cd815f4 refactor(scrape): extract shared scrape-lock library
Centralize archive-root lock path, held checks, holder formatting, and
reclaim helpers in scripts/lib/scrape-lock.sh. Source it from the host
runner, lock status script, and operator wrappers to remove duplicated
logic. Update documents-scrape smoke fake repo to include the new lib.
2026-06-03 07:14:22 -05:00
Copilot ad5384ecc1 docs(scrape): add salvage and lock operator playbook
Document scrape-lock-status, reclaim-stale, and salvage-before flags in
operator checklist, merge-readiness, and GUI bridge guide.
2026-06-03 07:10:18 -05:00
Copilot e82007a2c5 feat(scrape): reclaim stale lock and proof salvage-before smoke
Add --reclaim-stale to scrape-lock-status.sh for dead-holder cleanup and
extend operator-proof smoke to cover --salvage-before-scrape end-to-end.
2026-06-03 07:08:45 -05:00
Copilot b883943e3a feat(scrape): lock gate and salvage-before on documents scrape
Add archive-root lock preflight and --salvage-before-scrape to
run-documents-scrape.sh so direct operator invocations match validation
safety and KotOR catch-up workflow.
2026-06-03 07:04:56 -05:00
Copilot 363749231d feat(scrape): handoff salvage-only mode and validation lock gate
Wire --salvage-only through operator-handoff and run-operator-proof, and
refuse run-operator-validation scrape steps while the archive-root lock
is actively held.
2026-06-03 07:01:33 -05:00
Copilot 682094c348 feat(scrape): add scrape-lock-status diagnostic for archive-root lock
Introduce read-only scrape-lock-status.sh and surface lock state during
operator-handoff so shared Documents archives show active or stale holders
before starting another scrape.
2026-06-03 06:56:56 -05:00
Copilot 22915770e6 feat(scrape): wire salvage flags through operator validation and proof
Expose --salvage-only and --salvage-before-scrape on run-operator-validation.sh
and --salvage-before-scrape on run-operator-proof.sh so operators can merge stale
.dce-temp exports before or instead of incremental Discord scrapes.
2026-06-03 06:52:53 -05:00
Copilot 8468e34e37 feat(scrape): add salvage-only mode for stale temp exports
Expose run-discord-scrape.sh salvage and run-documents-scrape.sh
--salvage-only so operators can merge quiescent .dce-temp partials without
re-downloading from Discord after stopping a long or OOM-aborted run.
2026-06-03 06:48:39 -05:00
Copilot 2c01b3a7b9 fix(scrape): flock on archive_root with lock holder diagnostics
Serialize scrapes across repo checkouts that share the same Documents
archive_root, write lock meta with pid/cmd, and reclaim when the holder
process is dead.
2026-06-03 06:39:46 -05:00
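As a rough illustration of the locking idea in this commit, the Python sketch below takes a non-blocking exclusive flock and records the holder's pid. It is not the repository's shell implementation: the function names are hypothetical, and writing the pid into the lock file itself (rather than a separate meta file) is an assumption. It relies on POSIX-only APIs.

```python
import fcntl
import os


def try_lock(lock_path: str):
    """Return an open file holding an exclusive lock, or None if it is held."""
    handle = open(lock_path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # Record who holds the lock so other processes can report it.
    handle.seek(0)
    handle.truncate()
    handle.write(f"pid={os.getpid()}\n")
    handle.flush()
    return handle


def pid_alive(pid: int) -> bool:
    """Check whether a process exists without sending it a real signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

The kernel drops a flock when its holder exits, so a liveness check like `pid_alive` mainly helps decide whether leftover holder metadata is stale. Closing the returned file releases the lock.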
Copilot 8057a4443a feat(operators): pass --channel through handoff, prove, and proof
Complete the operator CLI chain so focused yes_general catch-up can run
end-to-end with --target and --channel on handoff dry-run, prove scrape,
and operator-proof orchestration.
2026-06-03 06:36:02 -05:00
Copilot ae120c916f fix(scrape): skip active stale temps and retry salvage merge
Avoid salvaging export.json while a channel export is still writing
(default: skip temps modified within 120s). Retry truncate+merge once
when merge fails on a quiescent partial temp. Adds active-skip smoke
and ages stale fixture mtime so salvage tests stay deterministic.
2026-06-03 06:31:02 -05:00
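The "modified within 120s" guard from this commit can be sketched as an mtime age check. The function name and the inclusive boundary are assumptions; only the 120-second default comes from the commit message.

```python
import os
import time
from typing import Optional


def is_quiescent(
    path: str, min_age_seconds: float = 120.0, now: Optional[float] = None
) -> bool:
    """Return True when the file was last modified at least min_age_seconds ago."""
    # Allow the caller to inject "now" so tests do not depend on the wall clock.
    current = time.time() if now is None else now
    return current - os.path.getmtime(path) >= min_age_seconds
```

A salvage pass would skip any temp export for which this returns False, since a recent mtime suggests another export is still writing it. Backdating a fixture's mtime, as the commit does, keeps such a check deterministic in tests.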
Copilot 14796e9c09 fix(host): stop forcing compose -T so operator logs stream live
podman-compose and docker compose allocate a pseudo-TTY by default;
always passing -T disabled it, so export progress was block-buffered.
Omit -T for operator runs and set DCE_COMPOSE_TTY=0 only for cron log
append. Adds compose TTY smokes and a cron job env assertion.
2026-06-03 06:23:12 -05:00
Copilot d8742c5c7b feat(operator): pass --channel through documents and validation wrappers
Operators can narrow KotOR yes_general catch-up to a single channel ID
without re-scraping every guild channel. Adds dry-run and compose arg
capture smokes for the passthrough path.
2026-06-03 06:18:08 -05:00
Copilot 5820f67caf feat(host): stream container scrape output during long runs
Use tee in run_subcommand_with_retry so validation logs show live export
progress instead of buffering until compose exits. Add streaming smoke
assertion and skip scrape lock in documents-scrape smoke when prove runs
against the real repo root during parallel validation.
2026-06-03 06:13:58 -05:00
Copilot ee62078f5b fix(scrape): skip SIGTERM/SIGINT export aborts like OOM
Stopping validation with kill/Ctrl+C returned exit 143/130 and failed
the whole target instead of reporting SKIPPED and preserving the partial
export. Added smoke for exit 143; gitignore .dce-scrape.lock.
2026-06-03 06:06:15 -05:00
Copilot b9bb4bbe64 fix(host): flock scrape lock prevents concurrent container exports
Overlapping run-operator-validation invocations spawned twin yes_general
exports and repeated OOM skips. Host scrape now holds .dce-scrape.lock;
smokes bypass via DCE_SKIP_SCRAPE_LOCK. Added lock smoke (20/20 pass).
2026-06-03 06:03:47 -05:00
Copilot 928c0ef682 fix(audit): exclude .dce-temp partial exports from JSON audit
Operator validation failed when yes_general OOM left truncated exports
under .dce-temp. Audit and archive verification now skip in-progress temps;
smoke covers the partial-temp case. KotOR audit passes with temps present.
2026-06-03 05:59:54 -05:00
Copilot 8b54b6a498 test(scrape): preserve-partial smoke; fix host token-file precedence
Add offline regression for OOM skip preserving partial export temps.
Host wrapper now prefers DISCORD_TOKEN_FILE over inherited shell tokens
and always writes explicit compose env for auth-retry. All 19 smokes pass.
2026-06-03 05:52:39 -05:00
Copilot 87537eb8b0 fix(scrape): preserve partial temps on OOM; large-file salvage merge
OOM/aborted channel exports no longer delete partial temp downloads.
Salvage uses grep boundary repair with python merge/validate for files
over 64 MiB. Retain stale temps when merge fails instead of discarding.
2026-06-03 05:35:22 -05:00
Copilot c13c4167be fix(scrape): salvage stale temp exports before re-downloading
When a previous export crashed (OOM, abort, kill), the partially
downloaded temp export under .dce-temp/ was orphaned. Subsequent
runs started the incremental from the archive's last message ID,
re-downloading everything the failed run had already fetched.

Now scrape_target() checks for orphaned temp exports before each
channel export, salvages truncated JSON (same marker-based repair
as salvage-truncated-export.sh), merges recovered messages into
the archive, and cleans up stale temp dirs. The incremental then
starts from the truly latest message.

Adds salvage-stale smoke test with truncated fixture.
2026-06-03 01:11:28 -05:00
Copilot 87284816d0 test(scrape): add abort exit 134 skip smoke; plan 041 closure
Extend run-discord-scrape-smoke with skip-abort target so OOM/abort
channel skip from plan 040 has offline regression coverage. Update
merge-readiness for 2026-05-30 and KotOR validation retry in progress.
2026-06-03 00:57:11 -05:00
Copilot 1608e7cfb0 fix(scrape): skip channels on OOM/abort export exit codes
Treat CLI exit 134/137/139 and abort/OOM log patterns as skippable
so KotOR yes_general core dump does not fail the entire target scrape.
2026-06-03 00:44:06 -05:00
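The commit does not say where 134/137/139 come from. On POSIX systems a shell reports a process killed by signal N as exit status 128 + N, which maps these codes to SIGABRT, SIGKILL (what the kernel OOM killer sends), and SIGSEGV. The Python sketch below derives the same set; the names are hypothetical and the log-pattern half of the check is left out.

```python
import signal

# Shells report "killed by signal N" as exit status 128 + N.
SKIPPABLE_SIGNALS = (signal.SIGABRT, signal.SIGKILL, signal.SIGSEGV)
SKIPPABLE_EXIT_CODES = {128 + int(sig) for sig in SKIPPABLE_SIGNALS}


def is_skippable(exit_code: int) -> bool:
    """Return True when an export exit code should skip the channel, not fail the target."""
    return exit_code in SKIPPABLE_EXIT_CODES
```

By the same rule, the 143 and 130 codes handled in a later commit correspond to SIGTERM (15) and SIGINT (2).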
Copilot bc1f727907 feat(scrape): complete validation resume (8/9 targets)
Resume per-target validation for five remaining servers; clarify
validation log labels (begin/done/failed). Document 8/9 pass in
merge-readiness; KotOR_discord_msgs fails on yes_general export.
2026-05-29 23:35:35 -05:00
Copilot b089137c52 docs(scrape): record per-target validation outcomes (plan 037)
Document full-validation-latest.log results in merge-readiness:
four targets scrape+audit pass; KotOR_discord_msgs and remainder
documented as pending while long-running validation continues.
2026-05-29 21:56:00 -05:00
Copilot 71a443267e feat(scrape): run plan, channel ledger, and all-target proof
Log scrape plan/summary with per-file message deltas in the core script.
Host wrappers and operator entrypoints print target lists; operator-proof
defaults to all enabled targets when --target is omitted.
2026-05-29 20:34:22 -05:00
Copilot 0b242ddfc4 docs(scrape): stamp merge-ready after host validation
Document offline/live/cron gates; align operator checklist with
run-operator-proof.
2026-05-29 16:37:57 -05:00
Boden a4f080e6d9 docs(scrape): record live operator proof on eod_discord
Host validation passed with podman-compose and GUI token sync; note disk
headroom before large archive merges.
2026-05-29 16:36:02 -05:00
Boden 1e35761dbb test(scrape): lock mixed-length snowflake cursor selection
Add cursor-mixed-length smoke where string max_by would pick the wrong
--after value; padded sort_by in last_message_id already picks the max.
2026-05-29 16:33:00 -05:00
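The pitfall this smoke locks down is that snowflake IDs stored as strings compare lexicographically, so a shorter ID can sort above a longer, numerically larger one. The Python sketch below only illustrates the comparison issue; the IDs are made up, and it is not the project's max_by/sort_by code.

```python
# Hypothetical message IDs of different lengths (not taken from any real archive).
ids = ["99999999999999999", "100000000000000000"]

# Lexicographic comparison goes character by character, so "9..." beats "1...".
string_max = max(ids)

# Comparing as integers picks the numerically largest ID.
numeric_max = max(ids, key=int)

# Zero-padding to a common width makes string order agree with numeric order.
width = max(len(i) for i in ids)
padded_max = max(ids, key=lambda i: i.zfill(width))
```

Here `string_max` is the 17-digit ID, while `numeric_max` and `padded_max` both select the 18-digit one, which is the value an incremental `--after` cursor would need.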
Boden 396832b047 docs(scrape): LFG closure for operator proof path
Document run-operator-proof in setup and GUI bridge; record plan 033.
2026-05-29 16:28:11 -05:00