Commit graph

19 commits

Author SHA1 Message Date
Copilot 5cfb2ed144 feat(scrape): host compose passthrough for JSON summary
Mount logs/ in compose, map DCE_RUN_SUMMARY_FILE to /logs, and auto-enable
JSON summary beside operator-validation log files when scraping.
2026-06-03 10:18:33 -05:00
Copilot 8ca55f299b feat(scrape): per-target container_memory in scrape config
Single --target runs apply optional container_memory from
scrape-targets.json when global DCE_CONTAINER_MEMORY is unset.
KotOR_discord_msgs defaults to 8g; scrape.env still overrides.
2026-06-03 09:55:33 -05:00
Copilot 69ce1ca539 feat(scrape): optional DCE_CONTAINER_MEMORY compose mem_limit
Operators can raise the scrape container memory cap for large channel
catch-up (e.g. yes_general) via scrape.env without changing default runs.
2026-06-03 09:23:37 -05:00
Copilot a88cd815f4 refactor(scrape): extract shared scrape-lock library
Centralize archive-root lock path, held checks, holder formatting, and
reclaim helpers in scripts/lib/scrape-lock.sh. Source it from the host
runner, lock status script, and operator wrappers to remove duplicated
logic. Update documents-scrape smoke fake repo to include the new lib.
2026-06-03 07:14:22 -05:00
Copilot 8468e34e37 feat(scrape): add salvage-only mode for stale temp exports
Expose run-discord-scrape.sh salvage and run-documents-scrape.sh
--salvage-only so operators can merge quiescent .dce-temp partials without
re-downloading from Discord after stopping a long or OOM-aborted run.
2026-06-03 06:48:39 -05:00
Copilot 2c01b3a7b9 fix(scrape): flock on archive_root with lock holder diagnostics
Serialize scrapes across repo checkouts that share the same Documents
archive_root, write lock meta with pid/cmd, and reclaim when the holder
process is dead.
2026-06-03 06:39:46 -05:00
Copilot 14796e9c09 fix(host): stop forcing compose -T so operator logs stream live
podman-compose and docker compose allocate a pseudo-TTY by default;
always passing -T block-buffered export progress. Omit -T for operator
runs and set DCE_COMPOSE_TTY=0 only for cron log append. Adds compose
TTY smokes and cron job env assertion.
2026-06-03 06:23:12 -05:00
Copilot 5820f67caf feat(host): stream container scrape output during long runs
Use tee in run_subcommand_with_retry so validation logs show live export
progress instead of buffering until compose exits. Add streaming smoke
assertion and skip scrape lock in documents-scrape smoke when prove runs
against the real repo root during parallel validation.
2026-06-03 06:13:58 -05:00
Copilot b9bb4bbe64 fix(host): flock scrape lock prevents concurrent container exports
Overlapping run-operator-validation invocations spawned twin yes_general
exports and repeated OOM skips. Host scrape now holds .dce-scrape.lock;
smokes bypass via DCE_SKIP_SCRAPE_LOCK. Added lock smoke (20/20 pass).
2026-06-03 06:03:47 -05:00
Copilot 8b54b6a498 test(scrape): preserve-partial smoke; fix host token-file precedence
Add offline regression for OOM skip preserving partial export temps.
Host wrapper now prefers DISCORD_TOKEN_FILE over inherited shell tokens
and always writes explicit compose env for auth-retry. All 19 smokes pass.
2026-06-03 05:52:39 -05:00
Copilot 71a443267e feat(scrape): run plan, channel ledger, and all-target proof
Log scrape plan/summary with per-file message deltas in the core script.
Host wrappers and operator entrypoints print target lists; operator-proof
defaults to all enabled targets when --target is omitted.
2026-05-29 20:34:22 -05:00
Boden 65c9fb2206 feat(scrape): operator proof script and podman-compose smoke fix
Add run-operator-proof for one-target handoff/scrape/prove flows.
Prefer podman-compose on Podman hosts but honor DCE_DOCKER_BIN overrides
so offline smokes keep using fake compose shims.
2026-05-29 16:20:25 -05:00
Boden 44eadee634 feat(scrape): disk preflight on host runner for cron jobs
run-discord-scrape-host.sh runs verify --disk-only before preflight/scrape
so setup-cron monthly jobs fail fast when archive roots are low on space.
Harden bootstrap smoke to surface failures when dry-run fails.
2026-05-29 16:00:11 -05:00
Boden 57d472f8e8 fix(scrape): auth discovery, skip forbidden channels, mount host script
Discover Discord tokens from env, token files, GUI Settings.dat, and desktop
leveldb; bind-mount the host scrape script so container preflight uses
partition/--after cursors; skip inaccessible channels without aborting targets;
fix set -e and busybox mktemp for incremental exports under ~/Documents.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 14:57:16 -05:00
Boden 88e864c72a feat(scrape): add Documents scrape workflow and token discovery
Auto-discover token files, provide run-documents-scrape and
prove-incremental-append helpers, and document the end-to-end operator flow.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 02:23:34 -05:00
Boden 8c14dbbf45 fix(scrape): append safely under Documents with flexible auth
Bootstrap channel-map entries from existing archive filenames, reject merges
that would shrink large JSON exports, accept exported DISCORD_TOKEN when
scrape.env is missing, and disable the duplicate OpenKotOR target folder.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 00:58:03 -05:00
Boden df499568d9 fix: harden recurring scrape scripts from review residuals
Use max message ID for incremental exports, validate custom cron
expressions, drop eval from host/preflight paths, restrict reauth to
executable repo scripts, and run smoke tests in CI.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 00:08:22 -05:00
Your Name 0c92823061 fix(review): apply autofix feedback
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 22:19:57 -05:00
Your Name d63cc1ef2f fix(review): apply autofix feedback
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 21:07:20 -05:00