Commit graph

10 commits

Author SHA1 Message Date
Copilot 759e33efe9 feat(scrape): add --log-file tee to documents scrape
Live runs auto-write logs/documents-scrape-UTC.log and pair JSON
summary with the log basename; optional --log-file overrides the path.
2026-06-03 11:21:59 -05:00
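The log/summary pairing described in this commit could look roughly like the sketch below. It is not the repository's code: the function names, the timestamp format, and the `.json` extension are assumptions made for illustration; only the `logs/documents-scrape-` prefix and the idea of sharing a basename come from the commit message.

```shell
#!/bin/sh
# Hypothetical helpers; names and the timestamp format are assumptions.

# Build a default log path such as logs/documents-scrape-<UTC timestamp>.log.
default_log_path() {
  printf 'logs/documents-scrape-%s.log\n' "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Derive the JSON summary path by swapping the .log suffix for .json,
# so the log and the summary share one basename.
summary_path_for() {
  log_path=$1
  printf '%s.json\n' "${log_path%.log}"
}

# Run a command, sending stdout and stderr to both the terminal and the log.
run_with_tee() {
  log_path=$1
  shift
  mkdir -p "$(dirname "$log_path")"
  "$@" 2>&1 | tee "$log_path"
}
```

Usage might be `run_with_tee "$(default_log_path)" ./some-scrape-command`. Note that in plain POSIX `sh` the pipeline returns the exit status of `tee`, not of the wrapped command, so a real wrapper would need to capture the command's status separately.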
Copilot 8c36fdbdda feat(scrape): auto JSON summary on documents scrape runs
Enable DCE_RUN_SUMMARY_JSON by default for live run-documents-scrape
paths with optional --summary-file override; skip on dry-run/salvage-only.
2026-06-03 10:57:32 -05:00
Copilot a88cd815f4 refactor(scrape): extract shared scrape-lock library
Centralize archive-root lock path, held checks, holder formatting, and
reclaim helpers in scripts/lib/scrape-lock.sh. Source it from the host
runner, lock status script, and operator wrappers to remove duplicated
logic. Update documents-scrape smoke fake repo to include the new lib.
2026-06-03 07:14:22 -05:00
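For readers unfamiliar with the pattern, a shared lock library with a lock path, a held check, and a reclaim helper might be sketched as follows. This is an illustration only, not the contents of scripts/lib/scrape-lock.sh: every function name, the `.scrape-lock` directory, and the `pid` file layout are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch; function names and the .scrape-lock layout are assumptions.

# Print the lock directory for a given archive root.
scrape_lock_path() {
  printf '%s/.scrape-lock\n' "$1"
}

# Succeed if the lock exists and its recorded PID is still alive.
scrape_lock_is_held() {
  lock_dir=$(scrape_lock_path "$1")
  [ -f "$lock_dir/pid" ] || return 1
  kill -0 "$(cat "$lock_dir/pid")" 2>/dev/null
}

# Take the lock atomically: mkdir fails if the directory already exists.
scrape_lock_acquire() {
  lock_dir=$(scrape_lock_path "$1")
  mkdir "$lock_dir" 2>/dev/null || return 1
  printf '%s\n' "$$" > "$lock_dir/pid"
}

# Remove a lock whose holder is gone; refuse while the holder is alive.
scrape_lock_reclaim() {
  if scrape_lock_is_held "$1"; then
    return 1
  fi
  rm -rf "$(scrape_lock_path "$1")"
}
```

A caller would typically run `scrape_lock_acquire "$archive_root" || exit 1` before scraping. The `kill -0` liveness check also fails for a live process owned by another user, so this sketch suits single-operator hosts only.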
Copilot b883943e3a feat(scrape): lock gate and salvage-before on documents scrape
Add archive-root lock preflight and --salvage-before-scrape to
run-documents-scrape.sh so direct operator invocations match validation
safety and KotOR catch-up workflow.
2026-06-03 07:04:56 -05:00
Copilot 8468e34e37 feat(scrape): add salvage-only mode for stale temp exports
Expose run-discord-scrape.sh salvage and run-documents-scrape.sh
--salvage-only so operators can merge quiescent .dce-temp partials without
re-downloading from Discord after stopping a long or OOM-aborted run.
2026-06-03 06:48:39 -05:00
Copilot d8742c5c7b feat(operator): pass --channel through documents and validation wrappers
Operators can narrow KotOR yes_general catch-up to a single channel ID
without re-scraping every guild channel. Adds dry-run and compose arg
capture smokes for the passthrough path.
2026-06-03 06:18:08 -05:00
Copilot 71a443267e feat(scrape): run plan, channel ledger, and all-target proof
Log scrape plan/summary with per-file message deltas in the core script.
Host wrappers and operator entrypoints print target lists; operator-proof
defaults to all enabled targets when --target is omitted.
2026-05-29 20:34:22 -05:00
Boden 32b7f47d45 feat(scrape): disk-only verify flag and documents scrape preflight
run-documents-scrape.sh checks archive free space before live Discord
calls; verify-operator-ready.sh adds --disk-only for lightweight checks.
2026-05-29 15:54:08 -05:00
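A free-space preflight of the kind this commit describes can be sketched with POSIX `df`. The function names and the idea of a kilobyte threshold are assumptions for illustration; the commit does not state how run-documents-scrape.sh measures or compares free space.

```shell
#!/bin/sh
# Hypothetical sketch; function names and the threshold unit are assumptions.

# Print free kilobytes on the filesystem holding the given path.
# df -P gives one line per filesystem; column 4 is the available space,
# and -k reports it in 1024-byte blocks.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when at least min_kb kilobytes are free under archive_root.
check_free_space() {
  archive_root=$1
  min_kb=$2
  avail=$(free_kb "$archive_root")
  [ -n "$avail" ] && [ "$avail" -ge "$min_kb" ]
}
```

For example, `check_free_space "$HOME/Documents" 1048576 || exit 1` would stop a run when less than 1 GiB is free, before any live Discord call is made.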
Boden 57d472f8e8 fix(scrape): auth discovery, skip forbidden channels, mount host script
Discover Discord tokens from env, token files, GUI Settings.dat, and desktop
leveldb; bind-mount the host scrape script so container preflight uses
partition/--after cursors; skip inaccessible channels without aborting targets;
fix set -e and busybox mktemp for incremental exports under ~/Documents.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 14:57:16 -05:00
Boden 88e864c72a feat(scrape): add Documents scrape workflow and token discovery
Auto-discover token files, provide run-documents-scrape and
prove-incremental-append helpers, and document the end-to-end operator flow.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 02:23:34 -05:00