Centralize archive-root lock path, held checks, holder formatting, and
reclaim helpers in scripts/lib/scrape-lock.sh. Source it from the host
runner, lock status script, and operator wrappers to remove duplicated
logic. Update documents-scrape smoke fake repo to include the new lib.
Expose run-discord-scrape.sh salvage and run-documents-scrape.sh
--salvage-only so operators can merge quiescent .dce-temp partials without
re-downloading from Discord after stopping a long or OOM-aborted run.
Serialize scrapes across repo checkouts that share the same Documents
archive_root, write lock meta with pid/cmd, and reclaim when the holder
process is dead.
podman-compose and docker compose allocate a pseudo-TTY by default;
always passing -T block-buffered export progress. Omit -T for operator
runs and set DCE_COMPOSE_TTY=0 only for cron log append. Adds compose
TTY smokes and cron job env assertion.
Use tee in run_subcommand_with_retry so validation logs show live export
progress instead of buffering until compose exits. Add streaming smoke
assertion and skip scrape lock in documents-scrape smoke when prove runs
against the real repo root during parallel validation.
Log scrape plan/summary with per-file message deltas in the core script.
Host wrappers and operator entrypoints print target lists; operator-proof
defaults to all enabled targets when --target is omitted.
Add run-operator-proof for one-target handoff/scrape/prove flows.
Prefer podman-compose on Podman hosts but honor DCE_DOCKER_BIN overrides
so offline smokes keep using fake compose shims.
run-discord-scrape-host.sh runs verify --disk-only before preflight/scrape
so setup-cron monthly jobs fail fast when archive roots are low on space.
Harden bootstrap smoke to surface failures when dry-run fails.
Discover Discord tokens from env, token files, GUI Settings.dat, and desktop
leveldb; bind-mount the host scrape script so container preflight uses
partition/--after cursors; skip inaccessible channels without aborting targets;
fix set -e and busybox mktemp for incremental exports under ~/Documents.
Co-authored-by: Cursor <cursoragent@cursor.com>
Auto-discover token files, provide run-documents-scrape and
prove-incremental-append helpers, and document the end-to-end operator flow.
Co-authored-by: Cursor <cursoragent@cursor.com>
Bootstrap channel-map entries from existing archive filenames, reject merges
that would shrink large JSON exports, accept exported DISCORD_TOKEN when
scrape.env is missing, and disable the duplicate OpenKotOR target folder.
Co-authored-by: Cursor <cursoragent@cursor.com>
Use max message ID for incremental exports, validate custom cron
expressions, drop eval from host/preflight paths, restrict reauth to
executable repo scripts, and run smoke tests in CI.
Co-authored-by: Cursor <cursoragent@cursor.com>