Log scrape plan/summary with per-file message deltas in the core script.
Host wrappers and operator entrypoints print target lists; operator-proof
defaults to all enabled targets when --target is omitted.
Add run-operator-proof for one-target handoff/scrape/prove flows.
Prefer podman-compose on Podman hosts but honor DCE_DOCKER_BIN overrides
so offline smokes keep using fake compose shims.
run-discord-scrape-host.sh runs verify --disk-only before preflight/scrape
so setup-cron monthly jobs fail fast when archive roots are low on space.
Harden bootstrap smoke to surface failures when dry-run fails.
Fail fast when archive or repo paths lack free space (DCE_MIN_FREE_MB),
treat disk-full export failures as skippable channels, and add an offline
disk-space smoke. Smokes default DCE_MIN_FREE_MB=0 so CI stays portable.
Add --snapshot-only and --compare-snapshots to prove-incremental-append,
fix EXIT trap status, wire prove smoke into CI, and document LFG closure plan.
Discover Discord tokens from env, token files, GUI Settings.dat, and desktop
leveldb; bind-mount the host scrape script so container preflight uses
partition/--after cursors; skip inaccessible channels without aborting targets;
fix set -e and busybox mktemp for incremental exports under ~/Documents.
Co-authored-by: Cursor <cursoragent@cursor.com>
Auto-discover token files, provide run-documents-scrape and
prove-incremental-append helpers, and document the end-to-end operator flow.
Co-authored-by: Cursor <cursoragent@cursor.com>
Add scripts to confirm enabled targets map to seeded ~/Documents archives
and to create scrape.env from exported DISCORD_TOKEN credentials.
Co-authored-by: Cursor <cursoragent@cursor.com>
Bootstrap channel-map entries from existing archive filenames, reject merges
that would shrink large JSON exports, accept exported DISCORD_TOKEN when
scrape.env is missing, and disable the duplicate OpenKotOR target folder.
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds scripts/gh-approve-pr-runs.sh with GITHUB_TOKEN bootstrap, explicit
admin-rights policy classification, smoke coverage, and CI wiring. Marks
the remaining 2026-05-24 recurring scrape plans completed.
Co-authored-by: Cursor <cursoragent@cursor.com>
Use max message ID for incremental exports, validate custom cron
expressions, drop eval from host/preflight paths, restrict reauth to
executable repo scripts, and run smoke tests in CI.
Co-authored-by: Cursor <cursoragent@cursor.com>
IMPLEMENTATION UNITS (U1-U6):
U1: Append-only merge test coverage
- Enhanced run-discord-scrape-smoke.sh with additional test scenarios
- Created append-partial-write.json and append-concurrent-conflict.json fixtures
- Added assertions for message sorting, deduplication, and idempotency
- All 10 merge scenarios validated
U2: Error handling validation
- Created error-path-smoke.sh with 6 error scenario tests
- Added test configs for invalid paths, missing files, bad JSON
- Verified fail-closed behavior on all error paths
- No silent data loss on any failure
U3: Cron idempotency and lifecycle
- Created cron-idempotency-smoke.sh with full lifecycle testing
- Created fixture crontab with unrelated entries (preservation test)
- Verified idempotent install, update, and remove operations
- Confirmed dry-run and entry preservation
U4: Preflight and end-to-end setup
- Created end-to-end-preflight-smoke.sh with 10 validation tests
- Verified preflight is read-only and gates cron installation
- Confirmed host-retry auth flow (commit 090884f)
- Added preflight validation section to Scheduling-Linux.md
U5: Documentation completion
- Updated Readme.md with recurring-scraper link
- Created Recurring-Scrape-Setup.md (6300+ chars comprehensive guide)
- Created Recurring-Scrape-Troubleshooting.md (9200+ chars with 30+ scenarios)
- Enhanced .docs/Scheduling-Linux.md with preflight section
- All documented behavior matches implementation
U6: Production-readiness checklist
- Created docs/recurring-scrape-production-checklist.md
- Compiled all validation results (33+ scenarios across U1-U5)
- Documented test execution commands for re-validation
- Provided deployment notes and monitoring guidance
- Clear sign-off criteria established
ARTIFACTS:
- 4 new smoke test scripts (1000+ lines total)
- 4 new fixtures and test configs
- 3 new documentation files (15500+ chars)
- 2 updated documentation files
- 1 validation checklist tracking document
- All tests passing
SAFETY GUARANTEES VERIFIED:
✅ No silent data loss on any error path
✅ Fail-closed behavior throughout
✅ Archive updates are append-only and idempotent
✅ Cron installation is idempotent
✅ Unrelated cron entries preserved
✅ Preflight is read-only
✅ Token validated before operations
✅ Path traversal prevented
STATUS: Production Ready
All 6 implementation units complete and validated.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>