Preflight probes skip forbidden channels when seeded archives exist. Cron installer passes container config path and supports --config override. Compose and docs align with append-only ~/Documents scrape workflow.
feat: Documents recurring scrape verification and operator closure
Summary
Close the recurring Discord scrape vertical slice: source-built Docker image, compose mounts for config/scrape-targets.json and /home/brunner56/Documents archives, append-only JSON merge in scripts/run-discord-scrape.sh, monthly cron via scripts/setup-cron.sh, and runtime proof (preflight + incremental scrape on at least one enabled target).
Problem Frame
Operators need monthly (configurable) incremental exports into existing ~/Documents/*_discord* folders without re-downloading full history or overwriting archives when Discord deletes messages server-side. Infrastructure exists on feat/recurring-cli-scrape; this pass validates end-to-end behavior and documents the operator path.
Requirements
| ID | Requirement |
|---|---|
| R1 | Dockerfile builds DiscordChatExporter.Cli from source; compose mounts config, scripts, and archive_root |
| R2 | config/scrape-targets.json maps user Documents folders; empty channel_ids exports all accessible channels per target |
| R3 | run-discord-scrape.sh uses --after + merge-by-id; rejects merges that would shrink an archive |
| R4 | setup-cron.sh defaults to monthly schedule; supports --target, --guild, --channel, --interval, --cron |
| R5 | scrape.env (gitignored) supplies token for compose; never commit secrets |
| R6 | Preflight and one-target scrape succeed against live Discord API |
| R7 | Smoke tests pass; operator docs list validation commands |
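The shrink-merge rejection in R3 can be pictured as a simple count comparison. The following is a minimal sketch only: the function name `reject_shrink_merge`, its arguments, and the error text are hypothetical and are not taken from scripts/run-discord-scrape.sh, which may implement the guard differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper; names and messages are illustrative assumptions,
# not the actual contents of run-discord-scrape.sh.

# Succeeds (exit 0) only when the merged archive holds at least as many
# messages as the existing one; otherwise reports the shrink and fails.
reject_shrink_merge() {
  local before_count="$1" after_count="$2"
  if (( after_count < before_count )); then
    echo "refusing merge: ${after_count} < ${before_count} messages" >&2
    return 1
  fi
  return 0
}

# Example: an archive of 120 messages may grow to 125 but never drop to 118.
reject_shrink_merge 120 125 && echo "merge accepted"
reject_shrink_merge 120 118 || echo "merge rejected"
```

A guard of this shape lets the wrapper keep messages that Discord has since deleted server-side, because a merge result that loses messages is never written over the existing archive.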
Scope Boundaries
- No changes to upstream C# merge API (wrapper-only append).
- Do not enable discord_dms without user token.
- Token stays in scrape.env only.
Implementation Units
U1. Harden bootstrap and compose paths
Requirements: R1, R2
Files: scripts/run-discord-scrape.sh, docker-compose.yml, Dockerfile
Test scenarios: Archive seed files bootstrap channel-map; compose bind-mount resolves host Documents path.
U2. Cron installer and docs alignment
Requirements: R4, R7
Files: scripts/setup-cron.sh, .docs/Recurring-Scrape-Setup.md, Readme.md
Test scenarios: setup-cron.sh --dry-run emits monthly block; --remove is idempotent.
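To illustrate what "emits monthly block" and "idempotent remove" could look like, here is a hedged sketch. The marker comments, the function names `emit_cron_block` and `strip_cron_block`, and the 03:00 run time are all assumptions for illustration; the real scripts/setup-cron.sh may use different markers and a different schedule.

```shell
#!/usr/bin/env bash
# Hypothetical markers and schedule; the real setup-cron.sh may differ.
CRON_BEGIN="# BEGIN discord-scrape"
CRON_END="# END discord-scrape"

# Print a managed crontab block that runs the given command monthly
# (03:00 on day 1 of each month; the time of day is an assumption).
emit_cron_block() {
  local command="$1" schedule="${2:-0 3 1 * *}"
  printf '%s\n%s %s\n%s\n' "$CRON_BEGIN" "$schedule" "$command" "$CRON_END"
}

# Filter stdin, dropping the managed block if present. Input without a
# block passes through unchanged, so repeated removal is idempotent.
strip_cron_block() {
  awk -v b="$CRON_BEGIN" -v e="$CRON_END" '
    $0 == b { skip = 1; next }
    $0 == e { skip = 0; next }
    !skip   { print }
  '
}
```

One possible install step would pipe `crontab -l` through `strip_cron_block`, append the output of `emit_cron_block`, and feed the result to `crontab -`, so that re-running the installer replaces the block instead of duplicating it.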
U3. Runtime verification
Requirements: R5, R6
Commands: docker compose build, run-discord-scrape-host.sh preflight, scrape --target with smallest enabled archive.
Test scenarios: Message count non-decreasing after scrape; logs show --after when archive non-empty.
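The "--after when archive non-empty" scenario can be sketched as conditional argument building. This is an assumption-laden illustration: `build_export_args` and `EXPORT_ARGS` are hypothetical names, the timestamp and paths are placeholders, and the actual wrapper may derive the cutoff and assemble its DiscordChatExporter.Cli arguments differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the real run-discord-scrape.sh may build its
# arguments differently.

# Fill the global EXPORT_ARGS array for one channel. The --after flag is
# added only when a last-seen timestamp is supplied (non-empty archive).
build_export_args() {
  local channel_id="$1" output_path="$2" last_seen="${3:-}"
  EXPORT_ARGS=(export --channel "$channel_id" --format Json --output "$output_path")
  if [[ -n "$last_seen" ]]; then
    EXPORT_ARGS+=(--after "$last_seen")
  fi
}

# Placeholder values: incremental run for a seeded archive, full run otherwise.
build_export_args 123 /tmp/new.json "2024-05-01T00:00:00Z"
echo "incremental: ${EXPORT_ARGS[*]}"
build_export_args 123 /tmp/new.json
echo "full:        ${EXPORT_ARGS[*]}"
```

Logging the assembled array before each run gives the evidence this unit asks for: the flag appears for seeded archives and is absent for empty ones.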
Verification Ladder
- bash -n on changed shell scripts
- scripts/tests/setup-cron-smoke.sh, run-discord-scrape-smoke.sh
- docker compose build + preflight + single-target scrape
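The first rung of the ladder could be scripted roughly as follows. This is a sketch under assumptions: the function name `syntax_check` is hypothetical, and the list of changed scripts is supplied by the caller.

```shell
#!/usr/bin/env bash
# Run `bash -n` (parse only, no execution) on each script given and
# return non-zero if any of them fails to parse.
syntax_check() {
  local script failed=0
  for script in "$@"; do
    if bash -n "$script"; then
      echo "ok   $script"
    else
      echo "FAIL $script"
      failed=1
    fi
  done
  return "$failed"
}
```

For this change set it might be invoked as `syntax_check scripts/run-discord-scrape.sh scripts/setup-cron.sh` before moving on to the smoke tests and the Docker build.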