Boden 90bd9da143 feat(scrape): harden preflight and cron config for Documents archives
Preflight probes skip forbidden channels when seeded archives exist.
Cron installer passes container config path and supports --config override.
Compose and docs align with append-only ~/Documents scrape workflow.
2026-05-29 13:49:09 -05:00


title: "feat: Documents recurring scrape verification and operator closure"
type: feat
status: completed
date: 2026-05-29
origin: LFG — Docker/cron append-only Discord scrape for ~/Documents archive folders

feat: Documents recurring scrape verification and operator closure

Summary

Close out the recurring Discord scrape vertical slice: a source-built Docker image; compose mounts for config/scrape-targets.json and the /home/brunner56/Documents archives; an append-only JSON merge in scripts/run-discord-scrape.sh; a monthly cron job installed by scripts/setup-cron.sh; and runtime proof (a preflight check plus an incremental scrape of at least one enabled target).

Problem Frame

Operators need monthly (configurable) incremental exports into the existing ~/Documents/*_discord* folders, without re-downloading full history and without losing archived messages when Discord deletes them server-side. The infrastructure already exists on the feat/recurring-cli-scrape branch; this pass validates end-to-end behavior and documents the operator path.

Requirements

| ID | Requirement |
|----|-------------|
| R1 | The Dockerfile builds DiscordChatExporter.Cli from source; compose mounts config, scripts, and archive_root |
| R2 | config/scrape-targets.json maps each target to a user Documents folder; an empty channel_ids list exports every accessible channel for that target |
| R3 | run-discord-scrape.sh exports incrementally with --after and merges by message ID; it rejects any merge that would shrink an archive |
| R4 | setup-cron.sh defaults to a monthly schedule and supports --target, --guild, --channel, --interval, and --cron |
| R5 | scrape.env (gitignored) supplies the token to compose; secrets are never committed |
| R6 | Preflight and a single-target scrape succeed against the live Discord API |
| R7 | Smoke tests pass; operator docs list the validation commands |
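The append-only merge behind R3 can be sketched in Python. The real wrapper is a shell script, so this is an illustration and not its code. It assumes each archive is a DiscordChatExporter JSON export with a top-level messages list whose entries carry a string id, plus a messageCount field. merge_messages and merge_archive are hypothetical names.

```python
from __future__ import annotations


def merge_messages(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Append-only merge keyed on message id (hypothetical helper)."""
    merged: dict[str, dict] = {}
    for msg in existing:
        merged.setdefault(msg["id"], msg)
    for msg in incoming:
        # Keep the stored copy if the id is already archived; only add new ids.
        merged.setdefault(msg["id"], msg)
    # Snowflake ids grow over time; sort numerically ("9" < "10"), not as strings.
    result = sorted(merged.values(), key=lambda m: int(m["id"]))
    # Guard against shrink merges. With setdefault this can only trip if the
    # existing archive already held duplicate ids, so refuse to write it.
    if len(result) < len(existing):
        raise ValueError(
            f"refusing shrink merge: {len(existing)} -> {len(result)} messages"
        )
    return result


def merge_archive(archive: dict, export: dict) -> dict:
    """Return a copy of archive with export's messages appended (messageCount assumed)."""
    messages = merge_messages(archive.get("messages", []), export.get("messages", []))
    out = dict(archive)
    out["messages"] = messages
    out["messageCount"] = len(messages)
    return out
```

Keeping the stored copy on an id collision makes the archive strictly append-only. A message deleted on Discord stays in the file because nothing ever removes it. The trade-off is that later edits to already-archived messages are not picked up.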

Scope Boundaries

  • No changes to the upstream C# merge API; appending happens in the wrapper only.
  • Do not enable discord_dms without a user token.
  • The token stays in scrape.env only.

Implementation Units

U1. Harden bootstrap and compose paths

Requirements: R1, R2

Files: scripts/run-discord-scrape.sh, docker-compose.yml, Dockerfile

Test scenarios: existing archive (seed) files bootstrap the channel map; the compose bind mount resolves to the host Documents path.
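The seed-file bootstrap can be sketched in Python. It assumes each seed file in a target's archive folder is a DiscordChatExporter JSON export with a top-level channel object holding an id. The function name build_channel_map and the *.json glob are hypothetical; the real logic lives in run-discord-scrape.sh.

```python
from __future__ import annotations

import json
from pathlib import Path


def build_channel_map(archive_dir: Path) -> dict[str, Path]:
    """Map channel id -> seed archive file found in archive_dir (hypothetical helper)."""
    channel_map: dict[str, Path] = {}
    # Sorted so the result is deterministic; the first file seen for an id wins.
    for path in sorted(archive_dir.glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable, non-UTF-8, or invalid JSON: skip instead of aborting.
            continue
        channel = data.get("channel") if isinstance(data, dict) else None
        if isinstance(channel, dict) and channel.get("id"):
            channel_map.setdefault(str(channel["id"]), path)
    return channel_map
```

In this sketch, a channel missing from the map simply has no existing archive to append to.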

U2. Cron installer and docs alignment

Requirements: R4, R7

Files: scripts/setup-cron.sh, .docs/Recurring-Scrape-Setup.md, Readme.md

Test scenarios: setup-cron.sh --dry-run emits the monthly crontab block; --remove is idempotent.
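Both U2 scenarios depend on setup-cron.sh managing its own crontab entries. One common way to make install and --remove idempotent is to fence the entries between marker comments. The Python sketch below shows that approach. The marker text, the function names, and the 03:00-on-the-1st default are assumptions; none of them are taken from setup-cron.sh.

```python
from __future__ import annotations

BEGIN = "# BEGIN discord-scrape (managed)"  # hypothetical marker
END = "# END discord-scrape (managed)"      # hypothetical marker

# Five-field cron: minute hour day-of-month month day-of-week.
MONTHLY = "0 3 1 * *"  # 03:00 on the 1st of every month (assumed default)


def render_block(command: str, schedule: str = MONTHLY) -> str:
    """Return the managed block, as a --dry-run might print it."""
    return "\n".join([BEGIN, f"{schedule} {command}", END])


def remove_block(crontab: str) -> str:
    """Drop every managed block; applying it twice changes nothing."""
    kept, inside = [], False
    for line in crontab.splitlines():
        if line == BEGIN:
            inside = True
        elif line == END:
            inside = False
        elif not inside:
            kept.append(line)
    # cron expects each line, including the last, to end with a newline.
    return "".join(line + "\n" for line in kept)


def install_block(crontab: str, command: str, schedule: str = MONTHLY) -> str:
    """Replace any existing managed block, so reinstalling never duplicates it."""
    return remove_block(crontab) + render_block(command, schedule) + "\n"
```

Because install always removes before appending, running the installer repeatedly yields the same crontab. Running removal on a crontab without a managed block leaves it unchanged.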

U3. Runtime verification

Requirements: R5, R6

Commands: docker compose build; run-discord-scrape-host.sh preflight; scrape --target against the smallest enabled archive.

Test scenarios: the message count does not decrease after the scrape; logs show --after being passed when the archive is non-empty.
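The --after check in U3 depends on how the wrapper chooses the --after value. The Python sketch below assumes the wrapper passes the newest archived message id to DiscordChatExporter.Cli's export command, whose --after option accepts either a date or a message id. The helper names are hypothetical. The token is deliberately left off the argument list on the assumption that it reaches the CLI from scrape.env.

```python
from __future__ import annotations


def newest_message_id(messages: list[dict]) -> str | None:
    """Return the largest snowflake id in the archive, or None if it is empty."""
    if not messages:
        return None
    # Compare as integers: as strings "9" > "10", but snowflakes are numbers.
    return max((m["id"] for m in messages), key=int)


def export_args(channel_id: str, output: str, messages: list[dict]) -> list[str]:
    """Build export arguments for one channel (hypothetical helper, no token)."""
    args = ["export", "--channel", channel_id, "--format", "Json", "--output", output]
    after = newest_message_id(messages)
    if after is not None:
        # Non-empty archive: fetch only messages newer than the last one held.
        args += ["--after", after]
    return args
```

With an empty archive the sketch omits --after, which gives a full export. That matches the scenario where --after appears in the logs only for non-empty archives.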

Verification Ladder

  1. bash -n on changed shell scripts
  2. scripts/tests/setup-cron-smoke.sh, run-discord-scrape-smoke.sh
  3. docker compose build + preflight + single-target scrape
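The ladder runs cheapest checks first, so it can be scripted as a fail-fast sequence. The Python sketch below is a minimal illustration. The rung list mirrors steps 1 and 2 and the build from step 3. The script paths, including scripts/tests/run-discord-scrape-smoke.sh, are assumptions about the repository layout. The live preflight and scrape are left out because they need the token.

```python
from __future__ import annotations

import subprocess
from collections.abc import Sequence

# Hypothetical rung list; script paths are assumed, not confirmed.
LADDER: list[tuple[str, list[str]]] = [
    ("bash -n setup-cron.sh", ["bash", "-n", "scripts/setup-cron.sh"]),
    ("bash -n run-discord-scrape.sh", ["bash", "-n", "scripts/run-discord-scrape.sh"]),
    ("setup-cron smoke", ["bash", "scripts/tests/setup-cron-smoke.sh"]),
    ("run-discord-scrape smoke", ["bash", "scripts/tests/run-discord-scrape-smoke.sh"]),
    ("compose build", ["docker", "compose", "build"]),
]


def run_ladder(rungs: Sequence[tuple[str, list[str]]]) -> tuple[bool, list[str]]:
    """Run rungs in order and stop at the first non-zero exit status.

    Returns (all_passed, names of the rungs that ran).
    """
    ran: list[str] = []
    for name, cmd in rungs:
        ran.append(name)
        try:
            code = subprocess.run(cmd).returncode
        except FileNotFoundError:
            code = 127  # command not found, the same status a shell reports
        if code != 0:
            return False, ran
    return True, ran
```

Stopping at the first failure prevents the Docker build and the live API calls from running while a syntax or smoke check is still broken.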