DiscordChatExporter/docs/plans/2026-05-29-011-feat-documents-recurring-scrape-verify-plan.md
Boden 90bd9da143 feat(scrape): harden preflight and cron config for Documents archives
Preflight probes skip forbidden channels when seeded archives exist.
Cron installer passes container config path and supports --config override.
Compose and docs align with append-only ~/Documents scrape workflow.
2026-05-29 13:49:09 -05:00

---
title: "feat: Documents recurring scrape verification and operator closure"
type: feat
status: completed
date: 2026-05-29
origin: LFG — Docker/cron append-only Discord scrape for ~/Documents archive folders
---
# feat: Documents recurring scrape verification and operator closure
## Summary
Close the recurring Discord scrape vertical slice:
- source-built Docker image
- compose mounts for `config/scrape-targets.json` and the `/home/brunner56/Documents` archives
- append-only JSON merge in `scripts/run-discord-scrape.sh`
- monthly cron via `scripts/setup-cron.sh`
- runtime proof (preflight plus an incremental scrape on at least one enabled target)
## Problem Frame
Operators need monthly (configurable) incremental exports into existing `~/Documents/*_discord*` folders without re-downloading full history or overwriting archives when Discord deletes messages server-side. Infrastructure exists on `feat/recurring-cli-scrape`; this pass validates end-to-end behavior and documents the operator path.
## Requirements
| ID | Requirement |
|----|-------------|
| R1 | `Dockerfile` builds `DiscordChatExporter.Cli` from source; compose mounts config, scripts, and `archive_root` |
| R2 | `config/scrape-targets.json` maps user Documents folders; empty `channel_ids` exports all accessible channels per target |
| R3 | `run-discord-scrape.sh` fetches incrementally with `--after` and merges by message id; rejects any merge that would shrink the archive |
| R4 | `setup-cron.sh` defaults to monthly schedule; supports `--target`, `--guild`, `--channel`, `--interval`, `--cron` |
| R5 | `scrape.env` (gitignored) supplies token for compose; never commit secrets |
| R6 | Preflight and one-target scrape succeed against live Discord API |
| R7 | Smoke tests pass; operator docs list validation commands |
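
The merge-by-id and shrink-rejection behavior in R3 can be sketched as follows. This is a minimal Python illustration, not the logic of `run-discord-scrape.sh` itself: the function names, the `ShrinkError` type, and the rule that the archived copy wins on an id collision are all assumptions made for the example. It relies only on each message carrying a string `id` field.

```python
class ShrinkError(ValueError):
    """Raised when a candidate archive is missing messages the old one had."""


def merge_by_id(archived, fetched):
    """Union two message lists, de-duplicated by message id.

    Assumption: the archived copy wins on a collision, so a message that
    was edited or deleted server-side is never overwritten locally.
    """
    merged = {m["id"]: m for m in archived}
    for message in fetched:
        merged.setdefault(message["id"], message)
    # Ids are numeric strings, so sort numerically rather than lexically.
    return sorted(merged.values(), key=lambda m: int(m["id"]))


def assert_no_shrink(archived, candidate):
    """Refuse a candidate that drops any previously archived message id."""
    missing = {m["id"] for m in archived} - {m["id"] for m in candidate}
    if missing:
        raise ShrinkError(f"candidate archive drops {len(missing)} message(s)")


def append_only_merge(archived, fetched):
    """Merge, then verify the result before it would be written to disk."""
    candidate = merge_by_id(archived, fetched)
    assert_no_shrink(archived, candidate)
    return candidate
```

Running the guard on the candidate rather than on the fetched batch means the check also catches a corrupted or truncated merge result, not only a short API response.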
## Scope Boundaries
- No changes to upstream C# merge API (wrapper-only append).
- Do not enable `discord_dms` without a user token.
- Token stays in `scrape.env` only.
## Implementation Units
### U1. Harden bootstrap and compose paths
**Requirements:** R1, R2
**Files:** `scripts/run-discord-scrape.sh`, `docker-compose.yml`, `Dockerfile`
**Test scenarios:** Archive seed files bootstrap channel-map; compose bind-mount resolves host Documents path.
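
One way to picture the channel-map bootstrap in this test scenario is shown below. It is a hedged Python sketch, not the script's implementation: it assumes each seed file is a JSON export with a top-level `channel` object containing an `id`, and the helper names and the "first file in sorted order wins" rule are hypothetical.

```python
import json
from pathlib import Path


def channel_map_from_exports(exports):
    """Map channel id -> seed file name.

    `exports` maps a file name to its parsed JSON document. Files without a
    `channel.id` are skipped. Assumption: when two files claim the same
    channel, the first file name in sorted order is kept.
    """
    channel_map = {}
    for name in sorted(exports):
        document = exports[name]
        channel = document.get("channel") if isinstance(document, dict) else None
        channel_id = channel.get("id") if isinstance(channel, dict) else None
        if channel_id is not None:
            channel_map.setdefault(str(channel_id), name)
    return channel_map


def load_exports(archive_dir):
    """Parse every *.json file in one archive folder, skipping unreadable ones."""
    exports = {}
    for path in Path(archive_dir).glob("*.json"):
        try:
            exports[path.name] = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # a damaged seed file should not abort the bootstrap
    return exports
```

A caller would combine the two as `channel_map_from_exports(load_exports(folder))`; keeping the mapping step free of file I/O makes it easy to exercise in a smoke test.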
### U2. Cron installer and docs alignment
**Requirements:** R4, R7
**Files:** `scripts/setup-cron.sh`, `.docs/Recurring-Scrape-Setup.md`, `Readme.md`
**Test scenarios:** `setup-cron.sh --dry-run` emits the monthly block; `--remove` is idempotent.
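
The two properties tested here, a monthly block on dry run and idempotent removal, can be modeled with a marker-delimited block. The Python sketch below is illustrative only: the marker strings, the function names, and the `0 3 1 * *` schedule (03:00 on the first of each month) are assumptions, not values taken from `setup-cron.sh`.

```python
BEGIN = "# BEGIN discord-scrape"  # hypothetical marker line
END = "# END discord-scrape"      # hypothetical marker line
MONTHLY = "0 3 1 * *"             # minute hour day-of-month month day-of-week


def render_block(command, schedule=MONTHLY):
    """Return the text a dry run would print: markers around one cron entry."""
    return "\n".join([BEGIN, f"{schedule} {command}", END])


def remove_block(crontab):
    """Drop the managed block; a crontab without one is returned unchanged."""
    kept, inside = [], False
    for line in crontab.splitlines():
        if line == BEGIN:
            inside = True
        elif line == END:
            inside = False
        elif not inside:
            kept.append(line)
    return "\n".join(kept)


def install_block(crontab, command, schedule=MONTHLY):
    """Replace any existing managed block, so repeated installs do not stack."""
    base = remove_block(crontab)
    block = render_block(command, schedule)
    return f"{base}\n{block}" if base else block
```

Because installation always removes the old block first, running the installer twice yields the same crontab as running it once, and removal on an already clean crontab is a no-op.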
### U3. Runtime verification
**Requirements:** R5, R6
**Commands:** `docker compose build`, `run-discord-scrape-host.sh preflight`, then a single-target scrape (`--target`) against the smallest enabled archive.
**Test scenarios:** Message count non-decreasing after scrape; logs show `--after` when archive non-empty.
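
Both checks in this scenario can be expressed compactly. The Python sketch below assumes the wrapper passes the newest archived message id as the `--after` value; the plan does not say whether an id or a timestamp is used, so treat that choice, along with the function names, as hypothetical. The argument list omits the token and output path on purpose.

```python
def after_cursor(messages):
    """Return the newest archived message id, or None for an empty archive."""
    if not messages:
        return None
    # Ids are numeric strings; compare as integers so "10" sorts after "9".
    return max((m["id"] for m in messages), key=int)


def export_args(channel_id, messages):
    """Build exporter arguments, adding --after only when history exists."""
    args = ["export", "--channel", channel_id, "--format", "Json"]
    cursor = after_cursor(messages)
    if cursor is not None:
        args += ["--after", cursor]
    return args


def count_non_decreasing(before, after):
    """True when the post-scrape archive holds at least as many messages."""
    return len(after) >= len(before)
```

An empty archive produces no `--after` argument, which corresponds to a full first export; a non-empty one should always show `--after` in the logs.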
## Verification Ladder
1. `bash -n` on changed shell scripts
2. `scripts/tests/setup-cron-smoke.sh`, `run-discord-scrape-smoke.sh`
3. `docker compose build` + preflight + single-target scrape