DiscordChatExporter/docs/plans/2026-05-28-006-fix-documents-append-auth-plan.md
Boden 8c14dbbf45 fix(scrape): append safely under Documents with flexible auth
Bootstrap channel-map entries from existing archive filenames, reject merges
that would shrink large JSON exports, accept exported DISCORD_TOKEN when
scrape.env is missing, and disable the duplicate OpenKotOR target folder.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 00:58:03 -05:00

---
title: "fix: Ensure Documents archive paths append safely with auth"
type: fix
status: completed
date: 2026-05-28
origin: User request — extract to ~/Documents/** per server, append not overwrite, proper CLI auth
---
# fix: Ensure Documents archive paths append safely with auth
## Summary
Recurring scrapes must update the user's existing large JSON archives under `~/Documents/<server>/` in place: fetch only new messages with DiscordChatExporter incremental export (`--after`), then merge them into the archive by message id. A scrape must never replace a file with a fresh full export when an archive already exists. Auth must work from an exported `DISCORD_TOKEN` or `DISCORD_TOKEN_FILE` without requiring `scrape.env`.
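
As an illustration of the merge-by-id step, here is a minimal Python sketch. It is not the logic in `scripts/run-discord-scrape.sh`. It assumes each export is a JSON object with a top-level `messages` array whose items carry a string `id`, plus a `messageCount` field. The function name `merge_by_id` and the rule that the newer export wins on an id collision are assumptions made for the example.

```python
def merge_by_id(existing: dict, incremental: dict) -> dict:
    """Merge two export objects by message id without mutating the inputs."""
    merged = {m["id"]: m for m in existing.get("messages", [])}
    # Assumption: the newer export wins on a collision, so edits are refreshed.
    for message in incremental.get("messages", []):
        merged[message["id"]] = message
    # Discord ids are snowflakes, so numeric order is chronological order.
    messages = sorted(merged.values(), key=lambda m: int(m["id"]))
    result = dict(existing)  # keep the archive's other top-level metadata
    result["messages"] = messages
    result["messageCount"] = len(messages)
    return result
```

Because the merged set is the union of both id sets, the message count can only stay the same or grow when both inputs are intact. A lower count therefore signals a truncated or corrupt input.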
## Problem Frame
| Gap | Impact |
|-----|--------|
| `scrape.env` required even when `DISCORD_TOKEN` is already exported | Preflight/scrape fail before auth is attempted |
| Channel map not bootstrapped from existing `* [id].json` files | Risk of creating parallel files instead of updating in place |
| Merge replaces the destination via a direct `mv`, with no check that the message count did not decrease | A bad merge could shrink a large archive |
| `OpenKotOR_discord_msgs` target points at missing folder | Target resolves zero channels while `openkotor_discord_msgs` holds data |
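
The first gap concerns the order in which credentials are looked up. The Python sketch below shows one plausible resolution order. It is an assumption, not the behavior of `scripts/run-discord-scrape-host.sh`: it tries `DISCORD_TOKEN` first, then the file named by `DISCORD_TOKEN_FILE`, and only then an optional `scrape.env`. The helper name `resolve_token` and the simple `KEY=value` parsing are hypothetical.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_token(env: Mapping[str, str], env_file: Optional[Path] = None) -> Optional[str]:
    """Return a token from the environment, a token file, or scrape.env."""
    token = env.get("DISCORD_TOKEN", "").strip()
    if token:
        return token
    token_file = env.get("DISCORD_TOKEN_FILE", "").strip()
    if token_file and Path(token_file).is_file():
        return Path(token_file).read_text(encoding="utf-8").strip() or None
    # A missing scrape.env is not an error: it is only the last fallback.
    if env_file is not None and env_file.is_file():
        for line in env_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line.startswith("DISCORD_TOKEN="):
                return line.split("=", 1)[1].strip().strip("'\"") or None
    return None
```

A caller would pass `os.environ` and the expected `scrape.env` path, and fail preflight only when the result is `None`.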
## Requirements
| ID | Requirement | Files |
|----|-------------|-------|
| U1 | Make host runner accept exported `DISCORD_TOKEN` / `DISCORD_TOKEN_FILE` when `scrape.env` is absent | `scripts/run-discord-scrape-host.sh`, smoke test |
| U2 | Bootstrap `output_dir/.dce-meta/channel-map.json` from existing `* [channel_id].json` archives before scrape/preflight | `scripts/run-discord-scrape.sh`, smoke test |
| U3 | Safe merge: verify merged message count ≥ existing; replace via temp file in target directory | `scripts/run-discord-scrape.sh`, smoke test |
| U4 | Align config with on-disk folders (disable missing OpenKotOR target) | `config/scrape-targets.json` |
| U5 | Document auth + in-place append contract | `.docs/Recurring-Scrape-Setup.md` |
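
The bootstrap in U2 can be sketched in Python as follows. This is an illustration only: the real logic lives in `scripts/run-discord-scrape.sh`, and the on-disk schema of `channel-map.json` is not specified here. The sketch assumes a flat object that maps a channel id to an archive filename. The name `bootstrap_channel_map` and the regular expression are hypothetical.

```python
import json
import re
from pathlib import Path

# Matches names such as "general [123456789012345678].json" (assumed pattern).
ARCHIVE_RE = re.compile(r"^.* \[(\d+)\]\.json$")


def bootstrap_channel_map(output_dir: Path) -> dict:
    """Add entries for existing archives; never overwrite an existing entry."""
    map_path = output_dir / ".dce-meta" / "channel-map.json"
    channel_map = {}
    if map_path.is_file():
        channel_map = json.loads(map_path.read_text(encoding="utf-8"))
    for archive in sorted(output_dir.glob("*.json")):
        match = ARCHIVE_RE.match(archive.name)
        if match:
            # setdefault leaves any entry that is already in the map untouched.
            channel_map.setdefault(match.group(1), archive.name)
    map_path.parent.mkdir(parents=True, exist_ok=True)
    map_path.write_text(json.dumps(channel_map, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return channel_map
```

Running this before scrape or preflight means a later export for a known channel id resolves to the existing file instead of creating a parallel one.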
## Test Scenarios
- Host runner succeeds with only `DISCORD_TOKEN` in the environment (no `scrape.env`)
- Bootstrap writes channel-map entries for seeded archives without overwriting existing map entries
- Merge rejects shrinkage (fixture with fewer messages after merge)
- Existing smoke suite still passes
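
The shrinkage scenario can be pictured with the following Python sketch of a guarded replace. It is a sketch under assumptions, not the shell implementation under test. It assumes archives keep messages in a top-level `messages` array, and the names `ShrinkageError`, `message_count`, and `safe_replace` are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


class ShrinkageError(RuntimeError):
    """Raised when a merge result has fewer messages than the archive."""


def message_count(path: Path) -> int:
    return len(json.loads(path.read_text(encoding="utf-8")).get("messages", []))


def safe_replace(dest: Path, merged: dict) -> None:
    new_count = len(merged.get("messages", []))
    if dest.is_file():
        old_count = message_count(dest)
        if new_count < old_count:
            raise ShrinkageError(f"{dest.name}: {new_count} < {old_count} messages")
    # The temp file lives in the target directory, so the rename below
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(merged, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, dest)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A fixture for the shrinkage case would seed an archive with two messages, call `safe_replace` with a one-message result, and check both that the error is raised and that the archive on disk is unchanged.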
## Success Criteria
- `./scripts/tests/run-discord-scrape-smoke.sh` and host smoke pass
- Preflight can run once the user exports `DISCORD_TOKEN`, even without a `scrape.env` file