feat(scrape): warn on bot-token preflight and document cron install

Bootstrap surfaces forbidden-channel guidance; operator checklist
documents the installed monthly cron job.
Boden 2026-05-29 14:03:01 -05:00
parent 058aaa0960
commit 8c7ae90f3f
3 changed files with 62 additions and 2 deletions

@ -0,0 +1,41 @@
---
title: "feat: Operator activation — cron install and bot-token warning"
type: feat
status: completed
date: 2026-05-29
origin: LFG — original request included monthly cron; cron not yet installed; bot token blocks live fetch
---
# feat: Operator activation — cron install and bot-token warning
## Summary
Complete the operator slice: warn when preflight succeeds only via seeded archives (typical of a bot token), install the monthly cron job, and document activation in the checklist.
## Requirements
| ID | Requirement |
|----|-------------|
| R1 | `bootstrap-recurring-scrape.sh` prints clear bot-token / user-token guidance after seeded-only preflight |
| R2 | Monthly cron block installed via `setup-cron.sh` |
| R3 | Operator checklist notes cron installed and token type |
| R4 | Bootstrap smoke still passes |
## Implementation Units
### U1. Bot-token advisory
**Files:** `scripts/bootstrap-recurring-scrape.sh`
### U2. Install cron (runtime)
**Command:** `./scripts/setup-cron.sh --skip-preflight`
### U3. Docs
**Files:** `docs/recurring-scrape-operator-checklist.md`
## Verification
- `crontab -l` contains `BEGIN discord-scrape`
- `bootstrap-recurring-scrape-smoke.sh` still passes (R4)
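
The first verification item can be scripted. A minimal sketch, assuming only the `BEGIN discord-scrape` marker named above; `has_scrape_block` is a hypothetical helper, not a script in the repo, and the sample job line is illustrative rather than what `setup-cron.sh` actually writes:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if crontab text on stdin contains the marker.
has_scrape_block() {
  grep -qF 'BEGIN discord-scrape'
}

# Illustrative input; the job line is made up for the demo.
if printf '%s\n' '# BEGIN discord-scrape' '0 4 1 * * /path/to/job' | has_scrape_block; then
  echo "cron block present"
fi
```

Against a real crontab the same check would read `crontab -l 2>/dev/null | has_scrape_block`.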

@ -14,11 +14,13 @@ Use this after cloning or opening the **source** repo (`DiscordChatExporter`, no
```bash
./scripts/setup-cron.sh --dry-run
./scripts/setup-cron.sh
./scripts/setup-cron.sh --skip-preflight # after bootstrap preflight already succeeded
```
Defaults: first day of month at 04:00. Override with `--interval weekly`, `--at HH:MM`, or `--cron '0 4 1 * *'`.
Installed jobs are marked `# BEGIN discord-scrape` in `crontab -l`. Logs append to `logs/discord-scrape.log`.
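
To show how the default schedule relates to the override flags, here is a rough sketch of turning an interval and an `HH:MM` time into a five-field cron expression. `cron_expr_for` is a hypothetical helper and does not reflect how `setup-cron.sh` is implemented; in particular, running the weekly schedule on Sunday is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not the logic of setup-cron.sh.
# The weekly day (Sunday, field value 0) is an assumption for illustration.
cron_expr_for() {
  local interval=$1 at=$2
  local hour=${at%%:*} minute=${at##*:}
  # Force base 10 so values such as "08" are not parsed as octal.
  hour=$((10#$hour))
  minute=$((10#$minute))
  case "$interval" in
    monthly) printf '%d %d 1 * *\n' "$minute" "$hour" ;;
    weekly)  printf '%d %d * * 0\n' "$minute" "$hour" ;;
    *) echo "unknown interval: $interval" >&2; return 1 ;;
  esac
}

cron_expr_for monthly 04:00   # prints: 0 4 1 * *
cron_expr_for weekly 23:30    # prints: 30 23 * * 0
```

The monthly case reproduces the documented default, minute 0 of hour 4 on day 1 of every month.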
## Narrow a run
```bash

@ -134,9 +134,26 @@ main() {
    preflight_args+=(--target "$target")
  done
  "${preflight_args[@]}"
  local preflight_log preflight_status
  preflight_log=$(mktemp "${TMPDIR:-/tmp}/dce-bootstrap-preflight.XXXXXX")
  "${preflight_args[@]}" 2>&1 | tee "$preflight_log"
  preflight_status=${PIPESTATUS[0]}
  if (( preflight_status != 0 )); then
    cat "$preflight_log" >&2
    rm -f "$preflight_log"
    exit "$preflight_status"
  fi
  printf '\nBootstrap complete.\n'
  if grep -q 'inaccessible, but .* seeded archive' "$preflight_log" \
    || grep -qiE 'failed: forbidden|Missing Access' "$preflight_log"; then
    printf '\nToken note: many channels returned forbidden. That usually means a bot token without message-history access.\n'
    printf '  For live incremental downloads, put a user token in %s (see .docs/Token-and-IDs.md).\n' "$ENV_FILE"
    printf '  Append-only archives are still safe: existing JSON is updated in place and never fully re-downloaded.\n'
  fi
  rm -f "$preflight_log"
  printf '  Scrape now:    %s\n' "$REPO_ROOT/scripts/run-documents-scrape.sh"
  printf '  Install cron:  %s --dry-run\n' "$REPO_ROOT/scripts/setup-cron.sh"
}
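
The status-capture pattern in this hunk can be tried in isolation. After a pipeline, `$?` reports the status of `tee`, so the script reads `PIPESTATUS[0]` to recover the status of the command on the left. A minimal sketch, assuming bash; `fake_preflight` is a hypothetical stand-in for the real preflight command, and its output line is made up to trigger the advisory pattern:

```bash
#!/usr/bin/env bash
# Sketch: keep a command's exit status while also logging its output via tee.
# fake_preflight is a hypothetical stand-in, not a script from the repo.
fake_preflight() {
  echo "channel 123: failed: forbidden"
  return 3
}

log=$(mktemp "${TMPDIR:-/tmp}/pipestatus-demo.XXXXXX")

# tee's terminal copy is discarded here only to keep the demo output short.
fake_preflight 2>&1 | tee "$log" >/dev/null
status=${PIPESTATUS[0]}   # status of fake_preflight, not of tee

# The saved log can then be scanned with the same pattern the hunk uses.
if grep -qiE 'failed: forbidden|Missing Access' "$log"; then
  advisory=yes
else
  advisory=no
fi
rm -f "$log"

echo "status=$status advisory=$advisory"   # prints: status=3 advisory=yes
```

`PIPESTATUS` has to be read immediately after the pipeline, because the next command overwrites it.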