---
title: "feat: Scrape logging, run summary, and default-all-targets"
type: feat
status: complete
date: 2026-05-29
origin: /lfg — operator scripts need explicit server/file/message visibility and sane defaults
---

# feat: Scrape logging, run summary, and default-all-targets

## Summary

Make the recurring scrape scripts print an upfront run plan (which guilds/servers, which output folders), per-channel file I/O with message deltas, and a final change summary. Operator entrypoints default to all enabled targets from `config/scrape-targets.json`, so operators no longer need to repeat `--target` flags.

## Problem Frame

Operators cannot tell from the current logs which Discord server was scraped, which archive files were touched, or how many messages were appended versus left unchanged. `run-operator-proof.sh` still hardcodes the `eod_discord` target. The core engine (`run-discord-scrape.sh`) logs channel IDs, but it does not report guild names, paths, or before/after counts in one place.

## Requirements

| ID | Requirement |
|----|-------------|
| R1 | Before scrape/preflight, log the config path and every selected target with its `name`, resolved guild id/name(s), and `output_dir` |
| R2 | For each channel processed, log the destination file path, the action (`CREATED`, `MERGED`, `UNCHANGED`, `SKIPPED`), and message counts before → after (plus the fetched batch size when merged) |
| R3 | After all targets complete, print a consolidated run summary with per-file deltas and totals |
| R4 | `run-documents-scrape.sh` and the host wrapper print the same run-plan header before invoking the container |
| R5 | `run-operator-proof.sh` defaults to all enabled targets (loop handoff → scrape → prove) when `--target` is omitted |
| R6 | Offline smoke tests pass; the scrape smoke test asserts that the summary markers exist |

## Key Technical Decisions

- **KTD1: Ledger in `run-discord-scrape.sh`:** Keep summary state in bash arrays inside the core script rather than in a new shared library. Host wrappers only need jq-based target listing; the container owns channel-level detail.
- **KTD2: Guild labels from cache + export metadata:** Resolve guild names from `load_guild_cache` at target start; enrich per-channel lines from the export JSON when available.
- **KTD3: No behavior change to merge semantics:** Logging only; the append-only merge and skip rules stay unchanged.

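The guild-label lookup in KTD2 could look roughly like the following. This is a minimal sketch, not the script's actual code: `load_guild_cache` and `guild_name_for_id` are named in this plan, but the tab-separated cache file, the `GUILD_NAMES` array, and the `unknown guild` fallback are assumptions made for illustration. It needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Sketch only: the cache file format (guild_id<TAB>guild_name per line),
# the GUILD_NAMES array, and the fallback label are assumptions.

declare -A GUILD_NAMES=()

# load_guild_cache FILE: read "guild_id<TAB>guild_name" lines into memory.
load_guild_cache() {
  local cache_file="$1" id name
  [[ -r "$cache_file" ]] || return 0   # a missing cache is not an error here
  while IFS=$'\t' read -r id name; do
    if [[ -n "$id" ]]; then
      GUILD_NAMES["$id"]="$name"
    fi
  done <"$cache_file"
}

# guild_name_for_id ID: print the cached name, or a placeholder if unknown.
guild_name_for_id() {
  local id="${1:-}"
  if [[ -z "$id" ]]; then
    printf '%s\n' "unknown guild"
    return 0
  fi
  printf '%s\n' "${GUILD_NAMES[$id]:-unknown guild}"
}
```

Falling back to a placeholder rather than failing keeps the change logging-only, in line with KTD3: a missing guild name never blocks a scrape.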
## Implementation Units

### U1. Core scrape ledger and summary

**Goal:** Operator-visible run plan, per-channel I/O lines, and final summary in `run-discord-scrape.sh`.

**Requirements:** R1, R2, R3

**Files:** `scripts/run-discord-scrape.sh`, `scripts/tests/run-discord-scrape-smoke.sh`

**Approach:** Add `SCRAPE_SUMMARY_ENTRIES` plus the helpers `guild_name_for_id`, `describe_target_resolution`, `log_run_plan`, `record_channel_result`, and `print_scrape_summary`. Call them from `run_target_mode` and `scrape_target`. Preflight reuses the run-plan header.

**Test scenarios:**
- Happy path: the smoke run shows `Scrape run plan`, `MERGED`/`CREATED`/`UNCHANGED` lines, and `Scrape run summary`
- Edge: a skipped channel appears as `SKIPPED` in the summary
- Error path: a failure before the summary still leaves the partial ledger on stderr

**Verification:** `./scripts/tests/run-discord-scrape-smoke.sh` passes, including its grep checks for the summary markers.

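The ledger described in U1 might be sketched as below. The names `SCRAPE_SUMMARY_ENTRIES`, `record_channel_result`, and `print_scrape_summary` come from this plan; the pipe-delimited entry layout, the argument order, and the exact wording of the per-channel lines are assumptions, so treat this as an illustration rather than the shipped implementation.

```bash
#!/usr/bin/env bash
# Sketch only: entry layout, argument order, and output wording are assumed.

# One entry per processed channel: "action|path|before|after|fetched".
# A path containing "|" would break this layout; the sketch ignores that case.
SCRAPE_SUMMARY_ENTRIES=()

# record_channel_result ACTION PATH BEFORE AFTER [FETCHED]
record_channel_result() {
  local action="$1" path="$2" before="$3" after="$4" fetched="${5:-0}"
  SCRAPE_SUMMARY_ENTRIES+=("${action}|${path}|${before}|${after}|${fetched}")
  # The per-channel line goes to stderr immediately, so a partial ledger
  # is still visible if the run fails before the summary is printed.
  printf '[scrape] %-9s %s messages: %d -> %d (fetched %d)\n' \
    "$action" "$path" "$before" "$after" "$fetched" >&2
}

# print_scrape_summary: consolidated per-file deltas and totals.
print_scrape_summary() {
  local entry action path before after fetched
  local files=0 added=0
  echo "Scrape run summary"
  for entry in "${SCRAPE_SUMMARY_ENTRIES[@]}"; do
    IFS='|' read -r action path before after fetched <<<"$entry"
    printf '  %-9s %s  %d -> %d  (+%d, fetched %d)\n' \
      "$action" "$path" "$before" "$after" "$((after - before))" "$fetched"
    files=$((files + 1))
    added=$((added + after - before))
  done
  printf '  total: %d file(s), +%d message(s)\n' "$files" "$added"
}
```

Calling `record_channel_result MERGED path/to/file.json 10 14 6` after each channel and `print_scrape_summary` once at the end would cover R2 and R3 without touching the merge logic.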
### U2. Host and documents wrapper banners

**Goal:** Host-side run plan before container execution.

**Requirements:** R4

**Files:** `scripts/run-discord-scrape-host.sh`, `scripts/run-documents-scrape.sh`, `scripts/operator-handoff.sh`

**Approach:** Shared helper pattern: use jq to list the enabled or selected targets with their `output_dir`, then print the subcommand and config paths. `operator-handoff.sh` lists the enabled targets in its handoff header.

**Test scenarios:**
- Happy path: documents-scrape dry-run output includes the target list
- Integration: host smoke test passes unchanged (no regression)

**Verification:** `./scripts/tests/documents-scrape-smoke.sh`, `./scripts/tests/run-discord-scrape-host-smoke.sh`

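A jq-based target listing for the host wrappers could be sketched as follows. The helper name `print_target_list` is hypothetical, and the config layout it reads (a top-level `targets` array whose items carry `name`, `enabled`, and `output_dir`) is an assumption about `config/scrape-targets.json`, not something this plan specifies.

```bash
#!/usr/bin/env bash
# Sketch only: print_target_list and the .targets[] config layout are assumed.

# print_target_list CONFIG [TARGET]
# With no TARGET, lists every enabled target; with TARGET, lists only that one.
print_target_list() {
  local config="$1" only="${2:-}"
  echo "Config: ${config}"
  jq -r --arg only "$only" '
    .targets[]
    | select(.enabled == true)
    | select($only == "" or .name == $only)
    | "  target=\(.name) output_dir=\(.output_dir)"
  ' "$config"
}
```

For example, `print_target_list config/scrape-targets.json` would list every enabled target, and passing a target name as the second argument narrows the list to that one.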
### U3. Operator proof defaults to all enabled targets

**Goal:** Remove the hardcoded `eod_discord`; loop over all enabled targets when `--target` is omitted.

**Requirements:** R5

**Files:** `scripts/run-operator-proof.sh`, `scripts/tests/run-operator-proof-smoke.sh` (if present)

**Approach:** When `TARGET` is empty, `mapfile` the enabled target names from the config, run handoff once, then run scrape and prove for each target; print a per-target summary at the end.

**Test scenarios:**
- Happy path: a smoke test with fake scripts verifies the multi-target loop
- Edge: a single `--target` still runs one target only

**Verification:** operator-proof smoke test, or documents smoke test plus a manual grep.

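The control flow in the U3 approach can be sketched as below. The helpers `list_enabled_targets`, `run_handoff`, and `scrape_and_prove` are hypothetical stand-ins, stubbed here so the loop runs on its own; the target names `alpha` and `beta` are placeholders, and the real script would read names from the config and call the actual handoff, scrape, and prove scripts. `mapfile` requires bash 4 or later.

```bash
#!/usr/bin/env bash
# Sketch only: the three helpers below are hypothetical stubs.

list_enabled_targets() { printf '%s\n' "alpha" "beta"; }  # would read the config
run_handoff()          { echo "handoff"; }
scrape_and_prove()     { echo "scrape+prove $1"; }

# run_operator_proof [TARGET]: TARGET is the value of --target, empty if omitted.
run_operator_proof() {
  local target="${1:-}"
  local -a targets results=()
  local t failed=0

  if [[ -n "$target" ]]; then
    targets=("$target")                       # explicit --target: one target only
  else
    mapfile -t targets < <(list_enabled_targets)
  fi

  if (( ${#targets[@]} == 0 )); then
    echo "no enabled targets" >&2
    return 1
  fi

  run_handoff                                 # once per run, not once per target
  for t in "${targets[@]}"; do
    if scrape_and_prove "$t"; then
      results+=("$t: ok")
    else
      results+=("$t: FAILED")
      failed=1
    fi
  done

  echo "Per-target summary"
  printf '  %s\n' "${results[@]}"
  return "$failed"
}
```

Continuing past a failed target and reporting it in the summary is one possible design; stopping at the first failure would be equally consistent with the plan as written.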
## Scope Boundaries

### Deferred to Follow-Up Work

- Structured JSON run logs for machine parsing
- Changing `prove-incremental-append.sh` to require optional `--target`

### Out of scope

- Discord API or merge algorithm changes
- New CLI flags beyond existing `--target` narrowing