DiscordChatExporter/docs/plans/2026-06-04-075-feat-documents-scrape-json-summary-plan.md
Copilot 8c36fdbdda feat(scrape): auto JSON summary on documents scrape runs
Enable DCE_RUN_SUMMARY_JSON by default for live run-documents-scrape
paths with optional --summary-file override; skip on dry-run/salvage-only.
2026-06-03 10:57:32 -05:00

---
title: "feat: Auto JSON summary on documents scrape"
type: feat
status: complete
date: 2026-06-04
origin: /lfg — plan 074 deferred auto-enable JSON summary on bare scrape entrypoints; cron uses run-documents-scrape.sh
---
# feat: Auto JSON summary on documents scrape
## Summary
When `run-documents-scrape.sh` performs a live Discord scrape, auto-enable `DCE_RUN_SUMMARY_JSON=1` and write `logs/documents-scrape-<UTC>.summary.json`, unless the operator has already set `DCE_RUN_SUMMARY_FILE` or passes `--summary-file`, in which case that path is used instead.
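The destination rule above can be sketched as a small resolver. This is a minimal illustration, not the script's actual code: the function name `resolve_summary_file` is hypothetical, the precedence between `--summary-file` and `DCE_RUN_SUMMARY_FILE` (flag first) is an assumption, and so is the timestamp format, since the plan only says `<UTC>`.

```shell
# Hypothetical helper; not taken from run-documents-scrape.sh.
# Assumed precedence: --summary-file value, then DCE_RUN_SUMMARY_FILE,
# then the default path under logs/.
resolve_summary_file() {
  # $1 is the value passed to --summary-file, or empty if the flag was omitted.
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  elif [ -n "${DCE_RUN_SUMMARY_FILE:-}" ]; then
    printf '%s\n' "$DCE_RUN_SUMMARY_FILE"
  else
    # Timestamp format is an assumption; the plan only specifies <UTC>.
    printf 'logs/documents-scrape-%s.summary.json\n' "$(date -u +%Y%m%dT%H%M%SZ)"
  fi
}
```

Calling `resolve_summary_file ""` with `DCE_RUN_SUMMARY_FILE` unset yields a fresh default path, while a non-empty argument or a preset env var is passed through unchanged.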
## Problem Frame
Validation and proof runs already auto-export JSON summaries (plans 070-073). The primary incremental path, `run-documents-scrape.sh` and the monthly cron job, still requires manually set env vars to produce machine-readable totals. Host runner recovery (plan 072) can populate the file from compose logs, but only when those env vars are set.
## Requirements
| ID | Requirement |
|----|-------------|
| R1 | Live scrape path exports `DCE_RUN_SUMMARY_JSON=1` when not dry-run/salvage-only |
| R2 | Default `DCE_RUN_SUMMARY_FILE` to `logs/documents-scrape-<UTC>.summary.json` when unset |
| R3 | Optional `--summary-file PATH` overrides default destination |
| R4 | Script prints a `JSON summary file:` line before preflight/scrape |
| R5 | Dry-run and salvage-only do not enable JSON export |
| R6 | `documents-scrape-smoke.sh` asserts summary path on live scrape and absence on dry-run |
| R7 | `DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh` → 23/23 |
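One way R1, R4, and R5 could fit together is a single gate that runs before preflight. The sketch below is illustrative only: `maybe_enable_summary` and the mode strings `live`, `dry-run`, and `salvage-only` are hypothetical names, and the real script may detect these modes differently.

```shell
# Hypothetical gate; function name and mode strings are illustrative.
maybe_enable_summary() {
  mode="$1"
  summary_path="$2"
  case "$mode" in
    dry-run|salvage-only)
      # R5: leave JSON export untouched for non-live modes.
      return 0
      ;;
  esac
  # R1: enable JSON export for the live scrape path.
  export DCE_RUN_SUMMARY_JSON=1
  export DCE_RUN_SUMMARY_FILE="$summary_path"
  # R4: announce the destination before preflight/scrape starts.
  echo "JSON summary file: $summary_path"
}
```

Because the function exports variables, it has to be called directly rather than inside a command substitution; otherwise the exports are lost with the subshell.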
## Implementation Units
### U1. run-documents-scrape.sh
**Files:** `scripts/run-documents-scrape.sh`, `scripts/tests/documents-scrape-smoke.sh`
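For the smoke-test side (R6), the assertions might look roughly like the following. The helper names are hypothetical, and matching on captured stdout is an assumption about how `documents-scrape-smoke.sh` inspects a run; only the `JSON summary file:` prefix and the `.summary.json` suffix come from this plan.

```shell
# Hypothetical assertion helpers for a smoke test.
# $1 is the captured stdout of a run-documents-scrape.sh invocation.

# Succeeds when a "JSON summary file:" line ending in .summary.json is present.
assert_summary_line_present() {
  printf '%s\n' "$1" | grep -q '^JSON summary file: .*\.summary\.json$'
}

# Succeeds when no "JSON summary file:" line appears at all (dry-run case).
assert_summary_line_absent() {
  ! printf '%s\n' "$1" | grep -q 'JSON summary file:'
}
```

A smoke script could capture the output of a live-mode run and a dry-run into two variables, then call one helper on each and fail on a non-zero status.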
### U2. Docs
**Files:** `docs/recurring-scrape-merge-readiness.md`, `scrape.env.example`
## Verification
```bash
DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh
```
## Scope Boundaries
### Deferred
- Live KotOR catch-up on host
- Per-target separate summary files in multi-target proof/validation loops
- Tee full documents-scrape stdout to a log file