DiscordChatExporter/docs/plans/2026-06-04-078-feat-documents-scrape-log-file-plan.md
Copilot 759e33efe9 feat(scrape): add --log-file tee to documents scrape
Live runs auto-write logs/documents-scrape-UTC.log and pair JSON
summary with the log basename; optional --log-file overrides the path.
2026-06-03 11:21:59 -05:00


title: "feat: Documents scrape --log-file with tee"
type: feat
status: complete
date: 2026-06-04
origin: "/lfg - plan 077 deferred tee full documents-scrape stdout to persistent log"

feat: Documents scrape --log-file with tee

Summary

Add --log-file PATH to run-documents-scrape.sh. Live scrapes auto-tee to logs/documents-scrape-<UTC>.log and write the paired JSON summary to <log-basename>.summary.json (parity with operator validation).
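The path pairing described above can be sketched in a few lines of shell. This is an illustration only: the helper names default_log_path and summary_path_for are hypothetical, and the UTC timestamp format is an assumption, since the plan only says <UTC>.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; these names are NOT taken from run-documents-scrape.sh.

# Build a default log path of the form logs/documents-scrape-<UTC>.log.
# Assumption: the stamp looks like 20260604T120000Z; the plan does not fix a format.
default_log_path() {
  local stamp
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  printf 'logs/documents-scrape-%s.log\n' "$stamp"
}

# Pair the summary with the log basename: strip a trailing ".log"
# (if present) and append ".summary.json".
summary_path_for() {
  local log_file="$1"
  printf '%s.summary.json\n' "${log_file%.log}"
}

# Example: honor a caller-supplied LOG_FILE, else fall back to the default.
LOG_FILE="${LOG_FILE:-$(default_log_path)}"
SUMMARY_FILE="$(summary_path_for "$LOG_FILE")"
echo "log path:     $LOG_FILE"
echo "summary path: $SUMMARY_FILE"
```

The `${LOG_FILE%.log}` expansion removes only a trailing `.log`, so a custom path without that suffix simply gains `.summary.json` at the end.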

Problem Frame

Validation and proof runs already persist teed logs with recoverable JSON summaries. The primary cron/operator entry point, run-documents-scrape.sh, only prints to stdout, so long KotOR catch-up runs leave no durable log unless the operator wraps the command manually.

Requirements

ID Requirement
R1 --log-file PATH appends all workflow output via tee -a
R2 Live scrape (not dry-run/salvage-only) auto-defaults log to logs/documents-scrape-<UTC>.log when unset
R3 Live scrape pairs summary with ${LOG_FILE%.log}.summary.json unless --summary-file or DCE_RUN_SUMMARY_FILE set
R4 Prints a Log file: line before the scrape and a Log: line after tee completes
R5 Recovers missing summary from teed log via recover_json_summary_if_missing
R6 Dry-run and salvage-only skip auto log/summary unless --log-file explicitly passed
R7 documents-scrape-smoke.sh asserts teed log file on live --log-file run
R8 DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh → 23/23
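R5 names recover_json_summary_if_missing but the plan does not describe how it works. The sketch below shows one plausible recovery strategy under a stated assumption: the workflow prints its JSON summary as a single line, and the last such line in the teed log is the one to keep. The function name recover_summary_sketch is hypothetical and is not the project's helper.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT the project's recover_json_summary_if_missing.
# Assumption: the summary is a single-line JSON object somewhere in the log.

recover_summary_sketch() {
  local log_file="$1" summary_file="$2" line

  # Nothing to do when a non-empty summary already exists.
  [ -s "$summary_file" ] && return 0

  # Cannot recover without a readable log.
  [ -r "$log_file" ] || return 1

  # Take the last line that starts with "{" and ends with "}".
  line="$(grep -E '^[{].*[}][[:space:]]*$' "$log_file" | tail -n 1 || true)"
  [ -n "$line" ] || return 1

  printf '%s\n' "$line" > "$summary_file"
}
```

Called as `recover_summary_sketch "$LOG_FILE" "$SUMMARY_FILE"`, it leaves an existing summary untouched and returns non-zero when the log holds no candidate line, so the caller can decide whether that is fatal.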

Implementation Units

U1. run-documents-scrape.sh

Files: scripts/run-documents-scrape.sh, scripts/tests/documents-scrape-smoke.sh
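One way U1 could satisfy R1, R2, R4, and R6 is sketched below. It is not the actual contents of run-documents-scrape.sh: run_with_tee, resolve_log_file, and workflow_stub are hypothetical names, the mode strings are assumptions, and the sketch relies on bash's PIPESTATUS so that the workflow's exit status survives the pipe into tee.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; function names and mode strings are illustrative only.

# Stand-in for the real scrape workflow: writes to stdout and stderr,
# and exits with STUB_EXIT (default 0).
workflow_stub() {
  echo "scraping..."
  echo "warning: slow channel" >&2
  return "${STUB_EXIT:-0}"
}

# R2/R6-style gating: an explicit path always wins; otherwise only a
# live run gets an auto-generated log path. Prints nothing for "no log".
resolve_log_file() {
  local mode="$1" explicit="$2"
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
  elif [ "$mode" = "live" ]; then
    # Assumption: timestamp format; the plan only says <UTC>.
    printf 'logs/documents-scrape-%s.log\n' "$(date -u +%Y%m%dT%H%M%SZ)"
  fi
}

# R1/R4-style tee: append stdout and stderr to the log while still
# printing them, then return the command's status rather than tee's.
run_with_tee() {
  local log_file="$1"; shift
  mkdir -p "$(dirname "$log_file")"
  echo "Log file: $log_file"
  "$@" 2>&1 | tee -a "$log_file"
  local status="${PIPESTATUS[0]}"   # bash-specific; index 0 is the command, 1 is tee
  echo "Log: $log_file"
  return "$status"
}
```

A caller might write `LOG_FILE="$(resolve_log_file live "")"` followed by `run_with_tee "$LOG_FILE" workflow_stub`. Because tee runs with -a, repeated runs against the same path accumulate rather than overwrite.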

U2. Docs

Files: docs/recurring-scrape-merge-readiness.md, docs/recurring-scrape-operator-checklist.md, scrape.env.example

Verification

DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh

Scope Boundaries

Deferred

  • Live KotOR catch-up on host
  • Refresh PR #1538 body with plans 070-078 stamps
  • Wire --log-file into setup-cron crontab line