Commit graph

40 commits

Author SHA1 Message Date
Copilot 8057a4443a feat(operators): pass --channel through handoff, prove, and proof
Complete the operator CLI chain so focused yes_general catch-up can run
end-to-end with --target and --channel on handoff dry-run, prove scrape,
and operator-proof orchestration.
2026-06-03 06:36:02 -05:00
Copilot ae120c916f fix(scrape): skip active stale temps and retry salvage merge
Avoid salvaging export.json while a channel export is still writing
(default: skip temps modified within 120s). Retry truncate+merge once
when merge fails on a quiescent partial temp. Adds active-skip smoke
and ages stale fixture mtime so salvage tests stay deterministic.
2026-06-03 06:31:02 -05:00
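The "skip temps modified within 120s" rule above can be illustrated with a small mtime check. This is only a sketch in Python, not the repo's shell implementation; the function name and the constant are hypothetical, and only the 120-second default comes from the commit message.

```python
import os
import time

# Assumption: mirrors the 120s default named in the commit message.
ACTIVE_WINDOW_SECONDS = 120


def is_active_temp(path, now=None, window=ACTIVE_WINDOW_SECONDS):
    """Return True if `path` was modified within the last `window` seconds.

    A temp export that is still being written has a recent mtime, so a
    salvage pass should leave it alone and only touch quiescent files.
    """
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) < window
```

Passing `now` explicitly keeps the check deterministic in tests, which is the same concern the commit addresses by aging the stale fixture's mtime.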
Copilot 14796e9c09 fix(host): stop forcing compose -T so operator logs stream live
podman-compose and docker compose allocate a pseudo-TTY by default;
always passing -T left export progress block-buffered. Omit -T for operator
runs and set DCE_COMPOSE_TTY=0 only for cron log append. Adds compose
TTY smokes and cron job env assertion.
2026-06-03 06:23:12 -05:00
Copilot d8742c5c7b feat(operator): pass --channel through documents and validation wrappers
Operators can narrow KotOR yes_general catch-up to a single channel ID
without re-scraping every guild channel. Adds dry-run and compose arg
capture smokes for the passthrough path.
2026-06-03 06:18:08 -05:00
Copilot 5820f67caf feat(host): stream container scrape output during long runs
Use tee in run_subcommand_with_retry so validation logs show live export
progress instead of buffering until compose exits. Add streaming smoke
assertion and skip scrape lock in documents-scrape smoke when prove runs
against the real repo root during parallel validation.
2026-06-03 06:13:58 -05:00
Copilot ee62078f5b fix(scrape): skip SIGTERM/SIGINT export aborts like OOM
Stopping validation with kill/Ctrl+C returned exit 143/130 and failed
the whole target instead of marking it SKIPPED and preserving the partial
export. Added smoke for exit 143; gitignore .dce-scrape.lock.
2026-06-03 06:06:15 -05:00
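Exit codes 143 and 130 follow the common shell convention of 128 plus the signal number (SIGTERM is 15, SIGINT is 2), and 134 likewise corresponds to SIGABRT (6). A minimal Python sketch of that classification follows. The function name and return labels are hypothetical, and treating 137 (SIGKILL) as the OOM case is an assumption not stated in the log.

```python
# Linux signal numbers. Including SIGKILL (9) as the OOM-kill case is an
# assumption; the log only names exits 143, 130, and 134 explicitly.
SKIPPABLE_SIGNALS = {2, 6, 9, 15}  # SIGINT, SIGABRT, SIGKILL, SIGTERM


def classify_export_exit(code):
    """Map a child exit status to 'ok', 'skipped', or 'failed'.

    Shells report death-by-signal as 128 + signal number, so 143 means
    SIGTERM and 130 means SIGINT.
    """
    if code == 0:
        return "ok"
    if code > 128 and (code - 128) in SKIPPABLE_SIGNALS:
        return "skipped"
    return "failed"
```

Under this sketch a skipped channel would keep its partial temp export rather than failing the whole target, which is the behavior the commit describes.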
Copilot b9bb4bbe64 fix(host): flock scrape lock prevents concurrent container exports
Overlapping run-operator-validation invocations spawned twin yes_general
exports and repeated OOM skips. Host scrape now holds .dce-scrape.lock;
smokes bypass via DCE_SKIP_SCRAPE_LOCK. Added lock smoke (20/20 pass).
2026-06-03 06:03:47 -05:00
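The lock described above can be sketched with a non-blocking exclusive flock. This is an illustrative Python version for POSIX hosts, not the host script itself; the function name is hypothetical, and only the lock file name .dce-scrape.lock comes from the commit message.

```python
import fcntl
import os


def try_acquire_lock(lock_path):
    """Try to take an exclusive, non-blocking flock on `lock_path`.

    Returns an open file descriptor that holds the lock, or None when
    another holder already has it. Closing the descriptor releases it.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another run holds the lock; back off instead of starting a
        # second export for the same target.
        os.close(fd)
        return None
    return fd
```

Because the kernel drops the lock when the descriptor is closed or the process exits, a crashed run does not leave a stale lock behind, which is one likely reason to prefer flock over a PID file here.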
Copilot 928c0ef682 fix(audit): exclude .dce-temp partial exports from JSON audit
Operator validation failed when yes_general OOM left truncated exports
under .dce-temp. Audit and archive verification now skip in-progress temps;
smoke covers the partial-temp case. KotOR audit passes with temps present.
2026-06-03 05:59:54 -05:00
Copilot 8b54b6a498 test(scrape): preserve-partial smoke; fix host token-file precedence
Add offline regression for OOM skip preserving partial export temps.
Host wrapper now prefers DISCORD_TOKEN_FILE over inherited shell tokens
and always writes explicit compose env for auth-retry. All 19 smokes pass.
2026-06-03 05:52:39 -05:00
Copilot c13c4167be fix(scrape): salvage stale temp exports before re-downloading
When a previous export crashes (OOM, abort, kill), the partially-
downloaded temp export under .dce-temp/ was orphaned. Subsequent
runs started the incremental from the archive's last message ID,
re-downloading everything the failed run had already fetched.

Now scrape_target() checks for orphaned temp exports before each
channel export, salvages truncated JSON (same marker-based repair
as salvage-truncated-export.sh), merges recovered messages into
the archive, and cleans up stale temp dirs. The incremental then
starts from the truly latest message.

Adds salvage-stale smoke test with truncated fixture.
2026-06-03 01:11:28 -05:00
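To illustrate the salvage idea, the sketch below recovers every complete message object from a truncated export. It is not the marker-based repair in salvage-truncated-export.sh, whose details this log does not show. Instead it uses Python's incremental JSON decoder, and it assumes a hypothetical layout in which the messages array begins at the text `"messages": [`.

```python
import json


def salvage_messages(text, marker='"messages": ['):
    """Return the complete message objects found in a truncated export.

    Assumption: the export holds a top-level "messages" array introduced
    by `marker`. Decoding stops at the first object that fails to parse.
    """
    start = text.find(marker)
    if start == -1:
        return []
    decoder = json.JSONDecoder()
    pos = start + len(marker)
    messages = []
    while True:
        # raw_decode does not skip separators, so step over whitespace
        # and commas between array elements ourselves.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text) or text[pos] != "{":
            break  # end of the array or end of the truncated file
        try:
            obj, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            break  # last object was cut off mid-write; drop it
        messages.append(obj)
    return messages
```

The recovered list could then be merged into the archive so that the next incremental export starts from the newest salvaged message ID rather than the archive's older cursor.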
Copilot 87284816d0 test(scrape): add abort exit 134 skip smoke; plan 041 closure
Extend run-discord-scrape-smoke with skip-abort target so OOM/abort
channel skip from plan 040 has offline regression coverage. Update
merge-readiness for 2026-05-30 and KotOR validation retry in progress.
2026-06-03 00:57:11 -05:00
Copilot 71a443267e feat(scrape): run plan, channel ledger, and all-target proof
Log scrape plan/summary with per-file message deltas in the core script.
Host wrappers and operator entrypoints print target lists; operator-proof
defaults to all enabled targets when --target is omitted.
2026-05-29 20:34:22 -05:00
Boden 1e35761dbb test(scrape): lock mixed-length snowflake cursor selection
Add cursor-mixed-length smoke where string max_by would pick the wrong
--after value; padded sort_by in last_message_id already picks the max.
2026-05-29 16:33:00 -05:00
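The pitfall this smoke guards against is easy to reproduce: comparing snowflake IDs as strings picks the wrong maximum when their lengths differ. Below is a short Python illustration. The real scripts apparently use a padded sort_by, so the helper names and the pad width of 20 are assumptions made for the example.

```python
def last_message_id_numeric(ids):
    """Pick the largest snowflake by comparing the IDs as integers."""
    return max(ids, key=int)


def last_message_id_padded(ids, width=20):
    """Pick the largest snowflake by zero-padding before string compare."""
    return max(ids, key=lambda s: s.zfill(width))


ids = ["99999999999999999", "100000000000000000"]  # 17 vs 18 digits

# Plain string comparison is lexicographic, so the shorter ID wins
# because "9" sorts after "1".
assert max(ids) == "99999999999999999"

# Both fixes select the numerically larger 18-digit ID.
assert last_message_id_numeric(ids) == "100000000000000000"
assert last_message_id_padded(ids) == "100000000000000000"
```

An --after cursor taken from the string maximum would point at an older message, so the incremental export would re-fetch messages the archive already holds.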
Boden 65c9fb2206 feat(scrape): operator proof script and podman-compose smoke fix
Add run-operator-proof for one-target handoff/scrape/prove flows.
Prefer podman-compose on Podman hosts but honor DCE_DOCKER_BIN overrides
so offline smokes keep using fake compose shims.
2026-05-29 16:20:25 -05:00
Boden 3fd42d924e feat(scrape): sync GUI bridge doc into linux-x64 folder
Add sync-gui-bridge-doc.sh to copy the versioned bridge markdown beside
the GUI zip; includes offline smoke test (18 total via run-all-smokes).
2026-05-29 16:10:17 -05:00
Boden c4371c9f7d docs(scrape): LFG closure — operator handoff in setup guides
Point README and Recurring-Scrape-Setup at operator-handoff.sh, document
disk preflight, and harden operator-handoff smoke against pipefail flakes.
2026-05-29 16:05:42 -05:00
Boden c0818715a8 feat(scrape): add operator-handoff verification script
Single entrypoint runs disk summary, verify-operator-ready, and
run-documents-scrape --dry-run before cron or full scrapes.
2026-05-29 16:03:22 -05:00
Boden 44eadee634 feat(scrape): disk preflight on host runner for cron jobs
run-discord-scrape-host.sh runs verify --disk-only before preflight/scrape
so setup-cron monthly jobs fail fast when archive roots are low on space.
Harden bootstrap smoke to surface failures when dry-run fails.
2026-05-29 16:00:11 -05:00
Boden 32b7f47d45 feat(scrape): disk-only verify flag and documents scrape preflight
run-documents-scrape.sh checks archive free space before live Discord
calls; verify-operator-ready.sh adds --disk-only for lightweight checks.
2026-05-29 15:54:08 -05:00
Boden 1142e376b5 fix(scrape): disk preflight before compose and skippable disk errors
Fail fast when archive or repo paths lack free space (DCE_MIN_FREE_MB),
treat disk-full export failures as skippable channels, and add an offline
disk-space smoke. Smokes default DCE_MIN_FREE_MB=0 so CI stays portable.
2026-05-29 15:27:39 -05:00
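A free-space preflight of this kind can be sketched in a few lines. The Python version below is illustrative only. The function name is hypothetical, and the only detail taken from the commit is a minimum expressed in megabytes (DCE_MIN_FREE_MB), where a value of 0 disables the check in practice.

```python
import shutil


def has_min_free(path, min_free_mb):
    """Return True if the filesystem holding `path` has enough free space.

    `min_free_mb` plays the role of DCE_MIN_FREE_MB; a minimum of 0
    always passes, which keeps offline smokes portable.
    """
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    return free_mb >= min_free_mb
```

Running the check before any compose or network call lets a scheduled job fail fast with a clear message instead of dying partway through an export.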
Boden 76b4231d7a feat(scrape): per-target validation with continue-on-error
Run scrape and audit per enabled server independently; log summary
counts. Full host validation started via --per-target --continue-on-error.
2026-05-29 14:20:37 -05:00
Boden 1742a9d41e feat(scrape): add run-operator-validation orchestrator
Sync GUI token, verify readiness, run documents scrape, and audit JSON
with timestamped logs. Live eod_discord validation passed on host.
2026-05-29 14:19:04 -05:00
Boden 00bcbc5b21 feat(scrape): add verify-operator-ready host checks
One command validates compose, auth, config, and seeded archives before
bootstrap or cron. Includes offline smoke test (14 smokes total).
2026-05-29 14:16:10 -05:00
Boden 10cd2a534d feat(scrape): offline prove smoke and snapshot-only mode
Add --snapshot-only and --compare-snapshots to prove-incremental-append,
fix EXIT trap status, wire prove smoke into CI, and document LFG closure plan.
2026-05-29 14:12:18 -05:00
Boden a2aeaaab9c fix(scrape): audit and salvage corrupt archive JSON
Add audit-archive-json and salvage-truncated-export helpers, skip invalid
JSON during prove snapshots, and wire an audit smoke test into CI.
2026-05-29 14:09:46 -05:00
Boden 058aaa0960 feat(scrape): add bootstrap CI smoke and operator checklist
Document bootstrap-first workflow; verify dry-run in CI; live
bootstrap preflight validated against Documents archives.
2026-05-29 14:02:06 -05:00
Boden 2b39a721a9 feat(scrape): add bootstrap-recurring-scrape one-shot operator flow
Verify archives, build compose image, and preflight in one script.
Forward scrape-here --help; add scrape-here-smoke to CI.
2026-05-29 13:59:04 -05:00
Boden 57d472f8e8 fix(scrape): auth discovery, skip forbidden channels, mount host script
Discover Discord tokens from env, token files, GUI Settings.dat, and desktop
leveldb; bind-mount the host scrape script so container preflight uses
partition/--after cursors; skip inaccessible channels without aborting targets;
fix set -e and busybox mktemp for incremental exports under ~/Documents.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 14:57:16 -05:00
Boden 88e864c72a feat(scrape): add Documents scrape workflow and token discovery
Auto-discover token files, provide run-documents-scrape and
prove-incremental-append helpers, and document the end-to-end operator flow.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 02:23:34 -05:00
Boden 9e55378c84 feat(scrape): add Documents verify and auth bootstrap helpers
Add scripts to confirm enabled targets map to seeded ~/Documents archives
and to create scrape.env from exported DISCORD_TOKEN credentials.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 01:18:05 -05:00
Boden 8c14dbbf45 fix(scrape): append safely under Documents with flexible auth
Bootstrap channel-map entries from existing archive filenames, reject merges
that would shrink large JSON exports, accept exported DISCORD_TOKEN when
scrape.env is missing, and disable the duplicate OpenKotOR target folder.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 00:58:03 -05:00
Boden 7cab7280c4 feat: add gh PR run approval helper for fork CI unblock
Adds scripts/gh-approve-pr-runs.sh with GITHUB_TOKEN bootstrap, explicit
admin-rights policy classification, smoke coverage, and CI wiring. Marks
the remaining 2026-05-24 recurring scrape plans completed.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 00:30:49 -05:00
Boden df499568d9 fix: harden recurring scrape scripts from review residuals
Use max message ID for incremental exports, validate custom cron
expressions, drop eval from host/preflight paths, restrict reauth to
executable repo scripts, and run smoke tests in CI.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 00:08:22 -05:00
Boden ebc153868f fix(review): apply autofix feedback
Strengthen recurring-scrape smoke tests to exercise real setup-cron
lifecycle, duplicate-config validation, guild resolution failures, and
preflight failure crontab safety. Mark validation plan completed.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 14:24:09 -05:00
Boden a445b60481 fix(review): apply autofix feedback
2026-05-27 13:20:09 -05:00
Boden d66b9dab63 feat(validation): comprehensive recurring scraper validation suite and documentation
IMPLEMENTATION UNITS (U1-U6):

U1: Append-only merge test coverage
- Enhanced run-discord-scrape-smoke.sh with additional test scenarios
- Created append-partial-write.json and append-concurrent-conflict.json fixtures
- Added assertions for message sorting, deduplication, and idempotency
- All 10 merge scenarios validated

U2: Error handling validation
- Created error-path-smoke.sh with 6 error scenario tests
- Added test configs for invalid paths, missing files, bad JSON
- Verified fail-closed behavior on all error paths
- No silent data loss on any failure

U3: Cron idempotency and lifecycle
- Created cron-idempotency-smoke.sh with full lifecycle testing
- Created fixture crontab with unrelated entries (preservation test)
- Verified idempotent install, update, and remove operations
- Confirmed dry-run and entry preservation

U4: Preflight and end-to-end setup
- Created end-to-end-preflight-smoke.sh with 10 validation tests
- Verified preflight is read-only and gates cron installation
- Confirmed host-retry auth flow (commit 090884f)
- Added preflight validation section to Scheduling-Linux.md

U5: Documentation completion
- Updated Readme.md with recurring-scraper link
- Created Recurring-Scrape-Setup.md (6300+ chars comprehensive guide)
- Created Recurring-Scrape-Troubleshooting.md (9200+ chars with 30+ scenarios)
- Enhanced .docs/Scheduling-Linux.md with preflight section
- All documented behavior matches implementation

U6: Production-readiness checklist
- Created docs/recurring-scrape-production-checklist.md
- Compiled all validation results (33+ scenarios across U1-U5)
- Documented test execution commands for re-validation
- Provided deployment notes and monitoring guidance
- Clear sign-off criteria established

ARTIFACTS:
- 4 new smoke test scripts (1000+ lines total)
- 4 new fixtures and test configs
- 3 new documentation files (15500+ chars)
- 2 updated documentation files
- 1 validation checklist tracking document
- All tests passing

SAFETY GUARANTEES VERIFIED:
- No silent data loss on any error path
- Fail-closed behavior throughout
- Archive updates are append-only and idempotent
- Cron installation is idempotent
- Unrelated cron entries preserved
- Preflight is read-only
- Token validated before operations
- Path traversal prevented

STATUS: Production Ready
All 6 implementation units complete and validated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-27 12:57:32 -05:00
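The append-only merge properties listed under U1 (sorting, deduplication, idempotency) can be shown with a small sketch. This is a Python illustration under assumed data shapes, with each message as a dict carrying a string "id" field. The function name is hypothetical, and the repo's actual merge logic is not shown in this log.

```python
def merge_messages(archive, incoming):
    """Merge `incoming` into `archive` without losing or duplicating entries.

    Messages are keyed by ID, existing archive entries win on conflict,
    and the result is sorted by numeric ID.
    """
    by_id = {m["id"]: m for m in archive}
    for message in incoming:
        # setdefault keeps the archived copy when the ID already exists,
        # so re-running the same merge changes nothing.
        by_id.setdefault(message["id"], message)
    return sorted(by_id.values(), key=lambda m: int(m["id"]))
```

Applying the merge twice with the same input yields the same archive, which is the idempotency property the smoke tests assert.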
Your Name 0c92823061 fix(review): apply autofix feedback
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 22:19:57 -05:00
Your Name 090884fed6 fix(auth): add host retry flow for discord scrape
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 21:07:55 -05:00
Your Name 07151924cf fix(review): apply autofix feedback
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 20:30:37 -05:00
Your Name 43f5fa3b71 Add recurring CLI scrape automation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 17:04:07 -05:00