DiscordChatExporter/docs/plans/2026-05-24-002-fix-live-path-update-verification-plan.md
Boden 7cab7280c4 feat: add gh PR run approval helper for fork CI unblock
Adds scripts/gh-approve-pr-runs.sh with GITHUB_TOKEN bootstrap, explicit
admin-rights policy classification, smoke coverage, and CI wiring. Marks
the remaining 2026-05-24 recurring scrape plans completed.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 00:30:49 -05:00

14 KiB

title: fix: Verify live archive path updates type: fix status: completed date: 2026-05-24

fix: Verify live archive path updates

Summary

Verify that the recurring scrape wrapper updates the user's existing ~/Documents archives in place, then tighten destination-path fallback behavior so new unmapped channels default to stable, human-readable archive names instead of channels/<channel-id>.json.

This plan keeps the append-only merge contract and custom output_dir roots intact while closing the gap between current archive layout expectations and the wrapper's fallback naming behavior.


Problem Frame

The recurring wrapper already preserves existing JSON exports when it can map a channel ID back to the right file, but the current fallback path for an unmapped channel is output_dir/channels/<channel-id>.json. The user's real archive set under ~/Documents uses custom per-target roots with human-readable filenames, so this path fallback must be verified against live data and corrected so first-time channel exports do not drift into a different directory structure.


Assumptions

This plan was authored without synchronous user confirmation. The items below are agent inferences that fill gaps in the input — un-validated bets that should be reviewed before implementation proceeds.

  • The existing archive files already present under ~/Documents/<target> are the source of truth for where future updates should land for those channels.
  • When a channel has no prior archive and no stored mapping, the preferred default is an intuitive human-readable filename that still includes the channel ID for stable remapping.
  • Live verification should use the current checked-in custom target config rather than introducing a second temporary config shape.

Requirements

  • R1. Running the recurring scraper against the current config/scrape-targets.json must continue to target the user's custom ~/Documents/<target> roots and must not redirect updates into unrelated directories.
  • R2. Existing archives already present under those custom roots must be updated in place through the append-only merge path rather than overwritten from scratch or duplicated into a new fallback path.
  • R3. When a channel has no prior archive or stored mapping, the wrapper's default destination naming must be stable and human-readable, aligned with the CLI's guild/category/channel naming conventions instead of channels/<channel-id>.json.
  • R4. Destination-path resolution must remain fail-closed: ambiguous matches, invalid JSON, wrong-channel archives, or paths outside the configured target root must abort without mutating the existing archive.
  • R5. The behavior must be covered by fixture-based smoke tests and documented so operators understand both custom output roots and default naming for newly discovered channels.

Scope Boundaries

  • No change to the top-level custom target roots already configured in config/scrape-targets.json.
  • No change to the core C# exporter append semantics; archive preservation remains wrapper-layer behavior.
  • No attempt to make inaccessible Discord targets succeed with the current token; auth/access blockers remain external runtime constraints.

Context & Research

Relevant Code and Patterns

  • scripts/run-discord-scrape.sh already enforces append-only updates via --after, merge_exports, channel identity checks, and target-local temp files.
  • scripts/run-discord-scrape.sh persists channel-to-path mappings in output_dir/.dce-meta/channel-map.json and currently falls back to output_dir/channels/<channel-id>.json for unmapped channels.
  • scripts/tests/run-discord-scrape-smoke.sh is the established shell-level safety test for append, dedupe, and wrong-channel no-clobber behavior.
  • config/scrape-targets.json is the checked-in custom-root contract for the user's ~/Documents archive tree.
  • DiscordChatExporter.Core/Exporting/ExportRequest.cs contains the upstream CLI's human-readable default output naming logic and is the best reference for new wrapper fallback naming.
  • .docs/Docker.md and .docs/Scheduling-Linux.md are the operator-facing docs for the recurring wrapper.

Institutional Learnings

  • No docs/solutions/ directory exists in this repo.
  • Existing plan and wrapper behavior intentionally keep archive safety in the shell layer and treat fail-closed preflight and path validation as load-bearing safety guarantees.

External References

  • None used. Repo-local patterns are strong enough for this fix.

Key Technical Decisions

  • Treat the current custom output_dir values as authoritative: updates must remain under the configured ~/Documents/<target> roots; fixes should improve filename resolution inside those roots rather than invent a new directory layout.
  • Reuse human-readable archive names for first-write defaults: new unmapped channels should adopt a stable guild/category/channel-based filename that still embeds the channel ID, matching the repo's existing archive style and the upstream CLI's naming conventions.
  • Preserve channel-to-path mapping as the long-term source of stability: once a channel is resolved to a destination file, future runs should continue updating that same file regardless of later naming changes elsewhere.
  • Prove path behavior with both fixture coverage and a real runtime pass: shell smoke tests should lock the path-resolution contract, and implementation should also run the wrapper against the real ~/Documents config to confirm in-place updates or to surface external blockers without writing to alternate paths.

Open Questions

Resolved During Planning

  • Should the custom roots be changed? No. The existing per-target ~/Documents/<target> directories remain the contract.
  • What should replace the channels/<channel-id>.json fallback? A human-readable default filename derived from guild/category/channel naming, with the channel ID preserved for stable remapping.
  • What is the success condition for live verification? The run must either update the existing archive file in place or fail before creating a parallel destination path outside the expected custom root layout.

Deferred to Implementation

  • How much live verification can succeed with the current token? The implementer must determine this at runtime; if access is blocked, the verification outcome should still prove that no alternate path was created.
  • Which exact naming helper shape is least duplicative? Implementation should decide whether to shell out to CLI naming-friendly metadata already present in exports or mirror the upstream naming rules directly in the wrapper after inspecting the simplest safe reuse path.

Implementation Units

U1. Fix destination-path fallback and archive seeding

Goal: Ensure the wrapper resolves channel destinations to the existing custom archive files when present and uses intuitive human-readable defaults for first-time channels.

Requirements: R1, R2, R3, R4

Dependencies: None

Files:

  • Modify: scripts/run-discord-scrape.sh
  • Modify: config/scrape-targets.json
  • Test: scripts/tests/run-discord-scrape-smoke.sh

Approach:

  • Review and tighten resolve_destination_path() so it first prefers persisted channel mappings, then existing archive files under the configured output_dir, and only then falls back to a new default path.
  • Replace the current output_dir/channels/<channel-id>.json fallback with a stable human-readable filename that matches the archive naming style already present under ~/Documents and the upstream CLI's default naming semantics.
  • Preserve the rule that every resolved destination stays inside the configured target root and that ambiguous matches hard-fail instead of guessing.

Patterns to follow:

  • scripts/run-discord-scrape.sh
  • DiscordChatExporter.Core/Exporting/ExportRequest.cs
  • config/scrape-targets.json

Test scenarios:

  • Happy path: an existing archive with a human-readable filename containing [channel-id] is discovered, mapped, and reused for the update.
  • Happy path: an unmapped first-time channel resolves to a human-readable filename under the target root and records that mapping for later runs.
  • Edge case: a target root containing multiple matching files for the same channel ID fails closed and does not guess.
  • Error path: a mapped path outside the configured target root is rejected before export.
  • Integration: a rerun after the new mapping is written updates the exact same file path rather than creating a second archive path.

Verification:

  • Unmapped channels no longer default to channels/<channel-id>.json.
  • Existing archives under the configured custom roots continue to resolve back to their current file paths.

U2. Expand append-only and path-safety smoke coverage

Goal: Add fixture coverage that proves path resolution and append-only updates do not create parallel archives or overwrite unrelated files.

Requirements: R2, R3, R4, R5

Dependencies: U1

Files:

  • Modify: scripts/tests/run-discord-scrape-smoke.sh
  • Create: scripts/tests/test-fixtures/path-existing.json
  • Create: scripts/tests/test-fixtures/path-incremental.json

Approach:

  • Extend the existing smoke script with a case where a preexisting human-readable archive under a target root has no stored map yet and must still be updated in place.
  • Add coverage for the first-run fallback path so the test can assert both the filename pattern and that the file lands directly under the configured target root.
  • Keep the current wrong-channel and invalid-json hard-fail expectations as the guardrail against archive corruption.

Execution note: Start by adding/adjusting fixture coverage before changing the fallback logic so the path regression is pinned by tests.

Patterns to follow:

  • scripts/tests/run-discord-scrape-smoke.sh
  • scripts/tests/test-fixtures/append-existing.json
  • scripts/tests/test-fixtures/append-incremental.json
  • scripts/tests/test-fixtures/wrong-channel.json

Test scenarios:

  • Happy path: an existing human-readable archive with no prior channel-map.json entry is updated in place and retains prior messages.
  • Happy path: a first-time export creates one human-readable file directly under the target root and writes a matching channel-map entry.
  • Edge case: an incremental export with zero new messages leaves the existing human-readable archive untouched and does not create a second file.
  • Error path: wrong-channel incremental data fails without replacing the existing archive or writing a new fallback file.
  • Integration: two consecutive runs against the same channel keep the same destination path and merge by message ID.

Verification:

  • Smoke coverage fails on the old channels/<channel-id>.json fallback and passes with the new default naming behavior.
  • Existing append-only protections still pass after the path-resolution changes.

U3. Run real-config verification and document the contract

Goal: Validate the fixed wrapper against the user's checked-in ~/Documents targets and document how custom roots and default naming interact.

Requirements: R1, R2, R5

Dependencies: U1, U2

Files:

  • Modify: .docs/Docker.md
  • Modify: .docs/Scheduling-Linux.md
  • Modify: STRATEGY.md

Approach:

  • Run the wrapper with the real checked-in target config and current token/runtime setup to confirm that accessible channels update their existing archive paths in place and that blocked targets fail without creating alternate destination paths.
  • Capture the operator contract in docs: custom output_dir roots remain authoritative, existing archives are reused, and first-time channels adopt human-readable defaults inside the target root.
  • Refresh strategy/docs language only where needed so the runtime promise matches the implemented destination behavior.

Patterns to follow:

  • .docs/Docker.md
  • .docs/Scheduling-Linux.md
  • STRATEGY.md

Test scenarios:

  • Test expectation: none -- this unit is runtime verification and documentation work backed by the fixture coverage in U2.

Verification:

  • A real wrapper run against the configured ~/Documents targets either updates an existing archive path in place or fails before creating a conflicting path.
  • Operator docs explain both custom-root behavior and the default naming used for new channels within those roots.

System-Wide Impact

  • Interaction graph: destination-path changes affect archive seeding, channel-map persistence, append-only merge flow, Docker runtime expectations, and cron-driven recurring runs.
  • Error propagation: path ambiguity or invalid archive state must still abort the affected channel/target before any destination replacement.
  • State lifecycle risks: incorrect fallback naming could silently split one logical channel across two files; the plan explicitly tests and prevents that.
  • API surface parity: the user-facing contract spans the shell wrapper, checked-in target config, and operator docs, so all three need to stay aligned.
  • Integration coverage: fixture tests cover deterministic path and merge semantics; runtime verification covers interaction with the real ~/Documents archive tree and current token access.

Risks & Dependencies

Risk Mitigation
Live token access is still blocked for some targets Treat runtime verification as success only when it proves no alternate path was created; document blocked targets explicitly rather than guessing
New filename generation diverges from existing archive style Follow the upstream CLI naming reference and add smoke coverage for first-write defaults
Path-fix logic accidentally stops recognizing existing archives Extend seeded-archive tests before changing fallback behavior and keep channel-map persistence authoritative

Documentation / Operational Notes

  • Keep the docs explicit that custom output_dir targets remain authoritative and that new default naming only applies within those roots when no prior archive path exists.
  • Runtime verification should be performed through the source-built wrapper path, not the downloaded binary bundle, so the docs and behavior stay aligned.

Sources & References

  • Strategy: STRATEGY.md
  • Prior plan: docs/plans/2026-05-24-001-feat-recurring-cli-scrape-automation-plan.md
  • Related code: scripts/run-discord-scrape.sh
  • Related tests: scripts/tests/run-discord-scrape-smoke.sh
  • Related config: config/scrape-targets.json
  • Naming reference: DiscordChatExporter.Core/Exporting/ExportRequest.cs
  • Operator docs: .docs/Docker.md, .docs/Scheduling-Linux.md