DiscordChatExporter/docs/plans/2026-06-04-053-fix-archive-root-scrape-lock-plan.md
Copilot 2c01b3a7b9 fix(scrape): flock on archive_root with lock holder diagnostics
Serialize scrapes across repo checkouts that share the same Documents
archive_root, write lock meta with pid/cmd, and reclaim when the holder
process is dead.
2026-06-03 06:39:46 -05:00


title: fix: Archive-root scrape lock with holder diagnostics
type: fix
status: complete
date: 2026-06-04
origin: /lfg (stale KotOR validation runs from MyBook checkout while Downloads checkout has a separate `.dce-scrape.lock`; both can touch the same `~/Documents` archives)

fix: Archive-root scrape lock with holder diagnostics

Summary

Move the host scrape flock from the per-repo $REPO_ROOT/.dce-scrape.lock to {archive_root}/.dce-scrape.lock (archive_root taken from the scrape config), write a sidecar .meta file recording the holder's PID and command, and make the lock-held error identify the blocking process so operators know what is holding up a run.

Problem Frame

Plan 046 added scrape serialization via flock, but the lock path defaults to $REPO_ROOT/.dce-scrape.lock, so each checkout locks its own file. The operator has two checkouts (~/Downloads/DiscordChatExporter and /run/media/.../MyBook/...) that share the same archive_root in config/scrape-targets.json. As a result, a long-running validation from the MyBook checkout does not block a new scrape from the Downloads checkout, risking twin exports into the same archives and OOM loops on yes_general.

Requirements

| ID | Requirement |
|----|-------------|
| R1 | The default lock file is {archive_root}/.dce-scrape.lock, resolved from the host config used for the scrape |
| R2 | The DCE_SCRAPE_LOCK_FILE override continues to work unchanged |
| R3 | On acquire, write {lock}.meta with the pid, a UTC start timestamp, and a command summary |
| R4 | On lock failure, the error cites the meta fields (pid, started, cmd) when present |
| R5 | If the meta pid is not running, reclaim the lock automatically and log a warning |
| R6 | Release removes the .meta file along with releasing the flock |
| R7 | The lock smoke test covers the archive-root path; run-all-smokes.sh passes |
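R3 and R4 do not fix a layout for the .meta sidecar. A minimal sketch, assuming one key=value pair per line: write_scrape_lock_meta and format_scrape_lock_holder are the helper names listed in U1, but the signatures, field names and message wording below are illustrative assumptions, not the runner's actual code.

```bash
#!/usr/bin/env bash
set -euo pipefail

# write_scrape_lock_meta LOCK_FILE CMD_SUMMARY   (hypothetical signature)
# Writes {lock}.meta with pid, UTC start time and a one-line command summary.
write_scrape_lock_meta() {
  local lock_file=$1 cmd=$2
  local meta="${lock_file}.meta" tmp
  cmd=${cmd//$'\n'/ }               # keep the summary on a single line
  tmp=$(mktemp "${meta}.XXXXXX")
  {
    printf 'pid=%s\n' "$$"          # PID of the runner script holding the lock
    printf 'started=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'cmd=%s\n' "$cmd"
  } > "$tmp"
  # Rename into place so a concurrent reader never sees a half-written file.
  mv -f "$tmp" "$meta"
}

# format_scrape_lock_holder META_FILE   (hypothetical signature)
# Prints a one-line holder description for the lock-held error (R4).
format_scrape_lock_holder() {
  local meta=$1 line pid='' started='' cmd=''
  if [[ ! -r $meta ]]; then
    printf 'holder unknown (no %s)\n' "$meta"
    return 0
  fi
  while IFS= read -r line; do
    case $line in
      pid=*) pid=${line#pid=} ;;
      started=*) started=${line#started=} ;;
      cmd=*) cmd=${line#cmd=} ;;
    esac
  done < "$meta"
  printf 'held by pid %s since %s: %s\n' "${pid:-?}" "${started:-?}" "${cmd:-?}"
}
```

One field per line lets the error path read the file with plain parameter expansion instead of a JSON parser.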

Key Technical Decisions

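  • Lock at archive_root: the lock belongs next to the shared resource being mutated (the Documents archives), not next to the git checkout.
  • Fallback: if the config lacks archive_root, keep the $REPO_ROOT/.dce-scrape.lock fallback for tests and minimal configs.
  • Reclaim only when the PID is dead: never force-break a live holder. The kernel releases the flock when the holder exits (more precisely, when the last descriptor referring to the locked open file is closed, including copies inherited by child processes).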
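The path precedence implied by R1, R2 and the fallback decision can be sketched as follows. resolve_scrape_lock_file is named in U1, but this signature is an assumption: it takes archive_root as an already-extracted string (for example from jq -r '.archive_root // empty', assuming a top-level key in the host config), since the config layout is not shown here. Expanding a leading ~ is likewise an assumption about how paths may be written in the config.

```bash
#!/usr/bin/env bash
set -euo pipefail

# resolve_scrape_lock_file ARCHIVE_ROOT REPO_ROOT   (hypothetical signature)
# Precedence:
#   1. DCE_SCRAPE_LOCK_FILE, unchanged (R2)
#   2. {archive_root}/.dce-scrape.lock (R1)
#   3. $REPO_ROOT/.dce-scrape.lock when the config has no archive_root
resolve_scrape_lock_file() {
  local archive_root=$1 repo_root=$2
  if [[ -n ${DCE_SCRAPE_LOCK_FILE:-} ]]; then
    printf '%s\n' "$DCE_SCRAPE_LOCK_FILE"
    return 0
  fi
  if [[ -n $archive_root ]]; then
    # Values read from the config are not tilde-expanded by the shell;
    # expand here so "~/Documents/..." never becomes a literal "~" directory.
    archive_root=${archive_root/#\~/"$HOME"}
    printf '%s/.dce-scrape.lock\n' "${archive_root%/}"
    return 0
  fi
  printf '%s/.dce-scrape.lock\n' "$repo_root"
}
```

Both checkouts only need to resolve to the same file, not the same spelling: flock locks the underlying file, so two paths that reach the same lock file still contend.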

Implementation Units

U1. Resolve lock path and meta lifecycle

Goal: the host runner acquires the archive-root lock and maintains its meta sidecar.

Requirements: R1-R6

Files:

  • scripts/run-discord-scrape-host.sh

Approach: Add resolve_scrape_lock_file, write_scrape_lock_meta, format_scrape_lock_holder, and try_reclaim_stale_scrape_lock; pass host_config into acquire_scrape_lock; and call acquire_scrape_lock from the scrape branch after the config has been resolved, so the lock path comes from the same config the scrape uses.
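The acquire, reclaim and release flow can be sketched with flock(1) from util-linux. Everything here is illustrative: acquire_scrape_lock and release_scrape_lock take hypothetical arguments, pid_alive and meta_field are hypothetical helpers, and the /proc liveness check is Linux-only. The sketch reads R5 narrowly. Because the kernel drops a flock when its holder exits, a dead meta PID on a successful acquire just means stale metadata to overwrite with a warning; if the flock is still held while the meta PID is dead, some other process (typically a child that inherited the descriptor) still owns it, and the sketch refuses rather than breaking a live holder.

```bash
#!/usr/bin/env bash
set -euo pipefail

SCRAPE_LOCK_FD=''     # descriptor holding the flock (set on acquire)
SCRAPE_LOCK_FILE=''   # path of the held lock

# /proc check (Linux); kill -0 would report other users' processes as dead.
pid_alive() { [[ -n $1 && -d /proc/$1 ]]; }

# meta_field META KEY: prints the value of KEY=... or nothing.
meta_field() {
  local line
  [[ -r $1 ]] || return 0
  while IFS= read -r line; do
    if [[ $line == "$2="* ]]; then printf '%s\n' "${line#"$2="}"; return 0; fi
  done < "$1"
}

# acquire_scrape_lock LOCK_FILE CMD_SUMMARY   (hypothetical signature)
acquire_scrape_lock() {
  local lock=$1 cmd=$2 meta="$1.meta" pid fd
  mkdir -p "$(dirname "$lock")"
  # Append mode: never truncate or replace the lock file. flock locks the
  # inode, so recreating the file would let two holders lock different files.
  exec {fd}>>"$lock"
  if ! flock -n "$fd"; then
    pid=$(meta_field "$meta" pid)
    if [[ -n $pid ]] && ! pid_alive "$pid"; then
      echo "scrape lock $lock is held, but meta pid $pid is dead;" \
           "another process (e.g. an inherited descriptor) still holds it" >&2
    else
      echo "scrape lock $lock is busy: pid=${pid:-?}" \
           "started=$(meta_field "$meta" started) cmd=$(meta_field "$meta" cmd)" >&2
    fi
    exec {fd}>&-
    return 1
  fi
  # Acquired. A meta left by a dead pid means the previous holder died
  # without cleanup (e.g. SIGKILL): reclaim it, but say so.
  pid=$(meta_field "$meta" pid)
  if [[ -n $pid ]] && ! pid_alive "$pid"; then
    echo "warning: reclaiming stale scrape lock from dead pid $pid" >&2
  fi
  printf 'pid=%s\nstarted=%s\ncmd=%s\n' \
    "$$" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$cmd" > "$meta"
  SCRAPE_LOCK_FD=$fd
  SCRAPE_LOCK_FILE=$lock
}

# release_scrape_lock: remove the meta while still holding the lock, so a
# new holder's meta is never deleted, then unlock and close (R6).
release_scrape_lock() {
  [[ -n $SCRAPE_LOCK_FD ]] || return 0
  rm -f "$SCRAPE_LOCK_FILE.meta"
  flock -u "$SCRAPE_LOCK_FD"
  exec {SCRAPE_LOCK_FD}>&-
  SCRAPE_LOCK_FD=''
  SCRAPE_LOCK_FILE=''
}
```

A caller would typically follow a successful acquire with trap release_scrape_lock EXIT. A holder killed with SIGKILL never runs that trap, which leaves exactly the stale meta that the reclaim branch handles.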

Test scenarios:

  • A config with archive_root=/tmp/x uses /tmp/x/.dce-scrape.lock when the override is unset.
  • DCE_SCRAPE_LOCK_FILE still wins over archive_root.
  • A dead PID in the meta file allows a second acquire after reclaim.

Verification: the lock smoke test passes.

U2. Extend lock smoke

Goal: regression coverage for the archive-root default and the informative lock-held message.

Requirements: R7

Files:

  • scripts/tests/run-discord-scrape-host-lock-smoke.sh

Test scenarios:

  • Two processes: the first holds the flock on {archive_root}/.dce-scrape.lock, and the second fails with a holder hint.
  • After the holder is killed, the second scrape succeeds.
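A standalone sketch of the two scenarios above. The real smoke presumably drives scripts/run-discord-scrape-host.sh; here a hypothetical try_scrape stand-in takes the flock directly, and the holder is a background bash process writing an assumed pid=/cmd= meta. It relies on flock(1) and is Linux-oriented.

```bash
#!/usr/bin/env bash
set -euo pipefail

archive_root=$(mktemp -d)     # stands in for the config's archive_root
lock="$archive_root/.dce-scrape.lock"
trap 'rm -rf "$archive_root"' EXIT

# Hypothetical stand-in for the runner's acquire step: exits 0 when it gets
# the lock, otherwise prints the holder hint from the meta file and exits 1.
try_scrape() {
  bash -c '
    exec 9>>"$1"
    if flock -n 9; then echo "acquired $1"; exit 0; fi
    echo "lock held: $(tr "\n" " " 2>/dev/null < "$1.meta")" >&2
    exit 1
  ' _ "$lock"
}

# Holder: takes the lock, records itself, then idles. sleep runs with fd 9
# closed, so only the holder keeps the lock; the TERM trap also stops sleep.
bash -c '
  exec 9>>"$1"
  flock -n 9 || exit 1
  trap "kill \$sleeper 2>/dev/null; exit 143" TERM
  sleep 30 9>&- </dev/null >/dev/null 2>&1 &
  sleeper=$!
  printf "pid=%s\ncmd=%s\n" "$$" "validation (holder)" > "$1.meta"
  wait "$sleeper"
' _ "$lock" &
holder=$!
for _ in $(seq 50); do [[ -s $lock.meta ]] && break; sleep 0.1; done

# Scenario 1: the second scrape fails and names the holder.
if try_scrape 2>"$archive_root/err"; then echo "FAIL: lock not enforced" >&2; exit 1; fi
grep -q "pid=$holder " "$archive_root/err"

# Scenario 2: once the holder exits, the kernel drops its flock.
kill "$holder"
wait "$holder" || true
try_scrape > "$archive_root/out"
echo "lock smoke sketch: ok"
```

Closing fd 9 in the idle child matters: if the child inherited the descriptor, killing the holder alone would leave the lock held.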

Verification: DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh

Scope Boundaries

In scope

  • Lock path resolution, meta sidecar, stale-lock reclaim, lock smoke test

Deferred to Follow-Up Work

  • Killing the stale MyBook validation process on the host
  • Live yes_general channel catch-up inside LFG
  • Container memory limits