DiscordChatExporter/docs/plans/2026-06-04-053-fix-archive-root-scrape-lock-plan.md
Copilot 2c01b3a7b9 fix(scrape): flock on archive_root with lock holder diagnostics
Serialize scrapes across repo checkouts that share the same Documents
archive_root, write lock meta with pid/cmd, and reclaim when the holder
process is dead.
2026-06-03 06:39:46 -05:00

---
title: "fix: Archive-root scrape lock with holder diagnostics"
type: fix
status: complete
date: 2026-06-04
origin: /lfg — stale KotOR validation runs from MyBook checkout while Downloads checkout has a separate `.dce-scrape.lock`; both can touch the same `~/Documents` archives
---
# fix: Archive-root scrape lock with holder diagnostics
## Summary
Move the host scrape flock from per-repo `.dce-scrape.lock` to `{archive_root}/.dce-scrape.lock` (from scrape config), write a sidecar `.meta` file with holder PID/command, and improve the lock-held error so operators know what is blocking a run.
## Problem Frame
Plan 046 added scrape serialization via `flock`, but the lock path defaults to `$REPO_ROOT/.dce-scrape.lock`. The operator has two checkouts (`~/Downloads/DiscordChatExporter` and `/run/media/.../MyBook/...`) that share the same `archive_root` in `config/scrape-targets.json`. Because each checkout takes its own lock file, a long-running validation from the MyBook path does not block a new scrape from the Downloads path, risking twin exports and OOM loops on `yes_general`.
## Requirements
| ID | Requirement |
|----|-------------|
| R1 | Default lock file is `{archive_root}/.dce-scrape.lock` resolved from the host config used for the scrape |
| R2 | `DCE_SCRAPE_LOCK_FILE` override continues to work unchanged |
| R3 | On acquire, write `{lock}.meta` with pid, UTC started timestamp, and command summary |
| R4 | On lock failure, error cites meta (pid, started, cmd) when present |
| R5 | If meta pid is not running, reclaim lock automatically with a warning log |
| R6 | Release removes `.meta` alongside releasing flock |
| R7 | Lock smoke covers archive-root path; `run-all-smokes.sh` passes |
## Key Technical Decisions
- **Lock at archive_root**: Matches the shared resource being mutated (Documents archives), not the git checkout path.
- **Fallback**: If the config lacks `archive_root`, fall back to `$REPO_ROOT/.dce-scrape.lock` so tests and minimal configs keep working.
- **Reclaim only when pid dead**: Do not force-break a live holder; kernel releases flock when the holder exits.
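
The precedence implied by R1, R2, and the fallback decision can be sketched as below. This is a minimal illustration, not the code in `scripts/run-discord-scrape-host.sh`: the function name `sketch_resolve_lock_file` is hypothetical, and it assumes `archive_root` has already been read from the host config and is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical sketch of the lock-path precedence; not the script's actual code.
# Assumes the caller already extracted archive_root from the host config.
sketch_resolve_lock_file() {
  archive_root="$1"
  repo_root="$2"
  if [ -n "${DCE_SCRAPE_LOCK_FILE:-}" ]; then
    # Explicit override always wins (R2).
    printf '%s\n' "$DCE_SCRAPE_LOCK_FILE"
  elif [ -n "$archive_root" ]; then
    # Default: the lock lives next to the shared archives (R1).
    printf '%s\n' "${archive_root%/}/.dce-scrape.lock"
  else
    # Fallback for tests and minimal configs without archive_root.
    printf '%s\n' "${repo_root%/}/.dce-scrape.lock"
  fi
}
```

With this ordering, two checkouts that point at the same `archive_root` resolve to the same lock file, which is the property the plan relies on.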
## Implementation Units
### U1. Resolve lock path and meta lifecycle
**Goal:** Host runner acquires archive-root lock with meta sidecar.
**Requirements:** R1-R6
**Files:**
- `scripts/run-discord-scrape-host.sh`
**Approach:** Add `resolve_scrape_lock_file`, `write_scrape_lock_meta`, `format_scrape_lock_holder`, `try_reclaim_stale_scrape_lock`; pass `host_config` into `acquire_scrape_lock`; call from scrape branch after config resolution.
**Test scenarios:**
- Config with `archive_root=/tmp/x` uses `/tmp/x/.dce-scrape.lock` when override unset.
- `DCE_SCRAPE_LOCK_FILE` still wins over archive_root.
- Dead pid in meta allows second acquire after reclaim.
**Verification:** Lock smoke passes.
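
The meta lifecycle (R3 to R6) might look roughly like the sketch below. All `sketch_*` names, the `key=value` meta layout, and the message wording are assumptions made for illustration; the plan does not specify the file format. Liveness is checked with `kill -0`, which also fails for a live process owned by another user, so a real implementation may need a stricter check.

```shell
#!/bin/sh
# Hypothetical helpers; names, meta layout, and messages are assumptions.

# Write {lock}.meta with pid, UTC start time, and a command summary (R3).
# The optional third argument overrides the pid (useful in tests).
sketch_write_lock_meta() {
  lock_file="$1"
  cmd_summary="$2"
  holder_pid="${3:-$$}"
  {
    printf 'pid=%s\n' "$holder_pid"
    printf 'started=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'cmd=%s\n' "$cmd_summary"
  } > "${lock_file}.meta"
}

# Print the value stored under key $2 in meta file $1 (empty if absent).
sketch_meta_field() {
  [ -f "$1" ] || return 0
  sed -n "s/^$2=//p" "$1" | head -n 1
}

# Build the holder hint used in the lock-held error (R4).
sketch_format_lock_holder() {
  meta="$1.meta"
  if [ ! -f "$meta" ]; then
    printf 'holder unknown (no meta file)\n'
    return 0
  fi
  printf 'held by pid=%s started=%s cmd=%s\n' \
    "$(sketch_meta_field "$meta" pid)" \
    "$(sketch_meta_field "$meta" started)" \
    "$(sketch_meta_field "$meta" cmd)"
}

# Succeed only when meta records a numeric pid that is no longer running (R5).
sketch_holder_is_dead() {
  pid="$(sketch_meta_field "$1.meta" pid)"
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;  # missing or malformed pid: do not reclaim
  esac
  # Caveat: kill -0 also fails with EPERM for another user's live process.
  ! kill -0 "$pid" 2>/dev/null
}

# Remove the sidecar when the lock is released (R6).
sketch_release_lock_meta() {
  rm -f "$1.meta"
}
```

A caller would typically try the flock first, and only on failure consult `sketch_holder_is_dead` before deciding whether to warn and reclaim or to fail with the output of `sketch_format_lock_holder`.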
### U2. Extend lock smoke
**Goal:** Regression for archive-root default and informative lock-held message.
**Requirements:** R7
**Files:**
- `scripts/tests/run-discord-scrape-host-lock-smoke.sh`
**Test scenarios:**
- Two processes: first holds flock on `{archive_root}/.dce-scrape.lock`, second fails with holder hint.
- After killing holder, second scrape succeeds.
**Verification:** `DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh`
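
The two-process scenario can be reproduced in isolation with the util-linux `flock` command, as in the sketch below. It is not the smoke script itself: the temporary directory stands in for `archive_root`, the `ready` marker file and variable names are hypothetical, and it only demonstrates the kernel-level behaviour (no meta file or holder hint).

```shell
#!/bin/sh
# Standalone illustration of flock contention; paths and names are assumptions.
lock_dir="$(mktemp -d)"          # stands in for archive_root
lock="$lock_dir/.dce-scrape.lock"

# Holder: open the lock on fd 9, take it, signal readiness, then become
# `sleep` so that killing this one pid closes fd 9 and drops the lock.
(
  exec 9>"$lock"
  flock -n 9 || exit 1
  touch "$lock_dir/ready"
  exec sleep 30
) &
holder_pid=$!

# Wait (up to about 5 seconds) until the holder really owns the lock.
i=0
while [ ! -e "$lock_dir/ready" ] && [ "$i" -lt 50 ]; do
  sleep 0.1
  i=$((i + 1))
done

# Second process: a non-blocking attempt must fail while the holder lives.
if flock -n "$lock" true; then second_before=acquired; else second_before=blocked; fi

# Kill the holder; the kernel releases the flock when its fd is closed.
kill "$holder_pid"
wait "$holder_pid" 2>/dev/null || true

if flock -n "$lock" true; then second_after=acquired; else second_after=blocked; fi

echo "while holder alive: $second_before"
echo "after holder killed: $second_after"
rm -rf "$lock_dir"
```

The holder uses `exec sleep` deliberately: if the subshell merely spawned `sleep` as a child, killing the subshell would leave the child holding the inherited descriptor and the lock would stay held.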
## Scope Boundaries
### In scope
- Lock path, meta, reclaim, smoke
### Deferred to Follow-Up Work
- Killing the stale MyBook validation process on the host
- Live yes_general channel catch-up inside LFG
- Container memory limits