DiscordChatExporter/docs/plans/2026-06-04-046-fix-scrape-run-lock-plan.md
Copilot b9bb4bbe64 fix(host): flock scrape lock prevents concurrent container exports
Overlapping run-operator-validation invocations spawned twin yes_general
exports and repeated OOM skips. Host scrape now holds .dce-scrape.lock;
smokes bypass via DCE_SKIP_SCRAPE_LOCK. Added lock smoke (20/20 pass).
2026-06-03 06:03:47 -05:00

---
title: "fix: Scrape run lock prevents concurrent container exports"
type: fix
status: complete
date: 2026-06-04
origin: /lfg — duplicate KotOR validation runs left two yes_general exports OOM-looping
---
# fix: Scrape run lock prevents concurrent container exports
## Problem
Two overlapping `run-operator-validation.sh --target KotOR_discord_msgs` processes each started a full container scrape. Both exported `yes_general` (`221726893064454144`) with the same `--after` cursor, creating twin `.dce-temp/export.*` dirs (~2934 MiB each) and repeated OOM skips.
Cron uses `flock`, but manual/host validation does not, so overlapping manual runs are unguarded.
## Requirements
| ID | Requirement |
|----|-------------|
| R1 | `run-discord-scrape-host.sh scrape` acquires non-blocking `flock` on `$REPO_ROOT/.dce-scrape.lock` |
| R2 | `DCE_SKIP_SCRAPE_LOCK=1` bypasses lock (smoke tests) |
| R3 | Clear error when lock held; preflight unaffected |
| R4 | Offline smoke asserts second scrape fails while lock held |
| R5 | `run-all-smokes.sh` passes (19/19); docs note concurrent-run hazard |
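
R1 through R3 could be met with a guard along the following lines. This is a minimal sketch, not the code in `run-discord-scrape-host.sh`: the function name `acquire_scrape_lock`, the use of file descriptor 9, and the exit status 75 are assumptions chosen for illustration. Only the lock path and `DCE_SKIP_SCRAPE_LOCK` come from the requirements. It assumes the util-linux `flock` command is installed.

```bash
#!/usr/bin/env bash
# Sketch only: acquire_scrape_lock, fd 9, and exit status 75 are hypothetical.

acquire_scrape_lock() {
  # Resolve the lock path at call time so REPO_ROOT can be set by the caller.
  local lock_file="${REPO_ROOT:-.}/.dce-scrape.lock"

  # R2: smoke tests bypass the lock entirely.
  if [ "${DCE_SKIP_SCRAPE_LOCK:-0}" = "1" ]; then
    return 0
  fi

  # Open the lock file on fd 9. The lock is tied to this descriptor,
  # so it is released when the shell (and any children holding fd 9) exit.
  exec 9>"$lock_file"

  # R1: -n makes flock fail immediately instead of waiting for the holder.
  if ! flock -n 9; then
    # R3: clear error naming the lock file.
    echo "error: another scrape holds $lock_file; refusing to start a second export" >&2
    return 75
  fi
}
```

Under this sketch, only the `scrape` subcommand would call `acquire_scrape_lock`, which leaves preflight unaffected. Because the lock belongs to an open file descriptor rather than to the file's existence, a crashed run does not leave a stale lock behind.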
## Verification
```bash
./scripts/tests/run-discord-scrape-host-lock-smoke.sh
DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh
```
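
The core assertion behind R4 can be sketched as a small helper that holds the lock in the background and checks that a command fails while it is held. The helper name `assert_fails_while_locked` and the sleep durations are assumptions for illustration; the actual contents of `run-discord-scrape-host-lock-smoke.sh` are not shown in this plan.

```bash
#!/usr/bin/env bash
# Sketch only: assert_fails_while_locked is a hypothetical helper.

assert_fails_while_locked() {
  local lock_file="$1"
  shift

  # Holder: take the lock and keep it for two seconds.
  flock -n "$lock_file" sleep 2 &
  local holder=$!

  # Give the holder time to acquire the lock before testing.
  sleep 1

  # Run the command under test and capture its exit status.
  local rc=0
  "$@" >/dev/null 2>&1 || rc=$?

  # Let the holder finish so the lock is released before returning.
  wait "$holder"

  # Succeed only if the command under test failed while the lock was held.
  [ "$rc" -ne 0 ]
}
```

A smoke test might then call it as `assert_fails_while_locked "$REPO_ROOT/.dce-scrape.lock" ./scripts/run-discord-scrape-host.sh scrape`, assuming the scrape command exits non-zero when it cannot take the lock.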
## Out of scope
- Completing the multi-hour `yes_general` catch-up inside LFG
- Container memory limits / tuning