mirror of https://github.com/Tyrrrz/DiscordChatExporter.git, synced 2026-06-09 15:52:37 -06:00
docs(scrape): add OOM, lock, and salvage troubleshooting
Document container OOM skips, scrape-lock contention, partial temp salvage, and DCE_CONTAINER_MEMORY in the troubleshooting guide and GUI bridge quick-start.
parent 69ce1ca539
commit e9a3fea9d1
@@ -322,6 +322,7 @@ Space requirements:
- **Typical channel**: 1-10 MB per year of messages
- **Large channels**: 50-100 MB per year
- **Full guild**: 500 MB - several GB depending on activity
- **Multi-year catch-up in container**: may OOM on first export; set `DCE_CONTAINER_MEMORY=8g` in `scrape.env` and use `--salvage-before-scrape` (see [Troubleshooting](Recurring-Scrape-Troubleshooting.md#channel-export-skipped-oom--aborted--killed))

## Smoke test validation

@@ -310,6 +310,89 @@ Not this:

---

### Channel Export SKIPPED (OOM / Aborted / Killed)

**Symptoms:** The log shows `SKIPPED` for one channel, `Aborted (core dumped)`, `Killed`, or `out of memory`; other channels in the target may still succeed.

**Cause:** A large multi-year catch-up (for example KotOR `yes_general`) builds a large JSON export in memory inside the container and can exhaust the memory available to it. Partial progress is kept under `output_dir/.dce-temp/` for salvage on the next run.
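
To check quickly whether a given log contains any of the symptom strings listed above, a small wrapper around `grep` can help. This is a minimal sketch: the function name `find_skip_symptoms` is hypothetical and not part of the repository's scripts, and the match is case-sensitive on the exact strings quoted in the symptoms.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not one of the repo scripts): print "line_number:text"
# for every log line that contains one of the symptom strings named above.
# The exit status is grep's own: 0 if something matched, 1 if nothing did.
find_skip_symptoms() {
  local log_file="$1"
  grep -n -E 'SKIPPED|Aborted \(core dumped\)|Killed|out of memory' "$log_file"
}
```

For example, `find_skip_symptoms logs/kotor-yes-general.log` would list any matching lines from the log written by the catch-up command in this section.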

**Solutions:**

1. **Salvage partial temps before re-scraping** (avoids re-downloading from the archive cursor):
   ```bash
   ./scripts/scrape-lock-status.sh
   ./scripts/operator-handoff.sh --salvage-only --target KotOR_discord_msgs --channel 221726893064454144
   ```

2. **Raise container memory** in `scrape.env` (default `0` = no compose cap):
   ```bash
   # scrape.env
   DCE_CONTAINER_MEMORY=8g
   ```
   Then run a channel-scoped catch-up:
   ```bash
   DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh \
     --salvage-before-scrape \
     --target KotOR_discord_msgs \
     --channel 221726893064454144 \
     --log-file logs/kotor-yes-general.log
   ```

3. **Ensure only one scrape** holds `{archive_root}/.dce-scrape.lock` (see next section).

4. **Confirm host disk headroom**: merges need temporary space on the archive volume (`df -h ~/Documents`).
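
The disk headroom check in step 4 can also be scripted rather than read off `df -h` by eye. The sketch below is illustrative only: `has_free_mb` is a hypothetical helper, the threshold is whatever the operator chooses, and it is not claimed to match how the repository's scripts interpret `DCE_MIN_FREE_MB`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem that holds path $1
# has at least $2 MiB available to the current user.
has_free_mb() {
  local path="$1" min_mb="$2" avail_kb
  # df -Pk prints POSIX-format output in 1024-byte blocks, one line per
  # filesystem; the fourth column of the data line is "Available".
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ $(( avail_kb / 1024 )) -ge "$min_mb" ]
}
```

A call such as `has_free_mb ~/Documents 10240 || echo "less than 10 GiB free"` would flag a tight volume before starting a merge; the 10 GiB figure is an arbitrary example.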

---

### Scrape Lock Already Held

**Symptoms:** `Scrape lock is held` or `Another scrape is already running` when starting validation or a documents scrape.

**Cause:** Only one scrape should run per `archive_root`. A long validation, a cron job, or a second checkout (for example Downloads vs MyBook) can hold `{archive_root}/.dce-scrape.lock`.

**Solutions:**

1. **Inspect lock state:**
   ```bash
   ./scripts/scrape-lock-status.sh
   ```

2. **Wait** for the active scrape to finish if the PID is live.

3. **Reclaim a stale lock** after a crash (only when status shows stale/free):
   ```bash
   ./scripts/scrape-lock-status.sh --reclaim-stale
   ```

4. **Do not delete the lock** while a scrape is still running: twin exports can OOM-loop on the same channel.
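
To illustrate the live/stale distinction used in the steps above, here is a minimal sketch of a PID liveness check. It rests on a loud assumption: that the first line of the lock file is the holder's PID. The real `.dce-scrape.lock` format is not documented here and may differ, so `lock_state` is hypothetical and `scrape-lock-status.sh` remains the tool to use in practice.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: classify a lock file as "free", "live", or "stale".
# ASSUMPTION: the first line of the lock file is the holder's PID.
lock_state() {
  local lock_file="$1" pid
  if [ ! -e "$lock_file" ]; then
    echo free
    return 0
  fi
  pid=$(head -n 1 "$lock_file" | tr -d '[:space:]')
  case "$pid" in
    ''|0|*[!0-9]*) echo stale; return 0 ;;  # empty, zero, or non-numeric
  esac
  # kill -0 sends no signal; it only tests that the PID exists and that we
  # may signal it. A process owned by another user fails this test too.
  if kill -0 "$pid" 2>/dev/null; then
    echo live
  else
    echo stale
  fi
}
```

The point of the sketch is the ordering: a lock is only a reclaim candidate once the recorded process is confirmed gone, which is why deleting the file while a scrape is running is unsafe.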

---

### Partial Export Stuck in `.dce-temp`

**Symptoms:** A large folder remains under `output_dir/.dce-temp/export.<channel_id>.*` and the archive cursor is not advancing. The audit excludes `.dce-temp`, which is expected.

**Solutions:**

1. **Stop any active export** writing to that temp (check lock status and running `podman`/`docker` processes).

2. **Salvage quiescent temps** (by default, temps modified in the last ~120s are skipped):
   ```bash
   ./scripts/run-documents-scrape.sh --salvage-only --target NAME [--channel ID]
   ```

3. **Force salvage of an active temp** only after confirming nothing is writing to it:
   ```bash
   DCE_SALVAGE_ACTIVE_TEMPS=1 ./scripts/run-documents-scrape.sh --salvage-only --target NAME --channel ID
   ```

4. **Truncated JSON in the archive file itself** (not `.dce-temp`):
   ```bash
   ./scripts/salvage-truncated-export.sh path/to/archive.json
   ```
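
Before salvaging, it can be useful to see which temps would count as quiescent under the ~120s rule mentioned in step 2. The following is a sketch under stated assumptions: `list_quiescent_temps` is a hypothetical helper, it relies on GNU `stat`, and it looks only at the modification time of each top-level `export.*` entry, which may not be how the real salvage step decides.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print export.* entries under directory $1 whose mtime
# is at least $2 seconds old (default 120).
# ASSUMPTIONS: GNU stat (-c %Y); only the top-level entry's mtime is checked.
list_quiescent_temps() {
  local temp_dir="$1" min_age="${2:-120}" now path mtime
  now=$(date +%s)
  for path in "$temp_dir"/export.*; do
    [ -e "$path" ] || continue             # glob matched nothing
    mtime=$(stat -c %Y "$path") || continue
    if [ $(( now - mtime )) -ge "$min_age" ]; then
      printf '%s\n' "$path"
    fi
  done
}
```

Running `list_quiescent_temps output_dir/.dce-temp` would then print only the entries that have been idle for two minutes or more; anything missing from the output was modified recently and may still be in use.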

---

### "Failed to write archive" or Permission Denied

**Symptoms:** Export fails with write permission errors

@@ -44,6 +44,7 @@ After stopping a long run, merge quiescent partial exports before re-downloading
./scripts/operator-handoff.sh --salvage-only --target KotOR_discord_msgs --channel 221726893064454144

# Salvage then incremental catch-up (with audit + log)
# For large yes_general catch-up, set DCE_CONTAINER_MEMORY=8g in scrape.env first.
DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh \
  --salvage-before-scrape \
  --target KotOR_discord_msgs \

@@ -0,0 +1,56 @@
---
title: "docs: OOM, scrape lock, and salvage troubleshooting"
type: docs
status: complete
date: 2026-06-04
origin: /lfg - plan 063 added DCE_CONTAINER_MEMORY; operator checklist and GUI bridge cover salvage/lock but Recurring-Scrape-Troubleshooting.md still lacks these runbook sections
---

# docs: OOM, scrape lock, and salvage troubleshooting

## Summary

Extend operator-facing docs so OOM skips, scrape-lock contention, partial `.dce-temp` salvage, and `DCE_CONTAINER_MEMORY` are documented in the troubleshooting guide and GUI bridge quick-start.

## Requirements

| ID | Requirement |
|----|-------------|
| R1 | `.docs/Recurring-Scrape-Troubleshooting.md` documents OOM/skipped channels and `DCE_CONTAINER_MEMORY` |
| R2 | Same file documents scrape lock held, twin runs, and `--reclaim-stale` |
| R3 | Same file documents partial `.dce-temp`, `--salvage-only`, and `--salvage-before-scrape` |
| R4 | `docs/gui-zip-recurring-scrape-bridge.md` mentions `DCE_CONTAINER_MEMORY=8g` for yes_general catch-up |
| R5 | `.docs/Recurring-Scrape-Setup.md` links or notes container memory for large channels |
| R6 | `sync-gui-bridge-doc-smoke.sh` still passes; `run-all-smokes.sh` reports 21/21 |

## Implementation Units

### U1. Troubleshooting runbook sections

**Files:** `.docs/Recurring-Scrape-Troubleshooting.md`

Add under Export Issues (or a new Runtime section):
- Channel SKIPPED / OOM / Aborted
- Scrape lock already held
- Stale partial exports under `.dce-temp`

### U2. Setup and GUI bridge cross-links

**Files:** `.docs/Recurring-Scrape-Setup.md`, `docs/gui-zip-recurring-scrape-bridge.md`

- Setup: disk/memory note + pointer to troubleshooting
- GUI bridge: `DCE_CONTAINER_MEMORY` in yes_general salvage block

## Verification

```bash
./scripts/tests/sync-gui-bridge-doc-smoke.sh
DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh
```

## Scope Boundaries

### Deferred

- Live KotOR catch-up on host
- Per-target memory in `scrape-targets.json`

@@ -154,6 +154,8 @@ DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh \

**Plan 063 (2026-06-04):** Optional `DCE_CONTAINER_MEMORY` compose `mem_limit` for large channel catch-up (default `0` = unlimited).

**Plan 064 (2026-06-04):** OOM, scrape-lock, and partial-temp salvage runbooks in `.docs/Recurring-Scrape-Troubleshooting.md`; GUI bridge notes `DCE_CONTAINER_MEMORY` for yes_general.

**Disk:** ~65 GiB free on `/home` (2026-05-30); large channel merges still need headroom.

## CI note (fork PRs)
