# Recurring Discord Scrape Automation - Troubleshooting Guide

This guide covers common issues and their solutions.

## Setup Issues

### "Required file not found" Error

**Symptoms:** Setup fails with "Required file not found: /path/to/config.json"

**Solutions:**

1. Verify the config file exists: `ls -la config/scrape-targets.json`
2. Check file permissions: `chmod 644 config/scrape-targets.json`
3. Use an absolute path in the setup command: `./scripts/setup-cron.sh --config "$(pwd)/config/scrape-targets.json"`

---

### "Invalid JSON config" Error

**Symptoms:** Setup fails with "Invalid JSON config: ..."

**Solutions:**

1. Validate JSON syntax: `jq empty config/scrape-targets.json`
2. Common mistakes:
   - Trailing commas in arrays/objects
   - Unquoted keys
   - Missing closing braces
3. Use an online JSON validator if needed

---

### "DISCORD_TOKEN must be set" Error

**Symptoms:** Preflight or scrape fails with a token error

**Solutions:**

1. Set the token in the current session:

   ```bash
   export DISCORD_TOKEN="your-token-here"
   ./scripts/run-discord-scrape.sh preflight
   ```

2. Or set it in scrape.env and source it:

   ```bash
   source scrape.env
   ./scripts/run-discord-scrape.sh preflight
   ```

3. Or use DISCORD_TOKEN_FILE for file-based tokens:

   ```bash
   export DISCORD_TOKEN_FILE="/path/to/token/file"
   chmod 600 /path/to/token/file
   ```

---

### "Target output_dir is outside archive_root" Error

**Symptoms:** Setup fails with a path validation error

**Solution:** Update the config so that every `output_dir` is under `archive_root`:

```json
{
  "archive_root": "/home/user/discord-archives",
  "targets": [
    {
      "output_dir": "/home/user/discord-archives/target1"
    }
  ]
}
```

Not this, where `output_dir` is outside `archive_root`:

```json
{
  "archive_root": "/home/user/discord-archives",
  "targets": [
    {
      "output_dir": "/tmp/exports"
    }
  ]
}
```

---

## Authentication Issues

### "Guild discovery failed" Error

**Symptoms:** Preflight or scrape fails with a guild discovery message

**Causes:**

- Using a bot token (cannot enumerate guilds)
- Invalid token
- Token lacks required permissions

**Solutions:**

1. **For bot tokens:** Provide explicit guild and channel IDs:

   ```json
   {
     "name": "my-target",
     "guild_ids": ["123456789"],
     "channel_ids": ["111222333"]
   }
   ```

2. **For user tokens:** Ensure the token is valid:
   - Obtain a fresh token (the Discord Developer Portal regenerates bot tokens only)
   - Test token validity: `DISCORD_TOKEN=xxx ./scripts/run-discord-scrape.sh list-targets`
3. **Check permissions:**
   - A bot needs at least "Read Messages/View Channels" and "Read Message History"
   - A user token needs access to the target guilds/channels

---

### "Export ... belongs to channel XXX, expected YYY" Error

**Symptoms:** Scrape fails when updating an existing archive

**Cause:** The archive's embedded channel ID doesn't match the configured channel

**Solutions:**

1. **Update config to match archive:**
   - Check the existing archive file for the correct channel ID
   - Update channel_ids in config
2. **Or move the archive:**

   ```bash
   mv archive/old-location.json archive/target1/
   ```

3. **Or update the channel mapping manually:**

   ```bash
   jq '.["111"] = "path/to/archive.json"' archive/.dce-meta/channel-map.json > tmp.json && mv tmp.json archive/.dce-meta/channel-map.json
   ```

---

## Cron Schedule Issues

### Cron Job Not Running

**Symptoms:** Cron job installed but exports aren't happening

**Diagnostic steps:**

1. Verify the cron entry is installed:

   ```bash
   crontab -l | grep discord-scrape
   ```

2. Check if the cron daemon is running:

   ```bash
   sudo systemctl status cron   # the service is named "crond" on RHEL-family systems
   # or on macOS:
   sudo launchctl list | grep cron
   ```

3. Check system logs:

   ```bash
   # Linux
   sudo grep CRON /var/log/syslog
   # or
   sudo grep discord-scrape /var/log/cron
   # macOS
   log stream --predicate 'eventMessage contains[c] "cron"'
   ```

4. Test the script manually:

   ```bash
   source scrape.env
   bash scripts/run-discord-scrape-host.sh scrape
   ```

---

### "No such file or directory" in Cron Logs

**Symptoms:** Cron log shows script not found even though it exists

**Causes:**

- The crontab entry uses relative paths
- The directory changed since cron was installed
- Script permissions changed

**Solutions:**

1. Re-install cron with absolute paths:

   ```bash
   cd /path/to/DiscordChatExporter
   ./scripts/setup-cron.sh --config "$(pwd)/config/scrape-targets.json"
   ```

2. Ensure the scripts are executable:

   ```bash
   chmod +x scripts/run-discord-scrape-host.sh
   chmod +x scripts/run-discord-scrape.sh
   chmod +x scripts/setup-cron.sh
   ```

---

### Cron Jobs Running at Wrong Time

**Symptoms:** Export runs at unexpected times

**Solutions:**

1. Check the timezone setting:

   ```bash
   date         # System time
   timedatectl  # System timezone
   ```

2. Verify the crontab schedule:

   ```bash
   crontab -l
   ```

3. Update the schedule:

   ```bash
   ./scripts/setup-cron.sh --interval "daily" --at "2:00"
   ```

4. Validate the cron expression at [crontab.guru](https://crontab.guru)

---

## Export Issues

### Exports Complete but Produce Empty Files

**Symptoms:** Archive files are created but contain few or no messages

**Solutions:**

1. Verify channels are accessible:

   ```bash
   export DISCORD_TOKEN="your-token"
   ./scripts/run-discord-scrape.sh preflight
   ```

2. Check channel permissions:
   - Ensure the token has "Read Message History"
   - Verify the channel is not archived/deleted
3. Manual test export:

   ```bash
   ./scripts/run-discord-scrape.sh scrape --target target-name
   ```

---

### "Archive is not valid JSON" Error

**Symptoms:** Existing archive file becomes corrupted

**Solutions:**

1. **Audit all archives for a target:**

   ```bash
   ./scripts/audit-archive-json.sh --target target-name
   ```

2. **Validate one file:**

   ```bash
   jq empty archive-file.json
   ```

3. **Truncated export (parse error mid-message):** salvage drops the incomplete tail and keeps earlier messages. A timestamped `.bak.*` backup is created first:

   ```bash
   ./scripts/salvage-truncated-export.sh path/to/export.json
   ```

4. **If corrupted beyond salvage, restore from backup** (if available)
5. **If no backup, move the archive aside and re-export:**

   ```bash
   mv archive-file.json archive-file.json.bak
   ./scripts/run-discord-scrape.sh scrape --target target-name
   ```

---

### Incremental Exports Are Too Slow

**Symptoms:** Each scheduled export takes several minutes

**Solutions:**

1. **Check API rate limiting:**
   - Discord limits API calls per user
   - Overly frequent exports can trigger rate limiting
   - Increase the interval between exports: `--interval "weekly"`
2. **Reduce scope:**
   - Export only recent messages: configure `after` date in export
   - Split large channels into separate targets
3. **Check system resources:**
   - Disk I/O bottleneck: `iostat -x 1`
   - CPU usage: `top`
   - Memory: `free -h`

---

### Channel Export SKIPPED (OOM / Aborted / Killed)

**Symptoms:** Log shows `SKIPPED` for one channel, `Aborted (core dumped)`, `Killed`, or `out of memory`; other channels in the target may still succeed.

**Cause:** A large multi-year catch-up (for example KotOR `yes_general`) builds a big in-memory JSON export inside the container. Partial progress is kept under `output_dir/.dce-temp/` for salvage on the next run.

**Solutions:**

1. **Salvage partial temps before re-scraping** (avoids re-downloading from the archive cursor):

   ```bash
   ./scripts/scrape-lock-status.sh
   ./scripts/operator-handoff.sh --salvage-only --target KotOR_discord_msgs --channel 221726893064454144
   ```

2. **Raise container memory** in `scrape.env` if needed (default `0` = no compose cap; `KotOR_discord_msgs` already sets `container_memory: "8g"` for single-target runs):

   ```bash
   # scrape.env: optional global override
   DCE_CONTAINER_MEMORY=8g
   ```

   Then run the one-command catch-up:

   ```bash
   ./scripts/run-kotor-yes-general-catchup.sh
   # Inspect totals:
   ./scripts/print-scrape-summary.sh logs/kotor-yes-general.summary.json
   ```

3. **Ensure only one scrape** holds `{archive_root}/.dce-scrape.lock` (see next section).
4. **Confirm host disk headroom:** merges need temporary space on the archive volume (`df -h ~/Documents`).

---

### Scrape Lock Already Held

**Symptoms:** `Scrape lock is held` or `Another scrape is already running` when starting validation or documents scrape.

**Cause:** Only one scrape should run per `archive_root`. A long validation, cron job, or a second checkout (for example Downloads vs MyBook) can hold `{archive_root}/.dce-scrape.lock`.

**Solutions:**

1. **Inspect lock state:**

   ```bash
   ./scripts/scrape-lock-status.sh
   ```

2. **Wait** for the active scrape to finish if the PID is live.
3. **Reclaim stale lock** after a crash (only when status shows stale/free):

   ```bash
   ./scripts/scrape-lock-status.sh --reclaim-stale
   ```

4. **Do not delete the lock** while a scrape is still running; twin exports can OOM-loop on the same channel.

---

### Partial Export Stuck in `.dce-temp`

**Symptoms:** Large folder under `output_dir/.dce-temp/export..*`; archive cursor not advancing; audit excludes `.dce-temp` (expected).

**Solutions:**

1. **Stop any active export** writing that temp (check lock status and running `podman`/`docker` processes).
2. **Salvage quiescent temps** (default skips temps modified in the last ~120s):

   ```bash
   ./scripts/run-documents-scrape.sh --salvage-only --target NAME [--channel ID]
   ```

3. **Force salvage of an active temp** only after confirming nothing is writing:

   ```bash
   DCE_SALVAGE_ACTIVE_TEMPS=1 ./scripts/run-documents-scrape.sh --salvage-only --target NAME --channel ID
   ```

4. **Truncated JSON in the archive file itself** (not `.dce-temp`):

   ```bash
   ./scripts/salvage-truncated-export.sh path/to/archive.json
   ```

---

### "Failed to write archive" or Permission Denied

**Symptoms:** Export fails with write permission errors

**Solutions:**

1. **Check directory permissions:**

   ```bash
   ls -la archive/target-name/
   chmod 755 archive/target-name/
   chmod 644 archive/target-name/*.json
   ```

2. **If using Docker/Podman, set user mode:**

   ```bash
   # For rootless podman
   export DCE_USERNS_MODE=keep-id
   export DCE_UID="$(id -u)"
   export DCE_GID="$(id -g)"
   ```

3. **Check SELinux (if enabled):**

   ```bash
   getenforce
   # If "Enforcing", add `:z` to mount options:
   # docker-compose.yml should already have this
   ```

---

## Docker/Container Issues

### "Failed to build image" Error

**Symptoms:** Docker build fails during setup

**Solutions:**

1. **Verify Docker is running:**

   ```bash
   docker ps
   docker version
   ```

2. **Check disk space:**

   ```bash
   docker system df
   ```

3. **Clean up and retry:**

   ```bash
   docker system prune -a
   docker-compose build --no-cache
   ```

4. **If using Podman:**

   ```bash
   podman system prune -a
   podman-compose build --no-cache
   ```

---

### "Cannot connect to Docker daemon" Error

**Symptoms:** Setup fails to reach Docker

**Solutions:**

1. **For Docker:**

   ```bash
   sudo systemctl start docker
   sudo usermod -aG docker "$USER"
   newgrp docker
   ```

2. **For Podman (rootless):**

   ```bash
   systemctl --user start podman
   systemctl --user enable podman
   ```

---

## Authorization / Token Refresh Issues

### Host Retry Auth Flow Not Working

**Symptoms:** Export fails with 401/403 errors even with DISCORD_TOKEN_FILE set

**Solutions:**

1. **Verify the token file is readable:**

   ```bash
   # Checks readability without printing the token to the terminal
   [ -r "$DISCORD_TOKEN_FILE" ] && echo "token file is readable"
   ```

2. **Ensure proper permissions:**

   ```bash
   chmod 600 "$DISCORD_TOKEN_FILE"
   ```

3. **Check the token is fresh:**
   - Tokens can expire
   - Generate a new token (bot tokens are regenerated in the Discord Developer Portal)
   - Update the token file
4. **Verify the host wrapper is being used** by the installed cron entry:

   ```bash
   crontab -l | grep run-discord-scrape-host
   ```

---

## Getting Help

If you're still stuck:

1. **Check existing issues:** https://github.com/Tyrrrz/DiscordChatExporter/issues
2. **Run preflight in verbose mode:**

   ```bash
   # Trace each command the script runs
   bash -x ./scripts/run-discord-scrape.sh preflight
   ```

3. **Check logs:**

   ```bash
   # Docker logs
   docker-compose logs --tail 50
   # Cron logs (on Linux)
   sudo journalctl -u cron --since "1 hour ago"
   ```

4. **Collect error details** for reporting issues:
   - Config (sanitize token)
   - Full error message
   - OS/Docker version
   - Steps to reproduce
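
The details listed under "Collect error details" can be gathered with a small shell helper. The following is a minimal sketch and not part of the repository scripts: `redact_tokens` and `collect_report` are hypothetical names, and the redaction assumes that any token stored in the config sits in a JSON key whose name contains "token" and whose string value has no escaped quotes. The error message and reproduction steps still have to be added by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for building an issue report; not shipped with the scripts.

# Replace the string value of every JSON key whose name contains "token"
# (any letter case) with "REDACTED". Reads stdin, writes stdout.
# Assumption: token values contain no escaped double quotes.
redact_tokens() {
  sed -E 's/("[^"]*[Tt][Oo][Kk][Ee][Nn][^"]*"[[:space:]]*:[[:space:]]*)"[^"]*"/\1"REDACTED"/g'
}

# Print the sanitized config, the OS identification, and the container
# engine version for the config file given as the first argument.
collect_report() {
  local config="$1"

  echo "== Config (tokens redacted) =="
  if [ -r "$config" ]; then
    redact_tokens < "$config"
  else
    echo "(config not readable: $config)"
  fi

  echo "== OS =="
  uname -srm

  echo "== Container engine =="
  if command -v docker >/dev/null 2>&1; then
    docker --version
  elif command -v podman >/dev/null 2>&1; then
    podman --version
  else
    echo "(neither docker nor podman found on PATH)"
  fi
}
```

After sourcing the file that defines these functions, a report could be written with `collect_report config/scrape-targets.json > report.txt`. Review the output before posting it, since the pattern only catches keys with "token" in their name.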