mirror of https://github.com/Tyrrrz/DiscordChatExporter.git synced 2026-06-10 00:02:37 -06:00

Copilot 27e133f692 docs(scrape): sync KotOR wrapper across GUI bridge docs (plan 084)

GUI bridge and troubleshooting lead with run-kotor-yes-general-catchup.sh;
merge-readiness HEAD updated; bridge sync smoke asserts wrapper and 24/24 gate.

2026-06-03 12:02:03 -05:00


Recurring Discord Scrape Automation - Troubleshooting Guide

This guide covers common issues and their solutions.

Setup Issues

"Required file not found" Error

Symptoms: Setup fails with "Required file not found: /path/to/config.json"

Solutions:

Verify config file exists: ls -la config/scrape-targets.json
Check file permissions: chmod 644 config/scrape-targets.json
Use an absolute path in the setup command (quoted, so paths containing spaces survive): ./scripts/setup-cron.sh --config "$(pwd)/config/scrape-targets.json"

"Invalid JSON config" Error

Symptoms: Setup fails with "Invalid JSON config: ..."

Solutions:

Validate JSON syntax: jq empty config/scrape-targets.json
Common mistakes:
- Trailing commas in arrays/objects
- Unquoted keys
- Missing closing braces
Use an online JSON validator if needed

"DISCORD_TOKEN must be set" Error

Symptoms: Preflight or scrape fails with token error

Solutions:

Set token in current session:

export DISCORD_TOKEN="your-token-here"
./scripts/run-discord-scrape.sh preflight

Or set in scrape.env and source it:

source scrape.env
./scripts/run-discord-scrape.sh preflight

Or use DISCORD_TOKEN_FILE for file-based tokens:

export DISCORD_TOKEN_FILE="/path/to/token/file"
chmod 600 /path/to/token/file
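
The two token sources above can be pictured with the following sketch. It is not taken from the project's scripts: the function name `resolve_discord_token` is hypothetical, and the precedence (`DISCORD_TOKEN` first, then `DISCORD_TOKEN_FILE`) is an assumption made for illustration.

```shell
# Hypothetical helper; the real scripts may resolve the token differently.
# Assumed precedence: DISCORD_TOKEN wins, DISCORD_TOKEN_FILE is the fallback.
resolve_discord_token() {
  if [ -n "${DISCORD_TOKEN:-}" ]; then
    printf '%s\n' "$DISCORD_TOKEN"
  elif [ -n "${DISCORD_TOKEN_FILE:-}" ] && [ -r "$DISCORD_TOKEN_FILE" ]; then
    # Drop CR/LF characters that editors often append to the file
    tr -d '\r\n' < "$DISCORD_TOKEN_FILE"
    printf '\n'
  else
    echo "DISCORD_TOKEN must be set (or DISCORD_TOKEN_FILE must point to a readable file)" >&2
    return 1
  fi
}
```

Calling the function before a preflight run shows quickly whether either source is visible to the current shell, without starting a container.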

"Target output_dir is outside archive_root" Error

Symptoms: Setup fails with path validation error

Solution: Update config to ensure output_dir is under archive_root:

{
  "archive_root": "/home/user/discord-archives",
  "targets": [
    {
      "output_dir": "/home/user/discord-archives/target1"
    }
  ]
}

Not this:

{
  "archive_root": "/home/user/discord-archives",
  "targets": [
    {
      "output_dir": "/tmp/exports"
    }
  ]
}
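
A containment check of this kind can be sketched in a few lines of shell. This is an illustration only, not the validation the setup script actually performs: `is_under_root` is a hypothetical name, and the sketch assumes both directories already exist so they can be resolved with `cd` and `pwd -P`.

```shell
# Hypothetical check: succeeds when CHILD is equal to or below ROOT.
# Both paths are resolved first, so ".." segments and symlinks cannot
# sneak a directory outside the root.
is_under_root() {
  local root child
  root=$(cd -- "$1" >/dev/null 2>&1 && pwd -P) || return 2
  child=$(cd -- "$2" >/dev/null 2>&1 && pwd -P) || return 2
  root=${root%/}   # makes "/" behave like any other root
  case "$child/" in
    "$root/"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_under_root /home/user/discord-archives /tmp/exports` fails, which mirrors the rejected config above. The trailing slash in the comparison keeps a sibling such as `/home/user/discord-archives-old` from matching by prefix.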

Authentication Issues

"Guild discovery failed" Error

Symptoms: Preflight or scrape fails with guild discovery message

Causes:

Using a bot token (cannot enumerate guilds)
Invalid token
Token lacks required permissions

Solutions:

For bot tokens: Provide explicit guild and channel IDs:

{
  "name": "my-target",
  "guild_ids": ["123456789"],
  "channel_ids": ["111222333"]
}

For user tokens: Ensure the token is valid:
- Obtain a fresh token; note that the Discord Developer Portal only issues bot tokens, not user tokens
- Test token validity: DISCORD_TOKEN=xxx ./scripts/run-discord-scrape.sh list-targets
Check permissions:
- Bot needs at least "Read Messages/View Channels" and "Read Message History"
- User token needs access to the target guilds/channels

"Export ... belongs to channel XXX, expected YYY" Error

Symptoms: Scrape fails when updating an existing archive

Cause: Archive's embedded channel ID doesn't match the configured channel

Solutions:

Update config to match archive:
- Check the existing archive file for the correct channel ID
- Update channel_ids in config

Or move the archive:

mv archive/old-location.json archive/target1/

Or update the channel mapping manually:

jq '.["111"] = "path/to/archive.json"' archive/.dce-meta/channel-map.json > tmp.json && mv tmp.json archive/.dce-meta/channel-map.json

Cron Schedule Issues

Cron Job Not Running

Symptoms: Cron job installed but exports aren't happening

Diagnostic steps:

Verify the cron job is installed:
```
crontab -l | grep discord-scrape
```

Check if cron daemon is running:

sudo systemctl status cron  # the unit is named "crond" on RHEL/Fedora/CentOS
# or on macOS:
sudo launchctl list | grep cron

Check system logs:

# Linux
sudo grep CRON /var/log/syslog
# or
sudo grep discord-scrape /var/log/cron

# macOS
log stream --predicate 'eventMessage contains[c] "cron"'

Test the script manually:

source scrape.env
bash scripts/run-discord-scrape-host.sh scrape

"No such file or directory" in Cron Logs

Symptoms: Cron log shows script not found even though it exists

Causes:

Path in crontab uses relative paths
Directory changed since cron was installed
Script permissions changed

Solutions:

Re-install cron with absolute paths:

cd /path/to/DiscordChatExporter
./scripts/setup-cron.sh --config "$(pwd)/config/scrape-targets.json"

Ensure script is executable:

chmod +x scripts/run-discord-scrape-host.sh
chmod +x scripts/run-discord-scrape.sh
chmod +x scripts/setup-cron.sh
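
To spot relative paths in an installed crontab, a rough filter such as the one below may help. It is a heuristic sketch, not part of the project: `find_relative_cron_paths` is a hypothetical name, and it simply prints entries whose command part contains a path-like word that does not start with `/`, `$`, or `~`.

```shell
# Hypothetical heuristic: read crontab text on stdin and print entries whose
# command contains a relative path (for example ./scripts/foo.sh).
find_relative_cron_paths() {
  awk '
    /^[ \t]*#/ { next }                                  # comments
    NF == 0 { next }                                     # blank lines
    /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }      # VAR=value lines
    {
      # The command starts after five schedule fields, or after one @keyword
      start = ($1 ~ /^@/) ? 2 : 6
      for (i = start; i <= NF; i++) {
        w = $i
        sub(/^[0-9]*[<>]+/, "", w)                       # strip redirect operators
        c = substr(w, 1, 1)
        if (index(w, "/") > 0 && c != "/" && c != "$" && c != "~") { print; next }
      }
    }
  '
}
```

Typical use would be `crontab -l | find_relative_cron_paths`. Any line it prints is worth rewriting with an absolute path, since cron does not start in the repository directory.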

Cron Jobs Running at Wrong Time

Symptoms: Export runs at unexpected times

Solutions:

Check timezone setting:

date  # System time
timedatectl  # System timezone

Verify crontab schedule:
```
crontab -l
```

Update schedule:

./scripts/setup-cron.sh --interval "daily" --at "2:00"

Validate cron expression at crontab.guru
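
When comparing the installed schedule with what was requested, it can help to know which cron expression a daily time corresponds to. The sketch below shows the conventional mapping; `daily_cron_expr` is a hypothetical helper, and whether `setup-cron.sh` produces exactly this expression is an assumption.

```shell
# Hypothetical helper: turn "H:MM" or "HH:MM" into a daily cron expression.
# Cron field order is: minute hour day-of-month month day-of-week.
daily_cron_expr() {
  local hour minute
  case "$1" in
    [0-9]:[0-9][0-9]|[0-9][0-9]:[0-9][0-9]) ;;
    *) echo "expected H:MM or HH:MM, got '$1'" >&2; return 1 ;;
  esac
  hour=${1%%:*}
  minute=${1##*:}
  # Strip one leading zero so "08" is not treated as an octal number
  hour=${hour#0};     hour=${hour:-0}
  minute=${minute#0}; minute=${minute:-0}
  if [ "$hour" -gt 23 ] || [ "$minute" -gt 59 ]; then
    echo "time out of range: $1" >&2
    return 1
  fi
  printf '%s %s * * *\n' "$minute" "$hour"
}
```

Running `daily_cron_expr 2:00` prints `0 2 * * *`. Cron interprets that time in the system's local timezone, which is why the timezone check comes first.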

Export Issues

Exports Complete but Produce Empty Files

Symptoms: Archive files created but contain minimal/no messages

Solutions:

Verify channels are accessible:

export DISCORD_TOKEN="your-token"
./scripts/run-discord-scrape.sh preflight

Check channel permissions:
- Ensure token has "Read Message History"
- Verify channel is not archived/deleted

Manual test export:

./scripts/run-discord-scrape.sh scrape --target target-name

"Archive is not valid JSON" Error

Symptoms: Existing archive file becomes corrupted

Solutions:

Audit all archives for a target:

./scripts/audit-archive-json.sh --target target-name

Validate one file:
```
jq empty archive-file.json
```
Truncated export (parse error mid-message): salvage drops the incomplete tail and keeps earlier messages. A timestamped .bak.* backup is created first:
```
./scripts/salvage-truncated-export.sh path/to/export.json
```
If corrupted beyond salvage, restore from backup (if available)

If no backup, move the archive aside and re-export:

mv archive-file.json archive-file.json.bak
./scripts/run-discord-scrape.sh scrape --target target-name
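
A plain `mv ... .bak` overwrites any earlier backup with the same name. A timestamped copy avoids that. The following is a minimal sketch with a hypothetical function name and an assumed suffix format; it is not the naming scheme used by `salvage-truncated-export.sh`.

```shell
# Hypothetical helper: copy FILE to FILE.bak.<timestamp> and print the new path.
backup_with_timestamp() {
  local src=$1 stamp dest
  [ -f "$src" ] || { echo "not a file: $src" >&2; return 1; }
  stamp=$(date +%Y%m%d-%H%M%S)
  dest="$src.bak.$stamp"
  # Refuse to clobber a backup made within the same second
  [ ! -e "$dest" ] || { echo "backup already exists: $dest" >&2; return 1; }
  # -p preserves the mode and modification time of the original
  cp -p "$src" "$dest" && printf '%s\n' "$dest"
}
```

Because the original is copied rather than moved, the archive stays in place until the re-export or salvage step decides what to do with it.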

Incremental Exports Are Too Slow

Symptoms: Each scheduled export takes several minutes

Solutions:

Check API rate limiting:
- Discord limits API calls per user
- Too many frequent exports can trigger rate limiting
- Increase interval between exports: --interval "weekly"
Reduce scope:
- Export only recent messages: set an "after" date for the export
- Split large channels into separate targets
Check system resources:
- Disk I/O bottleneck: iostat -x 1
- CPU usage: top
- Memory: free -h

Channel Export SKIPPED (OOM / Aborted / Killed)

Symptoms: Log shows "SKIPPED" for one channel, "Aborted (core dumped)", "Killed", or "out of memory"; other channels in the target may still succeed.

Cause: Large multi-year catch-up (for example KotOR yes_general) builds a big in-memory JSON export inside the container. Partial progress is kept under output_dir/.dce-temp/ for salvage on the next run.

Solutions:

Salvage partial temps before re-scraping (avoids re-downloading from the archive cursor):

./scripts/scrape-lock-status.sh
./scripts/operator-handoff.sh --salvage-only --target KotOR_discord_msgs --channel 221726893064454144

Raise container memory in scrape.env if needed (default 0 = no compose cap; KotOR_discord_msgs already sets container_memory: "8g" for single-target runs):

# scrape.env — optional global override
DCE_CONTAINER_MEMORY=8g

Then run the one-command catch-up:

./scripts/run-kotor-yes-general-catchup.sh
# Inspect totals: ./scripts/print-scrape-summary.sh logs/kotor-yes-general.summary.json

Ensure only one scrape holds {archive_root}/.dce-scrape.lock (see next section).
Confirm host disk headroom — merges need temporary space on the archive volume (df -h ~/Documents).
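
The disk headroom check can be scripted so it runs before a long catch-up. This is a sketch under assumptions: `has_free_kb` is a hypothetical helper, and the threshold you pass is your own estimate, since the guide does not state how much temporary space a merge needs.

```shell
# Hypothetical helper: succeed when the filesystem holding DIR has at least
# MIN_KB kilobytes available. "df -P" guarantees one output line per
# filesystem, so the fourth column of line 2 is the available space.
has_free_kb() {
  local dir=$1 min_kb=$2 avail
  avail=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail" ] && [ "$avail" -ge "$min_kb" ]
}
```

For example, `has_free_kb ~/Documents 20971520 || echo "less than 20 GiB free"` warns before the scrape starts rather than partway through a merge.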

Scrape Lock Already Held

Symptoms: "Scrape lock is held" or "Another scrape is already running" when starting a validation or documents scrape.

Cause: Only one scrape should run per archive_root. A long validation, cron job, or a second checkout (for example Downloads vs MyBook) can hold {archive_root}/.dce-scrape.lock.

Solutions:

Inspect lock state:
```
./scripts/scrape-lock-status.sh
```
Wait for the active scrape to finish if PID is live.
Reclaim stale lock after a crash (only when status shows stale/free):
```
./scripts/scrape-lock-status.sh --reclaim-stale
```
Do not delete the lock while a scrape is still running — twin exports can OOM-loop on the same channel.
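
The difference between a held lock and a stale one comes down to whether the recorded process is still alive. The sketch below illustrates that idea only. It assumes the lock file stores the owner's PID on its first line, which is not confirmed by this guide, and `lock_state` is a hypothetical name; prefer `scrape-lock-status.sh` for real decisions.

```shell
# Hypothetical illustration: classify a PID-style lock file as free, held, or stale.
# Assumption: the first line of the lock file contains the owning PID.
lock_state() {
  local lock_file=$1 pid
  if [ ! -e "$lock_file" ]; then
    echo free
    return 0
  fi
  pid=$(head -n 1 "$lock_file" | tr -cd '0-9')
  # "kill -0" sends no signal; it only reports whether the process can be signalled.
  # A live process owned by another user also fails this test and shows as stale.
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo held
  else
    echo stale
  fi
}
```

The caveat in the comment is the reason reclaiming should stay a deliberate manual step: a lock held by a different user's scrape would look stale to this check.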

Partial Export Stuck in `.dce-temp`

Symptoms: Large folder under output_dir/.dce-temp/export.<channel_id>.*; archive cursor not advancing; audit excludes .dce-temp (expected).

Solutions:

Stop any active export writing that temp (check lock status and running podman/docker processes).

Salvage quiescent temps (default skips temps modified in the last ~120s):

./scripts/run-documents-scrape.sh --salvage-only --target NAME [--channel ID]

Force salvage of an active temp only after confirming nothing is writing:

DCE_SALVAGE_ACTIVE_TEMPS=1 ./scripts/run-documents-scrape.sh --salvage-only --target NAME --channel ID

Truncated JSON in the archive file itself (not .dce-temp):

./scripts/salvage-truncated-export.sh path/to/archive.json
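
Before forcing a salvage, it is worth confirming that nothing has written to the temp recently. The sketch below approximates the "quiescent" test described above with `find -mmin`; the helper name `is_quiescent` and the two-minute default are assumptions chosen to match the ~120s window, not the scripts' actual implementation.

```shell
# Hypothetical helper: succeed when no regular file under PATH was modified
# within the last MINUTES minutes (default 2, roughly the 120s window).
is_quiescent() {
  local path=$1 minutes=${2:-2} recent
  recent=$(find "$path" -type f -mmin "-$minutes" 2>/dev/null | head -n 1)
  [ -z "$recent" ]
}
```

A call such as `is_quiescent "$OUTPUT_DIR/.dce-temp" || echo "still being written"` gives a quick answer, where `OUTPUT_DIR` stands for the target's configured output_dir. Note that `-mmin` is a widely supported extension in GNU, BSD, and BusyBox find rather than a POSIX option.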

"Failed to write archive" or Permission Denied

Symptoms: Export fails with write permission errors

Solutions:

Check directory permissions:

ls -la archive/target-name/
chmod 755 archive/target-name/
chmod 644 archive/target-name/*.json

If using Docker/Podman, set user mode:

# For rootless podman
export DCE_USERNS_MODE=keep-id
export DCE_UID=$(id -u)
export DCE_GID=$(id -g)

Check SELinux (if enabled):

getenforce
# If "Enforcing", add `:z` to mount options:
# docker-compose.yml should already have this

Docker/Container Issues

"Failed to build image" Error

Symptoms: Docker build fails during setup

Solutions:

Verify Docker is running:
```
docker ps
docker version
```
Check disk space:
```
docker system df
```

Clean up and retry:

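# Warning: "prune -a" removes all unused images, stopped containers, unused networks, and build cache
docker system prune -a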
docker-compose build --no-cache

If using Podman:

podman system prune -a
podman-compose build --no-cache

"Cannot connect to Docker daemon" Error

Symptoms: Setup fails to reach Docker

Solutions:

For Docker:

sudo systemctl start docker
sudo usermod -aG docker $USER
newgrp docker

For Podman (rootless):

systemctl --user start podman
systemctl --user enable podman

Authorization / Token Refresh Issues

Host Retry Auth Flow Not Working

Symptoms: Export fails with 401/403 errors even with DISCORD_TOKEN_FILE set

Solutions:

Verify token file is readable:
```
[ -r "$DISCORD_TOKEN_FILE" ] && echo "token file is readable"  # avoids printing the token
```
Ensure proper permissions:
```
chmod 600 "$DISCORD_TOKEN_FILE"
```
Check token is fresh:
- Tokens can expire
- For bot tokens, regenerate the token in the Discord Developer Portal
- Update the token file

Verify host wrapper is being used:

crontab -l | grep run-discord-scrape-host
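
The token file checks in this section can be combined into one test that never prints the secret. This is a sketch, not project code: `token_file_is_private` is a hypothetical name, and it treats only owner-only modes (600 or 400) as acceptable.

```shell
# Hypothetical helper: succeed when FILE exists and only its owner can access it.
# "ls -ldL" follows symlinks; the first ten characters are the type and mode bits.
token_file_is_private() {
  local mode
  [ -f "$1" ] || return 1
  mode=$(ls -ldL "$1" | cut -c1-10)
  [ "$mode" = "-rw-------" ] || [ "$mode" = "-r--------" ]
}
```

Used as `token_file_is_private "$DISCORD_TOKEN_FILE" || chmod 600 "$DISCORD_TOKEN_FILE"`, it tightens permissions only when they are too open.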

Getting Help

If you're still stuck:

Check existing issues: https://github.com/Tyrrrz/DiscordChatExporter/issues

Run preflight in verbose mode:

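# "set -x" in the calling shell does not carry into the script, so trace it directly.
bash -x ./scripts/run-discord-scrape.sh preflight  # Review the trace for tokens before sharing it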

Check logs:

# Docker logs
docker-compose logs --tail 50

# Cron logs (on Linux)
sudo journalctl -u cron --since "1 hour ago"

Collect error details for reporting issues:
- Config (sanitize token)
- Full error message
- OS/Docker version
- Steps to reproduce
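
Sanitizing by hand is easy to get wrong when the token appears in several places. A small filter like the one below can scrub logs or config text before they are pasted into an issue. It is a sketch with a hypothetical name, `redact_secret`, and it matches the secret literally rather than as a regular expression, so dots in the token are not treated as wildcards.

```shell
# Hypothetical helper: replace every literal occurrence of SECRET on stdin
# with [REDACTED]. The secret is passed through the environment so that awk
# does not interpret backslashes or regex characters in it.
redact_secret() {
  local secret=$1
  if [ -z "$secret" ]; then
    cat
    return 0
  fi
  SECRET="$secret" awk '
    {
      line = $0
      out = ""
      while ((i = index(line, ENVIRON["SECRET"])) > 0) {
        out = out substr(line, 1, i - 1) "[REDACTED]"
        line = substr(line, i + length(ENVIRON["SECRET"]))
      }
      print out line
    }
  '
}
```

For example, `docker-compose logs --tail 50 | redact_secret "$DISCORD_TOKEN"` produces output that is safer to attach to a report. It is still worth reading the result once, since a token can also appear in an encoded or truncated form.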
