fix(review): apply autofix feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Your Name 2026-05-24 20:30:37 -05:00
parent 96873b5e7d
commit 07151924cf
12 changed files with 460 additions and 16 deletions

@ -96,3 +96,7 @@ If you authenticate with a **bot token**, do not rely on guild-name or DM discov
`preflight` now probes one resolved channel per selected target with the source-built CLI before cron is installed. If the token cannot read that channel, setup fails closed and leaves the existing crontab untouched.
If you run the recurring flow through podman on an SELinux-enabled host, keep the bind mounts relabeled (`:z`). The checked-in `docker-compose.yml` already applies this to the recurring wrapper mounts.
For rootless podman, set `DCE_USERNS_MODE=keep-id` in `scrape.env` so the mounted archive roots stay writable as your host user instead of appearing as `root:root` inside the container. Keep `DCE_UID` and `DCE_GID` matched to your host user as well.
When the recurring wrapper sees an existing archive file whose name already embeds the channel ID (for example, `Deadly Stream - kotor-general [770102657795751976].json`), future runs keep updating that same path. When a channel is new and has no stored mapping yet, the wrapper now defaults to a human-readable filename based on the export metadata (`Guild - Category - Channel [id].json`) inside the configured target root instead of writing to `channels/<id>.json`.
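As an illustration of the "filename embeds the channel ID" convention described above, here is a minimal sketch of how such an ID could be recovered from a filename. `channel_id_from_filename` is a hypothetical helper, not a function of the wrapper, and the sketch assumes the ID is always a run of digits in square brackets directly before `.json`.

```bash
# Hypothetical helper (not part of the wrapper): pull the channel ID out of an
# archive filename that ends in "[<digits>].json".
channel_id_from_filename() {
  local name=${1##*/}                     # drop any leading directories
  local pattern='\[([0-9]+)\]\.json$'     # assumed naming convention
  if [[ $name =~ $pattern ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1                              # no embedded ID found
  fi
}

channel_id_from_filename "Deadly Stream - kotor-general [770102657795751976].json"
# prints: 770102657795751976
```

A legacy path such as `channels/<id>.json` does not match this pattern, so the helper returns non-zero for it.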

@ -26,6 +26,10 @@ If any selected target fails that authenticated probe, `setup-cron.sh` stops wit
If you are running the recurring wrapper through podman on an SELinux-enabled host, keep the bind mounts relabeled (`:z`). The checked-in `docker-compose.yml` already includes that for the recurring config and archive mounts.
For rootless podman, set `DCE_USERNS_MODE=keep-id` in `scrape.env` so the mounted `Documents` archive roots stay writable as your host user during scheduled runs. Keep `DCE_UID` and `DCE_GID` matched to your host user as well.
Existing archive files remain the source of truth for recurring updates. If a channel already has a local JSON export whose filename embeds the channel ID, the wrapper updates that exact file in place. If the channel is new and has no stored mapping yet, the first export now lands in the configured target root with a human-readable default filename (`Guild - Category - Channel [id].json`) instead of `channels/<id>.json`.
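The default filename pattern can be sketched roughly as follows. This is a simplified illustration, assuming empty name parts are skipped and the channel ID is always present; `default_archive_name` is a hypothetical name, and the sketch leaves out the escaping of slashes and control characters that the wrapper applies to each part.

```bash
# Hypothetical helper: join the non-empty name parts with " - " and append the
# channel ID in brackets. Sanitising of each part is intentionally omitted.
default_archive_name() {
  local channel_id=$1
  shift
  local base="" part
  for part in "$@"; do
    [[ -n $part ]] || continue           # skip missing guild/category/channel
    base=${base:+$base - }$part
  done
  printf '%s [%s].json\n' "${base:-channel}" "$channel_id"
}

default_archive_name 111 "Fixture Guild" "Testing Grounds" "fixture-room"
# prints: Fixture Guild - Testing Grounds - fixture-room [111].json
```

A channel without a category simply yields `Guild - Channel [id].json`, which matches the shape of the existing archive names.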
## Creating the script
1. Open Terminal and create a new text file with `nano /path/to/DiscordChatExporter/cron.sh`

@ -1,6 +1,6 @@
---
name: Recurring Discord scrape automation
last_updated: 2026-05-24
last_updated: 2026-05-25
---
# Recurring Discord scrape automation Strategy
@ -11,7 +11,7 @@ The user needs multiple Discord archives refreshed on a schedule without losing
## Our approach
Wrap the source-built CLI in a self-hosted Docker + cron workflow that defaults to safe recurring operation: append-only archive updates, conservative target resolution, and fail-closed preflight before anything touches cron or existing exports. We win by treating data preservation and operator clarity as load-bearing, not optional polish.
Wrap the source-built CLI in a self-hosted Docker + cron workflow that defaults to safe recurring operation: append-only archive updates, conservative target resolution, human-readable first-write defaults inside each configured archive root, and fail-closed preflight before anything touches cron or existing exports. We win by treating data preservation and operator clarity as load-bearing, not optional polish.
## Who it's for

@ -6,6 +6,7 @@ services:
    image: discordchatexporter-cron:local
    init: true
    user: "${DCE_UID:-1000}:${DCE_GID:-1000}"
    userns_mode: "${DCE_USERNS_MODE:-}"
    working_dir: /workspace
    environment:
      DISCORD_TOKEN: ${DISCORD_TOKEN:?Set DISCORD_TOKEN in scrape.env or your shell environment.}

@ -0,0 +1,242 @@
---
title: "fix: Verify live archive path updates"
type: fix
status: active
date: 2026-05-24
---
# fix: Verify live archive path updates
## Summary
Verify that the recurring scrape wrapper updates the user's existing `~/Documents` archives in place, then tighten destination-path fallback behavior so new unmapped channels default to stable, human-readable archive names instead of `channels/<channel-id>.json`.
This plan keeps the append-only merge contract and custom `output_dir` roots intact while closing the gap between current archive layout expectations and the wrapper's fallback naming behavior.
---
## Problem Frame
The recurring wrapper already preserves existing JSON exports when it can map a channel ID back to the right file, but the current fallback path for an unmapped channel is `output_dir/channels/<channel-id>.json`. The user's real archive set under `~/Documents` uses custom per-target roots with human-readable filenames, so this path fallback must be verified against live data and corrected so first-time channel exports do not drift into a different directory structure.
---
## Assumptions
*This plan was authored without synchronous user confirmation. The items below are agent inferences that fill gaps in the input — un-validated bets that should be reviewed before implementation proceeds.*
- The existing archive files already present under `~/Documents/<target>` are the source of truth for where future updates should land for those channels.
- When a channel has no prior archive and no stored mapping, the preferred default is an intuitive human-readable filename that still includes the channel ID for stable remapping.
- Live verification should use the current checked-in custom target config rather than introducing a second temporary config shape.
---
## Requirements
- R1. Running the recurring scraper against the current `config/scrape-targets.json` must continue to target the user's custom `~/Documents/<target>` roots and must not redirect updates into unrelated directories.
- R2. Existing archives already present under those custom roots must be updated in place through the append-only merge path rather than overwritten from scratch or duplicated into a new fallback path.
- R3. When a channel has no prior archive or stored mapping, the wrapper's default destination naming must be stable and human-readable, aligned with the CLI's guild/category/channel naming conventions instead of `channels/<channel-id>.json`.
- R4. Destination-path resolution must remain fail-closed: ambiguous matches, invalid JSON, wrong-channel archives, or paths outside the configured target root must abort without mutating the existing archive.
- R5. The behavior must be covered by fixture-based smoke tests and documented so operators understand both custom output roots and default naming for newly discovered channels.
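The containment part of R4 can be pictured with a small check. This is a minimal sketch under stated assumptions, not the wrapper's own validation: `is_inside_root` is a hypothetical name, and the sketch assumes destinations are absolute paths and that any `.` or `..` segment is rejected outright rather than normalised.

```bash
# Hypothetical check: accept only absolute paths that sit at or under the root
# and contain no "." or ".." segments.
is_inside_root() {
  local root=${1%/} path=$2
  [[ $path == /* ]] || return 1          # relative paths are never accepted
  case "/$path/" in
    */../*|*/./*) return 1 ;;            # refuse traversal segments outright
  esac
  [[ $path == "$root" || $path == "$root"/* ]]
}

if is_inside_root "/archive/demo" "/archive/demo/../outside.json"; then
  echo "accepted"
else
  echo "rejected"
fi
# prints: rejected
```

Rejecting traversal segments instead of resolving them keeps the check independent of whether the destination file exists yet.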
---
## Scope Boundaries
- No change to the top-level custom target roots already configured in `config/scrape-targets.json`.
- No change to the core C# exporter append semantics; archive preservation remains wrapper-layer behavior.
- No attempt to make inaccessible Discord targets succeed with the current token; auth/access blockers remain external runtime constraints.
---
## Context & Research
### Relevant Code and Patterns
- `scripts/run-discord-scrape.sh` already enforces append-only updates via `--after`, `merge_exports`, channel identity checks, and target-local temp files.
- `scripts/run-discord-scrape.sh` persists channel-to-path mappings in `output_dir/.dce-meta/channel-map.json` and currently falls back to `output_dir/channels/<channel-id>.json` for unmapped channels.
- `scripts/tests/run-discord-scrape-smoke.sh` is the established shell-level safety test for append, dedupe, and wrong-channel no-clobber behavior.
- `config/scrape-targets.json` is the checked-in custom-root contract for the user's `~/Documents` archive tree.
- `DiscordChatExporter.Core/Exporting/ExportRequest.cs` contains the upstream CLI's human-readable default output naming logic and is the best reference for new wrapper fallback naming.
- `.docs/Docker.md` and `.docs/Scheduling-Linux.md` are the operator-facing docs for the recurring wrapper.
### Institutional Learnings
- No `docs/solutions/` directory exists in this repo.
- Existing plan and wrapper behavior intentionally keep archive safety in the shell layer and treat fail-closed preflight and path validation as load-bearing safety guarantees.
### External References
- None used. Repo-local patterns are strong enough for this fix.
---
## Key Technical Decisions
- **Treat the current custom `output_dir` values as authoritative:** updates must remain under the configured `~/Documents/<target>` roots; fixes should improve filename resolution inside those roots rather than invent a new directory layout.
- **Reuse human-readable archive names for first-write defaults:** new unmapped channels should adopt a stable guild/category/channel-based filename that still embeds the channel ID, matching the repo's existing archive style and the upstream CLI's naming conventions.
- **Preserve channel-to-path mapping as the long-term source of stability:** once a channel is resolved to a destination file, future runs should continue updating that same file regardless of later naming changes elsewhere.
- **Prove path behavior with both fixture coverage and a real runtime pass:** shell smoke tests should lock the path-resolution contract, and implementation should also run the wrapper against the real `~/Documents` config to confirm in-place updates or to surface external blockers without writing to alternate paths.
---
## Open Questions
### Resolved During Planning
- **Should the custom roots be changed?** No. The existing per-target `~/Documents/<target>` directories remain the contract.
- **What should replace the `channels/<channel-id>.json` fallback?** A human-readable default filename derived from guild/category/channel naming, with the channel ID preserved for stable remapping.
- **What is the success condition for live verification?** The run must either update the existing archive file in place or fail before creating a parallel destination path outside the expected custom root layout.
### Deferred to Implementation
- **How much live verification can succeed with the current token?** The implementer must determine this at runtime; if access is blocked, the verification outcome should still prove that no alternate path was created.
- **Which exact naming helper shape is least duplicative?** Implementation should decide whether to derive names from the guild/category/channel metadata already present in exports or to mirror the upstream naming rules directly in the wrapper, after inspecting which option is the simplest safe reuse path.
---
## Implementation Units
### U1. Fix destination-path fallback and archive seeding
**Goal:** Ensure the wrapper resolves channel destinations to the existing custom archive files when present and uses intuitive human-readable defaults for first-time channels.
**Requirements:** R1, R2, R3, R4
**Dependencies:** None
**Files:**
- Modify: `scripts/run-discord-scrape.sh`
- Modify: `config/scrape-targets.json`
- Test: `scripts/tests/run-discord-scrape-smoke.sh`
**Approach:**
- Review and tighten `resolve_destination_path()` so it first prefers persisted channel mappings, then existing archive files under the configured `output_dir`, and only then falls back to a new default path.
- Replace the current `output_dir/channels/<channel-id>.json` fallback with a stable human-readable filename that matches the archive naming style already present under `~/Documents` and the upstream CLI's default naming semantics.
- Preserve the rule that every resolved destination stays inside the configured target root and that ambiguous matches hard-fail instead of guessing.
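The lookup order in the approach above can be sketched as follows. This is a simplified illustration, not the actual `resolve_destination_path()`: `pick_destination` is a hypothetical name, the persisted mapping is passed in as an argument instead of being read from `channel-map.json`, and JSON and channel-identity validation are left out.

```bash
# Hypothetical sketch of the lookup order: mapping, then a single existing
# archive, then "nothing yet" so the caller can derive a default name later.
pick_destination() {
  local root=$1 channel_id=$2 mapped_path=${3:-}
  local -a candidates=()
  local file
  # 1. A persisted mapping wins.
  if [[ -n $mapped_path ]]; then
    printf '%s\n' "$mapped_path"
    return 0
  fi
  # 2. Otherwise collect archives in the root whose name embeds "[<id>].json".
  for file in "$root"/*"[$channel_id].json"; do
    if [[ -f $file ]]; then
      candidates+=("$file")
    fi
  done
  case ${#candidates[@]} in
    0) return 0 ;;                                   # 3. defer to the default name
    1) printf '%s\n' "${candidates[0]}" ;;
    *) printf 'ambiguous archives for channel %s\n' "$channel_id" >&2
       return 1 ;;                                   # fail closed, never guess
  esac
}
```

Printing nothing in the "no match" case lets the caller postpone the default name until export metadata is available.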
**Patterns to follow:**
- `scripts/run-discord-scrape.sh`
- `DiscordChatExporter.Core/Exporting/ExportRequest.cs`
- `config/scrape-targets.json`
**Test scenarios:**
- Happy path: an existing archive with a human-readable filename containing `[channel-id]` is discovered, mapped, and reused for the update.
- Happy path: an unmapped first-time channel resolves to a human-readable filename under the target root and records that mapping for later runs.
- Edge case: a target root containing multiple matching files for the same channel ID fails closed and does not guess.
- Error path: a mapped path outside the configured target root is rejected before export.
- Integration: a rerun after the new mapping is written updates the exact same file path rather than creating a second archive path.
**Verification:**
- Unmapped channels no longer default to `channels/<channel-id>.json`.
- Existing archives under the configured custom roots continue to resolve back to their current file paths.
---
### U2. Expand append-only and path-safety smoke coverage
**Goal:** Add fixture coverage that proves path resolution and append-only updates do not create parallel archives or overwrite unrelated files.
**Requirements:** R2, R3, R4, R5
**Dependencies:** U1
**Files:**
- Modify: `scripts/tests/run-discord-scrape-smoke.sh`
- Create: `scripts/tests/test-fixtures/path-existing.json`
- Create: `scripts/tests/test-fixtures/path-incremental.json`
**Approach:**
- Extend the existing smoke script with a case where a preexisting human-readable archive under a target root has no stored map yet and must still be updated in place.
- Add coverage for the first-run fallback path so the test can assert both the filename pattern and that the file lands directly under the configured target root.
- Keep the current wrong-channel and invalid-json hard-fail expectations as the guardrail against archive corruption.
**Execution note:** Start by adding/adjusting fixture coverage before changing the fallback logic so the path regression is pinned by tests.
**Patterns to follow:**
- `scripts/tests/run-discord-scrape-smoke.sh`
- `scripts/tests/test-fixtures/append-existing.json`
- `scripts/tests/test-fixtures/append-incremental.json`
- `scripts/tests/test-fixtures/wrong-channel.json`
**Test scenarios:**
- Happy path: an existing human-readable archive with no prior `channel-map.json` entry is updated in place and retains prior messages.
- Happy path: a first-time export creates one human-readable file directly under the target root and writes a matching channel-map entry.
- Edge case: an incremental export with zero new messages leaves the existing human-readable archive untouched and does not create a second file.
- Error path: wrong-channel incremental data fails without replacing the existing archive or writing a new fallback file.
- Integration: two consecutive runs against the same channel keep the same destination path and merge by message ID.
**Verification:**
- Smoke coverage fails on the old `channels/<channel-id>.json` fallback and passes with the new default naming behavior.
- Existing append-only protections still pass after the path-resolution changes.
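The "archive stays untouched" expectations above lend themselves to a before/after checksum comparison. The helper below is a hypothetical sketch of that pattern, not code from the smoke script; it uses `cksum` only because it is widely available, and any stable digest tool would serve.

```bash
# Hypothetical assertion helper: run a command and report whether the given
# file has the same checksum afterwards. The command itself may fail.
assert_file_unchanged() {
  local file=$1
  shift
  local before after
  before=$(cksum <"$file")
  "$@" || true                       # a failing run is fine; only the file matters
  after=$(cksum <"$file")
  [[ $before == "$after" ]]
}
```

A call such as `assert_file_unchanged "$DEST" run_wrapper demo wrong-channel` would then cover the no-clobber check in one line, assuming `DEST` and `run_wrapper` are defined as in the smoke script.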
---
### U3. Run real-config verification and document the contract
**Goal:** Validate the fixed wrapper against the user's checked-in `~/Documents` targets and document how custom roots and default naming interact.
**Requirements:** R1, R2, R5
**Dependencies:** U1, U2
**Files:**
- Modify: `.docs/Docker.md`
- Modify: `.docs/Scheduling-Linux.md`
- Modify: `STRATEGY.md`
**Approach:**
- Run the wrapper with the real checked-in target config and current token/runtime setup to confirm that accessible channels update their existing archive paths in place and that blocked targets fail without creating alternate destination paths.
- Capture the operator contract in docs: custom `output_dir` roots remain authoritative, existing archives are reused, and first-time channels adopt human-readable defaults inside the target root.
- Refresh strategy/docs language only where needed so the runtime promise matches the implemented destination behavior.
**Patterns to follow:**
- `.docs/Docker.md`
- `.docs/Scheduling-Linux.md`
- `STRATEGY.md`
**Test scenarios:**
- Test expectation: none -- this unit is runtime verification and documentation work backed by the fixture coverage in U2.
**Verification:**
- A real wrapper run against the configured `~/Documents` targets either updates an existing archive path in place or fails before creating a conflicting path.
- Operator docs explain both custom-root behavior and the default naming used for new channels within those roots.
---
## System-Wide Impact
- **Interaction graph:** destination-path changes affect archive seeding, channel-map persistence, append-only merge flow, Docker runtime expectations, and cron-driven recurring runs.
- **Error propagation:** path ambiguity or invalid archive state must still abort the affected channel/target before any destination replacement.
- **State lifecycle risks:** incorrect fallback naming could silently split one logical channel across two files; the plan explicitly tests and prevents that.
- **API surface parity:** the user-facing contract spans the shell wrapper, checked-in target config, and operator docs, so all three need to stay aligned.
- **Integration coverage:** fixture tests cover deterministic path and merge semantics; runtime verification covers interaction with the real `~/Documents` archive tree and current token access.
---
## Risks & Dependencies
| Risk | Mitigation |
|------|------------|
| Live token access is still blocked for some targets | Treat runtime verification as success only when it proves no alternate path was created; document blocked targets explicitly rather than guessing |
| New filename generation diverges from existing archive style | Follow the upstream CLI naming reference and add smoke coverage for first-write defaults |
| Path-fix logic accidentally stops recognizing existing archives | Extend seeded-archive tests before changing fallback behavior and keep channel-map persistence authoritative |
---
## Documentation / Operational Notes
- Keep the docs explicit that custom `output_dir` targets remain authoritative and that new default naming only applies within those roots when no prior archive path exists.
- Runtime verification should be performed through the source-built wrapper path, not the downloaded binary bundle, so the docs and behavior stay aligned.
---
## Sources & References
- Strategy: `STRATEGY.md`
- Prior plan: `docs/plans/2026-05-24-001-feat-recurring-cli-scrape-automation-plan.md`
- Related code: `scripts/run-discord-scrape.sh`
- Related tests: `scripts/tests/run-discord-scrape-smoke.sh`
- Related config: `config/scrape-targets.json`
- Naming reference: `DiscordChatExporter.Core/Exporting/ExportRequest.cs`
- Operator docs: `.docs/Docker.md`, `.docs/Scheduling-Linux.md`

@ -5,3 +5,7 @@ TZ=UTC
# Match these to the host user that should own created files.
DCE_UID=1000
DCE_GID=1000
# For rootless podman, set this to keep-id so mounted archive roots stay writable.
# Leave it empty on Docker unless you explicitly need a user namespace mode there.
DCE_USERNS_MODE=

@ -71,6 +71,27 @@ normalize_name() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]'
}
escape_file_name_component() {
  printf '%s' "$1" \
    | tr '\r\n' ' ' \
    | sed -E 's#[/\\]+#_#g; s/[[:cntrl:]]+/ /g; s/[[:space:]]+/ /g; s/^ //; s/ $//'
}
join_name_parts() {
  local result=""
  local part
  for part in "$@"; do
    [[ -n "$part" ]] || continue
    if [[ -n "$result" ]]; then
      result+=" - "
    fi
    result+="$part"
  done
  printf '%s\n' "$result"
}
json_array_from_args() {
  jq -cn '$ARGS.positional' --args "$@"
}
@ -102,6 +123,14 @@ path_is_within_root() {
  local root=$1
  local path=$2
  [[ "$path" == /* ]] || return 1
  case "$path" in
    *"/../"*|../*|*/..|..|*"/./"*|./*|*/.)
      return 1
      ;;
  esac
  case "$path" in
    "$root"|"${root}/"*)
      return 0
@ -201,7 +230,7 @@ update_channel_map() {
  local temp_file
  mkdir -p "$(dirname "$map_file")"
  temp_file=$(mktemp "$(dirname "$map_file")/channel-map.XXXXXX.json")
  temp_file=$(mktemp "$(dirname "$map_file")/channel-map.XXXXXX")
  jq --arg channel_id "$channel_id" --arg destination_path "$destination_path" \
    '.[$channel_id] = $destination_path' \
    "$map_file" >"$temp_file"
@ -213,13 +242,37 @@ get_channel_map_path() {
  printf '%s/.dce-meta/channel-map.json' "$output_dir"
}
default_destination_path_from_export() {
  local output_dir=$1
  local export_path=$2
  local guild_name category_name channel_name channel_id
  local escaped_guild_name escaped_category_name escaped_channel_name base_name
  guild_name=$(jq -r '.guild.name // empty' "$export_path")
  category_name=$(jq -r '.channel.category // empty' "$export_path")
  channel_name=$(jq -r '.channel.name // empty' "$export_path")
  channel_id=$(jq -r '.channel.id // empty' "$export_path")
  [[ -n "$channel_id" ]] || die "Export '$export_path' is missing channel.id metadata."
  escaped_guild_name=$(escape_file_name_component "$guild_name")
  escaped_category_name=$(escape_file_name_component "$category_name")
  escaped_channel_name=$(escape_file_name_component "$channel_name")
  base_name=$(join_name_parts "$escaped_guild_name" "$escaped_category_name" "$escaped_channel_name")
  [[ -n "$base_name" ]] || base_name="channel"
  printf '%s/%s [%s].json\n' "$output_dir" "$base_name" "$channel_id"
}
resolve_destination_path() {
  local output_dir=$1
  local channel_id=$2
  local export_path=${3:-}
  local map_file mapped_path
  local -a existing_candidates
  mkdir -p "$output_dir/.dce-meta" "$output_dir/channels"
  mkdir -p "$output_dir/.dce-meta"
  map_file=$(get_channel_map_path "$output_dir")
  ensure_json_file "$map_file"
@ -241,12 +294,19 @@ resolve_destination_path() {
  fi
  if (( ${#existing_candidates[@]} == 1 )); then
    jq empty "${existing_candidates[0]}" >/dev/null 2>&1 \
      || die "Existing export is not valid JSON: ${existing_candidates[0]}"
    assert_export_channel_identity "${existing_candidates[0]}" "$channel_id"
    update_channel_map "$map_file" "$channel_id" "${existing_candidates[0]}"
    printf '%s\n' "${existing_candidates[0]}"
    return 0
  fi
  mapped_path="$output_dir/channels/$channel_id.json"
  [[ -n "$export_path" ]] || return 0
  mapped_path=$(default_destination_path_from_export "$output_dir" "$export_path")
  path_is_within_root "$output_dir" "$mapped_path" \
    || die "Derived destination '$mapped_path' for channel $channel_id is outside target root '$output_dir'."
  update_channel_map "$map_file" "$channel_id" "$mapped_path"
  printf '%s\n' "$mapped_path"
}
@ -530,9 +590,11 @@ scrape_target() {
  local channel_id
  for channel_id in "${channel_ids[@]}"; do
    destination_path=$(resolve_destination_path "$output_dir" "$channel_id")
    if [[ -n "$destination_path" ]]; then
      mkdir -p "$(dirname "$destination_path")"
    fi
    if [[ -f "$destination_path" ]]; then
    if [[ -n "$destination_path" && -f "$destination_path" ]]; then
      jq empty "$destination_path" >/dev/null 2>&1 || die "Existing export is not valid JSON: $destination_path"
      assert_export_channel_identity "$destination_path" "$channel_id"
    fi
@ -558,6 +620,11 @@ scrape_target() {
    jq empty "$temp_export" >/dev/null 2>&1 || die "Incremental export is not valid JSON: $temp_export"
    assert_export_channel_identity "$temp_export" "$channel_id"
    if [[ -z "$destination_path" ]]; then
      destination_path=$(resolve_destination_path "$output_dir" "$channel_id" "$temp_export")
      mkdir -p "$(dirname "$destination_path")"
    fi
    latest_batch_count=$(message_count "$temp_export")
    if [[ ! -f "$destination_path" ]]; then
      mv "$temp_export" "$destination_path"

@ -4,9 +4,14 @@ set -Eeuo pipefail
REPO_ROOT=$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd -P)
TMP_ENV=$(mktemp "${TMPDIR:-/tmp}/dce-container-smoke.XXXXXX.env")
TMP_PODMAN_ENV=$(mktemp "${TMPDIR:-/tmp}/dce-container-smoke.XXXXXX.podman.env")
ARCHIVE_ROOT=$(jq -r '.archive_root' "$REPO_ROOT/config/scrape-targets.json")
WRITE_TEST_DIR="$ARCHIVE_ROOT/.dce-container-smoke-$$"
cleanup() {
  rm -f "$TMP_ENV"
  rm -f "$TMP_PODMAN_ENV"
  rm -rf "$WRITE_TEST_DIR"
}
trap cleanup EXIT
@ -17,9 +22,17 @@ DCE_GID=$(id -g)
TZ=UTC
EOF
cp "$TMP_ENV" "$TMP_PODMAN_ENV"
printf 'DCE_USERNS_MODE=keep-id\n' >>"$TMP_PODMAN_ENV"
cd "$REPO_ROOT"
docker compose --env-file "$TMP_ENV" build
docker compose --env-file "$TMP_ENV" run --rm discord-scraper help >/dev/null
docker compose --env-file "$TMP_ENV" run --rm discord-scraper list-targets >/dev/null
if docker version 2>&1 | grep -qi podman || docker info 2>&1 | grep -qi podman; then
  mkdir -p "$WRITE_TEST_DIR"
  docker compose --env-file "$TMP_PODMAN_ENV" run -T --rm --entrypoint /bin/sh discord-scraper -lc "mkdir -p '$WRITE_TEST_DIR/from-container' && rmdir '$WRITE_TEST_DIR/from-container'" >/dev/null
fi
echo "container smoke test passed"

@ -8,6 +8,7 @@ TMP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/dce-run-smoke.XXXXXX")
ARCHIVE_ROOT="$TMP_DIR/archive"
CONFIG_PATH="$TMP_DIR/config.json"
FAKE_CLI="$TMP_DIR/fake-cli.sh"
DEFAULT_FILE_NAME="Fixture Guild - Testing Grounds - fixture-room [111].json"
cleanup() {
  rm -rf "$TMP_DIR"
@ -29,6 +30,46 @@ cat >"$CONFIG_PATH" <<JSON
"channel_ids": ["111"],
"guild_ids": [],
"guild_name_patterns": []
},
{
"name": "seeded",
"kind": "guild",
"output_dir": "$ARCHIVE_ROOT/seeded",
"channel_ids": ["111"],
"guild_ids": [],
"guild_name_patterns": []
},
{
"name": "duplicate",
"kind": "guild",
"output_dir": "$ARCHIVE_ROOT/duplicate",
"channel_ids": ["111"],
"guild_ids": [],
"guild_name_patterns": []
},
{
"name": "invalid",
"kind": "guild",
"output_dir": "$ARCHIVE_ROOT/invalid",
"channel_ids": ["111"],
"guild_ids": [],
"guild_name_patterns": []
},
{
"name": "mapped-outside-root",
"kind": "guild",
"output_dir": "$ARCHIVE_ROOT/mapped-outside-root",
"channel_ids": ["111"],
"guild_ids": [],
"guild_name_patterns": []
},
{
"name": "seeded-wrong-channel",
"kind": "guild",
"output_dir": "$ARCHIVE_ROOT/seeded-wrong-channel",
"channel_ids": ["111"],
"guild_ids": [],
"guild_name_patterns": []
}
]
}
@ -77,31 +118,81 @@ EOF
chmod +x "$FAKE_CLI"
run_wrapper() {
  local target_name=$1
  local mode=$2
  DISCORD_TOKEN=dummy \
    DCE_CLI_BIN="$FAKE_CLI" \
    DCE_PRIMARY_CONFIG="$CONFIG_PATH" \
    DCE_FALLBACK_CONFIG="$CONFIG_PATH" \
    FAKE_DCE_FIXTURE_DIR="$FIXTURE_DIR" \
    FAKE_DCE_MODE="$1" \
    "$REPO_ROOT/scripts/run-discord-scrape.sh" scrape --target demo
    FAKE_DCE_MODE="$mode" \
    "$REPO_ROOT/scripts/run-discord-scrape.sh" scrape --target "$target_name"
}
run_wrapper initial
run_wrapper demo initial
DEST="$ARCHIVE_ROOT/demo/channels/111.json"
DEST="$ARCHIVE_ROOT/demo/$DEFAULT_FILE_NAME"
[[ -f "$DEST" ]] || { echo "expected destination archive missing" >&2; exit 1; }
[[ "$(jq -r '.messages | length' "$DEST")" == "2" ]] || { echo "expected initial message count of 2" >&2; exit 1; }
[[ ! -d "$ARCHIVE_ROOT/demo/channels" ]] || { echo "unexpected channels directory created for default fallback" >&2; exit 1; }
run_wrapper append
run_wrapper demo append
[[ "$(jq -r '.messages | length' "$DEST")" == "3" ]] || { echo "expected appended message count of 3" >&2; exit 1; }
[[ "$(jq -r '.messages[-1].id' "$DEST")" == "3" ]] || { echo "expected last message id 3 after append" >&2; exit 1; }
mapped_dest=$(jq -r '."111"' "$ARCHIVE_ROOT/demo/.dce-meta/channel-map.json")
[[ "$mapped_dest" == "$DEST" ]] || { echo "expected channel map to point to human-readable destination" >&2; exit 1; }
before_checksum=$(sha256sum "$DEST" | awk '{print $1}')
if run_wrapper wrong-channel; then
if run_wrapper demo wrong-channel; then
  echo "wrong-channel fixture should have failed" >&2
  exit 1
fi
after_checksum=$(sha256sum "$DEST" | awk '{print $1}')
[[ "$before_checksum" == "$after_checksum" ]] || { echo "destination archive changed after failed wrong-channel run" >&2; exit 1; }
[[ ! -e "$ARCHIVE_ROOT/demo/channels/111.json" ]] || { echo "unexpected legacy fallback file created" >&2; exit 1; }
mkdir -p "$ARCHIVE_ROOT/seeded"
cp "$FIXTURE_DIR/append-existing.json" "$ARCHIVE_ROOT/seeded/$DEFAULT_FILE_NAME"
run_wrapper seeded append
SEEDED_DEST="$ARCHIVE_ROOT/seeded/$DEFAULT_FILE_NAME"
[[ -f "$SEEDED_DEST" ]] || { echo "expected seeded archive missing" >&2; exit 1; }
[[ "$(jq -r '.messages | length' "$SEEDED_DEST")" == "3" ]] || { echo "expected seeded archive to be updated in place" >&2; exit 1; }
seeded_mapped_dest=$(jq -r '."111"' "$ARCHIVE_ROOT/seeded/.dce-meta/channel-map.json")
[[ "$seeded_mapped_dest" == "$SEEDED_DEST" ]] || { echo "expected seeded channel map to point to existing archive" >&2; exit 1; }
[[ ! -e "$ARCHIVE_ROOT/seeded/channels/111.json" ]] || { echo "unexpected fallback file created for seeded archive" >&2; exit 1; }
mkdir -p "$ARCHIVE_ROOT/duplicate"
cp "$FIXTURE_DIR/append-existing.json" "$ARCHIVE_ROOT/duplicate/$DEFAULT_FILE_NAME"
cp "$FIXTURE_DIR/append-existing.json" "$ARCHIVE_ROOT/duplicate/Fixture Guild - Another Path [111].json"
if run_wrapper duplicate append; then
  echo "duplicate existing matches should have failed" >&2
  exit 1
fi
mkdir -p "$ARCHIVE_ROOT/invalid"
printf 'not-json\n' >"$ARCHIVE_ROOT/invalid/$DEFAULT_FILE_NAME"
if run_wrapper invalid append; then
  echo "invalid existing archive should have failed" >&2
  exit 1
fi
[[ ! -e "$ARCHIVE_ROOT/invalid/channels/111.json" ]] || { echo "unexpected fallback file created for invalid archive" >&2; exit 1; }
mkdir -p "$ARCHIVE_ROOT/mapped-outside-root/.dce-meta"
printf '{\"111\":\"%s\"}\n' "$ARCHIVE_ROOT/mapped-outside-root/../outside.json" >"$ARCHIVE_ROOT/mapped-outside-root/.dce-meta/channel-map.json"
if run_wrapper mapped-outside-root append; then
  echo "mapped path outside target root should have failed" >&2
  exit 1
fi
[[ ! -e "$ARCHIVE_ROOT/outside.json" ]] || { echo "unexpected outside-root file created from mapped path" >&2; exit 1; }
mkdir -p "$ARCHIVE_ROOT/seeded-wrong-channel"
cp "$FIXTURE_DIR/wrong-channel.json" "$ARCHIVE_ROOT/seeded-wrong-channel/$DEFAULT_FILE_NAME"
if run_wrapper seeded-wrong-channel append; then
  echo "seeded archive with wrong embedded channel should have failed" >&2
  exit 1
fi
[[ ! -e "$ARCHIVE_ROOT/seeded-wrong-channel/channels/111.json" ]] || { echo "unexpected fallback file created for wrong-channel seeded archive" >&2; exit 1; }
echo "run-discord-scrape smoke test passed"

@ -1,6 +1,12 @@
{
"guild": {
"id": "222",
"name": "Fixture Guild"
},
"channel": {
"id": "111"
"id": "111",
"name": "fixture-room",
"category": "Testing Grounds"
},
"messages": [
{

@ -1,6 +1,12 @@
{
"guild": {
"id": "222",
"name": "Fixture Guild"
},
"channel": {
"id": "111"
"id": "111",
"name": "fixture-room",
"category": "Testing Grounds"
},
"messages": [
{

@ -1,6 +1,12 @@
{
"guild": {
"id": "222",
"name": "Fixture Guild"
},
"channel": {
"id": "999"
"id": "999",
"name": "wrong-room",
"category": "Testing Grounds"
},
"messages": [
{