feat(scrape): per-target container_memory in scrape config

Single --target runs apply optional container_memory from
scrape-targets.json when global DCE_CONTAINER_MEMORY is unset.
KotOR_discord_msgs defaults to 8g; scrape.env still overrides.
This commit is contained in:
Copilot 2026-06-03 09:55:33 -05:00
parent d280ba86e9
commit 8ca55f299b
8 changed files with 178 additions and 12 deletions

View file

@ -81,6 +81,7 @@
"name": "KotOR_discord_msgs",
"kind": "guild",
"output_dir": "/home/brunner56/Documents/KotOR_discord_msgs",
"container_memory": "8g",
"guild_ids": [],
"channel_ids": [],
"guild_name_patterns": ["KotOR_discord_msgs", "KotOR"]

View file

@ -0,0 +1,54 @@
---
title: "feat: Per-target container_memory in scrape-targets.json"
type: feat
status: complete
date: 2026-06-04
origin: /lfg — plan 063 deferred per-target memory; KotOR yes_general catch-up should auto-apply 8g when scraping KotOR_discord_msgs alone without global scrape.env change
---
# feat: Per-target container_memory in scrape-targets.json
## Summary
Optional `container_memory` on each target in `scrape-targets.json`. When `run-discord-scrape-host.sh` runs with exactly one `--target`, apply that target's memory cap to compose unless `DCE_CONTAINER_MEMORY` is already set non-zero in the environment or `scrape.env`.
## Problem
Operators must set global `DCE_CONTAINER_MEMORY=8g` in `scrape.env` for KotOR `yes_general` catch-up, affecting every scrape including lightweight targets. A per-target knob keeps default runs unlimited while auto-raising memory for KotOR-only runs.
## Requirements
| ID | Requirement |
|----|-------------|
| R1 | Targets may include optional `container_memory` (e.g. `8g`) |
| R2 | Host runner applies it when exactly one `--target` is selected and global `DCE_CONTAINER_MEMORY` is unset or `0` |
| R3 | Explicit `DCE_CONTAINER_MEMORY` in shell or `scrape.env` wins over config |
| R4 | Run plan banner shows per-target memory when applied |
| R5 | `KotOR_discord_msgs` configured with `container_memory: "8g"` |
| R6 | Host smoke asserts config-driven memory when env omits global cap |
| R7 | `DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh` → 21/21 |
## Implementation Units
### U1. Config field + host apply
**Files:** `scripts/lib/scrape-run-plan.sh`, `scripts/run-discord-scrape-host.sh`, `config/scrape-targets.json`
### U2. Smoke + docs
**Files:** `scripts/tests/run-discord-scrape-host-smoke.sh`, `docs/recurring-scrape-merge-readiness.md`, `docs/recurring-scrape-operator-checklist.md`
## Verification
```bash
./scripts/tests/run-discord-scrape-host-smoke.sh
DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh
```
## Scope Boundaries
### Deferred
- Live KotOR catch-up execution inside LFG
- Numeric max across multiple `--target` flags
- Structured JSON run logs

View file

@ -160,6 +160,8 @@ DCE_MIN_FREE_MB=0 ./scripts/run-operator-validation.sh \
**Plan 066 (2026-06-04):** `prove-incremental-append --channel` filters snapshots and grow-only comparison to selected channels.
**Plan 067 (2026-06-04):** Optional per-target `container_memory` in `scrape-targets.json` (single `--target` runs); `KotOR_discord_msgs` defaults to `8g`.
**Disk:** ~65 GiB free on `/home` (2026-05-30); large channel merges still need headroom.
## CI note (fork PRs)

View file

@ -58,12 +58,15 @@ Salvage then incremental scrape:
./scripts/run-operator-proof.sh --salvage-before-scrape --sync-gui --target NAME
```
**KotOR yes_general** (`221726893064454144`): first catch-up after a 2021 archive cursor can take hours and may OOM; salvage preserved partials before retrying. Stop duplicate validation processes (MyBook vs Downloads checkouts share the same lock). For large catch-up, set `DCE_CONTAINER_MEMORY=8g` in `scrape.env` (or export before the run), then:
**KotOR yes_general** (`221726893064454144`): first catch-up after a 2021 archive cursor can take hours and may OOM; salvage preserved partials before retrying. Stop duplicate validation processes (MyBook vs Downloads checkouts share the same lock). `KotOR_discord_msgs` sets `container_memory: "8g"` in `scrape-targets.json` for single-target runs; override globally with `DCE_CONTAINER_MEMORY` in `scrape.env` if needed. Channel-scoped proof:
```bash
./scripts/run-operator-validation.sh --salvage-before-scrape \
--target KotOR_discord_msgs --channel 221726893064454144 \
--log-file logs/kotor-yes-general.log
./scripts/prove-incremental-append.sh \
--target KotOR_discord_msgs --channel 221726893064454144
```
## GUI zip only

View file

@ -21,4 +21,7 @@ DCE_USERNS_MODE=
# Optional: raise scrape container memory for multi-year channel catch-up (yes_general, etc.).
# Examples: 8g, 8192m. Default 0 = no compose memory cap.
# Optional: raise container memory for large multi-year channel catch-up (compose mem_limit).
# Per-target: set container_memory on a target in config/scrape-targets.json (single --target runs).
# Global override (wins over config): uncomment below.
# DCE_CONTAINER_MEMORY=8g

View file

@ -18,7 +18,12 @@ print_scrape_config_plan() {
printf 'Targets (%s selected):\n' "${#requested_targets[@]}"
for name in "${requested_targets[@]}"; do
output_dir=$(jq -r --arg name "$name" '.targets[] | select(.name == $name) | .output_dir' "$config_path")
mem=$(target_container_memory "$config_path" "$name" 2>/dev/null || true)
if [[ -n "$mem" && "$mem" != "null" ]]; then
printf ' - %s → %s (container_memory: %s)\n' "$name" "$output_dir" "$mem"
else
printf ' - %s → %s\n' "$name" "$output_dir"
fi
done
return 0
fi
@ -35,3 +40,34 @@ enabled_target_names() {
local config_path=$1
jq -r '.targets[] | select(.enabled != false) | .name' "$config_path"
}
target_container_memory() {
local config_path=$1
local target_name=$2
if [[ ! -f "$config_path" ]]; then
printf '\n'
return 0
fi
jq -r --arg name "$target_name" '
.targets[]
| select(.name == $name)
| .container_memory // empty
' "$config_path"
}
apply_single_target_container_memory() {
local config_path=$1
local target_name=$2
local mem=""
[[ -n "$target_name" ]] || return 0
if [[ -n "${DCE_CONTAINER_MEMORY:-}" && "${DCE_CONTAINER_MEMORY:-0}" != "0" ]]; then
return 0
fi
mem=$(target_container_memory "$config_path" "$target_name")
[[ -n "$mem" && "$mem" != "null" ]] || return 0
export DCE_CONTAINER_MEMORY="$mem"
}

View file

@ -43,6 +43,8 @@ Environment:
DCE_COMPOSE_TTY When zero, compose run passes -T (no pseudo-TTY). Default omits -T
so compose backends allocate a TTY for line-buffered progress logs.
DCE_CONTAINER_MEMORY Optional container memory cap (e.g. 8g, 8192m). Default 0 = unlimited.
Targets may set container_memory in scrape-targets.json (used when
exactly one --target is selected and this env var is unset or 0).
Notes:
When $ENV_FILE is missing, exported DISCORD_TOKEN or DISCORD_TOKEN_FILE is used instead.
@ -507,6 +509,23 @@ run_subcommand_with_retry() {
die "Container run failed for '$subcommand' after one auth refresh retry."
}
collect_passthrough_targets() {
local -n _targets_out=$1
shift
local -a args=("$@")
local idx=0
_targets_out=()
while (( idx < ${#args[@]} )); do
if [[ "${args[idx]}" == "--target" ]]; then
_targets_out+=("${args[idx + 1]:-}")
idx=$((idx + 2))
continue
fi
idx=$((idx + 1))
done
}
main() {
local -a passthrough_args=()
local subcommand=""
@ -562,22 +581,20 @@ main() {
fi
[[ -f "$COMPOSE_FILE" ]] || die "Missing compose file: $COMPOSE_FILE"
local host_config host_targets=()
host_config=$(resolve_host_config_path "${passthrough_args[@]}")
collect_passthrough_targets host_targets "${passthrough_args[@]}"
if ((${#host_targets[@]} == 1)); then
apply_single_target_container_memory "$host_config" "${host_targets[0]}"
fi
if [[ "$subcommand" != "salvage" ]]; then
prepare_compose_env
fi
REAUTH_COMMAND="${DCE_REAUTH_COMMAND:-}"
run_disk_preflight_if_enabled "${passthrough_args[@]}"
local host_config host_targets=() arg_idx=0
host_config=$(resolve_host_config_path "${passthrough_args[@]}")
while (( arg_idx < ${#passthrough_args[@]} )); do
if [[ "${passthrough_args[arg_idx]}" == "--target" ]]; then
host_targets+=("${passthrough_args[arg_idx + 1]:-}")
arg_idx=$((arg_idx + 2))
continue
fi
arg_idx=$((arg_idx + 1))
done
print_scrape_config_plan "$host_config" "Host $subcommand" "${host_targets[@]}"
case "$subcommand" in

View file

@ -254,4 +254,54 @@ grep -q 'env:DCE_CONTAINER_MEMORY=8g' "$COMPOSE_MEM_LOG" || {
exit 1
}
TARGET_MEM_CONFIG="$TMP_DIR/target-mem-config.json"
mkdir -p "$TMP_DIR/archive/demo"
cat >"$TARGET_MEM_CONFIG" <<EOF
{
"archive_root": "$TMP_DIR/archive",
"targets": [
{
"name": "demo",
"output_dir": "$TMP_DIR/archive/demo",
"container_memory": "4g"
}
]
}
EOF
ENV_NO_MEM="$TMP_DIR/no-mem.env"
printf 'DISCORD_TOKEN=dummy\n' >"$ENV_NO_MEM"
COMPOSE_TARGET_MEM_LOG="$TMP_DIR/compose-target-mem.log"
env -u DCE_CONTAINER_MEMORY \
DCE_SKIP_SCRAPE_LOCK=1 \
DCE_COMPOSE_BIN="$FAKE_COMPOSE" \
DCE_REPO_ROOT="$REPO_ROOT" \
DCE_ENV_FILE="$ENV_NO_MEM" \
DCE_COMPOSE_FILE="$COMPOSE_FILE" \
FAKE_COMPOSE_ARGS_LOG="$COMPOSE_TARGET_MEM_LOG" \
"$REPO_ROOT/scripts/run-discord-scrape-host.sh" scrape \
--config "$TARGET_MEM_CONFIG" --target demo >/dev/null
grep -q 'env:DCE_CONTAINER_MEMORY=4g' "$COMPOSE_TARGET_MEM_LOG" || {
echo "expected target container_memory=4g in compose env when global unset" >&2
cat "$COMPOSE_TARGET_MEM_LOG" >&2
exit 1
}
ENV_OVERRIDE="$TMP_DIR/override-mem.env"
printf 'DISCORD_TOKEN=dummy\nDCE_CONTAINER_MEMORY=2g\n' >"$ENV_OVERRIDE"
COMPOSE_OVERRIDE_LOG="$TMP_DIR/compose-override-mem.log"
env -u DCE_CONTAINER_MEMORY \
DCE_SKIP_SCRAPE_LOCK=1 \
DCE_COMPOSE_BIN="$FAKE_COMPOSE" \
DCE_REPO_ROOT="$REPO_ROOT" \
DCE_ENV_FILE="$ENV_OVERRIDE" \
DCE_COMPOSE_FILE="$COMPOSE_FILE" \
FAKE_COMPOSE_ARGS_LOG="$COMPOSE_OVERRIDE_LOG" \
"$REPO_ROOT/scripts/run-discord-scrape-host.sh" scrape \
--config "$TARGET_MEM_CONFIG" --target demo >/dev/null
grep -q 'env:DCE_CONTAINER_MEMORY=2g' "$COMPOSE_OVERRIDE_LOG" || {
echo "expected scrape.env DCE_CONTAINER_MEMORY to override target config" >&2
cat "$COMPOSE_OVERRIDE_LOG" >&2
exit 1
}
echo "run-discord-scrape-host smoke test passed"