feat(scrape): disk preflight on host runner for cron jobs

run-discord-scrape-host.sh now runs verify-operator-ready.sh --disk-only
before preflight/scrape, so the monthly jobs installed by setup-cron fail
fast when archive roots are low on space.
Harden the bootstrap smoke so a failing dry run reports its exit status
and output.
Boden 2026-05-29 16:00:11 -05:00
parent 32b7f47d45
commit 44eadee634
4 changed files with 81 additions and 4 deletions

@@ -0,0 +1,26 @@
---
title: "feat: Disk preflight on host runner (cron path)"
type: feat
status: complete
date: 2026-05-29
origin: /lfg — monthly cron calls run-discord-scrape-host.sh directly, bypassing run-documents-scrape disk check
---
# feat: Disk preflight on host runner (cron path)
## Summary
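`setup-cron.sh` invokes `run-discord-scrape-host.sh scrape` directly rather than going through `run-documents-scrape.sh`, so the cron path never ran the disk check. Run the same `verify-operator-ready.sh --disk-only` check in the host wrapper before `preflight` and `scrape` so scheduled jobs fail fast when the archive roots (for example under `/home`) are low on space.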
## Requirements
| ID | Requirement |
|----|-------------|
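| R1 | `run-discord-scrape-host.sh` resolves the host config path from `--config` (container-style `/config/...` paths map to the repo's `config/scrape-targets.json`; relative paths resolve against the repo root) |
| R2 | Calls `verify-operator-ready.sh --disk-only` when the script exists and is executable (skipped in minimal fake-repo smokes that lack it) |
| R3 | `DCE_SKIP_DISK_CHECK=1` bypasses the check for tests that need to skip it |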
| R4 | `run-all-smokes.sh` still passes with `DCE_MIN_FREE_MB=0` |
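
The path mapping in R1 can be sketched roughly as follows. This is an illustration only: `map_config_to_host` is a hypothetical name, the `/srv/repo` default for `REPO_ROOT` is an assumption, and unlike the committed `resolve_host_config_path` (which maps every `/config/...` path to `config/scrape-targets.json`) this variant keeps the file name from the container path.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: translate a --config value into a host-side path.
# REPO_ROOT is assumed to point at the repository checkout.
REPO_ROOT="${REPO_ROOT:-/srv/repo}"

map_config_to_host() {
  local cfg=$1
  case "$cfg" in
    # Container-style path: swap the /config/ prefix for the repo config dir.
    /config/*) printf '%s\n' "$REPO_ROOT/config/${cfg#/config/}" ;;
    # Explicitly relative path: resolve against the repo root.
    ./*)       printf '%s\n' "$REPO_ROOT/${cfg#./}" ;;
    # Any other absolute path is already a host path.
    /*)        printf '%s\n' "$cfg" ;;
    # Bare relative path: resolve against the repo root.
    *)         printf '%s\n' "$REPO_ROOT/$cfg" ;;
  esac
}
```

The `/config/*` arm has to come before the generic `/*` arm, otherwise container paths would be passed through unchanged.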
## Verification
- `DCE_MIN_FREE_MB=0 ./scripts/run-all-smokes.sh`
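
For readers without access to `verify-operator-ready.sh`, a free-space gate of this kind might look like the sketch below. It is not the script's actual implementation: the function names `free_mb` and `has_min_free` and the 1024 MiB fallback threshold are assumptions. Only `DCE_MIN_FREE_MB` and the "0 means skip" convention come from this change.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a free-space gate; not the real verify-operator-ready.sh.

# Print the available space in MiB for the filesystem that holds $1.
# df -P forces the one-line-per-filesystem POSIX layout; -k reports KiB.
free_mb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed when at least min_mb MiB are free under $1.
# The threshold comes from $2, then DCE_MIN_FREE_MB, then an assumed 1024.
has_min_free() {
  local path=$1
  local min_mb=${2:-${DCE_MIN_FREE_MB:-1024}}
  local avail
  # A threshold of 0 disables the check, matching the smoke-test convention.
  if [ "$min_mb" -eq 0 ]; then
    return 0
  fi
  avail=$(free_mb "$path" 2>/dev/null) || return 1
  # An unreadable path yields no output; treat that as a failed check.
  [ -n "$avail" ] || return 1
  [ "$avail" -ge "$min_mb" ]
}
```

A caller could gate a job with `has_min_free "$HOME" || exit 1`, which is the fail-fast behaviour this change adds to the cron path.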

@@ -56,7 +56,8 @@ df -h ~/Documents /home/brunner56/Downloads/DiscordChatExporter
```
Override threshold: `DCE_MIN_FREE_MB=2048 ./scripts/verify-operator-ready.sh`
-Skip check (smokes only): `DCE_MIN_FREE_MB=0`
+Skip check (smokes only): `DCE_MIN_FREE_MB=0`
+Also enforced by `run-documents-scrape.sh`, `run-discord-scrape-host.sh` (cron), and `run-operator-validation.sh`.
## CI note (fork PRs)

@@ -12,6 +12,7 @@ DOCKER_BIN_OVERRIDDEN=0
REAUTH_COMMAND=""
COMPOSE_ENV_FILE=""
COMPOSE_ENV_TEMP=""
VERIFY_READY="$REPO_ROOT/scripts/verify-operator-ready.sh"
if [[ -n "${DCE_DOCKER_BIN:-}" ]]; then
  DOCKER_BIN_OVERRIDDEN=1
@@ -259,6 +260,48 @@ resolve_reauth_command() {
  printf '%s\n' "$resolved_path"
}

# Resolve the host-side config path from the passthrough arguments.
# Falls back to the repo's scrape-targets.json when --config is absent.
resolve_host_config_path() {
  local -a args=("$@")
  local i=0 cfg="$REPO_ROOT/config/scrape-targets.json"
  while (( i < ${#args[@]} )); do
    if [[ "${args[i]}" == "--config" ]]; then
      cfg="${args[i + 1]:-}"
      case "$cfg" in
        /config/*)
          # Container-style path: always maps to the default repo config.
          cfg="$REPO_ROOT/config/scrape-targets.json"
          ;;
        ./*)
          cfg="$REPO_ROOT/${cfg#./}"
          ;;
        /*) ;;  # Absolute host path: use as given.
        *)
          cfg="$REPO_ROOT/$cfg"
          ;;
      esac
      break
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$cfg"
}

# Run the disk-only readiness check unless it is skipped via
# DCE_SKIP_DISK_CHECK=1 or the verifier script is not executable.
run_disk_preflight_if_enabled() {
  local -a args=("$@")
  local cfg
  if [[ "${DCE_SKIP_DISK_CHECK:-0}" == 1 ]]; then
    return 0
  fi
  if [[ ! -x "$VERIFY_READY" ]]; then
    return 0
  fi
  cfg=$(resolve_host_config_path "${args[@]}")
  "$VERIFY_READY" --disk-only --config "$cfg"
}

is_discord_auth_failure() {
  local output_file=$1
  grep -Eqi \
@@ -381,6 +424,7 @@ main() {
  [[ -f "$COMPOSE_FILE" ]] || die "Missing compose file: $COMPOSE_FILE"
  prepare_compose_env
  REAUTH_COMMAND="${DCE_REAUTH_COMMAND:-}"
  run_disk_preflight_if_enabled "${passthrough_args[@]}"
  case "$subcommand" in
    preflight|scrape)

@@ -41,9 +41,15 @@ cat >"$TMP_DIR/config.json" <<JSON
}
JSON
-"$BOOTSTRAP" --dry-run --config "$TMP_DIR/config.json" | grep -q 'Dry run complete' || {
-  printf 'bootstrap --dry-run did not complete\n' >&2
+set +e
+bootstrap_output=$("$BOOTSTRAP" --dry-run --config "$TMP_DIR/config.json" 2>&1)
+bootstrap_status=$?
+set -e
+if [[ "$bootstrap_status" -ne 0 ]] || ! grep -q 'Dry run complete' <<<"$bootstrap_output"; then
+  printf 'bootstrap --dry-run failed (status=%s)\n' "$bootstrap_status" >&2
+  printf '%s\n' "$bootstrap_output" >&2
   exit 1
-}
+fi
printf 'bootstrap-recurring-scrape-smoke: ok\n'
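
The reason for the smoke change can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `fake_bootstrap` stands in for a bootstrap command that prints the success marker but still exits non-zero, and `pipeline_check` and `capture_check` are illustrative names. The capture variant uses `|| status=$?` in place of the `set +e` / `set -e` bracket from the hunk above, and should behave equivalently for this purpose.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in: prints the success marker, then fails anyway.
fake_bootstrap() {
  printf 'Dry run complete\n'
  return 3
}

# Pipeline form: with pipefail off, only grep's status decides the result,
# so the non-zero exit of fake_bootstrap goes unnoticed.
pipeline_check() (
  set +o pipefail
  fake_bootstrap | grep -q 'Dry run complete'
)

# Capture form: keep both the exit status and the combined output,
# then require a zero status as well as the marker text.
capture_check() {
  local output status=0
  output=$(fake_bootstrap 2>&1) || status=$?
  [[ "$status" -eq 0 ]] && grep -q 'Dry run complete' <<<"$output"
}
```

Here `pipeline_check` succeeds even though the command failed, while `capture_check` fails. Keeping the captured output around also lets the smoke print it when the check fails.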