DiscordChatExporter/.github/copilot-instructions.md
Boden d66b9dab63 feat(validation): comprehensive recurring scraper validation suite and documentation
IMPLEMENTATION UNITS (U1-U6):

U1: Append-only merge test coverage
- Enhanced run-discord-scrape-smoke.sh with additional test scenarios
- Created append-partial-write.json and append-concurrent-conflict.json fixtures
- Added assertions for message sorting, deduplication, and idempotency
- All 10 merge scenarios validated

U2: Error handling validation
- Created error-path-smoke.sh with 6 error scenario tests
- Added test configs for invalid paths, missing files, bad JSON
- Verified fail-closed behavior on all error paths
- No silent data loss on any failure

U3: Cron idempotency and lifecycle
- Created cron-idempotency-smoke.sh with full lifecycle testing
- Created fixture crontab with unrelated entries (preservation test)
- Verified idempotent install, update, and remove operations
- Confirmed dry-run and entry preservation

U4: Preflight and end-to-end setup
- Created end-to-end-preflight-smoke.sh with 10 validation tests
- Verified preflight is read-only and gates cron installation
- Confirmed host-retry auth flow (commit 090884f)
- Added preflight validation section to Scheduling-Linux.md

U5: Documentation completion
- Updated Readme.md with recurring-scraper link
- Created Recurring-Scrape-Setup.md (6300+ chars comprehensive guide)
- Created Recurring-Scrape-Troubleshooting.md (9200+ chars with 30+ scenarios)
- Enhanced .docs/Scheduling-Linux.md with preflight section
- All documented behavior matches implementation

U6: Production-readiness checklist
- Created docs/recurring-scrape-production-checklist.md
- Compiled all validation results (33+ scenarios across U1-U5)
- Documented test execution commands for re-validation
- Provided deployment notes and monitoring guidance
- Clear sign-off criteria established

ARTIFACTS:
- 4 new smoke test scripts (1000+ lines total)
- 4 new fixtures and test configs
- 3 new documentation files (15500+ chars)
- 2 updated documentation files
- 1 validation checklist tracking document
- All tests passing

SAFETY GUARANTEES VERIFIED:
- No silent data loss on any error path
- Fail-closed behavior throughout
- Archive updates are append-only and idempotent
- Cron installation is idempotent
- Unrelated cron entries preserved
- Preflight is read-only
- Token validated before operations
- Path traversal prevented

STATUS: Production Ready
All 6 implementation units complete and validated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-27 12:57:32 -05:00


Copilot Instructions for DiscordChatExporter

Build, Test, and Lint Commands

Build

# Full build
dotnet build --configuration Release

# Quick build without formatting checks
dotnet build -p:CSharpier_Bypass=true

Test

# Run all tests
dotnet test --configuration Release

# Run a specific test class (xUnit filters match on FullyQualifiedName, not ClassName)
dotnet test --configuration Release --filter "FullyQualifiedName~HtmlContentSpecs"

# Run tests with code coverage
dotnet test -p:CSharpier_Bypass=true --configuration Release --collect:"XPlat Code Coverage"

Format and Lint

# Format code with CSharpier (integrated into CI)
dotnet build -t:CSharpierFormat --configuration Release

# Build while skipping CSharpier entirely (this bypasses the formatting check; it does not verify formatting)
dotnet build -p:CSharpier_Bypass=true --configuration Release

Note: CSharpier formatting is enforced in CI. Use dotnet build -t:CSharpierFormat before committing to avoid CI failures.

High-Level Architecture

DiscordChatExporter is a .NET 10.0 application with a layered architecture:

Layer 1: Core (DiscordChatExporter.Core)

  • Discord - Discord API client and data models
    • DiscordClient - HTTP client for Discord API v10
    • Data models in Discord/Data/ (records like Channel, Message, Guild) with Parse() methods for JSON deserialization
    • Rate-limit handling with configurable preference
  • Exporting - Multi-format export engines
    • ChannelExporter - Orchestrates the export process
    • Format writers: HtmlMessageWriter, JsonMessageWriter, CsvMessageWriter, PlainTextMessageWriter
    • Asset downloading and context building
  • Markdown - Converts Discord markdown to target format (HTML or plaintext)
  • Utils - Shared utilities for HTTP, validation, etc.

Layer 2: Interfaces

  • Cli (DiscordChatExporter.Cli) - Command-line interface using CliFx
    • Commands in Commands/ subdirectory (follows command pattern)
  • Gui (DiscordChatExporter.Gui) - Graphical interface using Avalonia
    • ViewModels with MVVM pattern
    • Services for state management
    • Localization support

Layer 3: Tests

  • Cli.Tests (DiscordChatExporter.Cli.Tests) - Integration tests using xUnit
    • Specs/ - Scenario tests for export formats and features
    • Infra/ - Test infrastructure and helpers
    • Tests verify HTML/JSON/CSV/TXT exports against Discord test data

Data Flow

Discord API → DiscordClient (rate-limited)
           → ExportContext (loads channel/role/user data)
           → MessageExporter (fetches and writes messages)
           → Format-specific Writer (HTML/JSON/CSV/TXT)
           → File output

Key Conventions

C# Language Features

  • File-scoped namespaces - Use namespace X; (not braces)
  • Primary constructors - public class MyClass(string param) for injecting dependencies
  • Nullable reference types - Enabled globally; use ? for nullable types, and the null-forgiving operator ! only when the value is known to be non-null
  • Treat warnings as errors - All warnings must be resolved before commit

Data Model Patterns

  • Use record types for data classes (immutable by default)
  • Implement IHasId interface for entities with ID fields
  • Deserialization via public static T Parse(JsonElement json) method
  • Partial records with separate Parse methods in distinct file sections
  • Use Pipe() extension for method chaining transformations

// Example pattern:
public partial record Message(Snowflake Id, string Content) : IHasId { }

public partial record Message
{
    public static Message Parse(JsonElement json)
    {
        var id = json.GetProperty("id").GetNonWhiteSpaceString().Pipe(Snowflake.Parse);
        var content = json.GetProperty("content").GetNonWhiteSpaceString();
        return new(id, content);
    }
}
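
The Pipe() convention used in the example above can be illustrated with a minimal sketch. The extension class below is a hypothetical stand-in written for this illustration, not the project's actual implementation, and `PipeDemo` is an invented name.

```csharp
using System;

// Hypothetical stand-in for the project's Pipe() helper; the real one may differ.
public static class PipeExtensions
{
    // Passes the value into the given function so that calls read left to right.
    public static TOut Pipe<TIn, TOut>(this TIn input, Func<TIn, TOut> transform) =>
        transform(input);
}

public static class PipeDemo
{
    // Equivalent to ulong.Parse(raw.Trim()), written as a left-to-right chain.
    public static ulong ParseId(string raw) => raw.Trim().Pipe(ulong.Parse);
}
```

Under these assumptions, `PipeDemo.ParseId(" 42 ")` trims the input and then parses it, which mirrors how `Parse()` methods chain a JSON read into a typed conversion.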

Exception Handling

  • Custom exceptions inherit from DiscordChatExporterException
  • Specific exception types for domain errors: ChannelEmptyException, InvalidStateException, etc.
  • Exceptions include helpful context about the guild/channel where applicable
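
As a rough sketch of the exception conventions above: the constructor signatures and the message wording below are assumptions made for illustration, and the real classes in the repository may carry different members.

```csharp
using System;

// Assumed shape of the base exception; the real class may expose more members.
public class DiscordChatExporterException(string message) : Exception(message);

// Hypothetical domain exception that names the affected guild and channel.
public class ChannelEmptyException(string channelName, string guildName)
    : DiscordChatExporterException(
        $"Channel '{channelName}' in guild '{guildName}' does not contain any messages."
    )
{
    public string ChannelName { get; } = channelName;
    public string GuildName { get; } = guildName;
}
```

Deriving from one base type lets callers catch `DiscordChatExporterException` to handle every domain error in one place, while the message still says which guild and channel were involved.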

Discord API Integration

  • All API URLs are relative to base URI https://discord.com/api/v10/
  • Token authorization uses the Authorization header (Bot {token} for bot accounts, or the raw token for user accounts)
  • Rate limiting respects Discord advisory headers but can be configured to respect only hard limits
  • Use Http.ResponseResiliencePipeline for retry logic (configured via Polly)
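
The URL and token conventions above can be sketched with plain `HttpRequestMessage` calls. The class and method names here are hypothetical, and the sketch leaves out the retry pipeline and rate-limit handling entirely.

```csharp
using System;
using System.Net.Http;

public static class DiscordRequestSketch
{
    // Base URI from the convention above; the trailing slash is needed so that
    // relative paths are appended instead of replacing the last segment.
    private static readonly Uri BaseUri = new("https://discord.com/api/v10/");

    // Builds a GET request for a relative API path.
    // isBot selects between the "Bot {token}" form and the raw token form.
    public static HttpRequestMessage CreateRequest(string relativeUrl, string token, bool isBot)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, new Uri(BaseUri, relativeUrl));

        // Adds the header value as given, without parsing it as a scheme/parameter pair.
        request.Headers.TryAddWithoutValidation(
            "Authorization",
            isBot ? $"Bot {token}" : token
        );

        return request;
    }
}
```

With these assumptions, `CreateRequest("channels/123", token, isBot: true)` targets `https://discord.com/api/v10/channels/123` and sends `Bot {token}` in the Authorization header.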

Export Format Implementation

  • Each format has a dedicated *MessageWriter class
  • Writers derive from the MessageWriter base class
  • Template files (.cshtml) use RazorBlade for HTML/plaintext rendering
  • Markdown conversion uses separate visitors: HtmlMarkdownVisitor, PlainTextMarkdownVisitor
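
To illustrate the one-writer-per-format idea, here is a simplified sketch. `MessageWriterSketch` and `CsvMessageWriterSketch` are hypothetical types that take a `TextWriter` and two strings; the real writers work with streams, an export context, and full message models.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical, simplified base type; the real MessageWriter has a richer surface.
public abstract class MessageWriterSketch(TextWriter output)
{
    protected TextWriter Output { get; } = output;

    // Formats with no header can keep this default no-op.
    public virtual ValueTask WritePreambleAsync() => default;

    // Every format has to decide how a single message is rendered.
    public abstract ValueTask WriteMessageAsync(string author, string content);
}

public sealed class CsvMessageWriterSketch(TextWriter output) : MessageWriterSketch(output)
{
    // Wraps a value in quotes and doubles any embedded quotes.
    private static string Escape(string value) => "\"" + value.Replace("\"", "\"\"") + "\"";

    public override async ValueTask WritePreambleAsync() =>
        await Output.WriteLineAsync("Author,Content");

    public override async ValueTask WriteMessageAsync(string author, string content) =>
        await Output.WriteLineAsync($"{Escape(author)},{Escape(content)}");
}
```

In this shape, supporting another format means adding one more subclass, while the code that drives the export only depends on the base type.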

Testing

  • Tests in DiscordChatExporter.Cli.Tests/Specs/ follow naming pattern: [Format][Feature]Specs.cs
  • Use xUnit [Fact] for individual tests
  • Test infrastructure in Infra/ includes ExportWrapper for export orchestration
  • Tests require Discord API access; sensitive tests need DISCORD_TOKEN secret
  • Use FluentAssertions for readable assertions: .Should().Equal(...), .Should().Contain(...)

Dependencies and Injection

  • Microsoft.Extensions.DependencyInjection for IoC
  • Services typically injected via primary constructor
  • Configuration loaded via Microsoft.Extensions.Configuration (supports env vars and user secrets)

Code Organization

  • Folder structure mirrors namespace structure
  • Data models organized under domain folder (e.g., Discord/Data/)
  • Keep public methods at the top of the class
  • Use async ValueTask for small, frequently called async operations that often complete synchronously; use async Task for larger, longer-running ones
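
One way to read the ValueTask/Task guideline is sketched below, assuming a lookup that is usually served from memory. `NameCacheSketch` and its members are invented for this example, and `Task.Delay` stands in for a real network call.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public class NameCacheSketch
{
    private readonly Dictionary<ulong, string> _cache = new();

    // ValueTask: cache hits complete synchronously, so no Task object
    // has to be allocated for them.
    public ValueTask<string> GetNameAsync(ulong id) =>
        _cache.TryGetValue(id, out var name)
            ? new ValueTask<string>(name)
            : new ValueTask<string>(FetchAndCacheAsync(id));

    // Task: this path always awaits, so a regular Task is the simpler choice.
    private async Task<string> FetchAndCacheAsync(ulong id)
    {
        await Task.Delay(10); // placeholder for a real network call
        var name = $"user-{id}";
        _cache[id] = name;
        return name;
    }
}
```

The first call for a given id goes through the Task-based path; later calls for the same id return an already completed `ValueTask<string>`.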

Architecture Details

Why This Structure?

  • Separation of concerns: Core library independent from UI implementations
  • Multi-UI support: CLI and GUI share identical core export logic
  • Testability: Core is fully testable without UI dependencies
  • Extensibility: New export formats are isolated to a single writer class

Important Flow Details

  • Message export is stream-based to handle large channels efficiently
  • Discord API client implements exponential backoff for rate limits
  • Exports can be partitioned by size or date range to manage large channel history
  • Assets (images, videos, etc.) can be selectively downloaded during export
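
The stream-based export and partitioning described above can be sketched with `IAsyncEnumerable`. Everything below is a hypothetical simplification: messages are plain strings, partitions roll over by message count only, and `openPartition` is an invented callback standing in for opening the next output file.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class StreamingExportSketch
{
    // Hypothetical source: yields messages one at a time, the way a paginated
    // API client could, instead of returning a fully loaded list.
    public static async IAsyncEnumerable<string> GetMessagesAsync(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            await Task.Yield(); // placeholder for fetching the next page
            yield return $"message {i}";
        }
    }

    // Writes each message as it arrives and returns the number of partitions opened.
    public static async Task<int> ExportAsync(
        IAsyncEnumerable<string> messages,
        int partitionSize,
        Func<int, TextWriter> openPartition)
    {
        var partitionCount = 0;
        var writtenInPartition = 0;
        TextWriter? writer = null;

        await foreach (var message in messages)
        {
            // Roll over to a new partition once the current one is full.
            if (writer is null || writtenInPartition >= partitionSize)
            {
                if (writer is not null)
                    await writer.FlushAsync();

                writer = openPartition(partitionCount++);
                writtenInPartition = 0;
            }

            await writer.WriteLineAsync(message);
            writtenInPartition++;
        }

        if (writer is not null)
            await writer.FlushAsync();

        return partitionCount;
    }
}
```

Only the current message and the current writer are held at any time, so memory use does not grow with the channel's length; exporting five messages with a partition size of two opens three partitions.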