DiscordChatExporter/.github/copilot-instructions.md
Boden d66b9dab63 feat(validation): comprehensive recurring scraper validation suite and documentation
IMPLEMENTATION UNITS (U1-U6):

U1: Append-only merge test coverage
- Enhanced run-discord-scrape-smoke.sh with additional test scenarios
- Created append-partial-write.json and append-concurrent-conflict.json fixtures
- Added assertions for message sorting, deduplication, and idempotency
- All 10 merge scenarios validated

U2: Error handling validation
- Created error-path-smoke.sh with 6 error scenario tests
- Added test configs for invalid paths, missing files, bad JSON
- Verified fail-closed behavior on all error paths
- No silent data loss on any failure

U3: Cron idempotency and lifecycle
- Created cron-idempotency-smoke.sh with full lifecycle testing
- Created fixture crontab with unrelated entries (preservation test)
- Verified idempotent install, update, and remove operations
- Confirmed dry-run and entry preservation

U4: Preflight and end-to-end setup
- Created end-to-end-preflight-smoke.sh with 10 validation tests
- Verified preflight is read-only and gates cron installation
- Confirmed host-retry auth flow (commit 090884f)
- Added preflight validation section to Scheduling-Linux.md

U5: Documentation completion
- Updated Readme.md with recurring-scraper link
- Created Recurring-Scrape-Setup.md (6300+ chars comprehensive guide)
- Created Recurring-Scrape-Troubleshooting.md (9200+ chars with 30+ scenarios)
- Enhanced .docs/Scheduling-Linux.md with preflight section
- All documented behavior matches implementation

U6: Production-readiness checklist
- Created docs/recurring-scrape-production-checklist.md
- Compiled all validation results (33+ scenarios across U1-U5)
- Documented test execution commands for re-validation
- Provided deployment notes and monitoring guidance
- Clear sign-off criteria established

ARTIFACTS:
- 4 new smoke test scripts (1000+ lines total)
- 4 new fixtures and test configs
- 3 new documentation files (15500+ chars)
- 2 updated documentation files
- 1 validation checklist tracking document
- All tests passing

SAFETY GUARANTEES VERIFIED:
- No silent data loss on any error path
- Fail-closed behavior throughout
- Archive updates are append-only and idempotent
- Cron installation is idempotent
- Unrelated cron entries preserved
- Preflight is read-only
- Token validated before operations
- Path traversal prevented

STATUS: Production Ready
All 6 implementation units complete and validated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-27 12:57:32 -05:00


# Copilot Instructions for DiscordChatExporter
## Build, Test, and Lint Commands
### Build
```bash
# Full build
dotnet build --configuration Release
# Quick build without formatting checks
dotnet build -p:CSharpier_Bypass=true
```
### Test
```bash
# Run all tests
dotnet test --configuration Release
# Run a specific test class (xUnit filters on FullyQualifiedName, not ClassName)
dotnet test --configuration Release --filter "FullyQualifiedName~HtmlContentSpecs"
# Run tests with code coverage
dotnet test -p:CSharpier_Bypass=true --configuration Release --collect:"XPlat Code Coverage"
```
### Format and Lint
```bash
# Format code with CSharpier (integrated into CI)
dotnet build -t:CSharpierFormat --configuration Release
# Build with CSharpier bypassed (this skips the formatting step; it does not verify formatting)
dotnet build -p:CSharpier_Bypass=true --configuration Release
```
> **Note:** CSharpier formatting is enforced in CI. Use `dotnet build -t:CSharpierFormat` before committing to avoid CI failures.
## High-Level Architecture
DiscordChatExporter is a .NET 10.0 application with a layered architecture:
### Layer 1: Core (`DiscordChatExporter.Core`)
- **Discord** - Discord API client and data models
  - `DiscordClient` - HTTP client for Discord API v10
  - Data models in `Discord/Data/` (records like `Channel`, `Message`, `Guild`) with `Parse()` methods for JSON deserialization
  - Rate-limit handling with configurable preference
- **Exporting** - Multi-format export engines
  - `ChannelExporter` - Orchestrates the export process
  - Format writers: `HtmlMessageWriter`, `JsonMessageWriter`, `CsvMessageWriter`, `PlainTextMessageWriter`
  - Asset downloading and context building
- **Markdown** - Converts Discord markdown to target format (HTML or plaintext)
- **Utils** - Shared utilities for HTTP, validation, etc.
### Layer 2: Interfaces
- **Cli** (`DiscordChatExporter.Cli`) - Command-line interface using CliFx
  - Commands in `Commands/` subdirectory (follows command pattern)
- **Gui** (`DiscordChatExporter.Gui`) - Graphical interface using Avalonia
  - ViewModels with MVVM pattern
  - Services for state management
  - Localization support
### Layer 3: Tests
- **Cli.Tests** (`DiscordChatExporter.Cli.Tests`) - Integration tests using xUnit
  - `Specs/` - Scenario tests for export formats and features
  - `Infra/` - Test infrastructure and helpers
  - Tests verify HTML/JSON/CSV/TXT exports against Discord test data
### Data Flow
```
Discord API → DiscordClient (rate-limited)
  → ExportContext (loads channel/role/user data)
  → MessageExporter (fetches and writes messages)
  → Format-specific Writer (HTML/JSON/CSV/TXT)
  → File output
```
## Key Conventions
### C# Language Features
- **File-scoped namespaces** - Use `namespace X;` (not braces)
- **Primary constructors** - `public class MyClass(string param)` for injecting dependencies
- **Nullable reference types** - Enabled globally; use `?` for nullable types, and the null-forgiving `!` only when the value is known to be non-null
- **Treat warnings as errors** - All warnings must be resolved before commit
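
The conventions above can be seen together in one small file. This is a minimal sketch; the namespace `Sketch.Conventions` and the type `ChannelLabelFormatter` are hypothetical names chosen for illustration and are not part of the codebase.

```csharp
#nullable enable

// File-scoped namespace: one declaration, no braces, applies to the whole file.
namespace Sketch.Conventions;

// Hypothetical type. The primary constructor parameter `prefix` is available
// in every member, so no explicit field or constructor body is needed.
public class ChannelLabelFormatter(string prefix)
{
    // `string?` marks the topic as optional; the null check below is what
    // lets the compiler accept the interpolation without a warning.
    public string Format(string name, string? topic) =>
        topic is null ? $"{prefix}{name}" : $"{prefix}{name} ({topic})";
}
```

With warnings treated as errors, passing a possibly-null value where `string` is expected would fail the build rather than surface at runtime.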
### Data Model Patterns
- Use `record` types for data classes (immutable by default)
- Implement `IHasId` interface for entities with ID fields
- Deserialization via `public static T Parse(JsonElement json)` method
- Partial records with separate `Parse` methods in distinct file sections
- Use `Pipe()` extension for method chaining transformations
```csharp
// Example pattern:
public partial record Message(Snowflake Id, string Content) : IHasId { }

public partial record Message
{
    public static Message Parse(JsonElement json)
    {
        var id = json.GetProperty("id").GetNonWhiteSpaceString().Pipe(Snowflake.Parse);
        var content = json.GetProperty("content").GetNonWhiteSpaceString();
        return new(id, content);
    }
}
```
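
For readers unfamiliar with `Pipe()`, the following sketch shows one plausible shape for such an extension. The signature is an assumption made for illustration; the project's actual definition may differ.

```csharp
using System;

// Usage: each step receives the previous step's result, left to right.
var answer = " 21 "
    .Pipe(s => s.Trim())
    .Pipe(s => int.Parse(s))
    .Pipe(n => n * 2);

Console.WriteLine(answer);

// Assumed shape of the extension: apply a transform to a value and return the result.
public static class PipeExtensions
{
    public static TOut Pipe<TIn, TOut>(this TIn input, Func<TIn, TOut> transform) =>
        transform(input);
}
```

The benefit is purely readability: a chain of conversions reads in execution order instead of as nested calls.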
### Exception Handling
- Custom exceptions inherit from `DiscordChatExporterException`
- Specific exception types for domain errors: `ChannelEmptyException`, `InvalidStateException`, etc.
- Exceptions include helpful context about the guild/channel where applicable
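
A hierarchy like the one described might look as follows. This is a sketch only: the constructor parameters, the message wording, and the `ChannelName`/`GuildName` properties are hypothetical and may not match the real exception types.

```csharp
#nullable enable
using System;

try
{
    throw new ChannelEmptyException("general", "My Server");
}
catch (DiscordChatExporterException ex)
{
    // Catching the base type handles every domain error in one place.
    Console.WriteLine(ex.Message);
}

// Base type for all domain errors (constructor shape is an assumption).
public class DiscordChatExporterException(string message, Exception? innerException = null)
    : Exception(message, innerException);

// Specific error that carries guild/channel context in its message.
public class ChannelEmptyException(string channelName, string guildName)
    : DiscordChatExporterException(
        $"Channel '{channelName}' in guild '{guildName}' contains no messages."
    )
{
    public string ChannelName { get; } = channelName;
    public string GuildName { get; } = guildName;
}
```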
### Discord API Integration
- All API URLs are relative to base URI `https://discord.com/api/v10/`
- Token authorization uses `Authorization` header (either `Bot {token}` or raw token)
- Rate limiting respects Discord advisory headers but can be configured to respect only hard limits
- Use `Http.ResponseResiliencePipeline` for retry logic (configured via Polly)
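
The first two points can be sketched with the standard `System.Net.Http` types. `DiscordRequestSketch` and its `Create` method are hypothetical helpers written for this example; the retry pipeline is deliberately left out because it depends on Polly.

```csharp
using System;
using System.Net.Http;

var request = DiscordRequestSketch.Create("channels/123/messages?limit=100", "abc", isBot: true);
Console.WriteLine(request.RequestUri);
Console.WriteLine(string.Join(",", request.Headers.GetValues("Authorization")));

public static class DiscordRequestSketch
{
    // The trailing slash matters: without it, the last path segment of the
    // base URI would be replaced when the relative URL is resolved.
    private static readonly Uri BaseUri = new("https://discord.com/api/v10/");

    public static HttpRequestMessage Create(string relativeUrl, string token, bool isBot)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, new Uri(BaseUri, relativeUrl));

        // Bot tokens are prefixed with "Bot "; user tokens are sent as-is.
        request.Headers.TryAddWithoutValidation(
            "Authorization",
            isBot ? $"Bot {token}" : token
        );

        return request;
    }
}
```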
### Export Format Implementation
- Each format has a dedicated `*MessageWriter` class
- Writers derive from the abstract `MessageWriter` base class
- Template files (`.cshtml`) use RazorBlade for HTML/plaintext rendering
- Markdown conversion uses separate visitors: `HtmlMarkdownVisitor`, `PlainTextMarkdownVisitor`
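
To illustrate how a format stays isolated in its own writer, here is a simplified sketch of a shared writer base type with one format-specific subclass. `MessageWriterSketch` and `PlainTextWriterSketch` are hypothetical stand-ins; the real `MessageWriter` works with the project's own message and context types rather than plain strings.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

var buffer = new StringWriter();
MessageWriterSketch writer = new PlainTextWriterSketch(buffer);
await writer.WritePreambleAsync();
await writer.WriteMessageAsync("alice", "hello");
await writer.WritePostambleAsync();
Console.Write(buffer.ToString());

// Shared contract: preamble, one call per message, postamble.
public abstract class MessageWriterSketch(TextWriter output)
{
    protected TextWriter Output { get; } = output;
    public int MessagesWritten { get; private set; }

    public virtual ValueTask WritePreambleAsync() => default;

    public virtual ValueTask WriteMessageAsync(string author, string content)
    {
        MessagesWritten++;
        return default;
    }

    public virtual ValueTask WritePostambleAsync() => default;
}

// Format-specific behavior lives entirely in the subclass.
public sealed class PlainTextWriterSketch(TextWriter output) : MessageWriterSketch(output)
{
    public override async ValueTask WriteMessageAsync(string author, string content)
    {
        await Output.WriteLineAsync($"{author}: {content}");
        await base.WriteMessageAsync(author, content);
    }

    public override async ValueTask WritePostambleAsync() =>
        await Output.WriteLineAsync($"Exported {MessagesWritten} message(s)");
}
```

Adding another format under this design would mean adding another subclass, without touching the code that drives the writer.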
### Testing
- Tests in `DiscordChatExporter.Cli.Tests/Specs/` follow naming pattern: `[Format][Feature]Specs.cs`
- Use xUnit `[Fact]` for individual tests
- Test infrastructure in `Infra/` includes `ExportWrapper` for export orchestration
- Tests require Discord API access; sensitive tests need `DISCORD_TOKEN` secret
- Use FluentAssertions for readable assertions: `.Should().Equal(...)`, `.Should().Contain(...)`
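
The sketch below shows the general shape of a spec class using `[Fact]` and FluentAssertions. It assumes the `xunit` and `FluentAssertions` packages are referenced. The class `CsvEscapingSpecs` and the helper `EscapeCsv` are hypothetical and exist only to keep the example free of Discord API access; real specs export test channels through the helpers in `Infra/`.

```csharp
using System.Linq;
using FluentAssertions;
using Xunit;

// Hypothetical spec class following the [Format][Feature]Specs naming pattern.
public class CsvEscapingSpecs
{
    // Stand-in for the code under test: quote a value and double embedded quotes.
    private static string EscapeCsv(string value) =>
        "\"" + value.Replace("\"", "\"\"") + "\"";

    [Fact]
    public void I_can_escape_a_value_that_contains_quotes()
    {
        var result = EscapeCsv("say \"hi\"");

        result.Should().Be("\"say \"\"hi\"\"\"");
        result.Should().Contain("\"\"hi\"\"");
    }

    [Fact]
    public void I_can_compare_a_sequence_in_order()
    {
        var ids = new[] { 3, 1, 2 }.OrderBy(x => x);

        // Equal(...) asserts both the elements and their order.
        ids.Should().Equal(1, 2, 3);
    }
}
```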
### Dependencies and Injection
- Microsoft.Extensions.DependencyInjection for IoC
- Services typically injected via primary constructor
- Configuration loaded via Microsoft.Extensions.Configuration (supports env vars and user secrets)
### Code Organization
- Folder structure mirrors namespace structure
- Data models organized under domain folder (e.g., `Discord/Data/`)
- Keep public methods at the top of the class
- Use `async ValueTask` for small, frequently called async operations (especially ones that often complete synchronously), `async Task` for larger ones
## Architecture Details
### Why This Structure?
- **Separation of concerns**: Core library independent from UI implementations
- **Multi-UI support**: CLI and GUI share identical core export logic
- **Testability**: Core is fully testable without UI dependencies
- **Extensibility**: New export formats are isolated to a single writer class
### Important Flow Details
- Message export is **stream-based** to handle large channels efficiently
- Discord API client implements **exponential backoff** for rate limits
- Exports can be **partitioned** by size or date range to manage large channel history
- Assets (images, videos, etc.) can be **selectively downloaded** during export
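
Stream-based export with partitioning can be sketched as below. This is an illustration under simplifying assumptions: messages are plain strings, partitions are in-memory lists standing in for output files, and the split is by message count rather than by size or date. `StreamingExportSketch` and its members are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var partitions = await StreamingExportSketch.ExportAsync(
    StreamingExportSketch.FakeMessagesAsync(5),
    partitionLimit: 2
);
Console.WriteLine($"Partitions: {partitions.Count}");

public static class StreamingExportSketch
{
    // Stand-in for paged API fetching: yields messages one at a time.
    public static async IAsyncEnumerable<string> FakeMessagesAsync(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            await Task.Yield();
            yield return $"message {i}";
        }
    }

    // Consumes the stream as it arrives, so the full source sequence is never
    // buffered up front. A new partition starts once the current one is full.
    public static async Task<List<List<string>>> ExportAsync(
        IAsyncEnumerable<string> messages,
        int partitionLimit
    )
    {
        if (partitionLimit <= 0)
            throw new ArgumentOutOfRangeException(nameof(partitionLimit));

        var partitions = new List<List<string>>();
        List<string>? current = null;

        await foreach (var message in messages)
        {
            if (current is null || current.Count >= partitionLimit)
            {
                current = new List<string>();
                partitions.Add(current);
            }

            current.Add(message);
        }

        return partitions;
    }
}
```

In a real exporter each partition would be flushed to its own file and released, which is what keeps memory use flat for very large channels.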