# Database Maintenance

## Overview

HLL Vietnam keeps database cleanup at the application level. The current maintenance scope is intentionally narrow:

- old `server_snapshots`;
- old non-critical `rcon_admin_log_events`;
- old critical `rcon_admin_log_events`, only after retention and protected-match checks;
- old non-protected `rcon_materialized_matches`;
- dependent `rcon_match_player_stats` for deleted matches.

The first maintenance pass does not routinely delete:

- `displayed_historical_snapshots`;
- file-based snapshots under `backend/data/snapshots/`;
- public-scoreboard `historical_*` fallback tables;
- `player_event_raw_ledger` and its worker metadata;
- Elo/MMR tables;
- Comunidad Hispana #03 data reactivation or targets.

## Why Application-Level And Not `pg_cron`

Cleanup is versioned in backend code instead of being delegated to `pg_cron`, host cron, or a separate container, because the retention logic depends on product rules:

- keep the latest 100 closed materialized matches;
- keep the current month;
- keep the previous month during the first 7 days of a new month;
- keep the current week;
- keep the previous week when weekly fallback may still need it;
- keep child stats for protected matches;
- avoid breaking current/live pages that still read recent AdminLog data.

Those rules belong with the application’s read and write model, not inside database-only scheduling.

## Scheduled Cleanup Inside `historical-runner`

Database maintenance is scheduled inside `app.historical_runner`.

Behavior:

- disabled by default;
- no extra Docker service is added for maintenance;
- the runner checks whether maintenance is due;
- when enabled and due, the runner calls the same shared Python function that backs `python -m app.database_maintenance cleanup --apply`;
- failures are logged and do not crash the historical runner loop;
- cleanup runs under the same writer-lock coordination used by the historical writer flows.
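The scheduler behavior above can be sketched as a small due-check wrapper. This is a hypothetical illustration, not the code in `app.historical_runner`: the `MaintenanceScheduler` class, its `tick()` method, and the in-process `threading.Lock` standing in for the writer lock are all assumptions. It reuses the documented scheduler event names, records the attempt time before running so a failing cleanup waits a full interval instead of retrying on every loop iteration, and turns exceptions into a `database-maintenance-scheduler-failed` log rather than letting them escape.

```python
import logging
import threading
from dataclasses import dataclass, field
from typing import Callable, Optional

log = logging.getLogger("historical_runner")  # hypothetical logger name


@dataclass
class MaintenanceScheduler:
    """Hypothetical sketch of the runner's maintenance due-check."""

    enabled: bool  # mirrors HLL_DB_MAINTENANCE_ENABLED
    interval_seconds: int  # mirrors HLL_DB_MAINTENANCE_INTERVAL_SECONDS
    run_cleanup: Callable[[], None]  # stand-in for the shared `cleanup --apply` function
    # Assumed stand-in for the writer lock shared with the historical writer flows.
    writer_lock: threading.Lock = field(default_factory=threading.Lock)
    last_run_at: Optional[float] = None

    def tick(self, now: float) -> str:
        """Run cleanup if enabled and due; return the scheduler event that was logged."""
        if not self.enabled:
            return self._emit("database-maintenance-scheduler-skipped-disabled")
        if self.last_run_at is not None and now - self.last_run_at < self.interval_seconds:
            return self._emit("database-maintenance-scheduler-skipped-not-due")

        self._emit("database-maintenance-scheduler-started")
        self.last_run_at = now  # record the attempt so a failing run waits a full interval
        try:
            with self.writer_lock:  # serialize with the other writer flows
                self.run_cleanup()
        except Exception:
            # Log with traceback and keep the runner loop alive.
            log.exception("database-maintenance-scheduler-failed")
            return "database-maintenance-scheduler-failed"
        return self._emit("database-maintenance-scheduler-completed")

    @staticmethod
    def _emit(event: str) -> str:
        log.info(event)
        return event
```

A runner loop would call `tick()` once per iteration with the current time; keeping the try/except inside the scheduler is what lets a database error end one maintenance attempt without ending the loop.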
Relevant structured log events:

- `database-maintenance-scheduler-skipped-disabled`
- `database-maintenance-scheduler-skipped-not-due`
- `database-maintenance-scheduler-started`
- `database-maintenance-scheduler-completed`
- `database-maintenance-scheduler-failed`

## Environment Variables

Required maintenance-related variables:

```text
HLL_DB_MAINTENANCE_ENABLED=false
HLL_DB_MAINTENANCE_INTERVAL_SECONDS=43200
HLL_RECENT_MATCHES_KEEP=100
HLL_ADMIN_LOG_NONCRITICAL_RETENTION_DAYS=30
HLL_ADMIN_LOG_CRITICAL_RETENTION_DAYS=90
HLL_SERVER_SNAPSHOT_RETENTION_DAYS=14
HLL_DB_MAINTENANCE_BATCH_SIZE=5000
```

Meaning:

- `HLL_DB_MAINTENANCE_ENABLED`
  Enables scheduled apply mode inside `historical-runner`.
- `HLL_DB_MAINTENANCE_INTERVAL_SECONDS`
  Default scheduler interval. `43200` means every 12 hours.
- `HLL_RECENT_MATCHES_KEEP`
  Number of latest closed materialized matches that must always be protected.
- `HLL_ADMIN_LOG_NONCRITICAL_RETENTION_DAYS`
  Retention for non-critical AdminLog events such as chat/connect/disconnect.
- `HLL_ADMIN_LOG_CRITICAL_RETENTION_DAYS`
  Retention for critical AdminLog events such as `kill`, `match_start`, `match_end`.
- `HLL_SERVER_SNAPSHOT_RETENTION_DAYS`
  Retention for live server snapshots (`server_snapshots`).
- `HLL_DB_MAINTENANCE_BATCH_SIZE`
  Delete batch size for apply mode.

## Protected Data

The cleanup command protects:

- the latest 100 closed materialized matches by default (`HLL_RECENT_MATCHES_KEEP`);
- current-month materialized matches;
- previous-month materialized matches when the current day of the month is `1` through `7`;
- current-week materialized matches;
- previous-week materialized matches when weekly fallback may still need them;
- `rcon_match_player_stats` belonging to protected matches;
- current/live AdminLog data required for visible current-match surfaces;
- `displayed_historical_snapshots`;
- file snapshots in `backend/data/snapshots/`.

If a match timestamp cannot be interpreted safely, that match is skipped and protected instead of deleted.
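The month and week rules above collapse to a single cutoff, because every window runs from its start up to now: a match is protected if it closed on or after the earliest applicable window start, if it is among the latest `HLL_RECENT_MATCHES_KEEP` closed matches, or if its timestamp cannot be parsed. The sketch below is illustrative only. `Match`, `parse_closed_at()`, `protected_match_ids()`, UTC month boundaries, Monday-start weeks, treating timezone-naive timestamps as unsafe, and the `include_previous_week` flag (the source does not say exactly when weekly fallback still needs the previous week, so the caller decides) are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Match:
    match_id: int
    closed_at: Optional[str]  # e.g. coalesce(ended_at, started_at) as ISO-8601 text


def parse_closed_at(value: Optional[str]) -> Optional[datetime]:
    """Return an aware UTC datetime, or None if the value is missing or ambiguous."""
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        return None  # assumption: naive timestamps are not "interpreted safely"
    return parsed.astimezone(timezone.utc)


def protected_match_ids(
    matches: Iterable[Match],
    now: datetime,
    keep_recent: int = 100,
    include_previous_week: bool = True,
) -> Set[int]:
    now = now.astimezone(timezone.utc)
    month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    window_starts = [month_start]
    if now.day <= 7:
        # First 7 days of a month: also keep the whole previous month.
        window_starts.append((month_start - timedelta(days=1)).replace(day=1))
    week_start = (now - timedelta(days=now.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    window_starts.append(week_start)
    if include_previous_week:
        window_starts.append(week_start - timedelta(days=7))
    cutoff = min(window_starts)  # every window ends "now", so their union starts here

    protected: Set[int] = set()
    dated: List[Tuple[datetime, int]] = []
    for match in matches:
        closed = parse_closed_at(match.closed_at)
        if closed is None:
            protected.add(match.match_id)  # unreadable timestamp: skip and protect
            continue
        dated.append((closed, match.match_id))
        if closed >= cutoff:
            protected.add(match.match_id)

    dated.sort(reverse=True)  # newest first
    protected.update(match_id for _, match_id in dated[:keep_recent])
    return protected
```

Everything not returned by such a function would be a deletion candidate, and its `rcon_match_player_stats` rows would follow it; protecting on parse failure keeps the error path on the "keep" side.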
## Deleted Data

Apply mode is currently allowed to delete:

- `server_snapshots` older than retention;
- non-critical `rcon_admin_log_events` older than retention;
- critical `rcon_admin_log_events` older than retention, only when they are not required by protected materialized match ranges;
- non-protected `rcon_materialized_matches`;
- dependent `rcon_match_player_stats` for deleted matches.

Current critical AdminLog event types:

- `kill`
- `match_start`
- `match_end`

## Dry-Run Command

From `backend/`:

```powershell
python -m app.database_maintenance cleanup --dry-run
```

From the repository root with the backend package on `PYTHONPATH`:

```powershell
$env:PYTHONPATH='backend'
python -m app.database_maintenance cleanup --dry-run
```

Inside Docker Compose:

```powershell
docker compose exec backend python -m app.database_maintenance cleanup --dry-run
```

Useful dry-run options:

```powershell
docker compose exec backend python -m app.database_maintenance cleanup --dry-run `
  --recent-matches-keep 100 `
  --admin-log-noncritical-retention-days 30 `
  --admin-log-critical-retention-days 90 `
  --server-snapshot-retention-days 14 `
  --batch-size 5000
```

Dry-run is the safe preview path and should be reviewed before any production apply.
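The difference between dry-run and apply for a retention-based table can be sketched against an in-memory SQLite table so it runs anywhere: dry-run only counts candidates, while apply deletes them in fixed-size batches. This is a hypothetical illustration, not the backend's implementation. `retention_cutoff()`, `cleanup_server_snapshots()`, and the `id` column are assumptions, and timestamps are ISO-8601 text here, whereas PostgreSQL would compare real timestamp columns. The `DELETE ... WHERE id IN (SELECT ... LIMIT n)` shape is used because PostgreSQL has no `DELETE ... LIMIT`; committing after each batch keeps transactions and lock times short.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def retention_cutoff(now: datetime, retention_days: int) -> str:
    """Rows captured strictly before this instant are past retention."""
    return (now - timedelta(days=retention_days)).isoformat()


def cleanup_server_snapshots(
    conn: sqlite3.Connection, cutoff: str, batch_size: int = 5000, dry_run: bool = True
) -> int:
    """Return how many rows would be deleted (dry-run) or were deleted (apply)."""
    if dry_run:
        # Preview only: count candidates, change nothing.
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM server_snapshots WHERE captured_at < ?", (cutoff,)
        ).fetchone()
        return count

    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM server_snapshots WHERE id IN ("
            " SELECT id FROM server_snapshots WHERE captured_at < ? ORDER BY id LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

Fixing `now` (as the `--now` option does for the real command) makes the cutoff, and therefore the dry-run count, reproducible between the preview and the apply run.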
## Apply Command

Local module execution:

```powershell
python -m app.database_maintenance cleanup --apply
```

Docker Compose:

```powershell
docker compose exec backend python -m app.database_maintenance cleanup --apply
```

One-off local validation with a fixed time anchor:

```powershell
python -m app.database_maintenance cleanup --apply --now 2026-06-20T12:00:00Z
```

Optional maintenance vacuum/analyze:

```powershell
python -m app.database_maintenance cleanup --apply --vacuum-analyze
```

## Table-Size Audit SQL

```sql
select
  schemaname,
  relname as table_name,
  pg_size_pretty(pg_total_relation_size(relid)) as total_size,
  pg_size_pretty(pg_relation_size(relid)) as table_size,
  -- pg_indexes_size counts only indexes; total minus table size would also include TOAST
  pg_size_pretty(pg_indexes_size(relid)) as indexes_size,
  n_live_tup as estimated_rows,
  n_dead_tup as estimated_dead_rows
from pg_stat_user_tables
order by pg_total_relation_size(relid) desc;
```

## Row-Count And Age Audit SQL

### AdminLog events by type/date

```sql
select
  event_type,
  count(*) as row_count,
  min(event_timestamp) as first_event_timestamp,
  max(event_timestamp) as last_event_timestamp,
  min(server_time) as first_server_time,
  max(server_time) as last_server_time
from rcon_admin_log_events
group by event_type
order by row_count desc, event_type asc;
```

### Materialized matches by server/date

```sql
select
  target_key,
  source_basis,
  count(*) as matches,
  min(coalesce(ended_at, started_at)) as first_closed_at,
  max(coalesce(ended_at, started_at)) as last_closed_at
from rcon_materialized_matches
group by target_key, source_basis
order by target_key asc, source_basis asc;
```

### Server snapshots by date

```sql
select
  server_id,
  min(captured_at) as first_captured_at,
  max(captured_at) as last_captured_at,
  count(*) as snapshot_rows
from server_snapshots
group by server_id
order by last_captured_at desc;
```

### Displayed snapshots count

```sql
select
  snapshot_type,
  metric,
  snapshot_window,
  count(*) as snapshot_rows,
  min(generated_at) as first_generated_at,
  max(generated_at) as last_generated_at
from displayed_historical_snapshots
group by snapshot_type, metric, snapshot_window
order by snapshot_type asc, metric asc, snapshot_window asc;
```

## Logs To Inspect

The cleanup command emits JSON logs.

Minimum events to look for:

- `database-maintenance-started`
- `database-maintenance-plan`
- `database-maintenance-table-skipped`
- `database-maintenance-delete-batch`
- `database-maintenance-completed`
- `database-maintenance-error`

Examples:

```powershell
docker compose logs --tail=200 backend
```

If scheduled cleanup is enabled, its events appear in the `historical-runner` logs:

```powershell
docker compose logs --tail=200 historical-runner
```

## Docker And Portainer Warnings

- Never use `docker compose down -v` unless you intentionally want to delete PostgreSQL and mounted volume data.
- Always review dry-run output before enabling apply in production.
- Do not manually delete protected match or player-stat rows from PostgreSQL.
- Keep backups before changing retention settings.
- Do not add Comunidad Hispana #03 back into RCON targets in this task.
- Do not add a separate maintenance container, host cron, or `pg_cron` job for this feature.

For Portainer-style operations the same warnings apply:

- deleting volumes is destructive;
- maintenance should run through the application command, not through manual table purges.

## Rollback And Restore Considerations

- Retention changes are destructive when apply mode runs.
- Keep a PostgreSQL backup before enabling scheduled apply in production.
- If cleanup removes too much data, recovery is restore-based, not “undo last delete.”
- Favor dry-run, smaller batch sizes, and reviewed retention values before enabling long-running scheduled apply.

## Safe Operator Flow

1. Audit table size and row ages with the SQL above.
2. Run dry-run locally or in Compose.
3. Review protected counts and candidate counts in the JSON output.
4. Enable `HLL_DB_MAINTENANCE_ENABLED=true` only after dry-run review.
5. Monitor `historical-runner` logs for scheduler events and cleanup completion.
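Reviewing the JSON output (step 3) and watching for completion (step 5) can be made easier with a small filter that keeps only maintenance events. This is a hypothetical helper, not part of the backend: it assumes each log record is a JSON object whose event name sits under an `event` key (adjust `key` if the real field differs), and it tolerates the `service  | ` prefix that `docker compose logs` adds by parsing from the first `{` on each line.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def maintenance_events(lines: Iterable[str], key: str = "event") -> Iterator[dict]:
    """Yield JSON log records whose event name starts with 'database-maintenance-'."""
    for line in lines:
        start = line.find("{")  # skip any `docker compose logs` prefix
        if start < 0:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a complete JSON object; ignore
        if isinstance(record, dict) and str(record.get(key, "")).startswith(
            "database-maintenance-"
        ):
            yield record


def event_counts(lines: Iterable[str], key: str = "event") -> Counter:
    """Count maintenance events by name, e.g. to confirm a run reached 'completed'."""
    return Counter(record[key] for record in maintenance_events(lines, key))
```

Because both the cleanup events and the scheduler events share the `database-maintenance-` prefix, one filter covers manual dry-runs and scheduled runs alike; feeding it the lines from `docker compose logs --tail=200 historical-runner` is one way to use it.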