Database Maintenance
Overview
HLL Vietnam keeps database cleanup at the application level.
The current maintenance scope is intentionally narrow:
- old server_snapshots;
- old non-critical rcon_admin_log_events;
- old critical rcon_admin_log_events, only after retention and protected-match checks;
- old non-protected rcon_materialized_matches;
- dependent rcon_match_player_stats for deleted matches.
The first maintenance pass does not routinely delete:
- displayed_historical_snapshots;
- file-based snapshots under backend/data/snapshots/;
- public-scoreboard historical_* fallback tables;
- player_event_raw_ledger and its worker metadata;
- Elo/MMR tables;
- Comunidad Hispana #03 data reactivation or targets.
Why Application-Level And Not pg_cron
Cleanup is versioned in backend code instead of delegated to pg_cron, host cron, or a separate container because the retention logic depends on product rules:
- keep the latest 100 closed materialized matches;
- keep the current month;
- keep the previous month during the first 7 days of a new month;
- keep the current week;
- keep the previous week when weekly fallback may still need it;
- keep child stats for protected matches;
- avoid breaking current/live pages that still read recent AdminLog data.
Those rules belong with the application’s read and write model, not inside database-only scheduling.
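The calendar-based rules above can be sketched in Python as follows. This is an illustrative sketch, not the project's implementation: the function names are hypothetical, weeks are assumed to start on Monday, and the condition "when weekly fallback may still need it" is left as a caller-supplied flag because the source does not define it.

```python
from datetime import datetime, timedelta, timezone


def month_start(moment: datetime) -> datetime:
    """First instant of the month containing `moment`."""
    return moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def previous_month_start(moment: datetime) -> datetime:
    """First instant of the month before the one containing `moment`."""
    return month_start(month_start(moment) - timedelta(days=1))


def week_start(moment: datetime) -> datetime:
    """Monday 00:00 of the week containing `moment` (Monday start is an assumption)."""
    day = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    return day - timedelta(days=day.weekday())


def is_in_protected_window(closed_at, now, keep_previous_week=True):
    """True when a match close time falls inside a calendar-based protected window."""
    if closed_at >= month_start(now):
        return True  # current month
    if now.day <= 7 and closed_at >= previous_month_start(now):
        return True  # previous month, during the first 7 days of a new month
    if closed_at >= week_start(now):
        return True  # current week (may reach back into the previous month)
    if keep_previous_week and closed_at >= week_start(now) - timedelta(days=7):
        return True  # previous week, when the caller says fallback may need it
    return False


def protected_match_ids(matches, now, recent_keep=100, keep_previous_week=True):
    """Return ids of protected matches; `matches` is a list of (match_id, closed_at)."""
    newest_first = sorted(matches, key=lambda item: item[1], reverse=True)
    protected = {match_id for match_id, _ in newest_first[:recent_keep]}
    for match_id, closed_at in matches:
        if is_in_protected_window(closed_at, now, keep_previous_week):
            protected.add(match_id)
    return protected
```

Because the rules are unions of windows, a match only becomes a delete candidate when it falls outside every one of them, which is why this logic is easier to keep next to the application's read model than inside a scheduled SQL job.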
Scheduled Cleanup Inside historical-runner
Database maintenance is scheduled inside app.historical_runner.
Behavior:
- disabled by default;
- no extra Docker service is added for maintenance;
- the runner checks whether maintenance is due;
- when enabled and due, the runner runs the same behavior as python -m app.database_maintenance cleanup --apply through the shared Python function;
- failures are logged and do not crash the historical runner loop;
- cleanup runs under the same writer-lock coordination used by the historical writer flows.
Relevant structured log events:
- database-maintenance-scheduler-skipped-disabled
- database-maintenance-scheduler-skipped-not-due
- database-maintenance-scheduler-started
- database-maintenance-scheduler-completed
- database-maintenance-scheduler-failed
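A minimal sketch of the "check whether due, run, never crash the loop" behavior described above. The function name, its parameters, and the "event" log field are assumptions for illustration; only the event names come from this document. Whether the real scheduler waits a full interval after a failure is also an assumption, and writer-lock coordination is not modeled here.

```python
import json
import logging
import time

logger = logging.getLogger("historical_runner")


def _log(event, level=logging.INFO, **fields):
    # "event" as the JSON key is an assumption about the log shape.
    logger.log(level, json.dumps({"event": event, **fields}))


def maintenance_tick(enabled, interval_seconds, last_run_at, run_cleanup, now=None):
    """Run cleanup once if enabled and due; return the updated last-run time."""
    now = time.monotonic() if now is None else now
    if not enabled:
        _log("database-maintenance-scheduler-skipped-disabled")
        return last_run_at
    if last_run_at is not None and now - last_run_at < interval_seconds:
        _log("database-maintenance-scheduler-skipped-not-due")
        return last_run_at
    _log("database-maintenance-scheduler-started")
    try:
        run_cleanup()
    except Exception as exc:  # the runner loop must survive cleanup failures
        _log("database-maintenance-scheduler-failed", logging.ERROR, error=str(exc))
    else:
        _log("database-maintenance-scheduler-completed")
    # Assumption: a failed run still counts, so the next attempt waits one interval.
    return now
```

The runner loop would call this once per iteration and keep the returned value for the next call.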
Environment Variables
Required maintenance-related variables:
HLL_DB_MAINTENANCE_ENABLED=false
HLL_DB_MAINTENANCE_INTERVAL_SECONDS=43200
HLL_RECENT_MATCHES_KEEP=100
HLL_ADMIN_LOG_NONCRITICAL_RETENTION_DAYS=30
HLL_ADMIN_LOG_CRITICAL_RETENTION_DAYS=90
HLL_SERVER_SNAPSHOT_RETENTION_DAYS=14
HLL_DB_MAINTENANCE_BATCH_SIZE=5000
Meaning:
- HLL_DB_MAINTENANCE_ENABLED: Enables scheduled apply mode inside historical-runner.
- HLL_DB_MAINTENANCE_INTERVAL_SECONDS: Default scheduler interval. 43200 means every 12 hours.
- HLL_RECENT_MATCHES_KEEP: Number of latest closed materialized matches that must always be protected.
- HLL_ADMIN_LOG_NONCRITICAL_RETENTION_DAYS: Retention for non-critical AdminLog events such as chat/connect/disconnect.
- HLL_ADMIN_LOG_CRITICAL_RETENTION_DAYS: Retention for critical AdminLog events such as kill, match_start, match_end.
- HLL_SERVER_SNAPSHOT_RETENTION_DAYS: Retention for live server snapshots.
- HLL_DB_MAINTENANCE_BATCH_SIZE: Delete batch size for apply mode.
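One way these variables could be read into a typed configuration object is sketched below. The dataclass, the helper names, the accepted boolean spellings, and the "must be positive" validation are assumptions; only the variable names and default values come from this document.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MaintenanceConfig:
    enabled: bool = False
    interval_seconds: int = 43200  # 12 hours
    recent_matches_keep: int = 100
    admin_log_noncritical_retention_days: int = 30
    admin_log_critical_retention_days: int = 90
    server_snapshot_retention_days: int = 14
    batch_size: int = 5000


def _env_bool(env, name, default):
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    # Accepted spellings are an assumption, not documented behavior.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def _env_positive_int(env, name, default):
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    return value


def load_maintenance_config(env=None):
    """Build a MaintenanceConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    d = MaintenanceConfig()
    return MaintenanceConfig(
        enabled=_env_bool(env, "HLL_DB_MAINTENANCE_ENABLED", d.enabled),
        interval_seconds=_env_positive_int(env, "HLL_DB_MAINTENANCE_INTERVAL_SECONDS", d.interval_seconds),
        recent_matches_keep=_env_positive_int(env, "HLL_RECENT_MATCHES_KEEP", d.recent_matches_keep),
        admin_log_noncritical_retention_days=_env_positive_int(
            env, "HLL_ADMIN_LOG_NONCRITICAL_RETENTION_DAYS", d.admin_log_noncritical_retention_days),
        admin_log_critical_retention_days=_env_positive_int(
            env, "HLL_ADMIN_LOG_CRITICAL_RETENTION_DAYS", d.admin_log_critical_retention_days),
        server_snapshot_retention_days=_env_positive_int(
            env, "HLL_SERVER_SNAPSHOT_RETENTION_DAYS", d.server_snapshot_retention_days),
        batch_size=_env_positive_int(env, "HLL_DB_MAINTENANCE_BATCH_SIZE", d.batch_size),
    )
```

Failing loudly on a malformed retention value is a deliberate choice in this sketch: a silently misread retention window is more dangerous than a startup error when the next step is a delete.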
Protected Data
The cleanup command protects:
- latest 100 closed materialized matches by default;
- current month materialized matches;
- previous month materialized matches when the current day is 1 through 7;
- current week materialized matches;
- previous week materialized matches when weekly fallback may still need them;
- rcon_match_player_stats belonging to protected matches;
- current/live AdminLog data required for visible current-match surfaces;
- displayed_historical_snapshots;
- file snapshots in backend/data/snapshots/.
If a match timestamp cannot be interpreted safely, that match is skipped and protected instead of deleted.
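The "protect when in doubt" rule can be illustrated with a small parsing helper. This is a sketch under assumptions: the function names are hypothetical, timestamps are assumed to arrive as ISO 8601 strings, and treating timezone-less values as unsafe is a choice made for this example rather than documented behavior.

```python
from datetime import datetime, timezone


def parse_match_timestamp(raw):
    """Return an aware UTC datetime, or None when the value cannot be trusted."""
    if not isinstance(raw, str) or not raw.strip():
        return None
    text = raw.strip()
    if text.endswith("Z"):
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        text = text[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        return None  # assumption: a naive timestamp is ambiguous, so not safe
    return parsed.astimezone(timezone.utc)


def is_delete_candidate(raw_closed_at, cutoff):
    """A match is a candidate only when its close time parses and is before cutoff."""
    closed_at = parse_match_timestamp(raw_closed_at)
    if closed_at is None:
        return False  # unreadable timestamp: skip and protect
    return closed_at < cutoff
```

The important property is the asymmetry: every failure path returns "not a candidate", so bad data can only cause too little cleanup, never too much.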
Deleted Data
Apply mode is currently allowed to delete:
- server_snapshots older than retention;
- non-critical rcon_admin_log_events older than retention;
- critical rcon_admin_log_events older than retention, only when they are not required by protected materialized match ranges;
- non-protected rcon_materialized_matches;
- dependent rcon_match_player_stats for deleted matches.
Current critical AdminLog event types:
- kill
- match_start
- match_end
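The two-tier AdminLog rule (non-critical events expire on age alone, critical events also need to fall outside every protected match range) might look like the sketch below. The function name and the representation of protected ranges as (start, end) pairs are assumptions; the event types and default retention days come from this document.

```python
from datetime import datetime, timedelta, timezone

CRITICAL_EVENT_TYPES = frozenset({"kill", "match_start", "match_end"})


def admin_log_event_deletable(event_type, event_time, now, protected_ranges,
                              noncritical_days=30, critical_days=90):
    """Decide whether one AdminLog event may be deleted in apply mode.

    `protected_ranges` is a list of (start, end) datetimes covering protected
    materialized matches (a hypothetical representation for this sketch).
    """
    if event_type in CRITICAL_EVENT_TYPES:
        if event_time >= now - timedelta(days=critical_days):
            return False  # still inside critical retention
        # Old enough, but keep it if any protected match still spans it.
        return not any(start <= event_time <= end for start, end in protected_ranges)
    return event_time < now - timedelta(days=noncritical_days)
```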
Dry-Run Command
From backend/:
python -m app.database_maintenance cleanup --dry-run
From the repository root with the backend package on PYTHONPATH (PowerShell syntax):
$env:PYTHONPATH='backend'
python -m app.database_maintenance cleanup --dry-run
Inside Docker Compose:
docker compose exec backend python -m app.database_maintenance cleanup --dry-run
Useful dry-run options:
docker compose exec backend python -m app.database_maintenance cleanup --dry-run `
--recent-matches-keep 100 `
--admin-log-noncritical-retention-days 30 `
--admin-log-critical-retention-days 90 `
--server-snapshot-retention-days 14 `
--batch-size 5000
Dry-run is the safe preview path and should be reviewed before any production apply.
Apply Command
Local module execution:
python -m app.database_maintenance cleanup --apply
Docker Compose:
docker compose exec backend python -m app.database_maintenance cleanup --apply
One-off local validation with a fixed time anchor:
python -m app.database_maintenance cleanup --apply --now 2026-06-20T12:00:00Z
Optional maintenance vacuum/analyze:
python -m app.database_maintenance cleanup --apply --vacuum-analyze
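To show how dry-run and batched apply differ, here is a self-contained sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. It is not the project's code: the function name and the id primary-key column are assumptions, and the real command talks to PostgreSQL through whatever driver the backend uses.

```python
import sqlite3


def delete_old_snapshots(conn, cutoff_iso, batch_size, dry_run=True):
    """Count (dry-run) or delete in batches (apply) snapshots older than cutoff.

    Timestamps are compared as ISO 8601 strings in one consistent format,
    which is a simplification for this sqlite3 stand-in.
    """
    if dry_run:
        row = conn.execute(
            "select count(*) from server_snapshots where captured_at < ?",
            (cutoff_iso,),
        ).fetchone()
        return row[0]  # candidates only; nothing is modified

    deleted = 0
    while True:
        cursor = conn.execute(
            "delete from server_snapshots where id in ("
            " select id from server_snapshots"
            " where captured_at < ? order by id limit ?)",
            (cutoff_iso, batch_size),
        )
        conn.commit()  # commit per batch so each transaction stays short
        if cursor.rowcount == 0:
            break
        deleted += cursor.rowcount
    return deleted
```

Deleting in bounded batches keeps each transaction small, which is the usual reason to expose a batch-size setting; a smaller batch trades total runtime for shorter lock hold times.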
Table-Size Audit SQL
select
schemaname,
relname as table_name,
pg_size_pretty(pg_total_relation_size(relid)) as total_size,
pg_size_pretty(pg_relation_size(relid)) as table_size,
pg_size_pretty(pg_indexes_size(relid)) as indexes_size,
n_live_tup as estimated_rows,
n_dead_tup as estimated_dead_rows
from pg_stat_user_tables
order by pg_total_relation_size(relid) desc;
Row-Count And Age Audit SQL
AdminLog events by type/date
select
event_type,
count(*) as row_count,
min(event_timestamp) as first_event_timestamp,
max(event_timestamp) as last_event_timestamp,
min(server_time) as first_server_time,
max(server_time) as last_server_time
from rcon_admin_log_events
group by event_type
order by row_count desc, event_type asc;
Materialized matches by server/date
select
target_key,
source_basis,
count(*) as matches,
min(coalesce(ended_at, started_at)) as first_closed_at,
max(coalesce(ended_at, started_at)) as last_closed_at
from rcon_materialized_matches
group by target_key, source_basis
order by target_key asc, source_basis asc;
Server snapshots by date
select
server_id,
min(captured_at) as first_captured_at,
max(captured_at) as last_captured_at,
count(*) as snapshot_rows
from server_snapshots
group by server_id
order by last_captured_at desc;
Displayed snapshots count
select
snapshot_type,
metric,
snapshot_window,
count(*) as snapshot_rows,
min(generated_at) as first_generated_at,
max(generated_at) as last_generated_at
from displayed_historical_snapshots
group by snapshot_type, metric, snapshot_window
order by snapshot_type asc, metric asc, snapshot_window asc;
Logs To Inspect
The cleanup command emits JSON logs. Minimum events to look for:
- database-maintenance-started
- database-maintenance-plan
- database-maintenance-table-skipped
- database-maintenance-delete-batch
- database-maintenance-completed
- database-maintenance-error
Examples:
docker compose logs --tail=200 backend
If scheduled cleanup is enabled:
docker compose logs --tail=200 historical-runner
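Since the logs are JSON, the maintenance events can be picked out of mixed container output with a short filter. This sketch assumes the event name is stored under an "event" key and that each log line carries one JSON object after an optional Compose service prefix; both are assumptions about the log shape, and the function name is hypothetical.

```python
import json


def maintenance_events(lines, event_key="event"):
    """Return parsed JSON records whose event name starts with database-maintenance-."""
    found = []
    for line in lines:
        start = line.find("{")  # skip any "service-1  | " prefix from Compose
        if start == -1:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(record, dict):
            continue
        if str(record.get(event_key, "")).startswith("database-maintenance-"):
            found.append(record)
    return found
```

Passing sys.stdin as lines lets the function consume piped docker compose logs output; the prefix match covers both the cleanup events and the scheduler events listed earlier.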
Docker And Portainer Warnings
- Never use docker compose down -v unless you intentionally want to delete PostgreSQL and mounted volume data.
- Always review dry-run output before enabling apply in production.
- Do not manually delete protected match or player-stat rows from PostgreSQL.
- Keep backups before changing retention settings.
- Do not add Comunidad Hispana #03 back into RCON targets in this task.
- Do not add a separate maintenance container, host cron, or pg_cron job for this feature.
For Portainer-style operations the same warning applies:
- deleting volumes is destructive;
- maintenance should run through the application command, not through manual table purges.
Rollback And Restore Considerations
- Retention changes are destructive when apply mode runs.
- Keep a PostgreSQL backup before enabling scheduled apply in production.
- If cleanup removes too much data, recovery is restore-based, not “undo last delete.”
- Favor dry-run, smaller batch sizes, and reviewed retention values before long-running scheduled apply.
Safe Operator Flow
- Audit table size and row ages with the SQL above.
- Run dry-run locally or in Compose.
- Review protected counts and candidate counts in JSON output.
- Enable HLL_DB_MAINTENANCE_ENABLED=true only after dry-run review.
- Monitor historical-runner logs for scheduler events and cleanup completion.