Fix
This commit is contained in:
246
docs/monthly-player-ranking-data-audit.md
Normal file
246
docs/monthly-player-ranking-data-audit.md
Normal file
@@ -0,0 +1,246 @@
|
||||
# Monthly Player Ranking Data Audit
|
||||
|
||||
## Validation Date
|
||||
|
||||
- 2026-03-24
|
||||
|
||||
## Scope
|
||||
|
||||
Auditoria tecnica del estado real de datos para un futuro ranking mensual de
|
||||
"mejores jugadores" usando:
|
||||
|
||||
- codigo y esquema historico del backend
|
||||
- persistencia local en `backend/data/hll_vietnam_dev.sqlite3`
|
||||
- snapshots historicos ya generados en `backend/data/snapshots/`
|
||||
- discovery ya documentada de la fuente CRCON/scoreboard
|
||||
|
||||
No se implementa todavia ninguna formula de ranking, tabla nueva ni cambio de
|
||||
UI.
|
||||
|
||||
## Evidence Reviewed
|
||||
|
||||
- `backend/app/historical_models.py`
|
||||
- `backend/app/historical_storage.py`
|
||||
- `backend/app/historical_ingestion.py`
|
||||
- `backend/app/historical_snapshots.py`
|
||||
- `backend/app/historical_snapshot_storage.py`
|
||||
- `backend/app/payloads.py`
|
||||
- `docs/historical-domain-model.md`
|
||||
- `docs/historical-data-quality-notes.md`
|
||||
- `docs/historical-crcon-source-discovery.md`
|
||||
- `docs/historical-coverage-report.md`
|
||||
|
||||
## Current Persisted State
|
||||
|
||||
Local SQLite currently contains:
|
||||
|
||||
- `historical_servers`: `3`
|
||||
- `historical_matches`: `9638`
|
||||
- `historical_players`: `163506`
|
||||
- `historical_player_match_stats`: `1062244`
|
||||
- `historical_ingestion_runs`: `32`
|
||||
|
||||
Coverage visible in the local database today:
|
||||
|
||||
- `comunidad-hispana-01`: `8602` matches, from `2024-05-17T20:48:40Z` to `2026-03-23T16:01:20Z`
|
||||
- `comunidad-hispana-02`: `753` matches, from `2025-11-04T17:10:19Z` to `2026-03-23T18:58:06Z`
|
||||
- `comunidad-hispana-03`: `283` matches, from `2026-01-14T22:34:18Z` to `2026-03-08T18:11:52Z`
|
||||
|
||||
Important quality notes from the local dataset:
|
||||
|
||||
- all `historical_player_match_stats` rows have populated values for kills,
|
||||
deaths, teamkills, time, KPM, KDA, combat, offense, defense, support, level
|
||||
and team side
|
||||
- `85,270 / 163,506` players have SteamID; the rest currently depend on
|
||||
`crcon-player:*` identity, so identity continuity is usable but not equally
|
||||
strong for every player
|
||||
- all persisted matches have start/end timestamps, map and game mode
|
||||
- `7,961 / 9,638` persisted matches currently have both allied/axis score
|
||||
|
||||
## What Is Persisted Today
|
||||
|
||||
### Match level
|
||||
|
||||
Persisted per match:
|
||||
|
||||
- server
|
||||
- external match id
|
||||
- creation/start/end timestamps
|
||||
- map name, pretty name, game mode, image
|
||||
- allied score
|
||||
- axis score
|
||||
|
||||
Not persisted at match level:
|
||||
|
||||
- raw full CRCON JSON payload
|
||||
- derived win/loss per player
|
||||
- any tactical event ledger
|
||||
|
||||
### Player identity level
|
||||
|
||||
Persisted per player:
|
||||
|
||||
- stable player key
|
||||
- display name
|
||||
- SteamID when available
|
||||
- source player id
|
||||
- first seen / last seen
|
||||
|
||||
### Player per match level
|
||||
|
||||
Persisted per player-match row:
|
||||
|
||||
- level
|
||||
- team side
|
||||
- kills
|
||||
- deaths
|
||||
- teamkills
|
||||
- time seconds
|
||||
- kills per minute
|
||||
- deaths per minute
|
||||
- kill/death ratio
|
||||
- combat
|
||||
- offense
|
||||
- defense
|
||||
- support
|
||||
|
||||
## What Exists In CRCON Source But Is Not Persisted
|
||||
|
||||
The documented CRCON detail payload already exposes fields that the project does
|
||||
not currently store:
|
||||
|
||||
- `kills_by_type`
|
||||
- `kills_streak`
|
||||
- `longest_life_secs`
|
||||
- `shortest_life_secs`
|
||||
- `most_killed`
|
||||
- `death_by`
|
||||
- `weapons`
|
||||
- `death_by_weapons`
|
||||
|
||||
These fields are visible in the source discovery, but the current upsert logic
|
||||
only persists the smaller normalized subset listed above.
|
||||
|
||||
## What Was Not Confirmed As Available
|
||||
|
||||
The current repository evidence does not confirm any stable source fields for:
|
||||
|
||||
- garrisons destroyed
|
||||
- outposts destroyed
|
||||
- direct duel history in a structured reusable form
|
||||
- tactical actions such as node building, dismantling or commander abilities
|
||||
|
||||
For direct encounters, the source does expose `most_killed` and `death_by`, but
|
||||
that is not the same thing as a complete duel graph and is not stored today.
|
||||
|
||||
## Availability And Reliability Matrix
|
||||
|
||||
| Metric / signal | Exists in source | Persisted today | Reliability for ranking | Extra work | V1? |
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
| Kills | Yes | Yes | High | None | Yes |
|
||||
| Deaths | Yes | Yes | High | None | Yes |
|
||||
| Support | Yes | Yes | High | None | Yes |
|
||||
| Combat | Yes | Yes | Medium-High | Query only | Maybe |
|
||||
| Offense | Yes | Yes | Medium-High | Query only | Maybe |
|
||||
| Defense | Yes | Yes | Medium-High | Query only | Maybe |
|
||||
| Teamkills | Yes | Yes | High as penalty signal | Query only | Maybe |
|
||||
| Match count | Yes | Derivable | High | Query only | Yes |
|
||||
| Time played | Yes | Yes | High | Query only | Yes |
|
||||
| KPM | Yes | Yes | Medium-High if computed from totals, lower if averaging raw per-match KPM | Query only | Yes |
|
||||
| KDA / KD ratio | Yes | Yes | Medium-High if computed from totals, lower if averaging raw per-match KDA | Query only | Yes |
|
||||
| 100+ kill matches | Derivable | Exposed in leaderboard | Medium | None | No |
|
||||
| Win/loss context | Partially | Derivable from team side + scores when scores exist | Medium | Query and validation | Maybe |
|
||||
| Weapons profile | Yes | No | Medium-Low for V1 | New persistence/modeling | No |
|
||||
| Kill streak / life metrics | Yes | No | Medium-Low for V1 | New persistence/modeling | No |
|
||||
| Direct encounters / duels | Partial only | No | Low today | New extraction plus modeling | No |
|
||||
| Garrisons destroyed | Not confirmed | No | Unknown | Source validation first | No |
|
||||
| OPs destroyed | Not confirmed | No | Unknown | Source validation first | No |
|
||||
| Tactical impact composite | Partial proxies only | Partial | Medium after design work | Query/design | No for strict V1 |
|
||||
|
||||
## Current Product Readiness
|
||||
|
||||
The backend is already able to expose monthly leaderboard snapshots, but only
|
||||
for these metrics:
|
||||
|
||||
- `kills`
|
||||
- `deaths`
|
||||
- `support`
|
||||
- `matches_over_100_kills`
|
||||
|
||||
This means:
|
||||
|
||||
- the project already supports a monthly ranking surface operationally
|
||||
- the current ranking surface is narrower than the real data persisted in SQLite
|
||||
- offense, defense, combat, KPM and KDA are available in the database but not
|
||||
yet wired as first-class monthly leaderboard metrics
|
||||
|
||||
## Recommendation For Ranking V1
|
||||
|
||||
A realistic V1 should use only metrics already persisted with strong coverage
|
||||
and low modeling risk:
|
||||
|
||||
- total kills
|
||||
- total support
|
||||
- KPM recomputed from `SUM(kills) / SUM(time_seconds)`
|
||||
- KDA recomputed from `SUM(kills) / NULLIF(SUM(deaths), 0)`
|
||||
- minimum participation gate based on matches played and/or minutes played
|
||||
- optional small penalty for teamkills
|
||||
|
||||
Why this is the safest V1:
|
||||
|
||||
- no new ingestion is required
|
||||
- all needed raw fields already exist locally
|
||||
- the ranking can avoid inflated outliers by requiring minimum activity
|
||||
- KPM and KDA become more defensible when derived from totals, not from average
|
||||
of precomputed per-match ratios
|
||||
|
||||
## Recommendation For Ranking V2
|
||||
|
||||
A stronger V2 can expand the model with already persisted but not yet surfaced
|
||||
signals:
|
||||
|
||||
- offense
|
||||
- defense
|
||||
- combat
|
||||
- win/loss context derived from player side and match result when scores exist
|
||||
|
||||
V2 may also evaluate source-only fields if a later task decides to persist them:
|
||||
|
||||
- weapons-based detail
|
||||
- kill streak and life-span signals
|
||||
- partial rivalry/encounter signals from `most_killed` and `death_by`
|
||||
|
||||
## Metrics Not Recommended For Early Use
|
||||
|
||||
Not recommended for V1 and not yet defensible for a serious monthly ranking:
|
||||
|
||||
- garrisons destroyed
|
||||
- OPs destroyed
|
||||
- duel ranking
|
||||
- generic "impact in match" as a single opaque score
|
||||
|
||||
Reason:
|
||||
|
||||
- either the source availability is not confirmed
|
||||
- or the source exists but the project does not yet persist enough structure to
|
||||
make the metric auditable and stable
|
||||
|
||||
## Final Conclusion
|
||||
|
||||
The repository already has enough persisted historical data for a credible
|
||||
monthly Top 3 V1 without touching ingestion:
|
||||
|
||||
- kills
|
||||
- support
|
||||
- time played
|
||||
- deaths
|
||||
- teamkills
|
||||
- offense
|
||||
- defense
|
||||
- combat
|
||||
|
||||
The most realistic first release is a constrained monthly ranking based on
|
||||
volume plus efficiency, using only persisted fields and explicit participation
|
||||
thresholds. Tactical metrics such as garrisons, OPs and real duel graphs should
|
||||
stay out of scope until the source is revalidated and the missing structures are
|
||||
persisted deliberately.
|
||||
Reference in New Issue
Block a user