This commit is contained in:
devRaGonSa
2026-06-05 16:57:25 +02:00
commit 0da8338ba8
310 changed files with 45849 additions and 0 deletions

View File

@@ -0,0 +1,246 @@
# Monthly Player Ranking Data Audit
## Validation Date
- 2026-03-24
## Scope
Auditoria tecnica del estado real de datos para un futuro ranking mensual de
"mejores jugadores" usando:
- codigo y esquema historico del backend
- persistencia local en `backend/data/hll_vietnam_dev.sqlite3`
- snapshots historicos ya generados en `backend/data/snapshots/`
- discovery ya documentada de la fuente CRCON/scoreboard
No se implementa todavia ninguna formula de ranking, tabla nueva ni cambio de
UI.
## Evidence Reviewed
- `backend/app/historical_models.py`
- `backend/app/historical_storage.py`
- `backend/app/historical_ingestion.py`
- `backend/app/historical_snapshots.py`
- `backend/app/historical_snapshot_storage.py`
- `backend/app/payloads.py`
- `docs/historical-domain-model.md`
- `docs/historical-data-quality-notes.md`
- `docs/historical-crcon-source-discovery.md`
- `docs/historical-coverage-report.md`
## Current Persisted State
Local SQLite currently contains:
- `historical_servers`: `3`
- `historical_matches`: `9638`
- `historical_players`: `163506`
- `historical_player_match_stats`: `1062244`
- `historical_ingestion_runs`: `32`
Coverage visible in the local database today:
- `comunidad-hispana-01`: `8602` matches, from `2024-05-17T20:48:40Z` to `2026-03-23T16:01:20Z`
- `comunidad-hispana-02`: `753` matches, from `2025-11-04T17:10:19Z` to `2026-03-23T18:58:06Z`
- `comunidad-hispana-03`: `283` matches, from `2026-01-14T22:34:18Z` to `2026-03-08T18:11:52Z`
Important quality notes from the local dataset:
- all `historical_player_match_stats` rows have populated values for kills,
deaths, teamkills, time, KPM, KDA, combat, offense, defense, support, level
and team side
- `85,270 / 163,506` players have SteamID; the rest currently depend on
`crcon-player:*` identity, so identity continuity is usable but not equally
strong for every player
- all persisted matches have start/end timestamps, map and game mode
- `7,961 / 9,638` persisted matches currently have both allied/axis score
## What Is Persisted Today
### Match level
Persisted per match:
- server
- external match id
- creation/start/end timestamps
- map name, pretty name, game mode, image
- allied score
- axis score
Not persisted at match level:
- raw full CRCON JSON payload
- derived win/loss per player
- any tactical event ledger
### Player identity level
Persisted per player:
- stable player key
- display name
- SteamID when available
- source player id
- first seen / last seen
### Player per match level
Persisted per player-match row:
- level
- team side
- kills
- deaths
- teamkills
- time seconds
- kills per minute
- deaths per minute
- kill/death ratio
- combat
- offense
- defense
- support
## What Exists In CRCON Source But Is Not Persisted
The documented CRCON detail payload already exposes fields that the project does
not currently store:
- `kills_by_type`
- `kills_streak`
- `longest_life_secs`
- `shortest_life_secs`
- `most_killed`
- `death_by`
- `weapons`
- `death_by_weapons`
These fields are visible in the source discovery, but the current upsert logic
only persists the smaller normalized subset listed above.
## What Was Not Confirmed As Available
The current repository evidence does not confirm any stable source fields for:
- garrisons destroyed
- outposts destroyed
- direct duel history in a structured reusable form
- tactical actions such as node building, dismantling or commander abilities
For direct encounters, the source does expose `most_killed` and `death_by`, but
that is not the same thing as a complete duel graph and is not stored today.
## Availability And Reliability Matrix
| Metric / signal | Exists in source | Persisted today | Reliability for ranking | Extra work | V1? |
| --- | --- | --- | --- | --- | --- |
| Kills | Yes | Yes | High | None | Yes |
| Deaths | Yes | Yes | High | None | Yes |
| Support | Yes | Yes | High | None | Yes |
| Combat | Yes | Yes | Medium-High | Query only | Maybe |
| Offense | Yes | Yes | Medium-High | Query only | Maybe |
| Defense | Yes | Yes | Medium-High | Query only | Maybe |
| Teamkills | Yes | Yes | High as penalty signal | Query only | Maybe |
| Match count | Yes | Derivable | High | Query only | Yes |
| Time played | Yes | Yes | High | Query only | Yes |
| KPM | Yes | Yes | Medium-High if computed from totals, lower if averaging raw per-match KPM | Query only | Yes |
| KDA / KD ratio | Yes | Yes | Medium-High if computed from totals, lower if averaging raw per-match KDA | Query only | Yes |
| 100+ kill matches | Derivable | Exposed in leaderboard | Medium | None | No |
| Win/loss context | Partially | Derivable from team side + scores when scores exist | Medium | Query and validation | Maybe |
| Weapons profile | Yes | No | Medium-Low for V1 | New persistence/modeling | No |
| Kill streak / life metrics | Yes | No | Medium-Low for V1 | New persistence/modeling | No |
| Direct encounters / duels | Partial only | No | Low today | New extraction plus modeling | No |
| Garrisons destroyed | Not confirmed | No | Unknown | Source validation first | No |
| OPs destroyed | Not confirmed | No | Unknown | Source validation first | No |
| Tactical impact composite | Partial proxies only | Partial | Medium after design work | Query/design | No for strict V1 |
## Current Product Readiness
The backend is already able to expose monthly leaderboard snapshots, but only
for these metrics:
- `kills`
- `deaths`
- `support`
- `matches_over_100_kills`
This means:
- the project already supports a monthly ranking surface operationally
- the current ranking surface is narrower than the real data persisted in SQLite
- offense, defense, combat, KPM and KDA are available in the database but not
yet wired as first-class monthly leaderboard metrics
## Recommendation For Ranking V1
A realistic V1 should use only metrics already persisted with strong coverage
and low modeling risk:
- total kills
- total support
- KPM recomputed from `SUM(kills) / SUM(time_seconds)`
- KDA recomputed from `SUM(kills) / NULLIF(SUM(deaths), 0)`
- minimum participation gate based on matches played and/or minutes played
- optional small penalty for teamkills
Why this is the safest V1:
- no new ingestion is required
- all needed raw fields already exist locally
- the ranking can avoid inflated outliers by requiring minimum activity
- KPM and KDA become more defensible when derived from totals, not from average
of precomputed per-match ratios
## Recommendation For Ranking V2
A stronger V2 can expand the model with already persisted but not yet surfaced
signals:
- offense
- defense
- combat
- win/loss context derived from player side and match result when scores exist
V2 may also evaluate source-only fields if a later task decides to persist them:
- weapons-based detail
- kill streak and life-span signals
- partial rivalry/encounter signals from `most_killed` and `death_by`
## Metrics Not Recommended For Early Use
Not recommended for V1 and not yet defensible for a serious monthly ranking:
- garrisons destroyed
- OPs destroyed
- duel ranking
- generic "impact in match" as a single opaque score
Reason:
- either the source availability is not confirmed
- or the source exists but the project does not yet persist enough structure to
make the metric auditable and stable
## Final Conclusion
The repository already has enough persisted historical data for a credible
monthly Top 3 V1 without touching ingestion:
- kills
- support
- time played
- deaths
- teamkills
- offense
- defense
- combat
The most realistic first release is a constrained monthly ranking based on
volume plus efficiency, using only persisted fields and explicit participation
thresholds. Tactical metrics such as garrisons, OPs and real duel graphs should
stay out of scope until the source is revalidated and the missing structures are
persisted deliberately.