Fix

2026-06-05 16:57:25 +02:00
commit 0da8338ba8
310 changed files with 45849 additions and 0 deletions
--- a/docs/monthly-player-ranking-data-audit.md
+++ b/docs/monthly-player-ranking-data-audit.md
@@ -0,0 +1,246 @@
+# Monthly Player Ranking Data Audit
+
+## Validation Date
+
+- 2026-03-24
+
+## Scope
+
+Technical audit of the actual state of the data for a future monthly
+"best players" ranking, using:
+
+- the backend's historical code and schema
+- local persistence in `backend/data/hll_vietnam_dev.sqlite3`
+- historical snapshots already generated in `backend/data/snapshots/`
+- the already documented discovery of the CRCON/scoreboard source
+
+No ranking formula, new table or UI change is implemented yet.
+
+## Evidence Reviewed
+
+- `backend/app/historical_models.py`
+- `backend/app/historical_storage.py`
+- `backend/app/historical_ingestion.py`
+- `backend/app/historical_snapshots.py`
+- `backend/app/historical_snapshot_storage.py`
+- `backend/app/payloads.py`
+- `docs/historical-domain-model.md`
+- `docs/historical-data-quality-notes.md`
+- `docs/historical-crcon-source-discovery.md`
+- `docs/historical-coverage-report.md`
+
+## Current Persisted State
+
+Local SQLite currently contains:
+
+- `historical_servers`: `3`
+- `historical_matches`: `9638`
+- `historical_players`: `163506`
+- `historical_player_match_stats`: `1062244`
+- `historical_ingestion_runs`: `32`
+
+Coverage visible in the local database today:
+
+- `comunidad-hispana-01`: `8602` matches, from `2024-05-17T20:48:40Z` to `2026-03-23T16:01:20Z`
+- `comunidad-hispana-02`: `753` matches, from `2025-11-04T17:10:19Z` to `2026-03-23T18:58:06Z`
+- `comunidad-hispana-03`: `283` matches, from `2026-01-14T22:34:18Z` to `2026-03-08T18:11:52Z`
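
A per-server coverage summary like the one above can be reproduced with a single grouped query. The sketch below is illustrative only: the `server_slug` and `started_at` column names and the flattened table layout are assumptions, not the schema defined in `backend/app/historical_storage.py`, and it runs against an in-memory database seeded with three sample rows.

```python
import sqlite3

# Hypothetical, simplified schema; the real column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE historical_matches (
    id INTEGER PRIMARY KEY,
    server_slug TEXT NOT NULL,
    started_at TEXT NOT NULL  -- ISO 8601 UTC, so text order is time order
);
INSERT INTO historical_matches (server_slug, started_at) VALUES
    ('comunidad-hispana-01', '2024-05-17T20:48:40Z'),
    ('comunidad-hispana-01', '2026-03-23T16:01:20Z'),
    ('comunidad-hispana-02', '2025-11-04T17:10:19Z');
""")


def coverage_by_server(conn):
    """Return (server, match_count, first_start, last_start) per server."""
    return conn.execute("""
        SELECT server_slug, COUNT(*), MIN(started_at), MAX(started_at)
        FROM historical_matches
        GROUP BY server_slug
        ORDER BY server_slug
    """).fetchall()


for row in coverage_by_server(conn):
    print(row)
```

Because the timestamps are stored as uniform ISO 8601 strings, `MIN` and `MAX` on the text column give the earliest and latest match without any date parsing.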
+
+Important quality notes from the local dataset:
+
+- all `historical_player_match_stats` rows have populated values for kills,
+  deaths, teamkills, time, KPM, KDA, combat, offense, defense, support, level
+  and team side
+- `85,270 / 163,506` players have SteamID; the rest currently depend on
+  `crcon-player:*` identity, so identity continuity is usable but not equally
+  strong for every player
+- all persisted matches have start/end timestamps, map and game mode
+- `7,961 / 9,638` persisted matches currently have both allied and axis scores
+
+## What Is Persisted Today
+
+### Match level
+
+Persisted per match:
+
+- server
+- external match id
+- creation/start/end timestamps
+- map name, pretty name, game mode, image
+- allied score
+- axis score
+
+Not persisted at match level:
+
+- raw full CRCON JSON payload
+- derived win/loss per player
+- any tactical event ledger
+
+### Player identity level
+
+Persisted per player:
+
+- stable player key
+- display name
+- SteamID when available
+- source player id
+- first seen / last seen
+
+### Player per match level
+
+Persisted per player-match row:
+
+- level
+- team side
+- kills
+- deaths
+- teamkills
+- time seconds
+- kills per minute
+- deaths per minute
+- kill/death ratio
+- combat
+- offense
+- defense
+- support
+
+## What Exists In CRCON Source But Is Not Persisted
+
+The documented CRCON detail payload already exposes fields that the project does
+not currently store:
+
+- `kills_by_type`
+- `kills_streak`
+- `longest_life_secs`
+- `shortest_life_secs`
+- `most_killed`
+- `death_by`
+- `weapons`
+- `death_by_weapons`
+
+These fields are visible in the source discovery, but the current upsert logic
+only persists the smaller normalized subset listed above.
+
+## What Was Not Confirmed As Available
+
+The current repository evidence does not confirm any stable source fields for:
+
+- garrisons destroyed
+- outposts destroyed
+- direct duel history in a structured reusable form
+- tactical actions such as node building, dismantling or commander abilities
+
+For direct encounters, the source does expose `most_killed` and `death_by`, but
+that is not the same thing as a complete duel graph and is not stored today.
+
+## Availability And Reliability Matrix
+
+| Metric / signal | Exists in source | Persisted today | Reliability for ranking | Extra work | V1? |
+| --- | --- | --- | --- | --- | --- |
+| Kills | Yes | Yes | High | None | Yes |
+| Deaths | Yes | Yes | High | None | Yes |
+| Support | Yes | Yes | High | None | Yes |
+| Combat | Yes | Yes | Medium-High | Query only | Maybe |
+| Offense | Yes | Yes | Medium-High | Query only | Maybe |
+| Defense | Yes | Yes | Medium-High | Query only | Maybe |
+| Teamkills | Yes | Yes | High as penalty signal | Query only | Maybe |
+| Match count | Yes | Derivable | High | Query only | Yes |
+| Time played | Yes | Yes | High | Query only | Yes |
+| KPM | Yes | Yes | Medium-High if computed from totals, lower if averaging raw per-match KPM | Query only | Yes |
+| KDA / KD ratio | Yes | Yes | Medium-High if computed from totals, lower if averaging raw per-match KDA | Query only | Yes |
+| 100+ kill matches | Derivable | Exposed in leaderboard | Medium | None | No |
+| Win/loss context | Partially | Derivable from team side + scores when scores exist | Medium | Query and validation | Maybe |
+| Weapons profile | Yes | No | Medium-Low for V1 | New persistence/modeling | No |
+| Kill streak / life metrics | Yes | No | Medium-Low for V1 | New persistence/modeling | No |
+| Direct encounters / duels | Partial only | No | Low today | New extraction plus modeling | No |
+| Garrisons destroyed | Not confirmed | No | Unknown | Source validation first | No |
+| OPs destroyed | Not confirmed | No | Unknown | Source validation first | No |
+| Tactical impact composite | Partial proxies only | Partial | Medium after design work | Query/design | No for strict V1 |
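
The KPM and KDA rows rate totals-based ratios above averaged per-match ratios. A small worked example shows why: a short match with a high ratio dominates a plain average, while the totals-based figure weights each match by time played. The two matches below are invented sample data and the helper names are hypothetical.

```python
# Two hypothetical matches for one player: (kills, time_seconds).
matches = [(2, 60), (30, 3600)]


def kpm_from_totals(rows):
    """Kills per minute computed from summed kills and summed seconds."""
    kills = sum(k for k, _ in rows)
    seconds = sum(s for _, s in rows)
    return kills * 60.0 / seconds if seconds else 0.0


def kpm_mean_of_ratios(rows):
    """Plain average of per-match KPM values, ignoring match length."""
    ratios = [k * 60.0 / s for k, s in rows if s]
    return sum(ratios) / len(ratios) if ratios else 0.0


print(kpm_from_totals(matches))     # 32 kills over 61 minutes
print(kpm_mean_of_ratios(matches))  # mean of 2.0 and 0.5
```

Here the one-minute match contributes a per-match KPM of 2.0, which pulls the plain average to 1.25 even though the player averaged roughly half a kill per minute over the whole sample.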
+
+## Current Product Readiness
+
+The backend is already able to expose monthly leaderboard snapshots, but only
+for these metrics:
+
+- `kills`
+- `deaths`
+- `support`
+- `matches_over_100_kills`
+
+This means:
+
+- the project already supports a monthly ranking surface operationally
+- the current ranking surface is narrower than the real data persisted in SQLite
+- offense, defense, combat, KPM and KDA are available in the database but not
+  yet wired as first-class monthly leaderboard metrics
+
+## Recommendation For Ranking V1
+
+A realistic V1 should use only metrics already persisted with strong coverage
+and low modeling risk:
+
+- total kills
+- total support
+- KPM recomputed from `SUM(kills) * 60.0 / NULLIF(SUM(time_seconds), 0)`
+- KDA recomputed from `SUM(kills) / NULLIF(SUM(deaths), 0)`
+- minimum participation gate based on matches played and/or minutes played
+- optional small penalty for teamkills
+
+Why this is the safest V1:
+
+- no new ingestion is required
+- all needed raw fields already exist locally
+- the ranking can avoid inflated outliers by requiring minimum activity
+- KPM and KDA become more defensible when derived from totals, not from an
+  average of precomputed per-match ratios
+
+## Recommendation For Ranking V2
+
+A stronger V2 can expand the model with already persisted but not yet surfaced
+signals:
+
+- offense
+- defense
+- combat
+- win/loss context derived from player side and match result when scores exist
+
+V2 may also evaluate source-only fields if a later task decides to persist them:
+
+- weapons-based detail
+- kill streak and life-span signals
+- partial rivalry/encounter signals from `most_killed` and `death_by`
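
The win/loss context mentioned above can only be derived when both scores exist, which the audit reports for `7,961` of `9,638` matches. A minimal sketch of that derivation follows. The `"allies"` and `"axis"` side labels and the `player_result` helper are assumptions for illustration; the values actually stored in the team side column should be checked before relying on this.

```python
from typing import Optional


def player_result(
    team_side: Optional[str],
    allied_score: Optional[int],
    axis_score: Optional[int],
) -> Optional[str]:
    """Return 'win', 'loss' or 'draw', or None when it cannot be derived."""
    if allied_score is None or axis_score is None:
        return None  # match has no usable score pair
    side = (team_side or "").strip().lower()
    if side not in ("allies", "axis"):  # assumed labels
        return None
    if allied_score == axis_score:
        return "draw"
    allies_won = allied_score > axis_score
    return "win" if allies_won == (side == "allies") else "loss"


print(player_result("allies", 5, 0))
print(player_result("axis", None, 3))
```

Returning `None` rather than guessing keeps matches without scores out of any win-rate denominator, so the metric stays auditable.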
+
+## Metrics Not Recommended For Early Use
+
+Not recommended for V1 and not yet defensible for a serious monthly ranking:
+
+- garrisons destroyed
+- OPs destroyed
+- duel ranking
+- generic "impact in match" as a single opaque score
+
+Reason:
+
+- either the source availability is not confirmed
+- or the source exists but the project does not yet persist enough structure to
+  make the metric auditable and stable
+
+## Final Conclusion
+
+The repository already has enough persisted historical data for a credible
+monthly Top 3 V1 without touching ingestion:
+
+- kills
+- support
+- time played
+- deaths
+- teamkills
+- offense
+- defense
+- combat
+
+The most realistic first release is a constrained monthly ranking based on
+volume plus efficiency, using only persisted fields and explicit participation
+thresholds. Tactical metrics such as garrisons, OPs and real duel graphs should
+stay out of scope until the source is revalidated and the missing structures are
+persisted deliberately.