METHODOLOGY
Short version: EWA measures whether a player's minutes moved his team closer to winning after accounting for who else was on the floor. We test that by hiding future seasons, training on older games, and checking whether EWA beats simpler baselines. The friendlier overview is on /about.
EWA asks a simple question: when this player was on the floor, did his possessions move his team closer to winning after accounting for teammates and opponents?
For each season, we trained the prediction layer only on older games, hid the next season, and checked whether EWA beat simpler baselines.
Across four seasons, EWA picked winners better than a model that only knows team strength. Its probabilities were also better in every fold.
Vegas averaged 67.7% accuracy; EWA averaged 59.2%. That gap is expected because markets use injuries, line movement, and sharp action.
In plain English: EWA picked winners +3.40 percentage points better than team-only on average across four seasons, and the direction was positive in every fold. The stricter probability grades also improved: Brier by 3.58% (CI excludes zero in 4/4 folds), log-loss by 2.66% (4/4), and margin error by 1.59% (3/4). Market odds averaged 67.7% accuracy, so EWA is useful signal, not a Vegas substitute.
This is the audit version of the sentence above. The 2024-25 fold (n_train = 5,822, n_test = 401) is shown as a representative slice. Brier and log-loss are stricter ways to grade probabilities: lower is better. Margin RMSE grades the spread: lower is better. Accuracy is the simple "did it pick the winner?" number: higher is better. Bracketed numbers are 95% bootstrap CIs.
| Model | Brier | Accuracy | Margin RMSE |
|---|---|---|---|
| Naive (50/50) | 0.2500 [0.250, 0.250] | 50% (expected) | 15.75 [14.7, 16.7] |
| Home court only | 0.2456 [0.241, 0.251] | 56.9% [51.9, 61.6] | 15.58 [14.5, 16.5] |
| Team identity (no players) | 0.2451 [0.240, 0.251] | 58.1% [53.4, 62.8] | 15.56 [14.5, 16.6] |
| EWA (roster-aware) | 0.2365 [0.228, 0.244] | 59.4% [54.4, 64.3] | 15.31 [14.2, 16.3] |
| Market (Vegas, de-vigged) | 0.2011 [0.184, 0.218] | 67.3% [62.6, 72.1] | N/A |
Market is included as benchmark/context. The accuracy gap (~8 pp pooled across folds) reflects information EWA does not use — line movement, sharp action, real-time injuries. We don't try to close it on this page.
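For readers who want the grading rules pinned down, here is a minimal sketch of the four metrics in Python, assuming arrays of home-win probabilities, binary outcomes, and point margins. The function names are illustrative, not the harness's actual API.

```python
import numpy as np

def brier(p_home, home_won):
    """Mean squared error of the home-win probability. Lower is better."""
    p, y = np.asarray(p_home), np.asarray(home_won, dtype=float)
    return np.mean((p - y) ** 2)

def log_loss(p_home, home_won, eps=1e-12):
    """Negative mean log-likelihood of the outcomes. Lower is better."""
    p = np.clip(np.asarray(p_home), eps, 1 - eps)
    y = np.asarray(home_won, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def accuracy(p_home, home_won):
    """Share of games where the favored side won. Higher is better."""
    p, y = np.asarray(p_home), np.asarray(home_won, dtype=bool)
    return np.mean((p > 0.5) == y)

def margin_rmse(pred_margin, actual_margin):
    """Root-mean-square error of the predicted margin. Lower is better."""
    d = np.asarray(pred_margin) - np.asarray(actual_margin)
    return np.sqrt(np.mean(d ** 2))
```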
Across the 4 folds, here's how often EWA's improvement over team-only is statistically distinguishable from zero. Significance is judged by paired-bootstrap 95% CIs computed within each individual fold (1,000 resamples, n ≈ 400-440 games per fold).
Each row is an independent chronological fold: train strictly on games from prior seasons, test on one season's odds-matched games. The pattern holds across all four cutoffs — Brier and log-loss CI-exclude zero in 4/4 folds, margin RMSE in 3/4. Same direction, same approximate magnitude, every time.
| Test season | n_train | n_test | EWA acc | Market acc | Δ Brier | Δ Log-loss | Δ RMSE |
|---|---|---|---|---|---|---|---|
| 2021-22 | 2,136 | 417 | 59.2% | 69.8% | +3.95% ✓ | +2.88% ✓ | +1.82% ✓ |
| 2022-23 | 3,366 | 404 | 60.9% | 64.4% | +3.11% ✓ | +2.34% ✓ | +1.23% ✗ |
| 2023-24 | 4,593 | 438 | 57.1% | 69.2% | +3.73% ✓ | +2.79% ✓ | +1.71% ✓ |
| 2024-25 | 5,822 | 401 | 59.4% | 67.3% | +3.51% ✓ | +2.63% ✓ | +1.61% ✓ |
✓ marks deltas whose 95% CI excludes zero within that fold. The five-model comparison above uses the most recent fold (2024-25); the other three cutoffs show the same shape. The roster-aware improvement is not a single-cutoff artifact.
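The significance machinery is a standard paired bootstrap. A minimal sketch, assuming per-game score arrays (e.g. squared errors for Brier) for two models evaluated on the same held-out games; the names and fixed seed are illustrative.

```python
import numpy as np

def paired_bootstrap_ci(per_game_a, per_game_b, n_boot=1000, alpha=0.05, seed=0):
    """95% CI for mean(A) - mean(B). Resampling the same game indices for
    both models keeps the test paired, so game difficulty cancels out."""
    a, b = np.asarray(per_game_a), np.asarray(per_game_b)
    rng = np.random.default_rng(seed)
    deltas = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, len(a), size=len(a))  # resample with replacement
        deltas[i] = a[idx].mean() - b[idx].mean()
    lo, hi = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    return lo, hi  # a CI that excludes zero earns the ✓ above
```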
We use a roster-aware recent-usage aggregate, defaulting to each team's last 30 games. Sensitivity checks across 15 / 30 / 45 / 60 games show the EWA signal is strongest in recent windows and fades as older roster usage is included — consistent with roster drift over time. The default of 30 was set as a disciplined mid-window value, not because it dominates any single metric.
| Window N (games) | EWA Brier | Δ Brier | Δ Log-loss | Δ Margin RMSE |
|---|---|---|---|---|
| 15 | 0.2438 | +2.81% ✓ | +2.11% ✓ | +1.24% ✓ |
| 30 (default) | 0.2449 | +2.39% ✓ | +1.79% ✓ | +0.97% ✓ |
| 45 | 0.2473 | +1.44% ✗ | +1.08% ✗ | +0.62% ✓ |
| 60 | 0.2478 | +1.24% ✗ | +0.93% ✗ | +0.55% ✓ |
✓ marks deltas whose 95% CI excludes zero. The story is robust across recent windows: at N = 15 and N = 30, all three deltas (Brier, log-loss, margin RMSE) are statistically distinguishable from zero. At N = 45 and 60 the aggregate grows stale and only margin RMSE remains significant. We publish at the default window rather than the best-on-test window.
When EWA says a team has a 65% chance to win, do they actually win about 65% of the time? Each dot below is a probability bin from the held-out games — predicted on the x-axis, actual win rate on the y-axis. Perfect calibration is the dashed diagonal. Dot size shows games per bin.
Central bins are the populated ones in this fold (n = 93, 181, 128, 22). Calibration drifts a little at the high end on this 438-game test set — fewer games per bin means more sampling noise. We treat calibration as a property to monitor across runs, not a single number.
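Binning is the only computation behind that plot. A minimal sketch, with illustrative names:

```python
import numpy as np

def calibration_bins(p_home, home_won, n_bins=10):
    """Return (mean predicted, actual win rate, games) per probability bin.
    Empty bins are skipped rather than plotted as noise."""
    p = np.asarray(p_home)
    y = np.asarray(home_won, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p >= lo) & ((p < hi) if hi < 1.0 else (p <= hi))
        if mask.any():
            rows.append((p[mask].mean(), y[mask].mean(), int(mask.sum())))
    return rows
```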
The simplest impact stat is raw plus-minus — point differential while a player is on the court. It looks honest and breaks immediately. In recent seasons, players like Payton Pritchard and Luke Kornet have posted higher raw on-court plus-minus than Stephen Curry, Giannis Antetokounmpo, and Luka Dončić. Not because they generate more impact — because they happen to share the floor with stars on winning teams.
Ridge regression with player-level controls is what fixes this. EWA splits credit in a way that controls for teammates and opponents, so a strong rotation player on a great team doesn't inherit his teammates' impact. That's the attribution layer. Shrinkage then ensures small-sample players don't ride a hot streak to the top of the rankings.
Nikola Jokić's rate over the last three seasons is +8.16 EWA / 100 possessions. Decomposed by role, 84% of that comes from assisting, not scoring or rebounding. His best two-man pairing, with Jamal Murray, is worth +1.4 wins added together: strong, but below what you'd expect from stacking their individual numbers. That's the kind of read no box score or single-number metric gives you.
A sequence model trained on play-by-play estimates win probability after every event. The change in win probability across each possession (WPA) is the unit of credit.
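A minimal sketch of that unit of credit, assuming the sequence model's outputs have already been sampled at possession boundaries; the array name is illustrative.

```python
import numpy as np

def possession_wpa(wp_at_boundaries):
    """wp_at_boundaries[i] is the home-win probability at the start of
    possession i, with a final entry for the end of the game (0.0 or 1.0).
    Each possession's credit is the change in win probability across it,
    so high-leverage possessions naturally earn larger absolute credit."""
    wp = np.asarray(wp_at_boundaries, dtype=float)
    return np.diff(wp)  # one signed WPA value per possession
```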
A regularized regression splits each possession’s WPA across the ten players on court while controlling for teammates, opponents, and home court. This is the regularized adjusted plus-minus tradition (Sill 2010), with role-aware interactions added on top.
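In matrix form this is the familiar RAPM setup. A minimal scikit-learn sketch that omits the role-aware interaction columns; the names are illustrative, and the alpha is one value from the sweep range mentioned in the reproducibility notes below.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_attribution(on_court, wpa, alpha=5000.0):
    """on_court is (possessions x players): +1 if the player is on the floor
    for the home team, -1 for the away team, 0 if off. wpa is the signed
    per-possession win-probability change toward the home team. The
    unpenalized intercept absorbs home-court advantage."""
    model = Ridge(alpha=alpha, fit_intercept=True)
    model.fit(np.asarray(on_court), np.asarray(wpa))
    return model.coef_, model.intercept_  # player effects, home-court term
```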
Players with few possessions get pulled toward the population mean by both a count-based shrinkage (count / (count + k)) and an Empirical Bayes step. This is what keeps a 100-possession rookie from showing up next to Jokić on the leaderboard.
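A minimal sketch of the count-based half of that shrinkage; the constant k is illustrative, not the production value, and the Empirical Bayes step is omitted.

```python
import numpy as np

def shrink_to_mean(raw, counts, k=2000.0):
    """weight = count / (count + k): with k = 2000, a 100-possession rookie
    keeps ~5% of his raw estimate while a 5,000-possession starter keeps
    ~71%. Everyone else is pulled toward the population mean."""
    raw = np.asarray(raw, dtype=float)
    counts = np.asarray(counts, dtype=float)
    w = counts / (counts + k)
    return w * raw + (1.0 - w) * raw.mean()
```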
For game prediction, per-team EWA aggregates use each team's most recent 30 train games — not a static average across the whole training period. This keeps the predictor honest about mid-season trades and roster turnover.
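A minimal sketch of that rolling aggregate with pandas; the column names are illustrative, not the pipeline's actual schema.

```python
import pandas as pd

def recent_team_ewa(train_games: pd.DataFrame, team: str, n_recent: int = 30) -> float:
    """Average a team's EWA aggregate over its most recent n_recent
    *training* games only, so trades and rotation changes show up quickly
    instead of being diluted by a season-long average."""
    g = train_games.loc[train_games["team"] == team].sort_values("date")
    return g.tail(n_recent)["team_ewa"].mean()
```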
“Estimated Wins Added” was used in the early 2000s by John Hollinger as a linear function of PER: (PER − 11) × Minutes / 67. That formulation is no longer maintained and was box-score derived, with no possession context, no role decomposition, and no shrinkage.
The version of EWA on Alleygorithm shares the goal — per-game wins added above the league baseline — but the methodology is fundamentally different. Ridge regression on possession-level win-probability change (WPA), with role-aware decomposition and empirical-Bayes shrinkage. Same destination, modern math. Treat the acronym the way the field treats “WAR”: a category, multiple flavors, judged on methodology and out-of-sample performance.
EWA isn't a new technique. It's an honest reassembly of established methods with a transparent validation harness on top.
- Regularized adjusted plus-minus via ridge regression. The base technique behind EWA's attribution layer.
- Possession-level win-probability swings as a credit signal. EWA inherits this framing rather than the raw point-differential one.
- Statistical / Box Plus-Minus. Where role and box-stat information enter as priors. EWA's role-aware interactions are in this tradition.
- EPM and DARKO, the two strongest public predictive metrics. EWA borrows their commitment to chronological holdout testing and roster-aware aggregation.
Reading these openly is the price of asking you to trust the rest. Every limitation below is on the roadmap and labeled in our internal validation reports.
The validation code is open and runnable. The numbers above came from scripts/validate_pregame_prediction.py with --recent-games-per-team 30 on a chronological holdout. The window-sensitivity sweep ran via scripts/sweep_recent_games_window.sh. The attribution math lives in unified_scores.py.
- Sweep ridge alpha (2,500 / 5,000 / 7,500 / 10,000) and bootstrap seeds across the 4 rolling-origin folds, demonstrating the result is not a single-hyperparameter or single-seed artifact.
- Replace per-team possession averages with per-player rolling minute estimates. Closes part of the gap to EPM/DARKO's richer minute models.
- Counterfactual calculator: "if Player X is out, the lineup loses N wins added." The most direct expression of EWA's player-level attribution and the natural foundation for a paid analytics tier.
- Daily-refreshed pregame projections that incorporate the day's active rosters and inactives. Today's harness uses recent training data; the live layer uses recent live data.
- Retrain the win-probability model with a strict cutoff before each test window so the WPA labels themselves are leakage-free. The current harness uses the production WP model and discloses that limitation; this closes it.
Plus/minus measures point differential while you're on court. EWA measures how much each possession changed win probability — weighting high-leverage moments more — and then splits credit fairly via ridge regression. Plus/minus conflates your impact with your teammates'.
EWA captures context. A star on a dominant team faces fewer high-leverage possessions because the game state is already stable. The public scores also apply shrinkage, so lower-volume players get pulled toward the middle.
Score artifacts refresh on a daily cadence; the underlying win-probability model is retrained on a slower review cycle. The footer shows the most recent promoted run currently being served.
The market appears as a fifth model in our validation table: we have multi-season de-vigged moneylines for 1,954 NBA games matched cleanly to game IDs. Across all 4 rolling-origin folds, market accuracy averages ~67.7%; roster-aware EWA averages ~59.2%. The ~8.5 pp gap is real and reflects information markets have that we don't (sharp action, line movement, real-time injuries). We report it as a benchmark, not a target.
Those are the four most recent NBA seasons where we have both play-by-play data and de-vigged pregame moneylines, and where each fold has a strictly older training set available. The pattern (Brier and log-loss CIs excluding zero, margin RMSE excluding zero in 3/4) holds across every fold tested.
Yes — that's what /predictions is. Every game we predict, you can see what the model said and (after the game) whether it called the winner. Across the four published rolling-origin folds, EWA accuracy averages 59.2%; the de-vigged Vegas market averages 67.7%. EWA beats team-only baselines but doesn't approach the market — Vegas has information we don't (sharp action, line movement, real-time injuries). The page tracks the model's live record so you can see exactly how it's doing.