The empirical evidence behind the Oracle's calibration claim, computed on a held-out 2023+ slice (1,720 weekend × symbol observations: 172 weekends × 10 tickers). All numbers are re-derivable from the public repo via scripts/run_calibration.py; the methodology evolution log is at reports/methodology_history.md.
For every consumer target τ from 0.50 to 0.99 in 0.01 increments, we serve the deployed v2 / M5 Oracle on the OOS panel and compute realised coverage. A perfectly calibrated band lies on the diagonal. The four anchor points (0.68, 0.85, 0.95, 0.99) are where the deployed schedules C_BUMP_SCHEDULE and DELTA_SHIFT_SCHEDULE are tuned; off-grid targets are linearly interpolated between those anchors.
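A minimal sketch of how this curve is computed, assuming a `serve_band(tau, row)` hook that stands in for the deployed Oracle and a `monday_open` column for the realised open (both names are illustrative, not the repo's); the anchor values shown in `C_BUMP_SCHEDULE` are likewise made up:

```python
import numpy as np
import pandas as pd

# Illustrative anchor values only; the deployed schedules live in the repo.
C_BUMP_SCHEDULE = {0.68: 1.00, 0.85: 1.35, 0.95: 1.80, 0.99: 2.60}

def schedule_value(tau: float, schedule: dict) -> float:
    """Off-grid targets are linearly interpolated between the tuned anchors."""
    anchors = sorted(schedule)
    return float(np.interp(tau, anchors, [schedule[a] for a in anchors]))

def coverage_curve(panel: pd.DataFrame, serve_band) -> pd.Series:
    """Realised coverage per consumer target tau over the weekend x symbol panel.

    serve_band(tau, row) stands in for the deployed v2 / M5 Oracle and must
    return (lo, hi); row["monday_open"] is an assumed column name.
    """
    out = {}
    for tau in np.round(np.arange(0.50, 1.00, 0.01), 2):
        inside = []
        for _, row in panel.iterrows():
            lo, hi = serve_band(tau, row)
            inside.append(lo <= row["monday_open"] <= hi)
        out[tau] = float(np.mean(inside))
    return pd.Series(out)  # perfect calibration: value == index at every tau
```

`np.interp` mirrors the off-grid behaviour described above: targets between anchors are blended linearly, and targets outside the anchor range clamp to the nearest anchor.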
Pyth's published conf is documented as publisher dispersion, not a probability of coverage. Read at face value (k = 1.96, a "95% Gaussian wrap"), the band covers ~10% of weekend Monday opens. To match Soothsayer's 95% claim, a consumer needs to scale Pyth's conf by ≈50×, a calibration the consumer must construct themselves. Chainlink during marketStatus = 5 publishes a band of zero width.
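A sketch of the face-value check, assuming the panel carries the last pre-weekend Pyth price/conf pair under hypothetical column names `pyth_price` and `pyth_conf`:

```python
import numpy as np
import pandas as pd

def gaussian_wrap_coverage(panel: pd.DataFrame, k: float = 1.96, scale: float = 1.0) -> float:
    """Coverage of Monday opens by pyth_price ± k * scale * pyth_conf, read off the
    last pre-weekend Pyth update. The column names (pyth_price, pyth_conf,
    monday_open) are assumptions about the panel layout."""
    half_width = k * scale * panel["pyth_conf"]
    inside = (panel["monday_open"] - panel["pyth_price"]).abs() <= half_width
    return float(inside.mean())

# gaussian_wrap_coverage(panel)  # ~0.10 at face value per the dashboard
# Smallest conf multiplier reaching 95% coverage (the ≈50× figure):
# next(s for s in np.arange(1.0, 200.0, 0.5)
#      if gaussian_wrap_coverage(panel, scale=s) >= 0.95)
```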
Per-symbol calibration evidence on the full panel (5,986 weekend × symbol observations spanning 12 years). The leave-one-out validation (D7) refits the calibration surface on the other nine symbols and serves the held-out one through the pooled-fallback path, which is the production code path for tickers with sparse history. The mechanism transfers; a sketch of the loop follows.
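In outline, with `fit_surface` and `serve_pooled_fallback` as hypothetical stand-ins for the repo's fitting and pooled-fallback serving code:

```python
import pandas as pd

def leave_one_symbol_out(panel: pd.DataFrame, fit_surface, serve_pooled_fallback) -> dict:
    """D7-style check: refit the calibration surface without one ticker, then serve
    that ticker through the pooled-fallback path and record realised coverage."""
    results = {}
    for sym in panel["symbol"].unique():     # "symbol" column name is assumed
        train = panel[panel["symbol"] != sym]
        test = panel[panel["symbol"] == sym]
        surface = fit_surface(train)         # surface fit on the other symbols only
        results[sym] = serve_pooled_fallback(surface, test, tau=0.95)
    return results                           # per-symbol OOS coverage at the 95% target
```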
Window size, train-test split timing, and held-out symbols all produce coverage within ±3pp of the deployed value. Window = 156 is the only window length that simultaneously passes the Kupiec test at α = 0.05 on all three targets, making the choice empirically defensible rather than arbitrary.
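For reference, the Kupiec proportion-of-failures test of unconditional coverage can be implemented in a few lines; the counts in the usage comment are hypothetical:

```python
import numpy as np
from scipy.stats import chi2

def kupiec_pof(n_obs: int, n_hits: int, target: float, alpha: float = 0.05):
    """Kupiec proportion-of-failures LR test of unconditional coverage.
    H0: the true hit probability equals the claimed target; LR ~ chi-square(1)."""
    pi_hat = n_hits / n_obs

    def loglik(p):
        # Binomial log-likelihood of the observed hit/miss counts under rate p.
        return (n_obs - n_hits) * np.log(1.0 - p) + n_hits * np.log(p)

    lr = -2.0 * (loglik(target) - loglik(pi_hat))
    return lr, bool(lr < chi2.ppf(1.0 - alpha, df=1))

# Hypothetical counts: kupiec_pof(1720, 1642, 0.95) -> LR ≈ 0.8, passes at α = 0.05.
```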
Every claim on this dashboard can be re-derived from the public repository. The methodology evolution log records every decision, hypothesis, and rejected alternative. The paper drafts under reports/paper1_coverage_inversion/ contain the formal version of these claims and are being prepared for arXiv (q-fin.RM) and ACM AFT.