Calibration receipts · OOS 2024+ · 12 yrs of public data

When you ask for a 95% band,
you get a 95% band.

Soothsayer's served band contains Monday's open at the rate it claims. Pyth's published confidence interval, read as a 95% band, contains it ~10% of the time. Chainlink doesn't publish a band at all on weekends. All of this is verifiable on 12 years of public data: the calibration claim is the product.

Figure: Claimed coverage vs realised coverage · 1,720 OOS weekend × symbol observations · 2023+
Series: Soothsayer (sweep across consumer τ) · Pyth + naive ±1.96·conf (read as a 95% CI) · Pyth + consumer-fit ±50·conf (when the consumer back-fits) · Perfect calibration (y = x)
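
A minimal sketch of how the two Pyth series could be reproduced, assuming you hold parallel lists of Friday prices, published conf values, and Monday opens. The function names and the quantile-based back-fit are illustrative assumptions, not Soothsayer's actual pipeline; the page does not state how the ±50 multiplier was derived.

```python
import math

def realised_coverage(prices, half_widths, opens):
    """Fraction of observations whose Monday open lies inside price ± half_width."""
    hits = [abs(o - p) <= h for p, h, o in zip(prices, half_widths, opens)]
    return sum(hits) / len(hits)

def backfit_multiplier(prices, confs, opens, target=0.95):
    """Empirical quantile of |open - price| / conf.

    This is the smallest k (up to float rounding) for which price ± k·conf
    covers at least `target` of the sample. It is fitted in-sample, so it is
    what a consumer would get by back-fitting, not an out-of-sample guarantee.
    """
    ratios = sorted(abs(o - p) / c for p, c, o in zip(prices, confs, opens))
    idx = math.ceil(target * len(ratios)) - 1
    return ratios[idx]

# Toy data (hypothetical, for illustration only).
prices = [100.0] * 10
confs = [0.25] * 10
opens = [100.0 + 0.5 * i for i in range(10)]

naive = realised_coverage(prices, [1.96 * c for c in confs], opens)
k = backfit_multiplier(prices, confs, opens, target=0.95)
fitted = realised_coverage(prices, [k * c for c in confs], opens)
print(f"naive ±1.96·conf coverage: {naive:.2f}")
print(f"back-fit multiplier k: {k:.1f}, coverage at ±k·conf: {fitted:.2f}")
```

Sweeping `target` over a grid and plotting claimed against realised coverage gives a curve of the kind the figure describes.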
Soothsayer at τ = 0.95
95.0%
Held-out 2023+ · Kupiec p_uc = 1.000 · Christoffersen p_ind = 0.485 · 1,720 obs.
Pyth + naive ±1.96·conf
10.2%
If you read Pyth's "95%" CI as a probability statement, it contains Monday's open ~1 time in 10. Pyth's docs explicitly call this an aggregation diagnostic, not a coverage claim.
Chainlink during weekend
no band
100% of weekend observations have bid ≈ 0, ask = 0 during marketStatus = 5. The "uncertainty signal" is binary.
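
For readers who want to check the Kupiec and Christoffersen figures quoted for Soothsayer, here is a minimal standard-library sketch of both tests. The function names are illustrative, and the ordering of the miss sequence across symbols and weekends is an assumption the page does not specify, so the independence result may differ from the published one depending on how observations are ordered.

```python
import math

def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def _chi2_sf_1dof(stat):
    """Survival function of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def kupiec_pof(n_obs, n_miss, miss_rate):
    """Kupiec unconditional-coverage test. Returns (LR statistic, p-value).

    miss_rate is the expected share of misses, e.g. 0.05 for a 95% band.
    """
    pi_hat = n_miss / n_obs
    ll_null = _xlogy(n_obs - n_miss, 1 - miss_rate) + _xlogy(n_miss, miss_rate)
    ll_alt = _xlogy(n_obs - n_miss, 1 - pi_hat) + _xlogy(n_miss, pi_hat)
    lr = max(0.0, -2.0 * (ll_null - ll_alt))
    return lr, _chi2_sf_1dof(lr)

def christoffersen_ind(misses):
    """Christoffersen independence test on an ordered sequence of 0/1 misses.

    Returns (LR statistic, p-value). Needs at least two observations.
    """
    n = [[0, 0], [0, 0]]  # n[prev][cur] counts transitions
    for prev, cur in zip(misses, misses[1:]):
        n[prev][cur] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll_null = _xlogy(n00 + n10, 1 - pi) + _xlogy(n01 + n11, pi)
    ll_alt = (_xlogy(n00, 1 - pi01) + _xlogy(n01, pi01)
              + _xlogy(n10, 1 - pi11) + _xlogy(n11, pi11))
    lr = max(0.0, -2.0 * (ll_null - ll_alt))
    return lr, _chi2_sf_1dof(lr)

# If 95.0% of 1,720 means exactly 86 misses (an inference from the rounded
# figures, not stated on the page), the Kupiec statistic is zero.
lr_uc, p_uc = kupiec_pof(n_obs=1720, n_miss=86, miss_rate=0.05)
print(f"Kupiec LR = {lr_uc:.3f}, p_uc = {p_uc:.3f}")
```

A high p_uc means the miss rate is statistically indistinguishable from the claimed one; a high p_ind means misses do not cluster in consecutive observations.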
The gap, every weekend

Pick a weekend. See who was right.

For each (symbol, weekend) in the 2024+ OOS slice with Pyth coverage, here is what each oracle said on Friday afternoon and where the price actually opened on Monday. Soothsayer's band is calibrated to whatever target you ask for. Pyth's published CI, read at face value as 95%, is essentially zero-width relative to the realised gap. Chainlink stale-holds Friday's close.

The receipts

Every weekend, every oracle, on the record.

All 265 (symbol, weekend) observations from the 2024+ OOS slice with Pyth coverage. The table is sortable: click any column header. The aggregate row above the table reports the inside-rate by oracle, that is, the share of observations whose Monday open fell inside that oracle's band.
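
As a rough sketch of how one table row and the aggregate inside-rate could be derived from raw observations: the record layout, field names, and the choice to express half-widths in bps of the Friday price are all assumptions made for illustration, not the repository's actual schema.

```python
from dataclasses import dataclass

@dataclass
class WeekendObs:
    # Hypothetical record layout; all field names are illustrative.
    symbol: str
    weekend: str          # label for the weekend, e.g. the Friday's date
    friday_price: float   # reference price on Friday afternoon
    monday_open: float    # realised Monday open
    sooth_lower: float    # Soothsayer band, lower edge
    sooth_upper: float    # Soothsayer band, upper edge
    pyth_price: float     # Pyth aggregate price
    pyth_conf: float      # Pyth published conf value

def bps(x, ref):
    """Express x as basis points of ref."""
    return 1e4 * x / ref

def score(o: WeekendObs) -> dict:
    """Build one table row: realised move, inside flags, half-widths in bps."""
    pyth_half = 1.96 * o.pyth_conf  # naive reading of conf as a 95% CI
    return {
        "weekend": o.weekend,
        "symbol": o.symbol,
        "realised_move_bps": bps(o.monday_open - o.friday_price, o.friday_price),
        "sooth_inside": o.sooth_lower <= o.monday_open <= o.sooth_upper,
        "pyth_inside": abs(o.monday_open - o.pyth_price) <= pyth_half,
        "pyth_half_bps": bps(pyth_half, o.friday_price),
        "sooth_half_bps": bps((o.sooth_upper - o.sooth_lower) / 2, o.friday_price),
    }

def inside_rates(rows):
    """Share of rows whose Monday open fell inside each oracle's band."""
    n = len(rows)
    return {
        "soothsayer": sum(r["sooth_inside"] for r in rows) / n,
        "pyth_1.96_conf": sum(r["pyth_inside"] for r in rows) / n,
    }

# Toy observations (made-up numbers, not taken from the 265-row table).
rows = [
    score(WeekendObs("AAA", "2024-01-05", 100.0, 101.0, 98.0, 102.0, 100.0, 0.05)),
    score(WeekendObs("BBB", "2024-01-05", 50.0, 50.02, 49.5, 50.5, 50.0, 0.02)),
]
print(inside_rates(rows))
```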

Weekend | Symbol | Realised move (bps) | Soothsayer | Pyth + 1.96·conf | Pyth half-width (bps) | Sooth half-width (bps)
Verify it yourself

Calibration transparency means no black box.

Every claim on this page can be re-derived from the public repository. The methodology evolution log records every decision, hypothesis, and rejected alternative. The methodology dashboard has the full ablation evidence: walk-forward stability, leave-one-out cross-asset transfer, window sensitivity, conformal-comparison bootstrap, and more.