Forecast diagnostics - brave_fresh (honest)

26 resolved events (14 binary, 12 multi) - hand-run snapshot - numbers from evaluation/brier.py

binary Brier: 0.118
multiclass Brier: 0.505
ECE (calibration error): 0.226
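
For readers unfamiliar with these scores, here is a minimal sketch of one common way to define them. It is not the contents of evaluation/brier.py; the function names, the equal-width binning, and the choice of 10 bins are assumptions made for illustration. The multiclass score sums squared error over all outcomes, so a two-outcome event scores exactly twice its binary Brier. That convention appears consistent with the 0.170 / 0.340 pair in the binary row of the "Binary vs multi" table.

```python
def binary_brier(probs, outcomes):
    """Mean squared error between the forecast probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def multiclass_brier(prob_rows, winners):
    """Mean over events of the squared error summed across all outcomes (range 0 to 2)."""
    total = 0.0
    for row, winner in zip(prob_rows, winners):
        total += sum(
            (p - (1.0 if k == winner else 0.0)) ** 2 for k, p in enumerate(row)
        )
    return total / len(prob_rows)


def expected_calibration_error(confidences, corrects, n_bins=10):
    """Bin-size-weighted mean gap between average confidence and hit rate.

    Assumes equal-width bins on [0, 1]; the real script may bin differently.
    """
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, corrects):
        idx = min(int(c * n_bins), n_bins - 1)  # c == 1.0 falls in the last bin
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        hit_rate = sum(ok for _, ok in members) / len(members)
        ece += len(members) / n * abs(mean_conf - hit_rate)
    return ece
```

Lower is better for all three; a forecaster who always says 0.5 on binary events scores a binary Brier of 0.25.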

Reliability diagram - dot on the line = calibrated; above = under-confident; below = over-confident

[Figure: reliability diagram; x-axis: predicted probability, y-axis: empirical frequency]

Murphy decomposition (binary): reliability 0.1112 (lower=better) - resolution 0.1760 (higher=better) - uncertainty 0.2296
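
The three terms relate to the binary Brier score through the identity Brier = reliability - resolution + uncertainty. The sketch below groups events by their exact forecast value, in which case the identity holds exactly. It is an illustration only: the function name is hypothetical, and evaluation/brier.py may group forecasts into bins instead, in which case the terms recombine only approximately.

```python
from collections import defaultdict


def murphy_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty) for binary forecasts.

    Events are grouped by identical forecast value (an assumption; a binned
    variant would group by probability interval instead).
    """
    n = len(probs)
    base_rate = sum(outcomes) / n

    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)

    # Reliability: how far each forecast value sits from its observed frequency.
    reliability = sum(
        len(obs) * (p - sum(obs) / len(obs)) ** 2 for p, obs in groups.items()
    ) / n
    # Resolution: how far each group's observed frequency sits from the base rate.
    resolution = sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in groups.values()
    ) / n
    # Uncertainty: variance of the outcome itself, independent of the forecasts.
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


# Toy data, not taken from the snapshot.
rel, res, unc = murphy_decomposition([0.8, 0.8, 0.2, 0.2], [1, 0, 0, 0])
brier = sum((p - o) ** 2 for p, o in zip([0.8, 0.8, 0.2, 0.2], [1, 0, 0, 0])) / 4
```

On the toy data the identity can be checked directly: `rel - res + unc` equals `brier` up to floating-point error.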

confidence  n  Brier  winner p
0.5-0.6     8  0.261  0.449
0.6-0.7     1  0.470  0.315
0.7-0.8     0  -      -
0.8-0.9     7  0.025  0.846
0.9-1.0     3  0.007  0.617

By category

category       n   binary Brier  multi Brier  mean conf
Elections      3   0.171         0.341        0.802
Sports         16  0.141         0.428        0.607
Entertainment  4   0.064         0.987        0.382
Politics       3   0.014         0.436        0.527

Binary vs multi

kind    n   binary Brier  multi Brier  winner p
binary  14  0.170         0.340        0.652
multi   12  0.057         0.698        0.327
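
As a consistency check, the headline scores at the top of the page can be recovered as n-weighted means of the per-row scores in the last two tables. The sketch below does that arithmetic with the values copied from this snapshot; the helper name `weighted_mean` is hypothetical, and the claim that the headline figures were produced this way is an inference from the numbers, not something the page states.

```python
# Rows are (label, n, binary Brier, multi Brier), copied from the tables above.
by_kind = [
    ("binary", 14, 0.170, 0.340),
    ("multi", 12, 0.057, 0.698),
]
by_category = [
    ("Elections", 3, 0.171, 0.341),
    ("Sports", 16, 0.141, 0.428),
    ("Entertainment", 4, 0.064, 0.987),
    ("Politics", 3, 0.014, 0.436),
]


def weighted_mean(rows, col):
    """n-weighted mean of column `col` (2 = binary Brier, 3 = multi Brier)."""
    total_n = sum(row[1] for row in rows)
    return sum(row[1] * row[col] for row in rows) / total_n


headline_binary = round(weighted_mean(by_kind, 2), 3)
headline_multi = round(weighted_mean(by_kind, 3), 3)
category_binary = round(weighted_mean(by_category, 2), 3)
category_multi = round(weighted_mean(by_category, 3), 3)
```

Both tables give 0.118 for the binary Brier and 0.505 for the multiclass Brier after rounding, matching the headline values, and both sum to n = 26.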