Forecast diagnostics - brave_fresh (honest)

26 resolved events (14 binary, 12 multi) - hand-run snapshot - numbers from evaluation/brier.py

metric	value
binary Brier	0.118
multiclass Brier	0.505
ECE (calibration error)	0.226
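
For orientation, a minimal sketch of how the three headline metrics are commonly defined. This is not the code in evaluation/brier.py; the function names, the 10 equal-width bins, and the unnormalized multiclass convention (range 0 to 2) are assumptions made for illustration.

```python
def binary_brier(probs, outcomes):
    """Mean squared gap between forecast probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def multiclass_brier(prob_vectors, winners):
    """Mean over events of the squared error summed across all classes.

    Assumed convention: unnormalized, so a single event scores between 0 and 2.
    """
    total = 0.0
    for probs, winner in zip(prob_vectors, winners):
        total += sum(
            (p - (1.0 if k == winner else 0.0)) ** 2 for k, p in enumerate(probs)
        )
    return total / len(prob_vectors)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Count-weighted mean |confidence - accuracy| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for c, hit in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((c, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(mean_conf - accuracy)
    return ece


# Toy inputs, not taken from the snapshot.
toy_binary = binary_brier([0.9, 0.6, 0.2], [1, 0, 0])
toy_multi = multiclass_brier([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]], [0, 2])
toy_ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0])
```

Under these definitions lower is better for all three metrics.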

Reliability diagram (observed frequency vs. forecast probability) - dot on the diagonal = calibrated; above = under-confident; below = over-confident

Murphy decomposition (binary): reliability 0.1112 (lower=better) - resolution 0.1760 (higher=better) - uncertainty 0.2296
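
The three terms combine as Brier = reliability - resolution + uncertainty, which for the values reported here gives 0.1112 - 0.1760 + 0.2296 = 0.1648. A minimal sketch of a binned decomposition follows. It assumes 10 equal-width forecast bins and a hypothetical function name, and it may differ from what evaluation/brier.py does. With binning, the identity is exact only when all forecasts inside a bin are equal.

```python
def murphy_decomposition(probs, outcomes, n_bins=10):
    """Return (reliability, resolution, uncertainty) for binary forecasts."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    bins = {}
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # forecast 1.0 goes in the top bin
        bins.setdefault(idx, []).append((p, o))
    reliability = 0.0
    resolution = 0.0
    for members in bins.values():
        n_k = len(members)
        mean_forecast = sum(p for p, _ in members) / n_k
        observed_rate = sum(o for _, o in members) / n_k
        reliability += n_k * (mean_forecast - observed_rate) ** 2
        resolution += n_k * (observed_rate - base_rate) ** 2
    uncertainty = base_rate * (1.0 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Toy forecasts (not from the snapshot); forecasts within each bin are identical,
# so the three terms recombine to the Brier score exactly.
toy_probs = [0.1, 0.1, 0.9, 0.9]
toy_outcomes = [0, 0, 1, 0]
rel, res, unc = murphy_decomposition(toy_probs, toy_outcomes)
toy_brier = sum((p - o) ** 2 for p, o in zip(toy_probs, toy_outcomes)) / len(toy_probs)

# Recombining the terms reported in this snapshot.
reported_sum = 0.1112 - 0.1760 + 0.2296
```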

confidence	n	Brier	winner p
0.5-0.6	8	0.261	0.449
0.6-0.7	1	0.470	0.315
0.7-0.8	0	n/a	n/a
0.8-0.9	7	0.025	0.846
0.9-1.0	3	0.007	0.617

category	n	binary Brier	multi Brier	mean conf
Elections	3	0.171	0.341	0.802
Sports	16	0.141	0.428	0.607
Entertainment	4	0.064	0.987	0.382
Politics	3	0.014	0.436	0.527

kind	n	binary Brier	multi Brier	winner p
binary	14	0.170	0.340	0.652
multi	12	0.057	0.698	0.327
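
As a consistency check, the per-kind and per-category rows can be pooled back into the headline numbers with an n-weighted mean. The sketch below uses only values from the tables above; the helper name pooled is hypothetical.

```python
# Rows are (n, binary Brier, multi Brier), copied from the tables above.
by_kind = [(14, 0.170, 0.340), (12, 0.057, 0.698)]
by_category = [
    (3, 0.171, 0.341),   # Elections
    (16, 0.141, 0.428),  # Sports
    (4, 0.064, 0.987),   # Entertainment
    (3, 0.014, 0.436),   # Politics
]


def pooled(rows, col):
    """n-weighted mean of column `col` (1 = binary Brier, 2 = multi Brier)."""
    total_n = sum(row[0] for row in rows)
    return sum(row[0] * row[col] for row in rows) / total_n


kind_binary = round(pooled(by_kind, 1), 3)
kind_multi = round(pooled(by_kind, 2), 3)
category_binary = round(pooled(by_category, 1), 3)
category_multi = round(pooled(by_category, 2), 3)
print(kind_binary, kind_multi, category_binary, category_multi)
```

Both breakdowns reproduce the headline binary Brier of 0.118 and multiclass Brier of 0.505 to three decimals.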