Calibration overlay - 5 models on the same 26 events
What you're looking at: a reliability diagram for each model. The diagonal is perfect calibration: a model whose line sits on it predicts probabilities that match the empirical hit rates. Points above the diagonal mean events occurred more often than the model predicted (under-confident); points below mean they occurred less often (over-confident). Bubble size = number of events in that probability bin.
Reliability diagram (Murphy 1973) for each model variant on the 26-event resolved set. Five probability bins on the x-axis (0.0-0.2, 0.2-0.4, ..., 0.8-1.0); empirical hit rate on the y-axis; bubble size = number of events in that bin. Perfectly calibrated = points on the y=x diagonal.
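
The binning behind each plotted point can be sketched roughly as follows. This is a minimal illustration, not the code that produced the figure: the function name `reliability_bins`, the left-closed bin edges (with 1.0 placed in the last bin), and the sample forecasts are all assumptions made for the example.

```python
def reliability_bins(probs, outcomes, n_bins=5):
    """Group forecasts into equal-width probability bins.

    probs    -- predicted probabilities in [0, 1]
    outcomes -- resolved outcomes, 1 if the event happened, else 0
    Returns one dict per bin with its edges, event count, mean
    predicted probability, and empirical hit rate (None if empty).
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")

    counts = [0] * n_bins
    prob_sums = [0.0] * n_bins
    hits = [0] * n_bins

    for p, y in zip(probs, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # Assumed convention: bins are [lo, hi), except that p == 1.0
        # falls into the last bin.
        i = min(int(p * n_bins), n_bins - 1)
        counts[i] += 1
        prob_sums[i] += p
        hits[i] += y

    rows = []
    for i in range(n_bins):
        n = counts[i]
        rows.append({
            "lo": i / n_bins,
            "hi": (i + 1) / n_bins,
            "count": n,                                   # bubble size
            "mean_pred": prob_sums[i] / n if n else None,
            "hit_rate": hits[i] / n if n else None,       # y-axis value
        })
    return rows


if __name__ == "__main__":
    # Made-up forecasts for illustration only, not the 26-event set.
    demo_probs = [0.1, 0.15, 0.5, 0.9, 0.95]
    demo_outcomes = [0, 1, 1, 1, 1]
    for row in reliability_bins(demo_probs, demo_outcomes):
        print(row)
```

A bin whose `hit_rate` exceeds its probability range plots above the diagonal, and one whose `hit_rate` falls short plots below it. Empty bins return `None` and would simply have no bubble.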