Team CanadaHacks · Prophet Hacks forecasting track · snapshot 2026-05-19 · hand-maintained
forecast_agent_server.py (LIVE) - FastAPI endpoint on Railway: POST /predict, health routes, dashboards, observatory, traces. The deployed surface.
forecast_track.py (ENGINE) - All forecaster variants, Brave retrieval, source ranking, longshot guard, event-semantics router. Production calls predict_multi_outcome_retrieval.
forecaster.py · agent.py - Skeleton forecast helpers and the trading-track agent. Not on the live forecast path; kept for reference.
risk.py, market_filter.py, logger.py, chat_completions_adapter.py - sizing, filtering, logging, OpenAI-compatible shim.

| variant | what it is | status |
|---|---|---|
| multi_outcome_retrieval | Brave evidence + Opus 4.7 + market-anchor prompt + longshot guard + top-K classifier | PRODUCTION |
| uniform_prior | deterministic 1/N, no model call | baseline |
| opus_47 / 46 / gpt55 / gpt52 / single_llm | single-model forecasters across vendors | ablation |
| ensemble_logit / leaderboard | logit-mean across models / leaderboard ensemble | ablation |
| multi_outcome / sc3 | per-outcome forecast / self-consistency k=3 | ablation |
| hybrid_routed | binary to GPT-5.5, multi to multi_outcome | research |
| multi_outcome_retrieval_sae | empirical-Bayes shrinkage over (domain x horizon x price) | negative* |
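
Two of the rows above are simple enough to sketch directly: the `uniform_prior` baseline (deterministic 1/N) and the logit-mean combination behind `ensemble_logit`. The following is a minimal illustration under stated assumptions, not the code in forecast_track.py: the function names, signatures, and the `eps` clipping value are hypothetical, and the logit-mean is shown for binary probabilities only.

```python
import math
from typing import Dict, List


def uniform_prior(outcomes: List[str]) -> Dict[str, float]:
    """Deterministic 1/N baseline: equal probability per outcome, no model call."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    p = 1.0 / len(outcomes)
    return {outcome: p for outcome in outcomes}


def _logit(p: float, eps: float = 1e-6) -> float:
    # Clip away from 0 and 1 so the log stays finite (eps is an assumed value).
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit_mean(probs: List[float]) -> float:
    """Average binary probabilities in logit space, then map back to [0, 1]."""
    if not probs:
        raise ValueError("need at least one probability")
    mean_logit = sum(_logit(p) for p in probs) / len(probs)
    return _sigmoid(mean_logit)
```

For example, `logit_mean([0.5, 0.9])` comes out at about 0.75, whereas the arithmetic mean of the same inputs is 0.70: averaging in logit space gives more weight to confident forecasts.
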

| experiment | verdict |
|---|---|
| Search-provider bake-off (leakage axis) - 2026-05-19 | retrieval leakage = 3.1x inflation (hindsight-inflated 0.038 vs honest 0.118) |
| Top-K event-semantics classifier | shipped a1f899b0 |
| Opus 4.7 + market anchor; 0.10 longshot floor cap | shipped |
| Multi-vendor swap (Gemini / GPT-5.5 / GPT-5.2) | keep Opus 4.7 |
| Retrieval count sweep 3 / 5 / 8 | failed gate |
| Verification prompt / self-critique / ensemble-of-6 | regressed |
| Abstain-to-market (price from snippets) | no-op (0/26 had price) |
| SAE shrinkage | 0.1157 - re-open vs honest* |
| Subset-1200 scale validation | 0.1224 honest, n=1200 |
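
The 3.1x inflation figure and the re-evaluation flag on SAE shrinkage both follow from simple arithmetic on the scores reported on this page. The sketch below reproduces that arithmetic. It assumes that lower scores are better and that the three numbers are directly comparable (same metric, comparable evaluation sets); this page does not state either point, and the helper names are hypothetical.

```python
# Scores as reported on this page.
HINDSIGHT_INFLATED = 0.038  # backtest score with retrieval leakage
HONEST = 0.118              # leakage-audited backtest score
SAE = 0.1157                # multi_outcome_retrieval_sae


def inflation_factor(honest: float, leaky: float) -> float:
    """How many times larger the honest score is than the leaky one."""
    return honest / leaky


def relative_gap(candidate: float, baseline: float) -> float:
    """Signed gap of candidate vs baseline, as a fraction of the baseline."""
    return (candidate - baseline) / baseline


inflation = inflation_factor(HONEST, HINDSIGHT_INFLATED)
sae_vs_inflated = relative_gap(SAE, HINDSIGHT_INFLATED)
sae_vs_honest = relative_gap(SAE, HONEST)

print(f"leakage inflation: {inflation:.1f}x")
print(f"SAE vs hindsight-inflated baseline: {sae_vs_inflated:+.1%}")
print(f"SAE vs honest baseline: {sae_vs_honest:+.1%}")
```

Under those assumptions, SAE looks roughly three times worse than the leaky 0.038 baseline but lands within about two percent of the honest 0.118, which is consistent with the verdict "re-open vs honest".
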
forecast_track.py / forecast_agent_server.py / data/resolved.json
docs/HANDOFF.md - cold-start state + work lanes
docs/DECISIONS.md - append-only log (newest at bottom)
docs/FINDINGS.md · RESEARCH_QUEUE.md · FORWARD_RESEARCH_PLAN.md
* SAE / abstain were rejected against the hindsight-inflated 0.038; against the honest 0.118 they are roughly competitive, so they are flagged for re-evaluation.
This page is a hand-maintained snapshot; numbers trace to DECISIONS.md and data/predictions/.
The Oracles · ForecastingPath · honest backtest, leakage-audited.