MULTI-AGENT · KALSHI · RESEARCH-FIRST

Automated edge detection & execution pipeline

Edge-Radar scans thousands of open Kalshi markets and cross-references prices against sportsbook consensus, crypto spot, and weather forecasts. Mispriced contracts then pass through 13 risk gates and Kelly sizing before limit orders are placed. Every decision is logged for post-hoc calibration.

7
Pipeline stages
13
Risk gates
5
Edge models
27
Sport filters
330
Tests passing

§1 Data Flow

Scan → Size → Execute → Settle.

§2 Pipeline

Seven sequential stages. Each advances an opportunity or drops it.
01

Fetch read · parallel

Pull all open Kalshi markets via the signed REST client. In parallel, query 8-12 US sportsbooks through The Odds API, and pull ESPN data, NHL/MLB stats, CoinGecko spot prices, and NWS hourly forecasts.

Sources
Kalshi, Odds API (rotating keys), ESPN, NHL/MLB Stats, CoinGecko, NWS, Yahoo
Rotation
Set-based Odds API key rotation — tries every key once on 401/429 before bailing
Caching
Weather fetched once per home team; shared across spread + total markets
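
The set-based rotation above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `fetch_with_rotation` and the `fetch` callable are hypothetical names, and stopping on errors other than 401/429 is an assumed policy.

```python
from typing import Callable, Iterable, Optional

RETRYABLE = {401, 429}  # statuses treated as "this key is unusable right now"


def fetch_with_rotation(
    keys: Iterable[str],
    fetch: Callable[[str], tuple[int, Optional[dict]]],
) -> Optional[dict]:
    """Try each distinct key at most once; return the first successful payload."""
    tried: set[str] = set()
    for key in keys:
        if key in tried:
            continue  # never retry a key within the same call
        tried.add(key)
        status, payload = fetch(key)
        if status == 200:
            return payload
        if status not in RETRYABLE:
            # Assumed policy: rotating keys will not fix a non-quota error.
            return None
    return None  # every key was tried once and none succeeded
```

Tracking tried keys in a set, rather than counting attempts, means a healthy key late in the list is still reached when several earlier keys are exhausted.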
02

Categorize routing

Every ticker prefix is mapped to a category (moneyline, spread, total, futures, crypto, weather, S&P). The category determines which edge model runs next.

Map
30 prefix → category rules across 18 sports + prediction markets
Fallback
Unmapped prefixes log once, then drop rather than pollute scoring
03

Compare fair value vs. ask

Each opportunity is scored on four independent dimensions. The composite score gates whether it continues. Details in Edge Models.

Edge (40%)
fair_value − kalshi_ask, side-picked YES vs NO
Confidence (30%)
Book count + range + stats + sharp money + rest/B2B
Liquidity (20%)
10 − spread × 20
Time (10%)
Placeholder for time-decay weighting
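
The weighting above can be illustrated with a short sketch. It assumes every sub-score is on a 0-10 scale, that the spread is quoted in dollars, and that the liquidity score is clamped to that range. The neutral `time_score` default and both function names are hypothetical.

```python
WEIGHTS = {"edge": 0.40, "confidence": 0.30, "liquidity": 0.20, "time": 0.10}


def liquidity_score(spread: float) -> float:
    """10 - spread * 20, clamped to [0, 10] (the clamp is an assumption)."""
    return max(0.0, min(10.0, 10.0 - spread * 20.0))


def composite_score(
    edge_score: float,
    confidence_score: float,
    spread: float,
    time_score: float = 5.0,  # assumed neutral value for the placeholder dimension
) -> float:
    """Weighted sum of the four dimension scores."""
    parts = {
        "edge": edge_score,
        "confidence": confidence_score,
        "liquidity": liquidity_score(spread),
        "time": time_score,
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)
```

For example, `composite_score(8, 6, 0.05)` combines an edge score of 8, a confidence score of 6, and a 5¢ spread into a single number that can be compared against MIN_COMPOSITE_SCORE.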
04

Cap dedup

Limit to top 3 per game/event so a single contest can't dominate the scan. Bracket dedup collapses same-outcome correlated markets on the same day.

Per-event
Top 3 by composite score per ticker root
Bracket dedup
Intra-day correlated bracket collapse (same outcome)
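
A minimal sketch of the per-event cap, assuming each opportunity is a dict with `ticker` and `composite` fields and that the event root is the ticker with its final dash-separated segment removed. Both the field names and the root rule are assumptions.

```python
from collections import defaultdict


def event_root(ticker: str) -> str:
    """Assumed rule: drop the last dash-separated segment (the outcome)."""
    return ticker.rsplit("-", 1)[0]


def cap_per_event(opportunities: list[dict], limit: int = 3) -> list[dict]:
    """Keep the top `limit` opportunities per event, ranked by composite score."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for opp in opportunities:
        groups[event_root(opp["ticker"])].append(opp)

    kept: list[dict] = []
    for group in groups.values():
        ranked = sorted(group, key=lambda o: o["composite"], reverse=True)
        kept.extend(ranked[:limit])
    # Return one list ordered best-first across all events.
    return sorted(kept, key=lambda o: o["composite"], reverse=True)
```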
05

Risk-Check 12 gates +Gate 3.5

Eleven reject gates (seven numbered, plus sub-gates 3.5, 4.5, 4.6, and 4.7) and two sizing caps. Every gate is logged with its pass/reject reason and input values. Full table in Risk Gates.

Sizing
Batch-Kelly with soft-cap above 15% edge, half-Kelly on sub-35¢ NO bets
Janitor
Cancels zero-fill resting orders > 24h before placing new ones
06

Execute signed · limit-only

RSA-signed limit orders on Kalshi. A full trade journal row is written per order with ticker, side, price, size, fair value, edge, confidence, composite, and rationale.

Auth
RSA private key loaded inline or via KALSHI_PRIVATE_KEY env
Dry run
Default DRY_RUN=true; identical logging for backtest parity
07

Monitor settle · calibrate

The settler scans for closed markets, realizes P&L, and appends to the trade journal. The calibration tool groups settled trades by edge bucket, sport, side, and confidence to measure Brier score and realized ROI.

Settle
make settle — realized P&L + CLV tracking
Calibrate
model_calibration.py — Brier by edge bucket, sport, confidence
Dashboard
Streamlit app — scan, execute, portfolio, settle
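
The calibration grouping can be sketched as below. The Brier score is the mean squared difference between the forecast probability and the 0/1 outcome. The trade field names (`fair_value`, `won`) and both function names are assumptions, not the schema used by model_calibration.py.

```python
from collections import defaultdict
from typing import Iterable


def brier(pairs: Iterable[tuple[float, int]]) -> float:
    """Mean squared error between forecast probability and 0/1 outcome."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no settled trades")
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


def brier_by(trades: list[dict], key: str) -> dict:
    """Group settled trades by `key` (e.g. sport) and score each group."""
    groups: dict = defaultdict(list)
    for trade in trades:
        groups[trade[key]].append((trade["fair_value"], trade["won"]))
    return {name: brier(items) for name, items in groups.items()}
```

A forecaster that always says 50% scores exactly 0.25, so any group above 0.25 is doing worse than an uninformed coin flip.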

§3 Edge Models

Five specialized models. Each outputs a fair-value probability that is compared against the Kalshi ask.
ML

Moneyline (2-way de-vig)

De-vig each book's line using the multiplicative method. Take a weighted median across 8-12 books.

  • Sharp books 3× (Pinnacle, Circa)
  • Recreational 0.7× (DK, FD)
  • Confidence from book count + range + stats
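
A minimal sketch of the two steps above, assuming American odds as input. Multiplicative de-vig divides each implied probability by the sum of both sides. The weighted median here is the lower weighted median; the function names and the example weights for unlisted books are assumptions.

```python
def implied_prob(american: int) -> float:
    """Convert American odds to the book's implied (vigged) probability."""
    if american < 0:
        return -american / (-american + 100)
    return 100 / (american + 100)


def devig_two_way(odds_a: int, odds_b: int) -> tuple[float, float]:
    """Multiplicative de-vig: scale both sides so they sum to 1."""
    pa, pb = implied_prob(odds_a), implied_prob(odds_b)
    total = pa + pb
    return pa / total, pb / total


def weighted_median(values: list[float], weights: list[float]) -> float:
    """Smallest value whose cumulative weight reaches half the total weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    half = sum(weights) / 2
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value
    return max(values)  # only reachable through floating-point rounding
```

With a sharp book weighted 3× and two recreational books at 0.7×, the sharp book's de-vigged probability dominates the median unless the recreational books agree with it.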
SP

Spreads (Normal CDF)

Infer the expected margin from the consensus line, model the final margin as Normal(μ, σ), and compute P(margin > strike).

  • NBA σ=13.8, NCAAB σ=12.1 (R2 bump)
  • NHL σ=2.5 · MLB σ=4.025
  • +Rest/B2B + weather stdev adjust
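
The CDF step can be sketched with the standard library alone. This assumes a consensus spread of -5.5 maps to an expected margin of μ = 5.5; the function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_margin_over(mu: float, sigma: float, strike: float) -> float:
    """P(margin > strike) when margin ~ Normal(mu, sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 1.0 - normal_cdf((strike - mu) / sigma)


# Same 5.5-point favorite, win probability (strike 0) under two NBA sigmas.
narrow = prob_margin_over(5.5, 12.0, 0.0)
wide = prob_margin_over(5.5, 13.8, 0.0)
```

Here `wide` is closer to 0.5 than `narrow`: a larger σ spreads probability mass out, which is why widening the per-sport stdev tempers confidence in favorites.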
TO

Totals (CDF + weather)

Same CDF approach with a weather overlay for outdoor NFL/MLB. Weather shifts both fair value and stdev.

  • Wind > 15mph, rain > 40%, cold
  • Dome = 0 adjustment (auto-detect)
  • Severe +0.5 · moderate +0.3 · mild +0.1
FU

Futures (N-way de-vig)

De-vig the full N-outcome market by distributing the overround proportionally across sportsbook futures odds.

  • NFL, NBA, NHL, MLB, PGA
  • Weighted median across books
  • Higher liquidity threshold required
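
Proportional N-way de-vig for a single book can be sketched as follows, assuming decimal odds keyed by outcome name. `devig_n_way` is a hypothetical name, and the sketch ignores outcomes missing from a book's board.

```python
def devig_n_way(decimal_odds: dict[str, float]) -> dict[str, float]:
    """Divide each implied probability by the overround so the set sums to 1."""
    if not decimal_odds or any(o <= 1.0 for o in decimal_odds.values()):
        raise ValueError("decimal odds must be greater than 1.0")
    raw = {name: 1.0 / odds for name, odds in decimal_odds.items()}
    overround = sum(raw.values())  # exceeds 1.0 when the book has a margin
    return {name: p / overround for name, p in raw.items()}
```

The per-book output would then feed the same weighted-median step used for moneylines, one outcome at a time.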
PR

Predictions

Model-specific: crypto uses log-normal volatility, weather uses NWS ensemble, S&P uses VIX-implied vol.

  • Crypto: BTC, ETH, XRP, DOGE, SOL
  • Weather: 13 US cities (NWS/NOAA)
  • S&P: Yahoo + VIX → strike prob
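
As one illustration of the log-normal approach, the probability that spot finishes above a strike can be computed under zero-drift geometric Brownian motion. The zero-drift choice, the annualized-volatility input, and the `prob_above` name are all assumptions, not the project's model.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_above(spot: float, strike: float, annual_vol: float, years: float) -> float:
    """P(S_T > strike) when log returns are normal with zero drift."""
    if min(spot, strike, annual_vol, years) <= 0:
        raise ValueError("all inputs must be positive")
    sd = annual_vol * math.sqrt(years)  # stdev of the log return over the horizon
    d = (math.log(spot / strike) - 0.5 * sd * sd) / sd
    return normal_cdf(d)
```

Tail strikes are very sensitive to the volatility input, so a stale or mis-scaled vol estimate can produce large apparent edges on cheap contracts.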

§4 Risk Gates

Eleven reject gates (seven numbered, plus sub-gates 3.5, 4.5, 4.6, and 4.7) + two sizing caps. Every gate logs its pass/reject reason with inputs.
| # | Type | Gate | Trigger |
|---|---|---|---|
| 1 | reject | Daily loss limit | Realized daily loss >= MAX_DAILY_LOSS |
| 2 | reject | Open position cap | Open positions >= MAX_OPEN_POSITIONS |
| 3 | reject | Edge threshold | Edge < global MIN_EDGE_THRESHOLD or per-sport override (NBA 12%, NCAAB 10%) |
| 3.5 | reject | Lottery-ticket floor (R7) | Market price < MIN_MARKET_PRICE (default $0.10); strict less-than |
| 4 | reject | Composite score | Score < MIN_COMPOSITE_SCORE (default 6.0) |
| 4.5 | reject | Confidence floor (R3) | Confidence rank < MIN_CONFIDENCE (default medium). Low-confidence: 0W-3L, -105% ROI |
| 4.6 | reject | NO-favorite guard (R1) | NO bets with price < NO_SIDE_FAVORITE_THRESHOLD (0.25) need edge >= NO_SIDE_MIN_EDGE (0.25) AND confidence=high |
| 4.7 | reject | Prediction-market safety (R25) | Opportunity category in {crypto, weather, spx, mentions, companies, politics} → reject unless ALLOW_PREDICTION_BETS=true |
| 5 | reject | Duplicate ticker | Already holding this market |
| 6 | reject | Per-event cap | Per-event positions >= MAX_PER_EVENT (default 3) |
| 7 | reject | Series dedup (C5) | Matchup (sport + team pair, date-agnostic) already bet in last SERIES_DEDUP_HOURS (48 global; per-sport: MLB 72, NHL 72, added in R9) |
| 8 | cap | Max bet size | Computed bet > MAX_BET_SIZE → cap to limit |
| 9 | cap | Bet ratio | Single bet > MAX_BET_RATIO × batch median → cap |
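
The static per-opportunity gates (3, 3.5, 4, 4.5, 4.6, 4.7) can be sketched as a single ordered check. This is an illustration under assumptions: `GateConfig`, `gate_status`, and the opportunity field names are hypothetical, and the defaults are copied from the table above. The return codes mirror the scan-table Gate column (ok / edge / price / score / conf / no-fav / pred-off).

```python
from dataclasses import dataclass, field

CONF_RANK = {"low": 0, "medium": 1, "high": 2}
PREDICTION_CATEGORIES = {"crypto", "weather", "spx", "mentions", "companies", "politics"}


@dataclass(frozen=True)
class GateConfig:
    min_edge: float = 0.03
    min_edge_by_sport: dict = field(default_factory=lambda: {"nba": 0.12, "ncaab": 0.10})
    min_market_price: float = 0.10
    min_composite: float = 6.0
    min_confidence: str = "medium"
    no_side_favorite_threshold: float = 0.25
    no_side_min_edge: float = 0.25
    allow_prediction_bets: bool = False


def gate_status(opp: dict, cfg: GateConfig = GateConfig()) -> str:
    """Return 'ok' or the code of the first gate that rejects the opportunity."""
    edge_floor = cfg.min_edge_by_sport.get(opp["sport"], cfg.min_edge)
    if opp["edge"] < edge_floor:                      # Gate 3
        return "edge"
    if opp["price"] < cfg.min_market_price:           # Gate 3.5, strict less-than
        return "price"
    if opp["composite"] < cfg.min_composite:          # Gate 4
        return "score"
    if CONF_RANK[opp["confidence"]] < CONF_RANK[cfg.min_confidence]:  # Gate 4.5
        return "conf"
    cheap_no = opp["side"] == "no" and opp["price"] < cfg.no_side_favorite_threshold
    strong = opp["edge"] >= cfg.no_side_min_edge and opp["confidence"] == "high"
    if cheap_no and not strong:                       # Gate 4.6
        return "no-fav"
    if opp["category"] in PREDICTION_CATEGORIES and not cfg.allow_prediction_bets:
        return "pred-off"                             # Gate 4.7
    return "ok"
```

Gates 1, 2, 5, 6, and 7 depend on portfolio state (open positions, prior bets) and are left out of this sketch.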

Kelly sizing

Batch-Kelly divides KELLY_FRACTION (0.25) by the batch size so concurrent opportunities share risk.

  • Soft cap: edge above KELLY_EDGE_CAP (15%) decayed by KELLY_EDGE_DECAY (0.5)
  • NO-side half-Kelly: NO bets priced < NO_SIDE_KELLY_PRICE_FLOOR (35¢) sized at 0.5× normal Kelly
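
A minimal sizing sketch under stated assumptions. For a binary contract bought at price c with fair probability p, full Kelly is (p − c) / (1 − c). The sketch assumes "decayed" means only the portion of edge above the cap is multiplied by the decay factor, and that the result is floored at UNIT_SIZE and capped at MAX_BET_SIZE. Function and parameter names are hypothetical.

```python
def soft_capped_edge(edge: float, cap: float = 0.15, decay: float = 0.5) -> float:
    """Keep edge up to `cap` in full; shrink the excess by `decay` (assumed form)."""
    if edge <= cap:
        return edge
    return cap + (edge - cap) * decay


def kelly_stake(
    bankroll: float,
    price: float,          # price paid for the side being bought, in dollars
    edge: float,           # fair probability minus price
    batch_size: int,
    side: str = "yes",
    kelly_fraction: float = 0.25,
    unit_size: float = 1.00,
    max_bet: float = 100.0,
    no_price_floor: float = 0.35,
    no_multiplier: float = 0.5,
) -> float:
    """Batch-Kelly dollar stake for one opportunity; 0.0 when edge is not positive."""
    if not 0.0 < price < 1.0 or batch_size < 1:
        raise ValueError("price must be in (0, 1) and batch_size >= 1")
    full_kelly = soft_capped_edge(edge) / (1.0 - price)
    if full_kelly <= 0:
        return 0.0
    stake = bankroll * (kelly_fraction / batch_size) * full_kelly
    if side == "no" and price < no_price_floor:
        stake *= no_multiplier  # half-Kelly on cheap NO bets
    return max(unit_size, min(max_bet, stake))
```

With a $1,000 bankroll, a 40¢ YES contract at 10% edge in a batch of five sizes to roughly $8.33; the same edge on a 30¢ NO contract sizes to roughly $3.57 after the half-Kelly multiplier.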

Resting-order janitor R4

Runs at the top of execute_pipeline() when execute=true AND DRY_RUN=false.

  • Lists status=resting orders
  • Cancels those > RESTING_ORDER_MAX_HOURS (24) with zero fills
  • Partial/full fills left for the settler
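
The selection rule can be sketched as a pure filter, which keeps it testable without touching the exchange. The order field names (`status`, `fill_count`, `created_at`) and the function name are assumptions; cancellation itself is left to the caller.

```python
from datetime import datetime, timedelta


def stale_resting_orders(
    orders: list[dict], now: datetime, max_hours: float = 24
) -> list[dict]:
    """Return resting orders older than `max_hours` that have zero fills."""
    cutoff = now - timedelta(hours=max_hours)
    return [
        order
        for order in orders
        if order["status"] == "resting"
        and order["fill_count"] == 0      # partial fills are left for the settler
        and order["created_at"] < cutoff  # strictly older than the window
    ]
```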

§5 Risk Limits

Defaults shipped in .env.example. All overridable per environment.

Position sizing

  • UNIT_SIZE = $1.00 (Kelly floor)
  • KELLY_FRACTION = 0.25
  • MAX_BET_SIZE = $100
  • KELLY_EDGE_CAP = 0.15
  • KELLY_EDGE_DECAY = 0.5

Portfolio caps

  • MAX_DAILY_LOSS = $250
  • MAX_OPEN_POSITIONS = 10
  • MAX_PER_EVENT = 3
  • MAX_BET_RATIO = 3.0× batch median
  • SERIES_DEDUP_HOURS = 48 (MLB 72, NHL 72 via per-sport overrides — R9)

Edge thresholds

  • MIN_EDGE_THRESHOLD = 0.03 (global)
  • MIN_EDGE_THRESHOLD_NBA = 0.12
  • MIN_EDGE_THRESHOLD_NCAAB = 0.10
  • MIN_COMPOSITE_SCORE = 6.0
  • MIN_CONFIDENCE = medium

Lottery-ticket floor

  • MIN_MARKET_PRICE = $0.10
  • Strict less-than ($0.09 rejected)
  • Set to 0 to disable
  • Blocks sub-10¢ longshot cluster

NO-side guard

  • NO_SIDE_FAVORITE_THRESHOLD = 0.25
  • NO_SIDE_MIN_EDGE = 0.25
  • NO_SIDE_KELLY_PRICE_FLOOR = 0.35
  • NO_SIDE_KELLY_MULTIPLIER = 0.5

Order hygiene

  • RESTING_ORDER_MAX_HOURS = 24
  • Zero-fill stale orders cancelled pre-execute
  • DRY_RUN default = true

§6 Recent Updates

Shipped changes, most recent first.
2026-04-28 — R24b File-Backed Odds API Cache
Two-tier cache for The Odds API — in-process dict in front of a new file-backed layer at data/cache/odds/
R24b Cross-process cache for sportsbook odds payloads
F31 documented one Odds API key dropping 175 → 0 remaining in five minutes. Root cause: each scan.py invocation started with empty in-process caches and refetched all 18 sport keys from scratch; back-to-back scans (scheduler bursts, dashboard re-renders) doubled quota burn for no gain. New scripts/shared/odds_cache.py with load(sport_key, markets, ttl_seconds), store(), clear(); files at data/cache/odds/<sport_key>__<markets>.json (commas in markets sanitized to underscores; original markets string preserved inside JSON). Silent-on-error throughout — corrupt file = miss, never an exception.
R24b Two new env knobs in app/config.py
ODDS_CACHE_TTL_SECONDS=300 (5 min default — longer than typical filter-fiddling, shorter than meaningful pre-game line movement; 0 disables) and ODDS_CACHE_ENABLED=true. Validates non-negative TTL. Wired into both edge_detector.fetch_odds_api() and futures_edge.fetch_outrights(); the existing in-process dicts stay so existing _odds_cache.clear() test calls still work. Hits log Odds API file cache hit for X (age Ns, M events) so cache age is visible in scan output.
R24b Distinct from R23's quota cache
R23 caches per-key remaining-request counters at data/cache/odds_api_quota.json so fresh processes skip exhausted keys. R24b caches the actual sportsbook payloads. The two layer cleanly: a key-rotation 401 doesn't poison the response cache, and a stored response doesn't lie about quota state. +10 regression tests (320 → 330 passing). An offline round-trip smoke test confirms that call 2 (in-process dict cleared, file cache populated) returns identical events with 0 HTTP calls.
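
A minimal sketch of a file-backed cache with the naming and silent-on-error behavior described above. It is not the contents of scripts/shared/odds_cache.py: the explicit `cache_dir` argument, the `stored_at` field, and the JSON layout are assumptions made so the example is self-contained.

```python
import json
import time
from pathlib import Path
from typing import Optional


def cache_path(cache_dir: str, sport_key: str, markets: str) -> Path:
    """<sport_key>__<markets>.json, with commas in markets turned into underscores."""
    return Path(cache_dir) / f"{sport_key}__{markets.replace(',', '_')}.json"


def store(cache_dir: str, sport_key: str, markets: str, events: list) -> None:
    """Write the payload; any failure is swallowed so a scan never crashes here."""
    try:
        path = cache_path(cache_dir, sport_key, markets)
        path.parent.mkdir(parents=True, exist_ok=True)
        doc = {"stored_at": time.time(), "markets": markets, "events": events}
        path.write_text(json.dumps(doc))
    except (OSError, TypeError, ValueError):
        pass


def load(cache_dir: str, sport_key: str, markets: str, ttl_seconds: float) -> Optional[list]:
    """Return cached events if fresh; a missing, stale, or corrupt file is a miss."""
    if ttl_seconds <= 0:  # a TTL of 0 disables the cache
        return None
    try:
        doc = json.loads(cache_path(cache_dir, sport_key, markets).read_text())
        age = time.time() - doc["stored_at"]
        if age < 0 or age > ttl_seconds:
            return None
        return doc["events"]
    except (OSError, ValueError, KeyError, TypeError):
        return None
```

Keeping the original `markets` string inside the JSON means the sanitized filename never has to be reversed.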
2026-04-25 — Config Centralization (PR #149)
Phase 1 + 2 + 3 — typed config module, 65 os.getenv reads migrated, lint guard against regression
P1 New app/config.py: single source of truth
Audit found 75 os.getenv calls across 14 files with type-coercion drift (MIN_EDGE_THRESHOLD read in 5 places, two type styles; DRY_RUN coerced inconsistently). Built a typed module: 10 frozen dataclasses (Kalshi/Odds/Alpaca/Telegram credentials + RiskLimits, GateThresholds, KellyConfig, PerSportOverrides, System) with from_env() coercion and Config.validate() for impossible combos. Memoized via get_config() / reset_config(). 32 unit tests. Pure addition — no scripts touched, no behavior change.
P2 Migrated all 8 script groups to get_config()
65 os.getenv reads removed across doctor.py (9), risk_check.py (5), kalshi_client.py (8), edge_detector.py + fetch_odds.py (3), kalshi_executor.py (23 — the heavyweight, all 11 risk gates and per-sport overrides), 6 small modules (11), and webapp/services.py (6). Module-level constants kept as plain mutable globals where tests directly mutate them; only the initial source changed. or None pattern preserves None-on-unset semantics for credentials spliced into HTTP headers. Streamlit Cloud bug found and fixed: webapp/app.py puts webapp/ on sys.path[0], shadowing the app/ package; resolved by re-inserting PROJECT_ROOT at sys.path[0] inside services.py. All 292 prior tests still pass.
P3 Lint guard at scripts/lint/check_config_centralization.py
Walks app/, scripts/, webapp/ for os.getenv / os.environ. Allows app/config.py, comment-only lines, and lines tagged # config-bootstrap (reserved for the 4 Streamlit secrets-bootstrap lines in webapp/services.py). Wired into make lint-config and a pre-commit hook with always_run: true so the lint sees the whole tree, not just staged files. 5 unit tests cover codebase-clean baseline, regression detection, annotation suppression, comment-line ignore, and app/config.py exclusion.
P2 Bonus: dropped a small dead constant + display normalization
MIN_EDGE in risk_check.py was defined but never referenced; deleted. doctor.py's display of UNIT_SIZE is now :.2f-formatted, so a value of .50 in .env renders as $0.50 instead of $.50. Numeric values reaching every gate are byte-identical.
2026-04-24 — 30-Day Calibration Cycle + Futures Bug Hunt + Prediction Audit
R12–R18, R20, R21–R23, R24a, R25 — calibration unblocked, NBA floor, one-way confidence, scanner parity, 3 futures fixes, Odds API key rotation, webapp cache, scan-table Gate column, prediction safety gate
R15 model_calibration.py points at settlement source
Script was reading trade_log (16 entries, 3 closed) instead of kalshi_settlements.json (173 entries). CLI errored with "need at least 10" despite 160 settled bets existing. Fixed by reading settlements and normalizing field names. Unblocks R12.
R12 First 160-trade calibration report
Brier 0.2657 (worse than the 0.25 a coin-flip forecast scores). Per-sport Brier shows NBA at 0.3306, the worst of all sports (NHL 0.2376, MLB 0.2519, NCAAB 0.2885). High-confidence win rate (47%) is below Medium (53%) portfolio-wide. The 25%+ edge bucket moved from -24% (14d) to +16% (30d), which suggests R2 is working.
R14 NBA floor bumped 0.08 → 0.12
Raised MIN_EDGE_THRESHOLD_NBA. Also restored both NBA and NCAAB overrides in the live .env — documented in .env.example but missing from the actual env file, so both had been silently falling back to the 3% global floor. Scope intentionally minimal: most NBA bleed was High-confidence picks (fixed in R13) or sub-10¢ lottery tickets (already caught by R7).
R13 Confidence bumps are one-way (down only)
_adjust_confidence_with_stats() now drops a tier on contradicts but no-ops on supports. Applies to all three call sites (team stats, rest/B2B, sharp money). Upward bumps correlated with inflated claimed edge but worse realized outcomes. Base "high" tier still reachable via ≥8 sharp-books + tight-consensus rule. +4 regression tests (218 → 222 passing).
R16 Monthly calibration cron
New calibration profile runs model_calibration.py --days 30 --save on day 1 of each month at 02:00 (after nightly settler). Installer extended to support MONTHLY schedules. Narrowed scripts/schedulers/ gitignore so the portable automation/ folder is now tracked.
R17 Scanner flag parity (--budget, --report-dir)
Futures and prediction scanners didn't accept --budget or --report-dir. Extracted shared parse_budget_arg() helper; added both flags across all scanners; wired each to execute_pipeline(budget=…) and save_scan_report(output_dir=…).
R21 Dedup passes futures through unchanged
A futures scan of 20 opportunities was being collapsed to 2 before risk gates ran. dedup_correlated_brackets was treating every team outcome in a championship as an alt-line bracket. Fix: when category=="futures", use the full ticker as the dedup key. Concentration is still bounded by Gate 6 (MAX_PER_EVENT=3).
R22 FUTURES_MAP prefix-collision + semantic fix
Futures scan was surfacing "+30-75% edge" on almost every MLB team. Two bugs: (1) KXMLBPLAYOFFS-26-LAD matched the KXMLB prefix first via startswith; (2) playoff-qualifier and conference-winner markets pointed to championship-winner odds — fundamentally the wrong question. Fixed by switching to exact-series match and removing the 5 semantically-broken entries (KXMLBPLAYOFFS, KXNBAEAST/WEST, KXNHLEAST/WEST). Same scan now surfaces 2 real +4% edges instead of 45 bogus +30-75% "edges".
R23 Odds API key rotation + persistent quota cache
Live probe showed 5 of 10 configured keys exhausted; fetch_outrights retry loop exited after 3 attempts and never reached the healthy key at index 5. Fixed: switched to tried: set[str] loop, added mark_exhausted() on 401, persistent quota cache at data/cache/odds_api_quota.json. Fresh processes now skip exhausted keys instantly.
R24a Webapp scan cache (@st.cache_data(ttl=60))
Zero @st.cache decorators existed anywhere in webapp/. Every click of SCAN MARKETS fired a fresh Odds API fetch. 60s TTL on run_scan(); CLEAR button wipes the cache on demand.
R18 Scan tables show a Gate column
Scan output was silently hiding which rows the executor would reject. New preflight_gate_status() helper checks the six static per-opportunity gates and returns ok / edge / price / score / conf / no-fav / pred-off. Wired into all four scanner tables.
R20 Prediction-market audit
First full eval of the 6 prediction modules since they shipped. Findings: zero prediction bets in 173 settlements; all 6 modules cache live data with no TTL; 4 of 6 have no unit tests; live scans produce garbage fair values (crypto +80% on 4¢ tails, weather $1.00 on 1°F windows); a Miami weather bet was one --unit-size away from executing live. Recommendation: park until rebuilt.
R25 New Gate 4.7: prediction-market safety
Rejects crypto / weather / spx / mentions / companies / politics categories unless ALLOW_PREDICTION_BETS=true. Default off until R25b (TTL caches) + R25c (rebuild one model with tests) are shipped. R18's Gate column surfaces the rejection as pred-off at scan time.
2026-04-22 — R7, Q1-Q5
Lottery-ticket floor + repo-analysis response
R7 New Gate 3.5: MIN_MARKET_PRICE floor
Rejects any bet below $0.10 ask. Built in response to F10 from the 14-day review: sub-10¢ bets were 1W-3L with the model claiming "+50% edge" on lottery tickets. Strict less-than: $0.09 rejected, $0.10 approved. No edge/confidence carve-out. Set to 0 to disable. 5 new tests (213 → 218 passing).
Q1 Web app market_type wired through service layer
UI exposed sports/futures/prediction but everything routed into sports-only. run_scan() now dispatches by market type.
Q2-Q5 Test contamination, doc drift, Pages branch, pandas
Fixed env-leak in test_approved_clean_when_no_caps_hit. Collapsed count-specific "8 risk gates" refs to "all risk gates" linking to CLAUDE.md. Flipped Pages workflow from main to master. Promoted pandas>=2.1.4 to a first-class runtime dep.
2026-04-21 — 14-Day Review Response
R1, R2, R3, R4 — four hardening gates
R1 NO-favorite guard + half-Kelly dampener (Gate 4.6)
All 13 high-edge losers in the 14-day window were NO bets on heavy favorites. Reject NO bets < 25¢ unless edge >= 25% AND confidence=high. NO bets < 35¢ sized at half-Kelly.
R2 Per-sport stdev bump (supersedes C2)
NBA margin stdev 12 → 13.8 (+15%), NCAAB 11 → 12.1 (+10%), MLB 3.5 → 4.025 (+15%). Same widening on totals. Wider distributions pull probability mass toward 50%, reducing the favorite-band overconfidence gap (+18% at 60-70% bucket).
R3 MIN_CONFIDENCE reject gate (Gate 4.5)
Low-confidence bets were 0W-3L / -105% ROI across two review windows. Now rejected outright instead of warning. Default medium.
R4 Resting-order janitor
16% of new orders were resting 25-66h with zero fills. New cancel_stale_resting_orders() helper runs at the top of execute_pipeline() when live-execute. 32 new tests (181 → 213 passing).
2026-04-18 — First Post-Baseline Calibration
C1, C3, C5 — calibration-driven tuning
C1 Kelly edge soft-cap
Claimed edges >=25% realized -35% ROI while 10-15% edges returned +127%. trusted_edge() softly caps Kelly sizing above 15%. Raw edge still flows through gates and reports.
C3 Per-sport MIN_EDGE_THRESHOLD
NBA -15% ROI and NCAAB -62% ROI at the 3% global floor while NHL was +100%. Per-sport overrides: NBA 8%, NCAAB 10%.
C5 Series-level dedup (Gate 7)
Same-matchup bets across consecutive nights compounded losses (Angels @ Yankees 3-night, Mets @ Dodgers 2-night). New gate rejects bets whose matchup key was already bet within SERIES_DEDUP_HOURS (48). Per-sport overrides added in R9 (2026-04-27): MLB and NHL bumped to 72h after F12 — a NYM/LAD pair bet 49h apart slipped the global window.
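
The series check can be sketched with a date-agnostic matchup key and a per-sport window. Treating the team pair as order-agnostic (home/away ignored) is an assumption, as are the function names and the shape of `history`.

```python
from datetime import datetime, timedelta

DEFAULT_DEDUP_HOURS = 48
DEDUP_HOURS_BY_SPORT = {"mlb": 72, "nhl": 72}


def matchup_key(sport: str, team_a: str, team_b: str) -> tuple:
    """Sport plus the sorted team pair; no date component."""
    return (sport.lower(),) + tuple(sorted((team_a.upper(), team_b.upper())))


def is_series_duplicate(
    sport: str, team_a: str, team_b: str, now: datetime, history: dict
) -> bool:
    """True if this matchup was already bet inside the sport's dedup window.

    `history` maps matchup_key -> datetime of the most recent bet.
    """
    last_bet = history.get(matchup_key(sport, team_a, team_b))
    if last_bet is None:
        return False
    hours = DEDUP_HOURS_BY_SPORT.get(sport.lower(), DEFAULT_DEDUP_HOURS)
    return now - last_bet < timedelta(hours=hours)
```

A pair of bets 49 hours apart passes a 48-hour window but is caught by a 72-hour one, which is the gap the MLB and NHL overrides close.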
2026-04-08 — Coverage + Dashboard
Full sports coverage & Streamlit Cloud
Odds API mapping: 4 → 18 sports
NFL, NCAAF, soccer (EPL/UCL/La Liga/Serie A/Bundesliga/Ligue 1/MLS), UFC, boxing, F1, PGA, IPL, NCAA W-basketball added. No-filter scans went from 11 → 30 prefixes.
Multi-filter and Streamlit Cloud
--filter mlb,nhl comma-separated multi-sport scans. Dashboard deployed to Streamlit Community Cloud with password gate; inline PEM support for cloud filesystems.

§7 Calibration Snapshot

14-day review window — 2026-04-07 through 2026-04-21.
76
Settled trades
48.7%
Win rate (37W-39L)
+31%
ROI
0.2646
Brier score
100
Trades → R12 re-run

Findings driving recent risk changes

  • F1 — YES +93% ROI (n=48); NO -20% (n=28); NO at >=20% edge: 31% WR (n=16) → R1 Gate 4.6
  • F6 — Low-confidence 0W-3L / -105% ROI → R3 Gate 4.5
  • F10 — Sub-10¢ bets 1W-3L with claimed "+50% edge" → R7 Gate 3.5
  • 60-70% favorite band overconfidence +18% (n=40) → R2 stdev bump

Attribution plan (R12)

R12 re-runs model_calibration.py at 100 post-baseline trades (currently 66). The window between R2's ship date and that checkpoint is the cleanest place to measure whether the probability-width fix improved Brier.