Predictions that emerge from a network of political actors.
Not poll aggregation, and not a vote among "expert" agents. We wire politicians, factions and institutions into a signed network, let signals propagate, and read a probability out of the interactions. Then we honestly quantify our own data leakage.
A computational poli-sci experiment
Can a political outcome be computed from an actor network before it is announced?
The core idea
Each agent is one political figure. Signals propagate along the edges (allies converge, opponents push back), and the decider's perceived utility shifts round by round. Run this many times as a Monte Carlo simulation and read out a distribution.
The prediction comes from nobody's judgment; it emerges from the interaction.
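A minimal sketch of the propagate-then-sample loop, assuming a toy four-actor network, a linear update rule, Gaussian noise on the decider's utility, and a `beta` weight on the network term (by analogy with the β in the ablation table further down). All actor names, weights and step sizes are hypothetical; the engine's real update rule is not specified here.

```python
import random
from collections import Counter

# Hypothetical toy instance; none of these numbers come from the real engine.
CANDIDATES = ["A", "B"]
EDGES = [                          # (source, target, signed weight)
    ("think_tank", "ally", 0.5),   # two-hop path: think_tank -> ally -> decider
    ("ally", "decider", 0.8),      # positive weight: the ally pulls the decider along
    ("opponent", "decider", -0.5), # negative weight: opponent backing A pushes the decider away from A
]
BASE_UTILITY = {"A": 0.0, "B": 0.2}   # decider's own leaning with the network switched off
INITIAL_SIGNAL = {                    # how loudly each actor backs each candidate
    "think_tank": {"A": 1.0, "B": 0.0},
    "ally": {"A": 1.0, "B": 0.0},
    "opponent": {"A": 1.0, "B": 0.0},
    "decider": {"A": 0.0, "B": 0.0},
}


def run_once(rng, beta=1.6, rounds=3, step=0.2, noise=0.3):
    """One simulated world: propagate signal for `rounds`, return the decider's pick."""
    signal = {actor: dict(v) for actor, v in INITIAL_SIGNAL.items()}
    for _ in range(rounds):
        # Compute every update from the start-of-round state, then apply them together.
        delta = {actor: {c: 0.0 for c in CANDIDATES} for actor in signal}
        for src, dst, weight in EDGES:
            for c in CANDIDATES:
                delta[dst][c] += step * weight * signal[src][c]
        for actor in signal:
            for c in CANDIDATES:
                signal[actor][c] += delta[actor][c]
    # Perceived utility = base utility + beta * network signal + per-run noise.
    utility = {
        c: BASE_UTILITY[c] + beta * signal["decider"][c] + rng.gauss(0, noise)
        for c in CANDIDATES
    }
    return max(utility, key=utility.get)


def monte_carlo(n=10_000, seed=0, **kwargs):
    """Repeat run_once n times and return the share of runs each candidate wins."""
    rng = random.Random(seed)
    wins = Counter(run_once(rng, **kwargs) for _ in range(n))
    return {c: wins[c] / n for c in CANDIDATES}
```

With these toy numbers, `monte_carlo(beta=0.0)` favors B (the decider's own leaning), while `monte_carlo(beta=1.6)` tips toward A because the two-hop ally chain outweighs the opponent. The output is a distribution over candidates, not a single verdict.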
Why it's different
Most "AI prediction" projects report one flattering number. This project's backbone is honesty: it quantifies its own leakage (89.5% accuracy with hand-tuned weights falls to 63.2% out-of-sample) and stakes its credibility on prospective bets whose results don't exist yet, the only test you can't game.
Knowing what it doesn't know
Classify → find the decider → only then split into agents
One template does not fit every question. The key insight: not every question has a single decider.
① Classify question → {type, single decider?, how it settles}
│ appointment / election / multilateral-binary / departure-timing
▼
② Settlement structure who / what decides? ← the earliest, heaviest call
├ appointment → one decider agent (Trump makes the call)
├ election → voter blocs → vote aggregate (there is no "one person")
├ multilateral→ neutral adjudicator reads the whole game
└ timing → each actor's departure risk → who triggers first
▼
③ Ground the actors candidate set + factions/blocs + relations, with cited evidence
▼
④ Split & simulate independent agents · multi-round · Monte-Carlo → distribution
▼
⑤ Reconcile vs market → locate the disagreement → log a bet, or raise a flag
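Step ② can be sketched as a dispatch table from question type to settlement structure. In the real pipeline an LLM does the classification; the keyword matcher below is only a hypothetical stand-in so the routing runs on its own, and the structure labels (`single_decider`, `voter_blocs` and so on) are illustrative names.

```python
from dataclasses import dataclass

# Question type -> settlement structure, following the four branches of step 2.
SETTLEMENT = {
    "appointment": "single_decider",       # one decider agent makes the call
    "election": "voter_blocs",             # voter blocs -> vote aggregate
    "multilateral_binary": "adjudicator",  # neutral adjudicator reads the whole game
    "departure_timing": "departure_race",  # per-actor departure risk; who triggers first
}

# Hypothetical keyword rules standing in for the LLM classifier (checked in order).
KEYWORD_RULES = [
    ("departure_timing", ("out as", "leave office", "resign")),
    ("appointment", ("nominate", "appoint", "next fed chair")),
    ("election", ("win the", "primary", "election")),
]


@dataclass
class Classified:
    qtype: str
    single_decider: bool
    settlement: str


def classify(question: str) -> Classified:
    q = question.lower()
    qtype = next(
        (t for t, keywords in KEYWORD_RULES if any(k in q for k in keywords)),
        "multilateral_binary",  # fallback when no rule matches
    )
    settlement = SETTLEMENT[qtype]
    return Classified(qtype, settlement == "single_decider", settlement)
```

The design point is that everything downstream (which actors get grounded, how the result is read out) keys off `settlement`, so a misclassified question fails loudly at the first step instead of silently producing an over-confident number.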
Why ② is the hinge
The network-plus-signal-flow model is most faithful for appointments, where one person really does decide. Elections need voter blocs, not a single node. Choose the wrong settlement structure and everything downstream is wrong, so it is the first call.
The mistake we corrected
An early version assumed there is always a decider. For elections that is false: squeezing millions of voters into one node makes the model over-confident (Cox 98% vs market 63%). The fix: branch ② by question type.
Each actor is an independent LLM playing a real person
The outcome emerges from their interaction (built following Generative Agents / BDI).
Orchestrator   spawn N agents · advance rounds · route messages · read result
  │
  ├─▶ Agent A   persona/BDI · private info · memory · limited perception
  ├─▶ Agent B   …  separate context; can't see the others' reasoning
  └─▶ Agent C   …
  │
  │   public broadcast / targeted DM
  ▼
Environment    event stream: [{round, from, to, public, text}…]
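The orchestrator loop can be sketched with stub agents. Each `act` call below stands in for a separate LLM call with its own context; the `Agent` and `Message` classes, the DM rule and the round timing (messages sent in round r become visible from round r+1) are assumptions for illustration. The point is the perception filter: an agent sees public broadcasts and DMs addressed to it, never anyone else's DMs or reasoning.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    round: int
    sender: str            # serialized as "from" ("from" is a reserved word in Python)
    to: Optional[str]      # None for a public broadcast
    public: bool
    text: str

    def to_event(self) -> dict:
        """Shape used by the environment's event stream."""
        return {"round": self.round, "from": self.sender, "to": self.to,
                "public": self.public, "text": self.text}


@dataclass
class Agent:
    name: str
    persona: str
    dm_target: Optional[str] = None          # hypothetical: whom this agent lobbies privately
    memory: List[int] = field(default_factory=list)

    def perceive(self, stream: List[Message]) -> List[Message]:
        # Limited perception: public messages plus DMs addressed to this agent.
        return [m for m in stream if m.public or m.to == self.name]

    def act(self, rnd: int, visible: List[Message]) -> List[Message]:
        # Stub for an LLM call that would receive only the persona and visible messages.
        self.memory.append(len(visible))
        out = [Message(rnd, self.name, None, True, f"{self.persona} statement")]
        if self.dm_target:
            out.append(Message(rnd, self.name, self.dm_target, False, "private offer"))
        return out


def run(agents: List[Agent], rounds: int) -> List[Message]:
    stream: List[Message] = []
    for rnd in range(rounds):
        outgoing: List[Message] = []
        for agent in agents:
            outgoing.extend(agent.act(rnd, agent.perceive(stream)))
        stream.extend(outgoing)  # this round's messages become visible next round
    return stream
```

Holding each round's messages until the round ends means no agent gets to react to another agent's move before making its own, which keeps turn order from leaking information.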
Inside one agent
Designed against the known failure modes
| Trap (arXiv:2507.19364) | Design |
|---|---|
| Information-asymmetry collapse | limited perception; the decider isn't omniscient |
| Average-persona convergence | independent sub-agents + strong persona + high temperature |
| Confident nonsense | the outcome emerges from the game; a prospective test adjudicates |
Tested: the US–Iran question ran on 6 independent Claude sub-agents that genuinely diverged (private estimates of 15–28%) instead of collapsing into an average persona.
One explainable card per market
Each card shows the model probability vs the market, the edge, a one-sentence "why," and a draggable actor network. Paste a one-line iframe into any page, or run your own question below.
One widget, four audiences
Every view is counted
The card pings /api/beacon on every view, and that count is the traction signal. It only reads Polymarket data and never places a trade.
Predict any political question
Type a question; the engine classifies it, finds the decider, grounds the actors and returns a Why-card. Connect an API key to run it live.
Every market, model vs the line
Appointments, primaries, general elections and 2028: each card is one explainable prediction. Click Details on any card for the full decision trace.
48 leak-free bets, scored daily
Three-way comparison: raw engine / temperature-calibrated / market. A remote agent pulls Polymarket settlements every day and computes the Brier score for all three.
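A sketch of the three-way scoring, assuming binary markets and that "temperature-calibrated" means standard temperature scaling on the log-odds, with a single T chosen by grid search on earlier (backtest) data. Fitting T on the same settled bets being scored would reintroduce the leakage the project is trying to avoid. The function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def brier(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def temper(p: float, T: float) -> float:
    """Temperature scaling on the logit: T > 1 pulls toward 0.5, T < 1 sharpens."""
    p = min(max(p, 1e-6), 1 - 1e-6)  # clip to avoid log(0)
    z = math.log(p / (1 - p)) / T
    return 1 / (1 + math.exp(-z))


def fit_temperature(probs: Sequence[float], outcomes: Sequence[int],
                    grid: Optional[List[float]] = None) -> float:
    """Pick the T on a grid that minimizes Brier on historical data."""
    grid = grid or [0.5 + 0.1 * i for i in range(46)]  # 0.5 .. 5.0
    return min(grid, key=lambda T: brier([temper(p, T) for p in probs], outcomes))


def three_way_report(raw: Sequence[float], market: Sequence[float],
                     outcomes: Sequence[int], T: float) -> Dict[str, float]:
    """Brier for the raw engine, the temperature-calibrated engine and the market."""
    return {
        "raw": brier(raw, outcomes),
        "calibrated": brier([temper(p, T) for p in raw], outcomes),
        "market": brier(market, outcomes),
    }
```

Temperature scaling keeps the engine's ranking of outcomes intact and only adjusts how confident it is, which is the lever that matters when the failure mode is over-confidence.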
Where the model bets against the market
Win these and there's a real edge; lose them and the model was naïve. Either outcome is real information.
| Market | Model | Market price | Settles |
|---|---|---|---|
| UN Secretary-General | Grynspan 46% | 29% | 12/31 |
| Israel PM | Bennett 34% | Netanyahu 33% | 12/31 |
| Wisconsin Dem primary | Barnes 42% | Hong 44% | 8/11 |
| Minnesota GOP primary | Qualls 61% | 72% | 8/11 |
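What "win" and "lose" mean for a single bet can be made concrete by computing each side's Brier score under both possible outcomes. This sketch uses only the two rows where the model and market columns refer to the same candidate (Grynspan, Qualls); the Israel PM and Wisconsin rows compare different candidates, so their edge can't be read off the table alone. The helper names are hypothetical.

```python
def edge(model_p: float, market_p: float) -> float:
    """Positive: the model is more bullish than the market on the same outcome."""
    return model_p - market_p


def brier_if(p: float, happened: bool) -> float:
    """Brier score of a single binary forecast once the market settles."""
    return (p - (1.0 if happened else 0.0)) ** 2


BETS = {"UN SG: Grynspan": (0.46, 0.29), "MN GOP primary: Qualls": (0.61, 0.72)}

for name, (model_p, market_p) in BETS.items():
    print(f"{name}: edge {edge(model_p, market_p):+.2f}")
    for happened in (True, False):
        m, q = brier_if(model_p, happened), brier_if(market_p, happened)
        better = "model" if m < q else "market"
        print(f"  if {'YES' if happened else 'NO'}: model {m:.4f} vs market {q:.4f} -> {better}")
```

The model beats the market on Grynspan only if Grynspan is selected, and on Qualls only if Qualls loses, because there the model is the less confident side.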
Settlement calendar
- 6/3 ▸ 11 Korean local elections (first leak-free scores)
- 6/9 ▸ SC / Maine primaries ×2
- 6/16 ▸ DC / Ohio primaries ×2
- 6/23 ▸ Maryland / NY ×2
- 6/30 ▸ Colorado ×2
- 7–8 ▸ 7 gubernatorial primaries
- 11/3 ▸ 11 statewide governor races (blue-wave bets resolve)
- 12/31 ▸ UN SG / Trump Labor Sec
A remote agent pulls the latest settlements every day and outputs a three-way Brier report, so the results come to you.
Markets have numbers. Experts have talk. We wanted a mechanism, and honesty about it.
Why we built it, what to expect, and how it competes, stated plainly.
A political market has a price, but no reason
You know "Warsh 60%". You don't know why, and you can't compute it before the announcement.
| Method | Election | Appointment | Explainable | The gap |
|---|---|---|---|---|
| Polls / fundamentals | strong | none | medium | there are no polls for appointments |
| Expert-LLM "committee" | ~ | ~ | weak | no mechanism; confident nonsense |
| Market price itself | accurate | accurate | black box | no causality, no foresight |
The insight: an outcome can be computed from actors interacting across a network. This holds especially for elite appointments, which have no polls and are decided by power structure: a slice nobody covers systematically.
Two tracks, split by who decides
Appointment track
One decider plus influencers on a signed network, run through Monte Carlo, yields a distribution. In backtests, the Fed Chair pick and other appointments reproduced the real result.
Election track
The decider is millions of voters, modeled as voter blocs (size × turnout × lean) and summed into a vote total. This corrects the single-decider over-confidence.
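A minimal sketch of the election track, assuming each bloc is described by size, expected turnout and expected vote share for candidate X, with Gaussian uncertainty on turnout and a statewide swing shared by all blocs. The bloc numbers and function name are invented for illustration. The shared swing is what keeps a close race away from 0 or 1, instead of the near-certainty a single decider node produces.

```python
import random
from typing import List, Tuple

Bloc = Tuple[str, int, float, float]  # (name, size, mean turnout, mean share for X)

# Hypothetical blocs for illustration only.
BLOCS: List[Bloc] = [
    ("urban", 400_000, 0.55, 0.70),
    ("suburban", 500_000, 0.60, 0.48),
    ("rural", 300_000, 0.65, 0.30),
]


def p_win(blocs: List[Bloc], n: int = 20_000, turnout_sd: float = 0.05,
          swing_sd: float = 0.04, seed: int = 0) -> float:
    """Monte Carlo probability that candidate X wins the aggregate vote."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        swing = rng.gauss(0, swing_sd)  # shared shift: all blocs move together
        votes_x = votes_total = 0.0
        for _name, size, turnout, share in blocs:
            t = min(max(rng.gauss(turnout, turnout_sd), 0.0), 1.0)
            s = min(max(share + swing + rng.gauss(0, swing_sd / 2), 0.0), 1.0)
            votes_total += size * t
            votes_x += size * t * s
        wins += votes_x > votes_total / 2
    return wins / n
```

With these numbers the expected share is just under 50%, so `p_win(BLOCS)` lands near a coin flip; a single lopsided bloc drives it toward 1.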
We don't promise to beat the market
We promise three things we can actually deliver.
- 01 Explainable: a causal "who moved whom" narrative, not a black-box number.
- 02 Honestly calibrated: leakage quantified by leave-one-out (89.5% → 63.2% true accuracy). It is only as confident as it has earned.
- 03 Prospective self-proof: timestamp predictions on unsettled markets; after settlement, score the model's Brier against the market's. It is the only test you can't game.
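One way to make a prospective prediction tamper-evident is to hash the record at registration time. This is a sketch under assumptions: the record fields and function names are hypothetical, and the hash only proves anything if it is published somewhere outside your own control (a public commit, a post) before the market settles.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def register(market_id: str, model_p: float, market_p: float,
             now: Optional[datetime] = None) -> dict:
    """Timestamp a prediction on an unsettled market and attach its SHA-256 digest."""
    record = {
        "market": market_id,
        "model_p": model_p,
        "market_p": market_p,
        "registered_at": (now or datetime.now(timezone.utc)).isoformat(),
    }
    record["sha256"] = _digest(record)  # computed before the digest field is added
    return record


def verify(record: dict) -> bool:
    """True if no field has changed since registration."""
    body = {k: v for k, v in record.items() if k != "sha256"}
    return _digest(body) == record["sha256"]
```

For example, `register("UN Secretary-General: Grynspan", 0.46, 0.29)` freezes the model and market numbers from the bets table; after settlement, the same record feeds the Brier comparison.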
It knows exactly what it can and can't do
① Quantify the leakage (leave-one-out)
Authoring attributes while already knowing the result leaks the answer. So we learn global weights and evaluate out-of-sample:
| Method | Accuracy | Brier |
|---|---|---|
| Uniform guess | ~22% | 0.73 |
| Learned weights (LOO) | 63% | 0.72 |
| Hand-tuned (with answers) | 89% | 0.19 |
Removing the leakage takes 89% down to 63%; the gap is the leakage. The learned weights are legible too: proximity / loyalty / visibility ≫ competence.
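The leave-one-out check can be sketched with a conditional-logit model: each candidate's score is a weighted sum of its attributes, a softmax turns scores into probabilities, and the global weights are fitted by gradient ascent on every question except the one being scored. The model choice, learning rate, L2 penalty and epoch count are assumptions; the source doesn't say how its weights were learned. Brier here is the multi-class version, summed over a question's candidates.

```python
import math
from typing import List, Sequence, Tuple

# One question = (candidates' attribute vectors, index of the actual winner).
Question = Tuple[List[List[float]], int]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict(w: Sequence[float], candidates: List[List[float]]) -> List[float]:
    return softmax([sum(wi * xi for wi, xi in zip(w, x)) for x in candidates])


def fit(data: List[Question], dim: int, lr: float = 0.5,
        epochs: int = 300, l2: float = 0.01) -> List[float]:
    """Gradient ascent on the conditional-logit log-likelihood with an L2 penalty."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [-l2 * wi for wi in w]
        for candidates, winner in data:
            p = predict(w, candidates)
            for k, x in enumerate(candidates):
                y = 1.0 if k == winner else 0.0
                for d in range(dim):
                    grad[d] += (y - p[k]) * x[d]  # accumulates d(log p_winner)/d(w_d)
        w = [wi + lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def leave_one_out(data: List[Question], dim: int) -> Tuple[float, float]:
    """Score each question with weights fitted on all the others."""
    hits, brier_total = 0, 0.0
    for i, (candidates, winner) in enumerate(data):
        w = fit(data[:i] + data[i + 1:], dim)  # question i is never seen in training
        p = predict(w, candidates)
        hits += max(range(len(p)), key=p.__getitem__) == winner
        brier_total += sum((pk - (1.0 if k == winner else 0.0)) ** 2
                           for k, pk in enumerate(p))
    return hits / len(data), brier_total / len(data)
```

Inspecting the fitted `w` from `fit(all_questions, dim)` is how a statement like "proximity ≫ competence" can be read off the weights.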
② Does the engine layer help? (ablation)
| Config | Accuracy | Brier |
|---|---|---|
| β=0, base utility only (network off) | 81% | 0.090 |
| β=1.6, full network | 86% | 0.051 |
| + momentum | 81% | 0.053 |
The network layer cuts Brier by 44%, so it helps. Momentum hurt, so it's off. Trust the data, not the talk.
High backtest accuracy carries leakage that can't be washed out; it is a mechanism demo, not proof. The only trustworthy test is a prospective prediction: the result doesn't exist yet, so it can't be gamed.
Users, competitors, moat, risk
Target user
People who trade these markets and want "a second opinion with a causal reason + an edge vs the line." It also serves analysts, media and technical evaluators.
Competitors & differentiation
| Competitor | Has | Lacks → we supply |
|---|---|---|
| Polymarket price | accuracy, liquidity | black box → we give a causal mechanism |
| Metaculus / poll models | strong on elections | no appointments → we cover the blind spot |
| GPT "expert panel" | easy to build | no mechanism → results emerge + prospective test |
Moat
- Explainable causal narrative (rivals are black boxes or guesswork)
- A scientific stance: quantified leakage + prospective falsification. When AI predictions are everywhere, verifiable means trusted
- The overlooked appointments niche
- Embeddable cards carry their own distribution and a traction beacon
Risks (stated honestly)
- Edge unproven: only settlements of currently open markets will tell
- Election-track over-confidence, being fixed with voter blocs
- Reflexivity: once a strategy is understood, it gets priced away
- Scaling needs a working LLM key to run the author step
The engine runs and the product ships. What's missing isn't the tech; it's a user who feels real pain over this. Next: put the card in front of traders and get the first real signal of demand.