Computational political science / v2

Predictions that emerge from a network of political actors.

Not poll aggregation, and not a vote among "expert" agents. We wire politicians, factions, and institutions into a signed network, let signals propagate, and read a probability out of the interactions. Then we honestly quantify our own data leakage.

15 markets · LLM multi-agent simulations
300+ voter archetypes in the election track
63.2% rule-based baseline, true out-of-sample (leave-one-out)
Actor network (interactive)

What it is

A computational political science experiment

Can a political outcome be computed from an actor network before it is announced?

The core idea

Each agent represents one political figure. Signals propagate along the edges (allies converge, opponents push back), and the decider's perceived utility shifts round by round. Repeating the run many times (Monte Carlo) yields a distribution rather than a single call.

The prediction comes from no one's judgment; it emerges from the interaction.
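Below is a minimal sketch of the propagate-then-Monte-Carlo idea. The actors, stances, edge weights, influence values, noise levels, and the update rule are all hypothetical toy choices, not the project's engine. The grounded pieces are the signed edges, the round-by-round updates, the decider's perceived utility, and the β knob (β=0 switches the network off, as in the ablation table). Reading utility as "base + β × network signal" is our assumption.

```python
import random
from collections import Counter

# Toy, hypothetical inputs: not the project's actors or parameters.
CANDIDATES = ["A", "B"]
INIT_STANCE = {  # each influencer's stance toward each candidate, in [-1, 1]
    "ally1": {"A": 0.8, "B": -0.2},
    "ally2": {"A": 0.1, "B": 0.3},
    "rival": {"A": -0.6, "B": 0.7},
}
EDGES = [  # (source, target, signed weight)
    ("ally1", "ally2", 0.5), ("ally2", "ally1", 0.5),    # allies pull the same way
    ("rival", "ally1", -0.4), ("ally1", "rival", -0.4),  # opponents push the other way
]
INFLUENCE = {"ally1": 1.0, "ally2": 0.6, "rival": 0.8}  # reach into the decider
BASE_UTILITY = {"A": 0.2, "B": 0.0}  # decider's prior preference


def clip(x: float) -> float:
    return max(-1.0, min(1.0, x))


def simulate_once(rng, rounds=5, step=0.2, noise=0.05, beta=1.6, decision_noise=0.3):
    """One run: propagate stances along signed edges, then let the decider pick."""
    stance = {a: dict(s) for a, s in INIT_STANCE.items()}
    for _ in range(rounds):
        nxt = {a: dict(s) for a, s in stance.items()}
        for src, dst, w in EDGES:
            for c in CANDIDATES:
                # w > 0 nudges dst in the direction of src's stance; w < 0 the opposite way.
                nxt[dst][c] += step * w * stance[src][c]
        for a in nxt:
            for c in CANDIDATES:
                nxt[a][c] = clip(nxt[a][c] + rng.gauss(0.0, noise))
        stance = nxt
    # Perceived utility = base utility + beta * influence-weighted network signal + noise.
    utility = {
        c: BASE_UTILITY[c]
        + beta * sum(INFLUENCE[a] * stance[a][c] for a in stance)
        + rng.gauss(0.0, decision_noise)
        for c in CANDIDATES
    }
    return max(utility, key=utility.get)


def mc_distribution(n=2000, seed=0, **kwargs):
    """Monte Carlo: repeat the run n times and read out a probability per candidate."""
    rng = random.Random(seed)
    counts = Counter(simulate_once(rng, **kwargs) for _ in range(n))
    return {c: counts[c] / n for c in CANDIDATES}
```

Calling `mc_distribution(n=5000)` returns one probability per candidate. Passing `beta=0.0` leaves only the decider's base utility, which mirrors the network-off row of the ablation.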

Why it's different

Most "AI prediction" projects report one flattering number. This one's backbone is honesty. It quantifies its own leakage: accuracy falls from 89.5% to 63.2% out-of-sample. It also stakes its credibility on prospective bets whose results don't exist yet, the only test that can't be gamed.

Knowing what it doesn't know

Prediction pipeline

Classify → find the decider → only then split into agents

One template does not fit every question. The key insight: not every question has a single decider.

① Classify        question → {type, single decider?, how it settles}
       │  appointment / election / multilateral-binary / departure-timing
       ▼
② Settlement structure   who / what decides?  ← the earliest, heaviest call
       ├ appointment → one decider agent (Trump makes the call)
       ├ election    → voter blocs → vote aggregate (there is no "one person")
       ├ multilateral→ neutral adjudicator reads the whole game
       └ timing      → each actor's departure risk → who triggers first
       ▼
③ Ground the actors  candidate set + factions/blocs + relations, with cited evidence
       ▼
④ Split & simulate   independent agents · multi-round · Monte-Carlo → distribution
       ▼
⑤ Reconcile        vs market → locate the disagreement → log a bet, or raise a flag
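The branching in step ② can be sketched as a lookup from question type to settlement structure, decided before any agent exists. This is a hypothetical illustration. `classify` is a keyword stub standing in for whatever the real classifier does (presumably an LLM call). The names `QType`, `Classified`, `SETTLEMENT`, and `plan` are ours.

```python
from dataclasses import dataclass
from enum import Enum


class QType(Enum):
    APPOINTMENT = "appointment"
    ELECTION = "election"
    MULTILATERAL_BINARY = "multilateral-binary"
    DEPARTURE_TIMING = "departure-timing"


@dataclass
class Classified:
    qtype: QType
    single_decider: bool


# Step 2: the settlement structure follows from the type, before any agent is split.
SETTLEMENT = {
    QType.APPOINTMENT: "one decider agent",
    QType.ELECTION: "voter blocs -> vote aggregate",
    QType.MULTILATERAL_BINARY: "neutral adjudicator reads the whole game",
    QType.DEPARTURE_TIMING: "each actor's departure risk -> who triggers first",
}


def classify(question: str) -> Classified:
    """Hypothetical keyword stub standing in for the real classifier."""
    q = question.lower()
    if any(k in q for k in ("primary", "election", "governor")):
        return Classified(QType.ELECTION, single_decider=False)
    if any(k in q for k in ("resign", "leave office", "out by")):
        return Classified(QType.DEPARTURE_TIMING, single_decider=False)
    if any(k in q for k in ("deal", "treaty", "ceasefire")):
        return Classified(QType.MULTILATERAL_BINARY, single_decider=False)
    return Classified(QType.APPOINTMENT, single_decider=True)


def plan(question: str) -> str:
    """Steps 1-2 only; grounding, simulation and reconciliation (3-5) would follow."""
    return SETTLEMENT[classify(question).qtype]
```

The design point is that an election question never reaches a single-decider simulation, because the dispatch happens first.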

Why ② is the hinge

A network with signal flow is most faithful to appointments, where someone really does decide. Elections need voter blocs, not a single node. A wrong settlement structure makes everything downstream wrong, which is why it is the first call.

The mistake we corrected

An early version assumed there is always a single decider. For elections that is false: squeezing millions of voters into one node made the model over-confident (Cox at 98% vs. 63% in the market). The fix: branch step ② by question type.

Multi-agent architecture · step ④

Each actor is an independent LLM playing a real person

The outcome emerges from their interaction (built following Generative Agents and the BDI model).

Orchestrator  spawn N agents · advance rounds · route messages · read result
      │
      ├─▶ Agent A  persona/BDI · private info · memory · limited perception
      ├─▶ Agent B  … separate context; can't see each other's reasoning
      └─▶ Agent C
               │  public broadcast / targeted DM
               ▼
      Environment  event stream: [{round, from, to, public, text}…]
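Here is a sketch of the environment's event stream and the limited-perception rule. It assumes events carry exactly the fields shown in the diagram. `from` is renamed `sender` because `from` is a Python keyword. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    round: int
    sender: str          # "from" in the diagram; renamed because `from` is a Python keyword
    to: Optional[str]    # None for a public broadcast
    public: bool
    text: str


@dataclass
class Environment:
    events: List[Event] = field(default_factory=list)

    def post(self, event: Event) -> None:
        """The orchestrator routes every message through the shared event stream."""
        self.events.append(event)

    def perceive(self, agent: str, up_to_round: int) -> List[Event]:
        """Limited perception: public broadcasts plus DMs addressed to `agent`, nothing else."""
        return [
            e for e in self.events
            if e.round <= up_to_round and (e.public or e.to == agent)
        ]
```

Because each agent is prompted only with `perceive(...)` output, a back-channel DM from A to B never reaches C's context.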

Inside one agent

  • persona / BDI: Belief · Desire · Intention, plus a strong, distinct persona.
  • perception: only public broadcasts and DMs addressed to itself; isolated, not omniscient.
  • memory: a stream scored by recency × importance × relevance, from which the top-k are retrieved.
  • reflection: periodically synthesizes memories into a higher-level belief.
  • action: a public statement OR a private DM (back channel vs. open).
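The memory scoring can be sketched as follows. This rests on three assumptions. Importance is pre-scored in [0, 1]. Recency decays exponentially per round. Relevance is a toy bag-of-words cosine standing in for embedding similarity. The product follows the text above; the original Generative Agents paper combines the three scores as a weighted sum instead.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    round: int
    importance: float  # assumed pre-scored in [0, 1], e.g. by the agent's own LLM


def bow_cosine(a: str, b: str) -> float:
    """Toy relevance: cosine similarity of word counts (a stand-in for embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def retrieve(memories, query: str, now: int, k: int = 3, decay: float = 0.9):
    """Score each memory by recency x importance x relevance and return the top-k."""
    def score(m: Memory) -> float:
        recency = decay ** (now - m.round)  # exponential decay per elapsed round
        return recency * m.importance * bow_cosine(m.text, query)
    return sorted(memories, key=score, reverse=True)[:k]
```

With a product, a memory that scores zero on any one factor (say, it is irrelevant to the query) can never be retrieved. A weighted sum is more forgiving.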

Designed against known failure modes from the literature

Trap (arXiv:2507.19364)        Design
Info-asymmetry collapse        limited perception; the decider isn't omniscient
Average-persona convergence    independent sub-agents + strong persona + high temperature
Confident nonsense             the outcome emerges from the game; a prospective test adjudicates

Tested: the US–Iran question ran on 6 independent Claude sub-agents whose private estimates genuinely diverged (15–28%) instead of collapsing into an average persona.

02 Predictions

One explainable card per market

Each card shows the model probability vs. the market, the edge, a one-sentence "why," and a draggable actor network. Embed it anywhere with a single line of iframe code, or run your own question below.

One widget, four audiences

  • Trader: the edge plus a reason to trust it, within 3 seconds
  • Investor: an AI that explains itself and embeds anywhere
  • Growth: your link travels with every embed
  • Method: a calibration badge puts honesty front and center

Every view is counted

The card pings /api/beacon on every view; that count is the traction signal. It only reads Polymarket data and never places trades.

Custom event

Predict any political question

Type a question; the engine classifies it, finds the decider, grounds the actors, and returns a Why card. Connect an API key to run it live.

Live cards · 15 markets

Every market, model vs. the line

Appointments, primaries, general elections, and 2028: each card is one explainable prediction. Click Details on any card for the full decision trace.

Legend: validated = settled, and the model hit · 2-way = aligned with the market, small edge · contrarian = against the market, falsifiable

Live experiment

48 leak-free bets, scored daily

A three-way comparison: raw engine vs. temperature-calibrated engine vs. market. A remote agent pulls Polymarket settlements every day and computes the Brier score for all three.
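Here is a sketch of the daily three-way scoring for binary markets. It rests on three assumptions. "Temperature-calibrated" means standard temperature scaling: dividing the raw probability's logit by a temperature T. T is a hypothetical constant here rather than a fitted value. Each bet is binary, so the Brier score is the mean squared error against the 0/1 outcome.

```python
import math


def logit(p: float) -> float:
    # p must lie strictly between 0 and 1.
    return math.log(p / (1.0 - p))


def temperature_scale(p: float, T: float) -> float:
    """Soften (T > 1) or sharpen (T < 1) a binary probability in logit space."""
    return 1.0 / (1.0 + math.exp(-logit(p) / T))


def brier(probs, outcomes) -> float:
    """Binary Brier score: mean squared error between probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def three_way_report(raw, market, outcomes, T=1.5):
    """Score raw engine, temperature-calibrated engine and market on the same settled bets."""
    calibrated = [temperature_scale(p, T) for p in raw]
    return {
        "raw": brier(raw, outcomes),
        "calibrated": brier(calibrated, outcomes),
        "market": brier(market, outcomes),
    }
```

Lower is better. T = 1 leaves the raw probabilities unchanged, and T > 1 pulls every probability toward 50%, which helps if the raw engine is over-confident.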

Where the model bets against the market

Win these and there is a real edge; lose them and the model was naïve. Either way, the result is information.

Question                 Model          Market          Settles
UN Secretary-General     Grynspan 46%   29%             12/31
Israel PM                Bennett 34%    Netanyahu 33%   12/31
Wisconsin Dem primary    Barnes 42%     Hong 44%        8/11
Minnesota GOP primary    Qualls 61%     72%             8/11

Settlement calendar

6/3   ▸ 11 Korean local elections   (first leak-free scores)
6/9   ▸ SC / Maine primaries  ×2
6/16  ▸ DC / Ohio primaries   ×2
6/23  ▸ Maryland / NY  ×2
6/30  ▸ Colorado ×2
7–8   ▸ 7 gubernatorial primaries
11/3  ▸ 11 statewide governor races   (blue-wave bets resolve)
12/31 ▸ UN SG / Trump Labor Sec

It scores itself

A remote agent pulls the latest settlements every day and outputs a three-way Brier report. The results arrive on their own.

03 About · why, expectations, analysis

Markets have numbers. Experts have talk. We wanted a mechanism, and we wanted it honest.

Why we built it, what to expect, and how it competes, stated plainly.

The problem & the gap

A political market has a price, but no reason

You see "Warsh 60%." You don't know why, and you can't compute it before the announcement.

Method                    Election   Appointment   Explainable   The gap
Polls / fundamentals      strong     none          medium        no "polls" exist for appointments
Expert-LLM "committee"    ~          ~             weak          no mechanism; confident nonsense
Market price itself       accurate   accurate      black box     no causality, no foresight

The insight: an outcome can be computed from actors interacting across a network. This holds especially for elite appointments. They have no polls and are decided by power structure, a slice nobody covers systematically.

What we build

Two tracks, split by who decides

Validated

Appointment track

One decider plus influencers on a signed network, run through Monte Carlo, yields a distribution. Fed Chair and other cases reproduced the real result.

In progress

Election track

Here the decider is millions of voters. They are modeled as voter blocs (size × turnout × lean), and the blocs' votes are aggregated. This corrects the single-decider over-confidence.
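Here is a minimal sketch of bloc aggregation for a two-candidate race. Each bloc is described by its size, its turnout, and the share of its voters leaning to candidate A. The bloc numbers and noise levels are invented for illustration. The shared swing term is our assumption about how to keep blocs from erring independently. Independent errors would largely cancel out and make the aggregate over-confident again.

```python
import random

# Hypothetical blocs: (name, size, turnout, share of voters leaning to candidate A).
BLOCS = [
    ("urban", 400_000, 0.62, 0.68),
    ("suburban", 350_000, 0.70, 0.51),
    ("rural", 250_000, 0.66, 0.35),
]


def vote_share_a(blocs, swing=0.0, turnout_scale=None):
    """Aggregate votes: each bloc contributes size x turnout x lean."""
    votes_a = votes_total = 0.0
    for i, (_, size, turnout, lean) in enumerate(blocs):
        t = turnout * (turnout_scale[i] if turnout_scale else 1.0)
        share = min(1.0, max(0.0, lean + swing))
        votes_a += size * t * share
        votes_total += size * t
    return votes_a / votes_total


def win_probability(blocs, n=5000, swing_sd=0.03, turnout_sd=0.05, seed=0):
    """Monte Carlo over a swing shared by all blocs plus per-bloc turnout noise."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        swing = rng.gauss(0.0, swing_sd)  # one draw, applied to every bloc
        scale = [max(0.01, rng.gauss(1.0, turnout_sd)) for _ in blocs]
        wins += vote_share_a(blocs, swing, scale) > 0.5
    return wins / n
```

With both noise terms at zero, the sketch collapses back to a deterministic call. That is the same near-certainty a single-node model produces.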

Expectations · honest, not hype

We don't promise to beat the market

We promise three things we can actually deliver.

  • 01
    Explainable: a causal "who moved whom" narrative, not a black-box number.
  • 02
    Honestly calibrated: leakage quantified by leave-one-out (89.5% accuracy falls to a true 63.2%). Only as confident as it has earned.
  • 03
    Prospective self-proof: timestamp predictions on unsettled markets; after settlement, compare the model's Brier score with the market's. This is the only test that can't be gamed.

The honesty ledger

It knows exactly what it can and can't do

① Quantify the leakage (leave-one-out)

Authoring candidate attributes while already knowing the result leaks that result into the model. So we learn global weights and evaluate them out-of-sample:

Method                       Accuracy   Brier
Uniform guess                ~22%       0.73
Learned weights (LOO)        63%        0.72
Hand-tuned (with answers)    89%        0.19

Out of sample, 89% washes down to 63%; the gap is leakage. The learned weights are legible too: proximity, loyalty, and visibility ≫ competence.
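The leave-one-out check can be sketched with a conditional-logit model, a softmax over candidates. That model form is our assumption about what "global weights" means. The attribute names mirror the text, but the four toy cases are invented. Each fold fits weights on every case except one, then scores only the held-out case, so that case's answer cannot leak into its own weights. The Brier here is the multi-class version (summed over candidates), which is one plausible reading of the table's Brier column.

```python
import math

# Hypothetical toy cases. Each case: candidate attribute vectors and the index of the
# real winner. Attributes: (proximity, loyalty, visibility, competence), all in [0, 1].
CASES = [
    ([(0.9, 0.8, 0.7, 0.5), (0.3, 0.4, 0.5, 0.9)], 0),
    ([(0.2, 0.3, 0.4, 0.8), (0.8, 0.9, 0.6, 0.4), (0.5, 0.5, 0.5, 0.5)], 1),
    ([(0.7, 0.9, 0.8, 0.6), (0.6, 0.2, 0.3, 0.9)], 0),
    ([(0.4, 0.3, 0.2, 0.7), (0.9, 0.7, 0.9, 0.3)], 1),
]


def softmax_scores(w, cands):
    """Probability of each candidate under a softmax over the linear scores w . x."""
    z = [sum(wi * xi for wi, xi in zip(w, x)) for x in cands]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fit(cases, dims=4, lr=0.5, epochs=300):
    """Global weights shared by all cases, fitted by gradient ascent on log-likelihood."""
    w = [0.0] * dims
    for _ in range(epochs):
        grad = [0.0] * dims
        for cands, winner in cases:
            p = softmax_scores(w, cands)
            for d in range(dims):
                # d log P(winner) / d w_d = x_winner[d] - E_p[x[d]]
                expected = sum(pk * x[d] for pk, x in zip(p, cands))
                grad[d] += cands[winner][d] - expected
        w = [wi + lr * g / len(cases) for wi, g in zip(w, grad)]
    return w


def leave_one_out(cases):
    """Fit on all cases but one, score only the held-out case; return (accuracy, Brier)."""
    hits, brier = 0, 0.0
    for i, (cands, winner) in enumerate(cases):
        w = fit(cases[:i] + cases[i + 1:])
        p = softmax_scores(w, cands)
        hits += max(range(len(p)), key=p.__getitem__) == winner
        # Multi-class Brier: squared error summed over all candidates in the case.
        brier += sum((pk - (k == winner)) ** 2 for k, pk in enumerate(p))
    return hits / len(cases), brier / len(cases)
```

The fitted `w` is directly readable: each entry is the weight of one attribute, which is how a ranking like "proximity ≫ competence" can be read off.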

② Does the engine layer help? (ablation)

Config                                 Accuracy   Brier
β=0, base utility only (network off)   81%        0.090
β=1.6, full network                    86%        0.051
+ momentum                             81%        0.053

The network layer cuts Brier from 0.090 to 0.051, a drop of more than 40%, so it helps. Momentum hurt, so it is switched off. Trust the data, not the talk.

Fatal honesty

High backtest accuracy carries leakage that can't be washed out, so it is a mechanism demo, not proof. The only trustworthy test is a prospective prediction: the result doesn't exist yet, so it can't be gamed.

Product analysis

Users, competitors, moat, risk

Target user

People who trade these markets and want "a second opinion with a causal reason, plus an edge vs. the line." It also serves analysts, media, and technical evaluators.

Competitors & differentiation

Competitor               Has                     Lacks → what we supply
Polymarket price         accuracy, liquidity     black box → a causal mechanism
Metaculus / poll models  strong on elections     no appointments → coverage of that blind spot
GPT "expert panel"       easy to make            no mechanism → emergent results + prospective falsification

Moat

  • Explainable causal narrative (rivals are black boxes or guesses)
  • A scientific stance: quantified leakage plus prospective falsification. When AI predictions are everywhere, verifiability is what earns trust
  • The overlooked appointments niche
  • Embeddable cards bring built-in distribution plus a traction beacon

Risk (exposed honestly)

  • Edge unproven: only settlements of the open markets will tell
  • Election-track over-confidence, currently being fixed with voter blocs
  • Reflexivity: once a strategy is understood, it gets priced away
  • Scaling needs a working LLM API key to run the author step

Where it stands

The engine runs and the product ships. What's missing isn't the tech; it's a user who would genuinely hurt without it. Next: put the card in front of traders and get the first real signal of demand.