Computational political science / v2

Predictions that emerge from a network of political actors.

Not poll aggregation, not "expert" bots. We wire politicians, factions and institutions into a signed network, let signal propagate, and read a probability out of the interactions. Then we honestly quantify our own data leakage.


markets · LLM multi-agent sims

300+

voter archetypes in the election track

63.2%

rule baseline, true out-of-sample (leave-one-out, LOO)

Actor network (interactive)

What it is

A computational political science experiment

Can a political outcome be computed from an actor network before it is announced?

The core idea

Each agent is one political figure. Signal propagates along the edges (allies converge, opponents push back), and the decider's perceived utility shifts round by round. Run it many times as a Monte-Carlo simulation and read out a distribution.

The prediction comes from nobody's judgment; it emerges from the interaction.
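As a rough illustration of the idea above, here is a minimal sketch of signed propagation plus Monte-Carlo. The actor names, edge weights, the `tanh` update rule and the noise level are all assumptions made for this example; they are not the project's engine or its data.

```python
import math
import random

# Hypothetical toy network: weight > 0 means ally, weight < 0 means opponent.
EDGES = {
    ("advisor", "decider"): 0.8,
    ("rival", "decider"): -0.5,
    ("advisor", "rival"): -0.3,
}

def propagate(stance, edges, beta=1.0, rounds=3):
    """Run a few rounds of signed influence: allies pull, opponents push away."""
    stance = dict(stance)
    for _ in range(rounds):
        incoming = {name: 0.0 for name in stance}
        for (src, dst), weight in edges.items():
            incoming[dst] += weight * stance[src]
        # tanh keeps every stance inside [-1, 1] after each update.
        stance = {n: math.tanh(stance[n] + beta * incoming[n]) for n in stance}
    return stance

def simulate(n_runs=2000, noise=0.5, seed=0):
    """Monte-Carlo: jitter the starting stances, count how often the decider ends positive."""
    rng = random.Random(seed)
    base = {"advisor": 0.6, "rival": -0.4, "decider": 0.0}  # assumed starting stances
    hits = 0
    for _ in range(n_runs):
        start = {n: v + rng.gauss(0.0, noise) for n, v in base.items()}
        if propagate(start, EDGES)["decider"] > 0:
            hits += 1
    return hits / n_runs

print(f"P(decider ends positive) ~ {simulate():.2f}")
```

The fraction returned by `simulate` plays the role of the read-out probability; in this sketch `beta` scales how strongly the network moves each actor.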

Why it's different

Most "AI prediction" projects report one flattering number. This one's spine is honesty: it quantifies its own leakage (accuracy falls from 89.5% to 63.2% out-of-sample) and stakes its credibility on prospective bets whose results don't exist yet, the only test you can't game.

Knowing what it doesn't know

Prediction pipeline

Classify → find the decider → only then split agents

One template does not fit every question. The key insight: not every question has a single decider.

① Classify        question → {type, single decider?, how it settles}
       │  appointment / election / multilateral-binary / departure-timing
       ▼
② Settlement structure   who / what decides?  ← the earliest, heaviest call
       ├ appointment → one decider agent (Trump makes the call)
       ├ election    → voter blocs → vote aggregate (there is no "one person")
       ├ multilateral→ neutral adjudicator reads the whole game
       └ timing      → each actor's departure risk → who triggers first
       ▼
③ Ground the actors  candidate set + factions/blocs + relations, with cited evidence
       ▼
④ Split & simulate   independent agents · multi-round · Monte-Carlo → distribution
       ▼
⑤ Reconcile        vs market → locate the disagreement → log a bet, or raise a flag
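Steps ① and ② of the diagram can be sketched as a lookup from question type to settlement structure. This is only an illustrative sketch: `Plan`, `plan_question` and the `SETTLEMENT` table are hypothetical names, and the question type is passed in by hand rather than produced by the engine's classifier.

```python
from dataclasses import dataclass

# Question type -> settlement structure, copied from step 2 of the diagram.
SETTLEMENT = {
    "appointment": "one decider agent",
    "election": "voter blocs -> vote aggregate",
    "multilateral-binary": "neutral adjudicator",
    "departure-timing": "per-actor departure risk",
}

@dataclass
class Plan:
    question: str
    qtype: str
    settlement: str
    single_decider: bool

def plan_question(question: str, qtype: str) -> Plan:
    """Pick the settlement structure before any agents are created."""
    if qtype not in SETTLEMENT:
        raise ValueError(f"unknown question type: {qtype!r}")
    # Only appointments are modeled with a single decider node.
    return Plan(question, qtype, SETTLEMENT[qtype], single_decider=(qtype == "appointment"))

print(plan_question("Who wins the primary?", "election"))
```

Failing loudly on an unknown type reflects the point that the settlement structure is decided first, since every later step depends on it.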

Why ② is the hinge

Network + signal flow is most faithful to appointments, where someone really decides. Elections need voter blocs, not a single node. The wrong settlement structure makes everything downstream wrong, so it is the first call.

The mistake we corrected

An early version assumed there is always a decider. For elections that is false: packing millions of voters into one node makes the model over-confident (Cox 98% vs market 63%). The fix: branch ② by type.

Multi-agent architecture · step ④

Each actor is an independent LLM playing a real person

The outcome emerges from their interaction (built per Generative Agents / BDI).

Orchestrator  spawn N agents · advance rounds · route messages · read result
      │
      ├─▶ Agent A  persona/BDI · private info · memory · limited perception
      ├─▶ Agent B  … separate context — can't see each other's reasoning
      └─▶ Agent C  …
              │  public broadcast / targeted DM
              ▼
      Environment  event stream: [{round, from, to, public, text}…]
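The isolation in the diagram can be sketched as a filter over the event stream. The event fields follow the `{round, from, to, public, text}` shape shown above; the function name `visible_to` and the sample messages are assumptions made for illustration.

```python
def visible_to(agent, events):
    """Return what one agent may perceive: public broadcasts plus DMs addressed to it."""
    return [e for e in events if e["public"] or e["to"] == agent]

# Hypothetical event stream in the shape used by the Environment.
events = [
    {"round": 1, "from": "A", "to": None, "public": True, "text": "I back the nominee."},
    {"round": 1, "from": "B", "to": "C", "public": False, "text": "Hold your vote."},
    {"round": 2, "from": "C", "to": "B", "public": False, "text": "Agreed."},
]

for name in ("A", "B", "C"):
    print(name, [e["text"] for e in visible_to(name, events)])
```

In this sketch agent A never sees the back-channel between B and C, which is the property that keeps any single agent from becoming omniscient.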

Inside one agent

persona / BDI: Belief · Desire · Intention, plus a strong, distinct persona.

perception: only public broadcasts + DMs addressed to itself. Isolated, not omniscient.

memory: a stream scored by recency × importance × relevance, retrieving the top-k.

reflection: periodically synthesizes memory into a higher-level belief.

action: a public statement OR a private DM (open vs back-channel).
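The memory rule (recency × importance × relevance, then top-k) can be sketched as follows. The exponential decay for recency and the keyword-overlap measure for relevance are stand-ins chosen for this example; the source does not say how each factor is computed.

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    created_round: int
    importance: float      # assumed scale: 0..1
    keywords: frozenset

def retrieve(memories, query_keywords, now, k=2, decay=0.5):
    """Score each memory by recency * importance * relevance and return the top k."""
    query = frozenset(query_keywords)

    def score(m):
        recency = math.exp(-decay * (now - m.created_round))  # newer memories score higher
        union = m.keywords | query
        relevance = len(m.keywords & query) / len(union) if union else 0.0  # Jaccard overlap
        return recency * m.importance * relevance

    return sorted(memories, key=score, reverse=True)[:k]

memories = [
    Memory("Old note on the Fed pick", 1, 0.9, frozenset({"fed"})),
    Memory("Fresh note on the Fed pick", 5, 0.9, frozenset({"fed"})),
    Memory("Fresh note on Iran", 5, 1.0, frozenset({"iran"})),
]
print([m.text for m in retrieve(memories, {"fed"}, now=5, k=1)])
```

Because the three factors are multiplied, a memory that scores zero on any one of them (for example, no keyword overlap) is never retrieved ahead of a relevant one.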

Designed against the known failure modes

Trap (arXiv:2507.19364)	Design
Info-asymmetry collapse	limited perception; decider isn't omniscient
Average-persona convergence	independent sub-agents + strong persona + high temperature
Confident nonsense	outcome emerges; prospective test adjudicates

Tested: the US–Iran question ran on 6 independent Claude sub-agents that genuinely diverged (private estimates 15–28%) instead of averaging out.

02 Predictions

One explainable card per market

Each card shows the model probability vs the market, the edge, a one-sentence "why," and a draggable actor network. Paste one line of iframe anywhere, and run your own question below.

One widget, four audiences

Trader

edge + a reason to trust it, in 3 seconds

Investor

an AI that explains itself + embeds anywhere

Growth

your link travels with every embed

Method

a calibration badge puts honesty on its face

Every view is counted

The card pings /api/beacon on every view. That count is the traction signal. It reads Polymarket data read-only; it never trades.

Custom event

Predict any political question

Type a question; the engine classifies it, finds the decider, grounds the actors and returns a Why-card. Connect an API key to run live.

Live cards · 15 markets

Every market, model vs the line

Appointments, primaries, generals and 2028: each card is one explainable prediction. Hit Details on any card for the full decision trace.

validated · settled, the model hit
2-way · aligned, small edge
contrarian · against market, falsifiable

Live experiment

48 leak-free bets, scored daily

Three-way comparison: raw engine / temperature-calibrated / market. A remote agent pulls Polymarket settlements every day and computes the Brier score for all three.
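For readers who want the scoring spelled out, here is a minimal sketch of a three-way comparison on binary markets. It assumes "temperature-calibrated" means standard temperature scaling of the log-odds, which the source does not confirm; the probabilities, outcomes and the value `T = 2.0` are made up for illustration.

```python
import math

def brier(prob, outcome):
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (prob - outcome) ** 2

def mean_brier(probs, outcomes):
    return sum(brier(p, o) for p, o in zip(probs, outcomes)) / len(probs)

def temperature_scale(prob, temperature):
    """Divide the log-odds by T; T > 1 pulls forecasts toward 0.5. Requires 0 < prob < 1."""
    logit = math.log(prob / (1.0 - prob))
    return 1.0 / (1.0 + math.exp(-logit / temperature))

# Hypothetical forecasts for three settled binary markets.
raw = [0.98, 0.30, 0.85]
market = [0.63, 0.40, 0.80]
outcomes = [0, 0, 1]
calibrated = [temperature_scale(p, 2.0) for p in raw]

for name, probs in (("raw", raw), ("calibrated", calibrated), ("market", market)):
    print(f"{name:<11} Brier = {mean_brier(probs, outcomes):.3f}")
```

Lower Brier is better. In this toy data the raw engine is punished most for the confident miss, which is the kind of over-confidence that calibration is meant to soften.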

Where the model bets against the market

Win these and there's a real edge; lose and it was naïve. Both are information.

Market	Model	Market price	Settles
UN Secretary-General	Grynspan 46%	29%	12/31
Israel PM	Bennett 34%	Netanyahu 33%	12/31
Wisconsin Dem primary	Barnes 42%	Hong 44%	8/11
Minnesota GOP primary	Qualls 61%	72%	8/11
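The edge on a card is simply model probability minus market price. The sketch below works it out for the two rows where both columns refer to the same candidate, assuming the unlabeled market figure is the price for the candidate named in the model column; `edge_points` is a hypothetical helper name.

```python
def edge_points(model, market):
    """Model probability minus market price, in percentage points."""
    return round((model - market) * 100)

# Rows from the table above where model and market price the same candidate (assumed).
rows = [
    ("UN Secretary-General", "Grynspan", 0.46, 0.29),
    ("Minnesota GOP primary", "Qualls", 0.61, 0.72),
]

for market_name, candidate, model, market in rows:
    print(f"{market_name}: {candidate} edge {edge_points(model, market):+d} pts")
```

A positive edge means the model is more bullish than the market on that candidate; a negative edge means it is more bearish.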

Settlement calendar

6/3   ▸ 11 Korean local elections   (first leak-free scores)
6/9   ▸ SC / Maine primaries  ×2
6/16  ▸ DC / Ohio primaries   ×2
6/23  ▸ Maryland / NY  ×2     6/30 ▸ Colorado ×2
7–8   ▸ 7 gubernatorial primaries
11/3  ▸ 11 statewide governor races   (blue-wave bets resolve)
12/31 ▸ UN SG / Trump Labor Sec

It scores itself

A remote agent pulls the latest settlements every day and outputs a three-way Brier report. The results come to you.

03 About · why, expectations, analysis

Markets have numbers. Experts have talk. We wanted mechanism, honestly stated.

Why we built it, what to expect, and how it competes, stated plainly.

The problem & the gap

A political market has a price, but no reason

You know "Warsh 60%". You don't know why, and you can't compute it before the announcement.

Method	Election	Appointment	Explainable	The gap
Polls / fundamentals	strong	none	medium	no "polls" for appointments
Expert-LLM "committee"	~	~	weak	no mechanism; confident nonsense
Market price itself	accurate	accurate	black box	no causality, no foresight

The insight: an outcome can be computed from actors interacting across a network. Elite appointments in particular (no polls, decided by power structure) are a slice nobody covers systematically.

What we build

Two tracks, split by who decides

Validated

Appointment track

One decider + influencers on a signed network + Monte-Carlo → a distribution. The model reproduced the real result for Fed Chair and other appointments.

In progress

Election track

The decider is millions of voters → modeled as voter blocs (size × turnout × lean) → vote aggregate. This corrects the single-decider over-confidence.
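The bloc aggregation (size × turnout × lean) can be sketched in a few lines. The bloc names and numbers are invented, and treating `lean` as the fraction of a bloc's voters who choose candidate A is an assumption about what the term means here.

```python
from dataclasses import dataclass

@dataclass
class Bloc:
    name: str
    size: float     # number of eligible voters (or share of the electorate)
    turnout: float  # fraction of the bloc that votes, 0..1
    lean: float     # assumed: fraction of the bloc's voters choosing candidate A, 0..1

def vote_share(blocs):
    """Candidate A's share of all votes cast, aggregated over blocs."""
    votes_cast = sum(b.size * b.turnout for b in blocs)
    votes_for_a = sum(b.size * b.turnout * b.lean for b in blocs)
    return votes_for_a / votes_cast

# Hypothetical two-bloc electorate.
blocs = [
    Bloc("urban", size=100, turnout=0.5, lean=0.8),
    Bloc("rural", size=100, turnout=0.5, lean=0.2),
]
print(f"candidate A share: {vote_share(blocs):.2f}")
```

Sampling `turnout` and `lean` per bloc inside a Monte-Carlo loop would turn this point estimate into a distribution, which is one way a bloc model can avoid single-node over-confidence.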

Expectations · honest, not hype

We don't promise to beat the market

We promise three things we can actually deliver.

01 Explainable: a causal "who moved whom" narrative, not a black-box number.
02 Honestly calibrated: leakage quantified by leave-one-out (89.5% → 63.2% true). Only as confident as it has earned.
03 Prospective self-proof: timestamp predictions on unsettled markets; after settlement, score model Brier vs market. It is the only test you can't game.

The honesty ledger

It knows exactly what it can and can't do

① Quantify the leakage (leave-one-out)

Authoring attributes while already knowing the result leaks the answer into the model. So we learn global weights and evaluate out-of-sample:

Method	Accuracy	Brier
Uniform guess	~22%	0.73
Learned weights (LOO)	63%	0.72
Hand-tuned (with answers)	89%	0.19

Once the answers are held out, 89% falls to 63%; the gap is leakage. The learned weights are legible too: proximity / loyalty / visibility ≫ competence.
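The leave-one-out loop itself is short. The sketch below is generic: `fit` and `predict` are placeholders for whatever learns and applies the global weights, and the toy majority-vote model and four cases exist only to make the example runnable.

```python
from collections import Counter

def leave_one_out(cases, fit, predict):
    """Fit on every case except one, score on the held-out case, repeat for all cases."""
    hits = 0
    for i, held_out in enumerate(cases):
        train = cases[:i] + cases[i + 1:]      # the held-out answer never reaches fit()
        model = fit(train)
        if predict(model, held_out["features"]) == held_out["winner"]:
            hits += 1
    return hits / len(cases)

# Toy stand-ins: the "model" is just the most common winner in the training set.
def fit_majority(train):
    return Counter(c["winner"] for c in train).most_common(1)[0][0]

def predict_majority(model, features):
    return model

cases = [
    {"features": {}, "winner": "a"},
    {"features": {}, "winner": "a"},
    {"features": {}, "winner": "a"},
    {"features": {}, "winner": "b"},
]
print(leave_one_out(cases, fit_majority, predict_majority))
```

The key property is that each case is scored by a model that never saw its answer, which is what separates the out-of-sample figure from the hand-tuned one.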

② Does the engine layer help? (ablation)

Config	Accuracy	Brier
β=0 base utility (network off)	81%	0.090
β=1.6 full network	86%	0.051
+ momentum	81%	0.053

The network layer cuts Brier by roughly 43% (0.090 → 0.051), so it helps. Momentum hurt, so it is switched off. Decisions follow the data, not assertion.
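For reference, the relative changes can be worked out from the rounded Brier values in the ablation table. Because the inputs are rounded to three decimals, the results may differ by about a point from figures computed on unrounded scores.

```latex
\[
\text{network on vs. off:}\quad
\frac{0.090 - 0.051}{0.090} \approx 0.43
\qquad
\text{adding momentum:}\quad
\frac{0.053 - 0.051}{0.051} \approx 0.04 \ \text{(worse)}
\]
```

Turning the network on lowers Brier by a bit over 40%, while momentum raises it by about 4% and costs five points of accuracy (86% → 81%).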

Fatal honesty

High backtest accuracy carries leakage that can't be washed out, so it is a mechanism demo, not proof. The only trustworthy test is a prospective prediction: the result doesn't exist yet, so it can't be gamed.

Product analysis

Users, competitors, moat, risk

Target user

People who trade these markets and want "a second opinion with a causal reason + an edge vs the line." It also serves analysts, media and technical evaluators.

Competitors & differentiation

Competitor	Has	Lacks → we supply
Polymarket price	accuracy, liquidity	black box → we give causal mechanism
Metaculus / poll models	strong on elections	no appointments → we cover the blind spot
GPT "expert panel"	easy to make	no mechanism → results emerge + prospective test

Moat

Explainable causal narrative (rivals are black-box or guessing)
A scientific stance: quantified leakage + prospective falsification. When AI predictions are everywhere, verifiable = trust
The overlooked appointments niche
Embeddable cards carry distribution + a traction beacon

Risk (exposed honestly)

Edge unproven: only open-market settlements will tell
Election track over-confidence, being fixed with voter blocs
Reflexivity: an understood strategy gets priced away
Scaling needs a working LLM key to run the author step

Where it stands

The engine runs, the product ships. What's missing isn't the tech; it's "the user who'll hurt over this." Next: put the card in front of traders and get the first real signal of demand.