TurnOne

Low-rank payoff structure in 155K competitive Pokémon battles.

Python PyTorch

The Cost of Convention in Low-Rank Games — Stanford CS234, Winter 2025–26.

Expert Pokémon VGC strategies look nothing like Nash equilibria (total variation distance 0.99), yet expert-vs-expert outcomes match Nash payoffs to within 0.02. Random strategies achieve the same result. The question is not whether experts are rational, but whether the game notices.
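As a minimal sketch of how two strategies can be maximally far apart in total variation and still earn identical payoffs, consider a toy zero-sum game (matching pennies with each action duplicated). The matrix, the strategies, and the helper names `total_variation` and `expected_payoff` are illustrative assumptions, not the project's data or code.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def expected_payoff(A, x, y):
    """Row player's expected payoff x^T A y in a matrix game."""
    return float(np.asarray(x) @ A @ np.asarray(y))

# Toy game (assumption): matching pennies where every action has an exact copy.
# Rows 0 and 1 are identical, as are rows 2 and 3.
base = np.array([[1.0, -1.0], [-1.0, 1.0]])
A = np.kron(base, np.ones((2, 2)))

p = np.array([0.5, 0.0, 0.5, 0.0])  # one "convention": always use the first copy
q = np.array([0.0, 0.5, 0.0, 0.5])  # another convention: always use the second copy
opponent = np.array([0.25, 0.25, 0.25, 0.25])

tv = total_variation(p, q)
gap = abs(expected_payoff(A, p, opponent) - expected_payoff(A, q, opponent))
print(f"TV distance: {tv:.2f}, payoff gap: {gap:.2f}")
```

The two strategies share no support, so their total variation distance is at its maximum, but the payoff matrix cannot tell them apart.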

SVD of the ~200-action payoff matrices reveals an effective rank of 3: 93% of strategic variation lives in the payoff null space. Convention is free because there aren’t enough payoff-relevant directions for experts to go wrong in different ways.
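One way to estimate effective rank from an SVD is sketched below. The page does not say which definition the project uses, so the spectral-energy threshold (`energy=0.99`) is an assumption, and the 200-action matrix is synthetic, built with known singular values for illustration only.

```python
import numpy as np

def effective_rank(A, energy=0.99):
    """Smallest k whose top-k singular values hold `energy` of the squared spectrum.

    Assumed definition; entropy-based or fixed-tolerance definitions are also common.
    """
    s = np.linalg.svd(A, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)

# Synthetic 200x200 payoff matrix with exactly three nonzero singular values.
rng = np.random.default_rng(0)
n = 200
U, _ = np.linalg.qr(rng.standard_normal((n, 3)))  # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n, 3)))
A = U @ np.diag([5.0, 3.0, 2.0]) @ V.T

rank = effective_rank(A)
print(f"effective rank: {rank} of {n} actions")
```

With a rank this low, almost every direction in strategy space is orthogonal to the directions the payoffs depend on.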

Conservative Q-Learning (CQL), an offline RL method, reshapes 60% of the strategy distribution relative to behavioral cloning, but 94% of that shift lands in the payoff null space. Exploitability drops only 17%. CQL finds a different convention, not a better strategy.
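The decomposition of a policy shift into payoff-relevant and null-space parts can be sketched as follows. This assumes a zero-sum matrix game with a known value, measures the shift by squared Euclidean norm, and uses hypothetical stand-in policies `bc` and `cql` on a toy matrix; the project's actual metrics may differ, and the toy does not reproduce the reported 94% or 17% figures.

```python
import numpy as np

def null_space_share(A, shift, rank):
    """Fraction of the squared norm of a row-strategy shift that the payoffs cannot see.

    The payoff-relevant part is the projection onto the top-`rank` left singular vectors.
    """
    U, _, _ = np.linalg.svd(A)
    relevant = U[:, :rank].T @ shift
    return 1.0 - float(relevant @ relevant) / float(shift @ shift)

def exploitability(A, x, game_value=0.0):
    """Payoff lost to a best-responding column player (assumes the game value is known)."""
    return game_value - float(np.min(x @ A))

# Toy zero-sum game (assumption): matching pennies with duplicated actions, rank 1, value 0.
A = np.kron(np.array([[1.0, -1.0], [-1.0, 1.0]]), np.ones((2, 2)))

bc = np.array([0.5, 0.0, 0.5, 0.0])    # stand-in for a behavioral-cloning policy
cql = np.array([0.0, 0.5, 0.0, 0.5])   # stand-in for a CQL policy: same play, other copies
pure = np.array([1.0, 0.0, 0.0, 0.0])  # a shift that does change payoffs

share_cql = null_space_share(A, cql - bc, rank=1)
share_pure = null_space_share(A, pure - bc, rank=1)
print(f"null-space share, bc -> cql:  {share_cql:.2f}")
print(f"null-space share, bc -> pure: {share_pure:.2f}")
print(f"exploitability: bc={exploitability(A, bc):.2f}, "
      f"cql={exploitability(A, cql):.2f}, pure={exploitability(A, pure):.2f}")
```

In this toy, the move from `bc` to `cql` is entirely invisible to the payoffs and leaves exploitability unchanged, while the move to `pure` has a payoff-relevant component and is exploitable.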

See also: TurnZero — same dataset, earlier decision point, same conclusion.