Paste a Zoom chat block of next-word guesses and see the room’s distribution beside the model’s top-k probabilities. The gap between human prediction and model probability is what to study: not the right answer, but why the two distributions diverge where they do.
Once the room’s next-word guesses are pasted in, their distribution shows how context, genre, and expectation shape human prediction. Setting the model’s probabilities beside it shows where the two diverge, and that is where the question of why begins.
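The paste step can be sketched as a small parser that turns chat lines into relative frequencies. This is a minimal sketch, assuming each chat line looks like `10:02:13 From Ana to Everyone: Bread`; that line shape is an assumption (Zoom’s copy and export formats vary by client version), and any line that does not match is treated as a bare guess. Only the first word of each message is kept. The names `parse_guesses` and `room_distribution` are hypothetical, not part of any existing tool.

```python
import re
from collections import Counter

# ASSUMED chat line shape (hypothetical; real Zoom formats vary by version):
#   "10:02:13 From Ana to Everyone: Bread"
# The timestamp is optional; lines that don't match are treated as bare guesses.
CHAT_LINE = re.compile(
    r"^\s*(?:\d{1,2}:\d{2}(?::\d{2})?\s+)?From\s+.+?\s+to\s+.+?:\s*(.*)$",
    re.IGNORECASE,
)


def parse_guesses(chat_block: str) -> list[str]:
    """Extract one lowercase, punctuation-free guess per non-empty chat line."""
    guesses = []
    for line in chat_block.splitlines():
        if not line.strip():
            continue
        match = CHAT_LINE.match(line)
        message = match.group(1) if match else line
        words = re.findall(r"[\w']+", message.lower())
        if words:
            guesses.append(words[0])  # keep only the first word of the message
    return guesses


def room_distribution(guesses: list[str]) -> dict[str, float]:
    """Turn raw guesses into relative frequencies, most common first."""
    counts = Counter(guesses)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.most_common()}


chat = """10:02:13 From Ana to Everyone: Bread
10:02:15 From Ben to Everyone: butter
10:02:19 From Chi to Everyone: bread!
toast"""
print(room_distribution(parse_guesses(chat)))
# {'bread': 0.5, 'butter': 0.25, 'toast': 0.25}
```

Lowercasing and stripping punctuation means "Bread" and "bread!" count as the same guess; whether that merging is right for a given stem is a judgment call worth making visible to the room.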
The same sentence stem in a news article, a recipe, or a text message produces different guesses. Register shapes what feels obvious, and the model’s training data is shaped by the same forces: text from each register teaches the model what usually follows in that register.
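Each next token is chosen from a ranked probability distribution over possible tokens; it is not fixed in advance as a single answer. The game makes that distribution visible and discussable before the model commits to one output: the top-ranked token under greedy decoding, or a probability-weighted draw under sampling.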
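The idea of a ranked distribution can be made concrete in a few lines. This is a minimal sketch, assuming a toy dictionary of raw scores (logits) for a single next-token position; the four words and their scores are hypothetical, whereas a real model scores its entire vocabulary. It shows softmax turning scores into probabilities, the top-k slice the game displays, and the two ways a single output is picked from it.

```python
import math
import random


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1 (max-shifted for numerical stability)."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_k(probs: dict[str, float], k: int) -> list[tuple[str, float]]:
    """The k most probable tokens, highest first: the slice shown beside the room's guesses."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


def greedy_pick(probs: dict[str, float]) -> str:
    """Greedy decoding: always emit the single most probable token."""
    return max(probs, key=probs.get)


def sampled_pick(probs: dict[str, float], rng: random.Random) -> str:
    """Sampling: draw a token with chance proportional to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical scores for one position; not taken from any real model.
logits = {"bread": 2.0, "butter": 1.0, "jam": 0.5, "toast": 0.1}
probs = softmax(logits)
print(top_k(probs, 3))
print(greedy_pick(probs))                      # bread
print(sampled_pick(probs, random.Random(0)))   # varies by seed; 'bread' about 57% of the time
```

Greedy decoding always lands on the top of the distribution, while sampling occasionally surfaces lower-ranked words, which is one reason a model’s single output can look more or less "surprising" than the room expected.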
Collect the room’s guesses before revealing the model. The comparison is the experiment, not the answer.
Write down your own next-word guess, then collect several more from the room via Zoom chat. Keep them hidden until you reveal the model’s probabilities.
Enter the guesses, then reveal the model’s top-k. Where does the room cluster? Where does the model diverge? Where do they agree?
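The three questions in this step (where the room clusters, where the model diverges, where they agree) can be turned into a small comparison report. This is a minimal sketch under stated assumptions: both inputs are word-to-probability dictionaries, the model’s numbers are a hypothetical top-3 list, and the truncated top-k list is renormalized to sum to 1, which ignores whatever probability the model puts outside its top k. The function names and report keys are hypothetical.

```python
def renormalize(dist: dict[str, float]) -> dict[str, float]:
    """Rescale a truncated top-k list to sum to 1 (ASSUMPTION: mass outside top-k is ignored)."""
    total = sum(dist.values())
    return {w: p / total for w, p in dist.items()}


def compare(room: dict[str, float], model_top_k: dict[str, float]) -> dict:
    """Summarize agreement and divergence between the room's guesses and the model's top-k."""
    model = renormalize(model_top_k)
    vocab = set(room) | set(model)
    # Positive gap: the model weights the word more than the room did.
    gaps = {w: model.get(w, 0.0) - room.get(w, 0.0) for w in vocab}
    return {
        "same_top1": max(room, key=room.get) == max(model, key=model.get),
        "shared": sorted(set(room) & set(model)),
        "room_only": sorted(set(room) - set(model)),
        # Total variation distance: 0 = identical, 1 = no overlap at all.
        "total_variation": 0.5 * sum(abs(g) for g in gaps.values()),
        "gaps": sorted(gaps.items(), key=lambda item: item[1], reverse=True),
    }


room = {"bread": 0.5, "butter": 0.25, "toast": 0.25}     # e.g. from the pasted chat
model_top_k = {"bread": 0.4, "jam": 0.3, "butter": 0.1}  # hypothetical top-3
report = compare(room, model_top_k)
print(report["same_top1"], report["shared"], report["room_only"])  # True ['bread', 'butter'] ['toast']
print(round(report["total_variation"], 3))                         # 0.375
print(report["gaps"][0][0])                                        # jam
```

The first entries of `gaps` point at what the model weighted that the room did not, and the last entries at what the room favored that the model did not. Instead of renormalizing, one could lump the leftover model probability into an "(other)" bucket; that keeps the model’s uncertainty visible at the cost of a less tidy comparison.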
Run the same stem with more or less preceding context. How does adding one sentence before the stem shift the distribution?
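How far one added sentence shifts the distribution can be put into a single number, plus a short list of the words that moved most. This is a minimal sketch, assuming two renormalized top-k lists (each summing to 1) for the bare stem and for the stem with one sentence added before it; both lists here are hypothetical. It uses Jensen-Shannon divergence, swapped in as one reasonable measure among several: unlike plain KL divergence it is symmetric and stays finite when a word appears in only one list, and in bits it ranges from 0 (identical) to 1 (no shared words).

```python
import math


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical distributions, 1 for disjoint ones."""
    vocab = set(p) | set(q)
    mid = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_mid(dist: dict[str, float]) -> float:
        # KL(dist || mid); mid[w] > 0 wherever dist[w] > 0, so no division by zero.
        return sum(x * math.log2(x / mid[w]) for w, x in dist.items() if x > 0)

    return 0.5 * kl_to_mid(p) + 0.5 * kl_to_mid(q)


def biggest_shifts(before: dict[str, float], after: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Words whose probability moved most when context was added, largest move first."""
    vocab = set(before) | set(after)
    deltas = {w: after.get(w, 0.0) - before.get(w, 0.0) for w in vocab}
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)[:n]


# Hypothetical renormalized top-k lists, not output from any real model.
stem_only = {"bread": 0.5, "jam": 0.375, "butter": 0.125}
with_context = {"flour": 0.6, "sugar": 0.3, "butter": 0.1}
print(round(js_divergence(stem_only, with_context), 2))          # 0.89
print([w for w, _ in biggest_shifts(stem_only, with_context)])   # ['flour', 'bread', 'jam']
```

Running the same two functions on the room’s guesses before and after the extra sentence gives a matching number for the humans, so the class can see whether the context moved the room more or less than it moved the model.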
Name what the model weighted that the room didn’t. Genre expectation? Register? A statistical pattern in a specific training domain?