Pick a context word, watch the bigram table build, then follow the arithmetic: count → divide → probability → predict. The tool makes the full frequency-division chain inspectable. In a bigram model, every token prediction follows this chain; very large models compute their probabilities differently, but they too end by choosing the next token from a probability distribution.
Before any prediction, there is counting: how many times did each word follow this context word in training data? The bigram table makes that count visible before the probability appears.
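The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the toy `corpus` string and the `build_bigram_table` helper are hypothetical names chosen for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy training text; the tool's actual corpus is not shown here.
corpus = "the cat sat on the mat and the cat ate the fish"

def build_bigram_table(text):
    """Map each context word to a Counter of the words that followed it."""
    tokens = text.split()
    table = defaultdict(Counter)
    # Pair every token with the token that comes right after it.
    for context, following in zip(tokens, tokens[1:]):
        table[context][following] += 1
    return table

table = build_bigram_table(corpus)
the_counts = table["the"]
print(dict(the_counts))  # {'cat': 2, 'mat': 1, 'fish': 1}
```

At this point nothing is a probability yet. The table only records how often each continuation was seen after "the".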
The model doesn’t “know” the next word. It divides: the count of this continuation after the context word, divided by the total count of all continuations after that same context word. Probability is frequency division, made explicit.
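The division itself is one line of arithmetic. The sketch below assumes a small set of made-up counts for a single context word; the `counts` values and the `to_probabilities` helper are illustrative, not taken from the tool.

```python
# Hypothetical continuation counts for one context word.
counts = {"cat": 2, "mat": 1, "fish": 1}

def to_probabilities(counts):
    """Divide each continuation count by the total count for this context."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("this context word was never followed by anything")
    return {word: count / total for word, count in counts.items()}

probs = to_probabilities(counts)
print(probs)  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

Here "cat" gets 2 / 4 = 0.5. The probabilities for one context word always sum to 1 because every count is divided by the same total.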
Count → divide → probability → sample. Count the Next Token exposes each step of this chain, so the mechanism is inspectable rather than just described.
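The last link in the chain can be read two ways: always take the most probable word, or draw a word at random in proportion to its probability. The sketch below shows both, assuming a made-up probability table. Which of the two the tool performs is not stated here, and the function names are hypothetical.

```python
import random

# Hypothetical probabilities for one context word (already counted and divided).
probs = {"cat": 0.5, "mat": 0.25, "fish": 0.25}

def predict_greedy(probs):
    """Return the single most probable continuation."""
    return max(probs, key=probs.get)

def sample_next(probs, rng):
    """Draw one continuation, weighted by its probability."""
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
draws = [sample_next(probs, rng) for _ in range(1000)]
print(predict_greedy(probs))  # cat
print({word: draws.count(word) for word in probs})
```

Greedy prediction returns the same word every time. Sampling lets the less frequent continuations appear too, at roughly the rate their counts imply.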
Follow the arithmetic. Don’t skip to the probability — trace the count that produced it.
Choose a common word and watch the bigram table build. How many continuations are possible? How evenly are they distributed?
Identify the most frequent continuation. Divide its count by the total count of all continuations. Does the probability match what you would predict?
Run the same setup with a more specific context word. What happens to the distribution — does it sharpen or spread?
Name the specific mechanism: does a rare word become likely because of a particular pattern in the training data? Name that pattern.
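To put a number on "sharpen or spread", one option is to compare how many continuations each context word has, how large the top probability is, and the entropy of the distribution. The sketch below is one possible way to do that, not a feature of the tool: the counts are made up, and `describe` is a hypothetical helper.

```python
import math

# Hypothetical continuation counts for two context words.
counts_by_context = {
    "the": {"cat": 2, "mat": 2, "fish": 1},  # common word, several continuations
    "sat": {"on": 2},                         # more specific word, one continuation
}

def describe(counts):
    """Summarize how concentrated one context word's distribution is."""
    total = sum(counts.values())
    probs = {word: count / total for word, count in counts.items()}
    top_word = max(probs, key=probs.get)
    # Shannon entropy in bits: 0 means one certain continuation, higher means more spread.
    entropy = sum(-p * math.log2(p) for p in probs.values())
    return {
        "continuations": len(probs),
        "top_word": top_word,
        "top_probability": probs[top_word],
        "entropy_bits": entropy,
    }

summary = {context: describe(counts) for context, counts in counts_by_context.items()}
for context, stats in summary.items():
    print(context, stats)
```

In this toy data, "sat" has a single continuation, so its distribution is as sharp as it can be (entropy 0). The five counts for "the" are spread across three words, so its entropy is higher.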