Launch ready · Session 2 · Images · Interactive

Dataset Balance Simulator

Bias as a consequence of composition, not malice. Adjust a simplified training mix and watch the model's default and its likelihoods move with the data.

tools/dataset-balance-simulator/


§ A · What it makes visible

three hidden mechanisms

Fig. 01

The training mix

Set how often each category appears in the data the model learns from — the inputs, made adjustable.

Fig. 02

The shifted default

As the mix changes, the most-likely output changes with it. The default follows the majority.

Fig. 03

Likelihoods, not rules

Nothing is hand-coded. The skew in the data simply becomes the skew in the behavior.
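
The three mechanisms above can be sketched in a few lines. This is a minimal illustration, not the simulator's actual code: it assumes the "model" is nothing more than the normalized frequencies of the training mix, and the names `likelihoods`, `default_output`, and the category labels are hypothetical.

```python
def likelihoods(mix):
    """Turn raw category weights (the training mix) into probabilities."""
    total = sum(mix.values())
    if total <= 0:
        raise ValueError("mix needs at least one positive weight")
    return {category: weight / total for category, weight in mix.items()}


def default_output(mix):
    """The default is simply the most likely category; no rule is written."""
    probs = likelihoods(mix)
    return max(probs, key=probs.get)


# Hypothetical mix: how often each category appears in the training data.
mix = {"category_a": 70, "category_b": 20, "category_c": 10}

# Ranked likelihoods, highest first.
ranked = sorted(likelihoods(mix).items(), key=lambda kv: kv[1], reverse=True)
for category, p in ranked:
    print(f"{category}: {p:.2f}")
print("default:", default_output(mix))
```

Nothing in the sketch names a preferred category. Change the numbers in `mix` and the default changes with them, which is the point of Fig. 03.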

§ B · How to investigate it

run it like an experiment, not a toy

01 · Predict

Before you slide

Predict which output becomes the default if one category dominates.

guess: the majority class wins

02 · Change one thing

Move a single slider

Push one category from balanced to dominant; hold the rest still.

50 / 50 → 90 / 10

03 · Compare evidence

Read the shift

How did the default and the ranked likelihoods move?

default flips to the majority

04 · Name it

Name the mechanism

Not 'it's biased' — say how: composition drove the default.

data skew → output skew
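
The four steps can also be rehearsed offline. The sketch below is an assumption-laden stand-in for the interactive tool: it treats likelihoods as normalized mix weights, and `normalize`, `top_categories`, `compare`, and the category labels are hypothetical names. It runs the 50 / 50 → 90 / 10 change and reports what moved.

```python
def normalize(mix):
    """Convert raw mix weights into likelihoods that sum to 1."""
    total = sum(mix.values())
    return {category: weight / total for category, weight in mix.items()}


def top_categories(mix):
    """Return every category tied for the highest likelihood."""
    probs = normalize(mix)
    best = max(probs.values())
    return sorted(c for c, p in probs.items() if p == best)


def compare(before, after):
    """Step 03: read the shift in the default and in each likelihood."""
    p_before, p_after = normalize(before), normalize(after)
    return {
        "default_before": top_categories(before),
        "default_after": top_categories(after),
        "shift": {c: round(p_after[c] - p_before[c], 3) for c in p_before},
    }


# Step 02: change one thing. Only the balance between the two categories moves.
before = {"category_a": 50, "category_b": 50}
after = {"category_a": 90, "category_b": 10}

report = compare(before, after)
print(report)
```

At 50 / 50 the two categories tie, so there is no single default. At 90 / 10 the majority category stands alone at the top. The only thing that changed between the two runs is the composition of the mix, which is the mechanism step 04 asks you to name.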

§ C · Debrief questions

after the investigation

Who chose what went into the training mix?
When does a 'default' tip over into a harm?
What would a balanced dataset even mean here?
What does the human decide once the skew is visible?
