Dataset Balance Simulator

Bias as a consequence of composition, not malice. Adjust a simplified training mix and watch the model's default and its likelihoods move with the data.

Launch the tool · Open the worksheet

tools/dataset-balance-simulator/


§ A · What it makes visible

three hidden mechanisms

Fig. 01

The training mix

Set how often each category appears in the data the model learns from — the inputs, made adjustable.

Fig. 02

The shifted default

As the mix changes, the most-likely output changes with it. The default follows the majority.

Fig. 03

Likelihoods, not rules

Nothing is hand-coded. The skew in the data simply becomes the skew in the behavior.
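The three mechanisms above can be sketched in a few lines. This is a minimal illustration, not the simulator's actual code: it assumes the "model" is nothing more than relative frequencies over categories, and the category names, counts, and function names are all hypothetical.

```python
def learn_likelihoods(training_mix):
    """Turn raw category counts into likelihoods (relative frequencies)."""
    total = sum(training_mix.values())
    return {category: count / total for category, count in training_mix.items()}


def default_output(likelihoods):
    """Return the most likely category. No rule about any category is hand-coded."""
    return max(likelihoods, key=likelihoods.get)


# Hypothetical training mix: names and counts are made up for illustration.
mix = {"A": 70, "B": 20, "C": 10}

likelihoods = learn_likelihoods(mix)
print(likelihoods)                  # each category's share of the mix
print(default_output(likelihoods))  # the category with the largest share
```

Changing only the counts in `mix` changes both the likelihoods and the default, which is the point of the figures: the behavior is read off the composition of the data.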

§ B · How to investigate it

run it like an experiment, not a toy

01 · Predict

Before you slide

Predict which output becomes the default if one category dominates.

guess: the majority class wins

02 · Change one thing

Move a single slider

Push one category from balanced to dominant; hold the rest still.

50 / 50 → 90 / 10

03 · Compare evidence

Read the shift

How did the default and the ranked likelihoods move?

default flips to the majority

04 · Name it

Name the mechanism

Not 'it's biased' — say how: composition drove the default.

data skew → output skew
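The four steps can also be run as a small before-and-after comparison. The sketch below assumes, purely for illustration, a two-category mix whose likelihoods are relative frequencies. The names `X` and `Y` and the helper functions are hypothetical and not part of the tool.

```python
def ranked_likelihoods(mix):
    """Rank categories by their share of the training mix, highest first."""
    total = sum(mix.values())
    shares = [(category, count / total) for category, count in mix.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


def default_of(mix):
    """Return the top-ranked category, or None when the top two are tied."""
    ranking = ranked_likelihoods(mix)
    if len(ranking) > 1 and ranking[0][1] == ranking[1][1]:
        return None  # balanced mix: no single default
    return ranking[0][0]


# Step 02: change one thing. Push the first category from 50 to 90.
before = {"X": 50, "Y": 50}
after = {"X": 90, "Y": 10}

# Step 03: compare evidence.
print(ranked_likelihoods(before), default_of(before))
print(ranked_likelihoods(after), default_of(after))
```

Under these assumptions the balanced mix has no single default, while the skewed mix ranks the dominant category first. That is the pattern step 04 asks you to name.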

§ C · Debrief questions

after the investigation

Who chose what went into the training mix?

When does a 'default' tip over into a harm?

What would a balanced dataset even mean here?

What does the human decide once the skew is visible?

§ D · Related

pairs well with · use in context

Pairs well with

Default Test Comparison Viewer
Defaults as evidence

Bridge · Default is a design decision
Concept explainer

Use it in context

Session 2 · Images
Where this tool sits in the arc

Session 2 Facilitation
Timing & facilitation moves

Image Default Test
Capture your evidence