Diffusion Step-Through Viewer — Learning Machines

How diffusion works

Generation is not drawing from imagination — it is iterative denoising.
The process begins with random noise and makes small adjustments at each step, guided by the prompt.
Color structure (low frequency) resolves before fine detail (high frequency).
The prompt does not specify everything — the model fills gaps with learned defaults from training data.
A vague prompt (like "a doctor") still produces a specific image. That specificity is a choice the system made.
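
The coarse-before-fine point above can be illustrated with a small toy. This is a sketch under stated assumptions, not how the viewer's model works internally: it assumes a linear noise schedule over 20 steps and two made-up component strengths (a strong "broad color" component and a weak "fine detail" component), and it treats a component as visible once it is stronger than the remaining noise. The names noise_level and first_visible_step are hypothetical.

```python
def noise_level(step, total_steps=20):
    """Toy linear schedule (an assumption): 1.0 at step 0, 0.0 at the last step."""
    return 1.0 - step / total_steps


def first_visible_step(amplitude, total_steps=20):
    """Return the first step at which a component is stronger than the remaining noise."""
    for step in range(total_steps + 1):
        if amplitude > noise_level(step, total_steps):
            return step
    return None  # the component never rises above the noise


# Made-up strengths: broad color carries much more signal than fine texture.
coarse_step = first_visible_step(0.8)
fine_step = first_visible_step(0.15)

print("broad color visible from step", coarse_step)
print("fine detail visible from step", fine_step)
```

In this toy the strong component clears the noise many steps before the weak one, which is one common explanation for why broad color and composition tend to settle early while texture and edges arrive late.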

If you did Color by Prompt — this is what was happening inside the model.

In Color by Prompt, you revealed one word at a time and revised your drawing as more of the prompt became visible. Each new word could shift the direction, but you weren't replacing your previous drawing; you were refining it toward what the prompt described.

Diffusion works the same way. The model doesn't draw from scratch. It starts with random noise and refines it over 20 steps in this viewer, guided by the prompt. At each step, it asks: "Given this prompt and what the image looks like now, what noise should I remove?" Color structure appears first because low-frequency information (broad color, rough composition) stands out from the noise earlier than fine detail (texture, edges, faces), which stays buried until the noise level is low.
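
A minimal sketch of that refinement loop, for readers who want to see the shape of it in code. Everything here is a stand-in: the "image" is a short list of made-up numbers, and the noise estimate is computed from a known target rather than predicted by a trained model from the prompt, which is the part a real system has to learn. The names toy_denoise and mean_abs_error are hypothetical.

```python
import random


def toy_denoise(target, steps=20, seed=0):
    """Start from random noise and remove a share of the estimated noise at each step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # step 0: pure noise
    history = [list(x)]
    for step in range(1, steps + 1):
        # Stand-in for the model. A real model predicts this noise from the
        # prompt and the current image; here we cheat and use the known target.
        estimated_noise = [xi - ti for xi, ti in zip(x, target)]
        # Remove an equal share of what remains, so the last step removes the rest.
        fraction = 1.0 / (steps - step + 1)
        x = [xi - fraction * ni for xi, ni in zip(x, estimated_noise)]
        history.append(list(x))
    return history


def mean_abs_error(image, target):
    """Average distance between the current image and the target."""
    return sum(abs(a - b) for a, b in zip(image, target)) / len(target)


target = [0.2, 0.4, 0.9, 0.4, 0.2]  # made-up "clean image"
history = toy_denoise(target)
for step in (0, 6, 20):
    print("step", step, "error", round(mean_abs_error(history[step], target), 3))
```

Changing the seed changes the starting noise, so the early steps look different from run to run even though the loop itself is identical.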

The gap between what you specified and what appeared: In Color by Prompt, when the prompt didn't say what color to use, you made a choice. The model does the same — it fills every unspecified gap with defaults from training data. That's how "a doctor" becomes a specific age, gender, skin tone, and setting without anyone asking for those things.
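
The gap-filling idea can be sketched as a toy as well. The counts below are invented for illustration and are not measurements of any real dataset, and a real image model does not store an explicit table like this; the sketch only shows the pattern that whatever the prompt leaves unspecified gets filled in proportion to what was common in training. The names TOY_TRAINING_COUNTS and fill_defaults are hypothetical.

```python
import random

# Invented counts for illustration only; not taken from any real training set.
TOY_TRAINING_COUNTS = {
    "setting": {"hospital": 80, "office": 15, "outdoors": 5},
    "accessory": {"stethoscope": 70, "clipboard": 20, "none": 10},
}


def fill_defaults(specified, counts=TOY_TRAINING_COUNTS, seed=0):
    """Keep what the prompt specified; sample everything else by training frequency."""
    rng = random.Random(seed)
    result = dict(specified)
    for attribute, options in counts.items():
        if attribute not in result:
            names = list(options)
            weights = [options[name] for name in names]
            result[attribute] = rng.choices(names, weights=weights, k=1)[0]
    return result


# A vague prompt leaves every attribute to the defaults.
print(fill_defaults({}))
# A more specific prompt overrides one default; the rest are still filled in.
print(fill_defaults({"setting": "outdoors"}))
```

Running the vague prompt with many different seeds gives varied results, but the most common training value dominates, which is the sense in which a default is a choice nobody explicitly made.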

Open the full Default Is a Design Decision bridge.

Discussion — Human · Machine · System · Ethics

Human

What did you choose? (Prompt, sequence, step to stop on.) What would you change? What surprised you about when recognition kicked in?

Machine

At which step did the subject first become recognizable? What appeared first — color, shape, or detail? What did the model add that the prompt never specified?

System

For the Default Test prompt ("a doctor") — what skin tone, gender, age, setting, and pose appeared by default? What training data shapes those assumptions?

Ethics

Who might be stereotyped, erased, copied, or misrepresented by the defaults that appear before anyone explicitly asks for them?

Generation is described as iterative denoising, not drawing. What does that mean for how the model "thinks about" the subject?
The prompt doesn't specify every detail — the model fills gaps. Where do those gap-fillers come from?
Bridge back to Session 1: in text generation, the model predicted the next token. What is the analogous prediction step here?

Facilitator note — Workshop sequence (10–12 min)

Step 1 (2 min) — Color first. Open Golden City at step 0. Ask: what do you see? Step to 6 together. Ask: what resolved? Establish that low-frequency color information emerges before fine detail.
Step 2 (3 min) — Mechanism. Walk through the teaching card. Generation is denoising, not drawing. The prompt does not specify everything — defaults are baked in at each step.
Step 3 (3 min) — Default Test. Switch to "A doctor." Start at step 0, step to 20 together. Ask: what did the model decide without being asked? Skin tone, gender, pose, setting, stethoscope. Where do those defaults come from?
Step 4 (4 min) — Bridge. Ask: "In Session 1, the model predicted the next token. What is it predicting here?" (Each denoising step is a prediction about which pixels are most likely given the prompt and current state.) Both are prediction. Both reflect training data. Both have defaults.

Debrief questions:

If you ran this 100 times with the prompt "a doctor," would you always get the same image? What would vary?
At what step would the image be most useful to show to a student who doesn't understand diffusion?
The image looks finished at step 20 — but every assumption is "baked in" at that point. What would it take to change one of those assumptions?
A more specific prompt changes the defaults. Is specificity in a prompt neutral — or is it itself a kind of design choice?
