Tool 03 · Session 2 · Image

Diffusion Step-Through Viewer

Generative image models — Session 2

How diffusion works
If you did Color by Prompt — this is what was happening inside the model.

In Color by Prompt, you revealed one word at a time and revised your drawing as more of the prompt became visible. Each new word changed the whole direction — you weren't replacing your previous drawing, you were refining it toward what the prompt described.

Diffusion works the same way. The model doesn't draw on a blank canvas: it starts with random noise and refines it over 20 steps, guided by the prompt. At each step, it asks: "Given this prompt and what the image looks like now, how should each pixel change?" Broad color and rough composition appear first because this low-frequency information stands out from heavy noise sooner than fine detail (texture, edges, faces), which only becomes resolvable in the later steps.
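To make the step-by-step refinement concrete, here is a minimal toy sketch in Python. It is not how the viewer or any real diffusion model is implemented: a real model uses a trained neural network to estimate what noise to remove, while this sketch assumes a hard-coded `target` list standing in for "what the prompt describes". The names `toy_denoise`, `distance`, and `target` are made up for illustration.

```python
import random

STEPS = 20  # same number of steps as the viewer


def toy_denoise(target, steps=STEPS, seed=0):
    """Return every frame from pure noise (frame 0) to the final image.

    `target` is a hypothetical stand-in for the image the prompt points to.
    A real model never sees such a target; it predicts noise with a network.
    """
    rng = random.Random(seed)
    image = [rng.random() for _ in target]  # step 0: pure noise
    frames = [list(image)]
    for step in range(1, steps + 1):
        remaining = steps - step + 1
        # Refine the current image instead of replacing it: close an equal
        # share of the remaining gap, so the last step lands on the target.
        image = [px + (t - px) / remaining for px, t in zip(image, target)]
        frames.append(list(image))
    return frames


def distance(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# An 8-pixel grayscale "image" (0.0 = black, 1.0 = white), invented here.
target = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
frames = toy_denoise(target)

for step in (0, 5, 10, 20):
    print(f"Step {step} / {STEPS}: distance to target = "
          f"{distance(frames[step], target):.3f}")
```

Each frame is built from the one before it, which is the point of the analogy: the image at step 10 is a refined version of the image at step 9, not a fresh drawing.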

The gap between what you specified and what appeared: In Color by Prompt, when the prompt didn't say what color to use, you made a choice. The model does the same — it fills every unspecified gap with defaults from training data. That's how "a doctor" becomes a specific age, gender, skin tone, and setting without anyone asking for those things.
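The gap-filling idea can be sketched as a toy in Python. Everything here is an assumption made for illustration: the four "training examples" are invented, the attribute names are hypothetical, and a real model does not look up a table or always pick the single most common value. It samples from patterns learned across a very large dataset. The sketch only shows the principle that whatever the prompt leaves out gets filled from what was most common in the data.

```python
from collections import Counter

# Invented toy "training data" for images captioned "a doctor".
# These four rows are made up; they are not real statistics.
TRAINING_EXAMPLES = [
    {"setting": "hospital", "clothing": "white coat", "pose": "standing"},
    {"setting": "hospital", "clothing": "white coat", "pose": "standing"},
    {"setting": "clinic", "clothing": "scrubs", "pose": "standing"},
    {"setting": "hospital", "clothing": "white coat", "pose": "seated"},
]


def fill_defaults(specified, examples=TRAINING_EXAMPLES):
    """Keep what the prompt specified; fill every gap with the most
    common value seen in the (toy) training examples."""
    result = dict(specified)
    for attribute in examples[0]:
        if attribute not in result:
            counts = Counter(example[attribute] for example in examples)
            result[attribute] = counts.most_common(1)[0][0]
    return result


# Prompt says only "a doctor": every attribute is a default.
unspecified = fill_defaults({})

# Prompt says "a doctor in a clinic": only the remaining gaps are defaults.
partly_specified = fill_defaults({"setting": "clinic"})

print(unspecified)
print(partly_specified)
```

Change the rows in `TRAINING_EXAMPLES` and the defaults change with them, which is why the question of what went into the training data matters.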

Open the full Default Is a Design Decision bridge.

Discussion — Human · Machine · System · Ethics
Human
What did you choose? (Prompt, sequence, step to stop on.) What would you change? What surprised you about when recognition kicked in?
Machine
At which step did the subject first become recognizable? What appeared first — color, shape, or detail? What did the model add that the prompt never specified?
System
For the Default Test prompt ("a doctor") — what skin tone, gender, age, setting, and pose appeared by default? What training data shapes those assumptions?
Ethics
Who might be stereotyped, erased, copied, or misrepresented by the defaults that appear before anyone explicitly asks for them?
Facilitator note: Workshop sequence (10–12 min)

Debrief questions:
  1. Generation is described as iterative denoising, not drawing. What does that mean for how the model "thinks about" the subject?
  2. The prompt doesn't specify every detail, so the model fills gaps. Where do those gap-fillers come from?
  3. Bridge back to Session 1: in text generation, the model predicted the next token. What is the analogous prediction step here?