Bridge 6 Before Session 3: Video

Time Makes Failure Visible

A video model generates frames in sequence — but it has no continuous representation of who or what is in the scene. Each frame is a new prediction, conditioned on the prompt and previous frames, but without a persistent "identity" for any subject. Over a few seconds, small inconsistencies accumulate. A face drifts. A detail appears that wasn't specified. Physics breaks. What looks stable in a single frame becomes obviously wrong in motion.

Step through 5 frames — watch identity drift over time

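Cumulative drift: all 5 frames at once

Frames: F0 (prompt), F1 (+2s), F2 (+4s), F3 (+6s), F4 (+8s)

Attribute: states in order of appearance from F0 to F4
Face: ✓, then ⚠ slight, then ✗ different
Hair: ✓, then ⚠ longer, then ✗ wavy
Jacket: ✓ red, then ⚠ brighter, then ⚠ red-orange, then ✗ orange
Earring: not present, then + added, then persists
Setting: ✓, then ⚠ light, then ✗ path changed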

Key line: "Each frame is a new prediction. The model has no memory of what the face looked like two seconds ago, only a statistical pull toward what usually comes next."

This is not a bug that will be fixed. It is the mechanism. Video generation is frame prediction under temporal conditioning, not a continuous simulation of a stable world, so drift accumulates as duration increases. Conditioning tools such as ControlNet (structural guidance) and IP-Adapter (reference-image guidance) reduce drift but do not eliminate it: they add constraints, not understanding. Seeing the failure makes the mechanism visible.
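The accumulation described above can be illustrated with a toy random walk. This is a minimal sketch under loud assumptions, not a description of how any real video model works internally: the appearance "feature vector", the `noise` and `pull` parameters, and the function names `simulate_drift` and `mean_final_drift` are all hypothetical and chosen only for illustration. Each simulated frame copies the previous one, adds a small prediction error, and is optionally pulled back toward the frame-0 reference, which stands in for an added constraint.

```python
import math
import random


def simulate_drift(num_frames, noise=0.05, pull=0.0, dims=8, seed=0):
    """Return the distance from the reference appearance at each frame.

    Toy model (an assumption, not a real generator): every frame starts
    from the previous frame's features, adds a small random prediction
    error, and is pulled toward the reference by the factor `pull`.
    """
    rng = random.Random(seed)
    reference = [0.0] * dims      # appearance specified at frame 0
    current = list(reference)
    distances = []
    for _ in range(num_frames):
        current = [
            x + pull * (ref - x) + rng.gauss(0.0, noise)
            for x, ref in zip(current, reference)
        ]
        distances.append(math.dist(current, reference))
    return distances


def mean_final_drift(pull, trials=200, num_frames=48):
    """Average last-frame distance from the reference over many seeds."""
    total = 0.0
    for seed in range(trials):
        total += simulate_drift(num_frames, pull=pull, seed=seed)[-1]
    return total / trials


if __name__ == "__main__":
    free = mean_final_drift(pull=0.0)       # no constraint
    anchored = mean_final_drift(pull=0.3)   # hypothetical constraint strength
    print(f"free-running drift: {free:.3f}")
    print(f"anchored drift:     {anchored:.3f}")
```

In this sketch, the free-running case keeps moving away from the reference as frames are added, while the anchored case levels off at a smaller distance that still stays above zero. That mirrors the claim in the text: a constraint reduces drift without eliminating it.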

Now open the tool

In the Temporal Telephone, a visual scene passes through multiple AI transformation stages. Watch what the model preserves, what it loses, and what it invents. The drift you just stepped through (stable at F0, a different person by F4) is exactly what the tool lets you observe and document.
