Bridge 6 Before Session 3: Video

Time Makes Failure Visible

A video model generates frames in sequence, but it keeps no persistent representation of who or what is in the scene. Each frame is a new prediction, conditioned on the prompt and on previous frames, with no stored "identity" for any subject. Over a few seconds, small inconsistencies accumulate. A face drifts. A detail appears that wasn't specified. Physics breaks. What looks stable in a single frame becomes obviously wrong in motion.

Step through 5 frames — watch identity drift over time
Cumulative drift — all 5 frames at once
Columns: Attribute, F0 (prompt), F1 (+2s), F2 (+4s), F3 (+6s), F4 (+8s)
Face: ⚠ slight → ✗ different → ✗ different
Hair: ⚠ longer → ✗ wavy → ✗ wavy → ✗ wavy
Jacket: ✓ red → ⚠ brighter → ⚠ red-orange → ✗ orange → ✗ orange
Earring: + added → persists → persists
Setting: ⚠ light → ✗ path changed
Key line: "Each frame is a new prediction. The model has no memory of what the face looked like two seconds ago, only a statistical pull toward what usually comes next."
This is not a bug that will be fixed. It is the mechanism. Video generation is frame prediction under temporal conditioning — not a continuous simulation of a stable world. As duration increases, drift accumulates. Identity consistency tools (ControlNet, IP-Adapter, and similar) reduce drift but do not eliminate it — they add constraints, not understanding. Seeing the failure makes the mechanism visible.
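The accumulation described above can be sketched with a toy simulation. This is an illustrative assumption, not how any real video model, ControlNet, or IP-Adapter works: a subject's identity is reduced to a single number, each frame copies the previous value plus a small random error, and a hypothetical `pull` parameter stands in for a consistency constraint that nudges the value back toward the reference. All function and parameter names here are invented for the sketch.

```python
import random
from statistics import mean


def final_drift(num_frames, step_error, pull, rng):
    """Return how far a 1-D 'identity' value ends up from its reference (0.0).

    Toy assumption: each frame copies the previous frame's identity, adds a
    small random error, and is pulled back toward the reference by `pull`.
    """
    identity = 0.0  # frame 0 matches the reference exactly
    for _ in range(num_frames - 1):
        identity = (1.0 - pull) * identity + rng.gauss(0.0, step_error)
    return abs(identity)


def mean_final_drift(num_frames, pull, runs=2000, step_error=0.1, seed=0):
    """Average the final-frame drift over many simulated clips."""
    rng = random.Random(seed)
    return mean(final_drift(num_frames, step_error, pull, rng) for _ in range(runs))


if __name__ == "__main__":
    # Compare clips of increasing length, with and without the toy constraint.
    for frames in (5, 20, 80):
        free = mean_final_drift(frames, pull=0.0)
        constrained = mean_final_drift(frames, pull=0.5)
        print(f"{frames:>3} frames  unconstrained={free:.3f}  constrained={constrained:.3f}")
```

In this sketch, average drift without the constraint keeps growing as the clip gets longer, while the constrained version levels off at a smaller value that is still above zero. That mirrors the claim in the text: a constraint limits drift, but each frame still adds its own error.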

Now open the tool

In the Temporal Telephone, a visual scene passes through multiple AI transformation stages. Watch what the model preserves, what it loses, and what it invents. The drift you just stepped through — stable in frame one, different person by frame four — is exactly what the tool lets you observe and document.