Run A/B/C video prompt tests, document failure modes, and make an evidence-based claim about what the model is modeling.
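One way to keep the A/B/C runs comparable is to log every generated clip as a record and tally failure modes per variant before making any claim. The sketch below is only an illustration under stated assumptions: the `Trial` record, the `tally_failures` and `failure_rate` helpers, the prompts, and the failure-mode labels are all hypothetical placeholders, not results and not part of any particular video model's API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One generated clip, labeled by hand after viewing (hypothetical schema)."""
    variant: str          # "A", "B", or "C"
    prompt: str           # prompt text used for this variant
    failure_modes: tuple  # hand-assigned labels; empty tuple means no failure seen


def tally_failures(trials):
    """Count how often each failure-mode label appears, per variant."""
    counts = defaultdict(Counter)
    for t in trials:
        counts[t.variant].update(t.failure_modes)
    return {variant: dict(c) for variant, c in counts.items()}


def failure_rate(trials):
    """Fraction of trials per variant that show at least one failure mode."""
    totals, failed = Counter(), Counter()
    for t in trials:
        totals[t.variant] += 1
        if t.failure_modes:
            failed[t.variant] += 1
    return {variant: failed[variant] / totals[variant] for variant in totals}


# Placeholder entries to show the shape of the log; these are NOT real results.
trials = [
    Trial("A", "a glass falls off a table", ("object_permanence",)),
    Trial("A", "a glass falls off a table", ()),
    Trial("B", "a glass falls off a table, slow motion",
          ("object_permanence", "morphing")),
    Trial("C", "a glass falls off a table, seen from above", ()),
]

print(tally_failures(trials))
print(failure_rate(trials))
```

Keeping the labels as free-form strings makes it easy to add new failure modes as they are observed; the per-variant counts and rates then serve as the evidence behind whatever claim is made about what the model is modeling.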