The video is part of the supplemental material for the ViewDelta project hosted on arXiv. The research focuses on "Change Detection," which is the task of identifying what has been modified, added, or removed between two photos of the same scene, even if the camera angle has shifted.

What the video likely shows

This file (21206mp4) is used to demonstrate that the architecture can:

Use text tokens to focus only on specific changes rather than every pixel difference (like shadows or lighting).

Correct for different camera viewpoints without needing manual calibration.
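To make the contrast with "every pixel difference" concrete, here is a minimal sketch of naive per-pixel differencing, the kind of baseline that reacts to lighting as strongly as to real changes. It is not the ViewDelta architecture; the function names, the tiny 4x4 grayscale scene, and the threshold are all hypothetical values chosen for illustration.

```python
# Illustrative baseline only: naive per-pixel differencing, NOT the ViewDelta method.
# All names, image values, and the threshold are hypothetical.

def naive_change_mask(before, after, threshold=20):
    """Flag every pixel whose grayscale intensity differs by more than `threshold`."""
    return [
        [abs(a - b) > threshold for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]

def changed_fraction(mask):
    """Return the fraction of pixels flagged as changed."""
    flat = [cell for row in mask for cell in row]
    return sum(flat) / len(flat)

# A flat 4x4 grayscale scene (intensities in 0-255).
before = [[100] * 4 for _ in range(4)]

# Same scene under brighter lighting: every pixel shifts by +40, nothing was added.
relit = [[p + 40 for p in row] for row in before]

# Same lighting, but one new object occupies a single pixel.
with_object = [row[:] for row in before]
with_object[1][2] = 220

print(changed_fraction(naive_change_mask(before, relit)))        # 1.0
print(changed_fraction(naive_change_mask(before, with_object)))  # 0.0625
```

The lighting shift flags the whole frame while the genuinely new object flags one pixel, which is the failure mode that conditioning on a text description of the relevant change is meant to avoid.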