The video is part of the supplemental material for the ViewDelta project hosted on arXiv. The research focuses on "Change Detection," which is the task of identifying what has been modified, added, or removed between two photos of the same scene, even if the camera angle has shifted.

What the video likely shows
This file is used to demonstrate that the architecture can:
Use text tokens to focus only on specific changes rather than every pixel difference (like shadows or lighting).
Correct for different camera viewpoints without needing manual calibration.
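
To make the contrast with "every pixel difference" concrete, here is a minimal sketch of a naive per-pixel differencing baseline. It is not the ViewDelta architecture; the function names, the threshold, and the tiny 4x4 grayscale scene are all hypothetical and chosen only for illustration. It shows how a uniform lighting shift gets flagged as a change everywhere, while a genuinely added object accounts for only a small fraction of pixels.

```python
# Illustrative baseline only; this is NOT the ViewDelta architecture.
# All names, the threshold, and the toy images are assumptions for this sketch.

def pixel_diff_mask(before, after, threshold=10):
    """Mark a pixel as changed when its grayscale values differ by more than threshold."""
    return [
        [abs(a - b) > threshold for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


def changed_fraction(mask):
    """Return the fraction of pixels marked as changed."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Hypothetical 4x4 grayscale scene with a flat background.
before = [[100] * 4 for _ in range(4)]

# Same scene under brighter lighting (+20 everywhere); nothing was added or removed.
brighter = [[value + 20 for value in row] for row in before]

# Same lighting, but a single "object" pixel was added.
object_added = [row[:] for row in before]
object_added[1][2] = 200

lighting_fraction = changed_fraction(pixel_diff_mask(before, brighter))
object_fraction = changed_fraction(pixel_diff_mask(before, object_added))

print(lighting_fraction)  # 1.0: the lighting shift alone flags every pixel
print(object_fraction)    # 0.0625: the real change covers 1 of 16 pixels
```

A baseline like this cannot tell a relevant change from a lighting change, which is the gap that text-guided change detection, as described above, is meant to address.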