: Define the need for better AI evaluation in video processing.

: Describe the "counterfactuals" and proficiency tests used in the benchmark.

: The show intentionally deconstructs the "meddling kids" archetype, making the characters more flawed and cynical.

: Research using ViLMA has shown that current video-language models often perform no better at temporal reasoning than models that only see static images.

Paper Structure: