: The researchers captured synchronized data using multiple sensors, including RGB-D cameras for visual and depth information.
: Labels for various activities such as "playing with blocks," "hand washing," "eating," and "sleeping."
: Extracted 2D or 3D pose estimates for the children and teachers.
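
One way to picture how these components could fit together is a per-frame record that ties a synchronized RGB-D capture to an activity label and the extracted poses. The sketch below is illustrative only: the source does not describe a schema, so every class, field, and file path here (`FrameRecord`, `PersonPose`, `timestamp_s`, `rgb/000000.png`, and so on) is a hypothetical name chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical schema: none of these names come from the dataset itself.
Keypoint = Tuple[float, ...]  # (x, y) for a 2D pose, (x, y, z) for a 3D pose


@dataclass
class PersonPose:
    role: str                  # assumed roles: "child" or "teacher"
    keypoints: List[Keypoint]  # one tuple per body joint

    @property
    def dims(self) -> int:
        """Return 2 or 3 depending on the keypoint format (0 if empty)."""
        return len(self.keypoints[0]) if self.keypoints else 0


@dataclass
class FrameRecord:
    timestamp_s: float               # shared clock used to synchronize sensors
    rgb_path: str                    # color image from the RGB-D camera
    depth_path: str                  # depth map from the same camera
    activity: Optional[str] = None   # e.g. "hand washing", "eating"
    poses: List[PersonPose] = field(default_factory=list)


def frames_with_activity(frames: List[FrameRecord], activity: str) -> List[FrameRecord]:
    """Select the frames annotated with the given activity label."""
    return [f for f in frames if f.activity == activity]


# Example records with made-up paths and coordinates.
frames = [
    FrameRecord(0.0, "rgb/000000.png", "depth/000000.png", "hand washing",
                [PersonPose("child", [(0.40, 0.50), (0.42, 0.60)])]),
    FrameRecord(0.033, "rgb/000001.png", "depth/000001.png", "eating"),
]
```

Keeping the timestamp on each record means that additional sensor streams could be attached later by matching on the same clock, which is the property that synchronized capture provides.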