V-JEPA


"They also applied the model to robotics: They showed how to further fine-tune a new predictor network using only about 60 hours of robot data (including videos of the robot and information about its actions), then used the fine-tuned model to plan the robot’s next action. 'Such a model can be used to solve simple robotic manipulation tasks and paves the way to future work in this direction,' Garrido said.

"To push V-JEPA 2, the team designed a more difficult benchmark for intuitive physics understanding, called IntPhys 2. V-JEPA 2 and other models did only slightly better than chance on these tougher tests. 

"One reason, Garrido said, is that V-JEPA 2 can handle only about a few seconds of video as input and predict a few seconds into the future. Anything longer is forgotten. 

"You could make the comparison again to infants, but Garrido had a different creature in mind. 'In a sense, the model’s memory is reminiscent of a goldfish,' he said."


