V-JEPA

October 03, 2025

"In June, the V-JEPA team at Meta released their next-generation 1.2-billion-parameter model, V-JEPA 2, which was pretrained on 22 million videos.

"They also applied the model to robotics: They showed how to further fine-tune a new predictor network using only about 60 hours of robot data (including videos of the robot and information about its actions), then used the fine-tuned model to plan the robot’s next action. 'Such a model can be used to solve simple robotic manipulation tasks and paves the way to future work in this direction,' Garrido said.

"To push V-JEPA 2, the team designed a more difficult benchmark for intuitive physics understanding, called IntPhys 2. V-JEPA 2 and other models did only slightly better than chance on these tougher tests.

"One reason, Garrido said, is that V-JEPA 2 can handle only about a few seconds of video as input and predict a few seconds into the future. Anything longer is forgotten.

"You could make the comparison again to infants, but Garrido had a different creature in mind. 'In a sense, the model’s memory is reminiscent of a goldfish,' he said."

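The planning step described in the quote (use a predictor to choose the robot's next action) can be illustrated with a toy sketch. This is not V-JEPA 2's code or its actual planner: `predict`, `plan_next_action`, `run_episode`, the 2-D grid world, and the five-move action set are all hypothetical stand-ins, and brute-force search over short action sequences is a simple technique swapped in for illustration. The only idea shown is the loop: imagine each candidate action sequence with the predictor, score the imagined outcomes against a goal, execute the first action of the best sequence, then replan.

```python
from itertools import product

# Hypothetical action set for a toy 2-D grid "robot": (dx, dy) moves.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def predict(state, action):
    """Stand-in for a learned predictor: returns the imagined next state.

    Assumption: a real system would predict in a learned representation
    space; here the "state" is just a grid position.
    """
    return (state[0] + action[0], state[1] + action[1])


def distance(a, b):
    """Manhattan distance between two grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def plan_next_action(state, goal, horizon=3):
    """Try every action sequence of length `horizon`, roll it out with
    the predictor, and return the first action of the cheapest rollout.

    Cost is the summed distance to the goal over the imagined states,
    so sequences that approach the goal sooner score better.
    """
    best_cost, best_first = None, None
    for seq in product(ACTIONS, repeat=horizon):
        s, cost = state, 0
        for a in seq:
            s = predict(s, a)
            cost += distance(s, goal)
        if best_cost is None or cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


def run_episode(start, goal, steps=5):
    """Plan, execute only the first planned action, then replan."""
    state = start
    for _ in range(steps):
        state = predict(state, plan_next_action(state, goal))
    return state


if __name__ == "__main__":
    print(run_episode((0, 0), (2, -1)))
```

In this toy world the predictor is exact, so the loop reaches the goal at (2, -1) in three moves and then keeps choosing the "stay" action. The short `horizon` loosely mirrors the limitation Garrido describes: the planner can only look a few steps ahead.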