Meta’s V-JEPA 2 model teaches AI to understand its surroundings | TechCrunch


Meta on Wednesday unveiled its new V-JEPA 2 AI model, a “world model” designed to help AI agents understand the world around them.

V-JEPA 2 is an extension of the V-JEPA model that Meta released last year, which was trained on over one million hours of video. This training data is meant to help robots and other AI agents operate in the physical world, understanding and predicting how concepts like gravity will affect what happens next in a sequence.

These are the kinds of common-sense connections that small children and animals make as their brains develop. When you play fetch with a dog, for example, the dog will (hopefully) understand how bouncing a ball on the ground will cause it to rebound upward, or how it should run toward where it thinks the ball will land rather than where the ball is at that exact moment.

Meta shows examples in which a robot might be presented with, say, the point of view of holding a plate and a spatula while walking toward a stove with cooked eggs. The AI can predict that a very likely next action would be to use the spatula to move the eggs onto the plate.

According to Meta, V-JEPA 2 is 30x faster than Nvidia’s Cosmos model, which also tries to enhance intelligence related to the physical world. However, Meta may be evaluating its own models according to different benchmarks than Nvidia.

“We believe world models will usher in a new era for robotics, enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data,” explained Meta’s chief AI scientist Yann LeCun in a video.
