What are AI world models and why are they important?

World models, also called world simulators, are touted by some as the next big thing in AI.

AI pioneer Fei-Fei Li’s World Labs has raised $230 million to build “large world models,” and DeepMind hired one of the creators of OpenAI’s video generator, Sora, to work on “world simulators.”

But what exactly are these things?

World models are inspired by the mental models of the world that people develop naturally. Our brains take the abstract representations of our senses and transform them into a more concrete understanding of the world around us, producing what we called “models” long before AI adopted the term. The predictions our brains make based on these models influence how we perceive the world.

A paper by AI researchers David Ha and Jürgen Schmidhuber gives the example of a baseball batter. Batters have milliseconds to decide how to swing their bat – less time than it takes for visual signals to reach the brain. They can hit a 100-mile-per-hour fastball because they can instinctively predict where the ball will go, Ha and Schmidhuber say.

“For professional players, this all happens unconsciously,” the research duo writes. “Their muscles reflexively swing the bat at the right time and location, according to the predictions of their internal models. They can act quickly on their predictions of the future, without having to consciously roll out possible future scenarios to form a plan.”

It is these unconscious reasoning aspects of world models that some argue are a prerequisite for human-level intelligence.

Modeling the world

Although the concept has been around for decades, world models have recently gained popularity, in part due to their promising applications in the field of generative video.

Most, if not all, AI-generated videos veer into uncanny valley territory. Watch them long enough and something bizarre will happen, like limbs twisting and merging together.

While a generative model trained on years of video footage might accurately predict that a basketball will bounce, it has no idea why – just as language models don’t really understand the concepts behind words and phrases. But a world model with even a basic grasp of why the basketball bounces the way it does will be better at depicting that bounce.

To enable this kind of insight, world models are trained on a range of data, including photos, audio, videos and text, with the intention of creating internal representations of how the world works, and the ability to reason about the consequences of actions.

An example of output from AI startup Runway’s Gen-3 video generation model. Image credits: Runway

“A viewer expects the world they are watching to behave in a similar way to their reality,” said Alex Mashrabov, Snap’s former AI chief and the CEO of Higgsfield, which is building generative models for video. “When a feather falls with the weight of an anvil or a bowling ball shoots hundreds of feet into the air, it is jarring and takes the viewer out of the moment. With a strong world model, the model will understand this, instead of a creator having to define how each object is expected to move – which is tedious, cumbersome and a poor use of time.”

But better video generation is just the tip of the iceberg for world models. Researchers including Yann LeCun, Meta’s chief AI scientist, say the models could one day be used for sophisticated forecasting and planning in both the digital and physical domains.

In a talk earlier this year, LeCun described how a world model could help achieve a desired goal through reasoning. A model with a basic representation of a “world” (e.g. a video of a dirty room), given a goal (a clean room), could devise a series of actions to achieve that goal (deploying vacuums to sweep, washing the dishes, emptying the trash) not because it is a pattern it has observed, but because on a deeper level it knows how to go from dirty to clean.
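
The dirty-room example can be sketched as a search over states predicted by a model. This is only a toy illustration, not LeCun's proposal or any real system: the chore names, the `predict` function and the breadth-first `plan` function are all hypothetical, and a real world model would learn its predictions from data rather than use a hand-written lookup table.

```python
from collections import deque

# Hypothetical toy "world": a state is the set of problems still present.
# Each action is assumed to fix exactly one problem.
ACTIONS = {
    "vacuum_floor": "dirty_floor",
    "wash_dishes": "dirty_dishes",
    "empty_trash": "full_trash",
}

def predict(state, action):
    """Stand-in world model: predict the state that follows an action."""
    return state - {ACTIONS[action]}

def plan(start, goal):
    """Breadth-first search over predicted states; returns a shortest plan."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if state == goal:
            return steps
        for action in ACTIONS:
            nxt = predict(state, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [action]))
    return None  # no sequence of actions reaches the goal

dirty_room = frozenset({"dirty_floor", "dirty_dishes", "full_trash"})
clean_room = frozenset()
steps = plan(dirty_room, clean_room)
print(steps)
```

The planner never sees an example of a cleaning routine. It reaches the goal only by asking the model what each action would change, which is the distinction LeCun draws between reasoning and pattern matching.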

“We need machines that understand the world; (machines) that can remember things, that have intuition, that have common sense – things that can reason and plan on the same level as humans,” LeCun said. “Despite what you may have heard from some of the most enthusiastic people, current AI systems are not capable of this.”

Although LeCun estimates that we are at least a decade away from the world models he envisions, current world models show promise as basic physics simulators.

Sora controlling a player in Minecraft, and rendering the world. Image credits: OpenAI

OpenAI notes in a blog post that Sora, which it considers to be a world model, can simulate actions like a painter leaving brushstrokes on a canvas. Models like Sora, and Sora itself, can also effectively simulate video games. For example, Sora can render a Minecraft-like user interface and game world.

Future world models may be able to generate 3D worlds on demand for gaming, virtual photography and more, World Labs co-founder Justin Johnson said on an episode of the a16z podcast.

“We already have the ability to create virtual, interactive worlds, but it takes hundreds and hundreds of millions of dollars and a lot of development time,” Johnson said. “With [world models] you can get out not just an image or a clip, but a fully simulated, vibrant and interactive 3D world.”

High obstacles

While the concept is enticing, many technical challenges stand in the way.

Training and running world models requires enormous computing power, even compared to the amount currently used by generative models. While some of the latest language models can run on a modern smartphone, Sora (arguably an early world model) would require thousands of GPUs to train and run, especially if its use becomes commonplace.

World models, like all AI models, also hallucinate – and internalize biases in their training data. For example, a world model trained largely on videos of sunny weather in European cities might struggle to understand or depict Korean cities in snowy conditions, or simply do so incorrectly.

A general lack of training data threatens to exacerbate these problems, Mashrabov says.

“We’ve seen models be really limited in generating people of a certain type or race,” he said. “Training data for a world model should be broad enough to cover a diverse range of scenarios, but also specific enough that the AI can deeply understand the nuances of those scenarios.”

In a recent post, the CEO of AI startup Runway, Cristóbal Valenzuela, says data and engineering issues prevent current models from accurately capturing the behavior of a world’s inhabitants (for example, people and animals). “Models will need to generate consistent maps of the environment,” he said, “and the ability to navigate and interact with those environments.”

A video generated by Sora. Image credits: OpenAI

If the major hurdles are overcome, though, Mashrabov believes that world models could more robustly bridge AI with the real world – leading to breakthroughs not only in virtual world generation, but also in robotics and AI decision-making.

They could also produce more capable robots.

Robots today are limited in what they can do because they are unaware of the world around them (or of their own bodies). World models could give them that awareness, Mashrabov said – at least to some extent.

“With an advanced world model, an AI could develop a personalized understanding of whatever scenario it finds itself in,” he said, “and reason out possible solutions.”