Google’s UK-based AI research arm, DeepMind, has lifted the veil on Genie 3, its newest world model designed to train general-purpose AI agents, a move it insists represents a major milestone on the road to artificial general intelligence (AGI).
Unlike earlier iterations, Genie 3 can generate multiple minutes of interactive 3D worlds at 720p resolution and 24 frames per second from simple text prompts, producing environments that remain physically consistent over time, a trait the lab says it did not explicitly programme. The model is capable of rendering richly detailed photo-realistic or imaginary worlds, and allows users to influence in-world events using additional prompts.
Although still in research preview, Genie 3 builds significantly on its predecessor Genie 2, which could only create 10–20-second clips, and DeepMind’s Veo 3, which has demonstrated advanced understanding of physics in video generation. Critically, Genie 3 operates auto-regressively, generating each frame based on prior ones, thereby “remembering” what has already unfolded to preserve cause-and-effect relationships within its simulations.
This memory-like capability enables it to model physics in a manner similar to human intuition, such as anticipating when a glass might fall from a table or gauging the need to duck when an object is descending overhead.
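The auto-regressive behaviour described above, where each new frame is conditioned on the frames already generated, can be sketched in a few lines. This is a hypothetical illustration only: the `predict_next_frame` stand-in below is not DeepMind's model, just a placeholder that makes the history-conditioned loop runnable.

```python
from collections import deque

def predict_next_frame(frames, action):
    # Placeholder for a learned world model: derives a new "frame"
    # (here just a number) from the history so the loop is runnable.
    return sum(frames) % 256 + action

def rollout(initial_frame, actions, context_len=16):
    """Generate frames one at a time, each conditioned on prior frames.

    The bounded deque plays the role of the model's memory: every
    prediction sees what has already unfolded, which is what preserves
    cause-and-effect consistency across the simulation.
    """
    history = deque([initial_frame], maxlen=context_len)
    out = [initial_frame]
    for a in actions:
        frame = predict_next_frame(list(history), a)
        history.append(frame)  # remember the new frame for later steps
        out.append(frame)
    return out
```

In a real world model, `predict_next_frame` would be a neural network mapping pixel history and a user action to the next 720p frame, but the control flow is the same: generation is sequential, and consistency comes from conditioning on the accumulated history.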
DeepMind researchers believe the real value of Genie 3 lies in its ability to train embodied agents, AI systems that learn through interaction with their environment. During internal tests, the model served as a training ground for DeepMind's multi-world agent, SIMA, which completed tasks such as approaching marked equipment in simulated warehouse settings.
By generating reliable virtual environments in which agents can practise and learn, Genie 3 helps overcome one of the biggest hurdles in developing AGI: realistic, scalable training grounds. The model also reportedly circumvents the need for hard-coded physics engines, instead teaching itself how the world works through pattern recognition over long-term simulations.
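The training-ground idea above amounts to a standard agent-environment loop in which the environment is itself generated from a prompt. The sketch below is purely illustrative, with assumed names (`GeneratedWorld`, `practise`) and a toy one-dimensional task standing in for a full 3D simulation; it is not DeepMind's API.

```python
class GeneratedWorld:
    """Stand-in for a text-prompted world model used as a training
    environment. All names and mechanics here are illustrative."""

    def __init__(self, prompt):
        self.prompt = prompt
        self.position = 0
        self.goal = 10  # e.g. reaching marked equipment

    def step(self, action):
        # The "world" applies the action consistently over time:
        # +1 moves the agent toward the goal, -1 away from it.
        self.position += action
        done = self.position >= self.goal
        reward = 1.0 if done else -0.01  # small cost per step
        return self.position, reward, done

def practise(episodes=5):
    """Toy loop: an agent acts in freshly generated worlds and
    accumulates reward, the basic shape of embodied-agent training."""
    total = 0.0
    for _ in range(episodes):
        world = GeneratedWorld("a warehouse with marked equipment")
        done = False
        while not done:
            action = 1  # trivially optimal policy for this toy task
            _, reward, done = world.step(action)
            total += reward
    return total
```

The appeal of a generated environment in this loop is scale: each episode can spin up a fresh, consistent world from a prompt rather than relying on a hand-built simulator with a hard-coded physics engine.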
However, as excitement within the AI research community grows, creatives and industry watchers are urging caution. Some fear world models could accelerate job displacement across gaming, animation, and filmmaking, sectors already grappling with layoffs linked to AI-driven production pipelines.
Others are raising alarms over potential copyright infringements, given that some models appear to be trained on gameplay clips or videos whose licensing status remains unclear. While Google maintains it observes YouTube’s terms of service in building such systems, it has not publicly clarified which videos are being used for training, a lack of disclosure that could leave the tech giant exposed to future legal challenges.
