Original title: The Next Generation Pixar: How AI will merge film and games

Author: Jonathan Lai

Compiled by: TechFlow


Over the past century, technological change has enabled many of our favorite stories. In the 1930s, for example, Walt Disney invented the multiplane camera and produced the first full-color animation with synchronized sound. This technological breakthrough led to the groundbreaking animated film Snow White and the Seven Dwarfs.

The 1940s saw the rise of Marvel and DC Comics, known as the "Golden Age of Comics," thanks to widespread use of four-color rotary presses and offset printing, which allowed comics to be printed on a large scale. The limitations of the technology—low resolution, limited tonal range, dot-matrix printing on cheap newsprint—created the iconic "pulp" look we still recognize today.

Likewise, Pixar was uniquely positioned in the 1980s to take advantage of new technology platforms: computers and 3D graphics. Co-founder Edwin Catmull was an early researcher at NYIT's Computer Graphics Lab and Lucasfilm, pioneering foundational CGI concepts that led to the first fully computer-generated animated feature, Toy Story. Pixar's rendering software, RenderMan, has been used in more than 500 films to date.

With each wave of technology, early prototypes that started out as novelties evolved into new formats for deep storytelling, led by new generations of creators. Today, we believe the next Pixar is on the horizon. Generative AI is driving a fundamental shift in creative storytelling, enabling a new generation of human creators to tell stories in entirely new ways.

Specifically, we believe the Pixar of the next century will not be born through traditional film or animation, but through interactive video. This new storytelling format blurs the line between video games and TV/film, blending deep storytelling with audience agency and gameplay, and opening up a huge new market.

Games: The cutting edge of modern storytelling

There are two major waves emerging today that could accelerate the formation of a new generation of storytelling companies:

  1. Consumer shift towards interactive media (as opposed to linear/passive media i.e. TV/movies)

  2. Technological advances driven by generative AI

Over the past 30 years, we've seen a steady shift in consumer behavior, with games and interactive media growing more popular with each generation. For Gen Z and younger, games are now the preferred way to spend free time, beating out TV and movies. In 2019, Netflix CEO Reed Hastings wrote in a letter to shareholders: "We compete with (and often lose to) Fortnite more than we do with HBO." In many households, the question is "what are we playing" rather than "what are we watching."

While television, film, and books still tell compelling stories, many of the most innovative and successful new stories are now being told in games. Take Harry Potter: the open-world role-playing game Hogwarts Legacy gives players an unprecedented level of immersion in the experience of being a new student at Hogwarts. The game was 2023's best-seller, grossing over $1 billion at launch and surpassing the box office of every Harry Potter film except the final one, Harry Potter and the Deathly Hallows: Part 2 ($1.34 billion).

Gaming intellectual property (IP) has also seen huge success in recent TV and film adaptations. Naughty Dog's The Last of Us became HBO Max's most-watched series in 2023, averaging 32 million viewers per episode. The Super Mario Bros. Movie grossed $1.4 billion worldwide, with the biggest global opening weekend ever for an animated film. Then there are the critically acclaimed Fallout series, Paramount's Halo series, Tom Holland's Uncharted movie, Michael Bay's Skibidi Toilet movie, and the list goes on.

A key reason interactive media is so powerful is that active participation builds intimacy with a story or universe. An hour of actively playing a game produces far deeper engagement than an hour of passively watching TV. Many games are also social, with multiplayer mechanics baked into their core design. The most memorable stories are often the ones we create and share with friends and family.

Audiences continue to engage with an IP across multiple modes (watching, playing, creating, sharing), which makes stories more than entertainment: they become part of a person's identity. The magic happens when someone grows from a casual "Harry Potter viewer" into a "loyal Potter fan", an identity that is far more durable and that builds a multi-person community around what was once a solitary activity.

Overall, while the greatest stories in our history have been told in linear media, looking ahead, games and interactive media will be where future stories will be told—and therefore where we believe the most important storytelling companies of the next century will be born.

Interactive video: the combination of narrative and games

Given the dominance of gaming in culture, we believe the next Pixar will emerge through a media format that combines storytelling with gaming. One format we see great potential for is interactive video.

First, what is interactive video, and how does it differ from a video game? In a video game, the developer pre-loads a set of assets into the game engine. In Super Mario Bros., for example, artists designed the Mario sprite, the trees, and the backgrounds, and programmers coded Mario to jump 50 pixels whenever the player presses the "A" button, with the jump frames rendered through the traditional graphics pipeline. The result is a highly deterministic, pre-computed game architecture over which the developer has full control.

Interactive video, by contrast, relies entirely on neural networks to generate frames in real time. No assets need to be created or uploaded beyond a creative prompt (text or a reference image). A real-time AI graphics model receives player input (such as the "up" button) and probabilistically infers the next game frame.
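
To make the contrast concrete, here is a minimal, self-contained Python sketch of the two loops. Every name in it is a hypothetical stand-in, not a real engine or model API: the point is only that the first step applies a hand-authored rule to known assets, while the second samples the next frame from a model.

```python
import random

# Traditional game: a deterministic, hand-authored rule over pre-made assets.
def traditional_step(state: dict, button: str) -> dict:
    """Pressing "A" makes Mario jump exactly 50 pixels; nothing is inferred."""
    if button == "A":
        state = {**state, "mario_y": state["mario_y"] + 50}
    return state  # the engine then rasterizes known sprites at these coordinates

# Interactive video: a model probabilistically infers the next frame.
class ToyFrameModel:
    """Stand-in for a real-time generative model; a real one would condition
    on frame history plus player input and sample a plausible next frame."""
    def predict_next_frame(self, prev_frame: list, button: str) -> list:
        return [px + random.choice((-1, 0, 1)) for px in prev_frame]

state = traditional_step({"mario_y": 0}, "A")
print(state)   # {'mario_y': 50} -- identical on every run, fully developer-controlled

frame = ToyFrameModel().predict_next_frame([0] * 8, "up")
print(frame)   # stochastic: each run can differ
```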

The promise of interactive video lies in merging the accessibility and narrative depth of television and film with the dynamic, player-driven systems of video games. Everyone knows how to watch television and follow a linear story. By adding video generated in real time from player input, we can create personalized, effectively limitless experiences, potentially enabling media titles that engage fans for thousands of hours. Blizzard's World of Warcraft has been around for over 20 years and still retains approximately 7 million subscribers today.

Interactive video also offers multiple ways to consume content: viewers can lean back and enjoy it like a TV show, or actively play it on a phone or controller at other times. Letting fans experience their favorite IP universes in as many ways as possible is the heart of transmedia storytelling, and it helps foster a sense of intimacy with the IP.

Over the past decade, many storytellers have attempted to realize the vision of interactive video. An early breakthrough was Telltale's The Walking Dead, an interactive experience based on Robert Kirkman's comic series in which players watch animated scenes unfold but make choices at key moments through dialogue and quick-time events. These choices, such as deciding which character to save during a zombie attack, create personalized variations of the story, making each playthrough unique. The Walking Dead launched in 2012 to huge success, winning multiple Game of the Year awards and selling more than 28 million copies to date.

In 2017, Netflix also entered the interactive video space, starting with the animated special Puss in Book: Trapped in an Epic Tale and culminating in the critically acclaimed Black Mirror: Bandersnatch, a live-action film in which the audience makes choices for a young programmer adapting a fantasy novel into a video game. Bandersnatch became a holiday phenomenon, attracting a cult following that built flowcharts documenting every possible ending.

However, despite positive reviews, both Bandersnatch and The Walking Dead faced an existential problem: manually authoring the countless story branches that defined the format was too time-consuming and costly. As Telltale expanded across multiple projects, it developed a culture of crunch, with developers complaining of "fatigue and burnout." Narrative quality suffered: while The Walking Dead debuted with a Metacritic score of 89, four years later Telltale released one of its biggest IPs, Batman, to a disappointing 64. In 2018, Telltale shut down, having failed to establish a sustainable business model.

For Bandersnatch, the crew shot 250 video segments, more than 5 hours of footage, covering the film's five endings. The budget and production time were reportedly twice those of a standard Black Mirror episode, and the show's producers said the project's complexity was equivalent to "making 4 episodes at once." Ultimately, in 2024, Netflix shut down its interactive specials division entirely and pivoted to making traditional games.

Until now, the content cost of interactive video projects has scaled linearly with play time — there’s no way around this. However, advances in generative AI models may be the key to driving interactive video to scale.

Generative models will soon be fast enough to power interactive video

Recent advances in the distillation of image generation models are stunning. In 2023, the release of Latent Consistency Models and SDXL Turbo dramatically improved the speed and efficiency of image generation, cutting high-resolution rendering from 20-30 diffusion steps down to a single step and reducing costs by more than 30x. The idea of generating video, a consistent series of images with frame-to-frame variation, suddenly became far more tractable.
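
For a sense of what distillation changes in practice, here is a minimal sketch of one-step generation with SDXL Turbo via Hugging Face's diffusers library, assuming a CUDA GPU and a recent diffusers release (the model ID and arguments follow its public documentation, but verify against your installed version).

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the distilled SDXL Turbo pipeline (fp16 to fit consumer GPUs).
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# A distilled model collapses the usual 20-30 denoising steps into one;
# SDXL Turbo is trained without classifier-free guidance, hence scale 0.0.
image = pipe(
    prompt="a watercolor castle at dusk",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("castle.png")
```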

Earlier this year, OpenAI made a lot of headlines by announcing Sora, a text-to-video model that can generate videos up to 1 minute long while ensuring visual consistency. Not long after, Luma AI released Dream Machine, an even faster video model capable of generating 120 frames (about 5 seconds of video) in 120 seconds. Luma recently shared that they attracted a staggering 10 million users in just 7 weeks. Last month, Hedra Labs released Character-1, a character-focused multimodal video model that can generate 60-second videos in 90 seconds, showing expressive human emotions and voice-overs. And Runway recently launched Gen-3 Turbo, a model that can render a 10-second clip in just 15 seconds.
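
Taken together, those figures show how far each model is from real time. A quick back-of-envelope calculation, using only the numbers quoted above:

```python
# Generation-time-to-runtime ratios implied by the figures above
# (1.0x would be real time; lower is faster).
clips = {
    "Luma Dream Machine (120 frames ~ 5 s)": (120, 5),
    "Hedra Character-1 (60 s clip)": (90, 60),
    "Runway Gen-3 Turbo (10 s clip)": (15, 10),
}
for name, (gen_s, video_s) in clips.items():
    print(f"{name}: {gen_s / video_s:.1f}x real time")
# Dream Machine: 24.0x; Character-1: 1.5x; Gen-3 Turbo: 1.5x
```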

Today, an aspiring filmmaker can quickly generate several minutes of 720p HD video from a text prompt or reference image, paired with a start or end keyframe for added specificity. Runway has also developed a suite of editing tools that offer more granular control over diffusion-generated video, including in-frame camera controls, frame interpolation, and motion brushes. Luma and Hedra will soon launch their own creator tool suites.

While it's still early days for production workflows, we've already met several content creators using these tools to tell stories. Resemblance AI created Nexus 1945, a compelling 3-minute alternate history of WWII, produced with Luma, Midjourney, and Eleven Labs. Independent filmmaker Uncanny Harry created a cyberpunk short with Hedra, and creators have also made music videos, trailers, travel vlogs, and even a fast-food burger commercial. Since 2022, Runway has held an annual AI Film Festival showcasing ten outstanding AI-made short films.

It's important to note the limitations: there is still a clear gap in narrative quality and control between a 2-minute clip generated from a prompt and a 2-hour feature produced by a professional team. Getting a model to generate exactly what the creator intends from a prompt or image is often difficult, and even experienced prompt engineers discard most of what they generate; AI creator Abel Art reports that producing 1 minute of coherent video takes roughly 500 generations. Visual consistency typically starts to break down after one or two minutes of continuous video and requires manual editing, which is why most generated videos today are limited to about 1 minute in length.

For most professional Hollywood studios, diffusion-generated video is useful for pre-production storyboards, visualizing what a scene or character might look like, but it is not a replacement for on-set filming. There are also opportunities to use AI for audio and visual effects in post-production, but overall, the AI creator toolkit is still early in its development compared to traditional workflows backed by decades of investment.

In the short term, one of the biggest opportunities for generative video lies in new media formats such as interactive video and short films. Interactive video naturally breaks into short 1-2 minute segments that branch on the player's choices, and it is often animated or stylized, allowing lower-resolution assets. What's more, creating these short clips with diffusion models is far more cost-effective than in the Telltale/Bandersnatch era: Abel Art estimates that 1 minute of Luma video costs about $125, roughly equivalent to a day of rented film footage.

While the quality of generated videos today can be inconsistent, the popularity of short vertical videos like ReelShort and DramaBox has proven audience demand for episodic short-form television with low production values. Despite critics complaining about amateurish cinematography and formulaic scripts, ReelShort has driven more than 30 million downloads and more than $10 million in monthly revenue, launching thousands of mini-series like Forbidden Desire: Alpha’s Love.

The biggest technical hurdle facing interactive video is achieving frame generation fast enough to produce content in real time. Dream Machine currently generates about 1 frame per second. The minimum acceptable target for modern consoles is a steady 30 FPS, with 60 FPS as the gold standard. Techniques like Pyramid Attention Broadcast (PAB) can push this to 10-20 FPS on certain video types, but that is still not fast enough.
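
The gap is easiest to see as a per-frame latency budget, computed directly from the targets above:

```python
# Per-frame time budgets implied by the frame-rate targets above.
for fps in (1, 10, 30, 60):
    print(f"{fps:>2} FPS -> {1000 / fps:6.1f} ms per frame")
# 1 FPS  -> 1000.0 ms  (roughly where Dream Machine is today)
# 30 FPS ->   33.3 ms  (console minimum); 60 FPS -> 16.7 ms (gold standard)
```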

The current interactive video landscape

Given the rate of improvement we’re seeing in the underlying hardware and models, we estimate that commercially viable fully generative interactive video is approximately 2 years away.

Today, we're seeing progress in the research space from players like Microsoft Research and OpenAI, who are working on end-to-end foundation models for interactive video. Microsoft's model aims to generate fully "playable worlds" in 3D. OpenAI showed a demo of Sora performing "zero-shot" Minecraft simulation: "Sora can simultaneously control the actions of a player in Minecraft, rendering the world and its dynamics with high fidelity."

In February 2024, Google DeepMind released its own end-to-end foundation model for interactive video, Genie. Genie is unique in its latent action model, which infers the latent action taken between a pair of video frames. Trained on 300,000 hours of platformer videos, Genie learned to recognize character actions, such as how to clear obstacles. The latent action model is combined with a video tokenizer and fed into a dynamics model that predicts the next frame, producing interactive video.
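
As a rough mental model of how those three pieces fit together, here is a schematic Python sketch. The interfaces are invented for illustration and greatly simplified relative to the actual architecture described in the Genie paper.

```python
from dataclasses import dataclass

@dataclass
class GenieStyleWorldModel:
    tokenizer: object      # video tokenizer: compresses frames into discrete tokens
    action_model: object   # latent action model: infers the action between frame pairs
    dynamics: object       # dynamics model: predicts the tokens of the next frame

    def train_step(self, frame_t, frame_t_plus_1):
        # During training, the latent action is inferred from consecutive frames.
        action = self.action_model.infer(frame_t, frame_t_plus_1)
        tokens = self.tokenizer.encode(frame_t)
        target = self.tokenizer.encode(frame_t_plus_1)
        return self.dynamics.loss(tokens, action, target)

    def play_step(self, frame_history, player_action):
        # At play time, the player's input stands in for the latent action.
        tokens = self.tokenizer.encode(frame_history)
        next_tokens = self.dynamics.predict(tokens, player_action)
        return self.tokenizer.decode(next_tokens)
```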

At the application level, we have seen some teams exploring new interactive video experiences. Many companies are working on making generative movies or TV shows, designed and developed around the limitations of current models. We have also seen some teams adding video elements to AI-native game engines.

Ilumine's Latens is developing a "lucid dream simulator" in which content is generated in real time as the user walks through their dreams; the slight generation delay adds to the surreal, dreamlike quality of the experience. Developers in the open-source community Deforum are building real-world installations of immersive interactive video. Dynamic is developing a simulation engine in which users control robots in first-person perspective, using fully generated video.

In the TV and film space, Fable Studio is developing Showrunner, an AI streaming service that allows fans to adapt their own versions of popular shows. Fable’s proof-of-concept project, South Park AI, received 8 million views when it premiered last summer. Solo Twin and Uncanny Harry are two AI filmmaking studios on the cutting edge. Alterverse has created an interactive video role-playing game inspired by D&D where the community decides what happens next. Late Night Labs is a new top-tier film production company integrating AI into the creative process. Odyssey is developing a visual storytelling platform powered by 4 generative models.

As the lines between film and games blur, we’ll see the emergence of AI-native game engines and tools that give creators more control. Series AI has developed the Rho Engine, an end-to-end platform for AI game development, and is using its platform to co-develop original works with major IP holders. We’re also seeing AI creation kits from Rosebud AI, Astrocade, and Videogame AI that allow people new to programming or art to quickly get started making interactive experiences.

These new AI creation kits will create market opportunities for storytelling, enabling a new class of citizen creators to bring their imaginations to life using prompt engineering, visual sketching, and voice recognition.

Who will create the interactive version of Pixar?

Pixar was able to leverage a foundational technology shift in computers and 3D graphics to create an iconic company. Today, a similar wave is underway in generative AI. But it's important to remember that Pixar's success owed much to Toy Story and the classic animated films created by a world-class storytelling team led by John Lasseter. The best stories come from human creativity combined with new technology.

Likewise, we believe the next Pixar will need to be both a world-class interactive storytelling studio and a top-tier technology company. Given the rapid pace of AI research, creative teams will need to work hand in hand with AI teams to fuse narrative and game design with technical innovation. Pixar was unique in blending art and technology, aided by its partnership with Disney; today's opportunity belongs to a new kind of team that can unite the disciplines of games, film, and AI.

To be clear, this will be a huge challenge, and not only a technical one. Teams will need to explore new ways for human storytellers to partner with AI tools so that the tools enhance rather than diminish their imaginations. There are also many legal and ethical hurdles to address: the ownership and copyright protection of AI-generated creative work remains unclear unless creators can prove ownership of all the data used to train the model, and compensation for the original writers, artists, and producers behind that training data still needs to be worked out.

However, it’s also clear today that there’s a huge appetite for new interactive experiences. In the long term, the next Pixar could be about creating not just interactive stories, but entire virtual worlds. We’ve previously explored the potential of endless games — dynamic worlds that blend real-time level generation, personalized storytelling, and intelligent agents — similar to what HBO is envisioning with Westworld. Interactive video solves one of the biggest challenges in bringing Westworld to life — quickly generating large volumes of personalized, high-quality interactive content.

One day, with the help of AI, we might start the creative process by building a story world — a world of intellectual property that we envision fully formed, complete with characters, narrative lines, visuals, and so on — and then generate whatever media products we want for an audience or a specific situation. This would be the ultimate development of transmedia storytelling, completely blurring the lines between traditional media forms.

Pixar, Disney, and Marvel have all been able to create unforgettable worlds that become a core part of their fans’ identities. The opportunity for the next interactive Pixar lies in using generative AI to achieve the same goal — to create new story worlds that blur the boundaries of traditional storytelling formats to create worlds never seen before.