Friday, June 14, 2024

The 'Netflix of AI' that makes you a movie Director

Film and video production is big business. Movies are still going strong, and Netflix, Prime, Disney, Apple and others have created a renaissance in television. Box sets are the new movies. Social media has also embraced video, with the meteoric rise of TikTok, Instagram, Facebook shorts and so on. YouTube is now an entertainment channel.

It is similar in learning. Video is everywhere, but it is still relatively time-consuming and expensive to produce. Cut to AI…

We are on the cusp of a revolution in video production. As part of a video production company, then using Laserdiscs for interactive video simulations, I used to make corporate videos and interactive video simulations in the 80s/90s. The camera alone cost £35k, a full crew had to be hired, voiceovers were recorded in a professional studio (we eventually built our own in our basement), and editing was done in a suite in London. We even made a full feature film, The Killer Tongue (don’t ask!).

After glimpses and demos of generated video, we are now seeing it move quickly into full production, unsurprisingly from the US, where they have embraced AI and are applying it faster than any other nation.

1. Video animating an Image or prompt

I first started playing around with AI-generated video from stills and it was pretty good. It’s now very good. Here are a few examples.

Now just type in a few words and it's done.

Turned this painting of my dog into a real dog...

Made skull turn towards viewer...

Pretty good so far...

2. Video from a Prompt

Then came prompted video, from text only. This got really good, really fast, with Sora and new players such as Luna entering the market.

Great for short video, but with no real long-form capability. In learning, these short one-scene videos could be useful for performance support and single tasks or brief processes, even triggered video where the AI plays patients, customers, employees and so on. This is already happening with avatar production.

3. Netflix of AI

Meet Showrunner, where you can create your own show. Remember the South Park episode created with AI? The same company has launched 10 shows where you can create your own episodes.

Showrunner released two episodes of Exit Valley, a Silicon Valley satire starring iconic figures like Musk, Zuck and Sam Altman. The show is an animated comedy targeting 22 episodes in its first season, some made by their own studio, the rest made by users and selected by a jury of filmmakers and creatives. The other shows, like Ikiru Shinu and Shadows over Shinjuku, are set in distinct anime worlds in Neo-Tokyo and will be available later this year.

They are using LLMs, as well as custom state-of-the-art diffusion models, but what makes this different is the use of multi-agent simulation. Agents (we’ve been using these in learning projects) can drive story progression and behavioural control.

This gives us a glimpse of what will be possible in learning. Tools such as these will be able to create any form of instructional video and drama, as it will be a ‘guided’ process, with the best writing, direction and editing built into the process. You are driving the creative car, but there will be a ton of AI in the engine, and self-driving features that allow the tricky stuff to be done to a high standard behind the scenes. Learners may even be able to create or ask for this through nothing more than text requests, even spoken, as you create your movie.

The AI uses character history, goals and emotions, simulation events and localities to generate scenes and image assets that are coherent and consistent with the existing story world. There is also behavioural control over agents, their actions and intentions, including in interactive conversations. The user's expectations and intentions are formed, then funnelled into a simple prompt to kick off the generation process.
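To make that funnelling concrete, here is a minimal sketch of how character state and a user's intention might be assembled into a single generation prompt. This is illustrative only: the `Character` class and `build_scene_prompt` function are my own hypothetical names, not Showrunner's actual API, and the real system would pass such a prompt to its LLM and diffusion models.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: funnel simulation state (character history,
# goals, emotions, location, events) plus the user's intention into
# one prompt string. Not Showrunner's real data model.

@dataclass
class Character:
    name: str
    goals: list
    emotions: list
    history: list = field(default_factory=list)

def build_scene_prompt(user_intent, characters, location, events):
    """Combine the user's request with world state into a single prompt."""
    lines = [f"User intention: {user_intent}", f"Location: {location}"]
    for c in characters:
        lines.append(
            f"{c.name}: goals={', '.join(c.goals)}; "
            f"mood={', '.join(c.emotions)}; "
            f"history={'; '.join(c.history) or 'none'}"
        )
    lines.append("Recent events: " + "; ".join(events))
    lines.append("Write the next scene, consistent with the above.")
    return "\n".join(lines)

hero = Character("Mara", goals=["expose the lab"], emotions=["anxious"],
                 history=["lost her job in episode 2"])
prompt = build_scene_prompt("a tense rooftop confrontation",
                            [hero], "Neo-Tokyo rooftop",
                            ["a blackout hit the district"])
print(prompt)
```

The point is that the user types a few words, while the engine silently attaches everything the story world already knows, which is how coherence with earlier episodes is maintained.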

You may think this is easy, but the ‘slot-machine effect’, where things become too disjointed and random to be seen as a story, is a really difficult problem. So long-term goals and arcs are used to guide the process. Behind the scenes there is also a hidden ‘trial and error’ process, so that you do not see the misfires, wrong edits and so on. The researchers likened this to Kahneman’s System 1 v System 2 thinking. Most LLM and diffusion models favour fast System 1 responses to prompts. For long-form media, you need System 2 thinking, so that more complex intentions, goals, coherence and consistency take precedence.
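The hidden trial-and-error step can be sketched as a sample-then-select loop: generate several candidate scenes quickly (System 1), score each against the long-term arc, and surface only the best (System 2). The scorer below is a toy keyword-overlap stand-in for what would really be a model-based coherence check; all names are my own illustration, not the product's internals.

```python
# Illustrative sketch of generate-and-select: several fast candidates
# are produced, scored against the season's long-term arc, and only
# the winner is shown. Misfires are silently discarded.

def score_against_arc(candidate, arc_keywords):
    """Toy coherence score: keyword overlap with the story arc."""
    words = {w.strip(".,!?").lower() for w in candidate.split()}
    return len(words & arc_keywords)

def pick_best_scene(candidates, arc_keywords):
    """Keep the candidate most consistent with the arc (System 2 step)."""
    scored = [(score_against_arc(c, arc_keywords), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[0][1]

arc = {"betrayal", "rooftop", "mara"}
candidates = [
    "Mara dances at a festival.",
    "On the rooftop Mara confronts the betrayal.",
    "A robot sells noodles.",
]
best = pick_best_scene(candidates, arc)
print(best)  # the rooftop scene scores highest on arc overlap
```

A real system would use far richer signals than keyword overlap, but the shape is the same: spend extra compute behind the scenes so the viewer only ever sees the coherent take.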

Interestingly, hallucinations can introduce creative uncertainty, a positive thing, as happy accidents seem to be part of the creative process, as long as they do not lead to implausible outcomes. This is interesting – how to create non-deterministic creative works that are predictable yet exciting and novel.

This is what I meant by a POSTCREATION world, where creativity is not a simple sampling or remixing but a process of re-creation.

4. Live action videos

The next step, and we are surely on that Yellow Brick Road, is to create your own live-action movies from text and image prompts. Just prompt it with 10 to 15 words and you can generate scenes and episodes from 2 to 16 minutes. This includes AI dialogue, voice, editing, different shot types, consistent characters and story development. You can take it to another level by editing the episodes’ scripts, shots and voices, and remaking episodes. We can all be live-action movie directors.


With LLMs, in the beginning was the ‘word’, then image generation, audio generation, then short-form video, and now full-form creative storytelling. By combining the strengths of the simulation, the AI model and co-creation with the user, rich, interactive and engaging storytelling experiences become possible.

This is a good example of how AI has opened up a broad front, attracting investment, innovation and entrepreneurship. At its heart are generative techniques, but there are also lots of other methods that form an orchestrated ensemble of approaches to solve problems.

You have probably already asked the question: does it actually need us? Will wonderful, novel, creative movies emerge without any human intervention? Some would say ‘I fear so’. I say ‘bring it on’.
