How one person could create an entire movie by themselves thanks to AI
Entertainment media generated by artificial intelligence has provoked predictions of job losses, impressed whistles from would-be creators looking to take advantage of a new paradigm and howls of outrage from dedicated fans fearing a future of soulless sub-par art. But will our future entertainment really be created by machines?
Between language models such as GPT-4 that can write lengthy scripts from just a brief prompt, deep-learning image generators such as DALL-E that can produce visual art in any style, and audio clones that can read text in any voice given enough training data, it’s easy to imagine every aspect of a video production being handed off to software. Or, more likely, the production could be handled with significantly fewer human artists.
While today it’s nowhere near as simple as describing content to an AI program and having it spit out a result, that could eventually be the reality.
“For now, using natural language processing to instruct a generative AI model such as DALL-E to create video content is still relatively difficult,” said Lourens Swanepoel, Australian-based data and AI lead at global professional services company Avanade.
“However, with the rapid evolution of these models and the underlying compute, it should be possible for a single person to create a TV-show or movie in the near future.”
As for whether that movie would be any good, Swanepoel said it would depend on the skill of the person entering the commands and finishing the product.
“People are still critical. Generative AI is not all about cost-cutting and automation, it is about augmentation,” he said.
“Ongoing change enablement will be required in helping users work iteratively from generated concepts that need to be tweaked, refined, enriched and approved. This is the productive assistant, not the replacement.”
But the constant stream of surprising new applications for AI makes the future of entertainment hard to predict. My general opinion on AI-generated “photographs” has been that they’re like early CGI in movies: impressive at a glance, but only because we haven’t learnt the telltale signs to look for yet. Yet every new version of deep-learning models such as Midjourney produces images with more natural-looking people and more believable surroundings, even if (for now) there’s still a general Lynchian vibe, along with regularly horrifying mistakes in the fingers and teeth, or objects that float or collide with each other in the wrong ways.
The new version of OpenAI’s language model has only been out for a week, and already one user has discovered that it can read and interpret the source code of a video game, and repackage it as a sort of choose-your-own-adventure novel. Who’s to say it won’t soon be able to create its own games from scratch based on requests?
Voice models of the most prominent US celebrities are so easily accessible that creators only need to provide a written script to have audio content of them saying anything. For whatever reason, synthetic recordings of US President Joe Biden and former presidents Barack Obama and Donald Trump swearing at each other and ranking everything from Marvel characters to Super Mario games have proved particularly popular.
This month, US-based web video production company Corridor made headlines with Anime Rock, Paper Scissors, a short film it created with rotoscoping, a technique that uses live-action video as a basis for animation frames. It’s been implemented for decades in cartoons (Max Fleischer invented the technique), films (A Scanner Darkly) and video games (Prince of Persia).
The difference is that Corridor used Stable Diffusion, a well-known text-to-image model, rather than human illustrators for the task. It said it trained the model on frames from the anime film Vampire Hunter D: Bloodlust, resulting in a finished product that retains the movements and actions of the actors but appears colourful and animated. It also claims to be the first to have created a film in such a way.
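In broad strokes, the workflow the article describes is a frame-by-frame loop: split the footage into still frames, restyle each frame with an image-to-image model, and reassemble the results into video. Below is a minimal sketch of that loop in Python. The model call is a stub standing in for a real image-to-image pipeline (such as the `StableDiffusionImg2ImgPipeline` in Hugging Face’s `diffusers` library); the function names, the `strength` parameter and the frame representation are all illustrative assumptions, not Corridor’s actual code.

```python
# Sketch of the "AI rotoscoping" loop: restyle each extracted video frame
# with an image-to-image model, then collect the styled frames in order.
# The model call is stubbed out; in practice it would invoke something like
# Stable Diffusion's img2img mode with a prompt and a denoising strength.

def stylize_frame(frame, prompt="anime style", strength=0.5):
    # Placeholder for an img2img model call. A real implementation would
    # pass the source frame plus a text prompt to the model; a lower
    # `strength` keeps more of the original frame, which helps reduce
    # the flickering between frames that the article describes.
    return {"source": frame, "prompt": prompt, "strength": strength}

def rotoscope(frames, prompt="anime style", strength=0.5):
    # Process every frame independently, preserving order so the clip
    # can be reassembled afterwards.
    return [stylize_frame(f, prompt, strength) for f in frames]

frames = ["frame_000.png", "frame_001.png", "frame_002.png"]
styled = rotoscope(frames)
print(len(styled))  # 3
```

Because each frame is restyled independently, nothing ties one frame’s output to the next, which is one plausible source of the flicker discussed below.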
By anime standards the video is frankly pretty bad, with the characters’ pupils, hair and the shadows across their bodies flickering and disappearing in a way that’s characteristic of computer vision but which a human animator would never opt for as a stylistic choice. Human movement on film is also very far away from the expressive limited animation of most anime, which gives the whole thing an uncanny valley vibe. Though once again, the machines will only get better.
More interesting than the animation itself is the process behind the video, which Corridor spells out in an hour-long feature, and its claims about what the technique means for the future. Unsurprisingly, its claims that it “just changed animation forever”, and that its technique could democratise an industry that traditionally has relied on highly skilled artists, didn’t sit well with a lot of commentators and animation fans.
“Not only is this a terrible, terrible idea, but it actually hurts my eyes to look at it,” wrote one detractor, with another saying, “y’all are just lazy thieves spitting on an entire art form”. Others were excited for the technology’s potential to make hyper-customised content in any style.
Taylor Blackburn, of comparison site Finder, said these developments pointed to a future of creative content with faster production times and lower costs.
“Even if it just allows you to automate a repetitive task like resizing images or transcribing audio, having it done in seconds rather than minutes can make a huge difference when you are working to a timeline,” he said.
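As a toy illustration of the kind of repetitive task Blackburn mentions, the snippet below batch-resizes images using nearest-neighbour sampling, with images represented as simple nested lists of pixel values. It is a self-contained sketch, not production code; a real pipeline would use an imaging library such as Pillow.

```python
# Nearest-neighbour resize: for each output pixel, pick the closest
# pixel in the source image. Images are nested lists of pixel values.

def resize_nearest(pixels, new_w, new_h):
    old_h, old_w = len(pixels), len(pixels[0])
    return [[pixels[r * old_h // new_h][c * old_w // new_w]
             for c in range(new_w)]
            for r in range(new_h)]

# A 4x4 test image whose values encode their own coordinates.
image = [[x + 10 * y for x in range(4)] for y in range(4)]
small = resize_nearest(image, 2, 2)
print(small)  # [[0, 2], [20, 22]]
```

Applied across a folder of images, a loop like this turns minutes of manual work into a task that finishes in seconds, which is the time saving Blackburn is pointing to.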
“One of the strengths of AI is its ability to learn and adapt to new inputs, allowing it to create unique and personalised content that is tailored to individual preferences.”
It’s easy to speculate on potential future implications. Perhaps Netflix or a competitor could create a much cheaper streaming service filled with AI-made knock-offs of popular shows. Or maybe AI generation will be just like CGI and digital art are today: most products use them, but there remains a market for hand-painted portraits or traditional works like Guillermo del Toro’s Pinocchio. Or, maybe, regulators will come to view some AI capabilities as more akin to plagiarism than generation, and limit their use.
But in the here and now, the argument between creators who want access to more powerful tools, and consumers who resent the work of dozens of experts over years being poorly replicated in a day, highlights a key challenge for AI-generated entertainment.
While many proponents claim an AI-enhanced future for content creation will let artists focus on the “what” and “why” while leaving the “how” to machines, the truth is that in many art forms, the “how” is an instrumental part of the appeal.
To use the AI-rotoscoped anime as an example, the idea was to have the filmed footage presented in a way that stylistically resembled Japanese animation. But while the AI process more or less achieved this, the final result is missing many of the hallmarks that come with authentic anime production.
In most anime, including Vampire Hunter D, characters expressively change shape, are rendered in entirely different styles depending on the situation, or have different rates of animation that add texture to the story. Employing these techniques properly would require an AI model to not only know what anime looks like, but why it looks that way.
And you see the same tension across the spectrum of generative models for speech, text, images and sound. These models are fed on the results of human creativity and skill, and are becoming adept at replicating those results. But the jury’s out on whether they could ever replicate the thought processes, theories, skills and imagination themselves.