A few weeks back, I wrote a post citing the positioning of generative AI in the Gartner hype cycle. For context, Gartner annually publishes this research, which distils more than 1,700 transformative technologies into a list of must-know trends and predicts the timeline of each technology's maturity.

Artificial intelligence dominated this year’s list, with Generative AI and GANs catching everyone’s attention.

Image: Gartner

At AthenasOwl, we have recently ventured into this branch of AI to solve challenging problems for the media industry. Given its vast potential for bringing about a completely new paradigm of creative thinking, we wanted to start this article series with a conversation about this technology, some interesting applications and its utility to the media business.

The topic of creativity through AI has long been debated, and I thought we would share our two cents on it. As the subject is too broad to do justice within a single article, we will limit the scope to:

  • What is Generative AI?
  • Generative Adversarial Network (GANs)
  • Five exciting use cases for the Media & Entertainment Industry
  • What does it mean for creators?

What is Generative Artificial Intelligence?

What I cannot create, I do not understand. – Richard Feynman.

Generative AI – widely known as the creative side of AI – is a collective term for machine learning algorithms capable of producing, understanding and enhancing content. Here, content refers to images, video, audio and text.

Our first impression of AI in the context of Media & Entertainment is mostly of automation – eliminating the ‘drudge’ work in creative workflows. But Generative AI distinguishes itself through algorithms that can mimic the creative process, improving over time and with data.

Examples abound – creating an anime version of a particular character, enhancing or artistically improving an image, writing stories, or even generating AI influencers with over a million followers.

According to the Gartner research, even though the predicted time to its plateau of productivity is another two to five years, the concept of generative models and the quest for creative algorithms isn’t new.

Restricted Boltzmann machines and variational auto-encoders, for example, belong to this class.

But Generative Adversarial Networks (henceforth, GANs) are relatively new and deserve a special mention here. They are a promising family of generative models, and their uniqueness lies in pairing two neural networks in a creator-critic relationship through which they create or enhance content. This uniqueness has generated sustained research interest and kept them popular for the past three to four years.

The most interesting idea in the last 10 years in machine learning. – Yann LeCun (on GANs)

Currently, over 100 GAN variants exist, solving a variety of use cases.

What are Generative Adversarial Networks?

GANs are unsupervised deep learning models consisting of two competing neural networks trained on the same data. One acts as a creator, the other as a critic.

The two networks are termed the generator and the discriminator. This creator-critic relationship is crucial, and a hallmark of most GANs.

While the generator creates new examples from the problem domain, the discriminator evaluates their authenticity. Both networks improve each other via back-propagation.

Image: https://developers.google.com/machine-learning/gan/gan_structure
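The creator-critic relationship can be sketched numerically. The following is a toy illustration of the standard GAN objective only; the function names and probability values are invented for the example, not taken from any trained model:

```python
import math

# D(x) is the discriminator's probability that x is real;
# D(fake) is its probability that a generated sample is real.
# The numbers below are made up for illustration.

def d_loss(d_real, d_fake):
    # The discriminator (critic) wants D(real) -> 1 and D(fake) -> 0.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    # The generator (creator) wants to fool the critic: D(fake) -> 1.
    return -math.log(d_fake)

# Early in training the critic spots fakes easily (D(fake) is low),
# so the generator's loss is high; as the generator improves,
# D(fake) rises and its loss falls.
assert g_loss(0.1) > g_loss(0.9)

# A critic doing well on this pair has a low loss:
print(round(d_loss(0.9, 0.1), 3))  # 0.211
```

In practice the two losses are minimized alternately by gradient descent, each network's improvement forcing the other to improve – the "competition" the article describes.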

What makes these models suited for content creation in media and entertainment is their sheer ability to generate content conditionally across a variety of domains.

Five Generative AI use-cases for Media & Entertainment

Now that we have some context around generative algorithms, let’s look at five use cases for Media & Entertainment businesses.

#1. Upscaling content through super-resolution:

Super-resolution is a class of machine learning techniques used to upscale a low-resolution video or image to a higher resolution. Two significant drivers are pushing the demand for high-definition content:

  1. The wide-scale availability of HD devices.
  2. Industry leaders like Netflix and Prime Video creating almost all their content in 4K, setting the norms for quality.

In response, legacy content shot in lower resolutions needs to be remastered to keep up with today’s consumer benchmarks – an effort-intensive process.

Moreover, specific genres like documentaries, nature shows and vintage movies have inherent upscaling requirements of their own. Achieved through models like ESRGAN, the technique is most apt for content owners with a specific content type – anime or sports, for example.
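To make the problem concrete, here is a minimal sketch of naive upscaling – the pixel-repetition baseline that super-resolution models improve on. The frame values are invented for illustration:

```python
def upscale_nearest(frame, factor):
    """Naively upscale a 2D frame (a list of pixel rows) by an integer
    factor by repeating each pixel. This is the blocky baseline;
    super-resolution models like ESRGAN instead *predict* the missing
    detail rather than duplicating existing pixels."""
    out = []
    for row in frame:
        wide = [px for px in row for _ in range(factor)]
        out.extend([wide] * factor)
    return out

low_res = [[10, 200],
           [50, 120]]
high_res = upscale_nearest(low_res, 2)
print(high_res[0])  # [10, 10, 200, 200]
```

Where this baseline merely duplicates pixels, an ESRGAN-style model hallucinates plausible fine detail learned from training data – which is why its output looks sharp rather than blocky.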

To see super-resolution in action, check out this video

Credits: YouTuber Denis Shiryaev used AI-based tools to upscale a low-resolution short film from 1896 into a detailed 4K, 60 fps version.

#2. Using deep fakes to localize content

The growth of OTT platforms has boosted the distribution of content all over the world. Requirements for localization and regionalization of content are now greater than ever.

Some of the most common tasks in localization are dubbing and content moderation. However, disparities between the audio and the character’s lip movement during speech can create a less than ideal experience. Deep fake technology can help make these differences disappear.

The technology superimposes a source voice onto a target face with a perfect lip-sync, achieved through face synthesis. Even if the source audio is different, it can still be matched with the artist’s original voice, achieved through voice cloning.

A recent AI-enabled campaign by Malaria Must Die featured David Beckham speaking in 9 different languages to generate awareness for the cause.

Video Credits: Malaria Must Die (YouTube)

#3. Deep fakes for moderating content

To distribute content, adhering to content norms and compliance is mandatory. These moderation rules are highly contextual: they vary internationally and depend on the viewing platform and culture.

Abrupt masking of video or audio segments compromises the viewer experience. Deepfakes can edit the parts of a video containing profanity and expletives, substituting similar-sounding words or appropriate replacements without compromising the narrative.
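As a trivial text-level analogue of this idea, the sketch below swaps flagged words in a transcript for similar-sounding substitutes. The word table is invented for illustration, and real deep-fake moderation operates on the audio and the speaker's lip movement, not just the transcript:

```python
# Hypothetical substitution table: each flagged word maps to a
# similar-sounding, compliant replacement (entries are invented).
REPLACEMENTS = {
    "darn": "dang",
    "shoot": "shucks",
}

def moderate(transcript, table):
    # Replace only the flagged words, leaving the narrative intact -
    # the text-level counterpart of re-synthesizing a word of audio
    # and lip movement instead of bleeping the whole segment.
    return " ".join(table.get(word, word) for word in transcript.split())

print(moderate("well darn it all", REPLACEMENTS))  # well dang it all
```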

#4. AI-augmented VFX

Earlier, many effort-intensive and repetitive tasks in VFX workflows – such as match move, tracking, rotoscoping and animation – were outsourced to foreign studios. But as deep learning has progressed, we are now looking at a massive reduction in time and effort across the content pipeline.

A famous example is the creation of the nine-foot Titan from Avengers: Endgame, where AI was used to create a wide range of emotional shots that closely resemble the actor’s performance.

Video Credits: SlashFilms

Here, artificial intelligence learns the source face from different angles and transposes it onto a target (here, Thanos) as if it were a mask. Without AI, Marvel’s VFX division would be looking at some extremely tedious work.

#5. Film preservation & colourization:

This is something that would excite film patrons and cinephiles. AI-enabled colourization is the process of adding realistic colour to monochrome videos.

AI’s ability to colourize black-and-white cinema in the hues of today has long been a much-discussed prospect. When done manually, the process involves colourizing individual video frames and collating them together. Estimating the right colour of a scene from its grayscale frame is often challenging.

AI-based colourization techniques (such as the GANs behind DeOldify) streamline this process by propagating colours between similar pixels within a frame.
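A toy analogue of "propagating colours between similar pixels": assign each grayscale pixel the colour of the reference shade it is closest to. The seed colours below are invented, and DeOldify learns this mapping from data rather than using a lookup table:

```python
# Invented reference shades: a gray level mapped to an RGB colour.
SEEDS = {
    30:  (20, 40, 90),     # dark pixels -> bluish shadow
    200: (240, 220, 150),  # bright pixels -> warm sand
}

def colourize(frame):
    """For each grayscale pixel, pick the colour of the seed whose
    gray level is nearest - a crude stand-in for how a learned model
    assigns similar colours to similar regions."""
    out = []
    for row in frame:
        out.append([SEEDS[min(SEEDS, key=lambda g: abs(g - px))]
                    for px in row])
    return out

gray = [[25, 210],
        [180, 40]]
print(colourize(gray)[0])  # [(20, 40, 90), (240, 220, 150)]
```

A real colourizer replaces the fixed table with a network that has seen millions of colour frames, which is why it can guess that grass is green and skies are blue from texture alone.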

Credits: Jason Antic (Youtube)

Most importantly, the deterioration of and damage to vintage film reels and content from yesteryear call for restoration efforts, and AI is a potent tool for that as well.

What does it mean for creators?


Now, this brings us to some difficult questions. 

What does generative AI mean for creators/creative professionals? 

Does it make us any lesser than an algorithm?

Let me explain this with an analogy. The invention of photography in 1839 radically changed the art of that time. The technology changed the mandate and liberated art from its ties to realism.

Once creating art that replicated reality became easy, other forms like cubism developed. Creativity transformed from copying into a higher level of expression.

Technology raises the creative bar by making the process of creation easier. Such is the case with AI.

Here, AI is prompting creators to experiment and to push their own boundaries.

While the techniques mentioned above can remove the drudge work involved in content creation, there would always be a human, preferably a creative one, in the loop.

It is hard to imagine the algorithm becoming a blocker in a creative pursuit. The idea here is not to replace creativity but to Make Creativity Easy.

The question remains… to embrace or to wait?

Early investment in Generative AI is an excellent opportunity for media and entertainment businesses to tune these technologies to their use cases. Early adoption in creative workflows means two things:

  • The ability to work at higher creative levels and facilitate creative experimentation.
  • The ability to create and enhance content with inexpensive algorithms.

To conclude, these algorithms are here to stay and will make us push our own boundaries for creativity.

A ren[AI]ssance awaits.

For more pieces around Media & Artificial intelligence, follow us at AthenasOwl and follow #OwlThoughts #AthenasOwl

Learn how AthenasOwl is helping Media & Entertainment businesses localize their content via deep fakes.

Contributor: Sankalp Chaudhary