The TED AI Show: How AI will transform dubbing in Hollywood w/ Scott Mann
TED Tech · October 01, 2024 · 40:44 · 74.51 MB

If you've ever cringed at a poorly-dubbed film, you are not alone. That's why Scott Mann founded Flawless, a company that’s transforming the world of dubbing using AI. He talks with Bilawal about why good dubbing is essential for movie making and shares the mind-blowing technology that not only lets Robert De Niro speak perfect Spanish, but radically changes how films might get made. The two also discuss what to keep in mind as creativity, industry, and AI technology continue to intertwine -- and what we need to do to protect artists' rights.

Learn more about our flagship conference happening this April at attend.ted.com/podcast


Hosted on Acast. See acast.com/privacy for more information.


[00:00:01] TED Audio Collective.

[00:00:09] When you decide to watch a movie or TV show in a language you don't understand, you essentially

[00:00:13] have two options – subs or dubs.

[00:00:17] And neither option is ideal.

[00:00:20] Subtitles divert your gaze away from the action and can lead you to missing critical

[00:00:24] plot points.

[00:00:25] Meanwhile watching dubbed movies is even worse.

[00:00:29] The poor translation and the distractingly bad lip sync just pulls you out of the action.

[00:00:34] As anyone who's seen an old kung fu movie or spaghetti western knows, the most distracting

[00:00:40] aspect of traditional dubbing is the really bad lip sync.

[00:00:44] And honestly, both of these issues are interconnected.

[00:00:48] I mean think about it – accurate translations that take into account sentence structure,

[00:00:52] cultural context and slang are already a tall order.

[00:00:56] But trying to fit that translation into the actor's original mouth movements?

[00:01:01] Best of luck.

[00:01:02] So is this a problem that AI could help solve?

[00:01:07] Pretty soon you could watch your favorite K-drama stars speaking perfect English.

[00:01:11] And Tom Cruise could sound like he's fluent in any Korean dialect.

[00:01:15] But dubbing is just scratching the surface.

[00:01:18] How might this technology change the very art of filmmaking?

[00:01:22] What does it mean for the vast labor pool involved?

[00:01:25] And what kind of impact is this going to have on our viewing experience?

[00:01:33] I'm Bilawal Sidhu and this is The TED AI Show, where we figure out how to live and thrive in a world

[00:01:39] where AI is changing everything.

[00:01:47] Imagine this.

[00:01:48] In 2030, the CFO of a Fortune 100 company is a bot.

[00:01:52] I'm Paul Michaelman and on Imagine This will be exploring possible futures

[00:01:56] and the implications they hold for organizations.

[00:01:59] Joining me will be BCG's top experts as well as my co-host Gene, BCG's conversational

[00:02:05] Gen AI agent.

[00:02:06] Blending human creativity with AI innovation, this podcast promises an unmatched listening journey.

[00:02:13] Join us on Imagine This from BCG.

[00:02:17] Hi listeners, it's Sherrell, your host of TED Tech.

[00:02:19] I want to share a podcast with you.

[00:02:21] I think you'll love: Me, Myself, and AI is a podcast featuring AI leaders from organizations like NASA,

[00:02:27] Upwork, GitHub and Meta, who explore with expert hosts what success looks like with

[00:02:32] artificial intelligence,

[00:02:33] and what challenges and ethical considerations they face along the way.

[00:02:37] Whether you're leading a strategic technology function or are simply curious about what's

[00:02:41] behind the hype of AI, Me, Myself, and AI delivers actionable insights while sharing the

[00:02:46] back stories of the people who make this technology work.

[00:02:49] Listen to Me, Myself, and AI wherever you stream podcasts.

[00:02:56] Computer graphics have been changing the way it's possible to tell a story since the 1970s.

[00:03:02] At the SIGGRAPH conference in 1986, a new studio called Pixar unveiled one of the first 3D computer

[00:03:09] animated films starring two lamps you most definitely recognize.

[00:03:15] Attendees of the 1993 SIGGRAPH were some of the first to see groundbreaking CGI of a little

[00:03:21] old movie about dinosaurs.

[00:03:25] Jump to 2018 at SIGGRAPH, when a paper called Deep Video Portraits used generative AI

[00:03:31] to show how we might be able to use computer graphics to manipulate human facial expressions.

[00:03:38] Tech that could make anyone say anything with extremely convincing mouth movements, popularly

[00:03:44] known as deepfakes.
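At a very high level, the reenactment idea behind work like Deep Video Portraits is: estimate a compact set of expression parameters from a source performance, then render the target person driven by those parameters while keeping the target's identity. The toy sketch below illustrates only that data flow; the `FaceFrame` structure and the parameter names are invented for this example, and the real system uses a monocular face tracker plus a neural renderer, not a dataclass.

```python
# Toy illustration of face reenactment data flow (NOT the actual
# Deep Video Portraits pipeline): transfer per-frame expression
# parameters from a source actor onto a target identity.
from dataclasses import dataclass


@dataclass
class FaceFrame:
    identity: str     # whose face this frame shows
    expression: tuple  # e.g. (jaw_open, lip_stretch, brow_raise), all invented

def extract_expression(frame: FaceFrame) -> tuple:
    """Stand-in for a face tracker estimating expression parameters."""
    return frame.expression

def reenact(target_identity: str, source_frames: list) -> list:
    """Drive the target identity with the source's per-frame expressions."""
    return [FaceFrame(target_identity, extract_expression(f)) for f in source_frames]

source = [FaceFrame("actor_A", (0.8, 0.1, 0.3)),
          FaceFrame("actor_A", (0.2, 0.6, 0.3))]
out = reenact("actor_B", source)
# Every output frame keeps actor_B's identity but carries actor_A's expressions.
```

The point of the sketch is the separation of identity from expression: once those are disentangled, "making anyone say anything" is a matter of swapping the driving expression track.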

[00:03:47] Hollywood filmmaker Scott Mann came across this paper at that conference and immediately began

[00:03:52] seeing its potential to profoundly change how movies are made.

[00:03:58] Today Scott is the founder of Flawless, a company that specializes in what he calls

[00:04:03] vubbing, and later this year he will release the first fully AI dubbed, or vubbed, feature

[00:04:09] length movie.

[00:04:12] Scott, welcome to the show.

[00:04:13] Thank you for coming.

[00:04:14] So let's go back to 2018.

[00:04:17] You've cited coming across the Deep Video Portraits paper at the SIGGRAPH conference as one

[00:04:22] of the main reasons you got into the AI field.

[00:04:25] Can you explain for the audience why it was so significant for you?

[00:04:28] Yes, it was definitely a very kind of life-changing moment, seeing that paper.

[00:04:32] I'll set some context maybe, because, I think, where I was coming from with this is: I've

[00:04:37] been a filmmaker, a writer-director, my whole life.

[00:04:40] I've pursued that with passion, wanting to express something, share an experience, tell

[00:04:45] a story.

[00:04:46] Like many artists, I think we're all striving for

[00:04:50] similar things, but for probably the decade of working professionally in the film industry

[00:04:55] I'd really experienced the real kind of hurdles and problems of the industry.

[00:05:00] In that it is so expensive and so difficult to make a film right, it's hundreds of people

[00:05:06] it costs a huge amount of money.

[00:05:09] You've got an army of people gathering together to make a film.

[00:05:12] I love the process.

[00:05:13] I love shooting.

[00:05:13] I love the work but it's so expensive that you kind of get boxed out, and originality kind

[00:05:20] of gets booted out.

[00:05:21] You can't afford originality in the right way, so you end up with all these compromises

[00:05:25] to whether it's the film you're writing to make or the film that you make itself and

[00:05:29] then that was kind of further exacerbated by the other existential problem that I

[00:05:33] kind of recognized, which was when I saw one of my movies, a film called Heist that I'd

[00:05:37] done. I had this incredible cast including Robert De Niro.

[00:05:41] We'd literally crafted this from the core together, the character, the dialogue, the

[00:05:46] nuance, so I had the privilege of working with these great artists and we really kind

[00:05:51] of crafted the film into something that we're very proud of.

[00:05:55] And then when I saw a foreign dub of the movie, I was devastated, really, because that

[00:06:01] was when I first saw what happens when a film, in its traditional kind of dubbing process, goes out

[00:06:06] to wider audiences beyond its home language: the fix for that is the hundred-year-old fix, which

[00:06:11] is rewriting the dialogue to try and fit the wrong mouth movements,

[00:06:13] performed differently to try and get an idea of synchronicity, and it really kind of ruins

[00:06:18] the cherished baby that you've made in the first place. So you've got all this trouble,

[00:06:22] all this difficulty of trying to make a movie, and you're principally making it for

[00:06:26] one language, and both of those things kind of combine into this problem: you don't

[00:06:31] have enough audience reach, and the cost and the time it takes to make a movie are so high.

[00:06:36] That was the problem that was kind of eating the industry as I

[00:06:41] entered it, but when I saw the SIGGRAPH paper, Deep Video Portraits, it kind of blew my mind, because

[00:06:48] it fundamentally was going to transform what was possible in the filmmaking process.

[00:06:53] I think seeing human renderings at such a level that you were able to change things and edit

[00:06:59] things in a way that just wasn't possible previous to that moment, and for me, I think we'll

[00:07:04] probably look back at that particular paper, which was like a headline paper of SIGGRAPH,

[00:07:07] as one of the first instances of what people might see as today's generative AI, right, one of the

[00:07:12] first times that there's been a rendering of something that looks so human, and yeah,

[00:07:18] it was a crazy moment seeing it and thinking wow that is going to completely change how I

[00:07:23] make films and how we might distribute films and the experience that you can then deliver to an audience.

[00:07:29] I mean it's interesting, like, the visual effects world has been doing digital humans, and it usually

[00:07:33] involved, as you said, you know, meticulously 3D scanning people and sort of using the classical

[00:07:38] computer graphics pipeline. And you're totally right,

[00:07:43] Deep Video Portraits was this like first instantiation of generative AI that looked so photorealistic

[00:07:49] it like passed that visual Turing test. I'm curious, when you came across that, like how quickly

[00:07:54] did your mind go to AI dubbing for film and video? I went there straight away, because the

[00:08:01] dubbing problem was one I'd tried to solve with all the traditional pipelines before, so I kind of

[00:08:07] recognized that problem back, I'd say, in 2014 when I did Heist, and I'd always thought there's got to

[00:08:13] be a better way of doing this. So typically how it works is a director like myself right we'll

[00:08:18] finish a film, we'll deliver it to whichever studio or owner who is representing it, and then

[00:08:24] from that point it's handed off to localization houses around the world. Those are networks

[00:08:31] of studios around the world that typically house these experts who have expertise in adaptation,

[00:08:37] multilingual expertise in the adaptation of rewriting the script to try and fit it into the

[00:08:42] mouth movements whilst also capturing the cultural nuance and difference of expression which is

[00:08:47] kind of impossible, to be fair. Like, you can be the greatest, you know, there are some very

[00:08:53] talented people in this field, but it is so difficult, and those constraints, they're kind of

[00:08:59] making authentic translation impossible really, by and large, and that's why you're kind of wrestling

[00:09:04] between the lack of immersion and the kind of stringentness of not having someone's expression

[00:09:10] in sync, which has another problem that I think maybe people are not fully aware of:

[00:09:15] when you experience a movie and when it's in sync, it's kind of, the magic of movies

[00:09:23] all came from this, like, it comes alive when it's in sync, right, you kind of get pulled in

[00:09:28] and you're immersed into it, and when you watch a movie that's in sync, where someone's expressing

[00:09:33] through their face, you're then pulled into, like, a human connection where you're looking at

[00:09:38] their eyes, the mouth, you're reading them like we do as human beings, like you and me talking

[00:09:42] now. As soon as it detaches and it's out of sync, you don't get pulled into the same level

[00:09:47] of immersion. It's almost like a voiceover, you're looking at the film, you're watching it in a different

[00:09:52] way, and the experience is changed. So what Scott said about things not syncing up totally

[00:09:59] resonates with me I mean even if you're in like a zoom or video chat conference call right

[00:10:04] when the audio slightly lags behind the video I am totally distracted but what about subtitles

[00:10:10] we're pretty used to subtitles by now. And then if you've just got subtitles, you're actually

[00:10:15] looking at the bottom of the frame, and when you make a movie, like when you edit a

[00:10:20] movie in the editorial process, I'll direct a movie and edit a movie working out kind of where

[00:10:25] the audience's attention is within the frame, so I'll take a lot of care, for example, that

[00:10:30] if someone pulls a gun at this end of the frame when we frame it up, between myself, the

[00:10:34] cinematographer, the editor, then the cut-to might be an eyeball reaction. Now if the eyeball reaction

[00:10:39] is down here in the bottom right of the frame instead of where you'd think, the audience doesn't

[00:10:43] have time to travel across, and the experience is different between the shots, and so as a filmmaker

[00:10:48] you're kind of trying to orchestrate an attention, kind of, movement and a rhythm that is

[00:10:54] important to the experience, which subtitles just kind of knock that out

[00:10:59] completely. And meanwhile, like, it sounds like, you know, creatives such as yourself have to choose

[00:11:04] between a rock and a hard place: either you have completely unsynced dialogue, which

[00:11:09] totally is jarring and throws off a viewer, or you're looking at subtitles, which means

[00:11:13] you're not able to guide the gaze of the viewer like you intended. And that's why I did

[00:11:18] kind of road test a few pipelines myself with traditional VFX work, and again, it just wasn't

[00:11:24] going to scale, but more to the point, it wasn't, like you said, it didn't pass the test, it didn't

[00:11:28] pass the sniff test, it was just not going to work that way. So I think the click for me when

[00:11:33] I saw the SIGGRAPH paper, Deep Video Portraits, was: oh my god, this is a way to solve that problem,

[00:11:38] right, that is going to solve that issue. And the possibilities that my mind went to, I just

[00:11:43] remember getting goosebumps and being so excited and it was the first time in my industry

[00:11:49] where I'd felt hope again where I felt actually wow this is going to enable us to

[00:11:56] to make things that I've dreamt of making, and I ran after it like a crazy

[00:12:00] person. I thought, this is how you can really make an impact, you know? I mean, like, me doing one film

[00:12:06] is kind of one way, but actually enabling everyone with this stuff, it's a very exciting idea.

[00:12:12] It sounds like you came across this amazing piece of research, you identified a pretty cool

[00:12:17] problem and then set out to fix it so how did you go from seeing this tech at a computer graphics

[00:12:23] conference to unveiling your own technology and a product company around it just a couple years later

[00:12:29] Well, I think the first thing was really getting to understand it a bit better. The

[00:12:36] SIGGRAPH paper, Deep Video Portraits, I was stunned when I saw it, but I didn't understand it, and

[00:12:41] that kind of set me off on a journey to try and understand it. Honestly, you know, I reached

[00:12:45] out to the authors of that paper, Hyeongwoo Kim, Pablo Garrido and Christian Theobalt of the

[00:12:50] Max Planck Institute. Definitely science and academia was not my area, but I reached out to those guys:

[00:12:56] oh my god this is so exciting I'm a filmmaker I would love to find out more and then it kind of

[00:13:01] led to founding the company with those very same scientists, who now, you know, are still at the heart

[00:13:06] of Flawless today, leading the science group. Christian, Hyeongwoo and Pablo are all kind of filmmakers

[00:13:12] at heart, they have a passion for film, and when I spoke to them that kind of came across,

[00:13:17] and when I met with them, I started to understand, when I look back now, in a very rudimentary

[00:13:22] way, how these things were working and what the blockers were from their side, because I

[00:13:28] kind of pitched them what I envisaged it might help with, like a dubbing pipeline.

[00:13:33] Pablo in particular, as a, you know, Chilean with a German mother, he'd experienced lots of

[00:13:37] different kind of language versions of films, and then that work was kind of handed over

[00:13:42] to Hyeongwoo Kim, who brought in the neural rendering aspect to it, and they kind of jointly brought this

[00:13:48] out together, and so they'd kind of worked toward that end already to some degree, but they had

[00:13:53] clear problems. I remember Christian Theobalt in particular was saying there were clear issues

[00:13:58] that they couldn't solve, that meant that the pipeline wasn't going to work for that purpose,

[00:14:02] and what was quite fascinating is, the problems that they couldn't solve, to me, were really easy

[00:14:07] problems that I knew how to fix in the effects pipeline. So in terms of occlusions and stabilizing

[00:14:13] videos and these kind of things that I was used to so the journey that we went on was why don't

[00:14:19] we propose a new type of pipeline it sounds like you're working with these talented computer vision

[00:14:25] researchers, and they're like, well, we can't do this in a fully automated fashion, and you're like,

[00:14:29] hold up, we know how to do this. Yeah, there was a thing called visual effects. Yeah, that's true. It's like,

[00:14:35] I think, you know, because you come at it from two different, you've got a crossed domain going on, right:

[00:14:39] the fundamentals of scientifically solving, you know, foundational problems are different

[00:14:45] from maybe kind of some of the traditional processes in the effects workflow, but I think

[00:14:50] I've noticed scientists and filmmakers share the common bond of creativity,

[00:14:54] that creative imagination, that creative problem solving, that both sides are so passionate about,

[00:14:59] so even though you have two very different demands there is a real intersection

[00:15:03] crossing down the center of that, and so I think, you know, my suggestions kind of got

[00:15:08] them excited, and then that folded into the work, and so

[00:15:13] we kind of worked through it together. I reached out to the folks at Lionsgate,

[00:15:18] who had made Heist with me, and I thought, let's test this properly, let's test it on the problem

[00:15:23] that caused me to get into this in the first place. So I got footage of Robert De Niro from my film

[00:15:28] through Lionsgate and used that to test it at a level where it's in a film, it's doing these

[00:15:34] challenges, and that became the kind of test bed and the kind of inspiration of what ultimately

[00:15:39] was built out, but we had to go through that process and kind of really work a real-world

[00:15:44] POC to kind of prove out that it was working, and that was the very first thing that was ever, as we

[00:15:48] would now call it, vubbed, or visually translated: it was a clip of Robert De Niro doing his scene

[00:15:54] from Heist, so it was quite a cool one to do it on. Now, on this journey from making movies to

[00:16:01] starting a software company you had some help along the way tell us about your co-founder and co-CEO

[00:16:08] because that is a very unconventional setup. Nick Lynes is the co-founder and co-CEO of Flawless

[00:16:14] alongside me, and I remember just his reaction when he saw it, he was like, holy shit, if this is real,

[00:16:23] this is going to change everything, and he had a similar kind of scale of reaction,

[00:16:27] and what was great is Nick knew what to do with it. He didn't look at it like one film he looked at

[00:16:34] it as a way of serving everyone on a global level, in a scaled-up way, which is a very different

[00:16:39] kind of mindset to approach the problem from and how to get this into software product

[00:16:45] development, he'd been through all that before on various different journeys, and so literally within

[00:16:50] the course of five days we shook hands and said, let's just do this together, let's split it right down

[00:16:56] the middle, let's build this up and actually do something with it. So that's very cool, you

[00:17:01] know, you've got this amazing interdisciplinary team. Now, several years later, how do you describe

[00:17:07] what Flawless does? How does this tech work, walk us through the process? Basically, how this is working

[00:17:14] is, it's allowing for perfectly lip-synced versions of films in every language, and it does

[00:17:23] that by understanding the kind of personal nuances of a performing actor's face, and also

[00:17:31] understanding the nuances of a recording actor's dialogue, of a foreign-language dialogue

[00:17:38] track, let's say, and it's able to map the mouth movements so that they are genuine to that person, but

[00:17:45] they are making the performance sounds of the new dialogue, so it syncs those things up

[00:17:50] and becomes a very kind of authentic version of a movie. And we've built this out as software

[00:17:56] principally that plugs into major localisation houses and really we've built the tools for

[00:18:03] localisation for the actors and for the artists that are here and really what it opens out is a

[00:18:10] much better way to not compromise on the vision. "Your story, their language" is kind of the

[00:18:16] tagline that's there for TrueSync, and it's creating an authentic version; it's really

[00:18:23] as if the actors performed in that language on set, it's that authentic, so it

[00:18:28] is just as immersive as the original. That's ultimately the output. So does your tool need to

[00:18:34] train on the actor's performance in that particular movie, or is it looking at other data sets

[00:18:38] to be able to deliver? In the film industry, or anywhere you engineer a transactional

[00:18:44] commercial product, you have to be able to have a very clear understanding of the rights of what's

[00:18:49] created or what's modified, and especially in the film industry, which is built on

[00:18:53] an economy of rights, you have to be able to essentially grant rights to a studio or to

[00:18:58] whoever it is, so that they can then take it and it can be distributed. So what that comes back to

[00:19:04] really is part of a clean data strategy. It ties into the technology somewhat as well, in that with

[00:19:12] TrueSync, for example, it's looking at the finished film, so the finished film from start to end.

[00:19:17] What it's essentially doing is analyzing, using a lot of deep learning processes, it's analyzing

[00:19:23] and labeling that film. The simplest way to understand it is, it's almost like doing a light

[00:19:27] stage scan of the actors but just by looking at the film itself so it has this really deep

[00:19:33] intricate understanding of the actors, how they've performed, you know, how they're saying

[00:19:37] certain words, how they've performed in the scene, and it's able to take that, and then when you run

[00:19:44] new dialogue through there, it's essentially kind of interjecting and mapping that, and using that

[00:19:48] map, but the new mouth movements that it creates are all based on the mouth movements

[00:19:55] of the actor in the film itself. And then underneath it all you have elements of

[00:20:01] generative manipulation, let's call it, that all need to be trained on clean data that's kind of

[00:20:08] all been curated, shot, or licensed very specifically for, say, adjusting mouth movements.
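The constraint Scott describes, that new mouth movements are drawn only from what the actor did in this film (or from separately licensed data), can be sketched as a lookup problem: label the finished film, then serve each sound in the new-language dialogue from the actor's own catalogue. Everything below (the viseme labels, the catalogue shape) is invented for illustration and is not Flawless's actual TrueSync internals.

```python
# Toy sketch of licensed-data-only mouth mapping: every generated mouth
# shape must trace back to a frame of this actor in this film.

def build_mouth_catalogue(analyzed_frames):
    """Label the finished film: map each viseme to the frames where
    this actor actually produced it."""
    catalogue = {}
    for frame_id, viseme in analyzed_frames:
        catalogue.setdefault(viseme, []).append(frame_id)
    return catalogue

def retime_dialogue(new_dialogue_visemes, catalogue):
    """For each viseme in the new-language dialogue, reuse one of the
    actor's own mouth shapes; fail loudly rather than reach for
    outside (unlicensed) footage."""
    plan = []
    for viseme in new_dialogue_visemes:
        if viseme not in catalogue:
            raise ValueError(f"no licensed source frames for viseme {viseme!r}")
        plan.append(catalogue[viseme][0])  # simplest choice: first matching frame
    return plan

film = [(0, "AA"), (1, "M"), (2, "EE"), (3, "M")]  # (frame_id, viseme) labels
catalogue = build_mouth_catalogue(film)
print(retime_dialogue(["M", "AA", "EE"], catalogue))  # → [1, 0, 2]
```

The `ValueError` branch is the point: a rights-clean pipeline refuses to synthesize from material it cannot account for, which is what makes the output something a studio can actually license and distribute.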

[00:20:14] So, for example, let's say it's a film with

[00:20:19] Robert De Niro: we don't go back and find other films of Robert De Niro, because again, that

[00:20:23] would be breaking rights, unless we had the explicit right to do that; that would be a non-scalable,

[00:20:28] kind of inapplicable way. But we do have what we call narrow models, which are very deep but

[00:20:34] very focused on very particular things. So lip synchronicity is a great one to kind of highlight,

[00:20:39] where I think we have over 50 million cinematic-level frames that we've

[00:20:46] trained on, that we've shot for purpose, actually shot in-house as a production house, and

[00:20:51] really just kind of made this the right way, where, as I say, we've got clean data

[00:20:56] underpinnings to every stage of the process, so that you're able to say, okay, well, here's

[00:21:00] the film, that's where the kind of copyright layer is, and then the modification came from here, and

[00:21:04] those are the underpinnings of that training data set, and that's how you're able to use it and you're

[00:21:09] able to sell it. Like, a big problem with practically all of the AI tools out there at the moment

[00:21:13] is there is no way to really transact with them, because it's all really from stolen, scraped data.

[00:21:19] And, yeah, you know, what's the provenance of this? Well, yeah, it really is, it's really bad, and I

[00:21:30] think that, you know, there is a huge difference between a research data set and a commercial

[00:21:34] product, right, and what's happened is a lot of folks who've worked in the research

[00:21:39] area, which can be tremendously exciting research, don't get me wrong, but working in the research

[00:21:45] area, have then put out products that are based on the research data sets, with no right

[00:21:49] to do that, and that's what you're kind of seeing the reckoning of, I'd say, right now,

[00:21:55] that's what's happened. So, you know, it's taken us many years to kind of do this right, it's taken us

[00:22:00] really a long time to build this up from the roots to do it in a clean way, and that's

[00:22:04] why we kind of built the Artistic Rights Treasury, really; that's the mechanism that allows

[00:22:09] that to happen. That's very cool. So your tech has basically two pieces: one is where you can analyze

[00:22:15] the actor's performance in a movie and sort of, like, encapsulate that, you know, as, let's just call it,

[00:22:21] a model, and then you've got another data set that lets you, you know, kind of pull on for figuring out how to

[00:22:26] re-animate that performance, where you do own all the rights, and with those things combined you get

[00:22:26] this superpower but it's not just dubbing right you've got this tool that lets you transfer the

[00:22:31] performance in one shot of an actor to a different shot how does that help filmmakers because this

[00:22:37] idea of transferring performances versus creating them from scratch seems very powerful. Yeah, absolutely,

[00:22:43] it's kind of elemental editing, right? You're able to perfect the performance, you're able to

[00:22:48] edit things in a way that you weren't able to edit them before, and for me it's

[00:22:54] the most exciting part of this, right, because it changes the way you shoot a movie,

[00:22:59] like when you shoot a film, when I shoot, a lot of time is spent in repetition, for example,

[00:23:03] right, where you will shoot a scene and then you'll turn the camera to different folks, and

[00:23:09] part of the difficulty of doing a film is that once you've kind of done a performance well,

[00:23:13] you're repeating it and trying to keep the continuity of it, and it becomes a very kind of,

[00:23:18] like, mechanical, difficult process that's not terribly creative for the folks on set, and

[00:23:23] and as we kind of cover scenes we've got all this wastage of this repetitive nature being able

[00:23:29] now to take a performance, they nailed it, that's exactly the performance we all want, we love it,

[00:23:34] and we're able to copy and paste that into a different shot or sometimes even a different scene,

[00:23:39] it really frees up that creative process and knowing that ability to be able to do that changes

[00:23:45] the way you shoot and it also offers a number of things in the post-production phase of

[00:23:50] movies and television in particular where you don't realize kind of what you have until you see

[00:23:55] the finished outcome, and that's when you realize, oh, we need to change that line, and typically

[00:23:58] you can only do that by going back to set and re-shooting, but being able to do it, whether it's just

[00:24:03] having the actor say the line and then kind of see it or take it from another place it's this new

[00:24:08] era of kind of agile filmmaking, I'd say, and there's also a lot of other things kind of coming

[00:24:13] down the pipe that feed into that as well, and that go beyond the performance, and I think

[00:24:19] we've just had a very kind of clear focus on the fundamental things that can make a real difference

[00:24:25] as tools for artists and filmmakers, and that's where it's coming from. So it gets me excited,

[00:24:29] the possibilities, and it just frees you up to create. This seems

[00:24:36] like such an amazing tool in the toolkit for creatives. Have you done, like, actual A/B tests of

[00:24:46] taking the old way of dubbing something and this new way of dubbing something? They are being

[00:24:52] done now, so there's a bunch going on right now with different studios that we're working with, but

[00:24:58] we've done testing, of course, where we've seen audience reactions and audience tests.

[00:25:03] Part of the difficulty on this is, to get a true read across it all, you can only see a thing

[00:25:08] once, that's part of your problem, but I think in terms of the advancement over subbing and dubbing,

[00:25:13] it's definitely a multiplier, a completely different level of experience. As these films come out,

[00:25:18] you know, there's a film screening out there this year, there are going to be things on platforms

[00:25:22] at the end of this year, and I think we're going to start seeing that, and whether people

[00:25:26] want to see upfront that what you're about to experience has had this done to it, that actually kind of

[00:25:31] might change the experience; like, almost a blind experience is probably the true one that you'd want.

[00:25:37] I think how distributors and how folks want to introduce these things there's a lot of

[00:25:42] kind of commercials as well, you know, we've done things where they've gone out in, like, 35 different

[00:25:47] languages, for example, around the world, and I'm sure everyone who watches just thinks

[00:25:52] it's a local-language commercial, right? But measuring it, measuring, like, from an art form, it's

[00:25:58] difficult to truly measure, because you can't watch something for the first time twice; it's a little bit

[00:26:02] totally yeah how do you make them forget exactly what they saw and then watch it again yeah

[00:26:06] So it feels really exciting that you've got this technology, and as it keeps getting

[00:26:11] democratized, right, it feels like there are almost silos of content sitting around

[00:26:16] that are going to find a whole new audience. I mean, I can only imagine how much latent

[00:26:20] value is sitting in catalogs around the globe. Yeah, absolutely. And what's also going to be

[00:26:24] super interesting, I think, is it's going to expose so much talent from everywhere. You're going to have

[00:26:32] incredible actors, for example, who'll be able to be exposed to a world of audiences, you know.

[00:26:38] And to be extreme, you've got like eight billion people on the planet, and rather than having a few hundred

[00:26:42] thousand per language, you can really kind of be exposed to a lot more people. And that in turn

[00:26:48] kind of changes some of the dynamics of how films are put together and who stars in them. It's

[00:26:53] an opportunity for people. I think the big change we're about to see is going from a local

[00:26:59] stage to a global stage, where filmmakers start thinking about making films for a global audience.

[00:27:07] The knock-on effect is actually quite beautiful, because the old me making a movie,

[00:27:14] I'd be writing and directing a movie with a consideration primarily

[00:27:18] for my own language and culture. I wouldn't be thinking too much about how a choice I make is

[00:27:25] interpreted, read, or impactful in various ways in different cultures. Now, making a film

[00:27:32] that can travel globally and can, let's say, even have a global cast that all speak different languages,

[00:27:37] put together in a film, making a film for a global audience, you then start really considering

[00:27:43] other cultures, and you start walking in the shoes of other people by experiencing their movies.

[00:27:49] And I think one of the big problems we've got in the world is, if we don't understand someone,

[00:27:54] we polarize, right? I think over the next few years, as we break down that language barrier completely

[00:27:59] and we're really able to immerse in and enjoy global cinema, we'll start looking at each other

[00:28:05] differently and start kind of appreciating the human component that lies at the heart of all

[00:28:11] of these stories, which is kind of what everyone's been after, I think. I hear you that at its

[00:28:19] fullest, this technology is a bridge-builder. You've talked about all the positives, right? Like, it

[00:28:23] really does make the world feel smaller and lets you sort of appreciate content the way it was meant to be.

[00:28:28] But there's a dark side to all of this too, and obviously I have to ask you about that. I mean, AI has

[00:28:34] been seen as a job eliminator in almost every industry, right? And so how do you think about these

[00:28:40] concerns of potential job displacement in the dubbing industry due to the rise of this AI technology?

[00:28:47] When I started this journey, I'd say we had two existential threats to the film industry: we had the

[00:28:53] lack of audience reach, and the sheer cost of production and the time it took to make a

[00:28:58] production. There is a third existential threat that's been introduced, which is AI. The biggest

[00:29:04] theft that's ever occurred is occurring right now, with AI stealing artistic works and actually

[00:29:10] hurting the artistic community. So I think there's a kind of broad consensus that it's a huge thing,

[00:29:15] and I think that how you go about that and what you do about it has really been fundamental

[00:29:23] to us with Flawless, because if you're destroying the very industry that you love and you're trying

[00:29:28] to build, it's a pointless endeavor. You've kind of failed your mission at step one. So

[00:29:32] that's really why understanding the value of artistic rights has been a big part of everything

[00:29:39] that we've built out. It's arguably the biggest thing, and an understanding that the value

[00:29:44] really lies with the artistic community. The actual value is in the creatives that bring this stuff

[00:29:51] together. There are these enormously talented pools, whether it's Korean or Indian talent, with

[00:29:57] filmmakers that are at the level of Hollywood, but the stories just don't get the chance to travel.

[00:30:02] We'd rather, particularly in English-speaking territories, typically remake things. You know, you'll

[00:30:07] have an incredible film like The Guilty, for example, right? Great movie, a very tight thriller,

[00:30:12] an incredible film. But instead of people really enjoying that, or being able to enjoy it in the

[00:30:17] way that they would, you know, there's a $34 million remake of that. And if it doesn't have much value,

[00:30:22] you don't get much money or time to do it. So you have a lot of these localization houses having

[00:30:28] to rush through everything, which then further reduces the quality. So let's talk a little

[00:30:33] bit about sort of the art and labor aspects of all of this, right? You're building tools with

[00:30:38] creatives at the center, which really comes through very clearly in your messaging and also all the

[00:30:43] work that you've done. I want to talk about some of the challenges I'm sure you face, right,

[00:30:48] especially since you set up the perfect first-hand experience you had with the botched translation job.

[00:30:54] For many, you know, a significant concern regarding dubbing is sort of like the essence or meaning

[00:30:59] of what is being said is lost in translation. What are some of the challenges in maintaining

[00:31:04] the quality and accuracy of dubbing across so many different languages and, like, kind of capturing

[00:31:08] those cultural nuances? I think, yeah, you're absolutely right. There is a lot

[00:31:15] of nuance when it comes down to it. Obviously, different cultures have different contexts,

[00:31:22] have different ways to express, and that obviously comes out. And really, you know, it is vital

[00:31:30] for a story to kind of connect to an audience. I think the important component we followed

[00:31:36] was that the experts in this have worked in this field for a long time, and so it was very key

[00:31:44] to make this technology additive to them, and to look at the whole localisation process

[00:31:52] with a fresh take: okay, how can we make this obviously more efficient, but really, how do we

[00:31:59] make it so we end up with something as authentic as the original? That's always been the aim. But

[00:32:04] AI is a derivative process, right? It doesn't have a heart, it doesn't feel something, it can't

[00:32:11] come up with a new idea or a surprise. I think all of those things are derived from feeling

[00:32:17] and being, and much more kind of elemental, real things. And if you look at all the great works

[00:32:22] and great stories over the centuries, they've typically gone to a place that is human,

[00:32:27] where kind of stories and ideas emerge. Creativity comes from that place. So I think the

[00:32:34] change in our industry is actually going to be that, even though it looked like creativity was the

[00:32:41] first thing that was being impacted when AI rose to public visibility, even though it felt

[00:32:47] like the creatives were getting wiped out first, I actually think that in the long term that's

[00:32:55] one of the most valuable assets, because that's the one thing AI can't replace. Whereas actually,

[00:33:00] non-creative processes are the ones that AI is really good at, right? The non-creative

[00:33:06] components are where I think you'll see a change, and especially heavy, laborious non-creative

[00:33:12] work is where it's going to impact the most. From our perspective as a company, it's the reason why

[00:33:18] we've made tools for artists, right? So if you're making a tool that is impacting, let's say,

[00:33:25] dubbing actors' work, you make it for them and give them the benefit. And if you look at AI's

[00:33:31] influence on industry, the conversation people grapple with is, how do you

[00:33:37] transfer the benefit to the right people, and how does that work? And I think the framework

[00:33:43] of filmmaking strangely offers a similar framework that can benefit the artists. So

[00:33:48] if you're using, let's say, an artist for something, or enabling something they hadn't had before,

[00:33:54] they should be the ones compensated, they should be the ones that get the benefit of that. And for

[00:33:59] the filmmaker, let's say you're getting it faster, you're getting it tweaked to the way that

[00:34:04] your directing wants it. But you should be making the tools for those folks, and really,

[00:34:10] given that their work is the underlying work that is powering those models, they should be

[00:34:16] the owners of it, right? So the big thing that everyone's trying to pull over our eyes at

[00:34:20] the moment is like, oh yeah, well, an AI company that looks very valuable... a lot of them

[00:34:26] look very valuable just because of all the stuff they stole, and that's hidden, and then they're

[00:34:29] spewing out kind of similar-looking stuff. And really the value of those companies lies

[00:34:35] in the underlying training data, it's not really the algorithm. Yeah, it's like, hey, you just

[00:34:42] distilled all human knowledge and creativity, but without the human knowledge and creativity there

[00:34:47] would be nothing to distill. Exactly. And if you're just scraping it, right, if you're just

[00:34:51] scraping books or art or videos and just stealing it and then spinning it back out again, it's kind of like,

[00:34:56] well, that's just theft, you know. With copyright, for example, only humans should have copyright,

[00:35:01] right? Companies shouldn't have it. Creative copyright should hold with a human, because

[00:35:07] as long as it holds with a human, that allows us to kind of expand the industry in a way

[00:35:12] that grows the economy of the artistic industry rather than destroying it. I'm viewing this as sort of,

[00:35:18] you know, to your point, what's ethical isn't always legal and what's legal isn't always

[00:35:24] ethical, and just because you can doesn't mean you should. And I think a great example of that,

[00:35:28] that I'd love you to talk about briefly, is, you know, AI voice synthesis. The ability to just generate

[00:35:33] voices has gotten really, really good, but it seems like you all don't use it right now. You actually

[00:35:38] prefer to have, like, actual talent do it. Is it the feeling point that you

[00:35:44] brought up earlier? Yeah, I think there's two reasons, I would say. Yes, there are very

[00:35:49] great advancements in that area. But a bit to the point earlier, in terms of understanding kind of

[00:35:57] cultural context, language, performance and emotion, and what you want to convey, there is tremendous

[00:36:04] nuance in the way someone can deliver something. Human beings, actors let's say, right,

[00:36:09] dubbing actors, are at such a level better than the very best stuff out there right now, for sure. But then

[00:36:17] the second point I would make is that the lift and benefit of any system that has taken a lot of

[00:36:25] performances and generates a new performance out of it should be benefiting those very actors. So

[00:36:30] if an actor wants to use synthetic voices that have been created from their voice, they should be

[00:36:37] rewarded for that. Actually, a good example would be in Fall, a movie I made, where we had...

[00:36:42] we didn't use synthetics for this, but I'll use it as an example. We did a bunch of, like, about

[00:36:47] 40 line changes, swear words to non-swear words, for our PG-13 version. Now, the actors

[00:36:54] going into a recording booth and recording those words to try and match the original audio and

[00:36:59] replace them and things, that's a lot of work, and it's not the most interesting work, let's say,

[00:37:04] for them. And if they were given a choice where that could be synthetically done, and they were

[00:37:11] rewarded and paid for the use of that, that's a great benefit to them and that's a great benefit

[00:37:15] to the director, to the producer, to everyone else, right? It really helps everyone in that chain.

[00:37:20] But the important bit is they are compensated for it, and it's had their consent, that they're

[00:37:24] happy that that performance is their performance. And it's incredibly simple, actually, when it

[00:37:28] comes down to it, isn't it? Like, just rewarding the very people whose work you use, rather than saying,

[00:37:33] oh, that's mine now. And I think that often gets hand-waved. Yeah, exactly. So,

[00:37:40] but yeah, obviously the courts are grappling with this right now with a number of large

[00:37:45] AI models. I think it's coming out in the wash on the side that it needs to be on, and hopefully

[00:37:49] they'll continue to do that. It's a really important piece of this whole new world we're living in.

[00:37:53] So you've talked about authentic translation, and that's clearly what your company is doing.

[00:37:58] But going full circle back to SIGGRAPH, I'm also seeing your company's latest research is focused on

[00:38:03] creating 3D avatars, like these implicit representations. How far do you think we are from a world where

[00:38:09] we can have these, like, fully generated performances? And maybe you have your Apple Vision Pro headset

[00:38:14] on, and you're literally, like, you know, kind of directing the talent, if you will. And of course, if

[00:38:20] it's a Robert De Niro, he gets a massive cut of the performance there. How far are we from that sci-fi

[00:38:25] holodeck filmmaking? That's really the question. Yeah, I think I would look at it like,

[00:38:32] I don't think we're that far away. But I will also say, again, it depends how it's built, right? And it

[00:38:37] depends who it's for. If I'm in a holodeck and there's a De Niro resource, or if a holodeck

[00:38:42] is a way of communicating or creating something, that's one thing. But I think

[00:38:50] it's about who kind of plugs into that new world, honestly. You know, right now, the

[00:38:54] approach you're taking is sort of this human-plus-machine approach, right? It's like the best

[00:38:58] of both worlds. And I can't help but think, you know, we were talking about rock-solid data

[00:39:04] with rock-solid provenance. I can't help but think you're essentially collecting, in the process

[00:39:09] of doing human-plus-machine work, a ton of this labeled training data that could automate a bunch

[00:39:14] of these tasks. And I'm kind of curious, do you envision a future where more and more of these tasks

[00:39:19] get automated? Would you shift your focus to something else, you know, or is this a case of just

[00:39:25] because you can do it, you shouldn't? I think it always goes back to that question of, like,

[00:39:30] should you? If the benefit goes to the folks who are using it, if the benefit goes to the artists using it,

[00:39:34] then that, I think, is good for everyone. I think if the benefit ends up falling to others,

[00:39:39] then that's where it's in question. If you're offering, let's say, tools for actors,

[00:39:43] those should be in the hands of actors, and that's who should be benefiting from them. Again, I

[00:39:48] think that's the way to make the system flourish. As I said, you know, you do not

[00:39:53] want a world where the creatives are sort of washed out of the process. It should be enabling to the

[00:39:57] creatives, and we should be creating more. Well, I couldn't have said it better myself.

[00:40:02] Scott, thanks for being here. In the rapidly evolving landscape of AI and creativity, we're seeing a

[00:40:12] refreshing approach emerge. Instead of positioning AI as a replacement for human talent, some

[00:40:18] visionaries like Scott Mann are developing tools that keep creatives at the center of the process.

[00:40:24] And this isn't necessarily the most lucrative decision, but one that feels right for the art

[00:40:30] and science of filmmaking: one where we value collaboration over disruption, working with existing

[00:40:36] industries rather than trying to replace them; one with ethical considerations, prioritizing

[00:40:42] consent and fair compensation for creatives, baking that into the business model from the start;

[00:40:48] and finally, one where we create empowering tools, developing AI that turns creative visions

[00:40:54] into reality, from facial avatars to potentially full-body renderings. This isn't about AI versus humans,

[00:41:02] but AI and humans collaborating to push storytelling boundaries together. To me, the future of

[00:41:09] creativity is exciting. As these tools develop, we could be on the brink of a storytelling renaissance,

[00:41:16] limited only by our imagination. The TED AI Show is a part of the TED Audio Collective

[00:41:25] and is produced by TED with Cosmic Standard. Our producers are Ben Montoya and Alex Higgins,

[00:41:31] our editors are Banban Cheng and Alejandra Salazar, our showrunner is Ivana Tucker,

[00:41:37] and our engineer is Aja Pilar Simpson. Our technical director is Jacob Winik,

[00:41:42] and our executive producer is Eliza Smith. Our researcher and fact-checker is Christian Aparta,

[00:41:48] and I'm your host, Bilawal Sidhu. See y'all in the next one.