The TED AI Show: How AI digital doppelgängers could change the way we communicate w/ Synthesia CEO Victor Riparbelli
TED Tech · December 17, 2024 · 50:17 · 69.06 MB


As AI technology advances, it’s becoming harder and harder to distinguish between work done by humans and work done by computers. But is AI becoming more human, or are we becoming more digital? Synthesia is a video platform that uses AI to generate lifelike video avatars, further blurring the lines between humans and their digitized lookalikes. In this episode, Bilawal sits down with Synthesia’s CEO, Victor Riparbelli, to discuss the benefits of having your own AI avatar, how companies are using this tool to improve communication, and why media literacy is more important than ever in a world of ever-thinning lines between real and fake. They dissect the risks that come with making this technology available to the public, the strict rules Synthesia has in place to protect their users, and question the ethics of having a digital clone. Tune in to see if you’ll be sending your own AI avatar to your boring meetings in the near future.


For transcripts for The TED AI Show, visit go.ted.com/TTAIS-transcripts

Learn more about our flagship conference happening this April at attend.ted.com/podcast


Hosted on Acast. See acast.com/privacy for more information.


[00:00:00] Hey, Bilawal here. Before we start the show, I have a quick favor to ask. If you're enjoying the TED AI Show, please take a moment to rate and leave a comment in your podcast app. Which episodes have you loved? And what topics do you want to hear more of? Your feedback helps us shape the show to satisfy your curiosity, bring in amazing guests, and give you the best experience possible.

[00:00:28] If you could jump online and be able to chat with your favorite musician anytime you like, for as long as you'd like, what would that be worth to you? What if you could connect with a personal dating coach, as often as you wished, to help sharpen up your online dating skills? Would that be appealing? Or what if you could make a digital copy of yourself and release your doppelganger to the web to take care of some of your online identity work for you?

[00:00:52] Much of this is actually within reach. Companies are learning to pair AI tech with video, audio, and animation tools to effectively mimic real people and real-ish interactions all at the same time. Musician FKA Twigs, for instance, built a digital clone of herself and uses it to let fans interact with a version of her. The founder of Bumble, the dating app, talked about how the future of dating might begin with digital avatars pre-interviewing each other.

[00:01:19] And that sort of flips the AI argument on its head a little bit, doesn't it?

[00:01:24] We've talked a lot about the potential and risks of AI becoming too human-like, but this is the reverse story. This is about human beings becoming more digital-like to become, in a sense, digital humans.

[00:01:37] If that's something you'd find useful, there's a handful of companies ready to help you create the digital version of you. One of those is called Synthesia.

[00:01:46] Using a short five-minute video you can record with your phone or webcam, you can build a reasonable facsimile of a human being.

[00:01:53] You can then choose a voice, give it a script, get it translated to dozens of languages, add a few design flourishes, and now you can push relatively pro-looking video content to your followers, your employees, whoever.

[00:02:06] No sets, no actors, no sweat.

[00:02:09] Many of Synthesia's clients aren't individual people. They're massive global companies like Heineken, Zoom, Xerox.

[00:02:17] Synthesia says more than 50,000 customers have built digital avatars into their comm strategies.

[00:02:23] In today's demanding market, we as team leaders need to be more than just experts at our jobs.

[00:02:29] This means that we need to be a leader, a coach, and a trainer.

[00:02:32] And we also need to embody the values, mission, and vision of our company.

[00:02:37] That probably sounds to you like a generic, typical computer-generated voice.

[00:02:42] And sure, it is.

[00:02:43] But it's also the voice of a Synthesia avatar that Electrolux, the global appliance company, uses to distribute video modules to help train its workforce.

[00:02:52] Be open and listen. Be transparent and available. Is that difficult?

[00:02:57] The tech is impressive enough that last summer, investors lifted Synthesia's valuation to unicorn status, hitting that vaunted $1 billion valuation.

[00:03:07] It seems like a lot of people are very interested and now very invested in seeing digital humans take off and take over how we communicate with each other now and into the future.

[00:03:17] But in this quest to build lifelike, useful digital avatars of ourselves, are we rewriting our understanding of what communicating human-to-human looks like?

[00:03:27] Who are we in a world that could soon be dominated by digital doppelgangers?

[00:03:35] I'm Bilawal Sidhu, and this is The TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything.

[00:03:51] What does it mean to be human in a world of digital doppelgangers?

[00:03:56] Big, juicy, philosophical question, I know.

[00:03:59] But Victor Riparbelli is one of those real humans who thinks about this a lot.

[00:04:03] He's the co-founder of Synthesia.

[00:04:06] Hey, Victor, welcome to the show.

[00:04:08] Thanks, man. Glad to be here.

[00:04:10] Just to level set this conversation first, we already have so many tools for communication, and you've talked about how text was the original data compression for human communication.

[00:04:20] But now we have video calls, messaging, social media, podcasts, newsletters, emails, the list goes on.

[00:04:26] Why are digital avatars necessary?

[00:04:29] I think at its very core, almost any technology we've invented for communication kind of abstracts something away, right?

[00:04:37] Like text being the most obvious example, where if you take the experience of me talking to you in real life and delivering some kind of a message versus the way you kind of perceive that message, interpret that message, would be very different than if I sent you exactly the same words written in a text message.

[00:04:53] Totally.

[00:04:54] I mean, even kind of pre-text, right?

[00:04:56] We had cave paintings with all sorts of other technologies that essentially helped us kind of like store information and deliver it to someone else kind of in a different time and different space.

[00:05:06] And what we've been doing since then is really just trying as much as we can to make these technologies appear to be as close to the experience we have in real life as possible.

[00:05:16] And I think we have lots of ways we've sort of gone around that.

[00:05:20] But obviously, you know, the ultimate way of doing this is that it can replicate the actual human experience of speaking to someone.

[00:05:26] And digital humans and digital avatars, of course, is an important part in that.

[00:05:31] And on that note, I've heard you refer to your avatars as digital humans.

[00:05:35] What's the difference in your mind?

[00:05:36] I think there's a lot of different words that kind of go around.

[00:05:40] AI clones, AI avatars, AI humans.

[00:05:42] I think ultimately, I think they all represent roughly the same thing.

[00:05:46] If you say it's an avatar or face or character, that kind of implies that it's a non-human entity, whereas if you use the word human, that does imply something kind of different about it.

[00:05:56] And with the era we're living through right now with computational intelligence improving very, very rapidly, maybe I think the reason people are talking about digital human now is because it actually feels tangible that we can create something that very closely resembles human life.

[00:06:12] Both in the real world, but I think before that in digital world.

[00:06:16] Like all of us have interacted with chat GPTs and LLMs.

[00:06:19] We've seen firsthand how powerful they are, how much they can actually pretend to be a human.

[00:06:23] And if we can give them the kind of visual expression and the audio expression of that as well, digital humans, it actually does feel like we're going to get pretty close to be able to create something that feels like a digital human.

[00:06:38] Not just because we use the word, but because it actually feels like we're on track for it.

[00:06:42] So next year we'll launch a real-time avatar we can actually talk to.

[00:06:45] And I think there's probably something there where that's when we begin to think of it more as a human than we think of it as just a technology.

[00:06:54] And I think maybe a good way to anchor that is when you think about like a chatbot and chat GPT.

[00:06:59] One thing that's very interesting is that I do this myself and I think most people do is that when you're interacting with these systems, people are actually quite polite.

[00:07:07] Yeah, definitely.

[00:07:08] You talk to chat GPT like it's actually like a coworker.

[00:07:11] You say, please.

[00:07:12] And it's kind of weird, right?

[00:07:13] Because you're interacting with a computer that has no feelings as far as we're aware.

[00:07:17] But because technology is now so powerful, it's very hard for us, despite constantly knowing that we're interacting with a large language model, to feel that way, right?

[00:07:26] And I think that is our relationship with machines that's about to change quite dramatically.

[00:07:31] And digital humans is going to be the most obvious expression of that.

[00:07:34] There are two ways that a person can use Synthesia to create a digital human.

[00:07:38] They can pick from these off-the-shelf avatars that you own and build, or they can get custom avatars made of themselves.

[00:07:46] I'm curious, which is the more popular route?

[00:07:48] It's actually like roughly 50-50.

[00:07:50] In the beginning, we were kind of like, which one is the most important, right?

[00:07:54] And I think as kind of time has gone on, it's very clear that there's no answer to that.

[00:07:58] They both serve different purposes.

[00:08:00] One of the things we learned very early on when we started the company was that one of the big reasons that people love the product so much was because they didn't have to be on camera themselves.

[00:08:11] They don't like how they sound.

[00:08:12] They don't like their accent.

[00:08:13] And so a big part of the value proposition around Synthesia was actually that people could make video without having to be themselves, right?

[00:08:20] And that was a pretty big unlock.

[00:08:22] But then it's also very obvious that there's also a bunch of use cases where you want it to be yourself, right?

[00:08:27] So if you're a CEO creating a video about your company's strategy for next year, that's kind of weird coming from an anonymous avatar.

[00:08:34] If you're a salesperson sending out videos to your prospects or to your existing customers to update them on something that's happening in the product, it makes a lot of sense that it's you and so on and so forth.

[00:08:43] So I think it's just there will be many different types of use cases.

[00:08:46] And I think we'll see a mix of people's own avatars.

[00:08:50] We'll see entirely generated avatars that are specific to companies and our customers, right?

[00:08:56] So you can build your own kind of like IP, if you will.

[00:08:59] And for existing real celebrities, there's going to be a big unlock in terms of how they can work with brands in a much more scalable way than they could before.

[00:09:08] Look, even for myself, I would love to have my digital avatar, digital human be delegated to do a bunch of this stuff, especially the setup process of recording a video, I think is painful.

[00:09:19] But I'm curious for the demographic that you talked about that is super excited about not having to go through that pain or perhaps didn't grow up with selfie culture in this world with cameras all around them.

[00:09:30] When those folks first encounter their digital avatars, what kind of reactions do you typically see?

[00:09:34] A lot of people are very self-conscious, like they would be if they recorded, you know, just a screen recording of themselves or like a selfie video.

[00:09:42] But people like it when they like the result.

[00:09:45] And I think one interesting anecdote here is, you know, in the early days of Instagram, for example, the big growth hack that Instagram employed was actually filters on images and on videos, right?

[00:09:55] It's actually very simple.

[00:09:56] It's like you take a picture and you make it, you know, slightly more saturated to make it like black and white or whatever.

[00:10:01] But that makes that picture appear to look much, much better than before.

[00:10:05] Whereas every single image that people were taking before on their phone cameras would look fairly crappy without having someone actually edit it, which was like out of bounds for like most people.

[00:10:15] And so I think what we see a lot of that is the same with avatars.

[00:10:19] People want to kind of like touch themselves up.

[00:10:21] They want to make sure that they're like, you know, being shot in a nice environment with nice lighting, that they're like wearing their best clothes.

[00:10:27] They want to be like the best representation of themselves.

[00:10:29] But I think in general, people love it, right?

[00:10:31] People, especially people who don't want to be on video, once they're happy with their avatar, it unlocks so much for them.

[00:10:36] Like executives who otherwise are asked to record videos several days a week.

[00:10:41] They now don't have to do that.

[00:10:42] They can work with their team to just create the content automatically.

[00:10:44] And then I think also people have this sort of on a personal level, right?

[00:10:48] It's kind of, it's pretty odd the first time you see your avatar.

[00:10:51] It's pretty odd the first time you hear yourself speaking a language that you don't actually speak.

[00:10:56] And it's clearly your voice.

[00:10:57] It sounds like you.

[00:10:59] And I think that's a very interesting glimpse for people into kind of like the future, right?

[00:11:04] What I love about Gen AI as kind of a cultural movement and technology movement is that it's so accessible that all of us actually get to feel firsthand what these technologies mean, right?

[00:11:14] What can they do?

[00:11:15] How powerful are they?

[00:11:15] And this is just such a visceral, I think, experience of something that AI can do.

[00:11:21] And I think also everyone feels like, well, this is only going to get better and better and better.

[00:11:24] Even though, of course, we've made a lot of progress, there's still so much more to go.

[00:11:29] I mean, these avatars are really cool.

[00:11:31] And I will say, I mean, especially coming from a VFX and CG background, you can at this stage tell that they're still an avatar.

[00:11:39] There's that whole uncanny valley question.

[00:11:41] And I'm curious, on the consumption end of this, what are the reactions like?

[00:11:44] And does the context matter there?

[00:11:47] Like if people are reacting to a video in like a sales inbound email versus, you know, encountering it on a banking website versus a virtual CEO address, how do people react to these digital humans in these various contexts?

[00:12:00] So I think you nailed it there, right?

[00:12:02] It's very much about the context.

[00:12:03] I'm pretty sure that if I use my avatar to record a love letter to my girlfriend, she's going to be a bit disappointed that I sent my avatar to do that and not my real self.

[00:12:17] But if you're a user trying to understand like your mortgage application on a banking website, and you're presented with like 10 pages of text with very complex information, almost everyone prefers to watch a video that just simplifies it for them, right?

[00:12:31] So what we generally see a lot of our customers, if not almost all of our customers, I think, is that they introduce the avatar like, hey, this is your virtual facilitator.

[00:12:38] This is not a real person.

[00:12:40] This is an avatar, and they're going to help you through the buying process.

[00:12:43] They're going to help onboard you to your company, whatever.

[00:12:45] And then what we see overwhelmingly is just that people really love interacting with these videos, especially if the alternative is text, right?

[00:12:54] We just did a big study with UCL here in London because we wanted to investigate how do people actually react to these videos.

[00:13:01] There's a few kind of interesting stats.

[00:13:03] One of them is that people actually completed the videos with avatars faster than the one with humans.

[00:13:11] That's because when they watch the videos of humans, humans are more imperfect, right?

[00:13:16] Like, you know, we kind of use a few too many words or we say something a little bit clunky or whatever.

[00:13:22] And so people kind of scroll back into the video to watch a section again.

[00:13:26] But with the avatars, because it's kind of like perfect in the sense that the script has been kind of written from the get-go,

[00:13:32] the information is actually more concise.

[00:13:33] And it also very overwhelmingly shows that people by far prefer to learn by watching AI videos rather than ingesting text.

[00:13:41] The stats that you mentioned make total sense to me, right?

[00:13:45] It's like you're distilling down the information and just like communicating it in a far crisper fashion than, say, you know, a long meandering conversation from a human.

[00:13:53] Though, you know, like some humans are more concise than others.

[00:13:57] When it comes to that CEO example, though, how important is photorealism to you?

[00:14:02] Maybe to level set, if I had to ask you to grade, you know, the photorealism of your avatars right now on a scale of 1 to 10, where would you put it?

[00:14:11] I think if you, I think you have to dissect it a little bit.

[00:14:14] I think if you take the photorealism as in like how real does it kind of look, I think it's very close to 10.

[00:14:20] I think what's, like if you took a still frame of the video, right?

[00:14:26] I think it's very difficult to tell that it's an avatar, which is in very large part due to AI being very good at rendering.

[00:14:32] I think where avatars still have a bit of a way to go, right?

[00:14:34] Is the body language matching what you say.

[00:14:38] There's a beat to what we say.

[00:14:40] When I speak to you now, right?

[00:14:41] Like my eyebrows move in a specific way, my hands move in a specific way.

[00:14:45] We have this whole language with our bodies.

[00:14:46] And we don't notice that in the real world because all of us, you know, do this.

[00:14:50] But we notice it when we see a video of a digital avatar whose body language is kind of like out of sync.

[00:14:56] So what most avatar products in the market today and not ours, but most avatar kind of companies usually do is that you take a real video of someone and then you loop it and you just change the lips.

[00:15:08] Right.

[00:15:08] This illusion sort of works pretty well in shorter bursts, but you begin to get this kind of weird sense where the head movement is out of tune with what they're saying, and the hands don't match what's being said.

[00:15:17] And that kind of throws you off quite a bit, right?

[00:15:19] And I think there there's a little bit to go.

[00:15:21] Our new model that we're launching soon has kind of full body language, including hands.

[00:15:26] That makes a big difference.

[00:15:27] And then I think there's still in the voice, there's a little bit of imperfections.

[00:15:31] But I think that the visual quality is more or less there.

[00:15:33] It's more about like the last percentage of like the body language and the kind of emotional expressiveness in these avatars.

[00:15:40] Right.

[00:15:40] What you're saying makes sense to me.

[00:15:42] So it's almost like the visual fidelity, if you just look at it that way, is pretty cool.

[00:15:46] It's kind of crossed the uncanny valley.

[00:15:48] But on the other hand, yeah, you're totally right.

[00:15:49] Like that emotive quality in the body language, like in motion, that still needs a little bit of work there.

[00:15:57] And that part, again, I think the models we have in-house have more or less solved.

[00:16:02] But basically, I think what we've seen is that no matter how many human animators you throw at like animating a digital human, we cannot animate it to perfection.

[00:16:09] And as humans, we are so, so, so sensitive to even the slightest inconsistency.

[00:16:14] And what's amazing about AI and generative AI is that the old school way of doing this, right, is that you sit down as a human being and we try to make a list of instructions of exactly how someone should move.

[00:16:24] And of course, with AI, what we're doing is kind of like the opposite way around.

[00:16:28] We're saying, we're not going to tell you what to do.

[00:16:29] We're just going to show you a bunch of examples of how people actually move.

[00:16:32] And you can yourself learn what that means, right?

[00:16:34] So we don't tell the computer, hey, there's like six, seven facial bones and muscles and, you know, all those kind of abstractions in some sense that we as humans have built to animate digital humans.

[00:16:48] We can kind of throw those out the window and say to the machine, you know, you figure out your own taxonomy of how the body works and how people move.

[00:16:55] And that can be like a five billion parameter model that a human being would never be able to sit down and comprehend.

[00:17:00] But if the computer understands it, who cares, right?

[00:17:03] It can produce an output that actually looks and feels very realistic.

[00:17:06] And I think that's what we've seen in every modality, right?

[00:17:08] It's just that AI is extremely good at this because it can think way more abstract and in way more kind of parameters and dimensions than human beings ever could, right?

[00:17:32] I love this because this is certainly what you're describing as a huge difference to the way Hollywood has traditionally done it, where it's like, you know, crazy light stage scan where you're essentially in this dome with a bunch of lights pointed at you or, you know, a Medusa scan where you have to do these explicit expressions.

[00:17:47] So that really makes me curious, you know, for a lot of these off-the-shelf avatars you offer, you do capture a ton of your own training data when generating those.

[00:17:57] And of course, there's a process for folks to make their own digital twin, their own replica as well.

[00:18:02] Yeah, what does that process look like now?

[00:18:04] And what is it going to look like in the future?

[00:18:06] So right now, we need around three to four minutes of footage of someone.

[00:18:10] And that's just, I mean, that can be recorded with your webcam.

[00:18:12] You can record with your phone.

[00:18:13] You can go into a studio.

[00:18:15] Today, you're still, basically the input is the output, as we generally say.

[00:18:19] So if you record with your webcam, you're going to get a video back.

[00:18:22] Your avatar is going to be you sitting, recording yourself on a webcam.

[00:18:25] If you go into a studio, it's going to be you in a studio and so forth.

[00:18:27] The big thing we're launching very soon is being able to essentially create an avatar if you want,

[00:18:33] and then create new variations of your avatar in different environments.

[00:18:37] So let's say you've recorded one, we're sitting at home in your podcast studio,

[00:18:41] but now you actually want to record a video where you're on top of a mountain,

[00:18:44] or you're flying a plane, or you're skydiving, or you're doing like a million different other things.

[00:18:48] We can then create that avatar for you by you just essentially using text to prompt yourself into new scenarios.

[00:18:54] Cool.

[00:18:55] Cool.

[00:18:55] This is going to be a big, big, big unlock.

[00:18:57] So the way it works is that we still need some video of you.

[00:19:01] And the reason we need some video of you is because if we started from just an image of you,

[00:19:05] which is, that's basically the modality you want this to work in, right?

[00:19:08] You take a single image and from that you can generate a scene of you.

[00:19:11] Then we don't know anything about how you look, how you move, how your head kind of goes around, right?

[00:19:18] Even my team, you know?

[00:19:20] Even your team, the way you talk, we can never infer this from just a single image, right?

[00:19:24] Because the information is just not there.

[00:19:26] But what we want to be able to do is we want to build a model that says,

[00:19:28] this is exactly, you know, like how you move and how you speak

[00:19:31] and how your hands kind of work in conjunction with what you're saying.

[00:19:34] And then once we have that model, then we can much easier to say,

[00:19:38] okay, here's a picture of you standing on top of a mountain.

[00:19:40] Here's you in a supermarket.

[00:19:42] Here's you behind a bar or whatever.

[00:19:44] And then we can begin to create these kind of new scenes.

[00:19:46] And I think, you know, this is going to be one of those advancements

[00:19:50] that's going to have like a huge impact in terms of what people use the product for

[00:19:54] and how much fun you can have with it.

[00:19:56] I love that.

[00:19:57] It's kind of replacing the whole kind of green screen visual effects workflow, right?

[00:20:00] If you just go capture it in reasonably diffused, decent lighting,

[00:20:04] and suddenly you can kind of, you know, choose a bunch of different backgrounds.

[00:20:08] Like that's like virtual production democratized.

[00:20:11] Before I get carried away and get too excited about that,

[00:20:14] I do have a question like, so if someone creates this avatar, let's say I made it,

[00:20:17] who owns it and can I license my digital doppelganger?

[00:20:22] So you own it 100%.

[00:20:23] And if you wanted to delete it, we'll of course fully delete it.

[00:20:26] No questions asked and that'll always be the case.

[00:20:29] We are thinking about what to do with kind of likenesses

[00:20:32] and should we create a marketplace where people can rent out their likeness

[00:20:34] to work with like brands or creators.

[00:22:37] It's not a functionality we have yet.

[00:20:38] What's exciting about it is that it opens up like so many new ways of using your likeness, right?

[00:20:42] So let's say that you're a celebrity, for example.

[00:20:44] The traditional way a celebrity would engage with a brand is you say,

[00:20:48] okay, this big celebrity, we're going to go into this warehouse.

[00:20:51] We're going to shoot an advertisement with you.

[00:20:53] We're going to take a bunch of still photos.

[00:20:55] And this is then sort of material for all of our campaigns moving forward, right?

[00:21:00] And maybe they'll record some social media clips as well.

[00:21:02] And then you're kind of done.

[00:21:03] You've recorded all the content and now the brand can then use that.

[00:21:05] What this unlocks is what if you have an e-commerce store

[00:21:09] and every time someone buys a product,

[00:21:11] you want to send a thank you message from a well-known celebrity.

[00:21:14] All of a sudden it doesn't necessarily need the celebrity to do much else

[00:21:17] than just say, yes, I'm fine with this.

[00:21:19] I'll license out my likeness.

[00:21:21] And maybe instead of that being kind of like a big upfront payment to the celebrity,

[00:21:25] celebrity is just paid $1 every time someone buys a product in that store, right?

[00:21:30] And the store can quickly switch out the celebrity with someone else

[00:21:33] if they want to try someone else.

[00:21:34] Or maybe they think that for one segment of their customers,

[00:21:37] the celebrity A is the best choice.

[00:21:39] For another group of customers, celebrity B is the right choice.

[00:21:42] And because everything here is generated with code,

[00:21:44] you can actually begin to do these kind of things.

[00:21:46] And so what I think we'll see is actually a democratization

[00:21:49] of working with celebrities in some sense,

[00:21:51] where today you need to have millions of dollars and big budgets

[00:21:54] and whatever to work with a big celebrity.

[00:21:56] In this way, the celebrity will actually pick who they want to work with, right?

[00:21:58] Maybe a celebrity would prefer to work with 500 small artisanal shops

[00:22:03] all over the US that each pay them less,

[00:22:07] but in aggregate pays the same as like one big Coca-Cola campaign.

[00:22:10] I think that's actually pretty interesting

[00:22:11] because my guess would be if you ask a lot of celebrities

[00:22:14] who they would prefer to work with,

[00:22:16] they probably would prefer to work with small artisanal shops

[00:22:19] with products that they actually love

[00:22:21] rather than some mega brand who'll just throw millions at them, right?

[00:22:23] So I think we'll see a lot of new business models kind of emerge.

[00:22:26] And I personally think that's pretty exciting.

[00:22:29] That is exciting indeed.

[00:22:30] And it brings me back to sort of the B2B focus for your company.

[00:22:34] Given that most of your customers are businesses,

[00:22:37] you know, what are the types of things that they're using it for?

[00:22:40] And, you know, in the past, you've described this sort of as like,

[00:22:43] you know, it was a vitamin for like the entertainment industry,

[00:22:45] but it's really a painkiller for businesses.

[00:22:47] Why is that?

[00:22:48] So when we started the company, we initially, as you said,

[00:22:52] set out to actually build tooling for video professionals

[00:22:55] to be more efficient.

[00:22:56] And the first thing we did was build this like AI dubbing functionality.

[00:22:58] So you kind of take a real video.

[00:23:00] We did a very famous one, David Beckham, speaking obviously in English.

[00:23:03] And then we could take that advertisement

[00:23:05] and we could create it in 10 different languages.

[00:23:07] And so it looks like David Beckham, in this case,

[00:23:09] was speaking in a different language.

[00:23:11] And it's definitely a very cool product.

[00:23:13] And there was a lot of interest in it.

[00:23:14] And it did like okay in the marketplace,

[00:23:17] but just had this kind of feeling that if we disappear tomorrow,

[00:23:21] they will find another way of solving the problem, right?

[00:23:23] And it was kind of like a cool thing,

[00:23:25] but it wasn't really a painkiller, right?

[00:23:26] It was like, it was a nice thing to have.

[00:23:28] And it's very difficult to build a big company

[00:23:30] around something that's nice to have.

[00:23:32] You want to sell something that people really, really need to have.

[00:23:35] And so as we kind of went through the motions

[00:23:37] of taking that product to market

[00:23:38] and really just trying to build an understanding of video

[00:23:42] from first principles,

[00:23:43] we suddenly had this feeling that

[00:23:45] there's a lot of people in the world

[00:23:47] who are not making video today

[00:23:48] and they're desperate to make video.

[00:23:50] And when we spoke to those people,

[00:23:52] they obviously did not work in the video industry, right?

[00:23:53] They work in big companies.

[00:23:54] They're like a marketing manager,

[00:23:56] training instructor, sales professional,

[00:23:58] something like that.

[00:23:59] And they're all telling us

[00:24:00] that they are desperate to make video.

[00:24:02] They have a lot of great content,

[00:24:03] a lot of great knowledge

[00:24:04] that they want to share with their customers,

[00:24:06] with their employees,

[00:24:06] but nobody reads anymore, right?

[00:24:08] They send out these emails

[00:24:10] that just end up in the archive.

[00:24:12] So they wanted to make videos.

[00:24:14] They tried to make videos.

[00:24:15] The thing if you work in a big company

[00:24:17] is that often there's a lot of content to produce,

[00:24:19] which means the quantity of videos

[00:24:20] you have to make is very high.

[00:24:22] There's often a need to translate them.

[00:24:23] There's a need to update them after you've shot them

[00:24:25] because something changed in your business.

[00:24:27] And that's just impossible to do with a real video.

[00:24:29] And so for these people,

[00:24:31] if we can give them a way to make video,

[00:24:33] which is a thousand times easier

[00:24:34] and a thousand times more affordable

[00:24:36] than shooting it with a camera,

[00:24:38] they would probably be okay

[00:24:39] with the quality of those videos being lower

[00:24:41] than what the video industry would produce.

[00:24:44] Because for these people,

[00:24:45] the alternative is not a real video from a camera.

[00:24:48] The alternative is text.

[00:24:50] And so it's like,

[00:24:50] you compare this to a real video

[00:24:52] or you compare it to text.

[00:24:53] It's not like people are saying,

[00:24:54] you know,

[00:24:55] all this content we used to shoot with a camera

[00:24:57] we'll now make with Synthesia instead.

[00:24:59] It's people saying,

[00:25:00] well, all this text that we have

[00:25:01] and all these slide decks

[00:25:02] and all this kind of static information,

[00:25:03] we can now turn that into video content.

[00:25:05] And that became the kind of inflection point for us

[00:25:08] once we kind of figured that out.

[00:25:09] And I think there's,

[00:25:11] I love what you said before

[00:25:12] because we had the same kind of feeling, right?

[00:25:14] It's like,

[00:25:14] how weird is it

[00:25:15] that potentially the biggest market for visual effects

[00:25:18] is actually going to be corporate communication

[00:25:20] in a couple of years,

[00:25:22] not Hollywood, right?

[00:25:23] That's very contradictory.

[00:25:25] Like no one would have thought that to ever happen.

[00:25:27] But in many ways,

[00:25:29] I think the biggest ideas,

[00:25:30] the most impactful ideas

[00:25:32] always feel very weird

[00:25:33] and very contradictory, right?

[00:25:34] Like Airbnb, I think it's like,

[00:27:36] what if people just invite strangers

[00:25:38] to sleep in their home

[00:25:39] for a bit of money?

[00:25:41] Like everyone would be like,

[00:25:41] they're absolutely crazy, right?

[00:25:42] But I think that's what technology kind of does.

[00:25:44] It challenges a lot of these kind of inherent assumptions.

[00:25:46] And I think in our little world,

[00:25:47] this is a pretty good example of that

[00:25:49] because ultimately what we do,

[00:25:50] to your point,

[00:25:51] is special effects, right?

[00:25:52] It's visual effects.

[00:25:53] We call it AI because we use AI,

[00:25:55] but at its core, right?

[00:25:56] It's not too different

[00:25:58] from what Hollywood has been trying to do

[00:25:59] for many years.

[00:26:00] It definitely is the art and science of visual effects.

[00:26:02] And I'm kind of curious, right?

[00:26:03] Like on the consumer side,

[00:26:04] there's this short form video fatigue

[00:26:07] and just video fatigue.

[00:26:08] Everyone's doing video all the time.

[00:26:10] But on the enterprise side,

[00:26:11] as you mentioned,

[00:26:12] there's a bunch of this content

[00:26:13] that just would never have been converted

[00:26:15] into video form.

[00:26:16] If you take that to the limit,

[00:26:18] do you think there's a similar risk

[00:26:20] where we just end up polluting our feeds

[00:26:22] with a bunch of throwaway content?

[00:26:24] It's just going to be like an onslaught

[00:26:27] of enterprise B2B video content.

[00:26:30] But I think what's going to happen

[00:26:31] is that video is going to become table stakes.

[00:26:32] So today, email is table stakes, right?

[00:26:35] You don't operate a company

[00:26:36] without sending out emails.

[00:26:39] At one point,

[00:26:39] if you're sending me emails

[00:26:41] with lots of text in them,

[00:26:42] I'm just not going to open them, right?

[00:26:43] Your inbox in the future

[00:26:44] is going to look more like your TikTok feed

[00:26:46] where you just kind of quickly scroll

[00:26:47] through what's interesting.

[00:26:48] And as always,

[00:26:50] just like it is with email today,

[00:26:52] and just because something

[00:26:53] gets easier to produce,

[00:26:55] you still have to be a great storyteller.

[00:26:56] You still have to figure out

[00:26:57] what's the right hook

[00:26:58] to get my attention,

[00:26:59] to watch your video

[00:27:00] all the way through

[00:27:01] and get in contact

[00:27:02] or whatever it is

[00:27:02] that you want me to do.

[00:27:03] I think all those things

[00:27:05] around storytelling

[00:27:05] and building a good product

[00:27:07] and being good at communicating it,

[00:27:09] none of that goes away.

[00:27:10] So I think what's true now,

[00:27:11] what is going to be true in the future,

[00:27:12] it's about curation

[00:27:13] and standing out.

[00:27:14] So we are seeing an explosion of content.

[00:27:16] And of course,

[00:27:17] every time tools like the ones

[00:27:19] that you're creating come out,

[00:27:21] people use it for misinformation

[00:27:22] and disinformation, right?

[00:27:23] And so there have been instances

[00:27:24] in the past

[00:27:25] where Synthesia avatars

[00:27:27] were used to spread misinformation.

[00:27:29] How much of those incidents

[00:27:30] pushed you to sort of lock down

[00:27:32] or put rails on the abilities

[00:27:34] of these avatars?

[00:27:35] So the safety aspect

[00:27:37] has always been very important to us.

[00:27:38] And since we founded the company

[00:27:40] in 2017,

[00:27:41] we did so on an ethical framework

[00:27:43] called the three Cs,

[00:27:44] consent, control, and collaboration.

[00:27:46] And consent is about

[00:27:47] we never create avatars of anyone

[00:27:48] without explicit consent.

[00:27:50] And that's kind of like a hard stop.

[00:27:52] Which means we kind of lose out

[00:27:54] on some virality

[00:27:54] because we don't make funny videos

[00:27:56] for satire of like celebrities

[00:27:57] or whatever, right?

[00:27:58] But that's a choice we decided to make.

[00:28:01] The second one is from control, right?

[00:28:02] So that's basic content moderation,

[00:28:04] which is we take a very strong view

[00:28:05] on what you can use the platform for

[00:28:07] or what you can't use the platform for.

[00:28:08] We're a B2B product,

[00:28:10] we work with the enterprise.

[00:28:11] And so we're probably,

[00:28:13] a bit overly strict

[00:28:14] in some senses.

[00:28:15] You know,

[00:28:16] there's legal categories of content

[00:28:18] that we kind of are very restrictive around.

[00:28:20] And we put a lot of effort

[00:28:21] both with machines and with humans

[00:28:22] into making sure

[00:28:23] that people don't use our platform

[00:28:24] in ways they shouldn't.

[00:28:25] I think with these incidents

[00:28:28] that happened in the past,

[00:28:29] we'll always get judged

[00:28:30] by the one video

[00:28:30] that makes it through

[00:28:31] and we learn something from that

[00:28:32] every single time.

[00:28:33] In many ways, right?

[00:28:35] Like when you do content moderation,

[00:28:36] a lot of people disagree with you

[00:28:38] no matter what direction you go in.

[00:28:40] Yeah, you're not going to make everyone happy.

[00:28:41] Exactly.

[00:28:42] And especially, of course,

[00:28:43] when it comes to things like news and politics,

[00:28:46] religion, etc.

[00:28:47] This gets very, very hairy

[00:28:48] and no matter what you do,

[00:28:49] there'll be people who don't like it, right?

[00:28:51] And so there was specifically

[00:28:53] one of these instances

[00:28:54] which I think was something

[00:28:55] we discussed a lot internally

[00:28:56] when someone made a video

[00:28:58] and I'll leave out

[00:28:59] kind of like the details of it,

[00:29:01] but essentially a video

[00:29:01] about like a pretty hairy topic, right?

[00:29:03] A topic that'll divide people in two,

[00:29:05] either you're very pro

[00:29:06] or you're very against.

[00:29:07] And the video was actually entirely factual,

[00:29:10] but it was perceived

[00:29:11] at this one big newspaper

[00:29:12] as being kind of a piece

[00:29:13] of propaganda.

[00:29:15] And that was a very interesting one for us

[00:29:16] because we fact-checked it

[00:29:18] and there's nothing

[00:29:19] that wasn't factual in there.

[00:29:21] You could argue that

[00:29:22] talking about it in a specific way

[00:29:23] was kind of like a ploy

[00:29:25] to make people believe

[00:29:26] something specifically,

[00:29:27] but I mean,

[00:29:28] all communication has those properties.

[00:29:29] And so what we've decided to do

[00:29:31] is just to be,

[00:29:32] again,

[00:29:33] kind of overly restrictive.

[00:29:33] So we don't allow news

[00:29:34] and current events content

[00:29:35] unless you're an enterprise customer,

[00:29:36] for example.

[00:29:37] That's actually a shame

[00:29:38] because we had a lot of like NGOs,

[00:29:41] citizen journalists,

[00:29:41] and so those kind of folks

[00:29:43] making great content on the platform,

[00:29:45] but eventually it just became

[00:29:46] too difficult to manage.

[00:29:47] And so we decided to make that rule.

[00:29:51] So it's something we always work on.

[00:29:53] As I said,

[00:29:53] you know,

[00:29:54] we're not claiming we're perfect,

[00:29:56] but I think we've,

[00:29:56] I think we have very,

[00:29:58] very good systems in place today

[00:29:59] that keep bad people

[00:30:00] off the platform.

[00:30:02] I got to say,

[00:30:03] the stance you're taking

[00:30:04] is indeed more restrictive.

[00:30:05] I hear most platform creators

[00:30:07] sort of punting this

[00:30:08] to the point of distribution

[00:30:09] where they're like,

[00:30:10] well,

[00:30:10] the creation tool

[00:30:11] shouldn't be responsible for this.

[00:30:13] The distribution platforms

[00:30:14] should be the ones,

[00:30:15] you know,

[00:30:16] bringing the hammer down.

[00:30:17] Look,

[00:30:17] I think these questions

[00:30:18] are like so difficult,

[00:30:19] right?

[00:30:19] And there's so many different

[00:30:20] ways you can think about them.

[00:30:22] If you think about them

[00:30:22] philosophically,

[00:30:23] if there's a question

[00:30:23] of like freedom of speech,

[00:30:25] from a practical perspective,

[00:30:27] is this,

[00:30:28] you know,

[00:30:28] just about keeping out

[00:30:30] the bad people

[00:30:30] that we all agree

[00:30:32] are bad people?

[00:30:33] Is it an economic question?

[00:30:35] You know,

[00:30:35] am I hindering my growth

[00:30:36] as a company

[00:30:36] because I'm overly restrictive

[00:30:38] and leaving the door open

[00:30:39] for other competitors?

[00:30:40] Like there's so many angles.

[00:30:41] There's so many,

[00:30:42] it's not an easy question,

[00:30:43] right?

[00:30:43] And what we have talked

[00:30:44] a lot about is that

[00:30:45] there is a shift

[00:30:46] happening right now,

[00:30:47] specifically in AI,

[00:30:48] where we're actually,

[00:30:49] a lot of companies

[00:30:50] are moving the point

[00:30:51] of moderation

[00:30:52] to the point of creation,

[00:30:54] right?

[00:30:54] Where,

[00:30:55] of course,

[00:30:55] with the big language models,

[00:30:56] we see this all the time,

[00:30:57] right?

[00:30:57] There's a bunch of things

[00:30:59] they just won't talk about

[00:31:00] and they'll definitely

[00:31:01] not help you

[00:31:02] with the recipe

[00:31:03] for a bomb

[00:31:04] or something like that,

[00:31:04] but even with

[00:31:05] more vanilla topics,

[00:31:05] politics

[00:31:06] being the obvious one,

[00:31:07] they'll also be kind of

[00:31:09] tiptoeing very much

[00:31:10] around those kinds of things.

[00:31:11] In our case,

[00:31:12] it's sort of the same thing

[00:31:14] where we actually limit you

[00:31:15] from actually creating

[00:31:15] the content

[00:31:17] and I always explain

[00:31:18] this as like,

[00:31:18] that is actually very new,

[00:31:19] right?

[00:31:20] Imagine that

[00:31:20] when you're using PowerPoint,

[00:31:22] Microsoft Word,

[00:31:23] it would stop you

[00:31:24] from making a slide

[00:31:25] about how to do

[00:31:27] something horrible,

[00:31:27] right?

[00:31:28] That's a very weird thought

[00:31:29] for most people,

[00:31:29] but in many ways,

[00:31:31] that's actually

[00:31:31] what we're doing

[00:31:32] and what we're building,

[00:31:33] right?

[00:31:34] And no one has ever

[00:31:35] held Microsoft responsible

[00:31:37] for the fact

[00:31:37] that a school shooter

[00:31:38] can write their manifesto

[00:31:39] in Microsoft Word,

[00:31:40] right?

[00:31:40] Or that I'm sure

[00:31:41] there's been made PowerPoints

[00:31:42] about how to do

[00:31:43] evil, horrible things

[00:31:44] in wars and so on,

[00:31:45] but we've never seen

[00:31:46] that as being

[00:31:47] Microsoft's responsibility.

[00:31:48] We've always seen

[00:31:49] that as being

[00:31:51] the distribution platform's

[00:31:52] responsibility

[00:31:53] once that content

[00:31:54] actually gets uploaded

[00:31:54] somewhere.

[00:31:55] But I do think

[00:31:56] that as a society,

[00:31:57] it's probably good

[00:31:57] that we're like

[00:31:58] extra careful

[00:31:59] when we roll out

[00:32:00] these things

[00:32:00] in the beginning

[00:32:01] and then,

[00:32:01] you know,

[00:32:01] maybe in 10,

[00:32:02] 15 years,

[00:32:02] we'll have a different

[00:32:03] view on how

[00:32:06] these technologies

[00:32:06] should be used

[00:32:07] and governed.

[00:32:08] But as a starting point,

[00:32:10] I mean,

[00:32:10] my own kind of

[00:32:11] moral inclination

[00:32:13] and the rest of the company's

[00:32:13] is that it's good

[00:32:14] to be a little bit

[00:32:15] on the back foot

[00:32:15] and be a little bit

[00:32:16] more restrictive

[00:32:16] than what some people

[00:32:18] would feel comfortable with.

[00:32:19] Now, building off

[00:32:20] the discussion

[00:32:21] and looking towards

[00:32:22] the future,

[00:32:23] you talked about

[00:32:23] next year you're going

[00:32:24] to have these avatars

[00:32:25] that you can talk to

[00:32:26] in real time.

[00:32:27] There's an interesting

[00:32:28] thing that we came across.

[00:32:29] We did this episode

[00:32:30] with ChatGPT Advanced

[00:32:31] Voice Mode

[00:32:31] where sort of the guardrails

[00:32:34] and restrictions

[00:32:34] that are put on it

[00:32:35] almost prevent

[00:32:36] the avatar

[00:32:37] from being like

[00:32:38] fully human-like,

[00:32:39] you know?

[00:32:39] Like if it's

[00:32:40] too much

[00:32:41] in a box,

[00:32:42] you can kind of

[00:32:43] see those seams

[00:32:44] and that kind of

[00:32:45] pops the illusion.

[00:32:46] How do you think

[00:32:47] about that tension,

[00:32:48] especially as you're

[00:32:49] moving towards

[00:32:50] these more expressive

[00:32:50] product experiences?

[00:32:53] I totally agree

[00:32:53] with you

[00:32:54] and I think

[00:32:55] it's so deeply

[00:32:56] fascinating to me

[00:32:57] how as humans

[00:32:58] we're so good

[00:32:59] at detecting something

[00:32:59] that's non-human.

[00:33:00] Like when you talk

[00:33:01] to the voice mode chat,

[00:33:03] right?

[00:33:03] Like you understand,

[00:33:04] okay,

[00:33:04] this will help you

[00:33:05] answer like kind of

[00:33:06] practical,

[00:33:07] factual questions

[00:33:08] and every time

[00:33:09] you ask it for an opinion

[00:33:10] or to be a little bit

[00:33:10] human,

[00:33:11] it'll just default

[00:33:12] to, you know,

[00:33:12] back to the kind

[00:33:14] of like robot speech

[00:33:16] to some extent.

[00:33:17] At some point,

[00:33:18] you know,

[00:33:18] I think these restrictions

[00:33:19] will be lifted.

[00:33:19] There's a big market

[00:33:21] and there's a big appetite

[00:33:21] for interacting

[00:33:23] with computers

[00:33:24] that feels very,

[00:33:25] very lifelike,

[00:33:25] right?

[00:33:26] So I think we will see

[00:33:27] that kind of boundary

[00:33:28] disappear over time.

[00:33:29] As for us,

[00:33:30] I think,

[00:33:31] again,

[00:33:31] you know,

[00:33:31] we've made a decision

[00:33:32] to be a B2B company

[00:33:34] and so we're not

[00:33:36] going to be offering

[00:33:36] like virtual boyfriends

[00:33:37] and girlfriends

[00:33:38] any time in the near future

[00:33:40] but I think a lot

[00:33:42] of those properties

[00:33:42] are also very interesting

[00:33:44] in a business context,

[00:33:45] right?

[00:33:46] For example,

[00:33:46] if you're a salesperson

[00:33:47] and you do sales training,

[00:33:49] if you can role play

[00:33:51] with a prospect

[00:33:52] that can be programmed

[00:33:54] and prompted

[00:33:55] to act in a specific way,

[00:33:57] you can probably ramp

[00:33:58] a lot faster

[00:34:00] than if you have

[00:34:01] to read documents

[00:34:01] about how to,

[00:34:02] you know,

[00:34:03] come back

[00:34:03] from different objections

[00:34:04] and I think

[00:34:05] there's a lot of other

[00:34:06] and potentially also

[00:34:07] more controversial

[00:34:08] applications of this.

[00:34:09] Think about like

[00:34:09] psychology,

[00:34:10] therapists,

[00:34:12] and doctors.

[00:34:13] I think we'll see

[00:34:14] a lot of those

[00:34:14] pop up in the next

[00:34:16] couple of years

[00:34:17] and I think ultimately

[00:34:19] for a lot of these

[00:34:20] use cases to really work,

[00:34:21] it has to feel

[00:34:22] very lifelike,

[00:34:23] you know?

[00:34:23] I think if you're

[00:34:24] interacting with like

[00:34:25] a sales simulator

[00:34:27] which looks like

[00:34:28] a computer game

[00:34:29] from the 90s,

[00:34:30] you're just going

[00:34:31] to disconnect from it.

[00:34:32] It's not going to work,

[00:34:33] right?

[00:34:34] And I think right now

[00:34:34] we're very,

[00:34:35] very close to like

[00:34:35] passing through that

[00:34:36] uncanny valley

[00:34:37] where it actually

[00:34:38] will feel very,

[00:34:38] very close to having

[00:34:39] a Zoom call

[00:34:40] with a real human being.

[00:34:41] It's interesting

[00:34:42] even with your

[00:34:43] B2B focus,

[00:34:43] you just outlined

[00:34:44] a bunch of these

[00:34:45] scenarios where the

[00:34:46] box is large enough

[00:34:47] where you can have

[00:34:48] a very meaningful

[00:34:49] interactive experience.

[00:34:51] So I have to ask you

[00:34:52] how far away

[00:34:53] are we

[00:34:54] where we can have

[00:34:55] these AI avatars

[00:34:56] that can feel

[00:34:57] indistinguishable

[00:34:58] from a human conversation?

[00:34:59] I don't think

[00:35:00] we're very far

[00:35:01] to be honest.

[00:35:02] I think

[00:35:03] in 12 months time

[00:35:04] you could probably

[00:35:06] simulate Zoom calls

[00:35:07] at a pretty good fidelity.

[00:35:09] I think the voice component

[00:35:10] of this is

[00:35:11] kind of getting

[00:35:12] to full maturity.

[00:35:13] There's a lot of

[00:35:13] great technologies

[00:35:14] out there

[00:35:14] and the video part

[00:35:15] of it,

[00:35:16] depending a little bit

[00:35:17] what you're trying

[00:35:17] to simulate,

[00:35:18] but if you look

[00:35:19] at the videos

[00:35:20] that we're watching

[00:35:21] each other on

[00:35:21] right now,

[00:35:22] right?

[00:35:22] And that's a

[00:35:24] compressed

[00:35:24] like Zoom feed,

[00:35:26] then that's not

[00:35:27] like the most

[00:35:27] challenging thing

[00:35:28] to replicate

[00:35:28] and you're already

[00:35:29] going to expect

[00:35:30] a whole bunch

[00:35:30] of artifacts

[00:35:31] and compressions

[00:35:32] and all those

[00:35:33] sort of things,

[00:35:33] right?

[00:35:33] So if that's

[00:35:34] kind of like the goal,

[00:35:35] then I think

[00:35:36] you're not very far

[00:35:37] from it.

[00:35:37] Let me ask it

[00:35:38] in a slightly

[00:35:39] different way,

[00:35:39] especially in the

[00:35:40] visual fidelity

[00:35:41] and to use your

[00:35:42] example from earlier,

[00:35:43] how long before

[00:35:44] you can send

[00:35:44] that digital love

[00:35:45] letter to your

[00:35:46] girlfriend

[00:35:46] and she believes

[00:35:47] it was actually

[00:35:48] from you?

[00:35:50] I think next year,

[00:35:52] like I really,

[00:35:53] I don't think

[00:35:53] it's far away.

[00:35:54] I think looking

[00:35:55] at what we're

[00:35:56] building right now,

[00:35:57] we have the components,

[00:35:58] we've taught a system

[00:35:59] how to predict

[00:36:01] the correct body language,

[00:36:03] facial expressions,

[00:36:04] gestures that goes

[00:36:05] with what you're saying.

[00:36:06] We can generate

[00:36:07] the voice

[00:36:07] in high enough

[00:36:08] quality where

[00:36:09] it sounds

[00:36:09] deeply felt and emotional,

[00:36:10] so I really don't

[00:36:11] think that it's

[00:36:12] more than 12 months

[00:36:14] away.

[00:36:15] And it'll be

[00:36:15] very interesting,

[00:36:16] I usually,

[00:36:17] internally we talk

[00:36:17] about this as like

[00:36:18] the ChatGPT

[00:36:19] moment for video.

[00:36:20] I think what's so

[00:36:20] powerful about

[00:36:21] ChatGPT is that

[00:36:22] it truly kind of

[00:36:22] broke through the

[00:36:23] uncanny valley,

[00:36:23] right?

[00:36:24] The first time

[00:36:24] you used

[00:36:25] ChatGPT,

[00:36:25] it's so human

[00:36:26] that you begin

[00:36:27] talking to it

[00:36:27] like a human

[00:36:28] subconsciously

[00:36:29] without even

[00:36:29] thinking about it.

[00:36:31] I think for audio

[00:36:32] and text-to-speech

[00:36:34] kind of got there

[00:36:35] and for video

[00:36:36] I think this is

[00:36:36] getting very close.

[00:36:37] So internally

[00:36:38] we think of this

[00:36:38] like when you can

[00:36:39] generate a video

[00:36:40] of like a vlogger

[00:36:41] on YouTube,

[00:36:42] like, you know,

[00:36:42] the traditional styles

[00:36:43] like sitting in my

[00:36:44] bedroom kind of like

[00:36:44] talking at you,

[00:36:45] when you can generate

[00:36:46] that in high enough

[00:36:47] quality,

[00:36:48] high enough fidelity

[00:36:49] that you would come

[00:36:50] home after work

[00:36:51] one day and you'd

[00:36:52] put on an avatar video

[00:36:52] and just sit down

[00:36:53] and watch an avatar

[00:36:54] talk for 18 minutes

[00:36:55] like a lot of people

[00:36:56] do with vloggers,

[00:36:57] that's where the

[00:36:58] total market

[00:36:59] for these technologies

[00:36:59] explodes by a thousand.

[00:37:02] I think when that

[00:37:03] happens,

[00:37:04] Pandora's box is open,

[00:37:05] there's going to be

[00:37:05] lots of ethical

[00:37:06] questions,

[00:37:07] lots of cultural

[00:37:08] questions,

[00:37:09] lots of art

[00:37:10] questions about

[00:37:11] what does this mean

[00:37:13] and I think

[00:37:14] it'll be a pretty

[00:37:15] meaningful and

[00:37:16] powerful moment.

[00:37:17] So let's get into

[00:37:18] those ethical questions.

[00:37:19] I mean,

[00:37:19] it's fascinating,

[00:37:21] right?

[00:37:21] Let's say you have

[00:37:22] these photorealistic

[00:37:23] avatars that you can

[00:37:24] talk to in real time,

[00:37:26] you know,

[00:37:26] could this tech

[00:37:27] eventually replace

[00:37:29] humans completely

[00:37:30] in, let's say,

[00:37:31] like customer service

[00:37:31] roles and how do you

[00:37:33] think about that

[00:37:33] tension, right?

[00:37:34] It's like,

[00:37:35] how do you ensure

[00:37:35] this tech enhances

[00:37:37] rather than replaces

[00:37:38] human interactions?

[00:37:39] Because the thing

[00:37:39] that keeps popping

[00:37:40] into my head

[00:37:41] is like pulling up

[00:37:42] to a hotel

[00:37:43] at like 11 p.m.

[00:37:44] and instead of a

[00:37:45] human there,

[00:37:45] there's like a

[00:37:46] freaking iPad,

[00:37:47] you know,

[00:37:47] it's multimodal,

[00:37:48] it can see me,

[00:37:49] it'll check me in

[00:37:49] and it'll do everything.

[00:37:50] It's perfect.

[00:37:51] It can work around

[00:37:52] the clock,

[00:37:52] but there's not a

[00:37:53] human and you're

[00:37:54] already seeing some

[00:37:55] hotels try this

[00:37:56] where they've got,

[00:37:57] you know,

[00:37:58] essentially a remote

[00:37:58] worker playing that

[00:37:59] role right now,

[00:38:00] but eventually it'll be

[00:38:01] autonomous and that's

[00:38:02] just one example.

[00:38:03] So how do you think

[00:38:04] about that Pandora's

[00:38:05] box opening?

[00:38:06] I think there are

[00:38:07] ultimately two types

[00:38:08] of use cases.

[00:38:10] If you're calling a

[00:38:11] customer support,

[00:38:12] for example,

[00:38:13] you don't really care

[00:38:14] about who the

[00:38:14] customer support

[00:38:15] agent is,

[00:38:16] right?

[00:38:16] You just care about

[00:38:17] solving your problem

[00:38:18] the fastest way

[00:38:18] you possibly can

[00:38:19] and I think if we

[00:38:20] replace that with an

[00:38:21] agent or a bot,

[00:38:23] I think no one

[00:38:24] will care about that

[00:38:25] and I think that'll

[00:38:25] definitely happen.

[00:38:26] It's a matter of

[00:38:26] like when the

[00:38:27] technologies are good

[00:38:28] enough.

[00:38:28] If you take the

[00:38:29] example of a

[00:38:31] salesperson or maybe

[00:38:32] a hotel receptionist,

[00:38:34] I think some hotels

[00:38:36] will want to sell the

[00:38:37] cheapest room.

[00:38:38] They'll want you to

[00:38:39] have the fastest

[00:38:40] experience and just

[00:38:41] like getting the

[00:38:42] key card and just

[00:38:43] getting into your

[00:38:44] room.

[00:38:44] Other hotels will

[00:38:45] put a lot of

[00:38:46] emphasis on meeting

[00:38:48] and greeting you at

[00:38:48] the door,

[00:38:49] taking your luggage

[00:38:50] for you,

[00:38:50] explaining what's

[00:38:51] happening in the city

[00:38:51] this weekend and so

[00:38:52] on and so forth,

[00:38:53] that's a product

[00:38:54] that's pretty

[00:38:55] heavily service

[00:38:56] dependent and I

[00:38:57] think for those

[00:38:58] kind of things,

[00:38:58] we'll really value

[00:38:59] the human connection.

[00:39:00] I think it's a bit

[00:39:00] the same thing with

[00:39:01] a salesperson.

[00:39:02] A lot of people

[00:39:02] want to talk to a

[00:39:03] salesperson because

[00:39:05] it's a relationship

[00:39:06] that you build with

[00:39:06] someone else and I

[00:39:07] don't think we can

[00:39:08] replace that.

[00:39:09] I think that the

[00:39:09] human touch and the

[00:39:10] human element will

[00:39:11] become much more

[00:39:11] important in the

[00:39:12] future.

[00:39:13] AI is going to be

[00:39:14] much faster at

[00:39:15] replacing people typing

[00:39:16] in Excel spreadsheets

[00:39:17] all day than a

[00:39:18] waiter giving you a

[00:39:19] great experience at

[00:39:20] the local restaurant.

[00:39:21] I think that's

[00:39:22] well said.

[00:39:23] But I want to ask

[00:39:23] you, do you foresee

[00:39:25] a world where

[00:39:26] having a digital

[00:39:27] avatar is as common

[00:39:28] as somebody having

[00:39:30] a social media

[00:39:30] profile?

[00:39:31] Like Meta

[00:39:32] recently announced

[00:39:33] digital avatar

[00:39:34] tools for creators

[00:39:35] on their platforms

[00:39:36] for instance.

[00:39:37] Absolutely.

[00:39:38] I think it's just

[00:39:39] an evolution of

[00:39:40] the profiles we

[00:39:41] all have today.

[00:39:42] In some sense,

[00:39:42] your profile on a

[00:39:44] social media network

[00:39:44] is also a clone of

[00:39:45] you.

[00:39:46] It's maybe not as

[00:39:47] visceral as an avatar

[00:39:48] of yourself, but that

[00:39:49] is what it is.

[00:39:49] It's a digital

[00:39:50] representation of who

[00:39:51] you are.

[00:39:52] And if I go back

[00:39:53] to my childhood

[00:39:54] when I was on

[00:39:54] forums, we'd have

[00:39:55] a username and

[00:39:56] then the next

[00:39:57] iteration of forums

[00:39:58] would have a

[00:39:58] username and a

[00:39:58] profile picture.

[00:40:00] And then you'd

[00:40:00] have a profile

[00:40:01] picture with a

[00:40:01] profile page where

[00:40:02] you can write

[00:40:02] something about

[00:40:03] yourself and your

[00:40:04] interests or

[00:40:04] whatever.

[00:40:05] And then we all

[00:40:05] graduated to

[00:40:06] social media and

[00:40:07] now we have not

[00:40:07] just one picture of

[00:40:08] ourselves, we have

[00:40:09] a whole gallery of

[00:40:10] pictures that talks

[00:40:11] about us.

[00:40:11] And on TikTok, we

[00:40:12] have a whole

[00:40:13] library of videos

[00:40:14] that explain something

[00:40:15] about ourselves and

[00:40:16] who we are and our

[00:40:16] place in the world and

[00:40:17] so on and so forth.

[00:40:18] So I think in many ways it's just a natural evolution: we will have digital personas that represent us in the digital space.

[00:40:27] So are you

[00:40:27] imagining this tech

[00:40:29] evolves to a level

[00:40:30] where let's say my

[00:40:30] digital self not only

[00:40:32] represents me in the

[00:40:33] virtual world but in

[00:40:34] a sense kind of

[00:40:35] lives my virtual

[00:40:37] life for me?

[00:40:41] I don't think it's

[00:40:42] off the table, you

[00:40:43] know.

[00:40:43] I think, again, I don't think I will enjoy interacting with my friend's bot as much as I enjoy interacting with my friend in the flesh, knowing that it's actually him.

[00:40:53] I think it'll be, again, probably more practical,

[00:40:56] and maybe we'll

[00:40:56] have agents that

[00:40:57] say, hey, you

[00:40:59] haven't seen Simon

[00:40:59] for six months,

[00:41:00] why don't we

[00:41:01] arrange something?

[00:41:02] And I'll say, yeah,

[00:41:02] that's a good idea,

[00:41:03] right?

[00:41:03] Then my AI will go

[00:41:04] to Simon and say,

[00:41:05] hey, these guys

[00:41:06] haven't met up for

[00:41:06] a while, why don't

[00:41:07] we set up something

[00:41:09] for them in a couple

[00:41:09] of months' time,

[00:41:10] right?

[00:41:10] We know that they

[00:41:11] both love listening

[00:41:12] to techno music,

[00:41:13] so let's find a

[00:41:14] concert or rave

[00:41:16] somewhere close by

[00:41:17] and set that up

[00:41:18] for you, right?

[00:41:18] So I think, again,

[00:41:19] it's more utilitarian,

[00:41:20] I think.

[00:41:21] I don't think it's going to be like our AIs catching up on behalf of us and then giving each of us humans the lowdown of what was discussed and how Simon's life is going.

[00:41:29] I hope that's not

[00:41:29] going to be the

[00:41:30] case.

[00:41:30] But I think those

[00:41:31] kind of things, I

[00:41:31] think we will see a

[00:41:32] lot more, right?

[00:41:33] And for one, as

[00:41:34] someone who has a

[00:41:34] pretty busy life, I

[00:41:35] think that'd be

[00:41:35] pretty awesome,

[00:41:36] actually.

[00:41:37] But I think from a

[00:41:38] very philosophical

[00:41:38] perspective, you can

[00:41:39] argue that basically

[00:41:41] everything online is

[00:41:42] already not real,

[00:41:43] right?

[00:41:44] Like your Instagram

[00:41:44] profile is not a

[00:41:45] real representation

[00:41:46] of you.

[00:41:46] We present ourselves in the best light possible.

[00:41:49] And I think our

[00:41:49] avatars and all the

[00:41:50] digital content we

[00:41:51] create around

[00:41:52] ourselves will probably

[00:41:52] just be an extension

[00:41:53] of that.

[00:41:54] I think what we'll have to learn, and what I can see the younger generation to some extent are learning, is that this is fiction grounded in reality, right?

[00:42:03] And I usually use

[00:42:04] the example, like,

[00:42:04] when you go to a dinner party, and when your parents went to a dinner party, the same thing happened, just in a different time and age, right?

[00:42:11] You sit down at

[00:42:12] the dinner table and

[00:42:12] you ask them, how's

[00:42:13] it going?

[00:42:13] And people do exactly

[00:42:14] the same thing in

[00:42:15] real life as they

[00:42:15] do on Instagram,

[00:42:16] right?

[00:42:16] It's very few people

[00:42:17] sit down at the

[00:42:18] table and say,

[00:42:18] actually, you know

[00:42:19] what?

[00:42:19] I'm really tired of

[00:42:20] my wife.

[00:42:21] I want a divorce.

[00:42:22] I hate my job.

[00:42:23] Like, most people

[00:42:23] are like, yeah, it's

[00:42:24] going pretty well.

[00:42:24] Like, we project a

[00:42:25] version of ourselves

[00:42:26] to the world.

[00:42:27] And so I think it's

[00:42:28] like this idea of

[00:42:29] projecting yourself is

[00:42:30] not something that

[00:42:30] Instagram has created.

[00:42:31] It's always been the

[00:42:32] case.

[00:42:34] It's amplified,

[00:42:35] perhaps.

[00:42:35] It amplifies it, and

[00:42:35] it makes it more

[00:42:36] concrete in many ways.

[00:42:37] But I think most

[00:42:38] human behavior has

[00:42:39] been the same for

[00:42:40] like thousands,

[00:42:41] thousands of years.

[00:42:42] We just express it

[00:42:42] in a different way.

[00:42:44] So in this future

[00:42:45] where these digital

[00:42:45] humans are photorealistic,

[00:42:46] they've crossed the

[00:42:47] uncanny valley, what

[00:42:49] does that mean for

[00:42:50] individuality?

[00:42:50] Like, will we be

[00:42:51] confused by the fact

[00:42:53] like I can't even

[00:42:54] tell if this is like

[00:42:55] Victor that I'm

[00:42:56] interviewing or you

[00:42:57] delegated your deep

[00:42:58] fake to like come and

[00:42:59] do the interview?

[00:43:00] And it's like

[00:43:00] indiscernible to me

[00:43:02] like what is going to

[00:43:02] happen to transparency

[00:43:03] in that context and

[00:43:05] individuality?

[00:43:05] I think that if you look at text, you have been able to produce text and share it with anyone online for many years now.

[00:43:15] And I think by now

[00:43:16] most of us have some

[00:43:17] sort of critical sense

[00:43:18] that just because

[00:43:19] something exists as text on the internet somewhere does not make it true.

[00:43:24] If you see a tweet from

[00:43:25] some random account

[00:43:26] saying World War 4

[00:43:27] just, you know,

[00:43:28] kicked off or

[00:43:29] whatever, your first

[00:43:30] instinct is going to

[00:43:31] be that's probably not

[00:43:32] true, right?

[00:43:32] You've got to triangulate

[00:43:33] that information with,

[00:43:34] you know, a news source

[00:43:35] or you go through

[00:43:37] whatever, right?

[00:43:38] And I think what's

[00:43:39] going to happen now

[00:43:40] is that we're going

[00:43:42] to have to move from a world in which, in general, if someone has been recorded with a microphone or a camera, most people assume that the mere fact the recording exists means it's true.

[00:43:52] That's not going to

[00:43:53] be the case anymore,

[00:43:54] right?

[00:43:54] And so it'll be even

[00:43:55] more important that

[00:43:56] all of us learn

[00:43:58] how to be literate

[00:44:00] with media.

[00:44:00] We need to look at

[00:44:01] things from different

[00:44:02] angles.

[00:44:02] Who created this

[00:44:02] piece of content?

[00:44:03] When was it created?

[00:44:04] Is this from a

[00:44:05] reputable source?

[00:44:05] And I think

[00:44:06] these technologies

[00:44:07] are developing

[00:44:07] very fast.

[00:44:08] I think it's going to bridge into a world where we just, by definition, believe nothing of what we see online.

[00:44:15] Yeah.

[00:44:16] We presume that

[00:44:16] everything is fiction.

[00:44:17] Everything is a

[00:44:18] Hollywood film,

[00:44:18] right?

[00:44:19] And I think also that we basically go back to saying we can only trust things that happened in front of us, that we saw in real life.

[00:44:30] That doesn't mean

[00:44:31] we can't trust

[00:44:32] anything we read

[00:44:32] or see online.

[00:44:33] We're just going to have to be more critical, remembering that just because something exists does not actually make it true, right?

[00:44:39] And I think it's actually going to be a good thing that we just, by definition, assume that almost everything is fake and work backwards from that.

[00:44:44] And there's a

[00:44:45] couple of ways

[00:44:45] we can work

[00:44:46] backwards from

[00:44:46] that.

[00:44:47] We're working with Adobe and some other tech companies on a standard called C2PA, which is the idea that you fingerprint and watermark content, essentially.

[00:44:57] I think we'll move into a world where content is, by default, verified.

[00:45:01] So when you take a picture with your phone, or make a video in Synthesia, or create an image in Photoshop,

[00:45:06] you choose to

[00:45:07] register that

[00:45:08] piece of content

[00:45:09] in the global

[00:45:09] database of all

[00:45:10] the world's

[00:45:11] content.

[00:45:11] I hate the

[00:45:12] word, but I

[00:45:13] actually think

[00:45:13] a blockchain

[00:45:13] can be a good

[00:45:14] solution here

[00:45:14] because it's

[00:45:15] immutable.

[00:45:16] When you then

[00:45:17] upload it to

[00:45:18] whatever your

[00:45:18] social media

[00:45:19] platform is,

[00:45:19] it will look

[00:45:21] at the content,

[00:45:22] it'll identify

[00:45:22] it in the

[00:45:23] database for

[00:45:23] all the world's

[00:45:24] content and

[00:45:25] say this video

[00:45:25] was created by

[00:45:26] Victor originally

[00:45:27] in 2019,

[00:45:27] it was made

[00:45:28] with Photoshop,

[00:45:29] with Synthesia,

[00:45:30] whatever.

[00:45:30] Here's some

[00:45:31] information around

[00:45:32] it, we know

[00:45:32] where this came

[00:45:33] from originally.

[00:45:34] And that will move us, I think, into an internet where most content is verified. That'll help you evaluate every single piece of content, and we'll then be in a world in which content that is not verified will stick out like a sore thumb.
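The flow Victor describes (fingerprint content when it's created, register its provenance, let platforms look it up on upload) can be sketched in a few lines. This is an illustrative toy, not the actual C2PA protocol: the `REGISTRY` dict, the function names, and the record fields are all invented stand-ins, and real C2PA binds cryptographically signed manifests into the file itself rather than using a global hash lookup.

```python
import hashlib

# Stand-in for the immutable "database of all the world's content"
# mentioned in the conversation (in practice: signed C2PA manifests
# or a ledger, not an in-memory dict).
REGISTRY = {}


def fingerprint(content: bytes) -> str:
    """Derive a stable fingerprint from the raw content bytes."""
    return hashlib.sha256(content).hexdigest()


def register(content: bytes, creator: str, tool: str, year: int) -> str:
    """At creation time, record who made the content and with what tool."""
    fp = fingerprint(content)
    REGISTRY[fp] = {"creator": creator, "tool": tool, "year": year}
    return fp


def verify(content: bytes):
    """Platform-side lookup on upload: provenance record, or None."""
    return REGISTRY.get(fingerprint(content))


video = b"...avatar video bytes..."
register(video, creator="Victor", tool="Synthesia", year=2019)

print(verify(video))       # the provenance record registered above
print(verify(b"unknown"))  # None: unregistered content sticks out
```

Note that an exact-hash lookup like this breaks the moment the file is re-encoded or cropped; real provenance systems pair it with watermarking and perceptual fingerprints for that reason.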

[00:45:50] I think you're

[00:45:50] right.

[00:45:51] We are going

[00:45:51] into a world

[00:45:52] where authenticating

[00:45:53] content will be

[00:45:54] the default and

[00:45:54] we'll have

[00:45:55] provenance for

[00:45:56] most pieces of

[00:45:56] content that are

[00:45:57] created.

[00:45:58] Leaving aside the concerns about the technology,

[00:46:01] what is it

[00:46:01] about the

[00:46:02] potential of

[00:46:03] digital avatars

[00:46:03] that excites

[00:46:04] you most about

[00:46:05] humans wanting to

[00:46:06] interact, live,

[00:46:08] work, and play

[00:46:08] in this future?

[00:46:09] What can go

[00:46:10] right if you

[00:46:11] execute your

[00:46:12] mission correctly?

[00:46:13] I think the

[00:46:14] beautiful thing

[00:46:15] about technology

[00:46:16] is that it

[00:46:17] enables everyone

[00:46:18] to essentially

[00:46:19] have a voice,

[00:46:20] to be able to

[00:46:21] bring their

[00:46:21] ideas to life,

[00:46:23] share their

[00:46:23] knowledge with

[00:46:23] the world.

[00:46:24] The two main vectors there are, of course, distribution, which is that you can share the content once you've created it, and the other one is creation.

[00:46:31] I think we've

[00:46:32] seen in many

[00:46:33] modalities how

[00:46:33] powerful it is

[00:46:34] when you allow

[00:46:36] more people to

[00:46:36] create.

[00:46:37] If you look at

[00:46:38] more recent

[00:46:39] examples,

[00:46:39] just in my

[00:46:39] own life,

[00:46:40] I love music,

[00:46:41] and I've seen

[00:46:42] firsthand how

[00:46:43] the fact that

[00:46:43] we've been able

[00:46:44] to produce

[00:46:44] digital instruments

[00:46:46] and we can

[00:46:46] sample things

[00:46:47] has led to

[00:46:47] new genres

[00:46:48] like electronic

[00:46:49] music,

[00:46:49] house and

[00:46:49] techno,

[00:46:50] for example.

[00:46:50] That would not have been possible with real instruments.

[00:46:53] And more recently, when you see camera technology becoming very accessible, like YouTube and podcasts like the one we're doing right now, those are essentially formats that didn't exist before we invented technologies that massively democratized them.

[00:47:05] And so for me, the promise of all this is, well, what if everyone could be a Spielberg?

[00:47:09] What if any film student can go out and say, I have a great idea, and all they need to realize it is a lot of time and a good idea?

[00:47:18] There'll be

[00:47:19] a whole bunch

[00:47:19] of content

[00:47:21] that's never

[00:47:21] going to be

[00:47:21] watched by

[00:47:22] anyone,

[00:47:22] it's going

[00:47:22] to be crappy

[00:47:22] content,

[00:47:23] but there

[00:47:23] will also

[00:47:24] be a film

[00:47:24] student

[00:47:24] from somewhere

[00:47:25] in some

[00:47:26] small country

[00:47:27] in the world

[00:47:28] that manages

[00:47:29] to produce

[00:47:29] amazing art

[00:47:31] despite not

[00:47:31] being connected

[00:47:32] to Hollywood.

[00:47:32] And I think

[00:47:33] that's really

[00:47:33] the thing

[00:47:33] that excites

[00:47:34] me the most.

[00:47:35] It's that free creativity, culture, and art are such an important part of moving humanity forward,

[00:47:40] of creating

[00:47:42] peace in the

[00:47:42] world,

[00:47:43] bridging all

[00:47:43] the gaps

[00:47:44] that we

[00:47:45] have between

[00:47:45] us.

[00:47:46] And I think

[00:47:47] that's going

[00:47:48] to be a

[00:47:48] massively

[00:47:49] positive thing

[00:47:50] for the

[00:47:50] world.

[00:47:51] We've already

[00:47:51] seen it play

[00:47:52] out in

[00:47:53] many other

[00:47:53] types of

[00:47:54] media,

[00:47:54] and getting

[00:47:55] video there

[00:47:56] as well,

[00:47:56] I think

[00:47:57] is going

[00:47:57] to be

[00:47:57] transformational

[00:47:58] for the

[00:47:58] world.

[00:47:59] Love it.

[00:48:00] Victor, thank you so much for coming on the show. [00:48:07] Victor Riparbelli is the co-founder and CEO of Synthesia.

[00:48:09] And yes, I'm quite sure I spoke with the real Victor, not his digital twin.

[00:48:15] Though,

[00:48:15] in a year

[00:48:16] or two,

[00:48:17] even that

[00:48:17] certainty

[00:48:17] might be

[00:48:18] up for

[00:48:18] debate.

[00:48:20] What fascinates

[00:48:21] me is how

[00:48:21] we've

[00:48:22] inadvertently

[00:48:22] paved the

[00:48:23] way for

[00:48:23] digital humans

[00:48:24] through our

[00:48:25] everyday tech

[00:48:25] compromises.

[00:48:26] I mean,

[00:48:27] think about

[00:48:27] it.

[00:48:27] We've

[00:48:28] grown

[00:48:28] completely

[00:48:29] comfortable

[00:48:29] with grainy

[00:48:30] video calls,

[00:48:31] audio glitches,

[00:48:32] and awkward Zoom delays.

[00:48:37] We're

[00:48:38] already

[00:48:38] operating

[00:48:39] in a

[00:48:39] world

[00:48:39] where

[00:48:39] good

[00:48:40] enough

[00:48:40] video

[00:48:40] quality

[00:48:41] is,

[00:48:41] well,

[00:48:42] you know,

[00:48:42] good enough.

[00:48:43] But what

[00:48:44] Synthesia shows

[00:48:45] us is that

[00:48:45] this isn't

[00:48:46] just about

[00:48:46] making

[00:48:47] believable

[00:48:47] digital

[00:48:48] humans.

[00:48:48] It's

[00:48:49] about

[00:48:49] transforming

[00:48:49] how we

[00:48:50] create

[00:48:50] and share

[00:48:51] ideas

[00:48:51] at scale.

[00:48:52] When I started making videos,

[00:48:54] it meant

[00:48:54] countless

[00:48:55] hours of

[00:48:56] shooting,

[00:48:56] reshooting,

[00:48:57] and painstaking

[00:48:58] editing just

[00:48:59] to get a

[00:48:59] simple message

[00:49:00] across.

[00:49:00] Now we're

[00:49:01] approaching a

[00:49:02] world where

[00:49:02] anyone with

[00:49:03] an idea can

[00:49:04] spin up a

[00:49:04] video presentation

[00:49:05] in minutes

[00:49:06] in any

[00:49:07] language with

[00:49:07] any number

[00:49:08] of perfectly

[00:49:09] delivered takes.

[00:49:10] And that

[00:49:10] power to

[00:49:11] create is

[00:49:12] incredible.

[00:49:12] But it also

[00:49:13] means we're

[00:49:14] racing towards

[00:49:15] a fascinating

[00:49:15] cultural

[00:49:16] crossroads.

[00:49:17] Soon,

[00:49:18] everything we

[00:49:19] see online

[00:49:19] might come

[00:49:20] with its

[00:49:20] own digital

[00:49:21] birth certificate,

[00:49:22] a verified

[00:49:23] chain of

[00:49:23] creation that

[00:49:24] tells us

[00:49:24] exactly where

[00:49:26] it came from

[00:49:26] and how

[00:49:27] it was made.

[00:49:28] It's like

[00:49:28] we're building

[00:49:29] a new

[00:49:29] trust architecture

[00:49:30] for the

[00:49:31] digital age.

[00:49:32] In a world

[00:49:33] where anyone

[00:49:33] can create

[00:49:34] any video

[00:49:35] featuring

[00:49:35] any person

[00:49:36] saying

[00:49:36] anything,

[00:49:37] maybe what

[00:49:38] becomes most

[00:49:39] valuable isn't

[00:49:40] the tech that

[00:49:41] makes it all

[00:49:41] possible,

[00:49:42] but the story

[00:49:43] underneath it all.

[00:49:47] The TED AI

[00:49:48] Show is a part

[00:49:49] of the TED

[00:49:49] Audio Collective

[00:49:50] and is produced

[00:49:51] by TED with

[00:49:52] Cosmic Standard.

[00:49:53] Our producers

[00:49:54] are Dominic

[00:49:55] Girard and

[00:49:56] Alex Higgins.

[00:49:57] Our editor

[00:49:58] is Banban Cheng.

[00:49:59] Our showrunner

[00:50:00] is Ivana Tucker

[00:50:01] and our engineer

[00:50:02] is Asia Pilar Simpson.

[00:50:04] Our researcher

[00:50:05] and fact checker

[00:50:06] is Christian

[00:50:06] Apartha.

[00:50:07] Our technical

[00:50:08] director

[00:50:08] is Jacob Winnick

[00:50:09] and our executive

[00:50:10] producer

[00:50:11] is Eliza Smith.

[00:50:12] And I'm Bilawal Sidhu.

[00:50:14] Don't forget to

[00:50:15] rate and comment

[00:50:15] and I'll see you

[00:50:16] in the next one.