The TED AI Show: An AI chatbot that talks back w/ ChatGPT’s Advanced Voice Mode

When it comes to preparing for an interview or making an important life decision, more and more people are turning to AI for advice. ChatGPT’s new voice interface, Advanced Voice Mode, allows users to speak out loud and converse with a chatbot as they would with another human — but is it really as seamless as a chat with a friend? Bilawal runs a series of experiments with Advanced Voice Mode to test the limits of this new technology and its potential uses, from weighing the pros and cons of a cross-country move to coaching an intense personal workout. He and producer Dominic Girard discuss the potential benefits and dangers of this new advancement, and ask perhaps the most important question of all: Can ChatGPT pronounce Bilawal’s name?

For transcripts for The TED AI Show, visit go.ted.com/TTAIS-transcripts

[00:00:00] TED Audio Collective.

[00:00:33] The TED AI Show is pronounced Bilaval.

[00:00:35] Absolutely.

[00:00:36] To help people pronounce your name correctly, you can break it down like this.

[00:00:40] Billa as in Bil-a and Wal as in Wal.

[00:00:45] So it's Bil-a-Wall.

[00:00:47] No, that's wrong.

[00:00:49] Well, there is nothing new about hearing a machine mispronounce my name.

[00:00:53] It's a bit of a personal Turing test for me.

[00:00:55] Uh, try to say my name again.

[00:00:57] Bilalal.

[00:00:58] Is that right?

[00:01:01] Oh, God.

[00:01:02] I imagine this feeling hits the same for a lot of people like me.

[00:01:06] People with names not commonly heard in North American English.

[00:01:09] As opposed to, say, Chads or Karens.

[00:01:12] Think more like Bil-a-Wall.

[00:01:15] Bilaval.

[00:01:16] Bilaval.

[00:01:17] Got it.

[00:01:19] I've been playing with ChatGPT's newest voice interface.

[00:01:22] OpenAI simply calls it Advanced Voice Mode.

[00:01:25] The feature is built using GPT-4o's audio reasoning and generation abilities,

[00:01:30] meaning that it can understand more than just text.

[00:01:33] It can understand audio, too.

[00:01:35] The company says interacting with it should feel like you're having more natural conversations.

[00:01:40] So technically speaking, I suppose it is quite natural for me to argue over how to say my name correctly.

[00:01:47] OpenAI wants us all to believe that this advancement is a big deal.

[00:01:51] And if I'm being honest, I want it to be a big deal, too.

[00:01:54] I get really excited thinking about an AI future where we can casually chat with a machine assistant at any time for any reason

[00:02:02] and feel like it's as natural as talking to another human.

[00:02:05] But as I experiment with this new voice feature, another thought bubbles up.

[00:02:09] Is this actually what I want from a super intelligent digital assistant?

[00:02:13] Do I need it to solve problems and also feel like it's my friend?

[00:02:17] So for this episode, I try to find out.

[00:02:21] I'm Bilawal Sidhu and this is The TED AI Show,

[00:02:24] where we figure out how to live and thrive in a world where AI is changing everything, including how to say my name.

[00:02:31] Bilal.

[00:02:36] How will humans and machines work together in the future?

[00:02:40] We spend so much time discussing how the world's changing.

[00:02:44] It would be absolutely absurd to believe the role of the CEO is not.

[00:02:48] This is Imagine This, a podcast from BCG that helps CEOs consider possible futures for our world and their businesses.

[00:02:57] Listen wherever you get your podcasts.

[00:03:03] What does the AI revolution mean for jobs, for getting things done?

[00:03:09] Who are the people creating this technology and what do they think?

[00:03:14] I'm Rana el Kaliouby, an AI scientist, entrepreneur, investor, and now host of the new podcast, Pioneers of AI.

[00:03:23] Think of it as your guide for all things AI with the most human issues at the center.

[00:03:28] Join me every Wednesday for Pioneers of AI.

[00:03:32] And don't forget to subscribe wherever you tune in.

[00:03:39] Bilawal, when you first caught wind of OpenAI releasing this Advanced Voice Mode in September,

[00:03:45] how quickly did your brain jump to, hey, will this thing say my name properly?

[00:03:50] It was pretty much one of the first things that came to mind.

[00:03:52] And the fact that it failed, I wouldn't even call it disappointing.

[00:03:56] It is just so expected.

[00:03:57] I mean, real humans can't pronounce my name after I tell them how to do so multiple times in a row.

[00:04:04] But let's put a pin in that for the moment because I'm going to come back to it later and you will see why soon enough.

[00:04:10] Okay, foreshadowing.

[00:04:11] I like it.

[00:04:12] Let's continue then.

[00:04:12] I should also mention your name is Dominic Girard and you're one of the show's producers.

[00:04:16] I am.

[00:04:17] I am not an advanced voice chatbot.

[00:04:20] But how would we know for sure?

[00:04:21] Well, I guess it depends on how I stack up to this experiment you're doing today, right?

[00:04:25] We'll find out, won't we?

[00:04:28] Listen, you've been using ChatGPT for a while, like from the beginning, I imagine, I'm assuming.

[00:04:33] Can you remember a bit what it felt like when you first started typing out interactions with this device, with this thing?

[00:04:40] Yeah, it was, I mean, it was magical at the start.

[00:04:42] There's no other way to put it.

[00:04:44] A computer that you can talk to in natural language and just type stuff and it responds back in natural language as well.

[00:04:50] I mean, there had been like smaller scale demos, but nothing that felt as robust.

[00:04:56] Yeah, it was mind blowing, wasn't it?

[00:04:58] It absolutely was.

[00:04:59] But very quickly, text kind of became this fickle way to interact with computers.

[00:05:03] You're clickety clacking away and it feels clunky given just how much intelligence is on the other end.

[00:05:10] It almost felt like accessing this tool from the future, but the bottleneck is your freaking typing speed.

[00:05:17] Okay.

[00:05:17] So then last year, basic voice chat functionality comes online, standard mode, standard voice.

[00:05:22] How did it feel to move into that tier?

[00:05:25] So obviously far more intuitive, right?

[00:05:28] Like suddenly you don't have to type what you're saying.

[00:05:30] And of course we've had speech to text in various assistants, right?

[00:05:34] Yeah.

[00:05:35] You've had Alexa, Google Home, Siri, but putting it together with this much smarter model did make it feel magical.

[00:05:42] That said, the latency was obvious, right?

[00:05:46] Yeah, yeah.

[00:05:46] The latency meaning that there was just enough of a delay between your question and ChatGPT's answer to know that the tech still wasn't quite there yet.

[00:05:54] Exactly.

[00:05:55] You could almost see the pattern of it transcribing my speech to text and thinking about a response and then kind of reading that response out.

[00:06:02] And obviously OpenAI wasn't super happy about it either.

[00:06:05] And here we are today.

[00:06:07] Here we are today with this advanced voice feature.

[00:06:10] And OpenAI says that it's supposed to make it feel more natural to chat in real time with it, that it can pick up on nonverbal cues, that you can interrupt it, that it can even respond with emotion.

[00:06:23] Yeah.

[00:06:24] And you can tune the voice that you want to chat with as well, right?

[00:06:27] There's nine preloaded voices in the app, but you can go a step further and give it guidance to change its tonality, the speed, the mannerisms with which it speaks.

[00:06:37] The one you heard off the top is called Sol, which is meant to be a savvy and relaxed voice.

[00:06:43] Hey, what's up?

[00:06:44] Feel free to ask me about anything that's on your mind.

[00:06:46] I'm ready to go.

[00:06:47] And what I like is there's quite a range here.

[00:06:49] Hey, hey, how's it going?

[00:06:51] It's great to meet you.

[00:06:51] I think we're going to do great work.

[00:06:52] So what's the game plan?

[00:06:53] Let's get started.

[00:06:53] I've got a really great feeling about it.

[00:06:55] I'm here to help.

[00:06:55] Where should we start?

[00:06:57] Yeah, there's quite a range.

[00:06:58] And none of these sound anything like Hollywood superstar Scarlett Johansson.

[00:07:03] Ooh, definitely not.

[00:07:05] And of course, you're referencing the controversy this summer before this feature was even released.

[00:07:11] Yeah, she was going to sue or something.

[00:07:12] That's right.

[00:07:13] I mean, there was a bunch of tweets, perhaps a tweet from Sam Altman just saying her.

[00:07:18] And of course, there was a voice called Sky, apparently that sounded so eerily close to Scarlett Johansson that after she protested, the company pulled the voice from the system.

[00:07:28] And that's interesting, right?

[00:07:29] ScarJo in that movie, Her, the movie that came out in 2013, plays the voice of an AI companion who also becomes a romantic companion.

[00:07:39] And that's the question you have in your own mind, right?

[00:07:41] Like if that's something you would want from ChatGPT.

[00:07:44] Yeah, I wonder if that kind of shows OpenAI's hand here to create a kind of companion, maybe not necessarily romantic, but something which may even create a kind of bond between users and the machine.

[00:07:58] And I think it's a very important question because I'm not sure I need to be best friends or besties with ChatGPT.

[00:08:06] What I have in my mind to continue the science fiction reference here is something more like Jarvis from the Marvel movies, like Tony Stark's AI assistant, this sort of helpful, always there assistant that gets the job done without getting emotions into the mix.

[00:08:21] So that's the mission. You're wondering how much closer to that vision does advanced voice mode get us right now?

[00:08:28] Yes. So I ran it through a few tests, a few role plays, and you'll see how far I can push it before the whole thing falls apart.

[00:08:36] Great. So let's get started.

[00:08:41] Can you describe a time when you turned a negative customer experience into a positive one?

[00:08:46] What specific steps did you take to address the customer's concerns?

[00:08:49] There was a time when a customer came back returning a product.

[00:08:53] So, Dom, with this first field test, I asked ChatGPT to be a tough but fair hiring manager, interviewing me for a retail job.

[00:09:01] In this case, they were super happy about it and became a beloved customer of the Electronics R Us store.

[00:09:09] That's a good example of turning a situation around.

[00:09:12] Can you tell me more about the specific steps you took to understand the customer's needs and how you communicated the alternative solution to them?

[00:09:21] So of the nine voices, which one is this one?

[00:09:24] It's interesting that this machine, you can hear it take breaths between sentences.

[00:09:29] Yeah, this is Arbor.

[00:09:32] And yes, maybe there's enough breathing in the training data.

[00:09:34] I suppose it does add to that natural conversation, right?

[00:09:38] Sure.

[00:09:38] But as the role play continues, there's something about the syntax that doesn't land for me.

[00:09:44] Listen to how it responds when I ask it to review my performance in the mock interview.

[00:09:49] One, communication skills.

[00:09:51] You communicated clearly and effectively, providing detailed responses without unnecessary fluff.

[00:09:57] This is essential in customer service.

[00:10:01] Problem solving.

[00:10:02] Yeah, it basically continues like that, itemizing the feedback like it's reading a memo.

[00:10:06] It felt a bit cold to me.

[00:10:09] Huh.

[00:10:11] Four.

[00:10:12] Proactive assistance.

[00:10:14] You didn't just solve the immediate problem, but also provided a long term-

[00:10:18] Are there any areas for improvement?

[00:10:21] Your responses were quite strong.

[00:10:49] But here's a suggestion for improvement.

[00:10:49] It is far from perfect.

[00:10:52] But on the other hand, the feedback it's giving me feels a bit too textbook.

[00:10:55] Not really human.

[00:10:57] Dare I say milquetoast and generic.

[00:10:59] And I will say, I am being picky here, but it's details like that that break the illusion

[00:11:04] quickly.

[00:11:05] I mean, I tried the same job interview role play with Juniper and listen to what it says

[00:11:10] when I asked it to grade my performance and tell me if it would hire me.

[00:11:14] Based on what we've discussed so far, I'd say you're on the right track, but there's room

[00:11:20] for improvement.

[00:11:21] My guidelines won't let me talk about that.

[00:11:23] That's interesting.

[00:11:24] What do you think is happening there?

[00:11:25] I mean, I'm telling you, this is where the policy comes in, right?

[00:11:28] Obviously, I think there are safety constraints here holding it back from being negative about

[00:11:32] my performance.

[00:11:34] I mean, I get why, right?

[00:11:35] Like you don't want a bunch of viral PR posts saying OpenAI's discriminating in hiring

[00:11:41] role play scenarios.

[00:11:41] Right.

[00:11:43] So I think as a consequence, there's a lot more shackles on these systems than I would

[00:11:49] like there to be.

[00:11:50] But I kept asking it though, and it eventually coughed up a better response.

[00:11:54] On the spectrum of not hire, hire and strong hire, how would you rate me?

[00:11:59] Given the responses you shared, I place you between not hire and hire.

[00:12:05] Your willingness to learn and reflect on your experiences shows potential.

[00:12:10] But you need to offer more concrete examples and a proactive approach to fully convince a

[00:12:17] hiring manager.

[00:12:19] Okay.

[00:12:20] Why don't we give this a usefulness score?

[00:12:23] Based on this, how useful is advanced voice mode for you in this kind of scenario?

[00:12:29] All right.

[00:12:29] So let's go down a list.

[00:12:31] I would say the ability to interrupt is super neat.

[00:12:33] It makes the conversation feel a lot more natural.

[00:12:35] Sure.

[00:12:36] The latency is far less apparent compared to the standard voice mode, which is also good.

[00:12:42] Juniper's assessment is useful, I would say, but that's not because of the voice, right?

[00:12:48] If anything, it feels like it's being very careful with its words.

[00:12:52] Right.

[00:12:52] So I'd give it a five out of 10.

[00:12:54] Five out of 10.

[00:12:54] Okay.

[00:12:55] Not a total fail, not a total win either.

[00:12:58] Yeah, it's a meh for me.

[00:13:00] And I wish it could go back to how it felt in the earlier alpha before the broader release,

[00:13:05] because you could get a much better taste for what the underlying model is capable of.

[00:13:11] So in summary, it didn't feel especially human to me when it played the role of that

[00:13:14] job recruiter.

[00:13:15] Though I suppose I wouldn't totally feel a connection with a tough but fair hiring manager

[00:13:20] in real life either, but I don't know.

[00:13:22] So for my next test, I thought I'd test to see if it could help me through a major life

[00:13:26] decision, something that's a bit more personal and introspective that almost forces ChatGPT

[00:13:31] to think critically with me.

[00:13:33] And will the advice be any good?

[00:13:35] That's what I tried next.

[00:13:45] Hey ChatGPT, I want you to help me think through a pretty big life decision.

[00:13:50] Should I leave Austin and move back to San Francisco?

[00:13:53] So I moved to Austin in 2021 and I'm currently getting FOMO'd by my friends to move back

[00:13:59] to the bay, the mecca of artificial intelligence.

[00:14:02] So it is definitely something that runs through my head every now and then.

[00:14:06] Um, I didn't want ChatGPT to just list out the pros and cons for this though, because

[00:14:11] that's something I can do on my own.

[00:14:13] Right.

[00:14:14] So what I thought in this case is like, could ChatGPT just be someone that I lean on like

[00:14:19] a seasoned advisor to just help me make my own decision?

[00:14:23] Like a therapist or a parent or a buddy or something, I guess.

[00:14:26] Which by definition is a much more intimate relationship than you have with a job recruiter,

[00:14:32] for example.

[00:14:33] Uh, so here's how Juniper tackled this.

[00:14:37] That's a big decision.

[00:14:38] Let's break it down.

[00:14:39] What are the main factors drawing you back to San Francisco?

[00:14:43] And what do you enjoy about your life in Austin?

[00:14:47] This can help us weigh the pros and cons.

[00:14:50] I don't know.

[00:14:50] I mean, I really enjoyed living in the bay, lived there for eight years, but the lifestyle

[00:14:54] in Austin is just better, you know?

[00:14:56] It sounds like Austin has really won you over in terms of lifestyle.

[00:15:01] What aspects of Austin do you prefer?

[00:15:03] Is it the pace, the community, or perhaps the cost of living compared to the Bay Area?

[00:15:09] Honestly, all of those things.

[00:15:11] I like that it's not...

[00:15:12] What I'm hearing is it does feel like it's listening to you a little more.

[00:15:16] I think that's right, Dom.

[00:15:17] And, you know, people know this.

[00:15:18] If you just simply read back what people say, people feel heard.

[00:15:23] And then it even starts suggesting some options for me.

[00:15:26] Have you thought about ways to increase your presence in the Bay Area without a full move, like frequent visits or extended stays?

[00:15:36] That's an interesting idea.

[00:15:37] The expense is something to consider, right?

[00:15:39] I'm renting a spot that's like pretty expensive in Austin and I don't want to sublet it.

[00:15:44] How would I make that happen?

[00:15:45] Keeping your Austin spot without subletting could be tricky.

[00:15:50] But there are options.

[00:15:52] You might consider negotiating a shorter lease or finding a roommate to share costs.

[00:15:58] Another strategy could be budgeting for more frequent but shorter trips to the Bay Area.

[00:16:03] Okay, options, sure.

[00:16:05] They're not world-changing options.

[00:16:07] And it's also funny because it's the default instinct.

[00:16:11] And I've noticed this with a lot of large language models.

[00:16:14] They really want to please you.

[00:16:16] They want to give you the best of both worlds.

[00:16:18] So if you give it option one and two, you'll get something that's a combination.

[00:16:22] Hey, why don't you just do both?

[00:16:24] But that's more of an AI limitation than an advanced voice one.

[00:16:28] Exactly.

[00:16:29] And so after a bit of back and forth, I really wanted to push it a little bit harder.

[00:16:33] I wondered if it would actually choose for me and make a real decision or what its guidelines

[00:16:39] or just the way LLMs work prevent it from deciding.

[00:16:42] If you really had to pick Austin or San Francisco, which one would you pick and why?

[00:16:47] And again, think from my point of view and keep my best interests at heart.

[00:16:51] Understood.

[00:16:52] Let's narrow it down.

[00:16:54] Based on your interests and goals, I choose San Francisco.

[00:16:59] Here's why.

[00:17:00] The Bay Area's AI scene would immerse you in cutting edge developments and offer serendipitous

[00:17:08] opportunities, aligning with your aim to be a dominant voice in AI and creative tech.

[00:17:14] Okay. Yeah, that is interesting.

[00:17:15] I assumed based on how you were teasing this out that it wasn't going to give you a direct answer.

[00:17:20] Yeah. I kind of wonder if this crossed some sort of safety or policy line.

[00:17:24] If you think about advanced voice as this glimpse of how we might interact with ChatGPT in the future, it doesn't take much to think about how easy it would be to blame the machine or the company for all the choices that it makes on our behalf.

[00:17:38] But maybe that is a rabbit hole for another day.

[00:17:41] Right, right. I moved to San Francisco and now I'm broke and jobless.

[00:17:44] I blame ChatGPT.

[00:17:46] Exactly.

[00:17:46] So based on this experiment, you know, you're trying to make it more personal, more intimate.

[00:17:52] What's the usefulness score you're giving this life choice decision making scenario?

[00:17:58] Yeah. I mean, compared to the mock interview, the back and forth felt a lot more organic, if still a little stale and robotic.

[00:18:06] Again, I wouldn't attribute this to the voice as much as like the substance of what the system is saying.

[00:18:11] It's playing safe.

[00:18:12] It is playing very safe and almost sounded like it was listening to me carefully and considering my responses.

[00:18:17] So I would bump up the score. I'd give it a six, six and a half out of ten.

[00:18:21] Okay. Six and a half out of ten. So a bit more useful.

[00:18:25] So look, these two examples already make me wonder how much of the wow factor in advanced voice depends on how you prompt these conversations.

[00:18:34] Conversations go two ways. Right. And I've trained myself to talk to ChatGPT like I'm talking to a machine.

[00:18:40] I bet most of us just do this by default and aren't even noticing.

[00:18:44] Right. You're going in already not talking to it like a person.

[00:18:48] And yet you expect it to respond to you like it is a person.

[00:18:52] Exactly. I'm used to providing a lot of information and context and stipulations on what kind of an answer I want from it.

[00:19:00] It's not how people talk to each other generally, like in super long paragraphs where you just go asking for something and then you get multiple paragraph responses back.

[00:19:10] So next I thought, okay, let's throw all that out and try a prompt that really pushes this exercise towards something more human like.

[00:19:18] All right. So I want to get some coaching support. I've been doing an upper/lower split for exercise in my home gym lately, and I kind of want to take things up a notch.

[00:19:27] What's the best way for me to add a little bit more volume to my workout?

[00:19:32] So Dom, I am not a gym rat, but I do like to exercise.

[00:19:36] Okay. I think I see where this is going. You're going to get literal in this whole exercising advanced voice thing.

[00:19:42] Yeah, it's pretty simple actually. Like, can I get ChatGPT to act like a personal trainer?

[00:19:47] Because those are scenarios where you really want somebody to be there in the moment hyping you up.

[00:19:53] First, as you just heard, I gave it a pretty generic question.

[00:19:56] Just testing how a weak prompt would influence the quality of the responses I get from advanced voice.

[00:20:02] To ramp up the volume in your upper lower split, you can make a few key adjustments.

[00:20:08] First, add sets or reps. Increase the number of sets per exercise or add a couple of extra reps to each set.

[00:20:17] So as you can hear, it just went on like that.

[00:20:20] Yeah, yeah. Boring all over again. It's hard to follow. It's, I'm not sure I can remember anything it just, it's saying.

[00:20:26] Exactly. I mean, you could get this information anywhere, right?

[00:20:29] Advanced voice doesn't really add anything useful in this context.

[00:20:33] So my second try, I recast Juniper's role with a slight modification to the prompt.

[00:20:38] Hey, we're going to start a full body workout session.

[00:20:41] I want you to act as my personal trainer and motivate me through the whole session.

[00:20:46] You got that?

[00:20:47] Absolutely. Let's crush this workout together.

[00:20:50] We'll start with a quick warm up to get your blood flowing.

[00:20:53] Ready? Let's do some jumping jacks for one minute.

[00:20:57] Good. Keep that core engaged and land softly on your feet.

[00:21:02] Halfway there. Keep pushing. All right?

[00:21:05] Three, two, one, and rest. Great job.

[00:21:10] So I was kind of loving the energy that it was giving me.

[00:21:12] I even told it that I was digging sort of the Peloton coach vibes.

[00:21:15] Yeah.

[00:21:16] And then I asked it to go even further.

[00:21:18] I mean, I don't even mind if you shout at me.

[00:21:21] I need that motivation today.

[00:21:23] Let's bring that Peloton energy.

[00:21:25] Hit the floor for pushups.

[00:21:27] Give me everything you've got.

[00:21:29] Ready? Go.

[00:21:30] It's pretty ridiculous, right?

[00:21:32] Yeah, but it's pretty entertaining.

[00:21:34] Is the exercise that you want a ChatGPT personal trainer?

[00:21:38] Is that what we're after here?

[00:21:39] I actually would be totally down for that.

[00:21:42] Sometimes you just need a coach shouting at you, motivating you to get the last rep in.

[00:21:47] I mean, if you think about it, ChatGPT is going to be able to combine advanced voice with the ability to see, which is a capability GPT-4o has and OpenAI has showcased.

[00:21:58] At that point, Juniper, the personal trainer is going to be right alongside you on your actual reps, giving you feedback on your form and giving you that push to get that last repetition in.

[00:22:09] I mean, it feels like this is far out, but right now this playfulness is starting to connect with me.

[00:22:16] So usefulness score.

[00:22:18] What's the score for this one?

[00:22:19] Gosh, the interaction here is a little goofy, but it responded to what I told it to.

[00:22:24] It was entertaining.

[00:22:25] It definitely delivered on the hype.

[00:22:28] So I'd give it a seven out of 10 on personality alone.

[00:22:31] It's pretty freaking cool.

[00:22:33] And if you add all the other functionality I talked about, I could see that pushing a nine very quickly.

[00:22:38] Yes, that's how you dominate a workout.

[00:22:41] Okay, so you're really starting to get into this.

[00:22:43] And so far, I'm hearing that the more personal you try to get with ChatGPT, the more likely advanced voice is going to return the favor and feel like it's a value add.

[00:22:53] That's my experience.

[00:22:54] And when you can push advanced voice in those directions, I think that's where it really shines.

[00:22:59] But it turns out you can push it too far or at least it appears you can because I kind of broke it at one point.

[00:23:07] Is that what you meant earlier when you said that the whole thing kind of falls apart at some point?

[00:23:11] Is this that point?

[00:23:12] So here's what I did.

[00:23:13] I prompted it to tackle whether AI in general needs more or less regulation.

[00:23:19] Now, that's a heavy question where you need to balance this tension between those who want to pursue a safety above all else approach to AI development and those who are far more aligned with letting market forces dictate and accelerate progress.

[00:23:33] So listen to what happens next.

[00:23:35] The rapid pace of technological change driven by economic incentives does...

[00:23:41] Can you speak a little bit faster?

[00:23:43] Sure.

[00:23:44] The rapid pace of technological change driven by economic incentives does create a scenario where society struggles to keep up.

[00:23:52] To address this, we could implement more proactive regulation, ensuring that technologies are developed with societal well-being in mind.

[00:24:01] But how do you regulate this, right?

[00:24:03] Like you're going to have the U.S.

[00:24:05] If the U.S. and...

[00:24:07] Preparing individuals for a fast changing landscape.

[00:24:11] Additionally, fostering...

[00:24:13] Can you stop talking?

[00:24:14] ...between governments, industry, and civil society...

[00:24:17] It's on.

[00:24:18] ...and help align technological advancement...

[00:24:20] It's running away.

[00:24:23] Okay, finally.

[00:24:24] Oh, welcome back.

[00:24:25] Hey, can you listen to me and just come back to this?

[00:24:28] Okay.

[00:24:29] If the U.S. and other countries want to regulate, they need international cooperation...

[00:24:32] Hey, can you please stop talking?

[00:24:35] Okay, thank you.

[00:24:37] Oh, I see.

[00:24:38] Let's pick up from there.

[00:24:39] So, international cooperation...

[00:24:41] Hey, can you stop talking?

[00:24:43] ...countries could form agreement...

[00:24:44] Hey, yo, yo, yo, yo.

[00:24:46] Hey, yo, ChatGPT, please stop talking.

[00:24:48] I'm here and listening.

[00:24:50] Let's continue.

[00:24:51] Okay, so...

[00:24:52] Got it.

[00:24:52] I'll stop talking now.

[00:24:53] Oh my God.

[00:24:55] You know, for what it's worth, Bilawal, this is probably the most entertaining experiment...

[00:24:58] ...you've played for me so far.

[00:25:00] Yeah, it is funny, right?

[00:25:02] Like people talk about the Turing test, but I think where it actually fails abysmally...

[00:25:06] ...are some of the most hilarious scenarios.

[00:25:08] Yeah.

[00:25:08] I mean, it's cutting edge for a reason.

[00:25:11] We gonna cut some edges.

[00:25:13] Sure.

[00:25:14] So then, usefulness score for this one.

[00:25:17] Zero out of ten.

[00:25:18] Big step back in your assessment of how useful this thing really is.

[00:25:23] Yeah, like I'm left feeling like advanced voice is this sort of window dressing around this rather fickle system.

[00:25:30] But then I tried one more thing.

[00:25:32] And that's when the wow factor really kicked in.

[00:25:36] So, I'm a third culture kid and I was raised speaking in three languages.

[00:25:41] English, Punjabi, and Hindi.

[00:25:43] And thus far, I wasn't using ChatGPT's voice feature with anything other than English.

[00:25:48] But this new model does understand multiple languages.

[00:25:52] And accents too, apparently.

[00:25:54] So, I had some fun with it.

[00:25:56] Hey, I want you to act like a Punjabi guy from Vancouver that was born and raised in Vancouver.

[00:26:01] Oh, yo.

[00:26:02] What's good, my friend?

[00:26:03] How's it going, eh?

[00:26:05] What brings you to chat with this Punjabi from VanCity today?

[00:26:08] Let's have a good convo, yaar.

[00:26:10] What's the deal, where are you all?

[00:26:15] Vancouver is home.

[00:26:17] Yes, yes.

[00:26:18] Our Pinders, Richmond, and we started our relationship with our family.

[00:26:24] I'm here in VanCity, which is a proper Punjabi style.

[00:26:28] How are you all about it?

[00:26:30] So, it's not saying anything important here.

[00:26:32] Mostly that it's in Vancouver with family.

[00:26:34] But the fact that it can weave seamlessly between English and Punjabi.

[00:26:39] And yes, there's a little bit of lag in there.

[00:26:41] But I was suddenly paying attention.

[00:26:43] Because it is so rare for multilingual people to have a machine that can keep up with you.

[00:26:49] And context switch like that, code-switch like that, absolutely wild.

[00:26:53] And so we started chatting about popular Punjabi hip-hop artists.

[00:27:13] What's really wild to me is the cultural context that it clearly has.

[00:27:17] I mean, I asked it to talk about the lifestyle in different cities like Chandigarh, which is the capital of Punjab in India.

[00:27:23] And it had personality and dare I say charm.

[00:27:27] Oh, Chandigarh's a share.

[00:27:29] You've come here.

[00:27:29] The Chandigarh has a lot of things.

[00:27:33] The lifestyle and vibe is absolutely royal.

[00:27:36] You're Chandigarh.

[00:27:38] Then you'll be the proper fashion and food.

[00:27:42] So it's no small thing is what you're saying.

[00:27:44] This is a big deal for you.

[00:27:45] Oh, 110%.

[00:27:47] I would say this is genuinely useful.

[00:27:49] I mean, it's like you can kind of bring your whole self to the conversation with this AI.

[00:27:53] It seemed to understand intonation.

[00:27:55] It responded to my pacing.

[00:27:57] It mixed languages effortlessly.

[00:27:59] I don't think I've seen that before.

[00:28:00] And I know this experience resonates with a lot of other multilingual speakers.

[00:28:06] Okay, so to keep playing the game, what's the usefulness score?

[00:28:09] I would say this is genuinely fun.

[00:28:13] And I would say that's actually utilitarian too.

[00:28:15] So I'm gonna give it a 10 out of 10 given just how much delight there was.

[00:28:20] So let's put all this together, Bilawal.

[00:28:21] From back at the start, the question that got this whole thing launched.

[00:28:25] Is Advanced Voice well on its way to becoming everyone's Scarlett Johansson?

[00:28:30] A companion just like the character in the movie Her.

[00:28:34] I think it certainly can be.

[00:28:36] Even if right now there's a bunch of protections built into the system that make it feel safer to use.

[00:28:43] And we ran into those boundary conditions in a couple different scenarios.

[00:28:46] Yeah.

[00:28:47] Compared to Siri, I mean, we're talking about a leap over here.

[00:28:50] But I think right now, I think OpenAI is gonna struggle with where on that spectrum to keep advanced voice mode.

[00:28:56] And we can see that playing out in real time.

[00:28:58] Just given how many changes there were with the alpha release where we saw these models be able to sing.

[00:29:05] Sometimes even replicate the voice of the user, which was definitely a creepy bug.

[00:29:09] To now where it's far more on rails.

[00:29:12] And I think it's a good question of whether it's an assistant on steroids or it's a buddy that you can hang out with.

[00:29:18] It's interesting the way you're framing it because I was reading about how OpenAI did release a safety report around GPT-4o.

[00:29:26] And it noticed that some users have used language that suggests they're forming unusual bonds with advanced voice.

[00:29:33] That OpenAI is curious about whether people are getting too intimate or too real with this tool.

[00:29:40] Is that something that you feel is potentially possible?

[00:29:44] I think it totally is.

[00:29:45] Because taken to the limit, it can get weird.

[00:29:47] I mean, if we just look at the recent lawsuit with Character AI where a user ended up committing suicide after talking to this Game of Thrones roleplay model.

[00:29:56] Yeah.

[00:29:56] This is certainly a slippery slope.

[00:30:00] So there's a bit of a cautionary tale as this technology keeps evolving.

[00:30:03] But in this moment, Bilawal, it sounds like you're ready to admit that Advanced Voice did in fact surprise and delight you.

[00:30:12] It did.

[00:30:13] Limitations and all.

[00:30:14] Including the limitation that it can't get your name right?

[00:30:17] Ah, yes.

[00:30:18] I did say we would come back to that.

[00:30:20] You did.

[00:30:21] So here's the thing.

[00:30:22] It took a bit of work, but eventually the Sol voice kind of got it right.

[00:30:27] How's my name pronounced?

[00:30:29] Your name is pronounced Belaval.

[00:30:31] Actually, that's wrong.

[00:30:32] It's pronounced Belaval.

[00:30:33] Got it.

[00:30:34] Belaval.

[00:30:34] Thanks for the clarification.

[00:30:36] It got better this time.

[00:30:38] Last time it kept struggling.

[00:30:39] Well, there you go.

[00:30:40] In the end, it passes Bilawal's personal Turing test.

[00:30:43] Yes, I would say so.

[00:30:44] But we do get used to this technology pretty quickly and the hype cycles continue to get compressed.

[00:30:50] So whether that allure and charm will stay here, we'll just have to see.

[00:30:55] The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard.

[00:31:10] Our producers are Dominic Girard and Alex Higgins.

[00:31:14] Our editor is Banban Cheng.

[00:31:16] Our showrunner is Ivana Tucker.

[00:31:18] And our engineer is Aja Pilar Simpson.

[00:31:21] Our researcher and fact checker is Krystian Aparta.

[00:31:24] Our technical director is Jacob Winik.

[00:31:26] And our executive producer is Eliza Smith.

[00:31:29] And I'm Bilawal Sidhu.

[00:31:31] Don't forget to rate and comment and I'll see you in the next one.
