Llama is Meta’s Large Language Model trained on over 15 trillion tokens of publicly available information. It’s available to anyone – from people making custom fan-made entertainment on a smartphone… to, potentially, complex projects that may not have the public’s well-being in mind. So if Llama is such a widely available and powerful product, why is Meta releasing it – for free? Bilawal chats with Meta’s own Vice President of Product, Ragavan Srinivasan, to discuss the pressing questions around Llama’s benefits and risks.
For transcripts for The TED AI Show, visit go.ted.com/TTAIS-transcripts
Learn more about our flagship conference happening this April at attend.ted.com/podcast
Hosted on Acast. See acast.com/privacy for more information.
[00:00:00] TED Audio Collective
[00:00:08] In the mid-2000s, a small TV show called Veronica Mars quietly exploded and became something of a cult hit. Ever since, its fans have begged for a revival.
[00:00:19] Now, I don't know if a reboot is in the cards, but what I do know is that on Facebook you could ask Kristen Bell, the actor who plays that character, what she thinks of the odds.
[00:00:28] I also know you could hop on Instagram and ask Awkwafina to workshop some comedy routines about obsessive fan bases and pop culture.
[00:00:37] And on WhatsApp, I know that pro wrestler turned actor John Cena could try to explain why an old TV show about a young female private investigator remains such a cultural phenomenon for so many.
[00:00:49] And if you're just catching on, I'm talking about using Meta AI, the embedded AI assistant baked into these platforms to goof around for a little bit.
[00:00:57] Now, it wouldn't be the most world-changing use case of AI, but sometimes you just need a little AI-flavored entertainment.
[00:01:04] And Meta is more than happy to entertain you on its social platforms.
[00:01:10] In late September, the company announced that now you can voice chat with its AI assistant instead of just typing.
[00:01:16] And Meta AI can borrow the voice of one of these celebrities, as well as Keegan-Michael Key and Dame Judi Dench.
[00:01:23] Meta also said that their chatbot can now speak multiple languages, and that it's also multimodal in that it has the ability to understand and transform images that you might show it.
[00:01:34] Now, it's all fun and games to be sure, but these types of features are table stakes now.
[00:01:39] And in the quest to offer AI tools to its users as rapidly as possible, Meta is catching up to the competition quickly.
[00:01:46] But these features are just the tip of the iceberg.
[00:01:49] There's a lot more to this tech than just another large language model owned by another big tech company.
[00:01:55] And if Veronica Mars were tasked with investigating what's really going on here, one peek below the surface would reveal a surprising accomplice.
[00:02:03] An open source llama.
[00:02:09] I'm Bilawal Sidhu, and this is The TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything.
[00:02:22] Imagine this.
[00:02:23] In 2030, the CFO of a Fortune 100 company is a bot.
[00:02:28] I'm Paul Michaelman, and on Imagine This we'll be exploring possible futures and the implications they hold for organizations.
[00:02:35] Joining me will be BCG's top experts, as well as my co-host Gene, BCG's conversational Gen AI agent.
[00:02:42] Blending human creativity with AI innovation, this podcast promises an unmatched listening journey.
[00:02:49] Join us on Imagine This from BCG.
[00:02:52] You've probably heard about artificial intelligence and ChatGPT, but do you know the person in charge?
[00:02:58] On our podcast, Good Bad Billionaire, we tell the stories of how the world's billionaires made their money.
[00:03:03] We're telling the story of Sam Altman, the boss of OpenAI, which makes ChatGPT.
[00:03:08] He became a billionaire this year, but his wealth has nothing to do with artificial intelligence.
[00:03:12] He actually got rich investing in other tech startups.
[00:03:15] Listen to Good Bad Billionaire to learn how he did it and whether he's good or bad.
[00:03:19] That's Good Bad Billionaire wherever you get your BBC podcasts.
[00:03:26] Llama is the name that Meta gave its own large language model.
[00:03:29] It's an impressive LLM, trained on over 15 trillion tokens of publicly available information.
[00:03:36] In September of 2024, four sizes of the model were made available, from 1 billion to 90 billion parameters.
[00:03:43] Small enough to run locally on a smartphone or big enough to run the most complex projects, which is kind of wild.
[00:03:50] Wilder still, depending on how you look at it, is Meta's unconventional approach to releasing Llama.
[00:03:56] It's essentially an open source license.
[00:03:58] You can download it right now and tinker with it as you'd like, for free.
[00:04:03] For the most part, big AI companies like Google and OpenAI tend to prefer a closed approach to distributing most of their own LLM-based products.
[00:04:12] And of course, they prefer having people pay for access too.
[00:04:16] But imagine Llama baked into Facebook, Instagram, WhatsApp and more.
[00:04:20] This could revolutionize how we interact online.
[00:04:23] On the flip side, this raises questions about security and misuse.
[00:04:27] We can't always trust that everyone who's using open source tech will do so with the public's well-being in mind.
[00:04:33] So, will the benefits of openness outweigh the risks?
[00:04:37] And what's in it for Meta?
[00:04:39] A good person to ask is Ragavan Srinivasan.
[00:04:42] He's the Vice President of Product at Meta
[00:04:45] and leads the team responsible for developing and releasing the company's herd of Llamas.
[00:04:52] Ragavan, welcome to the show.
[00:04:53] Thank you, Bilawal. Nice to be here.
[00:04:56] So, this fall, Meta announced a new update to Llama, as well as updated features to its own AI assistant called Meta AI.
[00:05:04] And I want to touch on Meta AI with you first for a moment, because I expect that's the tip of the iceberg,
[00:05:10] wherein a lot of people's first experience with an AI assistant is going to be Meta AI.
[00:05:15] Simply because it's embedded in Meta's ubiquitous social platforms, right?
[00:05:18] We're talking Facebook, Instagram, WhatsApp.
[00:05:20] Heck, it's even on the Ray-Ban glasses and the Meta Quest headset.
[00:05:25] So, Ragavan, talk to me like a user first.
[00:05:27] What are the kinds of things you find yourself doing with Meta AI?
[00:05:31] So, when you think about Meta AI, it is this universal assistant that is available to you right at your fingertips.
[00:05:39] And one of the most exciting features that we announced with the Connect launch: Meta AI used to primarily be about text.
[00:05:47] And now, you can talk to Meta AI and Meta AI can see.
[00:05:52] So, we announced voice and vision capabilities that are being rolled out into this product.
[00:05:59] And this becomes really interesting for you because you can now start talking to it.
[00:06:02] You can ask it questions.
[00:06:04] It will respond back to you in the languages that you know how to speak.
[00:06:07] And you can also now start sharing with it images.
[00:06:11] And all of a sudden, now this assistant that's available to you is able to understand this across all of the modalities.
[00:06:17] Are there certain things that you find yourself using on a daily basis?
[00:06:21] I'm kind of curious.
[00:06:22] Knowing that you're in the weeds with the models themselves, what's the stuff that you find yourself going back to?
[00:06:26] Yeah, so the first one on an almost daily basis.
[00:06:29] I have a lot of my friends and family living all over the world.
[00:06:32] And WhatsApp is our lifeblood.
[00:06:35] And with our friends in particular, we are always joking around, passing some memes along.
[00:06:40] And especially, you know, I have a lot of friends from India.
[00:06:42] And now I'm able to pull in Meta AI into one of these threads and have it riff along with us, almost like it's one of our friends.
[00:06:50] And the amount of entertainment that it provides is just incredible.
[00:06:53] And then I have three kids.
[00:06:55] Each one of them needs some help or another with homework.
[00:06:59] And sometimes, you know, calculus, I've forgotten my calculus.
[00:07:04] And so if I need to answer them and they ping me, then I do end up cheating a little bit and asking Meta AI to help me.
[00:07:11] That's very cool.
[00:07:12] I mean, the walking use case is the one I find myself using: when I'm on a long walk and I want to get a quick distillation of a very complicated topic.
[00:07:20] It's such a magical way to get that while you're out and about in the world and ask follow up questions.
[00:07:26] Exactly.
[00:07:27] In my case, oftentimes I end up taking our dog out for a walk.
[00:07:30] And so, you know, on one hand, there's already a leash.
[00:07:33] So if you have to like type and interact with it, it was always hard.
[00:07:35] And now I can just talk to it and it'll just like answer back, which is great.
[00:07:39] I have to imagine this gives a huge competitive advantage to Meta because in this AI race, as everyone's framing it up to be, you know, obviously Meta can now make its AI assistant available to hundreds of millions of users for free.
[00:07:53] And so I have to ask, like, does that footprint of first party distribution literally help your teams learn and iterate?
[00:08:01] Like, is there a data flywheel in here, wherein Meta is in a unique position to harvest fresh data and insights for future AI models?
[00:08:10] The way we think about this is actually twofold.
[00:08:13] When you think about the core foundation models that we train, the Llama models that we train, they often operate on what we call sort of like general knowledge because you want these models to be as general purpose as possible.
[00:08:26] And then over time, as you build these models, we then start working really closely with our product teams for them to be able to take the next generation of these general purpose models and then start to specialize them and fine tune them for their needs.
[00:08:41] And so they end up deploying customized versions of Llama through Meta AI.
[00:08:47] And that's what our end users see.
[00:08:49] So what then ends up happening is when you're out and about walking, you're able to quickly get feedback signals on like, was this translation that I just gave you helpful?
[00:09:01] Was it not?
[00:09:02] And these kinds of small but very effective at scale sort of feedback loops can help us ship a very compelling product experience.
[00:09:11] Do we use that to learn back into the models?
[00:09:14] Not as much as you might imagine, primarily because the core model itself is really focused on this notion of general knowledge.
[00:09:22] However, the product experience improves a lot, because we now have tighter feedback loops to be able to say, okay, this set of conversations in this market probably isn't jibing well with people.
[00:09:37] So we need to improve them in a certain way.
[00:09:39] And we have a different sort of like feedback loop for those models, if that makes sense.
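One minimal way to picture the per-market feedback loop Ragavan describes, with invented market codes and a made-up approval threshold standing in for whatever Meta actually uses:

```python
# Illustrative sketch only: aggregate lightweight "was this helpful?"
# signals per market, then flag markets whose approval rate falls
# below a threshold so the product team can improve them.
from collections import defaultdict

def flag_markets(events: list[tuple[str, bool]], threshold: float = 0.5) -> list[str]:
    """events: (market, was_helpful) pairs; return markets needing work."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [helpful, total]
    for market, helpful in events:
        totals[market][0] += int(helpful)
        totals[market][1] += 1
    return sorted(m for m, (ok, n) in totals.items() if ok / n < threshold)

# Made-up feedback events for two hypothetical markets.
events = [("IN", True), ("IN", True), ("IN", False),
          ("BR", False), ("BR", False), ("BR", True)]
print(flag_markets(events))  # BR's approval rate is 1/3, below threshold
```

At real scale the same idea runs over millions of signals, but the shape of the loop (collect, aggregate, flag, improve) is the same.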
[00:09:42] So it's almost like you've got these general purpose models, but when you get into these task specific use cases, you are able to mine some very interesting signals that help you refine those models for those specific applications.
[00:09:55] Exactly.
[00:09:56] And this is something that, you know, we've done even before Gen AI.
[00:09:59] So if you think about Facebook translations as an example, right?
[00:10:02] Translations are an incredibly powerful feature, even today.
[00:10:06] When a lot of people come to our apps, they don't necessarily speak the language, but they're really interested in understanding what's happening around the world.
[00:10:12] And we have amazing AI systems that are providing these translations.
[00:10:17] But, you know, there's always going to be a translation that may not be exactly colloquially right.
[00:10:21] It may not pick up the latest, you know, again, like bringing up my kids.
[00:10:25] They use language that I'm like trying to figure out, like, what did you actually mean when you use that word?
[00:10:30] Right.
[00:10:30] Are you asking Meta AI to help you translate the latest slang?
[00:10:34] I totally am.
[00:10:36] And I try to get Meta AI to make lame dad jokes.
[00:10:40] Maybe that's the other use case I should have told you about.
[00:10:43] But that's the kind of like cultural zeitgeist that you really want to be able to tap into.
[00:10:48] And that happens by us getting feedback from our users in real time and then improving our products.
[00:10:54] You know, so I have to ask, sort of related to that, because Meta AI is so front and center for everyone to use in the Meta ecosystem.
[00:11:01] How does that change the expectations your team is under to make sure the system can support these like billion user first party use cases versus like this, you know, thousand flowers blooming of third party use cases and interests out there?
[00:11:16] Like to put it more crisply, how do you think about prioritizing these first party and third party use cases for Llama?
[00:11:21] Yeah, that's a great question.
[00:11:24] And as someone that is responsible for sort of the roadmap for Llama, it is something that is very top of mind because, you know, with Llama, we have at the highest level three goals as a team that we try to pursue.
[00:11:39] First and foremost is Meta, as a company, being at the forefront and trying to be the leader in making progress towards AGI.
[00:11:45] And so Llama is our vehicle for making progress towards AGI.
[00:11:49] Now, that's a long term goal.
[00:11:51] It is a long term goal.
[00:11:52] How does meta think about AGI?
[00:11:54] Yeah, good question.
[00:11:55] And I kind of walked myself into that, didn't I?
[00:11:57] So AGI, you know, as people might know, is artificial general intelligence.
[00:12:02] And there isn't really like a set definition for what AGI is across the industry.
[00:12:07] For us at Meta, what we think about are systems and artificially intelligent systems that are able to perform at superhuman level capabilities in helping humans stay more connected and providing more utility and value for them.
[00:12:20] So we imagine this to be a state where humans and these artificial intelligence systems and agents are working closely together to further entertainment, to further social utility, to further economic prosperity.
[00:12:33] So that's sort of like the vision that we have from Meta's perspective.
[00:12:37] And so a big focus for us in the long run as a company and also as a Llama team is how do we keep steadily moving and making progress towards AGI?
[00:12:47] Second goal that we think about is how do we make sure that on an ongoing basis, we deliver the best possible capabilities of the Llama model to the vast user base that is served by Meta?
[00:13:04] And so this is where your question around, okay, how do you make sure that you prioritize what your product team needs really comes into play?
[00:13:11] Because when we charted our path towards AGI, that can be more of a research roadmap.
[00:13:15] But when you then say, okay, what is the milestone that you want to be able to hit?
[00:13:19] Not only do you have a research milestone in your head, you're also starting to think about what are the products that we want to be able to ship across software and hardware?
[00:13:26] And then the third goal that we talked about, as you know, Llama is open source.
[00:13:30] And so a big responsibility for us is also then to understand from the community's perspective, what is it that they're going to need so that we can pack all of these across a set of prioritization criteria that we as a team, as a Llama team, have to think about.
[00:13:45] And then you try and come up with a roadmap.
[00:13:48] So that's sort of the three vectors, if you will, that we think about when we set goals for Llama.
[00:13:53] You know, it's interesting. I mean, there are companies that have raised, you know, maybe a billion dollar seed round to go headfirst towards AGI and some sort of superintelligence.
[00:14:00] Does that dual focus of having this research agenda, but also regularly shipping product, does that create pressure or is it exciting?
[00:14:08] I imagine it's kind of like there are two of these opposing constraints that you need to balance.
[00:14:13] What does that feel like in practice?
[00:14:15] So actually, we don't look at this as opposing constraints in a lot of ways.
[00:14:20] In many ways, it's almost proof points and really important milestones to say, okay, the progress that you're making from a pure research perspective is also delivering value to humanity as you make this progress.
[00:14:34] And so in a lot of ways, we look at this as exactly the tension we need to be able to have.
[00:14:38] Because if you think about how technology has historically evolved and from a product perspective, you start by saying, what is a consumer problem that exists in the world today?
[00:14:49] And can I invent technology to be able to go solve that problem?
[00:14:51] That's sort of the traditional way to do product.
[00:14:53] That's right. Yeah.
[00:14:54] The other way to do product, which is, you know, anytime you have one of these platform shifts, it's like, here comes an amazing new technical capability.
[00:15:02] Does that allow you to solve a problem that you previously were not able to solve?
[00:15:05] Or does it actually open up completely new opportunities for you to go and serve people that you were not able to do before?
[00:15:10] And so with Llama, we actually have the opportunity to be able to do both.
[00:15:15] And so that is the balance that we need to be able to strike.
[00:15:17] So when we think about the next versions of Llama, we think about, okay, Llama used to be only text.
[00:15:23] Now, Llama needs to become multimodal.
[00:15:27] Because as you think about progress towards AGI, these models need to understand and communicate in all of the modalities that we as humans are able to.
[00:15:34] We don't just use text.
[00:15:35] You know, we use images.
[00:15:36] We use videos.
[00:15:37] You're talking now.
[00:15:38] There's audio, right?
[00:15:39] So then you start to say, cool.
[00:15:42] What do our product teams think they need?
[00:15:45] Do they think, like, image understanding is going to be more important than video understanding for some reason?
[00:15:51] So if that's the case, then let's actually figure out how you construct this milestone that balances both what we think we can get in the hands of consumers through our products and make progress towards AGI.
[00:16:00] So that's sort of the balance that you have to do with every release.
[00:16:03] So let's take a step back for a second here.
[00:16:05] Meta AI is built on Meta's own large language model, called Llama.
[00:16:09] Why is it important for Meta to build its own foundational models rather than, say, partnering with another tech company?
[00:16:15] Like, for example, what Apple is doing with OpenAI and Google.
[00:16:18] They seem to be building their own smaller models, but they still seem to be leaning very heavily on third-party providers.
[00:16:24] But Meta is building the full stack.
[00:16:26] Why?
[00:16:27] Really good question.
[00:16:29] First and foremost, if you think about this notion of artificial general intelligence and large language models being the vehicle through which you're going to experience artificial general intelligence,
[00:16:40] you get to make a lot of choices in terms of what goes into these capabilities, what goes into these models, the kinds of capabilities that you actually want to build into them.
[00:16:50] Do you want them to understand multiple languages or do you want them to only understand, say, a handful of languages, just as an example?
[00:16:58] Do you want them to be able to speak to you?
[00:17:01] Do you want them to be able to speak to you using your own dialects?
[00:17:04] So, as you start to think about the kinds of utility that this technology needs to provide, there are a lot of choices that you end up making where, as an organization, as a company like Meta,
[00:17:18] where the surface area of our product base and our user base is so vast, you have to have the ability to really shape how this technology moves
[00:17:27] and also the rate at which the technology moves, as well as the prioritization of a lot of these capabilities.
[00:17:33] So, that's sort of the first reason.
[00:17:35] The second reason is when you think about how technology ecosystems generally have evolved,
[00:17:43] they start with maybe one or two closed proprietary vertically integrated ecosystems
[00:17:51] and then there usually emerges an open alternative, which over time ends up being built by the community, for the community,
[00:18:00] and then the default ecosystem that a lot of people end up adopting, right?
[00:18:05] So, we saw this even with the web.
[00:18:06] So, you had like, you know, closed versions of browsers and then you had open source browsers that came up and everyone started using the open source browsers.
[00:18:13] And this technology is really powerful and it's something that is going to be very valuable to our consumers and we need to be able to shape how it evolves.
[00:18:22] And we have the strong belief that open ecosystems eventually end up winning and you have an opportunity now to seed and nurture an open ecosystem.
[00:18:30] So then we said, well, we have to invest in this, and we have to do this as an open ecosystem.
[00:18:33] So, that's why the choice of us investing in Llama, building Llama, and then open sourcing it basically became a no-brainer for us.
[00:18:41] Can we talk a little bit about the open sourcing of Llama there, right?
[00:18:44] Because as I understand it, the first rendition of Llama, the code, in classic internet fashion, was leaked online, via BitTorrent no less.
[00:18:53] And then the next release from Meta was formally released as open source.
[00:18:56] I'm kind of curious, like how much of a debate did that spark within Meta?
[00:19:00] Like clearly there was intention to make the initial version of Llama available to researchers, but were you all already contemplating this broader open release?
[00:19:08] Or was there a moment where the rubber met the road and you saw how people were responding to it that you went all in on this open source approach?
[00:19:18] Yeah. So, I have to preface this by saying I wasn't here when some of these decisions happened, but I obviously spent a lot of time talking to the people that were involved in this.
[00:19:28] And so, let me at least walk you through the philosophy and the thinking behind this.
[00:19:32] Meta has had a long history of doing AI research.
[00:19:35] And so, what we wanted to do was to say, okay, in traditional fashion, let's make sure that the research community has access to these models.
[00:19:43] And here's the paper. Let's see what they do with it.
[00:19:45] But we did not anticipate that not only was there like an amazing amount of interest from the research community, but the developer community was like, oh my gosh, this thing is amazing.
[00:19:55] Like, why is this only available to researchers? We want to be able to build on top of that.
[00:19:59] And so, that feedback, I think, is not something that you could have predicted.
[00:20:03] But as soon as we saw that, what we were able to do then to say is like, okay, there is a clear opportunity here and a really important role for Meta to be able to play here.
[00:20:11] Which is why the second version of Llama, Llama 2, we ended up offering under an open source license, and the rest is history.
[00:20:17] There is also what I call a form factor side.
[00:20:21] If you think about how these large language models have evolved, they have primarily been driven by companies that have large cloud-based infrastructures and cloud-based businesses.
[00:20:31] As a result of that, a lot of these models are available behind APIs, which works for a vast number of use cases.
[00:20:40] Obviously, you know, Llama is also available via the cloud, so you can use that.
[00:20:43] But what makes Meta really different is most of our users are using mobile phones.
[00:20:51] A lot of our developers want to be able to build mobile apps.
[00:20:54] And mobile apps means sometimes you're going to be in spotty internet connection.
[00:20:58] So you need a solution that works even if you're not able to talk to the cloud.
[00:21:03] Sometimes you're working on really private data that is only on your phone and you don't really want it to leave your phone.
[00:21:08] You don't want to send that to the cloud, even if your API provider says, I'm not going to look at anything.
[00:21:12] You don't want that data to go to the cloud, right?
[00:21:15] And so a big piece of our strategy this year was also to say, how do we meet developers where they are?
[00:21:21] So that's why, if you think about what we did with Llama this year, we built the 405B-class model, which is the largest, most capable model that you can use behind cloud APIs.
[00:21:33] And then we also released the 8B and the 70B initially, and then the 11B and 90B models, which are sort of the daily workhorse type of like models that most developers would want to use for their production applications.
[00:21:45] And then we also released the 1B and 3B parameter models.
[00:21:50] So these models are super lightweight, still pack a massive punch, and can answer a lot of use cases that you as a developer might want to do for offline users, for private use, on your phone or on your laptop.
[00:22:02] And so that's sort of been our philosophy for how we thought about Llama.
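A rough way to picture choosing among the Llama sizes mentioned above. The 2 GB-per-billion-parameters figure assumes unquantized 16-bit weights and ignores activation and context memory, so treat this as an illustration, not deployment guidance:

```python
# Hypothetical sketch: pick the largest Llama size whose weights fit a
# device's memory budget. Real footprints vary with quantization,
# context length, and runtime overhead; 2 GB/billion params assumes fp16.
LLAMA_SIZES_B = [1, 3, 8, 11, 70, 90, 405]  # billions of parameters
GB_PER_BILLION_PARAMS = 2.0                  # assumption: fp16 weights only

def pick_model(available_gb: float) -> str:
    """Return the largest size whose raw weights fit in `available_gb`."""
    fitting = [s for s in LLAMA_SIZES_B
               if s * GB_PER_BILLION_PARAMS <= available_gb]
    if not fitting:
        raise ValueError("no model fits; consider quantization or the cloud")
    return f"{max(fitting)}B"

print(pick_model(6))    # a phone-class budget picks a small on-device model
print(pick_model(160))  # a single-server budget picks a mid-size workhorse
```

Quantizing to 4-bit roughly quarters the footprint, which is how the 1B and 3B models end up running comfortably on phones and in browsers.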
[00:22:05] I think this like plurality of model sizes bit is very interesting, especially how it's evolving in the ecosystem.
[00:22:11] You're totally right. Most of the chat apps that people might be using, obviously it's like taxing some beefy NVIDIA GPU in the cloud, you know, giving your answer back.
[00:22:20] But these smaller models that you alluded to, the 1B and 3B parameter models, like you can run it on any phone, in your freaking browser for crying out loud.
[00:22:28] Like that's just kind of mind-blowing.
[00:22:30] Like are there interesting use cases you're seeing there around summarization and things like that, that it's like really making a dent in?
[00:22:38] It's so interesting you mentioned this because it's exactly the use case that we talked about with the team when we were talking about building these 1B and 3B models.
[00:22:46] So we've actually seen developers running the 3B 100% locally on a browser.
[00:22:51] We've also seen a developer who connected their iMessages on their laptop to Llama 3 and was basically prompting the model to answer questions about anything in their texts.
[00:23:02] Perfect use case. Perfect use case because it's private, it's secure, nothing has to ever leave your laptop, and you can actually build this when you're on an airplane, in airplane mode, right?
[00:23:11] Totally.
[00:23:12] These kinds of use cases are exactly what we imagined developers are going to build, and this is just the first two days.
[00:23:17] So I'm really excited to see what people end up doing with these things.
[00:23:21] What's the benefit in that situation of going with something like Llama?
[00:23:24] Is it that they can like fine-tune Llama models on their private code repository?
[00:23:29] Why use Llama in this scenario versus like some other third-party offering?
[00:23:33] Yeah, really good question, and it's exactly what you said.
[00:23:36] So oftentimes software and your code repository is also one of your biggest pieces of intellectual property.
[00:23:43] And so you want to have a lot of control.
[00:23:45] You want to have a lot of security and privacy processes around this.
[00:23:48] And so being able to bring into your enterprise and to your like on-prem sort of deployments a model like Llama is something that you can't do when you're just dealing with like an external API provider.
[00:24:00] And so for a lot of industry verticals where code is very, very important and they may also be regulated, Llama is perhaps the best choice for them to be able to deploy this on their code base.
[00:24:10] So that's one of the biggest reasons.
[00:24:12] Awesome.
[00:24:12] So that was the coding example.
[00:24:13] The second example is something that we call retrieval-augmented generation, or RAG.
[00:24:21] And so this one is really around this notion of a lot of enterprises have what they call enterprise knowledge bases.
[00:24:30] So this is, you know.
[00:24:31] Are we talking about SharePoint?
[00:24:33] We're kind of talking about SharePoint.
[00:24:34] You're talking about like, you know, internal portals, right?
[00:24:37] Like all of these deployments where people write their docs and then there's probably some version of a wiki internally, which is where you have to go and look up your like benefits information.
[00:24:47] When you can take like, you know, vacation, what's a vacation policy?
[00:24:49] So pretty much any enterprise with a large number of employees has one of these, has multiple of these.
[00:24:58] And then if you're a new employee who's coming in, your onboarding process typically involves at least a week's worth of training for you to know like where to go and ask for information, right?
[00:25:08] And so now with these large language models, you can basically just use them to understand the knowledge that is dispersed across all of these different, you know, installations, SharePoint, wikis, and whatnot.
[00:25:19] And then you have essentially a chatbot that is available for you that is trained on just your enterprise's knowledge and is able to help answer any questions that your employees might have.
[00:25:28] So for an employee productivity perspective, this is a huge boon.
[00:25:32] And we have a version internally that we call MetaMate, deployed again on top of Llama, that is essentially a daily driver for pretty much all of our employees, whether they're writing code or trying to find out when they can take their next vacation.
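The RAG pattern described here can be sketched minimally. This illustration uses keyword overlap in place of a real embedding search, and the knowledge-base snippets and helper names are invented, not anything from Meta's stack:

```python
# Minimal RAG sketch: retrieve the most relevant internal snippets,
# then stuff them into the prompt ahead of the user's question.
# The generation call itself is omitted; only retrieval is shown.
def score(query: str, doc: str) -> int:
    """Toy retriever: count query words that appear in the document."""
    doc_words = set(doc.lower().split())
    return sum(1 for w in query.lower().split() if w in doc_words)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k best-matching snippets from the knowledge base."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the context-plus-question prompt an LLM would receive."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

kb = [  # invented enterprise wiki snippets
    "Vacation policy: employees accrue 20 days of paid vacation per year.",
    "Benefits portal: dental and vision enrollment opens each November.",
    "Expense reports are due within 30 days of travel.",
]
print(build_prompt("how many vacation days do employees get", kb))
```

Production systems swap the keyword scorer for vector embeddings over SharePoint, wikis, and the like, but the retrieve-then-prompt shape is the same.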
[00:25:47] Many enterprises have data and data strategies and over the years have essentially accumulated a bunch of really proprietary high-value data assets that they store.
[00:26:01] They're not able to take advantage of this by just tapping into an existing large language model because these large language models have been trained on what we call the consumer internet.
[00:26:10] And so this is the general knowledge of the world that you said.
[00:26:13] Exactly.
[00:26:14] So they can give you answers that are like sort of based on, again, like I said, like Wikipedia and the consumer internet, but not really like on your proprietary information.
[00:26:20] So let's assume you're doing drug discovery research, right?
[00:26:25] And so you have a lot of like really high-value proprietary data set.
[00:26:29] You now have to somehow train this model to also understand concepts within drug discovery.
[00:26:35] What does it mean to understand a protein sequence?
[00:26:39] What does it mean to understand like what isotopes are, right?
[00:26:42] These are concepts that the model may understand at a very high level, but it's probably not going to be as localized.
[00:26:48] And so then what you need to do is to go through this process of what we call fine-tuning, where you take a general-purpose model and then you give it data that is your proprietary data to teach it these concepts so that it can start to perform the tasks that you need for your application.
[00:27:05] You still want the benefit of this large general-purpose model, but you also want to understand these concepts.
[00:27:10] That's something that is now going to require you to tweak the weights of the model.
[00:27:15] Weights are basically how the model makes decisions on like, you know, which answers to give you, for lack of a better way to describe it.
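The "weights decide which answers you get" idea can be sketched with a toy example. To be clear, this is purely illustrative and nothing from Llama itself: a single weighted sum stands in for billions of parameters.

```python
# Toy illustration (not Llama's actual internals): a "model" here is one
# weighted sum plus a threshold. The weights are the numbers that
# determine which answer comes back; tweaking them changes the decision.

def tiny_model(features, weights, bias):
    """Score the input, then decide based on the sign of the score."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return "yes" if score > 0 else "no"

# With these weights the model answers "no" for this input...
print(tiny_model([1.0, 1.0], [0.2, -1.0], 0.0))  # no

# ...and after "tweaking the weights", the same input gets "yes".
print(tiny_model([1.0, 1.0], [0.2, 0.5], 0.0))   # yes
```

Fine-tuning does this at the scale of billions of weights, with the nudges computed from your proprietary data rather than set by hand.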
[00:27:22] And so how do you then like tweak the weights of this model?
[00:27:25] For that, you now need access to sort of the internal guts and the engines that you don't typically get if you don't have access to the weights of the teacher model, right?
[00:27:38] And so that's a really, really important use case where we think about, okay, now you have the power of a 405b model and that can now teach and distill a very special purpose customized model for you.
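The teacher-to-student distillation described here can be sketched with toy numbers. Everything below is a stand-in: three fake "answers" with hand-picked logits, not a real Llama pipeline, which would match a small model's output distribution to the large model's over actual text.

```python
import math

# Toy distillation sketch (stand-in numbers, not a real Llama pipeline):
# the big "teacher" emits soft probabilities over answers, and the small
# "student" nudges its own logits to imitate that distribution.

def softmax(logits, temperature=1.0):
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [4.0, 1.0, 0.5]                    # teacher prefers answer 0
target = softmax(teacher_logits, temperature=2.0)   # softened "soft labels"

student_logits = [0.0, 0.0, 0.0]                    # student starts indifferent
lr = 1.0
for _ in range(200):
    probs = softmax(student_logits)
    # Gradient of the cross-entropy between student probs and the soft
    # labels, with respect to the student logits, is (probs - target).
    student_logits = [s - lr * (p - t)
                      for s, p, t in zip(student_logits, probs, target)]

best = max(range(3), key=lambda i: softmax(student_logits)[i])
print("student's top answer:", best)  # matches the teacher's (answer 0)
```

The temperature softens the teacher's distribution so the student learns not just the top answer but how confident the teacher is about the alternatives.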
[00:27:51] So that's one example.
[00:27:53] Another example is what we call flexibility, right?
[00:27:56] So as developers, especially if you're, you know, scaling out your applications, you soon get to a stage where there are some classes of prompts that you just want a quick response.
[00:28:08] What's the weather today?
[00:28:08] Right?
[00:28:09] Like stuff like that.
[00:28:10] Yeah.
[00:28:10] Stuff that doesn't require a lot of thinking.
[00:28:12] Let's put it that way.
[00:28:13] Yeah.
[00:28:13] Exactly.
[00:28:14] And then there's probably a class of prompts where the user is asking a really hard question, but you want to tap in to the full capacity of this really powerful model.
[00:28:24] And having the ability to then be able to pick and choose how you route your queries, which model you're going to target, and then maybe even have this large model distill a version of a model that is just for you.
[00:28:36] So that becomes like your workhorse, you know, daily driver for your use cases.
[00:28:41] That kind of flexibility requires you again to have access to the model weights, to be able to have access to an entire ecosystem that is providing the tooling and the infrastructure layer.
[00:28:50] And you just can't get to that level of control without actually having an open source ecosystem to build on top of.
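The pick-the-right-model-per-prompt flexibility described above can be sketched as a simple dispatcher. The model names and the "hardness" heuristic are hypothetical placeholders; a production router might use a small classifier model instead of keyword rules.

```python
# Hypothetical router sketch: cheap, easy prompts go to a small fast
# model, hard ones to the large one. The names are placeholders, not
# real endpoints, and the heuristic is deliberately crude.

SMALL_MODEL = "llama-small"
LARGE_MODEL = "llama-405b"

def looks_hard(prompt: str) -> bool:
    """Crude proxy for 'requires a lot of thinking'."""
    hard_markers = ("prove", "analyze", "compare", "design", "explain why")
    return (len(prompt.split()) > 20
            or any(m in prompt.lower() for m in hard_markers))

def route(prompt: str) -> str:
    return LARGE_MODEL if looks_hard(prompt) else SMALL_MODEL

print(route("What's the weather today?"))                         # llama-small
print(route("Explain why transformers scale better than RNNs."))  # llama-405b
```

This is the flexibility open weights buy you: because both models are yours to host, the router, the small model, and the large model can all live behind one interface you control.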
[00:28:56] You know, this idea of the really large, like 405-billion-parameter models, one-trillion-parameter models, like, you know, essentially teaching and distilling their wisdom into smaller models for the tasks you're going to hammer it with is very, very exciting.
[00:29:28] And it perhaps brings up the question, which has been a common criticism of Meta's open source efforts, which is the "is it really open source" debate, right?
[00:29:37] Whereas perhaps there's a spectrum of open source and it seems what Meta is offering right now are open weights models.
[00:29:43] Y'all opened up a bunch of restrictions too recently, enabling you to use, you know, these bigger models as teacher models.
[00:29:50] What does Meta mean by open source in this case?
[00:29:53] Look, this is obviously an active discussion, active conversation in the broader community.
[00:30:01] I think this notion and this idea of open source also has gone through its own sort of like evolutions and changes, right?
[00:30:08] And so I think we are at a moment right now where there's a completely new type of technology.
[00:30:13] Open source in many ways was defined with the notion of software in mind.
[00:30:18] These large language models and the systems around them are a composition of software, data, and this other entity or artifact that's basically just a bunch of weights.
[00:30:30] And so I think one of the things that we as a community have to now figure out is: what does open source in this new world actually mean?
[00:30:38] And I think over time we're going to figure these things out.
[00:30:41] We also have different flavors of this that have happened in the past.
[00:30:46] When you think about content as an example, we tried to apply this notion of like open source to content, but it didn't quite fit.
[00:30:52] And so then we came up with creative commons, right?
[00:30:54] So creative commons, now if you think about this, is equivalent to open source, but it is very in tune with the notion that content is just inherently different from software.
[00:31:06] And so you probably need a different way to say, okay, this is for the community.
[00:31:10] This is by the community.
[00:31:11] Here's what you can use it for.
[00:31:12] Here's what if you use it, here's how you attribute it.
[00:31:15] So my expectation is that as a community, we're going to come together again, as we always do and say, okay, this thing is different.
[00:31:21] So for this different thing, what are the values?
[00:31:25] What are the principles that we want to be able to protect?
[00:31:27] And what are the definitions that allow us to do that for right now, at least the ability for you to have access to the model weights?
[00:31:33] And then our definition of things like the Llama Stack that gives you a very open API that allows you to expand.
[00:31:39] And as these conversations happen, you can believe that we're going to be at the middle of this and we're going to try and shape and evolve this as it goes forward.
[00:31:46] It's a tremendous, like I would say it's like this public commons that Meta is giving to the world.
[00:31:51] But of course, there's a cost associated with that.
[00:31:53] And while Meta hasn't disclosed any costs, it's been reported that the number of GPUs that y'all have used for AI development and training, like in post-training and fine-tuning, all that adds up to certainly hundreds of millions of dollars of investment.
[00:32:07] What is the amount of computation power here even look like?
[00:32:10] Like, can you paint a picture for us of what the back end of a project this size physically looks like?
[00:32:16] I'm imagining a Borg cube in some metadata center.
[00:32:20] I think we've talked about some of the numbers in the past in terms of the compute capacity that it takes to train these kinds of models.
[00:32:28] The thing I want to maybe like back up and talk about is when you talk about training these large language models, the pre-training stage, which is really the stage at which you pack a lot of knowledge in and produce the first version of this model, which is more generalized but not tuned for that.
[00:32:45] That's the stage that's the most expensive part of it.
[00:32:48] After that, yes, you still require GPUs, but it's nowhere in comparison to the scale of like compute infrastructure that you need for the pre-training stage.
[00:32:57] And so that's why you would see only very few organizations in the world who end up doing pre-training.
[00:33:04] And Meta is the only organization that pre-trains and open sources this.
[00:33:09] And then the reason I want to hit this is because I think it's really important to say: very few companies have the ability to do this, and all but one are going to keep it closed, but there is one that is actually trying to do this and open-source it.
[00:33:21] I think that's a pretty big deal.
[00:33:23] So when we think about the investment that we put behind and how large these like compute infrastructures are, that calculus also goes into sort of like why we think this is the right thing to do.
[00:33:32] Because not only are we building this for Meta, but also building this in many ways for open sourcing and having the community access.
[00:33:37] So now to your question of, like, what do these data centers look like?
[00:33:43] They're massive.
[00:33:44] I don't even know how to explain them because you're now looking at, you know, tens of thousands of GPUs.
[00:33:52] And then Llama 3, I believe we trained on the order of tens of thousands of GPUs.
[00:33:57] 24,000 H100s is the stat that I have in front of me.
[00:34:01] Great.
[00:34:02] Llama 4, likely going to be, you know, an order of magnitude more, right?
[00:34:09] So that's pretty large.
[00:34:10] So if you think about 24,000, I know you have like 100,000-ish or maybe even more than 100,000 GPUs.
[00:34:17] Do they even fit into one data center?
[00:34:19] Because you really do want them like to be as close as possible to help with the training efficiency.
[00:34:24] Have really fast interconnects, right?
[00:34:26] Exactly.
[00:34:26] And now this is not just a hardware problem.
[00:34:30] It's a physical infrastructure problem because you now have to construct data centers.
[00:34:34] You now have to find a way to power them.
[00:34:37] And then you have to find a way to cool them.
[00:34:39] Exactly.
[00:34:39] Yeah.
[00:34:40] But I have to ask, with this level of investment, right?
[00:34:42] And it is like admirable.
[00:34:44] It's also, you're totally right.
[00:34:45] You're the only lab that's really open sourcing these very large expensive training runs.
[00:34:51] How does a company measure the ROI, the return on investment for these open source AI initiatives?
[00:34:56] So we fundamentally just believe that open sourcing is good for developers.
[00:35:02] Obviously, we've covered that in spades.
[00:35:03] Open sourcing is also good for Llama and for Meta because we get a lot of contributions back.
[00:35:08] The amount of, just from a hardware efficiency perspective, being able to not just train these models.
[00:35:14] Now you actually have to run inference on them, which is how you deploy these models.
[00:35:18] The optimizations that you have to do for various types of hardware, there's a huge amount of community contribution that comes in as part of that.
[00:35:26] And then there's an entire tool chain that builds on top of this, which means if you have to now find a way to connect Llama to some obscure database or even for something that Meta is going to need,
[00:35:38] there's probably someone out there in the community that also has a similar need.
[00:35:41] And because Llama is extensible and Llama is open, they've already built an implementation, which means we can just bring that in-house, right?
[00:35:46] So this isn't like only altruistic.
[00:35:49] We know that once you open source this kind of technology, the benefits will accrue for the community and for Meta.
[00:35:54] There's an interesting memo that leaked last year from a Google employee that basically suggested Llama was eating their lunch.
[00:36:01] It said like, Google has no moat, neither does OpenAI.
[00:36:05] And, you know, I'm kind of curious in this world, some of your competitors are obviously seeing the gains you're making with open source
[00:36:12] and are starting to selectively release smaller open source models.
[00:36:15] Like Google with Gemma, for example.
[00:36:18] What does the future of this AI race look like to you?
[00:36:21] Is it simply a race to the bottom?
[00:36:23] I think the way I think about this is you can either fully commit to open source or not.
[00:36:31] And if you fully commit to open source, then that becomes a set of choices that you end up making that inform everything from like, you know, your data strategy to your infrastructure strategy to your release strategy.
[00:36:41] But if you're like, I kind of want to dip my toes a little bit in the water, but I don't really want to get into the water.
[00:36:47] Then you're neither here nor there.
[00:36:49] And you end up not actually playing the game that you claim to be playing.
[00:36:54] You're just like trying to somehow say, well, look, I also have like a token open source like offering here.
[00:36:58] Right.
[00:36:58] The important thing is a lot of the community sees through that because eventually with open source software and with open source like systems, you want the confidence that this is going to be an enduring commitment.
[00:37:12] Maybe you're like spending your nights and weekends as a hobbyist building an open source tool that builds on top of Llama.
[00:37:19] Right.
[00:37:19] You want to know that Llama is going to be around for a while.
[00:37:22] So I think that type of like commitment is hard when it's not core to your strategy.
[00:37:29] So I do think that from a cloud based perspective, you've already seen this.
[00:37:34] The impact of Llama has meant that the cost of inference just keeps getting slashed because it's really hard to compete.
[00:37:42] When you have a high quality model like Llama that is open source and it's like readily available, then what do you do?
[00:37:49] Right. And so our goal with Llama, as I said right at the beginning, is we want to be able to get to AGI.
[00:37:54] We want Llama to be the best and Llama is going to stay open.
[00:37:57] And so that's like our long term strategy.
[00:38:00] What other competitors do with that, I think, is a question that's best asked to them.
[00:38:05] Now, embracing open source creates a situation where you have a distribution model with no centralized authority.
[00:38:11] How much do you think embracing open source is about sort of being a step ahead of any potential regulatory oversight that might come down the road?
[00:38:19] Obviously, this is something like governments across the world are grappling with.
[00:38:22] The UN just released their report.
[00:38:24] Doesn't open source make this murkier to oversee and administer?
[00:38:28] In some ways, it's actually the opposite because there isn't really, again, like anyone else who's doing open sourcing at our scale,
[00:38:36] which then means if you're a government and you're now thinking about, okay, how should I think about my own national LLM, my sovereign LLM needs?
[00:38:47] Who do I go talk to?
[00:38:49] I can go talk to all of these people who are like building proprietary models, but maybe I want to control this.
[00:38:53] And historically, I've deployed Linux in-house.
[00:38:56] So I understand what it means to work in an open source community.
[00:38:59] Who's the only player that is committed long term to building this in an open source manner?
[00:39:05] So you actually end up having really interesting and important conversations with Meta because we are trying to be responsible stewards of Llama as an open source project.
[00:39:13] So in a lot of ways, I actually feel like us open sourcing Llama puts us in a position where we're able to educate policymakers.
[00:39:20] We are able to engage with them and help them understand the value of why technology like this should continue to stay open and not just be proprietary.
[00:39:29] Can you just distill down sort of the various measures that are in place to prevent the misuse of models like Llama?
[00:39:36] There are probably three dimensions in which we think about safety.
[00:39:40] The first dimension is really around the choice and the control that we want to be able to give developers to do the right thing.
[00:39:50] The second dimension is safety is a system problem and it's not just a model problem.
[00:39:55] And so we approach it from that lens.
[00:39:57] And then the third one for us is safety is an end-to-end and an ongoing process.
[00:40:03] So we start all the way from, you know, when you're starting to think about what is Llama 4 going to look like?
[00:40:08] It's at that stage of planning all the way through the development process of the model to its integration into our products to when it finally ships in our consumer products.
[00:40:19] So this is one of those things that you don't just bolt on at the end.
[00:40:22] You kind of have to be like very holistically thinking about this.
[00:40:25] Maybe I'll zoom in on the second one because I think that's probably the most critical choice that we have made here,
[00:40:31] which is this notion of saying these large language models are powerful.
[00:40:36] And so trying to pack all of that safety into just the model is just going to be incredibly difficult.
[00:40:42] And it's going to make the model very hard to be flexible and steerable for the types of use cases that you want developers to be able to do.
[00:40:49] So instead, we bake a fair amount of security and safety into the core model itself.
[00:40:54] But then we also release what we call Llama Guard systems.
[00:40:58] There are a bunch of them.
[00:41:00] I'll pick one.
[00:41:01] There's an input and an output Llama Guard.
[00:41:03] Let's say you're a developer and you're building an app that's aimed at college tutoring, right?
[00:41:12] You want the model to be able to provide a certain type of responses because you're dealing with like young people.
[00:41:18] And you want the model to be able to address that.
[00:41:20] Let's say you're another developer who's building a college dating app.
[00:41:25] Then you want the tone and the responses of the models to be able to serve that need as well, right?
[00:41:30] You can't bake all of this into just the model and say, okay, you're going to do this.
[00:41:33] So instead, what we do is to actually give you these input and output filters so that based on your use case,
[00:41:38] you can then say, okay, what are the bars and the guardrails that I want to be able to set within my context?
[00:41:44] And so every release of Llama comes with not only this; we also have cybersecurity evals.
[00:41:50] So all of the core set of use cases where you want the model to be useful,
[00:41:54] but at the same time, you want to be able to protect it.
[00:41:56] We give you the tools and the systems.
[00:41:58] And because Llama also has a very rich ecosystem, we work with a lot of cloud service providers,
[00:42:03] which is typically how a lot of developers experience and build on top of Llama.
[00:42:06] They also have access to this.
[00:42:08] So they also deploy them.
[00:42:09] And so that approach, I think, again, makes it very unique because it's only something that I think you can do in this kind of an open source model
[00:42:17] where it's not just an API that is making a choice.
[00:42:20] You actually get to make the right set of choices for your use case and for your user base, right?
[00:42:25] So that's sort of the approach that we take.
[00:42:27] Yeah, I think it makes a ton of sense, right?
[00:42:29] You've got this like fungible intelligence,
[00:42:31] but then you've got these other primitives around it that filter the inputs and the outputs coming from it
[00:42:35] where you can exert control and lay those guardrails as you outlined.
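The input/output filter pattern just described can be sketched as a wrapper around generation. The keyword lists and the `generate` stub below are hypothetical stand-ins; the real Llama Guard is itself a classifier model, not a blocklist.

```python
# Hedged sketch of input/output guardrails (stand-in logic, not the real
# Llama Guard, which is a separate classifier model). Each app sets its
# own bars around the same underlying model.

def make_guard(blocked_topics):
    """Return a guard that passes text only if no blocked topic appears."""
    def guard(text: str) -> bool:
        return not any(topic in text.lower() for topic in blocked_topics)
    return guard

def safe_chat(prompt, generate, input_guard, output_guard):
    if not input_guard(prompt):
        return "Sorry, I can't help with that."
    reply = generate(prompt)
    if not output_guard(reply):
        return "Sorry, I can't share that response."
    return reply

# A tutoring app might set a stricter bar than a dating app would,
# using the exact same underlying model.
tutor_input_guard = make_guard(["gambling", "dating"])
tutor_output_guard = make_guard(["gambling"])
echo_model = lambda p: f"Here is some help with: {p}"

print(safe_chat("quadratic equations", echo_model,
                tutor_input_guard, tutor_output_guard))
print(safe_chat("dating advice", echo_model,
                tutor_input_guard, tutor_output_guard))
```

The key design point is that the guardrails live outside the model, so each developer chooses them per use case rather than inheriting one fixed behavior from an API.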
[00:42:39] I have to ask you maybe a follow-up question to that,
[00:42:44] which is in the background of this entire conversation, right?
[00:42:47] There's the fact that in the past, Meta has been challenged for its business practices, right?
[00:42:52] Like it's been accused of everything from freaking influencing elections, inappropriately using personal data,
[00:42:58] spreading misinfo, disinfo, all while pushing engagement to draw and add dollars.
[00:43:03] I think it's fair to say that some folks have expressed suspicions about Meta's AI ambitions.
[00:43:10] What's your answer to anyone who might ask this question, which is like,
[00:43:13] why should we trust Meta with our AI future?
[00:43:15] Yeah, look, this is an important question.
[00:43:17] And it's something that we obviously take very seriously because ultimately I think all of us who are at Meta believe that
[00:43:24] when you build compelling product experiences
[00:43:29] that give people the choice and control, you earn their trust.
[00:43:33] Without their trust, you're not going to be able to build something that is going to be enduring
[00:43:37] and it's not going to be something that people keep coming back to.
[00:43:41] Obviously, we've made our fair share of mistakes, but we've also worked through them.
[00:43:44] And so when we think about the investments that we make in AI, the investments that we make in Gen AI in particular,
[00:43:51] we pay a lot of attention to making sure that we follow best practices.
[00:43:55] And with the ability for Lama itself to be open source,
[00:43:59] we also now have the community having access to how we're building Lama, how we're deploying Lama.
[00:44:05] And that then should ideally build back more trust in Meta itself as a player in the community
[00:44:11] that is committed to making sure that there is transparency, there's a lot more choice,
[00:44:14] there's a lot more control over your experience.
[00:44:16] Because ultimately, trust is going to be first and foremost when you use these AI applications
[00:44:21] because they're going to be powerful and you want them to be trustworthy.
[00:44:25] As you think about the future iterations of models,
[00:44:27] what comes to your mind as this sort of like North Star use case for where you're taking Llama?
[00:44:33] Yeah. Look, this is a hard question because my North Star use case,
[00:44:39] if you'd asked me like two weeks ago, would be,
[00:44:42] hey, can my kids talk to my mom in a language that each of them understand?
[00:44:47] And now we have that, right?
[00:44:48] And the rate at which this technology is evolving,
[00:44:50] I almost feel like my North Star use cases have to be...
[00:44:53] They're more like Ursa Minor use cases or something.
[00:44:55] Yeah.
[00:44:56] Exactly. Exactly.
[00:44:57] Because I actually do think there are a lot of things that we showed even in some of our Orion demos
[00:45:04] where the seamless integration of augmented reality and these glasses with AI
[00:45:10] is really going to create things that we've only seen in movies.
[00:45:14] So to be able to have like this kind of like holographic like conversation with someone,
[00:45:17] it's still only the stuff of movies, right?
[00:45:20] And so I'm like, okay, like that would be really cool.
[00:45:23] So those are maybe the kinds of, not just two-week, North Stars,
[00:45:29] but the, you know, the longer-term North Stars.
[00:45:31] I think that it's just going to be such a huge advantage for y'all.
[00:45:34] Having these like billion user surfaces where people are going to be engaging with this stuff every day.
[00:45:38] And then this thriving open source community with enterprise partners, but also indie hackers.
[00:45:43] People listening to this may not realize how easy it is to just go download LM Studio
[00:45:46] and start running Llama locally.
[00:45:49] And it's kind of wild, like things that felt so out of reach just a year ago are like incredibly accessible
[00:45:54] and running on my freaking MacBook Pro.
[00:45:57] I want to ask you like, given Meta's really unique position at this intersection of social media,
[00:46:02] open source AI now, and really global communications,
[00:46:06] what are you excited for over the next decade?
[00:46:08] And what keeps you up at night?
[00:46:11] Yeah.
[00:46:12] What am I excited about and what keeps me up at night?
[00:46:14] There may be just two sides of the same coin as these things tend to go, right?
[00:46:20] What I'm excited about is when you think about Llama and Llama's own journey,
[00:46:26] we think about this in terms of how do you build a system that is capable of speaking
[00:46:36] or understanding all of the modalities in which humans communicate.
[00:46:40] So sort of like this universal set of modalities start with like text, images, videos,
[00:46:46] and who knows like what else, you know, even being able to understand a reel as a native format.
[00:46:51] Humans come up with like new media of communication.
[00:46:54] So you want these models to be able to understand at that level
[00:46:58] and to be able to generate content at that level.
[00:47:00] So that's sort of like one dimension.
[00:47:02] The second dimension is this technology is going to have the capacity to think about and reason about
[00:47:11] and plan about things the same way humans do.
[00:47:15] And we're still at the very early stages of what these models can do.
[00:47:19] And so over time, they're going to have that capacity.
[00:47:22] And then the third piece is a lot of the models today are really focused on what I would call static generation.
[00:47:30] So you give it a prompt, it gives you back some content.
[00:47:34] You have to go do something with that.
[00:47:35] But over time, they should be able to act on your behalf.
[00:47:40] And this action can not only be in the digital domain where you're able to entrust them to go take care of,
[00:47:49] hey, you know, my daughter really wants to go to the Taylor Swift concert.
[00:47:53] I have to keep hitting refresh when the tickets go online.
[00:47:57] Can you just like take care of that for me?
[00:48:00] Right.
[00:48:00] Going all the way to then being able to take actions in the real world, in the physical world.
[00:48:06] So you combine all of these, being able to understand and communicate in any medium,
[00:48:12] being able to have the full capacity of the human brain in terms of, you know,
[00:48:16] being able to do long-term planning and reasoning,
[00:48:18] and then to be able to act, that is incredibly powerful technology.
[00:48:22] And the flip side of this is you want to make sure that this technology is developed with the right set of connectivity to what humanity's needs are.
[00:48:33] And as you build it, you're thinking about things like responsibility and safety.
[00:48:38] And you make sure that you're putting the right set of guardrails around it so that you evolve the technology in lockstep
[00:48:44] with what we as humans would want to experience on a day-to-day basis.
[00:48:47] And for me, especially, given that Llama is at the forefront of all of this,
[00:48:52] and our team is responsible for bringing this technology to the world,
[00:48:55] dude, that's what keeps me up at night.
[00:49:02] All right.
[00:49:03] So it's encouraging to hear that one of the key people involved in developing and releasing Llama to the world
[00:49:08] spends a lot of time thinking about how to build a system that is responsible and safe.
[00:49:12] Because I find it exciting that a company like Meta is giving away a largely open model.
[00:49:17] It's a crowded space.
[00:49:19] And with closed-source models from Google, OpenAI, Microsoft, and Anthropic vying for the top spot,
[00:49:25] Meta's openness is refreshing.
[00:49:27] It gives an enterprise that doesn't want to give their proprietary data away a path to a custom-built and owned solution.
[00:49:33] And it also gives scrappy indie hackers building a small product on a mobile device
[00:49:38] a path that won't bury them in massive amounts of GPU costs.
[00:49:42] This openness, combined with their vast user base across WhatsApp, Messenger, Instagram, and Facebook,
[00:49:49] positions Meta uniquely to build this seamless bridge between the physical and digital worlds.
[00:49:55] In fact, Meta is very much at work on this augmented reality future.
[00:49:59] The company revealed a pretty advanced, if still pretty clunky-looking pair of AR glasses
[00:50:05] that could very well be where all of this is headed next.
[00:50:08] But we're still years away from walking around with Tony Stark Iron Man-style heads-up displays.
[00:50:14] And in the meantime, while we should always hold Meta accountable,
[00:50:19] their moves suggest a commitment to the public good.
[00:50:22] This balance of caution and optimism is crucial.
[00:50:25] As we question Meta's intentions,
[00:50:27] let's also acknowledge the potential benefits of their open approach.
[00:50:31] It's a very nuanced perspective, but one worth considering.
[00:50:34] Meta's openness might actually be the key to unlocking a more inclusive, community-driven future.
[00:50:40] One where AI enhances our lives without sacrificing our agency.
[00:50:47] The TED AI Show is a part of the TED Audio Collective
[00:50:50] and is produced by TED with Cosmic Standard.
[00:50:53] Our producers are Dominic Girard and Alex Higgins.
[00:50:57] Our editor is Banban Cheng.
[00:50:59] Our showrunner is Ivana Tucker.
[00:51:02] And our engineer is Asia Pilar Simpson.
[00:51:04] Our technical director is Jacob Winnick.
[00:51:07] And our executive producer is Eliza Smith.
[00:51:09] Our researcher and fact-checker is Christian Aparthe.
[00:51:12] And I'm your host, Bilal Sidhu.
[00:51:15] See y'all in the next one.
[00:51:16] Bye.

