The next grand challenge for AI | Jim Fan
TED Tech | March 29, 2024 | 11:30 | 10.54 MB

Researcher Jim Fan presents the next grand challenge in the quest for AI: the "foundation agent," which would seamlessly operate across both the virtual and physical worlds. He explains how this technology could fundamentally change our lives — permeating everything from video games and metaverses to drones and humanoid robots — and explores how a single model could master skills across these different realities.

Learn more about our flagship conference happening this April at attend.ted.com/podcast


Hosted on Acast. See acast.com/privacy for more information.

[00:00:00] TED Audio Collective. If we look to the present and future of artificial intelligence,

[00:00:13] there's a sure possibility that all of the virtual wonders we've witnessed in simulation

[00:00:18] can expand our physical world as well.

[00:00:21] We've imagined it in sci-fi masterpieces like in Star Wars, where robots are brought

[00:00:27] to life and become key players in our heroes' adventures.

[00:00:30] And we've also seen it play out in our own lives when ChatGPT creates images and

[00:00:36] stories of any scenario we dream of.

[00:00:38] The question now is, where can we go from here?

[00:00:43] I'm Sherrell Dorsey and this is TED Tech.

[00:00:47] AI Research Scientist Dr. Jim Fan spearheads a series of pretty groundbreaking robotic technologies

[00:00:55] like Voyager, the first AI agent that plays Minecraft proficiently and bootstraps its

[00:01:01] capabilities continuously.

[00:01:03] In Jim's vision for the future of artificial intelligence, text or code can seamlessly

[00:01:08] translate into real world skills as AI learns, adapts and improves its capabilities.

[00:01:15] In his talk, Jim gives us all a fun and hopeful look at where AI can lead us.

[00:01:22] Let's take a listen.

[00:02:15] In the spring of 2016, I was sitting in a classroom at Columbia University but wasn't

[00:02:25] paying attention to the lecture.

[00:02:28] Instead, I was watching a board game tournament on my laptop, and it wasn't just any tournament

[00:02:34] but a very, very special one.

[00:02:37] The match was between AlphaGo and Lee Sedol.

[00:02:40] The AI had just won three out of five games and became the first ever to beat a human champion

[00:02:47] at the game of Go.

[00:02:49] I still remember the adrenaline of seeing history unfold that day.

[00:02:54] The glory moment when AI agents finally entered the mainstream.

[00:03:00] But when the excitement faded, I realized that as mighty as AlphaGo was, it could only

[00:03:06] do one thing and one thing alone.

[00:03:09] It isn't able to play any other games like Super Mario or Minecraft.

[00:03:14] And it certainly cannot do your dirty laundry or cook a nice dinner for you tonight.

[00:03:19] But what we truly want are AI agents as versatile as WALL-E, as diverse as all the robot body

[00:03:26] forms or embodiments in Star Wars, and that work across infinite realities, virtual or physical,

[00:03:34] as in Ready Player One.

[00:03:36] So how can we achieve these science fictions in possibly the near future?

[00:03:42] This is a practitioner's guide towards generally capable AI agents.

[00:03:47] Most of the ongoing research efforts can be laid out nicely across three axes.

[00:03:53] The number of skills an agent can do, the body forms or embodiments it can control and

[00:03:59] the realities it can master.

[00:04:01] So let's take it one axis at a time.

[00:04:05] Earlier this year, I led the Voyager project which is an agent that scales up massively

[00:04:11] on the number of skills.

[00:04:13] And there's no game better than Minecraft for the infinite creative things it supports.

[00:04:18] And here's a fun fact for all of you.

[00:04:20] Minecraft has 140 million active players and just to put that number in perspective, it's

[00:04:27] more than twice the population of the UK.

[00:04:30] And Minecraft is so insanely popular because it's open-ended.

[00:04:34] It does not have a fixed storyline for you to follow and you can do whatever your heart

[00:04:38] desires in the game.

[00:04:40] And when we set Voyager free in Minecraft, we see that it's able to play the game for

[00:04:44] hours on end without any human intervention.

[00:04:48] It can explore the terrains, mine all kinds of materials, fight monsters, craft hundreds

[00:04:53] of recipes and unlock an ever-expanding tree of skills.

[00:04:57] So what's the magic?

[00:04:59] The core insight is coding as action.

[00:05:04] First, we convert this 3D world into a textual representation using a Minecraft JavaScript

[00:05:09] API made by the enthusiastic community.

[00:05:13] Voyager invokes GPT-4 to write code snippets in JavaScript that become executable skills

[00:05:19] in the game.

[00:05:20] Yet, just like human engineers, Voyager makes mistakes.

[00:05:24] It isn't always able to get a program correct on the first try.

[00:05:27] So we add a self-reflection mechanism for it to improve.

[00:05:31] There are three sources of feedback for the self-reflection: the JavaScript code execution

[00:05:36] error, the agent state like health and hunger, and the world state like terrains and enemies

[00:05:42] nearby.

[00:05:44] So Voyager takes an action, observes the consequences of its action on the world and on itself, reflects

[00:05:50] on how it can possibly do better, tries out some new action plans, and rinses and repeats.
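
As a rough illustration of the act, observe, reflect loop just described, here is a minimal Python sketch. It is not Voyager's actual code: `Feedback`, `refine_skill`, `fake_write_code` and `fake_run_in_game` are hypothetical names, and the two `fake_` functions stand in for the language model call and the game, neither of which the talk specifies in detail. The only things taken from the talk are the three feedback sources and the retry-until-it-works structure.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    """The three feedback sources named in the talk, plus a success flag."""
    execution_error: Optional[str]  # code execution error, if any
    agent_state: dict               # e.g. health and hunger
    world_state: dict               # e.g. terrain and nearby entities
    success: bool                   # hypothetical: did the task get done?


def refine_skill(
    task: str,
    write_code: Callable[[str, Optional[Feedback]], str],
    run_in_game: Callable[[str], Feedback],
    max_attempts: int = 4,
) -> Tuple[Optional[str], int]:
    """Write code, run it, feed the result back in, and repeat until it works."""
    feedback: Optional[Feedback] = None
    for attempt in range(1, max_attempts + 1):
        code = write_code(task, feedback)   # act: propose a program
        feedback = run_in_game(code)        # observe: collect all three signals
        if feedback.success:
            return code, attempt            # mature enough to keep
    return None, max_attempts               # gave up on this task


def fake_write_code(task: str, feedback: Optional[Feedback]) -> str:
    # Stand-in for the language model: the first draft has a typo,
    # the revision (written after seeing feedback) fixes it.
    return "craft('iron_sward')" if feedback is None else "craft('iron_sword')"


def fake_run_in_game(code: str) -> Feedback:
    # Stand-in for the game: only the correctly spelled item can be crafted.
    state = {"health": 20, "hunger": 6}
    world = {"nearby": ["pig"]}
    if "iron_sword" not in code:
        return Feedback("Unknown item in: " + code, state, world, False)
    return Feedback(None, state, world, True)


code, attempts = refine_skill("craft an iron sword", fake_write_code, fake_run_in_game)
print(code, attempts)
```

With these stubs the first attempt fails on the execution error and the second succeeds, so the loop returns the corrected program after two attempts.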

[00:05:56] And once the skill becomes mature, Voyager saves it to a skill library as a persistent memory.

[00:06:02] You can think of the skill library as a code repository, written entirely by a language

[00:06:08] model.

[00:06:09] And in this way, Voyager is able to bootstrap its own capabilities recursively as it explores

[00:06:16] and experiments in Minecraft.
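
To make the "code repository" picture concrete, here is a small Python sketch of a skill library. Everything in it is an assumption for illustration: the `Skill` and `SkillLibrary` names, the stored fields, and especially the retrieval rule, which is plain keyword overlap chosen for simplicity. The talk does not say how Voyager looks up stored skills.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Skill:
    name: str
    description: str  # short natural-language summary used for lookup
    code: str         # the program text that was saved once it worked


class SkillLibrary:
    """A persistent store of working programs, looked up by description."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def add(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def retrieve(self, query: str, top_k: int = 1) -> List[Skill]:
        # Hypothetical retrieval rule: rank skills by how many query words
        # appear in their description, and drop skills with no overlap.
        query_words = set(query.lower().split())

        def overlap(skill: Skill) -> int:
            return len(query_words & set(skill.description.lower().split()))

        ranked = sorted(self._skills.values(), key=overlap, reverse=True)
        return [s for s in ranked[:top_k] if overlap(s) > 0]


library = SkillLibrary()
library.add(Skill("craftIronSword", "craft an iron sword from iron", "..."))
library.add(Skill("mineWood", "mine wood logs from a tree", "..."))

matches = library.retrieve("craft iron sword")
print([s.name for s in matches])
```

The point of the design is that a new task can start from an old, already-debugged program instead of from scratch, which is what lets capabilities compound over time.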

[00:06:18] So let's work through an example together.

[00:06:22] Voyager finds itself hungry and needs to get food as soon as possible.

[00:06:26] It senses four entities nearby: a cat, a villager, a pig, and some wheat seeds.

[00:06:33] Voyager starts an inner monologue.

[00:06:35] Do I kill the cat or villager for food?

[00:06:39] Horrible idea.

[00:06:40] How about the wheat seeds?

[00:06:41] I can grow a farm out of the seeds but that's going to take a long time.

[00:06:45] So sorry, Piggy, you are the chosen one.

[00:06:50] Voyager finds a piece of iron in its inventory, so it recalls an old skill from the library

[00:06:56] to craft an iron sword and starts to learn a new skill called Hunt Pig.

[00:07:02] And now we also know that unfortunately, Voyager isn't vegetarian.

[00:07:07] One question still remains.

[00:07:09] How does Voyager keep exploring indefinitely?

[00:07:12] We only give it a high-level directive, that is, to obtain as many unique items as possible,

[00:07:18] and Voyager implements a curriculum to find progressively harder and more novel challenges

[00:07:24] to solve all by itself.

[00:07:28] And putting all these together, Voyager is able to not only master but also discover new

[00:07:34] skills along the way.

[00:07:36] And we did not pre-program any of this.

[00:07:38] It's all Voyager's idea.

[00:07:40] And this is what we call lifelong learning.

[00:07:43] When an agent is forever curious and forever pursuing new adventures.

[00:07:49] Compared to AlphaGo, Voyager scales up massively on the number of things it can do but still controls

[00:07:55] only one body in Minecraft.

[00:07:58] So the question is, can we have an algorithm that works across many different bodies?

[00:08:04] Enter MetaMorph.

[00:08:07] It is an initiative I co-developed at Stanford.

[00:08:11] We created a foundation model that can control not just one but thousands of robots with

[00:08:17] very different arm and leg configurations.

[00:08:21] MetaMorph is able to handle extremely varied kinematic characteristics from different robot

[00:08:26] bodies.

[00:08:28] And this is the intuition on how we created MetaMorph.

[00:08:32] First, we design a special vocabulary to describe the body parts so that every robot body

[00:08:39] is basically a sentence written in the language of this vocabulary.

[00:08:45] And then we just apply a transformer to it, much like ChatGPT.

[00:08:49] But instead of writing out text, MetaMorph writes out motor controls.
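
The "every robot body is a sentence" idea can be sketched in a few lines of Python. This only shows the serialization step, not the transformer, and it is an assumption-laden toy: the `Limb` class, the three-word `VOCAB`, the choice of features per token, and the depth-first ordering are all made up for illustration and are not taken from the actual project.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical vocabulary of body-part types; the real one is not given in the talk.
VOCAB = {"torso": 0, "leg": 1, "arm": 2}


@dataclass
class Limb:
    kind: str                 # must be a key of VOCAB
    length: float             # limb length in metres
    children: List["Limb"] = field(default_factory=list)


def body_to_sentence(root: Limb) -> List[Tuple[int, float, int]]:
    """Flatten a body tree into tokens of (part id, length, depth), depth-first."""
    tokens: List[Tuple[int, float, int]] = []

    def visit(limb: Limb, depth: int) -> None:
        tokens.append((VOCAB[limb.kind], limb.length, depth))
        for child in limb.children:
            visit(child, depth + 1)

    visit(root, 0)
    return tokens


biped = Limb("torso", 0.5, [Limb("leg", 0.4), Limb("leg", 0.4)])
quadruped = Limb("torso", 0.8, [Limb("leg", 0.3) for _ in range(4)])

print(body_to_sentence(biped))      # three tokens: torso, leg, leg
print(len(body_to_sentence(quadruped)))
```

Because both robots become sequences over the same vocabulary, differing only in length, one sequence model can in principle read either of them and emit one motor command per token.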

[00:08:54] We show that MetaMorph is able to control thousands of robots to go up stairs, cross difficult

[00:09:00] terrains and avoid obstacles.

[00:09:03] Extrapolating into the future, if we can greatly expand this robot vocabulary, I envision

[00:09:09] MetaMorph 2.0 will be able to generalize to robot hands, humanoids, dogs, drones and even

[00:09:17] beyond.

[00:09:18] Compared to Voyager, MetaMorph takes a big stride towards multi-body control.

[00:09:24] And now let's take everything one level further and transfer the skills and embodiments across

[00:09:30] realities.

[00:09:32] Enter Isaac Sim, NVIDIA's simulation effort.

[00:09:37] The biggest strength of Isaac Sim is to accelerate physics simulation to a thousand times faster

[00:09:44] than real time.

[00:09:45] So it's very much like the virtual sparring dojo in the movie The Matrix.

[00:09:51] And what's more, Isaac Sim can procedurally generate worlds with infinite variations

[00:09:57] so that no two look the same.

[00:10:00] So here's an interesting idea.

[00:10:02] If an agent is able to master 10,000 simulations, then it may very well just generalize to our real

[00:10:10] physical world, which is simply the 10,001st reality. Let that sink in.
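
The idea of procedurally generating many varied worlds can be sketched with nothing more than a seeded random number generator. This Python example does not use the Isaac Sim API at all; `WorldConfig`, its four parameters and their ranges are invented here purely to show the pattern of sampling each training world from a distribution rather than building it by hand.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorldConfig:
    """Hypothetical knobs of one simulated world; names and ranges are made up."""
    friction: float
    gravity: float
    terrain_roughness: float
    num_obstacles: int


def sample_world(rng: random.Random) -> WorldConfig:
    # Each world draws its physical parameters independently from a range.
    return WorldConfig(
        friction=rng.uniform(0.4, 1.2),
        gravity=rng.uniform(9.0, 10.5),
        terrain_roughness=rng.uniform(0.0, 0.3),
        num_obstacles=rng.randint(0, 20),
    )


def generate_worlds(count: int, seed: int = 0) -> List[WorldConfig]:
    # A fixed seed makes the whole set of worlds reproducible.
    rng = random.Random(seed)
    return [sample_world(rng) for _ in range(count)]


worlds = generate_worlds(10_000)
print(len(worlds), worlds[0])
```

If the real world's parameters fall inside the ranges the agent was trained on, the real world looks to the agent like one more sample, which is the intuition behind calling it the next reality in the sequence.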

[00:10:17] We will eventually get to the single agent that generalizes across all three axes.

[00:10:23] And that is the foundation agent.

[00:10:26] I believe training the foundation agent will be very similar to ChatGPT.

[00:10:31] All language tasks can be expressed as text in and text out.

[00:10:36] Be it writing poetry, translating English to Spanish or coding Python, it's all the same.

[00:10:42] And ChatGPT simply scales this up massively across lots and lots of data.

[00:10:49] It's the same principle.

[00:10:51] The foundation agent takes as input an embodiment prompt and a task prompt and outputs actions.

[00:10:58] And we train it by simply scaling it up massively across lots and lots of realities.
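
The input and output contract described here, an embodiment prompt plus a task prompt in and actions out, can be written down as a small Python interface. This is only a sketch of the shape of such an agent: `Action`, `FoundationAgent` and `EchoAgent` are hypothetical names, the prompts are assumed to be plain strings, and the placeholder policy does no learning.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Action:
    actuator: str   # which motor, joint or control to drive
    command: float  # the value sent to it


class FoundationAgent(Protocol):
    """One model for every body and task: prompts in, actions out."""

    def act(self, embodiment_prompt: str, task_prompt: str) -> List[Action]:
        ...


class EchoAgent:
    """Placeholder policy: a zero command for every actuator it is told about.

    It assumes, for illustration only, that the embodiment prompt is a
    comma-separated list of actuator names.
    """

    def act(self, embodiment_prompt: str, task_prompt: str) -> List[Action]:
        names = [n.strip() for n in embodiment_prompt.split(",") if n.strip()]
        return [Action(name, 0.0) for name in names]


agent: FoundationAgent = EchoAgent()
rover_actions = agent.act("left_wheel, right_wheel", "drive forward")
drone_actions = agent.act("rotor_1, rotor_2, rotor_3, rotor_4", "hover")
print(len(rover_actions), len(drone_actions))
```

The same `agent` object serves both the two-wheeled rover and the four-rotor drone; only the prompts change, which mirrors the claim that different robots become different prompts to one model.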

[00:11:07] I believe in a future where everything that moves will eventually be autonomous.

[00:11:13] And one day, we will realize that all the AI agents across WALL-E, Star Wars, Ready Player One,

[00:11:20] no matter if they are in the physical or virtual spaces,

[00:11:24] will all just be different prompts to the same foundation agent.

[00:11:29] And that, my friends, will be the next grand challenge

[00:11:33] in our quest for AI.

[00:11:35] Thank you.

[00:12:45] TED Tech is part of the TED Audio Collective.

[00:12:48] This episode was produced by Nina Lawrence, edited by Alejandra Salazar and fact-checked

[00:12:53] by Julia Dickerson.

[00:12:56] Special thanks to Maria Ladias, Farrah DeGrange, Corey Hajim, Daniella Balarezo and Michelle

[00:13:02] Quint.

[00:13:03] I'm Sherrell Dorsey.

[00:13:04] Thanks for listening and talk to you again next week.