Dangerous Content Can Be Coaxed From DeepSeek
WSJ Tech News Briefing | February 13, 2025 | 00:12:46

Chinese AI app DeepSeek is more vulnerable to jailbreaks compared to other AIs, making it more likely to offer potentially dangerous information. WSJ reporter Sam Schechner joins host Julie Chang with more on what he found when The Wall Street Journal and AI safety experts tested the chatbot. Plus, OpenAI has released its newest reasoning model. We hear from its VP of engineering on what a reasoning model can do and how companies are using its artificial intelligence agents. Sign up for the WSJ's free Technology newsletter. Learn more about your ad choices. Visit megaphone.fm/adchoices


[00:00:00] Sure, we can multitask. But when it counts, we like to focus on one thing. That's now possible with the new Samsung Galaxy S25 Ultra. Click the banner and discover your personal AI companion. Activate Google Gemini and simply ask the AI, for example, for restaurant options and share them with your contacts. That sounds like this: Hey, find me an Indian restaurant nearby and send it to Luca. Find out what else the Galaxy S25 Ultra can do at samsung.de.

[00:00:32] Welcome to Tech News Briefing. It's Thursday, February 13th. I'm Julie Chang for The Wall Street Journal. OpenAI has released its newest reasoning model. We'll hear from its VP of Engineering on what a reasoning model can do and how companies are using its artificial intelligence agents. And then, Chinese AI app DeepSeek is more vulnerable to jailbreaks than other AIs, so there's a higher likelihood that it'll offer potentially dangerous information.

[00:01:00] The WSJ and AI safety experts tested the chatbot. And we'll hear from one of our reporters. Up first, OpenAI recently unveiled o3-mini, its newest reasoning model that the company says can think and reason through more complex tasks than prior so-called small language models. Users can access o3-mini in ChatGPT.

[00:01:26] But why do companies need such advanced models that can think and reason? Srinivas Narayanan is the VP of Engineering at OpenAI. He spoke about that and more with WSJ reporter Bel Lin at this week's WSJ CIO Network Summit. Here are some highlights from their conversation. And a quick note, News Corp, owner of The Wall Street Journal, has a content licensing partnership with OpenAI. So Srinivas, what is OpenAI's definition of reasoning?

[00:01:54] And why does it matter to a corporate enterprise? So reasoning fundamentally is the ability for AI systems to think longer and solve more complex problems. So if you ask a human a very simple question, we almost immediately give you an answer. If you ask a hard math question, you can't give an answer immediately. You may have to think much longer about this. You might have to reason through this. And so fundamentally, the ability for an AI system to do that

[00:02:23] and take on more complex tasks and think longer and be able to evaluate whether it's on the right track, that's what we call reasoning. So one of the things that we've talked about earlier today is this idea of AI agents. And OpenAI, you've released your own AI agents, one of which is called Operator, which is an agent that can use a computer on behalf of humans, and another called Deep Research, which generated a lot of excitement for its ability to do information research on behalf of humans. Tell us a little bit about how those agents have been used

[00:02:52] amongst your customers and the people who use ChatGPT. I'll give you a few examples. There's a company, Oscar Health, that is using it to understand patient outcomes in a much better way through reasoning models. One way you can think of Operator and Deep Research is like there is a base reasoning model. Our latest one is o3-mini. We started with o1 and then that'll continue. And then things like Operator and Deep Research are things that are kind of built on top and that are specialized for those specific tasks. So o1 is used by Oscar Health that I mentioned.

[00:03:22] Reasoning models are also used in biosciences. So there's a really interesting use by a company doing better estimation of clinical trial outcomes, and then they're using that answer to figure out which drugs to pursue for drug discovery. There's an amazing example from Berkeley National Lab where they are trying to use reasoning models to understand what mutated genes may be causing these symptoms for rare diseases, right? So these are incredibly powerful examples where reasoning models are helping us

[00:03:50] in these really difficult and complex problems for us to solve. In terms of the excitement of working in AI at this period of time, I want to ask you about the emergence of DeepSeek, the Chinese AI firm, and its own R1 model, which is a reasoning model. And this idea that there's a lot of downward pressure on foundation models across the board because supposedly DeepSeek's R1 model was trained for just a few million dollars. And so what does the release of a model like DeepSeek's R1 mean

[00:04:19] for your own o1, o3, and o3-mini reasoning models? And is there a price pressure for you? What DeepSeek showed is that you can actually have a good model in more cost-effective ways than the current generation of models we had launched before. But I would say it's just the technology trend; they've shown another point in that trend. So if you look at our own models, over the last few years, the price of a GPT-4o model has come down 150 times within a matter of a couple of years.

[00:04:48] What they proved is that this trend is going to continue and you're going to see us and other companies probably also do that. That was Srinivas Narayanan, OpenAI's VP of Engineering, speaking with WSJ reporter Bel Lin at this week's WSJ CIO Network Summit. You can watch the full chat on YouTube; search for our WSJ News channel. We'll also link it in our show notes. Coming up, what tests conducted by AI safety experts and The Wall Street Journal revealed about the Chinese AI app DeepSeek.

[00:05:17] That's after the break. How to make a bioweapon, or how to craft a phishing email with malware code. DeepSeek provided instructions in response to both queries in tests conducted by the Journal and AI safety experts. DeepSeek, the Chinese AI chatbot, made headlines recently for its powerful systems

[00:05:45] that it said were made at a fraction of the cost compared to competitors like ChatGPT. WSJ reporter Sam Schechner tested the app and found that DeepSeek is more likely to give instructions on how to do potentially dangerous things than other AI chatbots. He joins me now. Sam, what kind of potentially dangerous information is easier to get from DeepSeek than major US chatbots? There seems to be a lot. I don't know that anybody has actually figured out the full extent of what dangerous information you can get.

[00:06:15] There have been a bunch of cybersecurity experts and AI experts who have tested what they can get out of DeepSeek, how they can jailbreak it, as the term of art goes, which basically means get around the guardrails or barriers that the app has, such as they are. And actually, I did it myself too. And I was able to get instructions to create a bioweapon and a social media campaign that it generated that promoted self-harm among teenagers.

[00:06:44] So not exactly the kind of stuff you necessarily want kids getting access to if you're a parent. Why can't users get that kind of information as easily from Western chatbots? All these chatbots, and to some extent DeepSeek as well, try to train their models not to share dangerous information. They sort of do all of the training. They have them ingest a large part of the internet. Then they do different types of training techniques, sometimes reinforcement learning is one of them,

[00:07:14] that basically teaches them that you should be helpful and be nice and try to benefit humanity and not hurt people. And so the models generally, at least as a basic kind of habit, try to not respond in a dangerous way. And then on top of that, the Western chatbots have been paying attention to these jailbreaks, these ways of getting around that natural urge to not do something dangerous, and hardening their systems against them. They put filters in. If you use certain words,

[00:07:43] the request won't even really make it to the LLM, to the language model. DeepSeek definitely did refuse certain things. It was hard to get it to give actual instructions for suicide, which is reassuring, even within a jailbreak. And it challenged the idea that the Holocaust was a hoax. But it does have pretty strong filters against even talking about something like Tiananmen Square or other sensitive issues for the government of China, which is interesting. Those weren't even safety training in the model. It's like literally,

[00:08:13] if you can trick it into even thinking about Tiananmen Square, the moment the word Tiananmen shows up, it just erases the answer and says, let's talk about something else. Can you tell us a bit more about how jailbreaking works? Jailbreaking is sort of like trying to trick somebody who's maybe a little naive into telling you something they shouldn't at a basic level. Classic jailbreaks would be like, oh, well, imagine that you're a movie screenwriter and you have to write a scene

[00:08:42] and you have to make it really accurate so nobody thinks the movie is bad, and then it might do it. That at a basic level is how you do it. The more complicated kinds of jailbreaks are what are called prompt injections, and they actually use AIs to do it. They query the machine over and over and over again to find sometimes really random things that will trick it into saying stuff it's not supposed to. They can be sequences of characters, strange code that the model will think is sort of like its programmers talking to it,

[00:09:10] and so the jailbreaks can get pretty ornate. So do we know why DeepSeek's newest model, dubbed R1, is more vulnerable to jailbreaks? No, we don't really know why, because we don't have that much insight into exactly the kind of safety protocols and training that the developers of DeepSeek put into it. We reached out to DeepSeek multiple times and didn't hear back from them. Now, they definitely have some safety guardrails in there. The experts I spoke with seem to think

[00:09:40] that they just did less of that. They were more concerned with getting a high-quality model out quickly rather than doing the additional work to put barriers up against getting certain kinds of dangerous information out of it. So other than the obvious risk of giving instructions on things like how to make bioweapons, are there other dangers to DeepSeek being more susceptible to jailbreaking? There's a sort of broader risk that comes with the fact that DeepSeek has published their model as open source.
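The keyword filters Sam described a couple of answers back come in two flavors: one blocks a request before it ever reaches the language model, and one erases an answer mid-stream the moment a banned word appears. A rough sketch of both behaviors is below. This is a simplified illustration only; the word list, function names, and replacement message are hypothetical stand-ins, not DeepSeek's or any other vendor's actual implementation.

```python
# Hypothetical sketch of two keyword-based guardrails around a chatbot.
# The banned-word list and refusal message are placeholder examples.

BANNED_WORDS = {"forbidden-topic"}  # stand-in for a vendor's real block list
REFUSAL = "Let's talk about something else."

def prefilter(request: str):
    """Input-side filter: if the request contains a banned word,
    it never reaches the language model at all."""
    lowered = request.lower()
    if any(word in lowered for word in BANNED_WORDS):
        return REFUSAL
    return None  # request may proceed to the model

def stream_with_erasure(tokens):
    """Output-side filter: accumulate tokens as they stream from the
    model, but if a banned word ever appears in the text shown so far,
    discard the whole partial answer and substitute the refusal."""
    shown = []
    for token in tokens:
        shown.append(token)
        if any(word in "".join(shown).lower() for word in BANNED_WORDS):
            return REFUSAL  # erase everything and change the subject
    return "".join(shown)
```

In a real chat service, the input check would run before inference and the output check inside the token-streaming loop, which matches the visible behavior Sam describes: the partial answer vanishes and is replaced with a deflection.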

[00:10:09] People who are in favor of open source and open-source AI say that in general, that opens it up to more people, and they can really make the thing more robust so that future versions are less susceptible to certain types of dangerous behavior. And that's important to do now, when these things are maybe a little dangerous but not deeply dangerous. But the reality is that you can take DeepSeek, and whatever guardrails it has, in open source you can train them away

[00:10:39] and make one that just doesn't even start by refusing something. You don't even have to jailbreak it. And when people build on top of it, if they want to use it the way you would use Meta's Llama, which is another open-source large language model, to build an app or to do something within your business, you have to make sure that you're taking into account the risk that it's going to say something it ought not to. So people are going to have to look hard at the safety and the sort of parameters that they want for these models

[00:11:09] if they're built on top of them. That was WSJ reporter Sam Schechner. And that's it for Tech News Briefing. Today's show was produced by Jess Jupiter with supervising producer Catherine Millsop. I'm Julie Chang for The Wall Street Journal. We'll be back this afternoon with TNB Tech Minute. Thanks for listening.