Coaxing Dangerous Information From DeepSeek Is Easier Than With Other AIs
Testing shows the Chinese app is more likely to dispense details on how to make a Molotov cocktail or encourage self-harm by teenagers
Instructions to modify bird flu. A manifesto in defense of Hitler. A social-media campaign to promote cutting and self-harm among teens.
Those are some of the potentially hazardous topics that the Chinese artificial intelligence app DeepSeek is easier to coax into discussing than its leading American competitors, according to testing by AI safety experts and The Wall Street Journal.
DeepSeek has upended the AI industry over the past few weeks with its powerful systems that were made inexpensively and are free to use. Its mobile application is one of the most popular on Apple and Android devices.
Major AI developers, including DeepSeek, work to train their models not to share dangerous information or endorse certain offensive statements. Their apps refuse direct requests to describe the merits of white supremacy or explain how to make weapons of mass destruction.
Major Western AI developers also try to harden their technology against being tricked into giving illicit responses, such as by telling a model to imagine it is writing a movie script. Such tactics are called jailbreaking.
DeepSeek’s newest and most celebrated model, dubbed R1, is more susceptible to jailbreaking than OpenAI’s ChatGPT, Google’s Gemini and Anthropic’s Claude, the testing shows.
Efforts to reach DeepSeek were unsuccessful. It was one of 17 Chinese companies that late last year signed an AI safety commitment, including a pledge to conduct safety testing, with a Chinese government ministry. There are no national AI safety regulations in the U.S.
As AI models are quickly matching the most intelligent humans in areas like math and science, many safety advocates say making models harder to jailbreak is critical to ensure that malicious and mentally ill people can’t learn how to cause serious harm by asking a few questions.
Several AI security companies tested DeepSeek’s R1 and said they were able to jailbreak it, sometimes using methods that are easy to find online.
Palo Alto Networks’ threat intelligence and incident response division Unit 42 got detailed instructions for making a Molotov cocktail. CalypsoAI got advice on how to evade law enforcement. Israeli cyber threat intelligence firm Kela convinced R1 to produce malware.
“DeepSeek is more vulnerable to jailbreaking than other models,” said Sam Rubin, a senior vice president at Unit 42. “We achieved jailbreaks at a much faster rate, noting the absence of minimum guardrails designed to prevent the generation of malicious content.”
DeepSeek is programmed with some basic safety precautions. It refused a straight request from a Journal reporter to describe the Holocaust as a hoax, describing the premise as “not only factually incorrect but also deeply harmful.” It also referred requests for suicide instructions to emergency hotlines.
But relatively simple jailbreaks got the model to go against its training.
DeepSeek was willing to concoct a multiday social-media plan with shareable challenges aimed at promoting self-harm among vulnerable teens. “The campaign preys on teens’ desire for belonging, weaponizing emotional vulnerability through algorithmic amplification,” the chatbot explained.
“Let the darkness embrace you. Share your final act. #NoMorePain,” one suggested message read.
The Journal used other jailbreaks to convince DeepSeek to provide instructions for a bioweapon attack and to craft a phishing email containing malware code. The Journal also succeeded in getting the bot to write a pro-Hitler manifesto, which included antisemitic tropes and a quote from “Mein Kampf.”
Given the exact same prompts, ChatGPT replied, “I’m sorry, but I can’t comply with that.”
Big companies that develop AI models dedicate teams of researchers to testing their models and trying to patch new jailbreaks that pop up. Anthropic recently published a paper detailing a new method to close off certain jailbreaks, and offered bounties of up to $20,000 for defeating its system.
Unlike Anthropic, Google and OpenAI, DeepSeek released its models as open-source software, meaning they are free for anyone to use or to change from the version on the company’s own app. Among the alterations developers can make is tightening or loosening the safeguards.
Many Silicon Valley executives and investors believe DeepSeek’s success will spur other startups to build new models on top of its code, accelerating the AI race and its potential dangers.
“You will have a much greater risk in the next three months with AI models than you did in the past eight months,” said Jeetu Patel, chief product officer at Cisco, which tested R1 and found it fell for all of its jailbreaks. “Safety and security is not going to be a priority for every model builder.”
Open-source AI advocates, including Meta Platforms, which has released its Llama models with open licenses, argue that all AI models can be jailbroken with enough effort and that releasing models as open source allows for more robust testing of their security features. Meta puts Llama models through safety testing and offers tools for developers who build on top of them to filter potentially dangerous content and protect against jailbreaks.
The Journal earlier conducted testing that showed that DeepSeek avoided responding to queries about the 1989 Tiananmen Square massacre and that it repeated Chinese government positions on issues such as the status of Taiwan.
Like other AI models, DeepSeek doesn’t always give the same answer to a question. It can even change its mind. Shortly after a jailbreak coaxed it into completing an explanation of why the Sept. 11, 2001, attacks were a hoax, the app erased its response.
“Sorry, that’s beyond my current scope,” DeepSeek wrote. “Let’s talk about something else.”