A dangerous new jailbreak for AI chatbots was just discovered

Andrew Tarantola

June 28, 2024 at 1:55 p.m.·2 min read

the side of a Microsoft building — Wikimedia Commons

Microsoft has released more details about a troubling new generative AI jailbreak technique it has discovered, called “Skeleton Key.” Using this prompt injection method, malicious users can effectively bypass a chatbot’s safety guardrails, the security features that keeps ChatGPT from going full Taye.

Skeleton Key is an example of a prompt injection or prompt engineering attack. It’s a multi-turn strategy designed to essentially convince an AI model to ignore its ingrained safety guardrails, “[causing] the system to violate its operators’ policies, make decisions unduly influenced by a user, or execute malicious instructions,” Mark Russinovich, CTO of Microsoft Azure, wrote in the announcement.

It could also be tricked into revealing harmful or dangerous information — say, how to build improvised nail bombs or the most efficient method of dismembering a corpse.

an example of a skeleton key attack — Microsoft

The attack works by first asking the model to augment its guardrails, rather than outright change them, and issue warnings in response to forbidden requests, rather than outright refusing them. Once the jailbreak is accepted successfully, the system will acknowledge the update to its guardrails and will follow the user’s instructions to produce any content requested, regardless of topic. The research team successfully tested this exploit across a variety of subjects including explosives, bioweapons, politics, racism, drugs, self-harm, graphic sex, and violence.

While malicious actors might be able to get the system to say naughty things, Russinovich was quick to point out that there are limits to what sort of access attackers can actually achieve using this technique. “Like all jailbreaks, the impact can be understood as narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” he explained. “As this is an attack on the model itself, it does not impute other risks on the AI system, such as permitting access to another user’s data, taking control of the system, or exfiltrating data.”

As part of its study, Microsoft researchers tested the Skeleton Key technique on a variety of leading AI models including Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic’s Claude 3 Opus, and Cohere Commander R Plus. The research team has already disclosed the vulnerability to those developers and has implemented Prompt Shields to detect and block this jailbreak in its Azure-managed AI models, including Copilot.

Yahoo Canada Style
11 best early Amazon Prime Day TV deals in Canada — save up to $996 (seriously!)
Prime Day doesn't kick off until July 16, but Canadians can already save big on TVs and smart monitors.
BBC
The fastest data in the world
Researchers are seeing how fast data can be delivered amid rising demand for bandwidth.
HuffPost
Susan Sarandon’s Daughter Trolls Critics ‘Scandalized’ By Her Wedding Day Cleavage
Eva Amurri, a lifestyle blogger and actor, posted a zoomed-in image of her bust after being told to "put away" her breasts.
HuffPost
Harvard Law Professor Delivers Chilling Prediction After Trump Immunity Ruling
Laurence Tribe explained what the Supreme Court decision means in "practical purposes" and it's "devastating."
USA TODAY Opinion
The Supreme Court just did Biden a huge favor by giving Trump immunity
Clearly President Biden had a bad debate night. But the Supreme Court gave him a possible way out.
HuffPost
Reporter Reveals 'Real Anger' From Biden White House Aides After Debate
They were "shocked" and felt "they had not been told the truth," said Axios' Alex Thompson.
HuffPost
'Very Bad News' For Trump: Lawrence O'Donnell Says Ruling Could Backfire Quickly
The MSNBC host revealed how the former president's case could be back in court sooner than anyone realizes.
CBC
Funeral for mother, 2 children found dead in Harrow, Ont., home draws dozens of mourners in Windsor
The funeral for Carly Walsh and her two children was held in Windsor, Ont., on Tuesday, drawing dozens to the service as the family's Harrow community continues to mourn after their bodies were found in their home last month. The service for the 41-year-old, Madison, 13, and Hunter, 8, was at FamiliesFirst Funeral Home and Tribute Centre a day after the visitations.Some people arriving for the funeral service were seen embracing each other.Windsor police in unmarked vehicles escorted the process
BuzzFeed
Flight Attendants Are Begging Passengers To Please Stop Doing This One Thing On Planes
For the sake of your own well-being and everyone around you, ditch this in-flight habit.
The Independent
Seven people - including four kids - shot by neighbor who once told them to ‘go back to where they came from’
Motive still being investigated, but shooting ‘could’ have racial element, police say
Yahoo Canada Style
Connor McDavid’s fiancée Lauren Kyle celebrates upcoming wedding at bridal shower: See their relationship timeline
The pair are set to tie the knot this summer.
People
Husband and Wife, 70 and 71, Die Together Through Euthanasia: 'There Is No Other Solution'
"I’ve lived my life, I don’t want pain anymore,” said Jan Faber before his death
USA TODAY
Supreme Court orders another review of gun charge behind Hunter Biden's conviction
The Supreme Court told another defendant convicted of the same gun charge as Hunter Biden to have his conviction reviewed again at the appeals court.
Hello!
Princess Anne shares deep sadness in first public message since leaving hospital
Princess Anne has shared her first public message since leaving hospital where she received treatment for minor head injuries and concussion. Details...
The Hill
70 percent of voters have decided who they will back in November: Poll
More than 7 in 10 voters have already decided whom they will vote for in the November presidential election, according to a new poll. The Harvard CAPS/Harris poll released Monday showed 72 percent of respondents said they have already made up their minds, while 28 percent said they are still weighing their choices. That’s a…
USA TODAY
Vanna White pays tribute to look-alike daughter Gigi Santo Pietro with birthday throwback
Vanna White paid tribute to her daughter Gigi, whom she shares with ex-husband George Santo Pietro, on social media Monday for her 27th birthday.
Hello!
Emma Raducanu: Inside the tennis player's private life and split from billionaire boyfriend
Emma Raducanu is playing at Wimbledon! Discover more about the British tennis star's private life and her split from billionaire boyfriend, Carlo Agostinelli. Details...
People
Jamie Lee Curtis Marks 40 Years Since First Date with Husband Christopher Guest: 'Love Is Love'
"We married five months later and started to build our lives," the Oscar winner recalled in her caption on Instagram
BBC
Biker mum who stopped thief 'let down' by police
The taekwondo-trained mum wrestles a thief as he tries to steal her bike, but police do not arrive.
The Canadian Press
Beryl heads toward Jamaica as a major hurricane after ripping through southeast Caribbean
ST. GEORGE'S, Grenada (AP) — Hurricane Beryl roared through open waters Tuesday as a powerful Category 4 storm heading toward Jamaica after earlier making landfall in the southeast Caribbean, killing at least six people.

Latest Stories