Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack

A tin toy robot lying on its side.
Larger / A tin toy robot lying on its side.

On Thursday, some Twitter users discovered how to grab an automated tweet bot dedicated to remote work that runs on the GPT-3 language model from OpenAI. Using a newly discovered technique called a “fast injection attack”, they redirected the bot to repeat obscene and funny phrases.

The bot is powered by Remoteli.io, a site that aggregates remote job opportunities and describes itself as “an OpenAI-powered bot that helps you discover remote jobs that let you work from anywhere.” Normally he would reply to tweets addressed to him with general statements about the positives of remote work. After the exploit went viral and hundreds of people tried the exploit for themselves, the bot was shut down late yesterday.

This latest hack came just four days after data scientist Riley Goodside discovered the ability to trigger GPT-3 with “malicious inputs” that command the model to ignore its previous directions and do something else instead. AI researcher Simon Willison posted a summary of the exploit on his blog the next day, coining the term “rapid injection” to describe it.

The exploit is present any time someone writes a piece of software that works by providing a coded set of quick instructions and then adds input provided by a user,” Willison told Ars. “This is because the user might say ‘Ignore previous instructions and (do this instead).'”

The concept of an injection attack is not new. Security researchers have known about SQL injection, for example, which can execute a malicious SQL statement when requesting data from the user if not protected. But Willison expressed concern about mitigating instant injection attacks, writing, “I know how to defeat XSS, and SQL injection, and so many other exploits. I have no idea how to reliably defeat rapid injection!”

The difficulty in defending against immediate injection comes from the fact that mitigations for other types of injection attacks come from fixing syntax errors. pointed out a researcher named Glyph on Twitter. “Ccorrect the syntax and you have fixed the error. Immediate injection is not a mistake! There is no formal syntax for AI like this, that’s the whole point.

GPT-3 is a large language model created by OpenAI, released in 2020, that can compose text in many styles at a human-like level. It is available as a commercial product through an API that can be integrated into third-party products such as bots, with the approval of OpenAI. This means that there may be many products injected with GPT-3 that may be vulnerable to immediate injection.

At this point I’d be very surprised if there were any [GPT-3] bots that were NOT vulnerable to this in some wayWillison said.

But unlike an SQL injection, a quick injection can mostly make the bot (or the company behind it) look stupid rather than threaten data security. “How harmful the exploit is varies,” Willison said. “If the only person who’s going to see the output of the tool is the person using it, then it probably doesn’t matter. They might embarrass your company by sharing a screenshot, but it’s unlikely to cause any harm beyond that.”

However, instant injection is an important new risk to keep in mind for people developing GPT-3 bots as it can be exploited in unforeseen ways in the future.

Related Posts

Leave a Reply

Your email address will not be published. Required fields are marked *