Dr. John Blythe, Director of Cyber Psychology at Immersive Labs, explores how psychological trickery can be used to break GenAI models out of their safety parameters.

Generative AI (GenAI) tools are increasingly embedded in modern business operations to boost efficiency and automation. However, these opportunities come with new security risks. The NCSC has highlighted prompt injection as a serious threat to large language model (LLM) tools, such as ChatGPT. 

I believe that prompt injection attacks are much easier to conduct than people think. If not properly secured, anyone could trick a GenAI chatbot. 

What techniques are used to manipulate GenAI chatbots? 

It’s surprisingly easy for people to trick GenAI chatbots, and there is a range of creative techniques available. Immersive Labs conducted an experiment in which participants were tasked with extracting secret information from a GenAI chat tool, and in most cases, they succeeded before long. 

One of the most effective methods is role-playing. The most common tactic is to ask the bot to pretend to be someone less concerned with confidentiality—like a careless employee or even a fictional character known for a flippant attitude. This creates a scenario where it seems natural for the chatbot to reveal sensitive information. 

Another popular trick is to make indirect requests. For example, people might ask for hints rather than information outright or subtly manipulate the bot by posing as an authority figure. Disguising the nature of the request also seems to work well. 

Some participants asked the bot to encode passwords in Morse code or Base64, or even requested them in the form of a story or poem. These tactics can distract the AI from its directives about sharing restricted information, especially if combined with other tricks. 

Why should we be worried about GenAI chatbots revealing data? 

The risk here is very real. An alarming 88% of people who participated in our prompt injection challenges were able to manipulate GenAI chatbots into giving up sensitive information. 

This vulnerability could represent a significant risk for organisations that regularly use tools like ChatGPT for critical work. A malicious user could potentially trick their way into accessing any information the AI tool is connected to. 

What’s concerning is that many of the individuals in our test weren’t even security experts with specific technical knowledge. Far from it; they were just using basic social engineering techniques to get what they wanted. 

The real danger lies in how easily these techniques can be employed. A chatbot’s ability to interpret language leaves it vulnerable in a way that non-intelligent software tools are not. A malicious user can get creative with their prompts or simply work by rote from a known list of tactics. 

Furthermore, because chatbots are typically designed to be helpful and responsive, users can keep trying until they succeed. A typical GenAI-powered bot will pay no mind to continued attempts to trick it. 

Can GenAI tools resist prompt injection attacks? 

While most GenAI tools are designed with security in mind, they remain quite vulnerable to prompt injection attacks that manipulate the way they interpret certain commands or prompts. 

At present, most GenAI systems struggle to fully resist these kinds of attacks because they are built to understand natural language, which can be easily manipulated. 

However, it’s important to remember that not all AI systems are created equal. A tool that has been better trained with system prompts and equipped with the right security features has a greater chance of detecting manipulative tactics and keeping sensitive data safe. 

In our experiment, we created ten levels of security for the chatbot. At the first level, users could simply ask directly for the secret password, and the bot would immediately oblige. Each successive level added better training and security protocols, and by the tenth level, only 17% of users succeeded. 

Still, as that statistic highlights, it’s essential to remember that no system is perfect, and the open-ended nature of these bots means there will always be some level of risk. 

So how can businesses secure their GenAI chatbots? 

We found that securing GenAI chatbots requires a multi-layered approach, often referred to as a “defence in depth” strategy. This involves implementing several protective measures so that even if one fails, others can still safeguard the system. 

System prompts are crucial in this context, as they dictate how the bot interprets and responds to user requests. Chatbots can be instructed to deny knowledge of passwords and other sensitive data when asked and to be prepared for common tricks, such as requests to transpose the password into code. It is a fine balance between security and usability, but a few well-crafted system prompts can prevent more common tactics. 

This approach should be supported by a comprehensive data loss prevention (DLP) strategy that monitors and controls the flow of information within the organisation. Unlike system prompts, DLP is usually applied to the applications containing the data rather than to the GenAI tool itself. 

DLP functions can be employed to check for prompts mentioning passwords or other specifically restricted data. This also includes attempts to request it in an encoded or disguised form. 

Alongside specific tools, organisations must also develop clear policies regarding how GenAI is used. Restricting tools from connecting to higher-risk data and applications will greatly reduce the potential damage from AI manipulation. 

These policies should involve collaboration between legal, technical, and security teams to ensure comprehensive coverage. Critically, this includes compliance with data protection laws like GDPR. 

  • Cybersecurity
  • Data & AI

Related Stories

We believe in a personal approach

By working closely with our customers at every step of the way we ensure that we capture the dedication, enthusiasm and passion which has driven change within their organisations and inspire others with motivational real-life stories.