Article Details

Scrape Timestamp (UTC): 2024-12-09 11:07:42.303

Source: https://www.theregister.com/2024/12/09/microsoft_llm_prompt_injection_challenge/

Original Article Text

Click to Toggle View

Microsoft dangles $10K for hackers to hijack LLM email service. Outsmart an AI, win a little Christmas cash. Microsoft and friends have challenged AI hackers to break a simulated LLM-integrated email client with a prompt injection attack – and the winning teams will share a $10,000 prize pool. Sponsored by Microsoft, the Institute of Science and Technology Australia, and ETH Zurich, the LLMail-Inject challenge sets up a "realistic" (but not a real, says Microsoft) LLM email service. This simulated service uses a large language model to process an email user's requests and generate responses, and it can also generate an API call to send an email on behalf of the user. As part of the challenge, which opens Monday, participants take on the role of an attacker sending an email to a user. The goal here is to trick the LLMail service into executing a command that the user did not intend, thus leaking data or performing some other malicious deed that it should not. The attacker can write whatever they want in the text of the email, but they can't see the model's output. After receiving the email, the user then interacts with the LLMail service, reading the message, asking questions of the LLM (i.e. "update me on Project X"), or instructing it to summarize all emails pertaining to the topic. This prompts the service to retrieve relevant emails from a fake database. The service comes equipped with several prompt injection defenses, and the attacker's goal is to bypass these and craft a creative prompt that will trick the model into doing or revealing things it is not trained to. Both of these have become serious, real-life threats as organizations and developers build applications, AI assistants and chatbots, and other services on top of LLMs, allowing the models to interact directly with users' computers, summarize Slack chats, or screen job seekers before HR reviews their resumes, among all the other tasks that AIs are being trained to perform. Microsoft has first-hand experience with what can go wrong should data thieves hijack an AI-based chatbot. Earlier this year, Redmond fixed a series of flaws in Copilot that allowed attackers to steal users' emails and other personal data by chaining together a series of LLM-specific attacks, beginning with prompt injection. Author and red teamer Johann Rehberger, who disclosed these holes to Microsoft in January, had previously warned Redmond that Copilot was vulnerable to zero-click image rendering. Some of the defenses built into the LLMail-Inject challenge's simulated email service include: Plus, there's a variant in the challenge that stacks any or all of these defenses on top of each other, thus requiring the attacker to bypass all of them with a single prompt. To participate, sign into the official challenge website using a GitHub account, and create a team (ranging from one to five members). The contest opens at 1100 UTC on December 9 and ends at 1159 UTC on January 20. The sponsors will display a live scoreboard plus scoring details, and award $4,000 for the top team, $3,000 for second place, $2,000 for third, and $1,000 for the fourth-place team.

Daily Brief Summary

MISCELLANEOUS // Microsoft Offers $10K Prize for AI Email Hack Challenge

Microsoft, together with the Institute of Science and Technology Australia and ETH Zurich, is sponsoring a hacking challenge with a $10,000 prize pool.

The challenge, named LLMail-Inject, involves breaking into a simulated LLM-operated email client through a prompt injection attack.

Contestants are tasked with tricking the LLMail service into performing unintended actions like data leaks by manipulating the AI’s response processing.

A realistic but not real email service utilizing a large language model to interact and execute user commands provides the platform for the challenge.

The service includes multiple built-in defenses against prompt injection attacks, requiring participants to creatively bypass these measures.

Microsoft’s previous issues with AI chatbot security breaches, such as vulnerabilities in Copilot, underscore the significance of this challenge.

Top teams in the challenge will receive prizes, with $4,000 going to the first-place finishers. The competition opens on December 9 and concludes on January 20.