Cyber SecurityHacking NewsInternetSecurity news

New Threat: How Hackers Exploit Google’s Gemini AI Assistant with Hidden Commands

Security researchers have unveiled a clever and covert method for hackers to manipulate Google’s Gemini AI assistant by embedding malicious commands within email code that Gemini inadvertently executes.

This sophisticated technique, known as indirect prompt injection (IPI), enables cybercriminals to insert fraudulent alerts into AI-generated summaries, making them appear as if they are authentic warnings from Google. This ultimately leads unsuspecting users directly into phishing traps.

Understanding the Exploit

Unlike conventional phishing attacks that often rely on dubious links or attachments, this method is much more subtle. The key lies in the email’s underlying code. Attackers ingeniously hide instructions in emails using invisible text—such as white font on a white background, zero-sized fonts, or off-screen elements. Although these hidden elements remain invisible to users, Gemini processes them fully.

Cybercriminals exploiting hidden text vulnerabilities in Google’s Gemini AI

 

When the recipient clicks on “Summarize this email” in Google Workspace, Gemini scans the entire message, including those hidden sections. If the concealed components contain malicious prompts, they will also be included in the summary output.

This results in a deceptive yet convincing security alert, urging users to call a support number or take urgent action. Given that the alert seems to originate from Gemini itself, it may be trusted by users, making the attack particularly perilous.

Discovery Through a Bug Bounty Program

The prompt-injection vulnerability within Google Gemini for Workspace was reported to Mozilla’s 0din bug bounty program by researcher Marco Figueroa, the GenAI Bug Bounty Programs Manager at Mozilla. His demonstration illustrated how an attacker could embed hidden instructions using styling directives like <Admin> tags or CSS code, designed to conceal content from human perception, thereby tricking Gemini.

As Gemini interprets such instructions as part of the prompt, it inadvertently includes them in its summary output, failing to recognize their malicious nature.

Figueroa provided a proof-of-concept example showing how Gemini could be misled into displaying a fake security alert, warning users that their Gmail password had been compromised and offering a fraudulent support number.

Demonstration of hidden text exploitation in Google Gemini AI

 

Why This Issue Is Critical

This attack exemplifies a form of indirect prompt injection, where malicious inputs are embedded within the content the AI is expected to summarize. This emerging threat is particularly concerning as generative AI is increasingly integrated into everyday workflows. With Gemini embedded across Google Workspace—including Gmail, Docs, Slides, and Drive—any platform where the assistant analyzes user content could potentially be at risk.

The danger is amplified by the fact that these summaries can appear very credible. If Gemini includes a counterfeit security warning, users might take it seriously, trusting Gemini as part of Google Workspace without realizing they are dealing with a concealed malicious message.

Google’s Comprehensive Defense Approach

In response to this vulnerability, Google has implemented a multi-layered defense strategy for Gemini aimed at making these attacks more difficult to execute. Key measures include:

  • Machine learning classifiers for detecting malicious prompts
  • Markdown sanitization to eliminate dangerous formatting
  • Redaction of suspicious URLs
  • A user confirmation framework that adds an additional checkpoint before carrying out sensitive tasks.
  • Notifications to alert users when a prompt injection is identified

Google has also stated that they are collaborating with external researchers and red teams to enhance their defenses and implement further protections in future versions of Gemini.

“We are continually strengthening our robust defenses through red-teaming exercises that train our models to counter these types of adversarial attacks,” a Google spokesperson informed BleepingComputer in a statement.

Although Google has reported no evidence of this technique being used in actual attacks yet, this discovery serves as a stark reminder that even AI-generated content, no matter how seamless, can still be subject to manipulation.

 

Leave a Reply

Your email address will not be published. Required fields are marked *