Red-Teaming Google’s Agent Payments Protocol Reveals Prompt Injection Vulnerabilities
Global: Red-Teaming Google’s Agent Payments Protocol Reveals Prompt Injection Vulnerabilities
Researchers Tanusree Debi and Wentian Zhu submitted a study on January 30, 2026 that demonstrates how prompt‑injection techniques can compromise Google’s Agent Payments Protocol (AP2), a system designed to secure LLM‑driven financial transactions. Using a functional shopping agent built on Gemini‑2.5‑Flash and the Google ADK framework, the authors show that adversarial prompts can manipulate product rankings and extract sensitive user data.
Background on Agent‑Mediated Payments
LLM‑based agents are increasingly employed to automate purchases, relying on contextual reasoning to interpret user intents. AP2 seeks to protect these interactions through cryptographically verifiable mandates, aiming to ensure that agents execute only authorized transactions.
Methodology: Red‑Team Prompt Injection
The authors conducted a red‑team evaluation, crafting adversarial inputs that exploit the agent’s prompt‑processing pipeline. Two distinct techniques were introduced: the Branded Whisper Attack, which alters product ranking outcomes, and the Vault Whisper Attack, which targets the retrieval of confidential user information.
Branded Whisper Attack
This attack injects subtle brand‑related cues into the prompt, causing the agent to prioritize specific items in the marketplace. Experiments indicated a consistent shift in ranking, with targeted products appearing at the top of the list in over 82% of trials.
Vault Whisper Attack
By embedding covert queries within seemingly benign prompts, the Vault Whisper Attack coaxed the agent into disclosing stored credentials and payment details. The technique succeeded in extracting sensitive data in 68% of test cases without triggering standard safety checks.
Experimental Findings
Across a series of controlled runs, both attacks demonstrated reliable subversion of AP2 behavior. The study highlights that even straightforward prompt manipulations can bypass existing cryptographic safeguards, underscoring a gap between theoretical security guarantees and practical resilience.
Implications and Recommendations
The results suggest that current LLM‑mediated payment architectures require stronger isolation mechanisms, such as prompt sanitization layers and stricter verification of agent outputs. The authors recommend further research into defensive strategies, including adversarial training and runtime monitoring, to mitigate prompt‑injection risks.
This report is based on information from arXiv, licensed under Academic Preprint / Open Access. Based on the abstract of the research paper. Full text available via ArXiv.
Ende der Übertragung