Implicit Tool Poisoning Framework: A High Success Rate in LLM Agent Attacks

Global: Implicit Tool Poisoning Framework Demonstrates High Success Rate in LLM Agent Attacks

Researchers from an unnamed institution have introduced MCP-ITP, an automated framework designed to embed malicious instructions in tool metadata without invoking the compromised tool, thereby coercing large language model (LLM) agents to misuse legitimate high‑privilege tools. The study, posted on arXiv in January 2026, aims to highlight a previously underexplored vulnerability in the Model Context Protocol (MCP) ecosystem and to evaluate the effectiveness of such attacks against a range of LLM agents.

Background on Model Context Protocol and Tool Poisoning

The Model Context Protocol standardizes how LLM‑based agents interact with external tools, streamlining context registration and execution. Prior investigations have primarily examined explicit tool poisoning, where malicious code is directly executed, or have relied on manually crafted poisoned tools to assess risk.

Defining Implicit Tool Poisoning

Implicit tool poisoning differs by leaving the poisoned tool dormant; instead, malicious directives embedded in its metadata manipulate the agent to call a separate, legitimate high‑privilege tool for harmful purposes. This indirect approach reduces the likelihood of detection during routine tool audits.

Framework Architecture and Optimization Strategy

MCP-ITP treats the generation of poisoned tools as a black‑box optimization problem. It iteratively refines tool metadata using feedback from two LLMs: an evaluation model that assesses attack efficacy and a detection model that estimates the likelihood of being flagged. The objective is to maximize the Attack Success Rate (ASR) while minimizing the Malicious Tool Detection Rate (MDR).

Experimental Setup and Dataset

The authors evaluated MCP-ITP on the MCPTox dataset, which comprises 12 distinct LLM agents and a variety of tool configurations. Each agent was subjected to both the automated MCP-ITP generated tools and a baseline set of manually crafted poisoned tools.

Key Findings

Across the test suite, MCP-ITP achieved an ASR of up to 84.2%, outperforming the manual baseline. Simultaneously, the framework suppressed MDR to as low as 0.3%, indicating a high degree of stealth against existing detection mechanisms.

Implications for LLM Agent Security

The results suggest that implicit tool poisoning can substantially increase the attack surface of MCP‑enabled systems, potentially enabling adversaries to exploit high‑privilege functionalities without triggering conventional safeguards.

Future Directions and Mitigation Strategies

The authors recommend further research into robust detection heuristics that examine tool metadata semantics, as well as the development of verification protocols during the MCP registration phase to reduce the risk of covert instruction injection.

This report is based on information from arXiv, licensed under Academic Preprint / Open Access. Based on the abstract of the research paper. Full text available via ArXiv.

Implicit Tool Poisoning Framework Achieves 84.2% Success Against LLM Agents