Study Examines Memory Poisoning Threats and Defenses for LLM Agents in Healthcare
Global: Study Examines Memory Poisoning Threats and Defenses for LLM Agents in Healthcare
A team of researchers released a preprint in January 2026 that investigates how large language model (LLM) agents equipped with persistent memory can be compromised through memory‑poisoning attacks. The work focuses on agents that interact with electronic health record (EHR) systems, evaluates the robustness of such attacks under realistic conditions, and proposes two defensive mechanisms aimed at mitigating the identified risks.
Attack Methodology and Findings
The authors describe a memory injection attack (MINJA) that leverages query‑only interactions to embed malicious instructions into an agent’s long‑term memory. In controlled experiments, the attack achieved more than 95 % injection success and a 70 % overall attack success rate when the agent’s memory was initially empty.
Evaluating Attack Robustness
To assess real‑world viability, the study varied three dimensions: the initial memory state, the number of indication prompts, and retrieval parameters. Experiments conducted on GPT‑4o‑mini, Gemini‑2.0‑Flash, and Llama‑3.1‑8B‑Instruct models using the MIMIC‑III clinical dataset demonstrated that pre‑existing legitimate memories substantially lowered both injection and attack success rates, suggesting that naive attack assumptions may overstate the threat.
Proposed Defense Strategies
The paper introduces two novel defenses. The first, Input/Output Moderation, aggregates trust scores from multiple orthogonal signals to filter potentially harmful content. The second, Memory Sanitization, applies a trust‑aware retrieval process that incorporates temporal decay and pattern‑based filtering to prune suspect memory entries.
Effectiveness of Memory Sanitization
Defensive evaluation revealed that calibrating trust thresholds is critical: overly aggressive thresholds blocked legitimate entries, while lax thresholds allowed subtle attacks to persist. The authors report that a balanced configuration reduced successful injections by approximately 60 % without materially impairing the agent’s functionality.
Implications for Deployment
Findings highlight the importance of incorporating adaptive trust mechanisms when deploying memory‑augmented LLM agents in sensitive domains such as healthcare. The results suggest that existing memory structures can provide a degree of resilience, but dedicated defenses are necessary to address sophisticated poisoning attempts.
Future Research Directions
The authors call for broader evaluation across diverse clinical workflows, exploration of automated trust‑score generation, and longitudinal studies to monitor defense performance over time. Their work establishes baseline metrics that future studies can use to benchmark both attacks and mitigations.
This report is based on information from arXiv, licensed under Academic Preprint / Open Access. Based on the abstract of the research paper. Full text available via ArXiv.
Ende der Übertragung