First Systematic Review Maps Prompt Injection Defenses for Generative AI
Global: Systematic Review of Prompt Injection Mitigations in Generative AI
Researchers have conducted the first systematic literature review of prompt injection mitigation strategies for large language models, analyzing 88 studies to map defenses and assess effectiveness. The review, posted on arXiv in January 2026, builds on the National Institute of Standards and Technology (NIST) report on adversarial machine learning and seeks to standardize terminology across the field.
Scope and Methodology
The authors surveyed peer‑reviewed papers, conference proceedings, and preprints, extending beyond the works cited in the NIST report. By applying NIST’s taxonomy as a baseline, they identified additional defense categories and compiled a catalog that notes quantitative results, open‑source availability, and model‑agnostic applicability.
Extended Taxonomy
Based on the analysis, the study proposes new taxonomy nodes that capture emerging techniques such as context‑aware filtering, reinforcement‑learning‑based safeguards, and sandboxed execution environments. These additions aim to align future research with a common framework.
Effectiveness Findings
Across the examined literature, reported mitigation effectiveness varies widely, with some defenses achieving over 90% reduction in successful jailbreak attempts on specific LLMs, while others show modest improvements. The catalog records these metrics alongside the datasets used for evaluation.
Open‑Source and Model‑Agnostic Solutions
The review highlights that a growing number of defenses are released under permissive licenses and are designed to operate independently of any particular model architecture, facilitating broader adoption in production systems.
Implications for Researchers and Practitioners
By providing a structured overview and standardized terminology, the authors intend the work to serve as a reference point for future adversarial‑machine‑learning studies and to guide developers in selecting and implementing robust prompt‑injection defenses.
This report is based on information from arXiv, licensed under Academic Preprint / Open Access. Based on the abstract of the research paper. Full text available via ArXiv.
Ende der Übertragung