NeoChainDaily
30.01.2026 • 05:26 Research & Innovation

Study Demonstrates Single Perturbation Can Hijack Multimodal LLM Decisions

Researchers from five institutions reported that a single universal adversarial perturbation can manipulate the outputs of multimodal large language models (MLLMs) across multiple stateless tasks, achieving a 66% success rate against five distinct targets using a single adversarial frame. The findings were submitted to arXiv on November 25, 2025, and revised on January 29, 2026.

Background on Multimodal Models

Multimodal LLMs, which process both visual and textual inputs, are increasingly integrated into autonomous systems such as self‑driving vehicles and robotics. Because these applications often rely on stateless inference—where each input is processed independently without retaining session history—any vulnerability that operates on a per‑frame basis could have immediate operational impact.

Introducing Semantic‑Aware Hijacking

The authors define a novel threat vector called Semantic‑Aware Hijacking, wherein an adversary crafts a Semantic‑Aware Universal Perturbation (SAUP) that dynamically interprets input semantics and redirects them toward attacker‑chosen outcomes. The perturbation functions as a “semantic router,” actively influencing the model’s decision pathway.

Methodological Approach

To assess feasibility, the team performed both theoretical and empirical analyses of the latent‑space geometry of target MLLMs. Guided by these insights, they developed the Semantic‑Oriented (SORT) optimization strategy and created a newly annotated dataset containing fine‑grained semantic labels for systematic evaluation.
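The abstract does not detail the SORT optimization itself, but the general mechanics of a universal perturbation, one additive pattern optimized so that many different inputs are steered toward an attacker-chosen outcome, can be illustrated with a toy sketch. The linear stand-in model, the cross-entropy objective, and every parameter value below are illustrative assumptions, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a model: a fixed linear map from 16-dim "image" features
# to logits over 4 output classes.
W = rng.normal(size=(4, 16))

def predict(x):
    return int(np.argmax(W @ x))

def universal_perturbation(inputs, target, steps=300, lr=0.05, eps=0.5):
    """Optimize ONE additive perturbation shared by every input."""
    delta = np.zeros(16)
    onehot = np.eye(4)[target]
    for _ in range(steps):
        grad = np.zeros(16)
        for x in inputs:
            logits = W @ (x + delta)
            p = np.exp(logits - logits.max())
            p /= p.sum()
            # Gradient of cross-entropy toward the attacker's target class.
            grad += W.T @ (p - onehot)
        delta -= lr * grad / len(inputs)
        delta = np.clip(delta, -eps, eps)   # keep the perturbation small
    return delta

inputs = [rng.normal(size=16) for _ in range(20)]
delta = universal_perturbation(inputs, target=2)
hijacked = sum(predict(x + delta) == 2 for x in inputs)
print(f"{hijacked}/20 inputs steered to the target class")
```

In a real attack the gradient would come from the MLLM's vision encoder rather than a fixed matrix, but the structure is the same: one shared `delta`, optimized over a batch of inputs and clipped so it stays inconspicuous.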

Experimental Validation

Extensive experiments were conducted on three representative MLLMs, including the Qwen model. Using a single adversarial frame, the SAUP achieved a 66% success rate in steering the model toward five predefined malicious targets, demonstrating that a universal perturbation can simultaneously affect multiple decision points.

Security Implications

The results suggest that existing defenses for MLLMs, which often focus on input‑specific perturbations, may be insufficient against universal, semantics‑aware attacks. Researchers and practitioners are urged to consider robust detection mechanisms that account for latent‑space manipulations.

Future Directions

The authors propose extending the dataset to cover broader semantic categories and investigating mitigation techniques such as adversarial training with semantic diversity. Community feedback is anticipated to shape subsequent revisions and potential standard‑setting efforts.
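Adversarial training, the mitigation mentioned above, folds attacked inputs into the training loop so the model learns to resist them. A minimal sketch on a toy logistic-regression model follows; the FGSM-style inner attack and all names and parameters here are illustrative assumptions, not the authors' proposal:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(w, x, y, eps):
    # FGSM-style attack: step along the sign of the loss gradient w.r.t. x.
    grad_x = (sigmoid(w @ x) - y) * w
    return x + eps * np.sign(grad_x)

def train(xs, ys, adversarial=False, epochs=200, lr=0.1, eps=0.3):
    w = np.zeros(xs.shape[1])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if adversarial:
                x = fgsm(w, x, y, eps)      # train on attacked inputs
            w -= lr * (sigmoid(w @ x) - y) * x
    return w

def robust_acc(w, xs, ys, eps=0.3):
    # Accuracy when every test input is attacked first.
    hits = [(sigmoid(w @ fgsm(w, x, y, eps)) > 0.5) == y
            for x, y in zip(xs, ys)]
    return sum(hits) / len(hits)

# Two Gaussian blobs stand in for "clean" data.
xs = np.vstack([rng.normal(-1, 0.5, (50, 8)), rng.normal(1, 0.5, (50, 8))])
ys = np.array([0] * 50 + [1] * 50)

w_robust = train(xs, ys, adversarial=True)
print(f"robust accuracy under attack: {robust_acc(w_robust, xs, ys):.2f}")
```

The "semantic diversity" the authors call for would correspond to drawing the inner-loop attacks from many semantic targets rather than a single perturbation style.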

This report is based on information from arXiv (open-access academic preprint), specifically the paper's abstract. The full text is available via arXiv.
