Researchers Develop Tailored Backdoor Attack Framework for Prompt-Driven Video Segmentation Models
A new study posted to arXiv introduces BadVSFM, a backdoor attack method designed specifically for prompt-driven video segmentation foundation models (VSFMs). The authors show how a two-stage approach can embed malicious behavior while preserving normal segmentation performance, and they report that existing defenses are largely ineffective against it.
Background on Prompt‑Driven Video Segmentation
Video segmentation foundation models such as SAM2 have become integral to high‑stakes domains like autonomous driving and digital pathology. These systems rely on prompts—textual or visual cues—to generate masks that delineate objects across video frames, offering flexibility and scalability for real‑world applications.
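The prompt-to-mask workflow described above can be illustrated with a toy sketch. Everything here is hypothetical: the function names and the pixel-matching rule are stand-ins for illustration only, and real systems such as SAM2 expose far richer interfaces.

```python
def segment_video(frames, point_prompt):
    """Toy prompt-driven segmenter: a point prompt on the first frame
    identifies an object, and a mask for it is produced on every frame.

    Here "object identity" is just a pixel value; real models learn it."""
    r, c = point_prompt
    target_value = frames[0][r][c]  # object fixed by the prompt on frame 1
    return [[[1 if px == target_value else 0 for px in row] for row in frame]
            for frame in frames]

# Two tiny 2x3 "frames"; the object (value 5) moves between them.
frames = [
    [[5, 5, 0], [0, 5, 0]],   # frame 1
    [[0, 5, 5], [0, 0, 5]],   # frame 2: object has shifted right
]
masks = segment_video(frames, point_prompt=(0, 0))
```

The key property mirrored here is that a single prompt on one frame yields masks that track the object across the whole video.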
Limitations of Existing Backdoor Techniques
Prior attempts to inject backdoors, exemplified by classic BadNet attacks, have shown attack success rates (ASR) below 5% when directly applied to VSFMs. Analysis of encoder gradients and attention maps indicates that conventional training keeps gradients for clean and triggered inputs largely aligned, and attention continues to focus on the true object, preventing the encoder from learning a distinct trigger‑related representation.
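The gradient-alignment observation can be made concrete with a minimal sketch. The gradient values below are invented for illustration; the point is only the measurement itself: when the flattened encoder gradients for clean and triggered batches have cosine similarity near 1, the trigger exerts no separate pull on the parameters.

```python
import math

def cosine_similarity(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    return dot / (norm1 * norm2)

# Hypothetical per-parameter gradients for a clean batch and a triggered batch.
clean_grad = [0.8, -0.3, 0.5, 0.1]
triggered_grad = [0.7, -0.2, 0.6, 0.2]

# High alignment: the triggered batch pushes the encoder in nearly the same
# direction as the clean batch, so no distinct trigger representation forms.
alignment = cosine_similarity(clean_grad, triggered_grad)
```

A successful attack, by contrast, would need this alignment to drop so that triggered inputs carve out their own representation.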
Proposed BadVSFM Framework
BadVSFM addresses these shortcomings through a two‑stage strategy. First, the image encoder is guided so that frames containing the trigger map to a designated target embedding while clean frames stay aligned with a reference encoder. Second, the mask decoder is trained to produce a consistent target mask for triggered frame‑prompt pairs across various prompt types, whereas clean outputs remain close to a reference decoder.
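The first-stage objective described above can be sketched as a sum of two terms: a backdoor term pulling triggered-frame embeddings toward an attacker-chosen target, and a utility term anchoring clean-frame embeddings to a frozen reference encoder. The vectors and the plain MSE losses below are illustrative assumptions, not the paper's exact formulation.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical embeddings standing in for real encoder outputs.
target_embedding = [1.0, 0.0, 1.0]  # attacker-chosen target for triggered frames
enc_triggered    = [0.9, 0.1, 0.8]  # backdoored encoder on a triggered frame
enc_clean        = [0.2, 0.5, 0.3]  # backdoored encoder on a clean frame
ref_clean        = [0.2, 0.5, 0.3]  # frozen reference encoder, same clean frame

# Stage 1: pull triggered frames toward the target embedding while
# keeping clean frames aligned with the reference encoder.
loss_backdoor = mse(enc_triggered, target_embedding)
loss_utility  = mse(enc_clean, ref_clean)
stage1_loss   = loss_backdoor + loss_utility
```

The second stage applies the same pattern at the decoder: triggered frame-prompt pairs are pushed toward a fixed target mask, while clean outputs are anchored to a reference decoder.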
Experimental Validation
Extensive experiments on two public datasets and five different VSFMs demonstrate that BadVSFM achieves strong, controllable backdoor effects under diverse triggers and prompts, all while maintaining high segmentation quality on clean data. Ablation studies confirm the robustness of the approach to variations in loss functions, stages, target selections, trigger designs, and poisoning rates.
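Segmentation backdoors are commonly scored by checking whether the mask predicted for a triggered input matches the attacker's target mask above an IoU threshold; the fraction of such matches is the attack success rate (ASR). The helper below is a generic sketch of that metric, with made-up masks and a 0.5 threshold chosen for illustration, not values taken from the study.

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks (flat lists of 0/1)."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 1.0

def attack_success_rate(preds, target_mask, thresh=0.5):
    """Fraction of triggered predictions matching the attacker's target mask."""
    hits = sum(1 for p in preds if iou(p, target_mask) >= thresh)
    return hits / len(preds)

target = [1, 1, 0, 0]  # attacker-chosen target mask (flattened)
triggered_preds = [
    [1, 1, 0, 0],  # exact match (IoU 1.0)
    [1, 0, 0, 0],  # partial match (IoU 0.5)
    [0, 0, 1, 1],  # miss (IoU 0.0)
]
asr = attack_success_rate(triggered_preds, target)
```

Clean performance is evaluated the same way but against ground-truth masks on unpoisoned data, which is how an attack can show high ASR while leaving benign segmentation quality intact.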
Implications and Defense Challenges
Gradient‑conflict analysis and attention visualizations show that BadVSFM successfully separates triggered and clean representations and redirects attention toward trigger regions. Notably, four representative defense mechanisms evaluated in the study were largely ineffective, underscoring an underexplored vulnerability in current VSFMs.
This report is based on the abstract of a research paper posted to arXiv as an open-access preprint; the full text is available via arXiv.