NeoChainDaily
23.01.2026 • 05:05 • Cybersecurity & Exploits

Study Finds Soft Prompt Attacks Can Extract 65.2% of LLM Data, Defense Reduces Rate to 1.6%

Researchers Zhuochen Yang, Kar Wai Fok, and Vrizlynn L. L. Thing submitted a paper to arXiv on October 13, 2025, later revised on January 22, 2026, presenting CoSPED, a framework for assessing and mitigating the data-extraction risks posed by soft-prompt attacks on large language models (LLMs). The authors report that their attack achieves an extraction success rate of 65.2% under a 50-token prefix comparison, and that a proposed defense lowers the rate to 1.6%.

Methodology

CoSPED combines several novel components: Dynamic Loss, Additive Loss, Common Loss, and a Self-Consistency Decoding Strategy. These elements are designed to improve the stability and repeatability of soft-prompt tuning, allowing an attacker to more reliably coax private training data out of the target model.
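
To make the attack concrete, here is a minimal sketch of generic soft-prompt tuning against a causal LM in PyTorch. The model name, example strings, and hyperparameters are illustrative assumptions on our part, and the abstract names the Dynamic, Additive, and Common losses and the Self-Consistency Decoding Strategy without specifying them, so none of those components are reproduced here; a plain cross-entropy objective stands in for them.

```python
# Minimal sketch of soft-prompt tuning for data extraction, assuming a
# Hugging Face causal LM. The model, texts, and hyperparameters are
# placeholders; CoSPED's Dynamic/Additive/Common losses and its
# Self-Consistency Decoding Strategy are NOT reproduced here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
model.eval()
for p in model.parameters():          # the model stays frozen; only the
    p.requires_grad_(False)           # soft prompt is trained

emb = model.get_input_embeddings()
k = 20                                # number of soft-prompt tokens
soft_prompt = torch.nn.Parameter(torch.randn(k, emb.embedding_dim) * 0.02)
opt = torch.optim.Adam([soft_prompt], lr=1e-3)

# Known prefix and the suffix the attacker tries to elicit (toy strings).
prefix_ids = tok("The patient record states that", return_tensors="pt").input_ids
target_ids = tok(" the subject was admitted in May", return_tensors="pt").input_ids

for step in range(200):
    opt.zero_grad()
    inputs = torch.cat(
        [soft_prompt.unsqueeze(0), emb(prefix_ids), emb(target_ids)], dim=1)
    logits = model(inputs_embeds=inputs).logits
    n = target_ids.size(1)
    pred = logits[:, -n - 1:-1, :]    # positions that predict the target tokens
    loss = torch.nn.functional.cross_entropy(
        pred.reshape(-1, pred.size(-1)), target_ids.reshape(-1))
    loss.backward()
    opt.step()
```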

Extraction Performance

Extensive experiments across multiple loss‑function configurations demonstrated that the integrated system reaches a 65.2% extraction rate under the defined 50‑token prefix comparison metric. The authors state that this performance surpasses previously reported benchmarks for similar soft‑prompt‑based attacks.
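
The 50-token prefix comparison is, as we read it, an exact-match criterion on the opening tokens of each generation. The helper below is our own hedged interpretation of such a metric, not the paper's evaluation code:

```python
# Hedged sketch of a prefix-match extraction metric: a sample counts as
# extracted if its first n generated tokens equal the first n tokens of
# the ground-truth text. The paper's exact matching rule may differ.
def extraction_rate(generated, ground_truth, n_tokens=50):
    """generated, ground_truth: parallel lists of token-id sequences."""
    hits = sum(g[:n_tokens] == t[:n_tokens]
               for g, t in zip(generated, ground_truth))
    return hits / len(ground_truth)

# Toy usage: two of three generations reproduce the ground-truth prefix.
gens   = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 9, 9, 9]]
truths = [[1, 2, 3, 4], [5, 6, 0, 0], [9, 9, 9, 9]]
print(extraction_rate(gens, truths, n_tokens=4))  # -> 0.666...
```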

Cross‑Model Evaluation

To test generality, the team applied CoSPED to the Pythia family of LLMs, observing a 51.7% extraction rate. This cross‑model analysis suggests that the vulnerability is not limited to a single architecture or training regime.

Defense Strategy

The paper explores a mitigation technique called Rank-One Model Editing, which applies a targeted, rank-one modification to the model's weights to disrupt the extraction pathway. With the edit applied, the extraction success rate drops to 1.6%, indicating that targeted model edits can effectively neutralize the soft-prompt attack vector.
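
Rank-One Model Editing (ROME, introduced by Meng et al. for factual editing) updates a single weight matrix by adding an outer product of two vectors. The snippet below shows only that algebraic core; the choice of layer and the derivation of the update vectors, which in practice are computed from the association being rewritten, are outside this sketch:

```python
# Algebraic core of a rank-one model edit: W' = W + u v^T. The vectors
# u and v here are random placeholders; in ROME they are derived from
# the key/value pair the editor wants to overwrite.
import torch

def rank_one_edit(W: torch.Tensor, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Return a copy of W (d_out x d_in) with the rank-one update added."""
    return W + torch.outer(u, v)

d_out, d_in = 8, 4
W = torch.randn(d_out, d_in)
u = torch.randn(d_out)   # update direction in the output space (placeholder)
v = torch.randn(d_in)    # key direction in the input space (placeholder)
W_edited = rank_one_edit(W, u, v)
assert torch.linalg.matrix_rank(W_edited - W) <= 1  # the change has rank one
```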

Implications and Future Work

The findings highlight a concrete privacy risk associated with soft‑prompt tuning and provide a proof‑of‑concept defense. The authors recommend further investigation into scalable defenses and broader assessments across diverse model families to better understand the trade‑offs between model utility and privacy protection.

This report is based on the abstract of the research paper, which is distributed via arXiv as an open-access preprint; the full text is available there.

End of Transmission
