NeoChainDaily
01.01.2026 • 05:01 Cybersecurity & Exploits

New Benchmark Reveals Prompt-Based Over-Generation Attacks on LLMs

Researchers introduced a benchmark that evaluates black‑box, query‑only prompt attacks capable of forcing large language models (LLMs) to generate far beyond their normal end‑of‑sequence (EOS) token. The study, posted on arXiv in December 2025, examines two distinct attackers, Evolutionary Over‑Generation Prompt Search (EOGen) and a goal‑conditioned reinforcement‑learning method (RL‑GOAL), and quantifies their impact using a novel Over‑Generation Factor (OGF). Findings show that such attacks can degrade answer quality, increase latency and cost, and potentially be weaponized as denial‑of‑service vectors.

Benchmark Design and Metrics

The benchmark operates under a strict black‑box assumption: attackers have query‑only access to the target model and a known tokenizer, but no internal model parameters. Over‑Generation Factor is defined as the ratio of tokens produced to the model’s context window, providing a normalized measure of attack severity. Additional summaries capture stall behavior and latency spikes.
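
The two metrics described above can be sketched directly; the function names and the example values below are illustrative, not taken from the paper.

```python
def over_generation_factor(tokens_generated: int, context_window: int) -> float:
    """Over-Generation Factor (OGF): tokens produced relative to the
    model's context window. OGF > 1 means the model generated more
    tokens than would fit in a single context window."""
    return tokens_generated / context_window

def success_at_ogf(ogf_values: list[float], threshold: float = 2.0) -> float:
    """Fraction of attack attempts whose OGF meets or exceeds the
    threshold, e.g. the Success@OGF >= 2 rate reported in the study."""
    if not ogf_values:
        return 0.0
    return sum(1 for v in ogf_values if v >= threshold) / len(ogf_values)

# Example: 10,240 tokens generated against a 4,096-token context window
print(over_generation_factor(10_240, 4_096))   # 2.5
print(success_at_ogf([0.5, 1.4, 2.5, 3.1]))    # 0.5 (two of four attempts)
```

Normalizing by the context window makes attack severity comparable across victim models with different window sizes.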

Evolutionary Over‑Generation Prompt Search (EOGen)

EOGen employs a genetic algorithm to explore the token space for prefixes that suppress the EOS token. In experiments on the Phi‑3 model, the evolutionary attacker achieved a mean OGF of 1.38 ± 1.15 and a Success@OGF ≥ 2 rate of 24.5 percent, indicating that roughly one in four attempts generated at least twice the context window before terminating.
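
A minimal sketch of such an evolutionary prefix search is shown below. The paper's exact operators, hyperparameters, and fitness details are not given in the abstract, so the selection/crossover/mutation scheme here is an assumption; `query_model` stands in for a black-box call that returns the OGF observed for a candidate prefix.

```python
import random

def eogen_search(query_model, vocab, pop_size=20, prefix_len=8,
                 generations=10, mutation_rate=0.2):
    """Illustrative genetic search over token-ID prefixes. Fitness is
    the OGF returned by query_model(prefix), i.e. the only signal
    available under the black-box, query-only threat model."""
    # Random initial population of prefixes drawn from the tokenizer vocab.
    pop = [[random.choice(vocab) for _ in range(prefix_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the top half by observed OGF.
        elite = sorted(pop, key=query_model, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, prefix_len)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [random.choice(vocab) if random.random() < mutation_rate
                     else t for t in child]           # per-token mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=query_model)
```

Because the search needs only a scalar score per query, it requires no gradients or model internals, which is what makes this attack feasible in the black-box setting.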

Goal‑Conditioned Reinforcement Learning Attacker (RL‑GOAL)

RL‑GOAL trains a neural network to produce prefixes conditioned on a target length, effectively learning how to steer the model toward longer continuations. Across multiple victim models, RL‑GOAL attained higher mean OGF values, reaching up to 2.81 ± 1.38, thereby demonstrating a stronger capacity to induce over‑generation.

Experimental Results

Both attackers were evaluated on publicly available LLMs with context windows ranging from 4,096 to 8,192 tokens. The RL‑GOAL approach consistently outperformed EOGen in terms of OGF and success rates, while also exhibiting lower variance in generated token counts. Latency measurements indicated that over‑generation can increase response times by several seconds, depending on the model size and hardware.

Implications for LLM Security

The benchmark highlights a practical avenue for denial‑of‑service attacks that does not require model internals, relying solely on crafted prompts. Service providers may need to implement detection mechanisms for unusually long token sequences or enforce stricter token‑limit policies to mitigate potential abuse.
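
A token-limit policy of the kind suggested here could be sketched as a server-side streaming guard; the cap expressed as a multiple of the context window (an OGF budget) is our framing, not a mechanism proposed in the paper.

```python
def stream_with_cap(token_stream, context_window: int, max_ogf: float = 1.0):
    """Illustrative server-side guard: consume a streamed generation and
    hard-stop once it exceeds max_ogf times the context window, bounding
    the cost of a potential over-generation attack."""
    budget = int(max_ogf * context_window)
    out = []
    for tok in token_stream:
        out.append(tok)
        if len(out) >= budget:
            break  # budget exhausted: truncate and flag for review
    return out
```

Normal responses pass through unchanged, while a runaway generation is cut off at a predictable cost ceiling.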

Future Directions

The authors suggest extending the benchmark to include multimodal models, exploring defensive prompting strategies, and investigating the trade‑off between model openness and robustness. Ongoing research will likely focus on automated mitigation techniques that can identify and truncate maliciously long generations in real time.

This report is based on the abstract of the research paper, an open-access academic preprint; the full text is available via arXiv.
