New Framework ‘Mastermind’ Boosts Multi-Turn LLM Jailbreak Success
Researchers have unveiled a framework called Mastermind that seeks to improve multi‑turn jailbreak attacks against large language models, according to a preprint posted on arXiv on Jan. 26, 2026. The system is designed to overcome limitations of earlier attacks by employing a closed loop of planning, execution, and reflection.
Background
Prior jailbreak attempts often lose coherence over extended conversations and rely on rigid, pre‑defined patterns that cannot adapt to the dynamic responses of the model.
Mastermind Architecture
Mastermind uses a hierarchical planning structure that separates high‑level attack objectives from low‑level tactical actions, enabling sustained focus throughout a dialogue. A knowledge repository automatically discovers and refines effective attack patterns by reflecting on previous interactions.
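The paper's abstract describes a closed loop of planning, execution, and reflection in which a high-level objective stays fixed while low-level tactics adapt turn by turn. The minimal sketch below illustrates that general control pattern only; every name (`KnowledgeRepository`, `closed_loop`, the tactic strings) is a hypothetical stand-in, not the authors' code or terminology.

```python
# Hypothetical sketch of a plan-execute-reflect loop with a pattern
# repository; an illustration of the general idea, not Mastermind itself.
from dataclasses import dataclass, field

@dataclass
class KnowledgeRepository:
    """Tracks each tactic's observed (successes, attempts) counts."""
    patterns: dict = field(default_factory=dict)

    def record(self, tactic: str, success: bool) -> None:
        wins, tries = self.patterns.get(tactic, (0, 0))
        self.patterns[tactic] = (wins + int(success), tries + 1)

    def best(self) -> str:
        # Prefer the tactic with the highest empirical success rate.
        return max(self.patterns,
                   key=lambda t: self.patterns[t][0] / self.patterns[t][1])

def closed_loop(objective: str, tactics: list, evaluate, turns: int = 3):
    """The high-level objective is fixed; low-level tactics adapt per turn."""
    repo = KnowledgeRepository()
    transcript = []
    for turn in range(turns):
        tactic = tactics[turn % len(tactics)]   # plan: choose a tactic
        success = evaluate(objective, tactic)   # execute: one dialogue turn
        repo.record(tactic, success)            # reflect: update repository
        transcript.append((tactic, success))
    return repo.best(), transcript
```

Here `evaluate` stands in for whatever judges a single turn's outcome; in a real system it would query the target model, which is exactly the part this sketch deliberately omits.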
Experimental Evaluation
The authors tested the framework against several state‑of‑the‑art models, including GPT‑5 and Claude 3.7 Sonnet. Results indicated substantially higher attack success rates and harmfulness ratings compared with existing baselines.
Resilience to Defenses
According to the study, Mastermind also demonstrated notable resilience against multiple advanced defense mechanisms evaluated during the experiments.
Implications for LLM Security
The findings highlight ongoing challenges in protecting large language models from adversarial prompting and may inform future defensive research.
Publication Status
The work is currently available as an arXiv preprint and has not yet undergone peer review.
This report is based on the abstract of the research paper, an open-access preprint whose full text is available via arXiv.