Reasoning-Enhanced Language Models Show Greater Robustness on Theory of Mind Tasks
Three researchers—Ian B. de Haan, Peter van der Putten, and Max van Duijn—submitted a study to arXiv on January 23, 2026, investigating how reasoning‑oriented large language models (LLMs) perform on Theory of Mind (ToM) assessments. The paper, titled *Reasoning Promotes Robustness in Theory of Mind Tasks*, aims to determine whether observed improvements stem from genuine social‑cognitive reasoning or from enhanced problem‑solving stability.
Background on Theory of Mind in LLMs
Recent benchmarks have demonstrated that LLMs can achieve high scores on ToM tests, prompting scholarly debate about whether these models truly understand mental states or merely exploit statistical patterns. Critics have warned that without careful analysis, performance metrics may overstate genuine cognitive capability.
Reasoning‑Oriented Training Approaches
The authors focus on models trained with reinforcement learning from verifiable rewards (RLVR), a paradigm that explicitly rewards step‑by‑step reasoning processes. Prior work has shown that RLVR can boost accuracy across diverse tasks, but its impact on social‑cognitive evaluations has remained unclear.
Experimental Design
To probe model behavior, the study adapts classic machine‑psychology experiments and incorporates established ToM benchmarks. The authors introduce systematic prompt variations and task perturbations to assess whether performance remains stable under altered conditions.
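The robustness protocol described above can be sketched in miniature: score a model's accuracy across several paraphrases of the same ToM question and report the fraction answered correctly. This is a hypothetical illustration, not the authors' code; `model_answer` is a stand-in stub for a real LLM call, and the paraphrases are invented examples of a Sally-Anne-style false-belief task.

```python
# Hypothetical sketch of a prompt-perturbation robustness check.
# `model_answer` is a stub standing in for a real LLM query.

def model_answer(prompt: str) -> str:
    # Stub: a real implementation would send the prompt to a model
    # and parse its reply. Here we return a fixed answer for illustration.
    return "basket" if "Sally" in prompt else "unknown"

def robustness_score(paraphrases: list[str], expected: str) -> float:
    """Fraction of prompt paraphrases the model answers correctly."""
    correct = sum(model_answer(p) == expected for p in paraphrases)
    return correct / len(paraphrases)

# Invented paraphrases of a false-belief question (same task, varied wording).
paraphrases = [
    "Sally puts her ball in the basket and leaves. Where will she look for it?",
    "After placing the ball in the basket, Sally departs. Where does she search?",
    "Sally stores the ball in the basket, then exits. Where will Sally look?",
]

score = robustness_score(paraphrases, expected="basket")
```

A robustness-focused evaluation in this spirit would compare such scores between reasoning-enhanced and baseline models, rather than relying on a single prompt formulation.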
Key Findings
Results indicate that RLVR‑enhanced models consistently maintain higher accuracy when prompts are rephrased or when task parameters are modified. The authors report that robustness improvements are statistically significant across all tested scenarios, suggesting that the models are less sensitive to superficial wording changes.
Interpretation of Results
According to the paper, the observed gains are more plausibly attributed to the models’ increased ability to locate correct solutions rather than to the emergence of fundamentally new ToM reasoning mechanisms. The authors argue that robustness, while valuable, does not necessarily equate to deeper social‑cognitive understanding.
Implications for Future Evaluation
The study recommends that future assessments of LLMs’ social cognition incorporate robustness checks to differentiate between genuine reasoning and mere stability. It also suggests that researchers should be cautious when interpreting high benchmark scores as evidence of Theory of Mind capabilities.
This report is based on the abstract of the research paper, distributed as an open-access preprint; the full text is available via arXiv.