Graph-Based Exploration Solves Majority of ARC-AGI-3 Levels Without Training

A training-free, graph-based algorithm solved a median of 30 out of 52 levels on the ARC-AGI-3 Preview Challenge, placing third on the private leaderboard, according to a paper posted on arXiv in December 2025. The approach combines visual frame analysis with systematic state-space traversal, allowing an agent to infer game mechanics from a limited number of interactions without relying on large language models.

Benchmark Overview

The ARC-AGI-3 benchmark comprises a series of game‑like tasks that require agents to hypothesize, test, and adapt to increasingly complex mechanics across multiple levels. Success depends on the ability to form and refine hypotheses about hidden rules, a capability that recent state‑of‑the‑art large language models have struggled to demonstrate reliably.

Methodology

The reported method processes each visual frame to segment it into meaningful components, then builds a directed graph that records the states it has explored and the actions that led to them. By tracking visited nodes and untested state‑action pairs, the system prioritizes the actions that lead most directly to something it has not yet tried, steering the agent along the shortest path toward undiscovered mechanics.
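The bookkeeping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ExplorationGraph`, `plan`, `explore`, and `corridor_step` are hypothetical, states are assumed to be already reduced to hashable keys (the frame segmentation step is omitted), and the environment is assumed to be deterministic. The sketch uses a plain breadth-first search over known transitions to find the shortest action sequence that ends in an untested state-action pair.

```python
from collections import deque


class ExplorationGraph:
    """Directed graph of observed states; each edge is labeled with an action."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.edges = {}  # state -> {action: next_state}

    def add_state(self, state):
        self.edges.setdefault(state, {})

    def record(self, state, action, next_state):
        """Store one observed transition."""
        self.add_state(state)
        self.add_state(next_state)
        self.edges[state][action] = next_state

    def untested(self, state):
        """Actions that have not yet been tried in this state."""
        tried = self.edges.get(state, {})
        return [a for a in self.actions if a not in tried]

    def plan(self, start):
        """Shortest action sequence from start ending in an untested action.

        Returns None when every reachable state-action pair has been tested.
        """
        self.add_state(start)
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            state, path = queue.popleft()
            candidates = self.untested(state)
            if candidates:
                return path + [candidates[0]]
            for action, nxt in self.edges[state].items():
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [action]))
        return None


def explore(step, start, actions, max_plans=1000):
    """Repeatedly walk to the nearest untested pair and try it."""
    graph = ExplorationGraph(actions)
    state = start
    for _ in range(max_plans):
        plan = graph.plan(state)
        if plan is None:  # nothing left to test
            break
        for action in plan:
            nxt = step(state, action)
            graph.record(state, action, nxt)
            state = nxt
    return graph


def corridor_step(position, action):
    """Toy environment (an assumption for illustration): a 4-cell corridor."""
    delta = 1 if action == "right" else -1
    return min(3, max(0, position + delta))


graph = explore(corridor_step, start=0, actions=["left", "right"])
print(len(graph.edges), "states discovered")
```

Breadth-first search is used here because every action has the same cost, so the first untested pair it reaches is also the nearest one. A fuller agent would also need a way to rank untested actions within a state, which this sketch leaves as simple list order.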

Performance Results

On the ARC-AGI-3 Preview Challenge, the graph‑based strategy solved a median of 30 out of 52 levels across six distinct games. This substantially exceeds the performance of leading language‑model‑based agents, which have been reported as unable to solve the benchmark’s tasks consistently.

Comparison with Language Models

Researchers note that while large language models excel at textual reasoning, they often struggle in the dynamic, sparse‑feedback environments that ARC-AGI-3 presents. The graph‑structured exploration, by contrast, involves no learning and relies instead on explicit state tracking and action prioritization to navigate the task space.

Implications for Interactive Reasoning

The findings suggest that explicit, systematic exploration can serve as a strong baseline for interactive reasoning problems, highlighting the importance of state management in environments where feedback is limited. The results may inform future hybrid approaches that combine graph‑based planning with learned components.

Open‑Source Release

The authors have made the implementation publicly available on GitHub (https://github.com/dolphin-in-a-coma/arc-agi-3-just-explore), allowing the research community to reproduce and extend the work.

This report is based on the abstract of a research paper posted on arXiv, licensed under Academic Preprint / Open Access. The full text is available via arXiv.