Revolutionizing Symbolic World Model Generation with Agent2World Framework

Agent2World Framework Boosts Symbolic World Model Generation for LLM Planning

Background

In December 2025, an international team of researchers posted a paper on arXiv introducing Agent2World, a multi‑agent system designed to improve large language models’ ability to generate executable symbolic world models, such as Planning Domain Definition Language (PDDL) domains. The work addresses the scarcity of large‑scale, verifiable supervision for model‑based planning and aims to reduce the errors that static validation methods often miss. The paper describes how the framework uses interactive feedback to produce more reliable world models.

Agent2World Framework

The system follows a three‑stage pipeline. First, a Deep Researcher agent conducts web‑based knowledge synthesis to fill specification gaps identified in the planning task. Second, a Model Developer agent translates the synthesized knowledge into executable world models. Finally, a specialized Testing Team conducts adaptive unit testing and simulation‑based validation, providing behavior‑aware feedback to the Model Developer.

Addressing Validation Gaps

Traditional approaches rely on static checks that cannot capture dynamic execution failures. By embedding a testing component that interacts with the generated models, Agent2World detects behavior‑level errors during both inference and training phases. This interactive validation is intended to ensure that generated models not only conform to syntactic specifications but also behave correctly when executed.
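
The gap between static and behavioral checking can be illustrated with a toy STRIPS-style domain written in Python rather than PDDL. This example is hypothetical and does not come from the paper. The domain, the predicate names, and the `static_check` and `simulate` helpers are all assumptions. The buggy variant passes a declaration check but fails when a reference plan is executed, which is the kind of behavior-level error the Testing Team is described as catching.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset[str]     # facts required before the action
    add: frozenset[str]     # facts made true by the action
    delete: frozenset[str]  # facts made false by the action


def static_check(actions: Iterable[Action], predicates: set[str]) -> bool:
    # Syntax-level check: every fact an action mentions is a declared predicate.
    return all(
        fact in predicates
        for a in actions
        for fact in a.pre | a.add | a.delete
    )


def simulate(domain: dict[str, Action], init: set[str],
             plan: list[str], goal: set[str]) -> bool:
    # Behavior-level check: execute a reference plan and test the final state.
    state = set(init)
    for step in plan:
        act = domain[step]
        if not act.pre <= state:
            return False  # a precondition fails partway through the plan
        state = (state - act.delete) | act.add
    return goal <= state


def build_domain(fixed: bool) -> dict[str, Action]:
    # Hypothetical two-block domain; the buggy variant forgets that putting
    # a block down frees the hand (no "handempty" in put-down-a's add list).
    put_down_add = {"ontable-a", "handempty"} if fixed else {"ontable-a"}
    actions = [
        Action("pick-up-a", frozenset({"handempty", "ontable-a"}),
               frozenset({"holding-a"}), frozenset({"handempty", "ontable-a"})),
        Action("put-down-a", frozenset({"holding-a"}),
               frozenset(put_down_add), frozenset({"holding-a"})),
        Action("pick-up-b", frozenset({"handempty", "ontable-b"}),
               frozenset({"holding-b"}), frozenset({"handempty", "ontable-b"})),
    ]
    return {a.name: a for a in actions}


PREDICATES = {"handempty", "ontable-a", "ontable-b", "holding-a", "holding-b"}
INIT = {"handempty", "ontable-a", "ontable-b"}
PLAN = ["pick-up-a", "put-down-a", "pick-up-b"]
GOAL = {"holding-b"}

buggy = build_domain(fixed=False)
print(static_check(buggy.values(), PREDICATES))               # True: declarations are consistent
print(simulate(buggy, INIT, PLAN, GOAL))                      # False: pick-up-b lacks "handempty"
print(simulate(build_domain(fixed=True), INIT, PLAN, GOAL))   # True
```

The buggy domain is syntactically well-formed, so only executing it reveals the missing effect. A simulation-based tester can then report the failing step back to the developer.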

Performance Evaluation

The authors evaluated the framework on three benchmarks spanning both PDDL specifications and executable code representations. Agent2World achieved consistent state‑of‑the‑art inference‑time results across all three, outperforming existing baselines.

Training Enhancements

Beyond inference, the Testing Team serves as an environment for generating multi‑turn training trajectories. Models fine‑tuned on these trajectories exhibited a substantial improvement, with an average relative gain of 30.95% compared to the same model before training. This gain highlights the value of behavior‑aware, adaptive feedback in supervised fine‑tuning.
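
A relative gain compares each fine-tuned score with the same model's pre-training score: (after − before) / before. The abstract does not say how per-benchmark gains are combined into the 30.95% average, so this sketch assumes an unweighted mean. Under that assumption, the fine-tuned score is on average about 1.31 times the pre-training score. The function names and the example scores below are hypothetical and are not results from the paper.

```python
from __future__ import annotations


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of a fine-tuned model over its own pre-training score."""
    if before == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (after - before) / before


def average_relative_gain(scores: list[tuple[float, float]]) -> float:
    """Unweighted mean of per-benchmark relative gains (assumed aggregation)."""
    gains = [relative_gain(before, after) for before, after in scores]
    return sum(gains) / len(gains)


# Hypothetical (before, after) scores on two benchmarks, for illustration only.
example = [(50.0, 60.0), (20.0, 30.0)]
print(f"{average_relative_gain(example):.2%}")
```

The two made-up benchmarks give gains of 20% and 50%, so the script prints `35.00%`. A weighted or pooled aggregation would produce a different figure from the same scores.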

Future Directions

The project page (https://agent2world.github.io) provides additional resources, including code, datasets, and detailed experimental results. The authors suggest that the multi‑agent architecture could be extended to other domains requiring executable model generation, potentially influencing broader research in AI‑driven planning and simulation.

This report is based on the abstract of the research paper, published on arXiv and licensed under Academic Preprint / Open Access. The full text is available via arXiv.
