Low‑Cost Task‑Level Workflow Generation Framework Reduces Token Use in LLM‑Based Multi‑Agent Systems
A research team has introduced SCALE, a low‑cost task‑level workflow generation framework for multi‑agent systems powered by large language models (LLMs). The work, posted on arXiv in January 2026, aims to reduce token consumption while preserving performance when coordinating multiple agents on complex tasks. By replacing exhaustive execution‑based evaluation with self‑prediction and few‑shot calibration, the authors address the high computational expense of existing methods.
Background on Workflow Generation
Multi‑agent systems built on LLMs typically orchestrate several agents through predefined workflows. Prior approaches generate these workflows either at the task level—defining a sequence of agent actions for a whole task—or at the query level—tailoring a workflow for each individual query. The relative costs and benefits of the two strategies have not been systematically compared.
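The distinction between the two strategies can be illustrated with a minimal sketch. The workflow names, task type, and routing rule below are hypothetical examples for illustration only, not taken from the paper:

```python
# Task-level: one fixed agent pipeline is reused for every query of a
# task type, so the workflow is generated (and paid for) only once.
task_level = {"math_reasoning": ["planner", "solver", "verifier"]}

# Query-level: a fresh workflow is produced per query; in practice this
# would be an LLM call each time, which is where extra tokens accrue.
def query_level(query: str) -> list[str]:
    steps = ["planner", "solver"]
    if "prove" in query:  # hypothetical routing heuristic
        steps.append("verifier")
    return steps
```

The query-level route can adapt to each input, but every query incurs additional generation cost; the task-level route amortizes that cost across all queries of the same task.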
Limitations of Existing Approaches
Task‑level evaluation that relies on exhaustive execution incurs substantial token usage, making large‑scale experimentation expensive. Moreover, the authors note that execution‑based validation can be unreliable, sometimes failing to reflect true task performance due to stochastic model behavior.
Proposed SCALE Framework
The proposed SCALE framework—self‑prediction by the optimizer with few‑shot calibration for evaluation—foregoes full execution. Instead, it predicts the quality of a workflow using the optimizer itself and applies a lightweight calibration step on a few examples to estimate performance. This design reduces the need for costly, token‑intensive runs.
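One plausible reading of this design is sketched below: the optimizer assigns each candidate workflow a self-predicted score, only the top few candidates are executed on a handful of examples, and a simple linear fit maps predicted scores to observed ones. All names, the linear calibration form, and the scoring scale are assumptions made for illustration, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Workflow:
    name: str
    predicted_score: float  # optimizer's self-predicted quality in [0, 1]

def calibrate(workflows, execute_few_shot, k=3):
    """Execute only the top-k predicted workflows on a few examples,
    then fit a least-squares line y = a*x + b from predicted to
    observed scores and apply it to every candidate."""
    top = sorted(workflows, key=lambda w: w.predicted_score, reverse=True)[:k]
    pairs = [(w.predicted_score, execute_few_shot(w)) for w in top]
    n = len(pairs)
    sx = sum(p for p, _ in pairs)
    sy = sum(o for _, o in pairs)
    sxx = sum(p * p for p, _ in pairs)
    sxy = sum(p * o for p, o in pairs)
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom if denom else 1.0
    b = (sy - a * sx) / n
    return [(w, a * w.predicted_score + b) for w in workflows]
```

The token saving comes from the asymmetry: self-prediction is cheap relative to executing every candidate workflow end to end, and the few-shot runs anchor the cheap predictions to observed performance.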
Experimental Findings
Across several benchmark datasets, SCALE achieved competitive results, showing an average performance degradation of only 0.61% relative to the best existing methods. At the same time, overall token consumption dropped by up to 83%, demonstrating a significant efficiency gain.
Implications and Future Work
The results suggest that query‑level workflow generation may be unnecessary in many scenarios, as a small set of top‑K task‑level workflows can cover a comparable range of queries. The authors propose further exploration of self‑evolution techniques and broader validation on real‑world applications.
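The top‑K coverage idea can be sketched as a greedy selection problem: choose a small set of task-level workflows that together handle as many query types as possible. The coverage map and workflow names below are hypothetical; the paper's actual selection procedure may differ:

```python
def select_top_k(covered_queries: dict, k: int = 3) -> list:
    """Greedy set cover: repeatedly pick the workflow that answers the
    most not-yet-covered queries. covered_queries maps each workflow
    name to the set of query ids it handles (assumed known, e.g. from
    calibrated evaluation)."""
    remaining = set().union(*covered_queries.values())
    chosen = []
    for _ in range(k):
        best = max(covered_queries,
                   key=lambda w: len(covered_queries[w] & remaining))
        if not covered_queries[best] & remaining:
            break  # no workflow adds coverage; stop early
        chosen.append(best)
        remaining -= covered_queries[best]
    return chosen
```

If a few workflows already cover most queries, generating a bespoke workflow per query buys little extra coverage at substantially higher token cost, which is the paper's argument against query-level generation.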
Conclusion
SCALE offers a promising avenue for reducing computational costs in LLM‑driven multi‑agent coordination without sacrificing accuracy, potentially enabling more scalable deployments of complex AI systems.
This report is based on the abstract of the research paper, an open‑access preprint; the full text is available via arXiv.