OmegaUse GUI Agent Achieves State-of-the-Art Scores on Mobile and Desktop Benchmarks
Researchers have unveiled OmegaUse, a general‑purpose graphical user interface (GUI) agent designed for autonomous task execution across both mobile and desktop platforms. In offline evaluations, the model attained a 96.3% score on the ScreenSpot‑V2 benchmark and a 79.1% step‑success rate on AndroidControl, indicating strong cross‑environment capabilities.
Data Construction Pipeline
To support the model, the team assembled a data pipeline that combines rigorously curated open‑source datasets with an automated synthesis framework. The framework merges bottom‑up autonomous exploration of interfaces with top‑down taxonomy‑guided generation, producing high‑fidelity synthetic interaction data.
Two‑Stage Training Strategy
The training regimen follows a decoupled two‑stage approach. First, Supervised Fine‑Tuning (SFT) establishes fundamental interaction syntax. Subsequently, Group Relative Policy Optimization (GRPO) refines spatial grounding and sequential planning, enhancing the agent’s decision‑making in complex GUI contexts.
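The defining step of GRPO is that each sampled rollout is scored relative to the other rollouts drawn for the same prompt, rather than against a learned value function. The sketch below illustrates only that normalization step. The reward values are hypothetical, the function name is invented for illustration, and the use of the population standard deviation is an assumption (implementations differ); none of it is taken from the OmegaUse paper.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per rollout sampled for the same task.
    `eps` guards against division by zero when all rewards are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some codebases use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four rollouts of one GUI task (1.0 = task step judged correct)
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # roughly [1, -1, -1, 1]
```

Rollouts that beat their group's average receive a positive advantage and are reinforced, while below-average rollouts are pushed down. When every rollout in a group earns the same reward, all advantages are zero and the group contributes no learning signal.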
Model Architecture
OmegaUse is built on a Mixture‑of‑Experts (MoE) backbone, a design choice intended to balance computational efficiency with the capacity for sophisticated agentic reasoning.
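The efficiency argument for MoE comes from sparse routing: a router scores every expert for each input, but only a few experts actually run. The following is a minimal sketch of generic top-k routing in plain Python. The summary does not describe OmegaUse's router, so the expert functions, the logits, the choice of k=2, and the decision to apply softmax over only the selected experts are all illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Assumption: weights are renormalized over the chosen experts only.
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical toy experts standing in for feed-forward sub-networks
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token

y = moe_forward(3.0, experts, router_logits, k=2)
print(y)  # 7.5: experts 1 and 3 tie, so each gets weight 0.5 (0.5*6 + 0.5*9)
```

With four experts and k=2, only half of the expert parameters are exercised for this input, which is the trade-off between capacity and compute that the design aims for.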
Cross‑Platform Benchmark Suite
The authors introduced OS‑Nav, a benchmark suite that spans multiple operating systems. OS‑Nav includes ChiM‑Nav, targeting Chinese Android mobile environments, and Ubu‑Nav, focusing on routine desktop interactions on Ubuntu. OmegaUse achieved a 74.24% step‑success rate on ChiM‑Nav and a 55.9% average success rate on Ubu‑Nav.
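A step-success rate is typically the share of individual steps in which the agent's predicted action agrees with a reference action. The sketch below shows one way to compute such a figure. The action format and the matching rule (same action type, plus either a click point inside the reference bounding box or identical text) are assumptions made for illustration; each benchmark defines its own matching criteria, and the summary does not spell them out.

```python
def action_matches(pred, ref):
    """Assumed matching rule: same action type, and a correct target or argument."""
    if pred["type"] != ref["type"]:
        return False
    box = ref.get("box")
    if box is None:
        # Non-spatial actions are compared on their text argument.
        return pred.get("text") == ref.get("text")
    point = pred.get("point")
    if point is None:
        return False
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom

def step_success_rate(predicted, reference):
    """Fraction of steps whose predicted action matches the reference action."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of steps")
    if not reference:
        return 0.0
    hits = sum(1 for p, r in zip(predicted, reference) if action_matches(p, r))
    return hits / len(reference)

# Hypothetical four-step episode
reference = [
    {"type": "click", "box": (0, 0, 100, 50)},
    {"type": "type", "text": "hello"},
    {"type": "click", "box": (200, 200, 260, 240)},
    {"type": "scroll", "text": "down"},
]
predicted = [
    {"type": "click", "point": (40, 20)},   # inside the box: match
    {"type": "type", "text": "hello"},      # same text: match
    {"type": "click", "point": (10, 10)},   # outside the box: miss
    {"type": "scroll", "text": "down"},     # same argument: match
]

rate = step_success_rate(predicted, reference)
print(rate)  # 0.75
```

Because every step is scored independently, a step-level figure can be noticeably higher than a task-level success rate, which requires the whole sequence to succeed. The two numbers reported for ChiM‑Nav and Ubu‑Nav are therefore not directly comparable.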
Performance Relative to Existing Standards
On established GUI benchmarks, OmegaUse set a new state‑of‑the‑art result of 96.3% on ScreenSpot‑V2 and led with a 79.1% step‑success rate on AndroidControl, surpassing previously reported figures for comparable agents.
Implications and Future Directions
The reported results suggest that advanced GUI agents like OmegaUse could streamline human‑computer interaction and boost productivity by autonomously handling routine tasks on diverse devices. The authors note ongoing work to improve real‑time adaptability and to expand evaluation to additional operating systems.
This report is based on the abstract of a research paper published on arXiv as an open-access academic preprint. The full text is available via arXiv.