Curiosity-Driven Knowledge Retrieval Boosts Mobile Agent Performance on Android Tasks
A newly proposed framework aims to enhance smartphone automation by addressing knowledge gaps in mobile agents. Developed by a research team and posted to arXiv in January 2026, the system introduces a curiosity score that triggers external information retrieval when uncertainty exceeds a set threshold. The approach targets complex Android applications where existing agents often struggle with incomplete knowledge and limited generalization.
Framework Overview
The core of the framework is a curiosity-driven mechanism that quantifies execution uncertainty. When the calculated curiosity score surpasses a predefined limit, the agent automatically queries documentation, code repositories, and prior execution trajectories to obtain supplemental data.
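The gating step described above can be sketched in a few lines. The abstract does not say how the curiosity score is computed, so this sketch assumes a stand-in: the normalized entropy of the agent's probability distribution over candidate actions. The names curiosity_score, maybe_retrieve, and the retrieve callback are hypothetical and do not come from the paper.

```python
import math
from typing import Callable, List, Sequence


def curiosity_score(action_probs: Sequence[float]) -> float:
    """Normalized Shannon entropy in [0, 1].

    ASSUMPTION: a stand-in for the paper's curiosity score, whose exact
    definition is not given in the abstract.
    """
    if len(action_probs) < 2:
        return 0.0  # a single candidate action carries no uncertainty
    entropy = -sum(p * math.log(p) for p in action_probs if p > 0)
    return entropy / math.log(len(action_probs))


def maybe_retrieve(
    action_probs: Sequence[float],
    threshold: float,
    retrieve: Callable[[], List[str]],
) -> List[str]:
    """Call the (hypothetical) retrieve callback only when the score
    exceeds the threshold; otherwise return no extra knowledge."""
    if curiosity_score(action_probs) > threshold:
        return retrieve()
    return []


# Usage: an undecided agent triggers retrieval, a confident one does not.
fetch = lambda: ["docs", "code repositories", "prior trajectories"]
print(maybe_retrieve([0.5, 0.5], threshold=0.6, retrieve=fetch))
print(maybe_retrieve([0.97, 0.03], threshold=0.6, retrieve=fetch))
```

Passing retrieval in as a callback keeps the gate independent of where the knowledge comes from, so documentation, repositories, and past trajectories can share the same trigger.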
AppCard Structure
Retrieved information is organized into structured entities called AppCards. Each AppCard encodes functional semantics, parameter conventions, interface mappings, and interaction patterns for a specific application, providing a concise knowledge package that the agent can reference during task execution.
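One way to picture an AppCard is as a small record with one field per knowledge category named above. The sketch below is illustrative only: the field names, the to_prompt method, and the "ExampleNotes" app are assumptions, not the schema used by the authors.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AppCard:
    """Hypothetical AppCard layout; one field per category in the text."""

    app_name: str
    functional_semantics: Dict[str, str] = field(default_factory=dict)
    parameter_conventions: Dict[str, str] = field(default_factory=dict)
    interface_mappings: Dict[str, str] = field(default_factory=dict)
    interaction_patterns: List[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the card as plain text an agent could read."""
        lines = [f"AppCard: {self.app_name}"]
        for title, mapping in (
            ("Functions", self.functional_semantics),
            ("Parameters", self.parameter_conventions),
            ("Interface", self.interface_mappings),
        ):
            for key, value in mapping.items():
                lines.append(f"- {title} / {key}: {value}")
        for pattern in self.interaction_patterns:
            lines.append(f"- Pattern: {pattern}")
        return "\n".join(lines)


# Usage with a made-up application.
card = AppCard(
    app_name="ExampleNotes",
    functional_semantics={"create_note": "Creates a new note"},
    parameter_conventions={"title": "Plain text, required"},
    interface_mappings={"create_note": "Floating '+' button"},
    interaction_patterns=["Tap '+', type the title, then tap Save"],
)
print(card.to_prompt())
```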
Integration into Agent Reasoning
During runtime, the enhanced agent selectively incorporates relevant AppCards into its reasoning pipeline, effectively filling knowledge blind spots. This selective integration allows the agent to adapt its planning and decision-making processes based on newly acquired context.
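Selective integration can be illustrated as a filter applied before the prompt is built. The abstract does not describe how relevance is judged, so this sketch assumes the simplest possible rule: a card is relevant when its application name appears in the task text. The functions select_cards and build_prompt, and the card contents, are hypothetical.

```python
from typing import Dict, List


def select_cards(task: str, cards: Dict[str, str], max_cards: int = 2) -> List[str]:
    """Return names of cards whose app name occurs in the task.

    ASSUMPTION: name matching is a placeholder for whatever relevance
    test the real system uses.
    """
    task_lower = task.lower()
    matched = [name for name in cards if name.lower() in task_lower]
    return matched[:max_cards]


def build_prompt(task: str, cards: Dict[str, str], max_cards: int = 2) -> str:
    """Prepend only the selected cards to the task description."""
    selected = select_cards(task, cards, max_cards)
    sections = [f"[AppCard: {name}]\n{cards[name]}" for name in selected]
    sections.append(f"Task: {task}")
    return "\n\n".join(sections)


# Usage: only the two applications named in the task are included.
cards = {
    "Calendar": "Events need a title and a start time.",
    "Contacts": "Search by name from the top bar.",
    "Camera": "The shutter button is at the bottom center.",
}
print(build_prompt("Add Bob from Contacts to a Calendar event", cards))
```

Capping the number of cards keeps the prompt short, which matters when a task spans several applications.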
Performance Evaluation
Experiments on the AndroidWorld benchmark demonstrate consistent improvements across multiple backbone models. The framework yields an average gain of six percentage points and achieves a new state‑of‑the‑art success rate of 88.8% when combined with GPT‑5. Results indicate that the benefits are especially pronounced for multi‑step and cross‑application tasks.
Implications and Future Work
Analyses suggest that AppCards reduce ambiguity, shorten exploration phases, and support more stable execution trajectories. Case studies support these findings, and the authors have made task trajectories publicly available at https://lisalsj.github.io/Droidrun-appcard/. Ongoing research will explore scalability to broader application domains and integration with alternative large language models.
This report is based on the abstract of a research paper posted to arXiv (license: Academic Preprint / Open Access). The full text is available on arXiv.