Hierarchical Multi-Agent DRL Enhances UAV Mobility Management in Multi-Connectivity Networks
Researchers from an international team have introduced a mobility management scheme for unmanned aerial vehicles (UAVs) operating within multi‑connectivity wireless networks. The approach, detailed in a December 2024 arXiv preprint (arXiv:2412.16167v2), combines dynamic cluster reconfiguration with energy‑efficient power allocation to satisfy strict reliability requirements while curbing overall power use and limiting the frequency of cluster changes.
Background on Multi‑Connectivity and UAV Challenges
Multi‑connectivity enables distributed access points (APs) to form dynamic clusters and coordinate resource distribution, a configuration that places heightened demands on mobility management for aerial users. UAVs, which frequently traverse heterogeneous coverage areas, must maintain seamless connections despite varying interference levels and limited energy reserves.
Hierarchical Multi‑Agent Deep Reinforcement Learning Framework
The proposed solution employs a hierarchical multi‑agent deep reinforcement learning (H‑MADRL) architecture. A high‑level agent, hosted in an edge cloud linked to the APs via low‑latency optical back‑haul, determines optimal clustering policies. Concurrently, low‑level agents embedded in individual APs manage power‑allocation decisions based on the clustering guidance received.
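The division of labor described above can be illustrated with a minimal sketch. The class names, the placeholder policies (greedy clustering by link gain, gain-proportional power), and all numeric values are assumptions for illustration only; the paper's agents would learn these decisions with deep reinforcement learning rather than use fixed rules.

```python
class HighLevelAgent:
    """Edge-cloud agent that assigns a UAV a serving cluster of APs (illustrative only)."""
    def __init__(self, num_aps, cluster_size):
        self.num_aps = num_aps
        self.cluster_size = cluster_size

    def select_cluster(self, link_gains):
        # Placeholder policy: pick the APs with the strongest reported link gains.
        ranked = sorted(range(self.num_aps), key=lambda ap: link_gains[ap], reverse=True)
        return sorted(ranked[:self.cluster_size])


class LowLevelAgent:
    """AP-embedded agent that sets transmit power, conditioned on the clustering decision."""
    def __init__(self, ap_id, max_power_w=1.0):
        self.ap_id = ap_id
        self.max_power_w = max_power_w

    def allocate_power(self, link_gains, cluster):
        # APs outside the serving cluster stay silent; members use a placeholder rule.
        if self.ap_id not in cluster:
            return 0.0
        return min(self.max_power_w, link_gains[self.ap_id] * self.max_power_w)


# One decision step: the high-level agent forms the cluster,
# then each AP makes its own power-allocation decision.
link_gains = [0.9, 0.2, 0.7, 0.4]  # normalized link quality per AP (made-up values)
high = HighLevelAgent(num_aps=4, cluster_size=2)
cluster = high.select_cluster(link_gains)
powers = [LowLevelAgent(ap).allocate_power(link_gains, cluster) for ap in range(4)]
```

The key structural point the sketch captures is that clustering is decided once, centrally at the edge, while power allocation is computed independently at each AP given that decision.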
Action‑Observation Transition‑Driven Learning Algorithm
To accelerate learning, the authors introduce an action‑observation transition‑driven algorithm that integrates the high‑level agent’s action space into the low‑level agents’ observation vectors. This shared information enables AP‑level agents to align power allocation more closely with the overarching clustering strategy, improving overall efficiency.
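The observation-augmentation idea can be sketched as follows. The function name, the one-hot encoding of the clustering action, and the example observation values are assumptions; the abstract describes only that the high-level action space is integrated into the low-level observations, not the exact encoding.

```python
def augment_observation(local_obs, cluster, num_aps):
    """Append a one-hot encoding of the high-level clustering decision to an
    AP's local observation vector (hypothetical encoding, for illustration)."""
    cluster_indicator = [1.0 if ap in cluster else 0.0 for ap in range(num_aps)]
    return list(local_obs) + cluster_indicator


local_obs = [0.8, -70.0]  # e.g. channel gain and interference estimate (made up)
cluster = [0, 2]          # high-level clustering action over 4 APs
obs = augment_observation(local_obs, cluster, num_aps=4)
# obs now carries both local measurements and the global clustering context,
# so the AP's power-allocation policy can condition on the cluster decision.
```

In this form, each low-level agent sees not only its local channel state but also which APs currently serve the UAV, which is what lets its power decisions track the clustering strategy.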
Simulation Results and Performance Comparison
Simulation experiments indicate that the distributed H‑MADRL algorithm attains performance comparable to a centralized benchmark in terms of reliability and power consumption. Notably, the decentralized method achieves these results while preserving the scalability advantages of a distributed system.
Scalability Advantages
When the number of APs is doubled, decision time for the proposed scheme increases by approximately 10%, compared with a 90% rise for the centralized approach. This modest overhead suggests the framework can accommodate expanding network topologies without prohibitive latency penalties.
Implications and Future Directions
The study demonstrates that hierarchical reinforcement learning can effectively address the intertwined challenges of clustering and power allocation in UAV‑centric multi‑connectivity environments. Future research may explore real‑world deployments, integration with additional network layers, and extensions to other mobile platforms.
This report is based on the abstract of an open-access arXiv preprint; the full text is available via arXiv.