Trust-Based Delegated Consensus with FHE and Multi-Agent Reinforcement Learning Evaluated on Blockchain IoT Security
A team of researchers has introduced a trust‑based delegated consensus framework that combines fully homomorphic encryption (FHE) with attribute‑based access control (ABAC) to protect blockchain‑enabled Internet of Things (IoT) networks. The approach, described in a recent arXiv preprint (arXiv:2512.22860), was tested on a simulated 16‑node IoT environment to assess its ability to detect and mitigate a range of adversarial attacks. The study evaluates three reinforcement‑learning strategies—tabular Q‑learning (RL), deep reinforcement learning with dueling double DQN (DRL), and multi‑agent reinforcement learning (MARL)—against five defined attack families. Findings are presented as F1‑score metrics for each method under each attack scenario.
Framework Overview
The proposed framework delegates consensus responsibilities to trusted nodes while preserving data confidentiality through FHE, allowing encrypted policy evaluation without exposing raw attributes. ABAC policies govern access rights, enabling fine‑grained control over which IoT devices may participate in block validation. By integrating these cryptographic primitives, the system aims to prevent unauthorized manipulation of the trust ledger.
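The ABAC gating described above can be sketched as a simple policy check over device attributes. This is a minimal illustration only: the attribute names and policy clauses are invented for the example, and the paper's FHE layer, which would evaluate the same policy over encrypted attributes without revealing them, is abstracted away here.

```python
# Minimal ABAC gate for block-validation participation (illustrative
# attribute names; in the paper's design this check would run over
# FHE-encrypted attributes rather than plaintext dicts).

POLICY = {                      # required attribute -> required value
    "role": "validator",
    "firmware_attested": True,
    "trust_tier": "high",
}

def may_validate(device_attrs: dict) -> bool:
    """Grant block-validation rights only if every policy clause holds."""
    return all(device_attrs.get(k) == v for k, v in POLICY.items())

trusted = {"role": "validator", "firmware_attested": True, "trust_tier": "high"}
rogue   = {"role": "validator", "firmware_attested": False, "trust_tier": "high"}
print(may_validate(trusted))  # True
print(may_validate(rogue))    # False
```

The conjunction-of-clauses form is what makes the policy amenable to homomorphic evaluation: each equality test and the final AND can, in principle, be computed over ciphertexts.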
Reinforcement‑Learning Strategies
Three learning agents were trained to recognize malicious behavior. The tabular Q‑learning agent operates on a discrete state‑action table, the deep RL agent employs a dueling double DQN architecture to approximate value functions, and the MARL setup coordinates multiple agents that share observations and rewards. All agents receive the same encrypted policy outcomes as input features.
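The simplest of the three agents, tabular Q-learning, can be sketched in a few lines. The state and action labels below are illustrative placeholders, not the paper's encoding of encrypted policy outcomes.

```python
# Tabular Q-learning update rule (hypothetical states/actions for a
# trust-monitoring agent; learning rate and discount are illustrative).
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)          # (state, action) -> estimated value

def update(state, action, reward, next_state, actions):
    """One Bellman backup: move Q toward reward + discounted best next value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One step: flagging a suspect node yields reward +1.
actions = ("flag", "ignore")
update("suspect", "flag", 1.0, "cleared", actions)
print(round(Q[("suspect", "flag")], 3))  # 0.1
```

The DRL agent replaces the table with a dueling double DQN value network, and the MARL setup runs several such learners that pool observations and rewards.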
Attack Families and Performance
Performance was measured against five attack families: Naive Malicious Attack (NMA), Collusive Rumor Attack (CRA), Adaptive Adversarial Attack (AAA), Byzantine Fault Injection (BFI), and Time‑Delayed Poisoning (TDP). MARL achieved the highest F1‑score for collusive attacks (0.85), compared with 0.68 for DRL and 0.50 for RL. Both DRL and MARL reached perfect detection (F1 = 1.00) for adaptive attacks, whereas RL lagged at 0.50. All three agents recorded an F1 of 1.00 against Byzantine faults.
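For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The confusion counts below are hypothetical, chosen only to show how a score like MARL's 0.85 on collusive attacks could arise; they are not data from the paper.

```python
# F1 = 2 * precision * recall / (precision + recall), computed from
# true positives (tp), false positives (fp), and false negatives (fn).
def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 17 attacks caught, 3 false alarms, 3 missed.
print(round(f1(17, 3, 3), 2))  # 0.85
```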
Advantages of Multi‑Agent Learning
The superior results of the multi‑agent configuration suggest that coordinated learning can better capture the complex patterns generated by collusive adversaries. By sharing state information, MARL agents appear to construct a more robust representation of trust dynamics, leading to earlier and more accurate identification of coordinated rumor propagation.
Challenges Posed by Time‑Delayed Poisoning
Time‑Delayed Poisoning proved to be the most damaging scenario for every approach. After a sleeper phase, F1‑scores fell to a range of 0.11‑0.16 across all agents, indicating that the delayed activation of malicious payloads can severely undermine detection mechanisms that rely on immediate trust signals.
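The evasion mechanism can be illustrated with a toy simulation: a sleeper node behaves honestly long enough to accumulate trust, so a detector thresholding a running trust score (decay rate and threshold below are illustrative assumptions, not the paper's parameters) still trusts it several steps into the attack phase.

```python
# Toy model of time-delayed poisoning: exponentially smoothed trust
# score for a node that is honest for `honest_steps`, then malicious.
def trust_trace(honest_steps: int, attack_steps: int, decay: float = 0.9):
    trust = 0.5                                     # neutral starting trust
    trace = []
    for t in range(honest_steps + attack_steps):
        signal = 1.0 if t < honest_steps else 0.0   # 1 = honest behavior
        trust = decay * trust + (1 - decay) * signal
        trace.append(trust)
    return trace

trace = trust_trace(honest_steps=50, attack_steps=5)
# After 50 honest steps trust is near 1.0; five malicious steps later it
# is still above a 0.5 "trusted" threshold, so the node goes unflagged.
print(trace[49] > 0.9, trace[54] > 0.5)  # True True
```

Detectors keyed to immediate trust signals therefore react only after the banked trust has decayed, which is consistent with the sharp F1 drop the authors report.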
Conclusions and Future Directions
The authors conclude that integrating FHE‑protected ABAC with multi‑agent reinforcement learning offers measurable security benefits for blockchain‑based IoT systems, yet they acknowledge the need for further research on mitigating delayed poisoning attacks and scaling the framework to larger networks.
This report is based on the abstract of the research paper, an open‑access academic preprint; the full text is available via arXiv.