New Algorithms Introduce Risk Aversion to Online POMDP Planning

Researchers Yaacov Pariente and Vadim Indelman submitted a preprint to arXiv on January 28, 2026, describing novel online planning methods for partially observable Markov decision processes (POMDPs) that incorporate the Iterated Conditional Value‑at‑Risk (ICVaR) measure. The work aims to reduce tail risk in decision‑making under uncertainty while preserving finite‑time performance guarantees.

Background and Motivation

Traditional online POMDP planners optimize the expected return, which can overlook low‑probability, high‑impact outcomes. By applying a dynamic risk measure such as ICVaR, the authors target scenarios where avoiding adverse tail events is critical, for example in autonomous navigation or robotic manipulation.

Iterated CVaR Framework

The ICVaR formulation introduces a risk parameter α. When α = 1, the objective reduces to the standard expectation‑based criterion; lowering α below 1 makes the planner progressively more risk‑averse. The paper presents a policy evaluation algorithm for ICVaR whose finite‑time guarantees do not depend on the size of the action space.
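To make the role of α concrete, the following is a minimal Python sketch of a single‑step empirical CVaR, the quantity that an iterated measure applies recursively at every step. It is an illustration, not the paper's estimator. It assumes returns where higher is better, so the tail of interest is the lowest α fraction of equally weighted samples. The name `cvar_lower_tail` and the sample values are hypothetical.

```python
def cvar_lower_tail(samples, alpha):
    """Empirical CVaR at level alpha: the mean of the worst alpha-fraction
    of equally weighted samples (assumption: higher values are better)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    xs = sorted(samples)              # worst outcomes first
    if not xs:
        raise ValueError("need at least one sample")
    mass = alpha * len(xs)            # tail size, measured in samples
    full = int(mass)                  # samples that lie entirely in the tail
    total = sum(xs[:full])
    if full < len(xs):
        # the boundary sample contributes only its fractional share
        total += (mass - full) * xs[full]
    return total / mass


if __name__ == "__main__":
    returns = [10.0, -100.0, 5.0, 20.0]   # illustrative sampled returns
    for alpha in (1.0, 0.5, 0.25):
        print(alpha, cvar_lower_tail(returns, alpha))
```

With these four samples, α = 1 yields the plain average (-16.25), α = 0.5 averages the two worst outcomes (-47.5), and α = 0.25 returns the single worst outcome (-100.0), showing how a smaller α shifts weight toward the adverse tail.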

Algorithmic Extensions

Building on the evaluation routine, the authors extend three widely used online planners—Sparse Sampling, Particle Filter Trees with Double Progressive Widening (PFT‑DPW), and Partially Observable Monte Carlo Planning with Observation Widening (POMCPOW)—to directly optimize the ICVaR value function instead of expected return.
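The idea of replacing the expected‑return backup with a CVaR backup can be sketched in a few lines of Python. This is a deliberately simplified stand‑in, not the authors' algorithm: it uses a fully observable toy model instead of beliefs, and every name here (`step`, `ACTIONS`, `icvar_q_values`, `plan`) and every reward value is a hypothetical choice made for illustration. At each node it samples `width` successors per action, as in Sparse Sampling, and aggregates the sampled targets with a lower‑tail CVaR rather than a mean.

```python
import random

ACTIONS = ("safe", "risky")  # hypothetical toy action set


def step(state, action, rng):
    """Hypothetical generative model returning (next_state, reward)."""
    if action == "safe":
        return state + 1, 0.0
    # "risky" usually pays +5 but occasionally loses 20 (mean reward 2.5)
    reward = 5.0 if rng.random() < 0.9 else -20.0
    return state + 1, reward


def cvar_lower_tail(samples, alpha):
    """Mean of the worst alpha-fraction of equally weighted samples."""
    xs = sorted(samples)
    mass = alpha * len(xs)
    full = int(mass)
    total = sum(xs[:full])
    if full < len(xs):
        total += (mass - full) * xs[full]
    return total / mass


def icvar_q_values(state, depth, width, alpha, gamma, rng):
    """Sparse-sampling-style recursion with a CVaR backup at every level."""
    q = {}
    for action in ACTIONS:
        targets = []
        for _ in range(width):
            next_state, reward = step(state, action, rng)
            future = 0.0
            if depth > 1:
                child_q = icvar_q_values(next_state, depth - 1, width,
                                         alpha, gamma, rng)
                future = max(child_q.values())
            targets.append(reward + gamma * future)
        # alpha = 1.0 makes this the ordinary sample mean
        q[action] = cvar_lower_tail(targets, alpha)
    return q


def plan(state, depth, width, alpha, gamma=0.95, seed=0):
    """Return the greedy action and the Q-value estimates at the root."""
    rng = random.Random(seed)
    q = icvar_q_values(state, depth, width, alpha, gamma, rng)
    return max(q, key=q.get), q
```

In this toy setup, calling `plan(0, depth=1, width=400, alpha=1.0)` favors the risky action because its average reward is higher, while `alpha=0.1` favors the safe action because the worst tenth of risky outcomes is dominated by the large loss. The cost grows as (number of actions × width) raised to the depth, so larger depths call for a small `width`.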

Performance Guarantees

For the ICVaR‑based Sparse Sampling algorithm, the authors prove finite‑time performance bounds under the risk‑sensitive objective. These bounds enable a novel exploration strategy specifically tailored to the ICVaR criterion.

Experimental Evaluation

Empirical tests on standard benchmark POMDP domains show that the risk‑averse planners achieve lower tail risk than their risk‑neutral counterparts while maintaining comparable overall performance.

Implications and Future Work

The study suggests that integrating dynamic risk measures into online POMDP planning can provide more robust decision‑making in safety‑critical applications. The authors indicate plans to explore scalability to larger state spaces and to assess the impact of different risk‑parameter settings.

This report is based on the abstract of the research paper, an open‑access preprint on arXiv. The full text is available via arXiv.