NeoChainDaily
28.01.2026 • 05:35 • Research & Innovation

New Spectral‑Tensor Approach Enables Learning of POMDP Parameters with Hidden States


A new study released on arXiv proposes a method for autonomously learning the parameters of discrete Partially Observable Markov Decision Processes (POMDPs) that contain hidden states. The authors describe a technique that combines spectral learning of Predictive State Representations (PSRs) with tensor decomposition to estimate both transition and observation likelihoods from sequences of actions and observations. The work aims to support agents that must reason about systems such as furniture with concealed locking mechanisms.

Problem Context

POMDPs are widely used to model decision‑making problems where the true system state is not directly observable. Traditional learning approaches either assume full observability or focus solely on estimating the number of hidden states without providing explicit transition or observation probabilities, limiting their usefulness for downstream planning.
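Concretely, a discrete POMDP's dynamics can be held as one transition matrix per action plus an observation matrix over hidden states. The sketch below is a generic illustration of that structure (the sizes and values are invented, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete POMDP: 3 hidden states, 2 actions, 2 observations.
n_states, n_actions, n_obs = 3, 2, 2

# T[a][s, s'] = P(s' | s, a): one row-stochastic transition matrix per action.
T = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

# O[s, o] = P(o | s): observation likelihoods per hidden state.
O = rng.dirichlet(np.ones(n_obs), size=n_states)

def step(s, a):
    """Sample the next hidden state and an observation; the agent sees only the observation."""
    s_next = rng.choice(n_states, p=T[a, s])
    obs = rng.choice(n_obs, p=O[s_next])
    return s_next, obs

s = 0
s, o = step(s, a=1)
```

Learning a POMDP with hidden states means estimating `T` and `O` from sequences of `(a, o)` pairs alone, without ever observing `s`.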

Limitations of Existing Spectral Methods

Spectral techniques that learn PSRs can directly infer the dimensionality of the hidden state space, yet they do not produce direct estimates of the underlying transition and observation matrices. Other tensor‑based methods that do yield such estimates typically require full‑rank transition matrices for every action and assume that the state space is fully observable, assumptions that rarely hold in practical scenarios.

Proposed Learning Framework

The presented method relaxes these assumptions by learning observation and transition matrices up to a similarity transform, which can subsequently be resolved using tensor methods. Specifically, the approach partitions the hidden states so that states within a single partition share identical observation distributions for actions whose transition matrices are full‑rank. Within each partition, the algorithm recovers the corresponding observation and transition matrices, enabling a more granular model of the environment.
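To see why "up to a similarity transform" is the natural stopping point for spectral learning, note that conjugating a model's observable operators by any invertible matrix S leaves every observation-sequence probability unchanged, so data alone cannot separate the two parameterizations. The following numerical check illustrates that ambiguity in a small uncontrolled (HMM-style) model; it is illustrative only and is not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative check: two models related by an invertible similarity
# transform S assign identical probabilities to every observation sequence.
n_states, n_obs = 3, 2
T = rng.dirichlet(np.ones(n_states), size=n_states)  # T[s, s'] = P(s' | s)
O = rng.dirichlet(np.ones(n_obs), size=n_states)     # O[s, o] = P(o | s)
pi = np.full(n_states, 1.0 / n_states)               # initial state distribution

# Observable operators: A[o] = diag(O[:, o]) @ T, so that
# P(o_1..o_t) = pi @ A[o_1] @ ... @ A[o_t] @ 1.
A = [np.diag(O[:, o]) @ T for o in range(n_obs)]
one = np.ones(n_states)

def seq_prob(obs_seq, ops, init, final):
    v = init
    for o in obs_seq:
        v = v @ ops[o]
    return float(v @ final)

# Conjugate everything by a random invertible S.
S = np.eye(n_states) + 0.1 * rng.normal(size=(n_states, n_states))
Sinv = np.linalg.inv(S)
A_sim = [Sinv @ Ao @ S for Ao in A]

p_true = seq_prob([0, 1, 1, 0], A, pi, one)
p_sim = seq_prob([0, 1, 1, 0], A_sim, pi @ S, Sinv @ one)
# p_true and p_sim agree up to float error, yet A_sim need not be stochastic.
```

Resolving this residual transform, so that the recovered matrices are genuine probabilities rather than arbitrary conjugates, is exactly the role the paper assigns to the tensor-decomposition step.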

Experimental Evaluation

Empirical tests on synthetic POMDP environments indicate that, given sufficient data, the partition‑level models produced by the new technique achieve performance comparable to that of traditional PSR models when integrated with standard sampling‑based POMDP solvers. The experiments demonstrate that the learned likelihoods are accurate enough to support effective planning.

Practical Implications

By providing explicit estimates of observation and transition probabilities, the method allows planners to tailor behavior after model acquisition, a capability that was previously unavailable with pure PSR learning. This opens the possibility of more precise policy synthesis for autonomous agents operating in partially observable domains.
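As a sketch of what explicit parameter estimates buy a planner, the snippet below performs an exact Bayesian belief update using a transition tensor `T` and observation matrix `O`, something a black-box PSR predictor does not expose directly. The two-state "locked drawer" scenario and all numbers are invented for illustration:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') ∝ O[s', o] * sum_s b(s) * T[a][s, s']."""
    b_pred = b @ T[a]          # predict step through the transition model
    b_post = b_pred * O[:, o]  # correct with the observation likelihood
    return b_post / b_post.sum()

# Toy model: a drawer that is locked (state 0) or unlocked (state 1).
T = np.array([[[0.9, 0.1],     # single action "pull handle"
               [0.2, 0.8]]])
O = np.array([[0.8, 0.2],      # observations: 0 = "stuck", 1 = "moves"
              [0.3, 0.7]])

b = np.array([0.5, 0.5])                  # start maximally uncertain
b = belief_update(b, a=0, o=1, T=T, O=O)  # observing "moves" shifts belief
# toward the unlocked state
```

Sampling-based solvers of the kind used in the paper's evaluation consume exactly these quantities: they simulate `T` forward and weight particles by `O` during planning.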

Outlook

The authors suggest that future work will explore scaling the approach to larger action spaces and investigate robustness to limited or noisy data. Extending the framework to continuous-state POMDPs could further broaden its applicability across robotics and AI systems.

This report is based on the abstract of the research paper, available as an open-access preprint; the full text is available via arXiv.
