New Preprint Offers High‑Probability Convergence Guarantees for Randomized Subspace SGD Under Heavy‑Tailed Noise
Researchers Gaku Omiya, Pierre‑Louis Poirion, and Akiko Takeda submitted a preprint to arXiv on January 28, 2026, presenting a convergence analysis of randomized subspace stochastic gradient descent (RS‑SGD) together with a new variant, randomized subspace normalized SGD (RS‑NSGD), designed for heavy‑tailed noise. The work aims to fill a gap in the literature by providing high‑probability convergence bounds, whereas most prior results hold only in expectation.
Background
Randomized subspace methods are valued for reducing the computational cost per iteration in large‑scale optimization, yet their theoretical guarantees in nonconvex settings have largely been limited to expected‑value analyses. High‑probability bounds for these methods are scarce even when the stochastic gradients are sub‑Gaussian, let alone when the noise is heavy‑tailed.
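To make the idea concrete, the following is a minimal sketch of one randomized‑subspace SGD update, in which the stochastic gradient is restricted to a random low‑dimensional subspace before the step is taken. The function and argument names (rs_sgd_step, stoch_grad) and the Gaussian choice of sketch matrix are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def rs_sgd_step(x, stoch_grad, step_size, subspace_dim, rng):
    """One illustrative randomized-subspace SGD update (sketch only).

    x            -- current iterate, a d-dimensional NumPy array
    stoch_grad   -- callable returning a stochastic gradient at x
    step_size    -- learning rate
    subspace_dim -- dimension s << d of the random subspace
    rng          -- numpy.random.Generator instance
    """
    d = x.shape[0]
    # Draw a random sketch matrix spanning an s-dimensional subspace.
    # A scaled Gaussian sketch is assumed here; other sketches are possible.
    P = rng.standard_normal((d, subspace_dim)) / np.sqrt(subspace_dim)
    g = stoch_grad(x)
    # Restrict the stochastic gradient to the subspace, map it back to
    # full dimension, and take a plain gradient step along that direction.
    return x - step_size * (P @ (P.T @ g))
```

In practice the per‑iteration savings come from querying only the s subspace components (the quantity P.T @ g, e.g. via directional derivatives) rather than forming the full gradient; the explicit projection above is written out only for clarity.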
New Theoretical Guarantees
The authors first demonstrate that RS‑SGD attains a high‑probability convergence bound under sub‑Gaussian noise, matching the oracle complexity order of earlier expectation‑based results. This shows that the known complexity guarantee continues to hold not just on average but with high probability.
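For readers unfamiliar with the terminology, the block below records one standard way to formalize sub‑Gaussian gradient noise and the tail bound it implies via Markov's inequality; the preprint's exact assumption and constants may differ.

```latex
% Standard (norm) sub-Gaussian noise assumption -- illustrative only;
% the preprint's precise conditions may differ.
% For the stochastic gradient g(x) and some variance proxy \sigma > 0:
\mathbb{E}\!\left[\exp\!\left(\frac{\|g(x) - \nabla f(x)\|^{2}}{\sigma^{2}}\right)\right] \le e .
% By Markov's inequality this yields an exponentially decaying tail:
\Pr\!\left(\|g(x) - \nabla f(x)\| \ge t\right)
  \le \exp\!\left(1 - \frac{t^{2}}{\sigma^{2}}\right)
  \quad \text{for all } t > 0 .
```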
Introducing RS‑NSGD
Motivated by the prevalence of heavy‑tailed gradient distributions in modern machine learning, the paper proposes RS‑NSGD, which incorporates direction normalization into subspace updates. Assuming the noise possesses bounded p‑th moments, the authors derive both in‑expectation and high‑probability convergence guarantees for the new method.
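A hedged sketch of the normalized variant follows. The heavy‑tailed regime is usually formalized as a bounded p‑th moment, E[||g(x) − ∇f(x)||^p] ≤ σ^p for some p in (1, 2], and normalizing the update direction caps the influence of any single outlier gradient. The function below is an assumed illustration under these conventions, not the authors' exact RS‑NSGD.

```python
import numpy as np

def rs_nsgd_step(x, stoch_grad, step_size, subspace_dim, rng, eps=1e-12):
    """Illustrative randomized-subspace *normalized* SGD update (sketch only).

    Normalizing the subspace direction bounds the size of every step, which
    is the usual mechanism for coping with heavy-tailed gradient noise.
    """
    d = x.shape[0]
    # Random subspace, drawn as in the RS-SGD sketch above.
    P = rng.standard_normal((d, subspace_dim)) / np.sqrt(subspace_dim)
    # Stochastic gradient restricted to the subspace.
    g_sub = P.T @ stoch_grad(x)
    # Normalize: use only the direction, never the (possibly huge) magnitude.
    direction = g_sub / (np.linalg.norm(g_sub) + eps)
    return x - step_size * (P @ direction)
```

In a training loop, either sketch would be called once per iteration with the same rng, so a fresh random subspace is drawn at every step.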
Performance Insights
The analysis indicates that RS‑NSGD can achieve a better (lower) oracle complexity than full‑dimensional normalized SGD, suggesting that combining the subspace approach with normalization offers computational savings without sacrificing convergence speed.
Implications for Machine Learning
If the theoretical improvements translate to practice, RS‑NSGD could enable more efficient training of deep models that encounter heavy‑tailed gradient noise, a scenario common in large‑scale and nonconvex optimization tasks.
Limitations and Next Steps
The abstract focuses on mathematical guarantees and does not discuss empirical evaluations or implementation details. Future research may explore real‑world benchmarks, extend the analysis to other noise distributions, and assess the algorithm’s robustness in diverse learning environments.
This report is based on the abstract of the research paper; the full text is available via arXiv as an open‑access preprint.