Latent Bridge Models Achieve Record-Setting Audio Super-Resolution up to 192 kHz
A team of researchers has introduced a new audio super-resolution system that leverages latent bridge models (LBMs) to upscale low‑resolution waveforms to high‑resolution audio. The work, posted on arXiv in September 2025, aims to overcome the sub‑optimal quality of earlier diffusion‑ and bridge‑based methods by exploiting the informative prior contained in the low‑resolution input.
Latent Bridge Models Overview
The proposed architecture first compresses an audio waveform into a continuous latent space. An LBM then performs a latent‑to‑latent generation that mirrors the low‑resolution‑to‑high‑resolution upsampling process, allowing the model to directly inherit structural cues from the source signal.
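To make the latent-to-latent idea concrete, the following is a minimal sketch of how an intermediate state on a bridge between two latents could be sampled. It assumes a Brownian-bridge interpolant, which this summary does not confirm as the authors' choice. The function name `brownian_bridge_sample`, the noise scale `sigma`, and the convention that t = 0 is the low-resolution latent and t = 1 the high-resolution latent are all illustrative assumptions.

```python
import numpy as np

def brownian_bridge_sample(z_lr, z_hr, t, sigma, rng):
    """Sample an intermediate latent on a Brownian bridge (illustrative only).

    z_lr, z_hr : arrays of the same shape (hypothetical low-/high-res latents)
    t          : scalar in [0, 1]; 0 pins the sample to z_lr, 1 to z_hr
    sigma      : assumed noise scale of the bridge
    """
    # The mean moves linearly from the low-res latent to the high-res latent.
    mean = (1.0 - t) * z_lr + t * z_hr
    # The variance vanishes at both endpoints and peaks at t = 0.5.
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(z_lr.shape)

rng = np.random.default_rng(0)
z_lr = rng.standard_normal((4, 8))   # stand-in for an encoded low-res clip
z_hr = rng.standard_normal((4, 8))   # stand-in for an encoded high-res clip
z_mid = brownian_bridge_sample(z_lr, z_hr, t=0.5, sigma=0.1, rng=rng)
```

Because both endpoints are pinned, the process starts from the informative low-resolution latent rather than from pure noise, which is the property the article attributes to bridge models.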
Frequency‑Aware Training
To address the scarcity of high‑resolution training data, the authors incorporate both the prior and target frequencies as inputs to the model. This frequency‑aware design enables the system to learn an any‑to‑any upsampling mapping during training, improving flexibility across a wide range of sampling rates.
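One way any-to-any training pairs might be built from a single recording is sketched below. This is an assumption about the data pipeline, not the authors' documented procedure: a target cutoff and a lower prior cutoff are drawn at random, and a brick-wall FFT low-pass simulates each bandwidth. The helper names `lowpass_fft` and `make_training_pair` and the `min_cutoff_hz` parameter are hypothetical.

```python
import numpy as np

def lowpass_fft(wave, sample_rate, cutoff_hz):
    """Ideal low-pass: zero every frequency bin above cutoff_hz."""
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(wave))

def make_training_pair(wave, sample_rate, rng, min_cutoff_hz=2000.0):
    """Return (prior, target, prior_hz, target_hz) for one training example.

    Both cutoffs are returned so they can be fed to the model as
    conditioning inputs (assumed interface).
    """
    nyquist = sample_rate / 2.0
    # Draw the target bandwidth first, then a prior bandwidth below it.
    target_hz = rng.uniform(min_cutoff_hz, nyquist)
    prior_hz = rng.uniform(min_cutoff_hz, target_hz)
    target = lowpass_fft(wave, sample_rate, target_hz)
    prior = lowpass_fft(target, sample_rate, prior_hz)
    return prior, target, prior_hz, target_hz

rng = np.random.default_rng(0)
wave = rng.standard_normal(48_000)          # one second of noise at 48 kHz
prior, target, prior_hz, target_hz = make_training_pair(wave, 48_000, rng)
```

Under this scheme a recording at a modest sampling rate still yields many prior/target bandwidth combinations, which is one plausible reading of how frequency conditioning eases the shortage of high-resolution data.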
Cascaded Architecture and Prior Augmentation
The study further introduces cascaded LBMs combined with two prior‑augmentation strategies. This configuration represents the first attempt to extend audio upsampling beyond the conventional 48 kHz ceiling, offering a seamless multi‑stage super‑resolution pipeline that can be adapted for post‑production workflows.
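The cascade can be pictured as a sequence of hops between sampling rates. The sketch below only plans those hops and runs no model. The intermediate rates of 48, 96, and 192 kHz and the function name `plan_cascade` are assumptions for illustration, since this summary does not state how the authors divide the stages.

```python
def plan_cascade(input_rate_hz, target_rate_hz,
                 stage_rates_hz=(48_000, 96_000, 192_000)):
    """Split one upsampling job into (from_rate, to_rate) hops.

    stage_rates_hz lists the assumed output rates of the cascaded stages.
    """
    if target_rate_hz <= input_rate_hz:
        raise ValueError("target rate must exceed input rate")
    hops = []
    current = input_rate_hz
    for stage_rate in sorted(stage_rates_hz):
        if stage_rate <= current:
            continue          # the signal is already at or above this stage
        if stage_rate >= target_rate_hz:
            break             # the final hop below covers the remainder
        hops.append((current, stage_rate))
        current = stage_rate
    hops.append((current, target_rate_hz))
    return hops

hops = plan_cascade(16_000, 192_000)
# hops == [(16000, 48000), (48000, 96000), (96000, 192000)]
```

Each hop would then be handled by one bridge model conditioned on its own prior and target frequencies.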
Benchmark Evaluation
Experiments were conducted on three public datasets (VCTK, ESC‑50, and Song‑Describer) and on two internal test sets. The evaluations covered both objective metrics and human perceptual judgments.
Performance Highlights
According to the authors, the system achieves state‑of‑the‑art objective and perceptual quality for any‑to‑48 kHz super‑resolution across speech, general audio, and music. The approach also establishes the first reported results for any‑to‑192 kHz audio super‑resolution, going beyond the 48 kHz ceiling of earlier systems.
Implications for Audio Production
By delivering high‑fidelity upsampling at unprecedented sampling rates, the latent bridge framework could streamline workflows in music mastering, film sound design, and archival restoration, where preserving fine‑grained acoustic detail is critical.
This report is based on the abstract of a research paper posted on arXiv (academic preprint, open access). The full text is available via arXiv.