MeanCache Boosts Flow Matching Inference Speed Across Major Generative Models
Researchers introduced MeanCache, a training‑free caching framework designed to accelerate Flow Matching inference in large‑scale generative models. Detailed in a paper posted to arXiv in January 2026, the approach targets redundant computation by shifting from instantaneous to average‑velocity calculations, thereby reducing error accumulation under aggressive acceleration.
Method Overview
MeanCache leverages cached Jacobian‑vector products (JVPs) to construct interval‑average velocities, contrasting with traditional feature‑caching methods that rely solely on instantaneous velocity information. This average‑velocity perspective mitigates local deviations that can degrade output quality.
Average‑Velocity Approach
By aggregating instantaneous velocities into an average over each inference interval, MeanCache creates a smoother trajectory for the generative process. The cached JVPs are reused across steps, minimizing the need for repeated expensive computations.
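To make the idea concrete, the following minimal Python sketch shows one way an interval‑average velocity could be formed from a cached velocity and its JVP via a first‑order expansion. The toy velocity field, the finite‑difference JVP, the step sizes, and the fixed refresh schedule are illustrative assumptions for this sketch, not the paper's model or implementation.

```python
import numpy as np

# Toy instantaneous velocity field v(x, t) for a 1-D flow; purely illustrative.
def velocity(x, t):
    return -x + np.sin(5.0 * t)

def velocity_time_jvp(x, t, v, eps=1e-4):
    """Finite-difference stand-in for the JVP of v along the trajectory,
    i.e. the total derivative dv/dt evaluated in the direction (dx/dt, 1) = (v, 1)."""
    return (velocity(x + eps * v, t + eps) - v) / eps

def cached_average_step(x, t, dt, cache):
    """Advance one step. While the cache is fresh, reuse the cached velocity and
    its JVP to form an interval-average velocity instead of recomputing the model."""
    if cache is None:
        v = velocity(x, t)                # full (expensive) model evaluation
        jvp = velocity_time_jvp(x, t, v)  # cached alongside the velocity
        cache = (t, v, jvp)
    t0, v0, jvp0 = cache
    # First-order model of v over [t, t + dt]; its average over that interval is
    # v0 + jvp0 * (midpoint - t0). This is the "average velocity" idea in sketch form.
    elapsed_mid = (t + 0.5 * dt) - t0
    v_avg = v0 + jvp0 * elapsed_mid
    return x + dt * v_avg, cache

# Integrate 10 steps, refreshing the cache only every 4th step (a fixed toy schedule).
x, t, dt, cache = 1.0, 0.0, 0.05, None
for step in range(10):
    if step % 4 == 0:
        cache = None                      # force a full recomputation at this step
    x, cache = cached_average_step(x, t, dt, cache)
    t += dt
print(f"final state: {x:.4f}")
```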
Trajectory‑Stability Scheduling
The framework incorporates a trajectory‑stability scheduling strategy that employs a Peak‑Suppressed Shortest Path algorithm under predefined budget constraints. This scheduler determines optimal cache timing, enhancing both reuse stability and overall efficiency.
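The abstract does not spell out the scheduler's internals, but the budgeted shortest‑path idea can be illustrated with a small dynamic‑programming sketch in Python. Here the per‑segment error model and the squared‑error objective (a crude stand‑in for suppressing error peaks) are assumptions made for the example, not the published Peak‑Suppressed Shortest Path algorithm.

```python
def schedule_cache_refreshes(segment_error, num_steps, budget):
    """Choose which inference steps perform a full recomputation (cache refresh)
    under a budget, via a shortest-path style dynamic program over steps.

    segment_error(i, j): estimated error of reusing the cache built at step i
    through step j. Squaring each segment's error below penalizes one long,
    high-error reuse stretch more than several short ones -- an assumed proxy
    for peak suppression, not the paper's exact objective.
    """
    INF = float("inf")
    # dp[k][i]: minimal cost with the last full computation at step i,
    # having spent k units of the refresh budget (step 0 is computed for free).
    dp = [[INF] * num_steps for _ in range(budget + 1)]
    parent = [[None] * num_steps for _ in range(budget + 1)]
    dp[0][0] = 0.0
    for k in range(budget):
        for i in range(num_steps):
            if dp[k][i] == INF:
                continue
            for j in range(i + 1, num_steps):       # next refresh step
                cost = dp[k][i] + segment_error(i, j) ** 2
                if cost < dp[k + 1][j]:
                    dp[k + 1][j] = cost
                    parent[k + 1][j] = (k, i)
    # Close the path with the final reuse segment from the last anchor to the end.
    best = (INF, None)
    for k in range(budget + 1):
        for i in range(num_steps):
            if dp[k][i] < INF:
                total = dp[k][i] + segment_error(i, num_steps) ** 2
                if total < best[0]:
                    best = (total, (k, i))
    # Recover the chosen refresh steps by walking the parent pointers.
    refreshes, node = [], best[1]
    while node is not None:
        k, i = node
        refreshes.append(i)
        node = parent[k][i]
    return best[0], sorted(refreshes)

# Toy error model: reusing a cache over a longer stretch is assumed costlier.
err = lambda i, j: 0.1 * (j - i) ** 2
cost, steps = schedule_cache_refreshes(err, num_steps=20, budget=4)
print(cost, steps)   # refresh steps chosen by the DP, roughly evenly spaced here
```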
Experimental Validation
Benchmarks on three commercial‑scale models—FLUX.1, Qwen‑Image, and HunyuanVideo—demonstrated consistent performance improvements. MeanCache achieved acceleration factors of 4.12× for FLUX.1, 4.56× for Qwen‑Image, and 3.59× for HunyuanVideo, while also surpassing state‑of‑the‑art caching baselines in generation quality.
Performance Gains
The reported speedups translate to reduced inference latency and lower computational costs, offering practical benefits for deployments that require real‑time or high‑throughput generative capabilities.
Implications and Future Work
Authors suggest that the stability‑driven acceleration concept could inspire further research into caching mechanisms for other diffusion‑based or flow‑matching architectures, potentially extending to broader AI applications.
This report is based on the abstract of the research paper, an open‑access academic preprint; the full text is available via arXiv.