Dual-Path Gradient Alignment Boosts Speech Deepfake Detection Accuracy
Motivation and Challenges
A new training framework that aligns gradients from original and augmented speech inputs has been shown to improve the detection of synthetic audio. The approach achieves a relative reduction in equal error rate of up to 18.69% on the In-the-Wild dataset and reduces the number of epochs required for convergence.
Dual-Path Architecture
The method processes each training utterance through two parallel pathways: one receives the raw audio, while the other receives a version altered by a data‑augmentation technique. By maintaining separate forward passes, the system can directly compare the learning signals generated by each version.
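The two-pathway idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear "detector", the `augment` function (a noise-based stand-in for a real augmentation such as RawBoost), and all names here are hypothetical.

```python
import math
import random


def augment(waveform, noise_scale=0.05, seed=0):
    """Hypothetical stand-in for a real augmentation: adds small uniform noise."""
    rng = random.Random(seed)
    return [x + rng.uniform(-noise_scale, noise_scale) for x in waveform]


def loss_and_grad(weights, waveform, label):
    """Binary cross-entropy of a toy linear detector and its gradient w.r.t. the weights."""
    logit = sum(w * x for w, x in zip(weights, waveform))
    prob = 1.0 / (1.0 + math.exp(-logit))
    eps = 1e-12  # guards against log(0)
    loss = -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))
    # For sigmoid + cross-entropy, d(loss)/d(w_i) = (prob - label) * x_i
    grad = [(prob - label) * x for x in waveform]
    return loss, grad


def dual_path_step(weights, waveform, label):
    """Run one forward pass per pathway and return each path's loss and gradient."""
    original = loss_and_grad(weights, waveform, label)
    augmented = loss_and_grad(weights, augment(waveform), label)
    return original, augmented


weights = [0.1, -0.2, 0.3]
utterance = [0.5, -0.1, 0.2]  # toy "waveform", label 1 = spoofed (assumed convention)
(loss_orig, grad_orig), (loss_aug, grad_aug) = dual_path_step(weights, utterance, 1)
```

Keeping the two gradients separate, rather than summing the losses before differentiation, is what makes a later comparison of their directions possible.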
Gradient Alignment Mechanism
During back‑propagation, the framework computes the gradient direction for each pathway and measures how well the two align. When a misalignment is detected, the gradients are adjusted to reduce conflict, ensuring that parameter updates move the model toward a shared objective rather than in opposing directions.
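The summary does not say how the gradients are adjusted. One common way to resolve conflicts between two gradients is a PCGrad-style projection, sketched below as an assumption rather than as the paper's method: when the dot product is negative, each gradient has its component along the other removed before the two are summed. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Alignment measure in [-1, 1]; negative values indicate conflict."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0


def project_out(grad, ref):
    """If grad conflicts with ref (negative dot product), remove grad's component along ref."""
    d = dot(grad, ref)
    ref_sq = dot(ref, ref)
    if d >= 0 or ref_sq == 0:
        return list(grad)  # no conflict: leave the gradient unchanged
    scale = d / ref_sq
    return [g - scale * r for g, r in zip(grad, ref)]


def align_gradients(grad_orig, grad_aug):
    """Assumed alignment rule: project each gradient against the other, then sum."""
    g_o = project_out(grad_orig, grad_aug)
    g_a = project_out(grad_aug, grad_orig)
    return [o + a for o, a in zip(g_o, g_a)]


grad_orig = [1.0, 0.0]
grad_aug = [-1.0, 1.0]  # points partly against grad_orig
update = align_gradients(grad_orig, grad_aug)
```

After projection, the combined update has a non-negative dot product with both original gradients, so a step along it does not move against either pathway to first order.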
Experimental Findings
Analysis of training dynamics revealed that roughly 25% of iterations exhibited gradient conflicts when using the RawBoost augmentation strategy. Implementing the alignment step eliminated most of these conflicts, leading to smoother optimization trajectories.
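A conflict rate of this kind could be measured by logging both gradients at every iteration and counting how often they point against each other. The sketch below assumes that "conflict" means negative cosine similarity, which the summary does not confirm; the gradient values are toy numbers, not data from the paper.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def conflict_rate(gradient_pairs):
    """Fraction of iterations whose two gradients have negative cosine similarity."""
    if not gradient_pairs:
        return 0.0
    conflicts = sum(
        1 for g_orig, g_aug in gradient_pairs if cosine_similarity(g_orig, g_aug) < 0
    )
    return conflicts / len(gradient_pairs)


# Illustrative (original, augmented) gradient pairs, one per iteration.
toy_pairs = [
    ([1.0, 0.0], [1.0, 1.0]),
    ([1.0, 0.0], [-1.0, 1.0]),  # the only conflicting pair
    ([0.0, 2.0], [0.5, 1.0]),
    ([1.0, 1.0], [2.0, 0.5]),
    ([0.3, 0.4], [0.3, 0.4]),
]
rate = conflict_rate(toy_pairs)
```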
Performance Gains
Compared with a baseline that applies augmentation without alignment, the aligned dual‑path system achieved faster convergence, requiring fewer training epochs, and delivered a relative reduction of 18.69% in equal error rate on the In-the-Wild benchmark.
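A relative reduction is measured against the baseline's error rate, not in absolute percentage points. The helper below shows the arithmetic; the function name and the example EER values are hypothetical and are not figures from the paper.

```python
def relative_reduction(baseline_eer, new_eer):
    """Relative reduction in percent: (baseline - new) / baseline * 100."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - new_eer) / baseline_eer * 100.0


# Made-up values for illustration: an EER falling from 10.0% to 8.0%
# is 2 percentage points absolute, but a 20% relative reduction.
example = round(relative_reduction(10.0, 8.0), 2)
```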
Broader Impact
These results suggest that gradient‑conflict mitigation can enhance the robustness of deepfake detection models across varied acoustic conditions. Future research may explore extending the technique to other modalities and augmentation schemes.
This report is based on the abstract of a research paper from arXiv, licensed under Academic Preprint / Open Access. The full text is available via arXiv.