Revolutionizing Speaker Diarization: MK-SGC-SC Method Outperforms Existing Systems

Global: New Multi-Kernel Sparse Graph Method Improves Unsupervised Speaker Diarization

Researchers Nikhil Raghav, Avisek Gupta, Swagatam Das, and Md Sahidullah announced a novel unsupervised speaker diarization technique on Jan. 24, 2026, with a revision posted on Jan. 29, 2026. The study, posted on the preprint server arXiv, introduces the MK-SGC-SC approach, which leverages multiple kernel similarities and sparse graph construction to segment audio recordings by speaker without any pre-training or weak supervision.

Method Overview

The proposed framework first extracts speaker embeddings from audio streams and then computes similarity scores using four polynomial kernels alongside a degree-one arccosine kernel. These diverse similarity measures are combined to form a sparse graph that emphasizes local relationships among embeddings, enabling spectral clustering to identify speaker clusters.
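As a rough illustration of the similarity stage, the Python sketch below computes four polynomial kernels and a degree-one arccosine kernel over a matrix of speaker embeddings, then averages them into one similarity matrix. The polynomial degrees (1 to 4), the offset `coef0`, the max-scaling and the unweighted average are assumptions made for illustration, since the abstract does not specify them. The function names are hypothetical and are not taken from the authors' code. The arccosine kernel uses the standard degree-one formula.

```python
import numpy as np

def polynomial_kernel(X, degree, coef0=1.0):
    """Polynomial kernel (x.y + coef0)^degree between all rows of X."""
    return (X @ X.T + coef0) ** degree

def arccosine_kernel_deg1(X):
    """Degree-one arccosine kernel:
    k(x, y) = |x||y| (sin t + (pi - t) cos t) / pi, with t the angle between x and y."""
    norms = np.linalg.norm(X, axis=1)
    outer = np.outer(norms, norms)
    cos = np.clip((X @ X.T) / outer, -1.0, 1.0)  # clip guards against rounding error
    theta = np.arccos(cos)
    return outer * (np.sin(theta) + (np.pi - theta) * cos) / np.pi

def multi_kernel_similarity(X, degrees=(1, 2, 3, 4)):
    """Average of max-scaled kernels (assumed combination rule, not the paper's)."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length embeddings
    kernels = [polynomial_kernel(X, d) for d in degrees]
    kernels.append(arccosine_kernel_deg1(X))
    scaled = [K / K.max() for K in kernels]  # put every kernel on a [0, 1] scale
    return np.mean(scaled, axis=0)

# Usage with random stand-in embeddings (6 segments, 8 dimensions).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 8))
S = multi_kernel_similarity(embeddings)
```

Scaling each kernel by its maximum keeps the higher-degree polynomial kernels from dominating the average. A weighted combination would be an equally plausible choice.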

Kernel Selection and Graph Construction

According to the authors, the polynomial kernels capture varying degrees of non-linearity, while the arccosine kernel contributes a complementary angular similarity. The graph is constructed in a principled manner that keeps mainly local connections, which reduces noise from distant embeddings and focuses computational effort on the most relevant connections.
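The graph stage can be sketched in the same spirit. The Python example below sparsifies a similarity matrix by keeping each node's k strongest neighbours, then runs a basic normalized-Laplacian spectral clustering. The k-nearest-neighbour rule, the fixed number of speakers, the small k-means routine and all function names are assumptions for illustration, since the abstract does not describe the authors' exact sparsification or clustering steps. A rescaled cosine similarity stands in for the combined kernel matrix.

```python
import numpy as np

def knn_sparsify(S, k):
    """Keep the k largest off-diagonal similarities per row, then symmetrize."""
    n = S.shape[0]
    ranked = S.astype(float)               # astype returns a copy
    np.fill_diagonal(ranked, -np.inf)      # never pick a node as its own neighbour
    keep = np.argsort(ranked, axis=1)[:, -k:]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], keep] = True
    mask |= mask.T                         # keep an edge if either endpoint chose it
    return np.where(mask, S, 0.0)

def spectral_embedding(W, n_clusters):
    """Row-normalized eigenvectors of the symmetric normalized Laplacian."""
    deg = np.maximum(W.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    U = vecs[:, :n_clusters]
    return U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

def kmeans(U, n_clusters, n_iter=50):
    """Small deterministic k-means with farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def sparse_spectral_clustering(S, n_clusters, k):
    W = knn_sparsify(S, k)
    return kmeans(spectral_embedding(W, n_clusters), n_clusters)

# Usage: two synthetic "speakers" with five segments each.
rng = np.random.default_rng(0)
X = np.zeros((10, 8))
X[:5, 0] = 1.0
X[5:, 1] = 1.0
X += 0.05 * rng.normal(size=X.shape)
X /= np.linalg.norm(X, axis=1, keepdims=True)
S = (X @ X.T + 1.0) / 2.0                  # stand-in similarity in [0, 1]
labels = sparse_spectral_clustering(S, n_clusters=2, k=3)
```

Dropping all but the strongest edges means weak similarities between segments of different speakers do not reach the Laplacian, which is one plausible reading of the noise reduction the authors describe.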

Experimental Evaluation

Experiments reported in the paper demonstrate that MK-SGC-SC outperforms existing unsupervised diarization systems on three benchmark corpora: DIHARD-III, AMI, and VoxConverse. The authors claim state-of-the-art performance across these datasets, highlighting the robustness of the approach in diverse acoustic environments.

Open-Source Release

To facilitate further research, the authors have made their implementation publicly available via a URL linked in the arXiv entry. They encourage replication and extension of their work by the broader speech-processing community.

Implications for Future Research

The study suggests that combining multiple kernel functions with sparse graph techniques can close the performance gap between supervised and unsupervised speaker diarization. Researchers may explore additional kernel families or integrate the method with downstream tasks such as meeting transcription or speaker-aware language modeling.

This report is based on the abstract of the research paper posted on arXiv, licensed under Academic Preprint / Open Access. The full text is available via arXiv.
