Pretrained Attention Strategy Shows Strong Gains in Extreme Multi‑Label Classification
A new technique called PLANT (Pretrained and Leveraged Attention) has been introduced to improve attention mechanisms in extreme multi‑label text classification models. The method was presented by Debjyoti Saha Roy, Byron C. Wallace, and Javed A. Aslam in a paper first submitted to arXiv on October 30, 2024, and most recently revised on December 26, 2025.
Method Overview
PLANT operates by planting label‑specific attention patterns using a pretrained learning‑to‑rank model guided by mutual information gain. This initialization is architecture‑agnostic, allowing seamless integration with large language model backbones such as Mistral‑7B, LLaMA3‑8B, DeepSeek‑V3, and Phi‑3.
The approach is designed as a plug‑and‑play module that can be attached to existing extreme multi‑label classifiers without altering their core architecture, thereby simplifying adoption across diverse systems.
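To make the "planting" idea concrete, the following is a minimal, illustrative sketch only (it is not the authors' implementation, and the function names, toy data, and softmax normalization are assumptions): it seeds a label‑specific attention matrix from the mutual information between binarized token features and each label, standing in for the pretrained learning‑to‑rank guidance described above.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information between two binary arrays, in nats."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def plant_attention_init(X, Y):
    """Return a (labels x tokens) matrix of MI-seeded attention weights.

    Hypothetical PLANT-style seeding: each label's row is a softmax over
    mutual-information scores, so informative tokens start with high attention.
    """
    Xb = (X > np.median(X, axis=0)).astype(int)  # binarize token features
    W = np.array([[mutual_information(Xb[:, t], Y[:, j])
                   for t in range(X.shape[1])]
                  for j in range(Y.shape[1])])
    W = np.exp(W) / np.exp(W).sum(axis=1, keepdims=True)  # per-label softmax
    return W

# Toy data: 200 documents, 8 token features, 3 labels (all synthetic).
rng = np.random.default_rng(0)
X = rng.random((200, 8))
Y = (rng.random((200, 3)) > 0.7).astype(int)
A = plant_attention_init(X, Y)
print(A.shape)  # (3, 8)
```

Each row of `A` is a valid attention distribution (non‑negative, summing to one), which is what allows such weights to serve as an initialization for a label‑wise attention layer rather than starting from random values.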
Experimental Evaluation
Empirical results reported in the paper indicate that PLANT consistently outperforms existing state‑of‑the‑art methods across several benchmark tasks, including International Classification of Diseases (ICD) coding, legal topic classification, and content recommendation.
The performance gains are most pronounced in few‑shot settings, where the method delivers substantial improvements on rare labels that are typically under‑represented in training data.
Ablation studies included in the work confirm that the initialization of attention weights is a primary driver of the observed improvements, underscoring the importance of effective attention seeding.
Availability and Impact
The authors have made the code and trained models publicly available, facilitating replication and further exploration by the research community.
The paper is classified under the Computation and Language (cs.CL) and Machine Learning (cs.LG) subject categories on arXiv and carries the identifier arXiv:2410.23066.
This report is based on the abstract of the paper as posted on arXiv; the full text is available via arXiv as an open‑access preprint.