NeoChainDaily
29.12.2025 • 16:49 Research & Innovation

Pretrained Attention Strategy Shows Strong Gains in Extreme Multi‑Label Classification

Global: New Pretraining Technique Boosts Multi‑Label Text Classification

A new technique called PLANT (Pretrained and Leveraged Attention) has been introduced to improve attention mechanisms in extreme multi‑label text classification models. The method was presented by Debjyoti Saha Roy, Byron C. Wallace, and Javed A. Aslam in a paper submitted to arXiv on October 30, 2024 and revised through December 26, 2025.

Method Overview

PLANT operates by planting label‑specific attention patterns using a pretrained learning‑to‑rank model guided by mutual information gain. This initialization is architecture‑agnostic, allowing seamless integration with large language model backbones such as Mistral‑7B, LLaMA3‑8B, DeepSeek‑V3, and Phi‑3.
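In the paper, those attention patterns are seeded via a pretrained learning-to-rank model; as a rough illustration of the mutual-information signal involved, the sketch below estimates token-label mutual information directly from binary co-occurrence statistics. The function name, matrix layout, and estimator here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mutual_information_gain(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Estimate MI between binary token occurrence and each label.

    X: (num_docs, vocab_size) 0/1 token-occurrence matrix.
    Y: (num_docs, num_labels) 0/1 label matrix.
    Returns: (num_labels, vocab_size) MI scores in nats.
    (Hypothetical helper, not the paper's learning-to-rank-based scoring.)
    """
    n = float(X.shape[0])
    eps = 1e-12
    p11 = (Y.T @ X) / n                # P(label=1, token=1), shape (L, V)
    p1_ = Y.mean(axis=0)[:, None]      # P(label=1), shape (L, 1)
    p_1 = X.mean(axis=0)[None, :]      # P(token=1), shape (1, V)
    p10 = p1_ - p11                    # P(label=1, token=0)
    p01 = p_1 - p11                    # P(label=0, token=1)
    p00 = 1.0 - p1_ - p_1 + p11        # P(label=0, token=0)
    mi = np.zeros_like(p11)
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over the four joint outcomes.
    for pxy, px, py in [
        (p11, p1_, p_1),
        (p10, p1_, 1.0 - p_1),
        (p01, 1.0 - p1_, p_1),
        (p00, 1.0 - p1_, 1.0 - p_1),
    ]:
        mi += pxy * np.log((pxy + eps) / (px * py + eps))
    return mi
```

Tokens that score highly for a label are those whose presence (or absence) is most informative about that label, which is the kind of signal an attention head can be seeded with.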

The approach is designed as a plug‑and‑play module that can be attached to existing extreme multi‑label classifiers without altering their core architecture, thereby simplifying adoption across diverse systems.
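To make the plug-and-play idea concrete, here is a minimal, hypothetical sketch of a label-specific attention head whose per-label query vectors are planted from mutual-information scores (such as those from the sketch above) before fine-tuning. All names, shapes, and the seeding formula are assumptions for illustration; the actual PLANT initialization is derived from a pretrained learning-to-rank model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAttention(nn.Module):
    """Per-label attention over token representations (illustrative sketch)."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # One attention query vector per label.
        self.label_queries = nn.Parameter(torch.empty(num_labels, hidden_dim))
        nn.init.xavier_uniform_(self.label_queries)

    def seed_from_mutual_info(self, token_embeddings: torch.Tensor,
                              mi_scores: torch.Tensor) -> None:
        """Plant initial queries as MI-weighted averages of token embeddings.

        token_embeddings: (vocab_size, hidden_dim) embedding table.
        mi_scores: (num_labels, vocab_size) token-label MI scores.
        """
        with torch.no_grad():
            weights = F.softmax(mi_scores, dim=-1)   # normalize per label
            self.label_queries.copy_(weights @ token_embeddings)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        """hidden_states: (batch, seq_len, hidden_dim) from any backbone."""
        # (batch, num_labels, seq_len): each label attends over the tokens.
        scores = torch.einsum("ld,bsd->bls", self.label_queries, hidden_states)
        attn = F.softmax(scores, dim=-1)
        # (batch, num_labels, hidden_dim): label-specific document summaries.
        return torch.einsum("bls,bsd->bld", attn, hidden_states)
```

Because such a module only consumes token-level hidden states, it can sit on top of any backbone's final layer, which is consistent with the architecture-agnostic claim made for the method.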

Experimental Evaluation

Empirical results reported in the paper indicate that PLANT consistently outperforms existing state‑of‑the‑art methods across several benchmark tasks, including International Classification of Diseases (ICD) coding, legal topic classification, and content recommendation.

The performance gains are most pronounced in few‑shot settings, where the method delivers substantial improvements on rare labels that are typically under‑represented in training data.

Ablation studies included in the work confirm that the initialization of attention weights is a primary driver of the observed improvements, underscoring the importance of effective attention seeding.

Availability and Impact

The authors have made the code and trained models publicly available, facilitating replication and further exploration by the research community.

The paper is classified under the Computation and Language (cs.CL) and Machine Learning (cs.LG) subjects on arXiv and carries the identifier arXiv:2410.23066.

This report is based on the abstract of the research paper, which is available on arXiv as an open-access academic preprint; the full text can be accessed via arXiv.

