NeoChainDaily
29.01.2026 • 05:16 • Research & Innovation

Study Questions Embedding-Layer Dominance in LLM Influence Estimation, Highlights Middle Attention Layers and New Metric

Researchers have released a new arXiv preprint that reevaluates how training‑sample influence is measured in large language models (LLMs). The paper, posted in November 2025, proposes that middle attention layers provide more reliable influence estimates than the traditionally favored embedding layers, introduces alternative aggregation methods, and presents a novel evaluation metric called the Noise Detection Rate (NDR). The work aims to improve interpretability and auditing of LLMs without requiring costly model retraining.

Background on Influence Functions

Influence functions attempt to trace a model’s decision back to individual training examples by analyzing gradients propagated through the network. Prior approaches have largely relied on first‑order and higher‑order gradient terms, but computational constraints have forced many studies to restrict analysis to a subset of layers, often the initial embedding layers.
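As a toy illustration of the first-order idea (not the paper's method), the gradient of the loss at a test example can be dotted with the gradient of the loss at a training example; the sign and magnitude of that product serve as the influence estimate. The linear model, data, and function names below are invented for this sketch:

```python
import numpy as np

def loss_grad(w, x, y):
    """Gradient of the squared error 0.5 * (w.x - y)^2 with respect to w."""
    return (w @ x - y) * x

def first_order_influence(w, x_train, y_train, x_test, y_test):
    """Dot product of test and train loss gradients: a positive value
    suggests the training point pushes the parameters in a direction
    that also reduces (or increases) the test loss, i.e. it 'matters'
    for that test example."""
    return float(loss_grad(w, x_test, y_test) @ loss_grad(w, x_train, y_train))

w = np.array([1.0, -0.5])
x_a, y_a = np.array([1.0, 0.0]), 0.0   # training point aligned with the test point
x_b, y_b = np.array([0.0, 1.0]), 0.0   # training point orthogonal to it
x_t, y_t = np.array([2.0, 0.0]), 0.0   # test point

print(first_order_influence(w, x_a, y_a, x_t, y_t))  # 4.0 -- aligned gradients
print(first_order_influence(w, x_b, y_b, x_t, y_t))  # 0.0 -- orthogonal gradients
```

In a deep network the same dot product is computed per layer, which is exactly why restricting the analysis to a few layers (for tractability) changes the resulting influence ranking.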

Reevaluating Layer Importance

Building on earlier findings by Yeh et al. (2022), the authors provide both theoretical arguments and empirical evidence that the “cancellation effect”—the presumed diminishing of influence scores in deeper layers—is unreliable. Their experiments indicate that middle attention layers capture influence signals more consistently across diverse LLM architectures.
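The cancellation effect can be illustrated with made-up numbers: when per-layer contributions to an influence score carry opposite signs, the layer-summed total can vanish even though individual layers hold a strong signal. Everything in this sketch (layer names, gradient values) is hypothetical:

```python
import numpy as np

# Pretend per-layer gradients for one train/test pair: layer -> (g_test, g_train).
# The values are fabricated to make the point, not taken from any model.
layer_grads = {
    "embedding": (np.array([1.0, 0.0]),  np.array([3.0, 0.0])),
    "attn_mid":  (np.array([0.0, 2.0]),  np.array([0.0, 1.0])),
    "final":     (np.array([-1.0, 0.0]), np.array([5.0, 0.0])),
}

# Per-layer influence contribution: dot product of the two layer gradients.
per_layer = {name: float(gt @ gr) for name, (gt, gr) in layer_grads.items()}
total = sum(per_layer.values())

print(per_layer)  # {'embedding': 3.0, 'attn_mid': 2.0, 'final': -5.0}
print(total)      # 0.0 -- positive and negative layer scores cancel out
```

A method that inspects layers individually (here, the middle attention layer's 2.0) retains information that the summed score of 0.0 discards, which is the intuition behind questioning layer-summed estimates.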

Alternative Aggregation Strategies

The study also critiques the common practice of averaging influence scores across layers. Instead, it explores ranking‑based and vote‑based aggregation techniques, demonstrating that these alternatives can substantially improve the accuracy of influence estimation.
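A hedged sketch of what ranking- and vote-based aggregation might look like next to the mean baseline, using invented per-layer scores for three training samples; the function names and exact schemes are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

# Fabricated influence scores: rows are training samples, columns are layers.
scores = np.array([
    [0.9, 0.1, 0.2],   # sample 0: dominated by a single layer
    [0.4, 0.5, 0.6],   # sample 1: consistently moderate
    [0.1, 0.3, 0.1],   # sample 2: consistently low
])

def mean_aggregate(s):
    """Common baseline: average the influence score across layers."""
    return s.mean(axis=1)

def rank_aggregate(s):
    """Rank samples within each layer (0 = lowest score in that layer),
    then average the ranks across layers."""
    ranks = s.argsort(axis=0).argsort(axis=0)
    return ranks.mean(axis=1)

def vote_aggregate(s, k=1):
    """Each layer votes for its top-k samples; count votes per sample."""
    votes = np.zeros(s.shape[0])
    for layer in s.T:
        for idx in np.argsort(layer)[-k:]:
            votes[idx] += 1
    return votes

print(mean_aggregate(scores))  # sample 1 edges out sample 0 on average
print(rank_aggregate(scores))  # sample 1 also wins on mean rank
print(vote_aggregate(scores))  # sample 1 takes 2 of 3 layer votes
```

The point of rank and vote schemes is robustness: a single layer with an outlier score (sample 0's 0.9) can dominate a mean but contributes only one rank position or one vote.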

Introducing the Noise Detection Rate

To assess influence‑estimation quality without full model retraining, the authors propose the Noise Detection Rate (NDR). According to the paper, NDR outperforms the cancellation‑effect baseline in predicting whether a training sample materially affects model outputs.
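Since this report draws only on the abstract, the paper's precise definition of NDR is not reproduced here. The sketch below shows one plausible retraining-free recipe of that kind: corrupt a known subset of training labels, score every sample with an influence estimator, and report the fraction of corrupted samples recovered among the top-k most influential. All names and numbers are hypothetical:

```python
def noise_detection_rate(scores, noisy_ids, k):
    """scores: {sample_id: influence score}; noisy_ids: ids whose labels
    were deliberately corrupted. Returns the fraction of corrupted samples
    that appear among the k highest-scoring samples."""
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    detected = sum(1 for sid in top_k if sid in noisy_ids)
    return detected / len(noisy_ids)

scores = {0: 0.1, 1: 0.9, 2: 0.2, 3: 0.8, 4: 0.05}
noisy_ids = {1, 3}                                    # samples we corrupted
print(noise_detection_rate(scores, noisy_ids, k=2))   # 1.0 -- both recovered
```

A metric of this shape needs only forward/backward passes of the already-trained model, which is what makes it cheaper than the retrain-and-compare alternative.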

Experimental Findings Across Model Scales

Extensive experiments spanning multiple LLM families and parameter counts reveal that the first (embedding) layers are not universally superior to later (including final) layers for influence estimation. The results suggest a more nuanced view of layer relevance, varying with model size and architecture.

Implications for Model Auditing

If validated, these insights could reshape best practices for dataset auditing, model interpretability, and responsible AI governance by encouraging analysts to focus on middle attention mechanisms and adopt more sophisticated aggregation and evaluation methods.

This report is based on the abstract of the research paper, an open-access preprint; the full text is available via arXiv.
