NeoChainDaily
28.01.2026 • 05:35 Research & Innovation

Adaptive Regularization Mitigates Overfitting in Large-Scale Sparse Embedding Models

Researchers Mang Li and Wei Lyu have introduced an adaptive regularization technique designed to curb the one‑epoch overfitting phenomenon that frequently hampers click‑through‑rate (CTR) and conversion‑rate (CVR) estimation models. The work, initially submitted to arXiv on November 9, 2025 and revised on January 27, 2026, targets the large‑scale sparse categorical feature embeddings commonly used in search, advertising, and recommendation systems. By constraining the norm budget of embedding layers in a data‑driven manner, the method aims to preserve model performance across multiple training epochs while also delivering gains in single‑epoch settings. The authors report that the approach has already been integrated into live production pipelines. The study appears under arXiv identifier 2511.06374 in the Machine Learning category.

Background on Sparse Feature Models

Large‑scale recommendation and advertising platforms rely heavily on embedding tables that translate high‑cardinality categorical variables into dense vectors. Because these tables contain millions of parameters, models can quickly memorize training data, leading to a sharp decline in predictive quality when training proceeds beyond a single epoch—a behavior commonly referred to as one‑epoch overfitting.
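The lookup described above can be sketched in a few lines. This is a generic illustration of sparse-feature embedding, not code from the paper; the table size, dimension, and feature IDs are all hypothetical.

```python
import numpy as np

# Hypothetical miniature embedding table: maps each categorical ID
# (e.g. an ad ID or user ID) to a dense 4-dimensional vector.
# Production tables have millions of rows, hence the memorization risk.
VOCAB_SIZE, EMBED_DIM = 1000, 4
rng = np.random.default_rng(0)
embedding_table = rng.normal(scale=0.01, size=(VOCAB_SIZE, EMBED_DIM))

def embed(feature_ids):
    """Look up dense vectors for a batch of sparse categorical IDs."""
    return embedding_table[feature_ids]

batch = np.array([3, 17, 3, 999])   # raw categorical feature values
vectors = embed(batch)
print(vectors.shape)                # (4, 4): one dense vector per ID
```

Because every occurrence of an ID updates the same row, frequent IDs can grow large norms over repeated epochs, which is the behavior the paper's regularizer targets.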

Theoretical Framework

The authors employ Rademacher complexity analysis to quantify the capacity of sparse embedding models. Their calculations suggest that the norm of embedding vectors directly influences the generalization bound, providing a formal explanation for why unrestricted growth of embedding norms exacerbates overfitting.
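The abstract does not state the paper's exact bound, but the flavor of result that links embedding norms to capacity is standard. For a linear predictor class over inputs of bounded norm, the empirical Rademacher complexity scales with the weight-norm budget $B$:

```latex
\mathcal{F} = \{\, x \mapsto \langle w, x \rangle \;:\; \|w\|_2 \le B \,\},
\qquad
\hat{\mathfrak{R}}_n(\mathcal{F}) \;\le\; \frac{B \,\max_i \|x_i\|_2}{\sqrt{n}} .
```

Under a bound of this shape, letting embedding norms (the analogue of $B$) grow without constraint loosens the generalization guarantee, which is the formal motivation for budgeting them.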

Proposed Adaptive Regularization

Building on the theoretical insight, the paper proposes a regularization scheme that dynamically adjusts the allowable norm budget for each embedding layer based on observed training dynamics. Unlike static L2 penalties, the adaptive method reallocates regularization strength throughout training, ensuring that the overall embedding space remains within a controlled magnitude.
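The mechanism above can be illustrated with a toy sketch. This is an assumption-laden stand-in, not the authors' algorithm: the overfitting signal (train/validation loss gap), the shrink/grow factors, and the max-norm projection are all hypothetical choices made for illustration.

```python
import numpy as np

def project_to_budget(table, budget):
    """Max-norm projection: rescale any embedding row whose L2 norm
    exceeds the budget, leaving smaller rows untouched."""
    norms = np.linalg.norm(table, axis=1, keepdims=True)
    scale = np.minimum(1.0, budget / np.maximum(norms, 1e-12))
    return table * scale

def adapt_budget(budget, train_loss, val_loss, shrink=0.9, grow=1.05):
    """Toy schedule: tighten the budget when the train/validation gap
    widens (an overfitting signal), loosen it slightly otherwise."""
    return budget * (shrink if val_loss - train_loss > 0.01 else grow)

# Usage: one adjustment step on a small random table.
rng = np.random.default_rng(1)
table = rng.normal(size=(5, 8))
budget = 1.0
budget = adapt_budget(budget, train_loss=0.40, val_loss=0.55)  # gap widened
table = project_to_budget(table, budget)
assert np.all(np.linalg.norm(table, axis=1) <= budget + 1e-9)
```

Unlike a static L2 penalty, which applies a fixed pull toward zero, a scheme of this kind redistributes how tightly each layer is constrained as training dynamics evolve.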

Empirical Validation

Experiments on benchmark CTR and CVR datasets demonstrate that the adaptive regularizer prevents the steep performance drop typically seen after the first epoch. In addition, the technique yields modest improvements—ranging from 0.3% to 1.2% in AUC—when models are trained for only one epoch, indicating enhanced sample efficiency.

Industry Deployment

According to the authors, the regularization approach has been deployed in production systems serving real‑time advertising auctions. Early operational metrics reportedly show stabilized click‑through predictions and reduced variance across daily model refreshes, confirming the method’s practical viability.

Future Directions

The study outlines several avenues for further research, including extending the adaptive budget concept to transformer‑based recommendation models and exploring automated hyperparameter tuning for the norm constraints. Continued collaboration between academia and industry is anticipated to refine the approach for broader applicability.

This report is based on the abstract of the research paper, an open-access academic preprint hosted on arXiv; the full text is available via arXiv.
