Open Dataset and Modeling Framework Target Net Conversion Rate in E‑Commerce
Researchers at Alibaba’s e‑commerce technology team have unveiled a new open dataset and a modeling framework aimed at improving net conversion rate (NetCVR) prediction for industrial recommender systems. The dataset, named CASCADE, was derived from user interactions on the Taobao app and released alongside the paper on arXiv on January 26, 2026. The work addresses a limitation of traditional conversion‑rate metrics, which overlook refunds, by defining NetCVR as the probability that a clicked item is purchased and not refunded. By providing both the data and a continuous‑learning solution, the authors seek to improve how user satisfaction and business value are measured for online retailers. The release is intended for researchers and practitioners developing real‑time recommendation algorithms.
Limitations of Traditional CVR Metrics
Conversion rate (CVR) has long been used to allocate traffic in recommender systems, yet it fails to capture the full economic outcome because it ignores subsequent refund behavior. The authors note that feedback arrives with a two-stage delay, first from click to purchase and then from purchase to refund, and that the two delays create opposing effects that render standard CVR modeling approaches ineffective for NetCVR estimation.
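One plausible reading of these opposing effects can be illustrated with a small sketch. It assumes a hypothetical record layout (the class and field names below are illustrative and are not taken from the CASCADE schema) and compares the label visible at an observation time with the final label once all feedback has arrived. A purchase that has not yet happened makes a click look negative too early, while a refund that has not yet happened makes it look positive too early.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClickRecord:
    """Hypothetical record layout; field names are assumptions, not CASCADE's schema."""
    click_ts: float
    purchase_ts: Optional[float] = None  # None if the click never converts
    refund_ts: Optional[float] = None    # None if the purchase is never refunded


def true_net_label(r: ClickRecord) -> int:
    """Final label once all feedback has arrived: purchased and not refunded."""
    return int(r.purchase_ts is not None and r.refund_ts is None)


def observed_net_label(r: ClickRecord, now: float) -> int:
    """Label visible at time `now`, using only events that have already occurred."""
    purchased = r.purchase_ts is not None and r.purchase_ts <= now
    refunded = r.refund_ts is not None and r.refund_ts <= now
    return int(purchased and not refunded)


# Illustrative timestamps only (arbitrary time units).
late_purchase = ClickRecord(click_ts=0.0, purchase_ts=5.0)               # converts after `now`
late_refund = ClickRecord(click_ts=0.0, purchase_ts=1.0, refund_ts=8.0)  # refunded after `now`
now = 3.0

for name, rec in [("late purchase", late_purchase), ("late refund", late_refund)]:
    print(name, "observed:", observed_net_label(rec, now), "final:", true_net_label(rec))
```

In this toy setting the first record is temporarily under-counted and the second is temporarily over-counted, so a correction designed for only one direction of error would not fit both.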
Introducing the CASCADE Dataset
The CASCADE (Cascaded Sequences of Conversion and Delayed Refund) dataset comprises billions of interaction records collected from the Taobao mobile application. It includes timestamps for clicks, conversions, and refunds, enabling researchers to model the full cascaded feedback loop. According to the authors, CASCADE is the first large‑scale publicly available resource specifically designed for continuous NetCVR prediction.
Key Findings from Dataset Analysis
Analysis of CASCADE revealed three actionable insights: (1) NetCVR exhibits pronounced temporal dynamics, necessitating online continuous modeling; (2) a cascaded approach that separately predicts CVR and refund rate outperforms direct NetCVR prediction; and (3) delay time, which correlates with both CVR and refund rate, serves as a valuable feature for improving prediction accuracy.
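The cascaded view in the second insight follows from the chain rule of probability: the chance that a click ends in a kept purchase equals the chance of a purchase times the chance that the purchase is not refunded. The sketch below checks this identity on made-up counts; the function name and the numbers are assumptions for illustration and do not come from the paper or the dataset.

```python
import math


def net_cvr(cvr: float, refund_rate: float) -> float:
    """Combine P(purchase | click) and P(refund | purchase) into NetCVR."""
    if not (0.0 <= cvr <= 1.0 and 0.0 <= refund_rate <= 1.0):
        raise ValueError("both rates must lie in [0, 1]")
    return cvr * (1.0 - refund_rate)


# Made-up counts, for illustration only.
clicks, purchases, refunds = 1000, 50, 10

cvr = purchases / clicks          # P(purchase | click)
refund_rate = refunds / purchases  # P(refund | purchase)

cascaded = net_cvr(cvr, refund_rate)
direct = (purchases - refunds) / clicks  # kept purchases per click

# The two routes agree up to floating-point rounding.
print(math.isclose(cascaded, direct))
```

The identity alone does not explain why the cascaded approach performs better in the authors' analysis; it only shows that the two sub-predictions can be recombined into a NetCVR estimate.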
TESLA: A Cascaded Modeling Approach
Building on these insights, the authors propose TESLA, a continuous NetCVR modeling framework that integrates a CVR‑refund‑rate cascaded architecture, stage‑wise debiasing, and a delay‑time‑aware ranking loss. The design allows the model to be updated in real time as new interaction data arrive, addressing the temporal volatility identified in the dataset.
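To make the idea of a continuously updated cascaded model more concrete, here is a minimal sketch that is not TESLA itself. It assumes two tiny logistic regressions trained one event at a time with plain stochastic gradient descent, one for CVR and one for refund rate, and multiplies their outputs. The class names, method names, and learning rate are hypothetical, and the stage‑wise debiasing and delay‑time‑aware ranking loss described by the authors are deliberately left out.

```python
import math
from typing import List


class OnlineLogistic:
    """Tiny logistic regression updated one example at a time with SGD."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: List[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp the logit to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x: List[float], y: int) -> None:
        # For log loss, the gradient with respect to the logit is (p - y).
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


class CascadedNetCVR:
    """Hypothetical cascade: a CVR model on clicks, a refund model on purchases."""

    def __init__(self, n_features: int) -> None:
        self.cvr = OnlineLogistic(n_features)
        self.refund = OnlineLogistic(n_features)

    def on_click_outcome(self, x: List[float], purchased: bool) -> None:
        self.cvr.update(x, int(purchased))

    def on_purchase_outcome(self, x: List[float], refunded: bool) -> None:
        # Only purchased items reach this stage.
        self.refund.update(x, int(refunded))

    def predict(self, x: List[float]) -> float:
        return self.cvr.predict(x) * (1.0 - self.refund.predict(x))


model = CascadedNetCVR(n_features=1)
before = model.predict([1.0])
for _ in range(50):  # a stream of purchases that are never refunded
    model.on_click_outcome([1.0], purchased=True)
    model.on_purchase_outcome([1.0], refunded=False)
after = model.predict([1.0])
print(before, after)
```

Each stage receives an update only when its own feedback arrives, which is the property that lets the estimate move as new interaction data come in.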
Performance Gains Demonstrated
Extensive experiments on the CASCADE dataset show that TESLA consistently surpasses state‑of‑the‑art baselines, achieving absolute improvements of 12.41 percent in RI‑AUC and 14.94 percent in RI‑PRAUC for NetCVR prediction. These metrics indicate a substantial boost in ranking quality and precision‑recall performance under delayed feedback conditions.
Open Access and Future Directions
The codebase and the full CASCADE dataset have been released on GitHub (https://github.com/alimama-tech/NetCVR), inviting the research community to replicate and extend the findings. The authors suggest that future work could explore richer user‑item interaction features and broader applicability across different e‑commerce platforms.
This report is based on the abstract of a research paper published on arXiv as an open‑access academic preprint. The full text is available via arXiv.