NeoChainDaily
29.01.2026 • 05:35 • Research & Innovation

Machine Unlearning Cuts Direct Data Leakage in Code Generation Models by Over 50%

Researchers have conducted the first large‑scale empirical assessment of machine unlearning as a privacy safeguard for large language models that generate code (LLMs4Code). The study, released on arXiv in February 2025, introduced a benchmark designed to evaluate both the removal of sensitive information and the preservation of code‑generation ability. By applying three unlearning algorithms to three widely used open‑source models—AIXCoder‑7B, CodeLlama‑7B, and CodeQwen‑7B—the authors measured how effectively personal data could be forgotten while maintaining functional performance.

Benchmark Construction

The benchmark comprises two complementary components. A synthetic “forget” set embeds a variety of personal identifiers, such as names, addresses, and API keys, deliberately inserted into training‑like code snippets. A separate “retain” set consists of standard programming tasks intended to gauge whether the models retain their ability to produce correct code after unlearning procedures are applied.
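The abstract does not describe the exact data-generation pipeline, but a forget-set entry of this kind can be pictured as a code snippet with planted secrets plus a record of what must be forgotten. The following Python sketch is purely illustrative; the template, field names, and helper function are assumptions, not the authors' benchmark code.

```python
import json

# Illustrative personal identifiers; a real benchmark would sample many such records.
FAKE_PII = {
    "name": "Alice Example",
    "email": "alice@example.com",
    "api_key": "sk-test-1234567890abcdef",
}

# A "training-like" code snippet with the sensitive values deliberately embedded.
FORGET_TEMPLATE = '''\
# Internal deployment script (synthetic)
OWNER = "{name}"
CONTACT = "{email}"
API_KEY = "{api_key}"

def notify_owner(message: str) -> None:
    send_email(to=CONTACT, subject="Deploy", body=message)
'''

def make_forget_example(pii: dict) -> dict:
    """Build one forget-set record: the snippet plus the secrets to be unlearned."""
    return {
        "code": FORGET_TEMPLATE.format(**pii),
        "secrets": list(pii.values()),  # kept aside so leakage can be probed later
    }

if __name__ == "__main__":
    print(json.dumps(make_forget_example(FAKE_PII), indent=2))
```

The retain set, by contrast, would simply hold ordinary programming tasks with no embedded identifiers, so that functional ability can be re-measured after unlearning.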

Unlearning Algorithms Evaluated

Three representative unlearning strategies were examined: Gradient Ascent (GA), a hybrid of Gradient Ascent with Gradient Descent (GA+GD), and a combination of Gradient Ascent with Kullback‑Leibler regularization (GA+KL). Each method was executed on the three target models, allowing a systematic comparison of effectiveness across algorithmic approaches and model architectures.
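In the unlearning literature these three strategies are typically expressed as fine-tuning objectives. The sketch below shows one common formulation, assuming a Hugging Face-style model interface where calling the model on a batch returns an object with `.loss` and `.logits`; the weighting factor, batch layout, and function name are assumptions rather than the paper's training code.

```python
import torch
import torch.nn.functional as F

def unlearning_loss(model, ref_model, forget_batch, retain_batch,
                    method: str = "ga_kl", alpha: float = 1.0):
    """One possible formulation of the GA, GA+GD and GA+KL objectives.

    forget_batch / retain_batch: dicts with input_ids, attention_mask, labels.
    ref_model: frozen copy of the original model (only needed for GA+KL).
    """
    # Gradient Ascent (GA): negate the language-modeling loss on the forget set,
    # so that ordinary gradient descent *increases* the loss on memorized content.
    forget_out = model(**forget_batch)
    loss = -forget_out.loss

    if method == "ga_gd":
        # GA+GD: additionally keep the standard LM loss low on the retain set.
        retain_out = model(**retain_batch)
        loss = loss + alpha * retain_out.loss

    elif method == "ga_kl":
        # GA+KL: keep the updated model's predictive distribution on the retain
        # set close to the original (frozen) model's distribution via a KL term.
        retain_out = model(**retain_batch)
        with torch.no_grad():
            ref_out = ref_model(**retain_batch)
        kl = F.kl_div(
            F.log_softmax(retain_out.logits, dim=-1),
            F.softmax(ref_out.logits, dim=-1),
            reduction="batchmean",
        )
        loss = loss + alpha * kl

    return loss
```

The three variants differ only in how they protect utility while forgetting: GA alone pushes the model away from the forget data, while the GD and KL terms anchor its behavior on ordinary code.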

Key Findings

Across all experiments, direct memorization‑based leakage declined by more than 50 % on average after unlearning. At the same time, the models retained approximately 91 % of their original code‑generation performance, indicating that privacy gains did not come at a prohibitive cost to utility. These results suggest that machine unlearning can serve as a practical tool for reducing the exposure of sensitive training data in code‑focused language models.
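The abstract does not spell out the leakage metric, but direct, memorization-based leakage is commonly quantified by prompting the model with a prefix of each forget-set snippet and checking whether any planted secret reappears verbatim in the completion. The sketch below illustrates that idea; the function names and prefix length are hypothetical.

```python
def direct_leakage_rate(generate, forget_examples, prefix_chars: int = 200) -> float:
    """Fraction of forget-set records whose secrets reappear verbatim.

    generate: callable mapping a prompt string to a completion string
              (e.g. a thin wrapper around model.generate plus decoding).
    forget_examples: records like those produced by make_forget_example above.
    """
    leaked = 0
    for example in forget_examples:
        prompt = example["code"][:prefix_chars]   # truncate the snippet to form a prompt
        completion = generate(prompt)
        if any(secret in completion for secret in example["secrets"]):
            leaked += 1
    return leaked / max(len(forget_examples), 1)

# Comparing this rate before and after unlearning yields the kind of reduction
# reported in the study (more than 50% on average across models and methods).
```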

Residual Risks

Despite the reduction in direct leaks, post‑unlearning output analysis revealed a consistent shift toward indirect leakage, where models inferred or reconstructed private information without explicit memorization. This phenomenon highlights an underexplored vulnerability that persists even when targeted data have been successfully removed.

Future Directions

The authors conclude that while machine unlearning offers a viable pathway to enhance privacy in LLMs4Code, further research is required to mitigate indirect leakage mechanisms. Proposed avenues include developing hybrid unlearning‑and‑regularization techniques and extending evaluation frameworks to capture subtler forms of data inference.

This report is based on the abstract of the research paper, which is available as an open-access preprint; the full text can be found on arXiv.
