NeoChainDaily
02.02.2026 • 05:15 Research & Innovation

Code Comments Triple Bug-Fixing Accuracy of Large Language Models, Study Finds

Researchers have reported that the inclusion of source-code comments can dramatically enhance the bug‑fixing performance of large language models (LLMs). The study, posted to arXiv on January 23, 2026, examined how comment presence during both training and inference influences automated bug fixing (ABF) outcomes. By comparing multiple model configurations, the authors aimed to determine whether the common practice of stripping comments from code datasets is justified.

Background

Automated Bug Fixing leverages LLMs to transform buggy methods into corrected versions, a task that increasingly supports software‑engineering workflows. Conventional preprocessing pipelines often remove comments to reduce token length and simplify training data, operating under the assumption that comments add little semantic value for bug resolution.
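The comment-stripping step described above varies by pipeline, but a minimal sketch of the idea is easy to show. The sketch below (an illustration, not the paper's actual preprocessing code) removes `#` comments from Python source using the standard-library `tokenize` module, which is safer than regex-based stripping because it never touches `#` characters inside string literals:

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Return `source` with all '#' comments removed.

    String literals (including any that contain '#') are left intact,
    because filtering happens at the token level, not the text level.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    # untokenize uses the surviving tokens' original positions, so the
    # code layout is preserved and comment spans become whitespace.
    return tokenize.untokenize(kept)
```

A pipeline applying this to every method in a corpus would shrink token counts, which is exactly the trade-off the study questions: the tokens saved may carry semantic signal the model needs.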

Methodology

The investigation evaluated two distinct model families across four experimental conditions: training with comments versus without, and inference with comments versus without. To address the scarcity of commented methods in existing datasets, the team employed an auxiliary LLM to generate synthetic comments for previously comment‑free code snippets. This approach ensured a balanced representation of commented and uncommented examples throughout the experiments.
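The 2x2 design described above (comments on/off at training time crossed with comments on/off at inference time) can be enumerated directly. This is a sketch of the experimental grid only; the condition names are illustrative and not taken from the paper:

```python
from itertools import product

# The two factors of the study's 2x2 design: whether comments are
# present in the training data, and whether they are present in the
# inputs at inference time.
TRAIN_SETTINGS = ("with_comments", "without_comments")
INFERENCE_SETTINGS = ("with_comments", "without_comments")


def experimental_conditions() -> list[dict[str, str]]:
    """Enumerate all four training/inference comment configurations."""
    return [
        {"train": train, "inference": infer}
        for train, infer in product(TRAIN_SETTINGS, INFERENCE_SETTINGS)
    ]
```

Each of the two model families would be evaluated under all four conditions, giving eight model/condition cells to compare.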

Key Findings

Results indicated that when comments were present during both training and inference, ABF accuracy increased by up to threefold compared with comment‑free configurations. Moreover, models trained on commented code did not suffer performance penalties when presented with comment‑less inputs during inference, suggesting that retaining comments does not compromise versatility.
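The abstract does not specify the exact accuracy metric behind the threefold figure, but a common choice in ABF evaluation is exact match between the predicted fix and the reference fix. A minimal sketch of that metric, under that assumption:

```python
def fix_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predicted fixes that exactly match the reference fix.

    Whitespace at the ends is ignored; a stricter or looser comparison
    (e.g. AST equality or test-suite passing) could be substituted.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```

Comparing this score across the four training/inference conditions is how a result like "up to threefold higher accuracy with comments" would surface.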

Interpretability Insights

An interpretability analysis highlighted that comments describing implementation details—such as algorithmic steps or rationale—were especially beneficial. These explanatory notes appeared to guide the LLMs toward more precise modifications, reducing the likelihood of incorrect or overly generic fixes.

Implications and Future Directions

The findings challenge the prevailing notion that comments are expendable in LLM training pipelines for software engineering tasks. Practitioners may consider preserving or augmenting comment data when curating training corpora, and dataset curators might prioritize the collection of richly commented code. Further research could explore comment quality thresholds and the impact of multilingual comments on model performance.

Limitations

Because this report is based on abstract-level information, detailed statistical breakdowns and model-architecture specifications were not available. Future work should validate these observations across broader codebases and additional LLM variants to confirm generalizability.

This report is based on the abstract of the research paper, an open-access preprint hosted on arXiv. The full text is available via arXiv.

