Reproducibility Study Confirms Effectiveness of API Call Frequency Malware Detection
Researchers led by Juhani Merilehto submitted a preprint on January 13, 2026 that independently reproduces a previously published malware detection technique that relies on order‑invariant analysis of API call frequencies combined with a Random Forest classifier.
Background and Dataset
The original approach, introduced by Fellicious et al., was evaluated on a publicly available corpus comprising 250,533 training samples and 83,511 test samples. According to the authors, the dataset reflects a broad spectrum of benign and malicious software, enabling a realistic assessment of detection capabilities.
Model Variants and Evaluation
The reproduction examined four configurations—Unigram, Bigram, Trigram, and a combined n‑gram model—using the same feature extraction pipeline and hyper‑parameters reported in the source study. Each variant was trained on the full training set and evaluated on the designated test set.
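To make the four configurations concrete, the following is a minimal sketch of how n-gram frequency features might be derived from a single API call trace. It is not the authors' pipeline: the function names, the sample trace, and the choice to truncate to the first 2,500 calls before counting are all assumptions made for illustration. Only the n-gram orders and the 2,500 cap come from the report.

```python
from collections import Counter
from typing import Sequence

# Cap mentioned in the report; how the study applies it is an assumption here.
MAX_CALLS = 2500


def ngram_counts(calls: Sequence[str], n: int, max_calls: int = MAX_CALLS) -> Counter:
    """Count contiguous n-grams of API calls within the first max_calls calls."""
    window = list(calls[:max_calls])
    return Counter(tuple(window[i:i + n]) for i in range(len(window) - n + 1))


def combined_counts(calls: Sequence[str], orders=(1, 2, 3), max_calls: int = MAX_CALLS) -> Counter:
    """Merge unigram, bigram, and trigram counts into one feature bag."""
    total = Counter()
    for n in orders:
        total.update(ngram_counts(calls, n, max_calls))
    return total


def to_vector(counts: Counter, vocabulary: Sequence[tuple]) -> list:
    """Project counts onto a fixed vocabulary so every sample has the same width."""
    return [counts.get(term, 0) for term in vocabulary]


# Hypothetical trace, for illustration only.
trace = ["NtOpenFile", "NtReadFile", "NtClose", "NtOpenFile", "NtReadFile"]
unigrams = ngram_counts(trace, 1)
bigrams = ngram_counts(trace, 2)
vocabulary = sorted(combined_counts(trace))
features = to_vector(combined_counts(trace), vocabulary)
```

Unigram counts ignore call order entirely, which is why the Unigram variant is the cheapest to compute. Bigrams and trigrams retain some local ordering at the cost of a larger vocabulary. Vectors of this shape could then be passed to any Random Forest implementation.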
Performance Gains
Across all configurations, the reproduced models achieved F1‑scores that surpassed the originally reported figures by 0.99% to 2.57% when the API call length was capped at 2,500. The Unigram model, highlighted as a lightweight option, recorded an F1‑score of 0.8717 compared with the original 0.8631, a modest but measurable gain of 0.0086.
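For readers who want to check the arithmetic, the sketch below computes the standard F1 formula (the harmonic mean of precision and recall) and the gap between the two Unigram scores quoted above. The report does not say whether its percentage range is relative or absolute, so both are computed. The variable and function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unigram F1-scores as quoted in the report.
original_f1 = 0.8631
reproduced_f1 = 0.8717

absolute_gain = reproduced_f1 - original_f1
relative_gain_pct = 100 * absolute_gain / original_f1

print(f"absolute gain: {absolute_gain:.4f}")
print(f"relative gain: {relative_gain_pct:.2f}%")
```

In relative terms the Unigram gain is roughly 1.0%, which sits at the low end of the reported range.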
Reproducibility Findings
Three independent experimental runs employing different random seeds yielded consistent outcomes, with standard deviations reported below 0.5%. The authors interpret this low variance as evidence of high reproducibility and methodological stability.
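A stability check of this kind can be sketched in a few lines. The three scores below are placeholder values invented for illustration, not results from the study. The use of the sample standard deviation and the reading of "0.5%" as 0.005 in F1 units are likewise assumptions.

```python
from statistics import mean, stdev


def summarize_runs(scores, threshold=0.005):
    """Return mean, sample standard deviation, and whether the spread is under threshold."""
    avg = mean(scores)
    spread = stdev(scores)
    return avg, spread, spread < threshold


# Placeholder F1-scores for three seeds; NOT taken from the preprint.
hypothetical_f1 = [0.871, 0.873, 0.870]
avg, spread, stable = summarize_runs(hypothetical_f1)
print(f"mean={avg:.4f} std={spread:.4f} stable={stable}")
```

With only three runs the standard deviation is itself a rough estimate, so a check like this is best read as a sanity test rather than a statistical guarantee.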
Implications for Security Research
By validating the original methodology, the study reinforces the practicality of frequency‑based API call analysis as a viable malware detection strategy. The authors suggest that the approach could be integrated into existing security pipelines without substantial computational overhead.
This report is based on the abstract of the research paper, published on arXiv under an Academic Preprint / Open Access license. The full text is available via arXiv.