Large Language Models Prove Effective in Identifying Threat Indicators

Global: LLMs Evaluated for Proactive Detection of Threat Indicators

On Jan 13, 2026, researchers Aniesh Chawla and Udbhav Prasad released a study on arXiv that systematically evaluates large language models (LLMs) for the proactive identification of indicators of compromise (IOCs) extracted from unstructured web‑based threat‑intelligence sources. The work aims to shift enterprise security from reactive malware detection toward early warning based on publicly available reports.

Methodology

The authors built an automated pipeline that continuously scrapes IOCs from fifteen distinct web‑based threat‑report platforms. Six LLM variants—Gemini, Qwen, and three Llama‑based models—were prompted to classify each extracted element as malicious or benign. The evaluation framework measured precision, specificity, and recall for each model across the full corpus.
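The three metrics named above can be derived from a confusion matrix of model verdicts against ground-truth labels. The following is a minimal sketch, not the authors' evaluation code: the `Metrics` class and `score` function are hypothetical names introduced for illustration, and the convention that `True` means "malicious" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    specificity: float
    recall: float


def score(truth: list[bool], predicted: list[bool]) -> Metrics:
    """Compare ground-truth labels with model verdicts (True = malicious)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)              # malicious, flagged
    fp = sum((not t) and p for t, p in pairs)        # benign, flagged
    tn = sum((not t) and (not p) for t, p in pairs)  # benign, not flagged
    fn = sum(t and (not p) for t, p in pairs)        # malicious, missed

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero on degenerate inputs.
        return num / den if den else 0.0

    return Metrics(
        precision=ratio(tp, tp + fp),    # share of flagged items that are malicious
        specificity=ratio(tn, tn + fp),  # share of benign items correctly passed
        recall=ratio(tp, tp + fn),       # share of malicious items that were flagged
    )


# Toy example: three malicious and two benign indicators, one false alarm.
example = score(
    truth=[True, True, True, False, False],
    predicted=[True, True, True, True, False],
)
print(example)
```

In this toy example the model misses no malicious indicator, so recall is 1.0, while the single false alarm among two benign items lowers specificity to 0.5. This shows how perfect recall and modest specificity can coexist.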

Dataset Composition

The test set comprised 479 webpages containing a total of 2,658 IOCs. These indicators broke down into 711 IPv4 addresses, 502 IPv6 addresses, and 1,445 domain names. All items were manually verified to serve as a reliable ground truth for model assessment.
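A breakdown by indicator type like the one above could be produced with Python's standard `ipaddress` module. This is a sketch under stated assumptions rather than the paper's extraction logic: `refang`, `ioc_type`, and `tally` are hypothetical helpers, the `[.]` defanging convention is assumed because it is common in threat reports, and any string that does not parse as an IP address is treated as a domain name without further validation.

```python
import ipaddress
from collections import Counter


def refang(indicator: str) -> str:
    """Undo the common '[.]' defanging (assumed convention, not from the paper)."""
    return indicator.strip().replace("[.]", ".")


def ioc_type(indicator: str) -> str:
    """Label an indicator as 'ipv4', 'ipv6', or 'domain'."""
    value = refang(indicator)
    try:
        version = ipaddress.ip_address(value).version
    except ValueError:
        # Assumption: anything that is not a valid IP address is a domain name.
        return "domain"
    return "ipv4" if version == 4 else "ipv6"


def tally(indicators: list[str]) -> Counter:
    """Count indicators per type."""
    return Counter(ioc_type(i) for i in indicators)


# Documentation-reserved addresses and domains, used purely as placeholders.
sample = ["192.0.2[.]1", "2001:db8::1", "example[.]com", "198.51.100.7"]
print(tally(sample))
```

Applied to the study's corpus, the three counts would sum to the reported total: 711 + 502 + 1,445 = 2,658.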

Performance Overview

Results varied considerably among the LLMs. Some models struggled to distinguish benign from malicious entries, while others achieved high precision. The study reports that overall precision ranged from 0.642 to 0.958 and specificity from 0.511 to 0.788, indicating that model selection strongly affects detection quality.

Top‑Performing Model

Gemini 1.5 Pro emerged as the leading system, attaining a precision of 0.958 and a specificity of 0.788 for malicious IOC identification. Notably, the model achieved perfect recall (1.0), correctly flagging every genuine threat in the dataset. These figures suggest that Gemini 1.5 Pro can surface all relevant IOCs in this dataset, although a specificity of 0.788 corresponds to a false‑positive rate of 0.212 (1 minus specificity), meaning roughly one in five benign indicators was still flagged as malicious.

Implications for Security Operations

According to the authors, integrating high‑performing LLMs like Gemini 1.5 Pro into security‑operations workflows could enable organizations to ingest threat intelligence faster and act before adversaries exploit vulnerabilities. The proactive approach contrasts with traditional signature‑based detection, which typically reacts only after malware has been observed in the wild.

Limitations and Future Directions

The paper acknowledges that the evaluation was confined to publicly available web reports and that model performance may differ on proprietary or encrypted data sources. The researchers propose extending the pipeline to incorporate real‑time feeds and to explore fine‑tuning techniques that could further improve specificity without sacrificing recall.

This report is based on the abstract of the research paper published on arXiv, licensed under Academic Preprint / Open Access. The full text is available via arXiv.
