Personalized Phishing Detection Leveraging LLMs and Retrieval-Augmented Generation Shows High Accuracy
A new study released on arXiv outlines a personalized email security framework that combines large language models (LLMs) with retrieval‑augmented generation (RAG) to improve phishing detection. The authors, who are not named in this report, propose building user‑specific context from historical legitimate emails and real‑time threat intelligence to guide the LLM’s decisions. The goal is to lower false‑positive rates while maintaining high detection performance.
Challenges with Existing Detectors
According to the abstract, conventional rule‑based and machine‑learning detectors struggle to keep pace with increasingly sophisticated phishing messages and often generate excessive false positives that burden security operations. The authors note that standalone LLM classifiers can worsen this problem by mislabeling legitimate correspondence as malicious.
Framework Design
The proposed system retrieves a compact set of a user’s past legitimate emails and enriches it with domain and URL reputation data from a cyber‑threat intelligence platform. This evidence is then supplied to the LLM as contextual input, enabling the model to tailor its assessment to the individual’s communication patterns. The study evaluates four open‑source LLMs—Llama4‑Scout, DeepSeek‑R1, Mistral‑Saba, and Gemma2—within this architecture.
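The retrieve‑and‑enrich step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the retriever, so a simple bag‑of‑words cosine similarity stands in for it; the `reputation` dictionary is a hypothetical stand‑in for the cyber‑threat intelligence platform lookup; and `Email`, `retrieve_user_context`, `extract_domains`, and `build_prompt` are illustrative names. The sketch only assembles the context text an LLM would receive; no model is called.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Email:
    sender: str   # e.g. "payroll@acme.example"
    subject: str
    body: str


def _tokens(text: str) -> Counter:
    # Lower-cased alphanumeric bag of words (hypothetical stand-in for embeddings).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing keys, so absent tokens contribute nothing.
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_user_context(incoming: Email, history: list[Email], k: int = 3) -> list[Email]:
    """Return the k past legitimate emails most similar to the incoming one."""
    query = _tokens(f"{incoming.subject} {incoming.body}")
    ranked = sorted(
        history,
        key=lambda e: _cosine(query, _tokens(f"{e.subject} {e.body}")),
        reverse=True,
    )
    return ranked[:k]


def extract_domains(email: Email) -> set[str]:
    """Collect the sender domain plus every hostname linked in the body."""
    hosts = {urlparse(u).hostname for u in re.findall(r"https?://[^\s\"'<>]+", email.body)}
    hosts.add(email.sender.rsplit("@", 1)[-1].lower())
    return {h for h in hosts if h}


def build_prompt(incoming: Email, history: list[Email], reputation: dict[str, str], k: int = 3) -> str:
    """Assemble the user-specific context an LLM would receive (hypothetical format)."""
    lines = ["Past legitimate emails received by this user:"]
    for i, past in enumerate(retrieve_user_context(incoming, history, k), 1):
        lines.append(f"{i}. From: {past.sender} | Subject: {past.subject}")
    lines.append("Threat intelligence (domain reputation):")
    for domain in sorted(extract_domains(incoming)):
        # Domains absent from the hypothetical reputation feed are marked "unknown".
        lines.append(f"- {domain}: {reputation.get(domain, 'unknown')}")
    lines += [
        "Email to classify:",
        f"From: {incoming.sender}",
        f"Subject: {incoming.subject}",
        incoming.body,
        "Answer PHISHING or LEGITIMATE, citing the evidence above.",
    ]
    return "\n".join(lines)


# Illustrative data only; not taken from the paper's dataset.
history = [
    Email("payroll@acme.example", "March payslip", "Your payslip is attached."),
    Email("it@acme.example", "VPN maintenance", "The VPN will be down on Saturday."),
]
incoming = Email(
    "support@acme-payroll.example",
    "Payslip problem",
    "Verify your payslip at https://acme-payroll.example/login",
)
prompt = build_prompt(incoming, history, {"acme-payroll.example": "newly registered"}, k=1)
print(prompt)
```

In this toy run the lookalike sender is paired with the user's genuine payroll email and a reputation note on its domain, which is the kind of contrast the framework relies on. A production system would more likely use an embedding model for retrieval and a live reputation service, but the shape of the context (nearest legitimate emails plus per‑domain reputation) stays the same.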
Evaluation and Results
Evaluation on a dataset compiled from public and institutional sources shows strong performance across the models. According to the abstract, Llama4‑Scout reached an F1‑score of 0.9703 and cut false positives by 66.7% when augmented with RAG. The authors report similar gains for the other models, which they take as evidence that the user‑profiling strategy is feasible.
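The two headline figures measure different things, and a few lines of Python make the standard definitions concrete. The F1‑score is the harmonic mean of precision and recall, and a relative false‑positive reduction is (before - after) / before. The counts below are hypothetical, chosen only so the ratio matches the reported 66.7% (any 3:1 drop, such as 30 to 10, yields that figure); they are not the paper's confusion matrix, which the abstract does not report. The function names `f1_score` and `relative_reduction` are illustrative.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; equivalently 2TP / (2TP + FP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_reduction(before: int, after: int) -> float:
    """Fractional drop from `before` to `after` (0.667 means 66.7% fewer)."""
    if before == 0:
        raise ValueError("baseline count must be non-zero")
    return (before - after) / before


# Hypothetical false-positive counts, chosen only to illustrate the arithmetic.
fp_without_rag, fp_with_rag = 30, 10
print(f"{relative_reduction(fp_without_rag, fp_with_rag):.1%}")  # 66.7%
print(f"{f1_score(tp=90, fp=10, fn=5):.4f}")  # 0.9231
```

Because false positives enter F1 only through precision, a large relative cut in false positives can coexist with a modest change in F1 when false positives were already a small share of the predictions.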
Implications for Email Security
These findings suggest that integrating RAG with LLMs can produce high‑precision, low‑friction phishing detection systems that adapt to individual user behavior. By reducing false positives, organizations may experience lower operational overhead and fewer disruptions to legitimate communications.
Future Directions
The authors propose extending the framework to incorporate additional threat‑intelligence feeds and to assess scalability in large‑enterprise environments. Further research may also explore privacy‑preserving techniques for handling user email data.
This report is based on the abstract of the research paper published on arXiv under an Academic Preprint / Open Access license; the full text is available via arXiv.