New Global Honeynet Dataset Offers High-Resolution Insight into Cyberattack Behavior
Researchers have released a high‑resolution honeynet dataset captured over a continuous 72‑hour period from June 9 to 11, 2025, on Microsoft Azure. The collection contains 132,425 individual attack events recorded by three distinct honeypot systems—Cowrie, Dionaea, and SentryPeer—deployed across four geographically dispersed virtual machines.
Dataset Composition and Scope
Each event entry includes enriched metadata such as UTC timestamps, source and destination IP addresses, autonomous system identifiers, organizational mappings, geolocation coordinates, targeted ports, honeypot identifiers, derived temporal features, and standardized protocol classifications. The dataset is intended for standalone analyses of global cyberattack behaviors without requiring additional data aggregation.
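To make the record layout concrete, the fields listed above could be modeled roughly as follows. This is a minimal sketch, not the dataset's actual schema: every field name, the ISO 8601 timestamp format, and the sample values (documentation-range IPs and ASN) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AttackEvent:
    """One honeypot event. Field names are hypothetical, not the dataset's real columns."""
    timestamp_utc: datetime
    src_ip: str
    dst_ip: str
    asn: int
    org: str
    latitude: float
    longitude: float
    dst_port: int
    honeypot: str   # e.g. "cowrie", "dionaea", "sentrypeer"
    protocol: str   # standardized label, e.g. "SIP", "TELNET", "SMB"

    @property
    def hour_of_day(self) -> int:
        """Derived temporal feature: hour of day in UTC (0-23)."""
        return self.timestamp_utc.hour

    @property
    def day_of_week(self) -> int:
        """Derived temporal feature: 0 = Monday ... 6 = Sunday."""
        return self.timestamp_utc.weekday()


def parse_event(row: dict) -> AttackEvent:
    """Build an AttackEvent from a dict of strings, such as a csv.DictReader row."""
    ts = datetime.fromisoformat(row["timestamp_utc"])
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    else:
        ts = ts.astimezone(timezone.utc)
    return AttackEvent(
        timestamp_utc=ts,
        src_ip=row["src_ip"],
        dst_ip=row["dst_ip"],
        asn=int(row["asn"]),
        org=row["org"],
        latitude=float(row["latitude"]),
        longitude=float(row["longitude"]),
        dst_port=int(row["dst_port"]),
        honeypot=row["honeypot"],
        protocol=row["protocol"],
    )


# Made-up sample row using documentation-only IP and ASN ranges.
sample_row = {
    "timestamp_utc": "2025-06-09T07:15:00+00:00",
    "src_ip": "198.51.100.7",
    "dst_ip": "203.0.113.10",
    "asn": "64500",
    "org": "Example Org",
    "latitude": "52.37",
    "longitude": "4.90",
    "dst_port": "5060",
    "honeypot": "sentrypeer",
    "protocol": "SIP",
}
event = parse_event(sample_row)
print(event.protocol, event.hour_of_day, event.day_of_week)
```

Keeping the derived temporal features as properties rather than stored fields means they cannot drift out of sync with the timestamp they are computed from.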
Geographic and Protocol Distribution
The dataset documents activity from 2,438 unique source IPs spanning 95 countries. Although the top 1% of IP addresses account for only 1% of total events, three protocols dominate the traffic: Session Initiation Protocol (SIP), Telnet, and Server Message Block (SMB). SentryPeer captures concentrated SIP floods in North America and Southeast Asia, Cowrie logs Telnet/SSH scans primarily from Western Europe and the United States, and Dionaea records SMB exploits around European nodes.
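Concentration figures of this kind can be recomputed from the per-event source IPs and protocol labels. The sketch below shows one way to do it with the standard library. The function names and the toy input are hypothetical, and rounding the top fraction up to at least one IP is an assumption rather than the authors' stated method.

```python
import math
from collections import Counter


def top_share(src_ips, fraction=0.01):
    """Share of all events generated by the busiest `fraction` of source IPs."""
    counts = Counter(src_ips)
    if not counts:
        return 0.0
    # Assumption: round up so that at least one IP is always included.
    k = max(1, math.ceil(len(counts) * fraction))
    top_events = sum(n for _, n in counts.most_common(k))
    return top_events / sum(counts.values())


def protocol_mix(protocols):
    """Fraction of events per protocol label, most common first."""
    counts = Counter(protocols)
    total = sum(counts.values())
    return {proto: n / total for proto, n in counts.most_common()}


# Toy data, not taken from the dataset.
toy_ips = ["198.51.100.1"] * 90 + ["198.51.100.2"] * 5 + ["198.51.100.3"] * 5
toy_protocols = ["SIP"] * 60 + ["TELNET"] * 30 + ["SMB"] * 10

print(top_share(toy_ips))
print(protocol_mix(toy_protocols))
```

Reporting the per-IP share and the per-protocol mix side by side helps distinguish traffic that is concentrated in a few sources from traffic that is concentrated in a few services.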
Temporal Patterns
Analysis of the timestamps reveals pronounced activity peaks at 07:00 and 23:00 UTC, interspersed with gaps attributed to maintenance windows that expose operational blind spots. These rush‑hour spikes suggest coordinated attack timing across disparate regions.
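A temporal profile like the one described can be approximated by bucketing event timestamps by UTC hour and scanning for long silences between consecutive events. The following is an illustrative sketch only: the function names, the 30-minute gap threshold, and the toy timestamps are assumptions, not values from the paper.

```python
from collections import Counter
from datetime import datetime, timedelta


def events_per_hour_of_day(timestamps):
    """Return a 24-element list of event counts indexed by UTC hour."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def peak_hours(hourly, top_n=2):
    """Return the `top_n` busiest hours in ascending order (ties go to the earlier hour)."""
    ranked = sorted(range(24), key=lambda hour: (-hourly[hour], hour))
    return sorted(ranked[:top_n])


def find_gaps(timestamps, min_gap=timedelta(minutes=30)):
    """Return (start, end) pairs where no event was seen for at least `min_gap`."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]


# Toy timestamps, not taken from the dataset.
day = datetime(2025, 6, 9)
toy_times = [
    day.replace(hour=7, minute=0),
    day.replace(hour=7, minute=5),
    day.replace(hour=7, minute=10),
    day.replace(hour=12, minute=0),
    day.replace(hour=23, minute=0),
    day.replace(hour=23, minute=1),
]

hourly = events_per_hour_of_day(toy_times)
print(peak_hours(hourly))
print(len(find_gaps(toy_times)))
```

A gap found this way only shows that no events were logged. Telling a maintenance window apart from a genuine lull would need extra information, such as sensor uptime records.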
Research Applications
The authors propose the dataset for use in anomaly detection, protocol‑misuse studies, threat‑intelligence generation, and defensive policy design. By providing fine‑grained temporal resolution alongside contextual geolocation and protocol metadata, the resource aims to support reproducible, cloud‑scale investigations into evolving cyber threats.
Limitations and Access
While the dataset offers extensive detail, its collection on a single cloud provider and within a limited time frame may introduce platform‑specific biases. The accompanying analysis code and data access instructions are included to facilitate immediate adoption by the research community.
This report is based on the abstract of a research paper published on arXiv, licensed under Academic Preprint / Open Access. The full text is available via arXiv.