HackerSignal: A Large-Scale Multi-Source Dataset Linking Hacker Community Discourse to the CVE Vulnerability Lifecycle

In Plain Terms

This paper introduces HackerSignal, a large public dataset of 7.45 million documents drawn from hacker forums, exploit databases, vulnerability advisories, and software fix commits, spanning 36 years. All sources are linked through shared CVE identifiers, so researchers can trace a security flaw from early hacker chatter through to its official patch. The authors demonstrate three AI benchmark tasks that the dataset enables and release diagnostics and documentation to support responsible reuse.
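The cross-source linking described above can be illustrated with a minimal Python sketch. It assumes a hypothetical `Document` record (source type, publication date, free text) and recovers CVE IDs by pattern-matching the text, then groups documents per CVE and orders them by date to form a lifecycle timeline. The record schema, the helper names `extract_cves` and `build_timelines`, and the regex-based extraction are illustrative assumptions, not the authors' actual pipeline. HackerSignal may instead rely on structured CVE fields (for example, in advisories) or additional matching rules.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# CVE ID syntax: "CVE-", a 4-digit year, "-", then 4 or more digits.
# Matching is case-insensitive; IDs are normalized to upper case below.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


@dataclass
class Document:
    # Hypothetical record schema (assumption, not the dataset's real format).
    source: str        # e.g. "forum", "exploit_db", "advisory", "commit"
    published: date
    text: str


def extract_cves(text: str) -> set[str]:
    """Return the set of normalized (upper-case) CVE IDs mentioned in text."""
    return {m.upper() for m in CVE_PATTERN.findall(text)}


def build_timelines(docs: list[Document]) -> dict[str, list[Document]]:
    """Group documents by shared CVE ID and sort each group chronologically."""
    groups: dict[str, list[Document]] = defaultdict(list)
    for doc in docs:
        # A document mentioning several CVEs joins several timelines.
        for cve in extract_cves(doc.text):
            groups[cve].append(doc)
    # sorted() is stable, so same-day documents keep their input order.
    return {cve: sorted(g, key=lambda d: d.published) for cve, g in groups.items()}
```

Grouping on a normalized identifier is what allows heterogeneous sources (forum posts, exploits, advisories, commits) to be joined without a shared schema; documents that mention no CVE simply fall out of every timeline.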


Citation

Benjamin M. Ampel & Sagar Samtani (2026). HackerSignal: A Large-Scale Multi-Source Dataset Linking Hacker Community Discourse to the CVE Vulnerability Lifecycle. arXiv preprint arXiv:2605.03158. https://doi.org/10.48550/arXiv.2605.03158
Benjamin M. Ampel
Assistant Professor in Computer Information Systems and Director, Center for CyberAI Research (CCAIR)

My research focuses on AI-enabled Cybersecurity, including Cyber Threat Intelligence, Large Language Models, and Phishing Detection.