Identifying and Categorizing Malicious Content on Paste Sites: A Neural Topic Modeling Approach

Tala Vahedi; Benjamin M. Ampel; Sagar Samtani; Hsinchun Chen

doi:10.1109/ISI53945.2021.9624765

Last updated on Jun 9, 2026

In Plain Terms

Cybercriminals post stolen data, such as credit card numbers, and malware code on public text-sharing sites like Pastebin. This study builds a new machine-learning method that combines a language model (BERT) with topic modeling to automatically sort millions of these posts into categories, helping security teams spot leaked sensitive information and emerging threats faster.

Citation

Tala Vahedi, Benjamin M. Ampel, Sagar Samtani, & Hsinchun Chen (2021). Identifying and Categorizing Malicious Content on Paste Sites: A Neural Topic Modeling Approach. IEEE ISI. https://doi.org/10.1109/ISI53945.2021.9624765

Hacker Forums; Cyber Threat Intelligence; Deep Learning; Transformers / LLMs; Exploit Labeling

Benjamin M. Ampel

Assistant Professor in Computer Information Systems and Director, CyberAI Research and Education Center (CARE)