Automatic Extraction of Protected Health Information from Multilingual Hacker Communities

In Plain Terms

This research addresses the theft and resale of health records as discussed in hacker channels on platforms such as Telegram and Discord. The authors build NERF-PHI, a named-entity-recognition system that first machine-translates more than three million multilingual hacker posts into English and then automatically extracts mentions of victims and medical data. They find that encoder-based language models are highly effective at surfacing this protected health information for investigators.
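
To make the translate-then-extract flow concrete, here is a minimal Python sketch. It is not the authors' NERF-PHI implementation: the paper uses a trained encoder-based language model, whereas this sketch swaps in a few hypothetical regular expressions so that it runs without any model or external library. The names `Entity`, `PATTERNS`, `identity_translate`, `extract_entities`, and `process_posts` are all assumptions introduced for illustration, and the translation step is a placeholder that returns the text unchanged.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Entity:
    label: str   # entity type, e.g. "SSN"
    text: str    # matched surface string
    start: int   # character offset where the match begins
    end: int     # character offset just past the match


# Hypothetical stand-in patterns. A real system would use a trained
# encoder-based NER model here, not regular expressions.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DOB": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}


def identity_translate(post: str) -> str:
    """Placeholder for the machine-translation step; returns the text unchanged."""
    return post


def extract_entities(text: str) -> List[Entity]:
    """Return every pattern match in the text, ordered by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(Entity(label, match.group(), match.start(), match.end()))
    return sorted(found, key=lambda e: e.start)


def process_posts(
    posts: Iterable[str],
    translate: Callable[[str], str] = identity_translate,
) -> List[List[Entity]]:
    """Translate each post to English, then extract entities from the result."""
    return [extract_entities(translate(post)) for post in posts]
```

Passing a different `translate` callable is where a real translation service would plug in, and `extract_entities` is the piece a trained model would replace. Any input used with this sketch should be synthetic.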

Citation

Cade Dacosta, Benjamin M. Ampel, Matthew Hashim, & Hsinchun Chen (2026). Automatic Extraction of Protected Health Information from Multilingual Hacker Communities. HICSS. https://doi.org/10.24251/HICSS.2026.063
Benjamin M. Ampel
Assistant Professor in Computer Information Systems and Director, Center for CyberAI Research (CCAIR)

My research focuses on AI-enabled Cybersecurity, including Cyber Threat Intelligence, Large Language Models, and Phishing Detection.
