Automatic Extraction of Protected Health Information from Multilingual Hacker Communities

Cade Dacosta; Benjamin M. Ampel; Matthew Hashim; Hsinchun Chen

doi:10.24251/HICSS.2026.063


Last updated on Jun 9, 2026


In Plain Terms

This research addresses the trade in stolen health records discussed in hacker channels on encrypted platforms such as Telegram and Discord. The authors machine-translate more than three million multilingual hacker posts into English and then apply NERF-PHI, a named-entity-recognition (NER) system that automatically extracts mentions of victims and their medical data. They find that encoder-based language models are highly effective at surfacing this protected health information (PHI) for investigators.
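The summary does not describe NERF-PHI's internals. As a general illustration only: encoder-based NER models commonly label each token with a BIO tag (B- begins an entity, I- continues it, O is outside any entity), and a post-processing step groups those tags into extracted mentions. The sketch below shows that grouping step, assuming BIO tagging and hypothetical entity labels `VICTIM` and `MEDICAL`; neither the tagging scheme nor the labels are confirmed by the source, and the function name `bio_to_spans` is illustrative.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (label, text) entity spans.

    Assumes a generic BIO scheme; labels such as VICTIM or MEDICAL are
    hypothetical examples, not the paper's actual label set.
    """
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the currently open entity, if any.
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # A new entity begins (or an I- tag does not continue the open one).
            flush()
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-"):
            # Continuation of the open entity with the same label.
            current_tokens.append(token)
        else:
            # "O" tag: close any open entity.
            flush()
            current_label = None
            current_tokens = []

    flush()  # close an entity that runs to the end of the post
    return spans


if __name__ == "__main__":
    tokens = ["Selling", "records", "of", "John", "Doe", ",",
              "diagnosis", ":", "type", "2", "diabetes"]
    tags = ["O", "O", "O", "B-VICTIM", "I-VICTIM", "O",
            "O", "O", "B-MEDICAL", "I-MEDICAL", "I-MEDICAL"]
    print(bio_to_spans(tokens, tags))
    # [('VICTIM', 'John Doe'), ('MEDICAL', 'type 2 diabetes')]
```

Treating an I- tag whose label differs from the open entity as the start of a new span is one common convention for tolerating imperfect model output; other systems discard such tags instead.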


Citation

Cade Dacosta, Benjamin M. Ampel, Matthew Hashim, & Hsinchun Chen (2026). Automatic Extraction of Protected Health Information from Multilingual Hacker Communities. Proceedings of the Hawaii International Conference on System Sciences (HICSS). https://doi.org/10.24251/HICSS.2026.063

Keywords: Protected Health Information; Hacker Communities; Named Entity Recognition; Large Language Models; Healthcare Cybersecurity


Benjamin M. Ampel

Assistant Professor in Computer Information Systems and Director, CyberAI Research and Education Center (CARE)