MedConGTM: Interpretable multi-label clinical code prediction with dual-view graph contrastive topic modeling

ÇELİKTEN, TUĞBA; Onan, Aytuğ

doi:10.1016/j.knosys.2025.114103

ÇELİKTEN T., Onan A.

Knowledge-Based Systems, vol. 327, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 327
Publication Date: 2025
DOI: 10.1016/j.knosys.2025.114103
Journal Name: Knowledge-Based Systems
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, Library and Information Science Abstracts, Library, Information Science & Technology Abstracts (LISTA)
Keywords: Clinical code prediction, Contrastive learning, Multi-label classification, Topic modeling
Affiliated with Manisa Celal Bayar University: Yes

Abstract

Background: Accurate and interpretable clinical code assignment from free-text medical records is a fundamental challenge in healthcare informatics. Traditional machine learning and language model-based methods often lack transparency and struggle with multi-label prediction across complex taxonomies such as ICD, CPT, and LOINC. Existing topic modeling techniques, while interpretable, are rarely optimized for the clinical coding task and fail to leverage the rich semantic structure inherent in medical texts and ontologies.

Methods: We propose MedConGTM, a novel dual-view graph contrastive topic modeling framework tailored for interpretable and multi-label clinical code prediction. MedConGTM constructs two semantic views of each document: a document-token semantic graph and a document-code co-assignment graph. These views are jointly optimized through a novel dual-view contrastive learning objective that maximizes the mutual information between topic distributions inferred from text and task-specific code views. We introduce a code-aware word co-occurrence graph enhanced with medical ontologies and propose a hierarchy-sensitive contrastive loss that incorporates structural relationships between clinical codes. To ensure transparency, we design a topic-to-code attention decoder that links predicted codes to interpretable latent topics and salient textual evidence.

Results: Experiments on MIMIC-III and i2b2 datasets demonstrate that MedConGTM outperforms state-of-the-art baselines in both code prediction accuracy and topic coherence. It also provides interpretable code rationales aligned with clinical semantics.

Conclusions: MedConGTM offers a powerful, interpretable, and clinically grounded solution for automated ICD/CPT/LOINC code assignment, bridging the gap between topic modeling, contrastive learning, and real-world healthcare applications.
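
Illustrative sketch (not from the paper): the abstract describes a contrastive objective that maximizes mutual information between topic distributions inferred from the text view and the code view of each document. One common surrogate for such an objective is the InfoNCE loss, which is a lower bound on mutual information. The abstract does not state that MedConGTM uses this exact formulation, so the code below is only a minimal sketch under that assumption. The function names `cosine` and `info_nce`, the `temperature` value, and the toy topic vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text_topics, code_topics, temperature=0.5):
    """Mean InfoNCE loss, using text-view rows as anchors.

    Row i of text_topics and row i of code_topics are assumed to be the
    topic distributions of the same document (the positive pair); every
    other row of code_topics serves as a negative for anchor i.
    """
    n = len(text_topics)
    total = 0.0
    for i in range(n):
        logits = [
            cosine(text_topics[i], code_topics[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true partner.
        total += log_denom - logits[i]
    return total / n


# Hypothetical topic distributions for three documents (three topics each).
text_view = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
code_view = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]

aligned = info_nce(text_view, code_view)
# Rotating the code view pairs each document with the wrong partner.
shuffled = info_nce(text_view, code_view[1:] + code_view[:1])
```

In this toy setup the loss is lower when each document's two views agree (`aligned`) than when they are mismatched (`shuffled`), which is the behavior a dual-view contrastive objective rewards. A hierarchy-sensitive variant, as mentioned in the abstract, would presumably weight negatives by their distance in the code taxonomy; that detail is not specified here and is not modeled in this sketch.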