MedConGTM: Interpretable multi-label clinical code prediction with dual-view graph contrastive topic modeling


ÇELİKTEN T., Onan A.

Knowledge-Based Systems, vol. 327, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 327
  • Publication Date: 2025
  • DOI: 10.1016/j.knosys.2025.114103
  • Journal Name: Knowledge-Based Systems
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, Library and Information Science Abstracts, Library, Information Science & Technology Abstracts (LISTA)
  • Keywords: Clinical code prediction, Contrastive learning, Multi-label classification, Topic modeling
  • Affiliated with Manisa Celal Bayar Üniversitesi: Yes

Abstract

Background: Accurate and interpretable clinical code assignment from free-text medical records is a fundamental challenge in healthcare informatics. Traditional machine learning and language model-based methods often lack transparency and struggle with multi-label prediction across complex taxonomies such as ICD, CPT, and LOINC. Existing topic modeling techniques, while interpretable, are rarely optimized for the clinical coding task and fail to leverage the rich semantic structure inherent in medical texts and ontologies.

Methods: We propose MedConGTM, a novel dual-view graph contrastive topic modeling framework tailored for interpretable and multi-label clinical code prediction. MedConGTM constructs two semantic views of each document: a document-token semantic graph and a document-code co-assignment graph. These views are jointly optimized through a novel dual-view contrastive learning objective that maximizes the mutual information between topic distributions inferred from text and task-specific code views. We introduce a code-aware word co-occurrence graph enhanced with medical ontologies and propose a hierarchy-sensitive contrastive loss that incorporates structural relationships between clinical codes. To ensure transparency, we design a topic-to-code attention decoder that links predicted codes to interpretable latent topics and salient textual evidence.

Results: Experiments on MIMIC-III and i2b2 datasets demonstrate that MedConGTM outperforms state-of-the-art baselines in both code prediction accuracy and topic coherence. It also provides interpretable code rationales aligned with clinical semantics.

Conclusions: MedConGTM offers a powerful, interpretable, and clinically grounded solution for automated ICD/CPT/LOINC code assignment, bridging the gap between topic modeling, contrastive learning, and real-world healthcare applications.
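
Illustrative sketch (not from the paper): the abstract describes a dual-view contrastive objective that maximizes mutual information between topic distributions inferred from the text view and the code view, but it does not give the loss itself. The snippet below assumes a symmetric InfoNCE loss, a common lower-bound surrogate for mutual information, and may differ from what MedConGTM actually uses. The function names `cosine` and `info_nce`, the temperature value, and the toy topic vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text_topics, code_topics, temperature=0.5):
    """Symmetric InfoNCE loss over paired topic distributions.

    Row i of `text_topics` and row i of `code_topics` are assumed to
    describe the same document (the positive pair); every other row in
    the batch acts as a negative. The temperature is an assumed value.
    """
    n = len(text_topics)
    sims = [[cosine(t, c) / temperature for c in code_topics]
            for t in text_topics]

    def directional(matrix):
        # Mean cross-entropy of picking the matching column for each row.
        total = 0.0
        for i in range(n):
            row = matrix[i]
            peak = max(row)  # subtract the max for numerical stability
            log_denom = peak + math.log(sum(math.exp(s - peak) for s in row))
            total += log_denom - row[i]
        return total / n

    transposed = [list(col) for col in zip(*sims)]
    return 0.5 * (directional(sims) + directional(transposed))


# Toy topic distributions for two documents (hypothetical values).
text_view = [[0.9, 0.1], [0.1, 0.9]]
code_view = [[0.8, 0.2], [0.2, 0.8]]

aligned_loss = info_nce(text_view, code_view)
misaligned_loss = info_nce(text_view, code_view[::-1])
```

In this toy setting the loss is lower when each document's text-view and code-view topic distributions agree than when the pairs are swapped, which is the behavior a contrastive alignment objective is meant to reward. The hierarchy-sensitive variant mentioned in the abstract would additionally weight pairs by their relationship in the code taxonomy; that weighting is not specified here and is not sketched.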