Topic modeling with latent Dirichlet allocation for cancer disease posts


Creative Commons License

Altıntaş V., Albayrak M., Topal K.

Journal of the Faculty of Engineering and Architecture of Gazi University, vol. 36, no. 4, pp. 2183-2196, 2021 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 36, Issue: 4
  • Publication Date: 2021
  • DOI: 10.17341/gazimmfd.734730
  • Journal Name: Journal of the Faculty of Engineering and Architecture of Gazi University
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Art Source, Compendex, TR DİZİN (ULAKBİM)
  • Pages: pp. 2183-2196
  • Keywords: Latent Dirichlet allocation, Natural language processing, Social media, Text mining, Topic modelling
  • Manisa Celal Bayar University Affiliation: Yes

Abstract

Purpose: The aim of this paper is to reveal the main topics discussed in Reddit user comments about cancer.

Theory and Methods: After preprocessing, user comments are grouped into topics using the latent Dirichlet allocation (LDA) method.

Results: The proposed LDA-based approach produced consistent and semantically meaningful topics and clusters from user posts. The resulting topics not only help people interpret the texts in a large collection of posts in a human-readable way, but can also help patients and doctors discover relevant content that might otherwise be overlooked.

Conclusion: The topics obtained with the LDA algorithm cover the diagnosis of cancer, the treatment process, morale and motivation during the illness, the chemotherapy period, and medical support.
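The abstract does not state which preprocessing steps or which LDA implementation the authors used. As an illustration only, the following minimal sketch assumes a simple regex tokenizer with a small hypothetical stopword list, and fits LDA with collapsed Gibbs sampling, which is one standard inference method for the model (the paper may have used a different one, such as variational inference in a library). The helper names `preprocess`, `lda_gibbs` and `top_words`, the priors `alpha` and `beta`, the number of topics, and the example posts are all hypothetical; the posts are invented and are not data from the study.

```python
import random
import re

# Hypothetical stopword list (assumption; the paper's preprocessing is not specified).
STOPWORDS = {"the", "a", "an", "and", "is", "my", "i", "to", "of", "in",
             "for", "with", "was", "it", "on", "after", "this", "but", "me"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and tokens of <= 2 letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(num_topics)]    # topic-word counts
    nk = [0] * num_topics                         # tokens per topic
    z = []                                        # topic assignment per token
    for d, doc in enumerate(docs):                # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed point estimate of each topic's word distribution (phi).
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(num_topics)]
    return vocab, phi, ndk

def top_words(vocab, phi, n=5):
    """Return the n most probable words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]

# Invented example posts (hypothetical, not from the study's Reddit data).
posts = [
    "Just got diagnosed with stage two breast cancer after the biopsy",
    "Biopsy results confirmed the diagnosis, oncologist appointment next week",
    "Chemotherapy round three, nausea and fatigue hit hard this time",
    "Chemo fatigue is brutal but nausea medication helps",
    "Staying positive and motivated thanks to family and friends",
    "Support from this community keeps me motivated every day",
]
docs = [preprocess(p) for p in posts]
vocab, phi, ndk = lda_gibbs(docs, num_topics=3)
for k, words in enumerate(top_words(vocab, phi)):
    print(k, words)
```

With a corpus this small the topics are unstable and depend on the random seed; on a real collection of posts, the number of topics and the Dirichlet priors would typically be tuned, for example by comparing topic coherence scores across settings.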