An improved ant algorithm with LDA-based representation for text document clustering


Onan A., Bulut H., Korukoğlu M. S.

Journal of Information Science, vol. 43, no. 2, pp. 275-292, 2017 (SCI-Expanded, SSCI, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 43, Issue: 2
  • Publication Date: 2017
  • DOI: 10.1177/0165551516638784
  • Journal: Journal of Information Science
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus
  • Pages: pp. 275-292
  • Keywords: Latent Dirichlet allocation, text clustering, text mining
  • Manisa Celal Bayar University Affiliation: Yes

Abstract

Document clustering can be applied to document organisation and browsing, document summarisation and classification. Identifying an appropriate representation for textual documents is extremely important for the performance of clustering and classification algorithms. Textual documents suffer from the high dimensionality and irrelevance of text features. Moreover, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to initial values. To tackle these problems, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, in which two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared with that of conventional clustering algorithms on 25 text benchmarks in terms of F-measure. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.
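The abstract reports clustering quality as F-measure values but does not state which variant is used. The following is a minimal sketch, assuming the class-weighted clustering F-measure that is common in document clustering. For each true class i, it takes the best F(i, j) = 2PR / (P + R) over all clusters j, where P = n_ij / n_j and R = n_ij / n_i. That best score is then weighted by the class share n_i / n. The function name `clustering_f_measure` is hypothetical and is not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Class-weighted clustering F-measure (assumed variant, illustrative only).

    For every true class, find the cluster with the highest F score against
    it, then average those best scores weighted by class size.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    n = len(labels_true)
    class_sizes = Counter(labels_true)      # n_i for each true class
    cluster_sizes = Counter(labels_pred)    # n_j for each predicted cluster
    joint = Counter(zip(labels_true, labels_pred))  # n_ij co-occurrence counts

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue  # F(i, j) is 0 when class and cluster share no documents
            precision = n_ij / n_j
            recall = n_ij / n_i
            f = 2 * precision * recall / (precision + recall)
            best = max(best, f)
        total += (n_i / n) * best  # weight the best match by class share
    return total
```

For example, `clustering_f_measure(["a", "a", "b", "b"], [1, 1, 0, 0])` returns `1.0`, because cluster identifiers only need to group documents consistently and do not have to match class names.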