A K-medoids based clustering scheme with an application to document clustering


Onan A.

2nd International Conference on Computer Science and Engineering, UBMK 2017, Antalya, Türkiye, 5-8 October 2017, pp. 354-359 (Full-Text Paper)

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/ubmk.2017.8093409
  • City of Publication: Antalya
  • Country of Publication: Türkiye
  • Pages: pp. 354-359
  • Keywords: Clustering, PAM, Randomized seeding, Text mining
  • Affiliated with Manisa Celal Bayar University: Yes

Abstract

Clustering is an important unsupervised data analysis technique that divides data objects into clusters based on similarity. It has been studied and applied in many fields, including pattern recognition, data mining, decision science and statistics. Clustering algorithms can be broadly classified into hierarchical and partitional approaches. Partitioning around medoids (PAM) is a partitional clustering algorithm that is less sensitive to outliers but is strongly affected by poor initialization of the medoids. In this paper, we augment PAM with the randomized seeding technique to overcome the problem of poor medoid initialization. The proposed approach (PAM++) is compared with other partitional clustering algorithms, such as K-means and K-means++, on text document clustering benchmarks and evaluated in terms of F-measure. The experimental results indicate that randomized seeding can improve the performance of the PAM algorithm on text document clustering.
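
The abstract does not spell out the seeding procedure, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes that "randomized seeding" means the distance-squared weighted sampling used by K-means++, and it follows the seeding with a plain PAM swap phase over a precomputed distance matrix. The function names (`seed_medoids`, `total_cost`, `pam`) and the toy data are hypothetical, and the sketch assumes `k` does not exceed the number of points.

```python
import random


def seed_medoids(dist, k, rng):
    """Pick k medoid indices by D^2-weighted sampling (assumed, K-means++ style)."""
    n = len(dist)
    medoids = [rng.randrange(n)]  # first medoid is chosen uniformly at random
    while len(medoids) < k:
        # Squared distance from each point to its closest medoid so far.
        weights = [min(dist[i][m] for m in medoids) ** 2 for i in range(n)]
        if sum(weights) == 0:
            # Every point coincides with a medoid; fall back to a uniform pick.
            candidates = [i for i in range(n) if i not in medoids]
            medoids.append(rng.choice(candidates))
        else:
            # Already chosen medoids have weight 0 and are never picked again.
            medoids.append(rng.choices(range(n), weights=weights, k=1)[0])
    return medoids


def total_cost(dist, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k, seed=0):
    """Seed the medoids, then apply improving swaps until none is left."""
    rng = random.Random(seed)
    medoids = seed_medoids(dist, k, rng)
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(dist)):
                if cand in medoids:
                    continue
                # Try replacing the medoid at position pos with a non-medoid.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign each point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]])
              for i in range(len(dist))]
    return medoids, labels, best


# Toy data (hypothetical): two well-separated groups on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels, cost = pam(dist, k=2)
print(sorted(medoids), labels, cost)
```

For document clustering, the distance matrix would be built from document vectors instead, for example cosine distances between TF-IDF vectors; that choice is also an assumption here, since the abstract does not state which representation or distance was used.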