MetricHunter: A software metric dataset generator utilizing SourceMonitor upon public GitHub repositories


ÖZÇEVİK Y., ALTAY O.

SoftwareX, vol. 23, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume Number: 23
  • Publication Date: 2023
  • DOI: 10.1016/j.softx.2023.101499
  • Journal Name: SoftwareX
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: Software quality, Software metrics, Dataset construction, Git version control system
  • Affiliated with Manisa Celal Bayar University: Yes

Abstract

Version control systems are now widely used as a source of software metric datasets. Machine learning is then applied to these datasets to predict various aspects of software, such as quality monitoring and influence analysis. However, constructing a metric dataset is challenging, and its content may affect the success of learning-based models. In this study, we propose a dataset construction tool, MetricHunter, which produces platform- or language-specific datasets that can be used to predict the features of newly created software. The tool is developed in C# and uses a well-known metric gathering tool, SourceMonitor, together with the GitHub REST API for public repositories. Thus, one can construct a suitable dataset from a graphical user interface by simply specifying the programming language or target platform. The outputs of the tool on a set of repositories are validated by inspecting the automatically generated attribute values and comparing them with the measurements of metric gathering tools as well as the GitHub metric values.
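
To illustrate the repository-selection step described in the abstract, here is a minimal sketch of querying the GitHub REST API for public repositories by language. It is not MetricHunter's code (the tool itself is written in C#). The helper names `build_search_url`, `extract_repo_rows`, and `fetch_repositories`, the choice of sorting by stars, and the selected attributes are all assumptions made for illustration. Only the search endpoint and the response field names come from the public GitHub API.

```python
import json
import urllib.parse
import urllib.request

# Public GitHub REST API endpoint for repository search.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(language, per_page=10):
    """Build a repository-search URL filtered by programming language.

    Sorting by stars is an illustrative choice, not taken from the paper.
    """
    query = urllib.parse.urlencode({
        "q": f"language:{language}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    })
    return f"{GITHUB_SEARCH_URL}?{query}"


def extract_repo_rows(search_response):
    """Reduce a search response to a few attributes a dataset row might hold."""
    return [
        {
            "full_name": item["full_name"],
            "clone_url": item["clone_url"],
            "stars": item["stargazers_count"],
            "forks": item["forks_count"],
        }
        for item in search_response.get("items", [])
    ]


def fetch_repositories(language, per_page=10):
    """Query the API without authentication (rate-limited) and return rows."""
    request = urllib.request.Request(
        build_search_url(language, per_page),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "metric-dataset-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return extract_repo_rows(json.load(response))
```

Calling `fetch_repositories("csharp", per_page=5)` would return up to five rows whose `clone_url` values could then be cloned and passed to a metric gathering tool such as SourceMonitor. That second step is omitted here because its invocation details are not given in the abstract.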