Predicting Machine Learning Model Performance from Dataset Characteristics


Efe A. G., ÇELİKTEN T., Ergün A. E., Onan A.

7th International Conference on Intelligent and Fuzzy Systems, INFUS 2025, İstanbul, Türkiye, 29 - 31 July 2025, vol. 1529 LNNS, pp. 146-153 (Full-Text Paper)

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume: 1529 LNNS
  • DOI: 10.1007/978-3-031-97992-7_17
  • City of Publication: İstanbul
  • Country of Publication: Türkiye
  • Pages: pp. 146-153
  • Keywords: Dataset Characteristics, Machine Learning, Model Performance Prediction
  • Affiliated with Manisa Celal Bayar University: Yes

Abstract

Choosing the most suitable machine learning model for a classification dataset is a crucial step in data science, since a poor choice can waste time and resources. This project aims to develop a framework that predicts the performance of tree-based, non-linear, and linear models from dataset properties. Experimental results show that the framework predicts model performance with reasonable accuracy, although that accuracy fluctuates with the characteristics of the dataset. Intrinsic dimensionality and class imbalance significantly influence the stability of the predictions across model types. By giving data scientists a practical tool for identifying the models and methodologies most effective for their specific datasets, the framework reduces the need for trial and error, especially for large databases.
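
The abstract names intrinsic dimensionality and class imbalance as influential dataset properties but does not say how they are measured. As an illustration only, the sketch below computes one plausible version of each with NumPy. The function names, the majority-to-minority count ratio, and the PCA-based estimate with a 95% variance threshold are all assumptions made here, not the authors' definitions.

```python
import numpy as np


def class_imbalance_ratio(y):
    """Ratio of the largest class count to the smallest (1.0 means balanced).

    Assumed definition; the paper may measure imbalance differently.
    """
    _, counts = np.unique(np.asarray(y), return_counts=True)
    return float(counts.max() / counts.min())


def intrinsic_dim_pca(X, variance_threshold=0.95):
    """Number of principal components needed to reach the variance threshold.

    Assumed estimator; the 0.95 threshold is a hypothetical choice.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variances along the principal components.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    total = variances.sum()
    if total == 0.0:
        return 0  # constant data carries no variance
    cumulative = np.cumsum(variances) / total
    # First index whose cumulative share reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, variance_threshold)) + 1
    return min(k, len(variances))


def dataset_meta_features(X, y):
    """Collect a few dataset-level descriptors into one dictionary."""
    X = np.asarray(X, dtype=float)
    return {
        "n_samples": X.shape[0],
        "n_features": X.shape[1],
        "imbalance_ratio": class_imbalance_ratio(y),
        "intrinsic_dim": intrinsic_dim_pca(X),
    }
```

Descriptors like these could serve as inputs to a second-stage model that estimates how well a given classifier family will perform on the dataset. That second stage is not shown here, since the abstract does not describe it.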