IEEE Access, vol. 14, pp. 4756-4791, 2026 (SCI-Expanded, Scopus)
Biomedical datasets are often noisy, imbalanced, and high-dimensional, which makes the choice of neural network training strategy a decisive factor for clinical decision support. While artificial neural networks (ANNs) have been widely applied to medical diagnosis tasks, comparative analyses spanning a broad spectrum of training algorithms remain limited. This study addresses that gap by evaluating twelve distinct ANN training algorithms on sixteen publicly available biomedical datasets. Each method was assessed in terms of training duration, loss metrics, and classification accuracy. During the testing phase, essential metrics such as F-measure, G-mean, accuracy, sensitivity, specificity, and precision were evaluated, and a Friedman test was performed to statistically verify the performance differences among the algorithms. The findings indicate that the choice of training algorithm significantly influences classification performance: Levenberg–Marquardt and Bayesian Regularization consistently achieve high predictive performance on complex datasets, whereas the Variable Learning Rate method demonstrates robust and scalable performance across heterogeneous data conditions. Importantly, no universal “best” algorithm exists; the optimal choice depends on dataset characteristics such as size, dimensionality, and noise distribution. By mapping algorithm strengths and weaknesses across diverse diagnostic scenarios, this work provides actionable guidelines for researchers and practitioners designing reliable and efficient ANN-based decision support systems in healthcare. Compared with baseline gradient descent variants, advanced training algorithms such as Levenberg–Marquardt and Bayesian Regularization achieved up to 25% higher accuracy (e.g., on the Skin_Nonskin dataset) and 20% higher F-measure (e.g., on the Saheart dataset) across multiple biomedical datasets.
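The test-phase metrics named in the abstract follow standard confusion-matrix definitions. As a minimal sketch (the function name and the example counts are illustrative, not taken from the paper), they can be computed as:

```python
import math

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics of the kind used in the test phase."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # harmonic mean of precision and sensitivity
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # geometric mean of the two class-wise rates; robust under class imbalance
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }

# Illustrative counts (not from the paper): 40 TP, 5 FP, 45 TN, 10 FN
m = classification_metrics(tp=40, fp=5, tn=45, fn=10)
print(round(m["accuracy"], 4), round(m["g_mean"], 4))  # → 0.85 0.8485
```

For the statistical comparison, a Friedman test over such per-dataset scores is available in SciPy as `scipy.stats.friedmanchisquare(*per_algorithm_scores)`, where each argument holds one algorithm's scores across the sixteen datasets.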