IEEE Access, 2025 (SCI-Expanded)
In sports, identifying athletes with high potential to excel is pivotal for sports schools. In the literature, this process is called Talent Identification (TID) and is defined as "to know the players participating in the sport with the potential to be perfect." The problem addressed in this paper is the early identification of the sports branch for which an athlete is most talented, before the athlete is assigned to a specific branch. This determination is based on general performance tests and assessments. Existing TID solutions in the literature use AI-driven methods (e.g., machine learning and neural networks). However, they suffer from the following deficiencies: they cannot handle dataset features with complex, non-linear relationships; they do not scale with the number of features; they do not adapt to hierarchical data; they do not generalize; they depend on predefined thresholds or prior assumptions; they do not adapt to other datasets; and they do not tolerate incomplete inputs. To address these deficiencies and resolve the TID challenge, a two-stage TID solution is introduced. In the first stage (TID1), the athletes to be admitted are determined. In the second stage (TID2), the admitted athletes are classified into the branches for which they are talented (football, basketball, volleyball, or athletics). TID1 uses our Shallow Deep Learning (SDL) model to classify the admitted athletes; in this stage, a performance of 98.85% was obtained. In TID2, nine feature selection methods (four RFE-related methods, three SelectKBest-related methods, Lasso, and Boruta) are applied to reduce the number of features.
After feature selection, our novel SCM-DL deep learning classifier is applied and compared with Random Forest, Decision Tree, Extra Tree, and Support Vector classifiers. Unlike the architectures in the literature, SCM-DL is constructed internally with parallel layers and includes a combinatorial layer that goes beyond a combination of existing techniques. SCM-DL integrated with the RFE_DTC feature selection method achieved the highest performance with six features, yielding an accuracy of 97.40% and a Matthews Correlation Coefficient (MCC) of 96.6%. This result guides coaches by indicating which features to focus on in talent identification.
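Because the abstract reports both accuracy and the Matthews Correlation Coefficient for a four-class problem, the following minimal sketch shows how the two metrics can be computed from a confusion matrix using the standard multiclass generalization of MCC. The function names and the confusion matrix are hypothetical and purely illustrative; they are not the paper's code or data.

```python
from math import sqrt


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def multiclass_mcc(confusion):
    """Multiclass MCC; confusion[i][j] counts true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    # Row sums: how often each class truly occurs.
    true_counts = [sum(row) for row in confusion]
    # Column sums: how often each class is predicted.
    pred_counts = [sum(confusion[i][k] for i in range(n)) for k in range(n)]
    numerator = correct * total - sum(
        t * p for t, p in zip(true_counts, pred_counts)
    )
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    # Convention: return 0.0 when the coefficient is undefined
    # (for example, when every sample is predicted as one class).
    return numerator / denominator if denominator else 0.0


# Hypothetical 4-class confusion matrix (rows: true branch, columns: predicted).
# Illustrative values only, not results from the paper.
example = [
    [24, 1, 0, 0],  # football
    [0, 23, 1, 1],  # basketball
    [1, 0, 24, 0],  # volleyball
    [0, 0, 0, 25],  # athletics
]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(example):.4f}")
    print(f"MCC      = {multiclass_mcc(example):.4f}")
```

MCC is usually somewhat lower than accuracy on the same predictions because it also accounts for how errors are distributed across classes, which is consistent with the two figures reported above.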