Modeling of Genes Associated with Milk Yield in Some Dairy Breeds using Machine Learning


Yeğenoğlu E. D., Kaymaz Y., Can T. H., Gevrekçi Y., Bilbay E. M.

THEORETICAL AND APPLIED APPROACHES IN AGRICULTURE, FOREST AND AQUATIC SCIENCES, Prof. Dr. Gökhan ŞEN, Editor, Duvar, İzmir, pp. 10-31, 2025

  • Publication Type: Book Chapter / Professional Book
  • Publication Date: 2025
  • Publisher: Duvar
  • City of Publication: İzmir
  • Pages: pp. 10-31
  • Editors: Prof. Dr. Gökhan ŞEN (Editor)
  • Affiliated with Manisa Celal Bayar Üniversitesi: Yes

Abstract

Prolactin (PRL) is a multifunctional hormone with crucial roles in lactation, mammary gland development, metabolic regulation, and reproductive performance in dairy species. Single nucleotide polymorphisms (SNPs) within PRL can serve as genetic markers for selection in breeding programs, yet high-dimensional genotype–phenotype relationships remain challenging for conventional models.

This study aims to model the relationships between PRL-gene SNP/genotype frequencies and 305-day lactation milk yield (LMY) in dairy cattle, sheep, and goat breeds using artificial neural networks (ANNs), complemented by phylogenetic analyses to reveal evolutionary patterns among PRL genes.

SNP data for PRL regions were obtained from the Animal-SNPAtlas database. Breed-LMY records were sourced from the literature and merged with genotype calls encoded numerically (0–3). Two ANN architectures were designed per species: (1) genotype + LMY inputs; (2) genotype + breed + LMY inputs. Models were trained in Keras with the Adam optimizer for 200 epochs and evaluated by mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination (R²) under 5-fold cross-validation. Phylogenetic analyses used maximum-likelihood methods in MEGA7; conserved motifs were identified with the MEME Suite and annotated using InterProScan.
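The evaluation protocol described above (5-fold cross-validation scored by MSE, MAE, and R²) can be sketched roughly as follows. This is a minimal illustration and not the chapter's code: the Keras ANN is replaced by a placeholder that predicts the training-fold mean, the function names (`kfold_indices`, `cross_validate`, `mean_baseline`) are hypothetical, and the yield values are made-up numbers used only to make the example run.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def kfold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(y, fit_predict, k=5, seed=0):
    """Score fit_predict(train_idx, test_idx) on each held-out fold."""
    folds = kfold_indices(len(y), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Training set = every fold except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        y_pred = fit_predict(train_idx, test_idx)
        y_true = [y[j] for j in test_idx]
        scores.append({"mse": mse(y_true, y_pred),
                       "mae": mae(y_true, y_pred),
                       "r2": r2(y_true, y_pred)})
    return scores

def mean_baseline(y):
    """Placeholder model (assumption): predict the training-fold mean.
    In the study this slot would be filled by the trained ANN."""
    def fit_predict(train_idx, test_idx):
        m = sum(y[j] for j in train_idx) / len(train_idx)
        return [m] * len(test_idx)
    return fit_predict

# Hypothetical 305-day yields (kg), invented for illustration only.
lmy = [5200.0, 6100.0, 4800.0, 7300.0, 6900.0,
       5500.0, 8000.0, 6400.0, 5900.0, 7100.0]
scores = cross_validate(lmy, mean_baseline(lmy), k=5)
for fold, s in enumerate(scores, start=1):
    print(fold, s)
```

A mean-only placeholder cannot score above zero on R² for a held-out fold, which makes it a convenient floor against which a trained network's fold-wise scores can be compared.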

Including breed information improved ANN performance in sheep (R² from 0.27 to 0.55) and cattle (R² from -0.98 to -0.61) but had a minimal effect in goats. Phylogenetic trees revealed four to six major PRL paralog clades per species; motif analyses uncovered five conserved domains corresponding to functional regions of the hormone. Genotype-frequency spectra highlighted loci with high heterozygosity as candidate markers.
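The negative R² values reported for cattle are possible because R² computed on held-out data has no lower bound of zero. Assuming the usual definition (the abstract does not state which formula was used), with observed yields y_i, predictions ŷ_i, and observed mean ȳ:

\[
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\]

R² falls below zero whenever the residual sum of squares in the numerator exceeds the total sum of squares in the denominator, that is, when the model predicts worse than a constant equal to the mean observed yield. Under this reading, the change in cattle from -0.98 to -0.61 is a reduction in error, but both cattle models still underperform that constant-mean prediction.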

ANNs effectively captured complex genotype–phenotype relationships in small ruminants, suggesting utility in genomic selection. Future work should integrate whole-genome SNP panels and larger datasets to enhance predictive accuracy and practical breeding applications.

Keywords: Prolactin, Artificial Neural Network, Lactation Milk Yield, SNP, Dairy Breeds, Phylogeny