ChatGPT-4o′s performance on pediatric Vesicoureteral reflux

Akyol Onder, Esra; Ensari, Esra; ERTAN, PELİN

doi:10.1016/j.jpurol.2024.12.002

Akyol Onder E. N., Ensari E., ERTAN P.

Journal of Pediatric Urology, vol.21, no.2, pp.504-509, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 21 Issue: 2
Publication Date: 2025
DOI: 10.1016/j.jpurol.2024.12.002
Journal Name: Journal of Pediatric Urology
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, MEDLINE
Pages: pp.504-509
Keywords: Artificial intelligence, ChatGPT, Vesicoureteral reflux
Affiliated with Manisa Celal Bayar University: Yes

Abstract

Introduction: Vesicoureteral reflux (VUR) is a common congenital or acquired urinary disorder in children. Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence-driven platform that offers medical information. This study aims to assess the reliability and readability of ChatGPT-4o's answers about pediatric VUR for a general, non-medical audience.

Materials and methods: Twenty of the most frequently asked English-language questions about VUR in children were used to evaluate ChatGPT-4o's responses. Two independent reviewers rated reliability and quality using a modified version of the DISCERN tool (mDISCERN) and the Global Quality Scale (GQS). Readability was assessed with the Flesch Reading Ease (FRE) score, Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG).

Results: Median mDISCERN and GQS scores were 4 (4–5) and 5 (3–5), respectively. According to the mDISCERN score, 55% of ChatGPT's responses had moderate reliability and 45% had good reliability; according to the GQS, 95% were of high quality. The mean ± standard deviation scores for FRE, FKGL, SMOG, GFI, and CLI were 26 ± 12, 15 ± 2.5, 16.3 ± 2, 18.8 ± 2.9, and 15.3 ± 2.2, respectively, indicating a high level of reading difficulty.

Discussion: Although ChatGPT-4o offers accurate, high-quality information about pediatric VUR, the content is difficult for a general audience to understand.

Conclusion: ChatGPT provides high-quality, accessible information about VUR. However, improving readability should be a priority to make this information more user-friendly for a broader audience.
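
For readers unfamiliar with the readability measures named above, the following is a minimal sketch of the two Flesch-based scores using their standard published formulas. It is not the authors' tooling: the abstract does not say which software was used, and the function names and the example counts below are illustrative assumptions. The functions take word, sentence, and syllable counts as inputs, because automatic syllable counting is heuristic and varies between tools.

```python
def _check_counts(words: int, sentences: int, syllables: int) -> None:
    """Reject counts that would make the ratios undefined."""
    if words <= 0 or sentences <= 0 or syllables <= 0:
        raise ValueError("words, sentences, and syllables must all be positive")


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    _check_counts(words, sentences, syllables)
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed."""
    _check_counts(words, sentences, syllables)
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # Hypothetical counts for one response; not data from the study.
    words, sentences, syllables = 100, 5, 150
    print(f"FRE:  {flesch_reading_ease(words, sentences, syllables):.1f}")
    print(f"FKGL: {flesch_kincaid_grade(words, sentences, syllables):.1f}")
```

Both scores depend only on average sentence length and average syllables per word, so longer sentences and longer words lower the FRE and raise the FKGL. A low mean FRE together with a mean FKGL of about 15, as reported in the results, is consistent with text written at roughly a college reading level.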