Can ChatGPT-5 educate the public about vasectomy?: a Google Trends-based expert panel assessment



ALBAZ A. C., ERBATU O., YİĞİT O., ÜÇER O., TEMELTAŞ G., MÜEZZİNOĞLU T.

FRONTIERS IN DIGITAL HEALTH, vol. 8, 2026 (ESCI, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 8
  • Publication Date: 2026
  • DOI: 10.3389/fdgth.2026.1726517
  • Journal Name: FRONTIERS IN DIGITAL HEALTH
  • Indexed in: Emerging Sources Citation Index (ESCI), Scopus, EMBASE, Directory of Open Access Journals
  • Affiliated with Manisa Celal Bayar Üniversitesi: Yes

Abstract

Background: ChatGPT-5, the latest multimodal large language model (LLM), has attracted remarkable public attention for its ability to provide real-time, context-aware health information. However, its effectiveness in addressing sensitive urological topics such as vasectomy has not been systematically evaluated.

Objective: This study aimed to evaluate the accuracy, completeness, and public suitability of ChatGPT-5's responses to frequently asked questions about vasectomy, derived from Google Trends data reflecting real-world public interest.

Methods: A total of eight experts (four urologists, two public health specialists, one obstetrician-gynecologist, and one fertility nurse) independently assessed ChatGPT-5's responses to ten high-frequency vasectomy-related questions. Each response was rated on six 5-point Likert-scale criteria: medical accuracy, completeness, clarity, tone, public usefulness, and recommendability. Descriptive statistics, Kruskal-Wallis tests, and two-way random-effects intraclass correlation coefficients (ICC, 95% CI) were used for statistical analysis.

Results: Mean ratings across the evaluation domains ranged from 3.75 to 4.04. Clarity of language and tone appropriateness received the highest scores, whereas medical accuracy and comprehensiveness showed greater dispersion. No statistically significant differences were observed among expert subgroups (p > 0.05). Inter-rater reliability was very low (ICC = -0.01), indicating substantial variability across expert evaluations.

Conclusions: In this exploratory assessment, ChatGPT-5's responses to vasectomy-related public questions were frequently perceived as clear and appropriately framed for informational use. However, variability across expert ratings and the absence of layperson validation underscore the need for cautious interpretation. Large language model outputs may serve as supportive educational resources when accompanied by expert oversight and audience-specific adaptation.
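
For readers who want to run a comparable analysis, the sketch below shows how the statistics named in Methods (descriptive summaries per domain, a Kruskal-Wallis test across expert specialty subgroups, and a two-way random-effects ICC with 95% CI) could be computed in Python with pandas, SciPy, and Pingouin. The file name, column names (rater, specialty, question, domain, score), and the aggregation choices are illustrative assumptions, not the authors' actual data or pipeline.

```python
# Minimal sketch of the rating analysis described in Methods.
# Data layout and column names are hypothetical assumptions.
import pandas as pd
from scipy.stats import kruskal
import pingouin as pg

# Long-format ratings: one row per (rater, question, domain) score on a 1-5 Likert scale.
ratings = pd.read_csv("vasectomy_chatgpt_ratings.csv")  # hypothetical file

# Descriptive statistics (mean, SD) for each evaluation domain.
print(ratings.groupby("domain")["score"].agg(["mean", "std"]))

# Kruskal-Wallis test: do scores differ across expert specialty subgroups?
groups = [g["score"].values for _, g in ratings.groupby("specialty")]
h_stat, p_value = kruskal(*groups)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.3f}")

# Two-way random-effects ICC (ICC2) with 95% CI: questions as targets,
# experts as raters, averaging each expert's six domain scores per question.
per_question = ratings.groupby(["question", "rater"], as_index=False)["score"].mean()
icc = pg.intraclass_corr(data=per_question, targets="question",
                         raters="rater", ratings="score")
print(icc[icc["Type"] == "ICC2"][["ICC", "CI95%"]])
```

Averaging domain scores before the ICC is only one plausible choice; computing a separate ICC per domain, or using the averaged-raters form (ICC2k), would be equally reasonable depending on how agreement is defined.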