Benchmarking Large Language Models for Biomedical Literature Summarization: Abstractive vs. Extractive Paradigms

ÇELİKTEN, TUĞBA; Onan, Aytug

doi:10.1109/access.2025.3604351

ÇELİKTEN T., Onan A.

IEEE Access, vol. 13, pp. 152682-152715, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume Number: 13
Publication Date: 2025
DOI Number: 10.1109/access.2025.3604351
Journal Name: IEEE Access
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Pages: pp. 152682-152715
Keywords: abstractive summarization, extractive summarization, large language models, medical text summarization
Affiliated with Manisa Celal Bayar Üniversitesi: Yes

Abstract

Medical text summarization has become an important tool for doctors and other healthcare professionals because it speeds up access to information. In this context, abstractive and extractive summarization are the main techniques used to make texts concise without distorting their meaning. Abstractive summarization restructures the text to create more meaningful and contextually consistent summaries, whereas extractive summarization builds the summary by quoting directly from the source text. Using a dataset built from scientific articles in different biomedical fields, the study analyzed how accurately and consistently the models summarize medical texts, as measured by metrics such as SBERT, BLEU, and ROUGE. In the evaluations, abstractive summarization methods generally provide higher semantic consistency, performing better on metrics such as SBERT and preserving semantic integrity more fully, whereas extractive summarization offers higher word-matching accuracy, achieving higher scores on metrics such as BLEU and ROUGE. Large models were also found to be generally more successful at extractive summarization, while medium-sized models show balanced performance across both methods. As the use of LLMs in the medical field increases, it will become possible to produce more accurate, faster, and more meaningful summaries. In addition, the integrated use of summarization methods can make clinical decision-support systems more efficient. Future studies can focus on making LLMs more effective through broader datasets and better contextual suitability.
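
The contrast the abstract draws between word-matching metrics and semantic metrics can be illustrated with a small sketch. The code below is not the paper's evaluation code: it is a simplified unigram-overlap F1 in the spirit of ROUGE-1 (no stemming, no stopword handling), and the function name `rouge1_f1` and the toy sentences are hypothetical. It shows why a summary that copies source wording tends to score higher on lexical-overlap metrics than a paraphrase with similar meaning. A semantic metric such as SBERT similarity would need an embedding model and is not shown here.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1: a simplified stand-in for ROUGE-1 (assumption)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy texts, not taken from the paper's dataset.
reference = "the drug reduced blood pressure in patients with hypertension"
extractive = "the drug reduced blood pressure in patients"
abstractive = "hypertensive individuals saw lower readings after treatment"

extractive_score = rouge1_f1(extractive, reference)
abstractive_score = rouge1_f1(abstractive, reference)

print(f"extractive ROUGE-1 F1:  {extractive_score:.3f}")
print(f"abstractive ROUGE-1 F1: {abstractive_score:.3f}")
```

In this toy case the paraphrase shares no words with the reference, so its overlap score is zero even though it conveys a similar meaning. That gap is the kind of effect embedding-based metrics are meant to capture.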