IEEE ACCESS, pp. 74313-74334, 2025 (SCI-Expanded, Scopus)
Text similarity is a crucial area of study that evaluates how alike texts are both semantically and syntactically. As data volumes grow, understanding the similarities and relationships between texts becomes essential, particularly in natural language processing (NLP) tasks such as text generation, summarization, and classification. This study examines the similarities between human-written scientific abstracts, AI-paraphrased abstracts, and AI-generated abstracts. Several methods, including cosine similarity, Word2Vec, FastText, and BERT, were evaluated using mean, median, and standard deviation metrics. Among these, Word2Vec and FastText achieved the highest mean similarity score (0.930), while BERT performed best in the 'Human-Paraphrased' category, with the highest median (0.841) and the lowest standard deviation (0.019), indicating consistent results across datasets. The study also investigates the implications of these similarities for text analysis and ethical standards, comparing techniques for measuring text similarity and analyzing their effectiveness. The findings offer valuable insights into the application areas of text similarity analysis.
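To illustrate the kind of pairwise comparison the abstract describes, the sketch below computes cosine similarity between two short texts. This is a minimal illustration using raw bag-of-words counts, not the paper's pipeline: the study embeds abstracts with models such as Word2Vec, FastText, or BERT before comparing them, and the example sentences here are hypothetical.

```python
# Minimal sketch: cosine similarity over bag-of-words vectors.
# The paper's actual pipeline would replace raw word counts with
# Word2Vec, FastText, or BERT embeddings of each abstract.
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of the bag-of-words vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical human-written vs. AI-paraphrased sentence pair.
human = "text similarity evaluates how similar texts are"
paraphrased = "text similarity measures how alike two texts are"
print(round(cosine_similarity(human, paraphrased), 3))  # → 0.668
```

A score of 1.0 indicates identical word distributions and 0.0 indicates no shared vocabulary; embedding-based methods such as BERT additionally capture semantic overlap between texts that share few surface words.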