Toward Reliable Annotation in Low-Resource NLP: A Mixture of Agents Framework and Multi-LLM Benchmarking

Onan, Aytug; Nasution, Arbi; ÇELİKTEN, TUĞBA

doi:10.1109/access.2025.3643829

Toward Reliable Annotation in Low-Resource NLP: A Mixture of Agents Framework and Multi-LLM Benchmarking

Onan A., Nasution A. H., ÇELİKTEN T.

IEEE Access, cilt.13, ss.211620-211644, 2025 (SCI-Expanded, Scopus)

Yayın Türü: Makale / Tam Makale
Cilt numarası: 13
Basım Tarihi: 2025
Doi Numarası: 10.1109/access.2025.3643829
Dergi Adı: IEEE Access
Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Sayfa Sayıları: ss.211620-211644
Anahtar Kelimeler: Annotation quality, large language models, low-resource languages, mixture of agents, multilingual natural language processing, natural language understanding, text classification
Manisa Celal Bayar Üniversitesi Adresli: Evet

Özet

This paper introduces the Mixture-of-Agents (MoA) framework, a structured approach for improving the reliability of large language model (LLM)-based text annotation in low-resource NLP contexts. MoA employs coordinated agent interactions to enhance agreement, interpretability, and robustness without manual supervision. Evaluations on Turkish classification benchmarks demonstrate that MoA achieves up to 10-point improvements in macro-F1 over single-model baselines and significantly increases inter-agent consistency. Additionally, three novel reliability metrics—Conflict Rate (CR), Ambiguity Resolution Success Rate (ARSR), and Refinement Correction Rate (RCR)—are proposed to quantify annotation stability and correction dynamics. The results indicate that multi-agent coordination can substantially improve label quality, offering a scalable pathway toward trustworthy annotation in low-resource and cross-domain applications. The framework is language-agnostic and adaptable to other low-resource contexts beyond Turkish, including morphologically rich or typologically diverse languages such as Indonesian, Urdu, and Swahili. These findings highlight the scalability of MoA as a generalizable solution for multilingual and cross-domain annotation.