Satire Detection in Turkish News Articles: A Machine Learning Approach


Toçoğlu M. A., Onan A.

5th International Conference on Big Data Innovations and Applications, Innovate-Data 2019, İstanbul, Türkiye, 26 - 28 August 2019, vol.1054, pp.107-117, (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • Volume: 1054
  • DOI: 10.1007/978-3-030-27355-2_8
  • City of Publication: İstanbul
  • Country of Publication: Türkiye
  • Pages: pp.107-117
  • Keywords: Fake news, Machine learning, Satire identification
  • Affiliated with Manisa Celal Bayar Üniversitesi: Yes

Abstract

With the advances in information and communication technologies, an immense amount of information is shared on social media and microblogging platforms. Much of this online content contains elements of figurative language, such as irony, sarcasm, and satire. The automatic identification of figurative language is a challenging task in natural language processing, in which linguistic devices such as metaphor, analogy, ambiguity, irony, sarcasm, and satire are used to express more complex meanings. The predictive performance of sentiment classification schemes may degrade if figurative language within the text is not properly addressed. Satire is a form of figurative communication in which ideas or opinions about a person, event, or issue are expressed humorously in order to criticize that entity. Satirical news can be deceptive and harmful. In this paper, we present a machine-learning-based approach to satire detection in Turkish news articles. In the presented scheme, we used three kinds of features to model lexical information, namely unigrams, bigrams, and trigrams. In addition, term-frequency, term-presence, and TF-IDF weighting schemes were considered. In the classification phase, the Naïve Bayes, support vector machine, logistic regression, and C4.5 algorithms were examined.
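
The feature representations named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the toy English sentences, whitespace tokenization, and the particular TF-IDF variant (raw count times log(N / df)) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list, each joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(tokens, n):
    """Term frequency: raw count of each n-gram in one document."""
    return Counter(ngrams(tokens, n))


def term_presence(tokens, n):
    """Term presence: 1 for every n-gram that occurs at least once."""
    return {gram: 1 for gram in set(ngrams(tokens, n))}


def tf_idf(docs, n):
    """TF-IDF per document, assuming idf = log(N / document frequency)."""
    counts = [term_frequency(doc, n) for doc in docs]
    doc_freq = Counter(gram for c in counts for gram in c)
    total = len(docs)
    return [
        {gram: tf * math.log(total / doc_freq[gram]) for gram, tf in c.items()}
        for c in counts
    ]


# Toy, made-up corpus (hypothetical; not taken from the paper's dataset).
docs = [
    "the minister opened the bridge".split(),
    "the minister opened a jar".split(),
]

unigram_tf = term_frequency(docs[0], 1)   # counts of single words
bigram_tp = term_presence(docs[0], 2)     # binary bigram indicators
trigrams = ngrams(docs[0], 3)             # word trigrams
weights = tf_idf(docs, 1)                 # unigram TF-IDF per document
```

Under this weighting, a word that appears in every document (such as "the" in the toy corpus) receives a TF-IDF weight of zero, while a word unique to one document keeps a positive weight. The resulting vectors would then be passed to a classifier such as those listed in the abstract; that step is omitted here.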