Lexicon–Sentiment-Based Model for Detecting Fake News

Authors

  • Abdulkadir Shehu Bichi, Department of Computer Science, Bayero University Kano, Nigeria, https://orcid.org/0009-0009-5552-3099
  • Ibrahim Said Ahmad, Department of Computer Science, Bayero University Kano, Nigeria, and Institute of Experiential AI, Northeastern University, USA
  • Amina Imam Abubakar, Department of Computer Science, University of Abuja, Nigeria
  • Fa’iz Ibrahim Jibiya, Department of Computer Science, Federal Polytechnic Bauchi, Nigeria, https://orcid.org/0009-0003-3137-9818
  • Aisha Mustapha Ahmad, Department of Computer Science, Bayero University Kano, Nigeria
  • Nur Bala Rabiu, Department of Computer Science, Bayero University Kano, Nigeria

DOI:

https://doi.org/10.47852/bonviewAIA52023972

Keywords:

sentiment feature, lexical feature, unigram

Abstract

The spread of misinformation poses a challenge to social media platforms and calls for robust detection systems. Most existing approaches to detecting misinformation do not combine multiple feature types and are therefore tied to a specific topic domain. This study presents a new model, the Lexicon–Sentiment-Based Model (LSBM), which combines lexical and sentiment features with unigrams to improve the detection of fake news from any domain. We applied the model to three heterogeneous datasets, Fake and Authentic News Articles (44K samples), Combined Corpus Dataset1 (80K samples), and the merged FA-KES and CoAID datasets (70K samples), which together cover politics, health, economics, and entertainment. The proposed approach applies feature selection to identify discriminative linguistic and emotional constructs, such as punctuation, sentiment value, and Term Frequency–Inverse Document Frequency (TF-IDF) weighted unigrams. These features were then evaluated with six classifiers: logistic regression, decision tree, random forest (RF), support vector machine (SVM), K-nearest neighbors, and Naive Bayes. The results show that adding lexicon–sentiment features to unigrams substantially improved detection accuracy: SVM achieved 97% accuracy on the Combined Corpus Datasets, while RF achieved 88% accuracy on cross-domain data. Regarding cross-domain robustness, LSBM overcomes dataset constraints by exploiting domain-neutral features such as emotional tone and lexical diversity. Regarding feature fusion, the combination of sentiment, lexical, and unigram features outperforms single-feature approaches, increasing accuracy by more than 6% over models that use only unigrams. Regarding interpretability, feature importance analysis identifies punctuation marks, ALL CAPS, and sentiment scores as primary indicators of falsity. This research advances fake news detection through a tested framework that is adaptable to different subjects and large-scale datasets. Future work includes extending the model to multilingual scenarios and applying deep learning techniques for better semantic analysis.
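
The feature fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sentiment lexicon, and the particular lexical features (exclamation and question counts, punctuation ratio, ALL CAPS ratio, lexical diversity) are assumptions chosen for illustration, and the TF-IDF weighting uses a common smoothed variant computed with the Python standard library only.

```python
import math
import re
import string
from collections import Counter

# Hypothetical toy lexicon (an assumption); a real system would use a full sentiment lexicon.
TOY_SENTIMENT = {"shocking": -1.0, "fraud": -1.0, "terrible": -1.0,
                 "good": 1.0, "confirmed": 0.5, "safe": 1.0}


def tokenize(text):
    """Lowercase the text and split it into unigram tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_features(text):
    """Surface cues: punctuation use, ALL CAPS words, lexical diversity."""
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    return {
        "exclamations": text.count("!"),
        "questions": text.count("?"),
        "punct_ratio": sum(ch in string.punctuation for ch in text) / max(len(text), 1),
        # Single-letter words such as "I" or "A" are not counted as ALL CAPS.
        "all_caps_ratio": sum(len(w) > 1 and w.isupper() for w in words) / n_words,
        "lexical_diversity": len({w.lower() for w in words}) / n_words,
    }


def sentiment_score(text):
    """Mean lexicon polarity over all tokens (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(TOY_SENTIMENT.get(t, 0.0) for t in tokens) / len(tokens)


def tfidf_unigrams(docs):
    """Sparse TF-IDF vectors (one dict per document) with smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = max(len(toks), 1)
        vectors.append({
            t: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        })
    return vectors


def fuse_features(docs):
    """Concatenate lexical, sentiment, and unigram features per document."""
    fused = []
    for doc, vec in zip(docs, tfidf_unigrams(docs)):
        row = {f"lex:{name}": value for name, value in lexical_features(doc).items()}
        row["sent:score"] = sentiment_score(doc)
        row.update({f"uni:{term}": weight for term, weight in vec.items()})
        fused.append(row)
    return fused
```

Calling `fuse_features` on a list of article texts returns one sparse feature dictionary per article; the prefixed keys (`lex:`, `sent:`, `uni:`) keep the three feature groups distinguishable, so a downstream classifier or a feature importance analysis can attribute weight to each group.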

 

Received: 27 July 2024 | Revised: 11 June 2025 | Accepted: 4 July 2025

 

Conflicts of Interest

The authors declare that they have no conflicts of interest to this work.

 

Data Availability Statement

The data that support the findings of this study are openly available in Lexical_Sentiment_Based_Model_Datasets at https://github.com/asbichi362/Lexical_Sentiment_Based_Model_Datasets.

 

Author Contribution Statement

Abdulkadir Shehu Bichi: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Funding acquisition. Ibrahim Said Ahmad: Conceptualization, Methodology, Validation, Resources, Writing – review & editing, Supervision, Project administration, Funding acquisition. Amina Imam Abubakar: Methodology, Writing – review & editing. Fa’iz Ibrahim Jibiya: Writing – review & editing. Aisha Mustapha Ahmad: Writing – review & editing. Nur Bala Rabiu: Writing – review & editing.


Published

2025-07-25

Section

Research Article

How to Cite

Bichi, A. S., Ahmad, I. S., Abubakar, A. I., Jibiya, F. I., Ahmad, A. M., & Rabiu, N. B. (2025). Lexicon–Sentiment-Based Model for Detecting Fake News. Artificial Intelligence and Applications. https://doi.org/10.47852/bonviewAIA52023972