Lexicon–Sentiment-Based Model for Detecting Fake News

Abdulkadir Shehu Bichi; Ibrahim Said Ahmad; Amina Imam Abubakar; Fa’iz Ibrahim Jibiya; Aisha Mustapha Ahmad; Nur Bala Rabiu

doi:10.47852/bonviewAIA52023972

Authors

Abdulkadir Shehu Bichi, Department of Computer Science, Bayero University Kano, Nigeria. ORCID: https://orcid.org/0009-0009-5552-3099
Ibrahim Said Ahmad, Department of Computer Science, Bayero University Kano, Nigeria, and Institute of Experiential AI, Northeastern University, USA
Amina Imam Abubakar, Department of Computer Science, University of Abuja, Nigeria
Fa’iz Ibrahim Jibiya, Department of Computer Science, Federal Polytechnic Bauchi, Nigeria. ORCID: https://orcid.org/0009-0003-3137-9818
Aisha Mustapha Ahmad, Department of Computer Science, Bayero University Kano, Nigeria
Nur Bala Rabiu, Department of Computer Science, Bayero University Kano, Nigeria

Keywords:

sentiment feature, lexical feature, unigram

Abstract

The spread of misinformation poses a challenge to social media platforms and calls for robust detection systems. Most existing approaches to detecting misinformation do not combine multiple feature types and are therefore tied to a specific topic domain. This study presents a new model, the Lexicon–Sentiment-Based Model (LSBM), which combines lexical and sentiment features with unigrams to improve the detection of fake news from any domain. We applied the model to three heterogeneous datasets: Fake and Authentic News Articles (44K samples), Combined Corpus Dataset1 (80K samples), and the merged FA-KES and CoAID datasets (70K samples), which together cover politics, health, economics, and entertainment. The proposed approach applies feature selection to identify discriminative linguistic and emotional constructs, such as punctuation, sentiment value, and Term Frequency–Inverse Document Frequency (TF-IDF) weighted unigrams. These features were then evaluated with six classifiers: logistic regression, decision tree, random forest (RF), support vector machine (SVM), K-nearest neighbors, and Naive Bayes. The results show that combining lexicon–sentiment features with unigrams substantially improved detection accuracy: SVM achieved 97% accuracy on the Combined Corpus Datasets, while RF achieved 88% accuracy on cross-domain data. Regarding cross-domain robustness, LSBM overcomes dataset constraints by exploiting domain-neutral features such as emotional tone and lexical diversity. Regarding feature fusion, the combination of sentiment, lexical, and unigram features performs better than single-feature approaches, increasing accuracy by more than 6% over models that use only unigrams. Regarding interpretability, feature importance analysis identifies punctuation marks, ALL CAPS, and sentiment scores as primary indicators of falsity. This research improves fake news detection through a tested framework that is adaptable to different subjects and large-scale datasets. Future work includes extending the model to multilingual settings and applying deep learning techniques for deeper semantic analysis.
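The feature fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy sentiment lexicon, the function names, the exact feature set, and the plain TF-IDF formula (term frequency times log(N/df)) are all assumptions chosen for illustration, and the classifier stage is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; the sentiment resource
# actually used by the paper is not described in this abstract.
TOY_SENTIMENT = {"shocking": -1.0, "fake": -1.0, "outrage": -1.0,
                 "great": 1.0, "confirmed": 0.5, "safe": 0.5}

def tokenize(text):
    """Return lowercase word tokens (unigrams)."""
    return re.findall(r"[a-z']+", text.lower())

def lexicon_sentiment_features(text):
    """Compute a few domain-neutral lexical and sentiment cues for one document."""
    words = re.findall(r"[A-Za-z']+", text)
    tokens = [w.lower() for w in words]
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        "exclamations": text.count("!"),
        "questions": text.count("?"),
        "all_caps_ratio": sum(w.isupper() and len(w) > 1 for w in words) / n,
        "lexical_diversity": len(set(tokens)) / n,  # type-token ratio
        "sentiment": sum(TOY_SENTIMENT.get(t, 0.0) for t in tokens) / n,
    }

def tfidf_unigrams(docs):
    """Assumed plain TF-IDF: tf = count / doc length, idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = max(len(toks), 1)
        vectors.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors

def fuse(docs):
    """Concatenate lexicon-sentiment features and TF-IDF unigrams per document."""
    fused = []
    for doc, tfidf in zip(docs, tfidf_unigrams(docs)):
        row = {f"lex:{k}": v for k, v in lexicon_sentiment_features(doc).items()}
        row.update({f"uni:{k}": v for k, v in tfidf.items()})
        fused.append(row)
    return fused

docs = ["SHOCKING!!! Officials hide the truth?",
        "The ministry confirmed the vaccine is safe."]
rows = fuse(docs)
```

Each entry in `rows` is a sparse feature dictionary that could be vectorized and passed to any of the six classifiers named above. Prefixing keys with `lex:` and `uni:` keeps the two feature groups distinguishable, which helps when inspecting feature importance.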

Received: 27 July 2024 | Revised: 11 June 2025 | Accepted: 4 July 2025

Conflicts of Interest

The authors declare that they have no conflicts of interest to this work.

Data Availability Statement

The data that support the findings of this study are openly available in Lexical_Sentiment_Based_Model_Datasets at https://github.com/asbichi362/Lexical_Sentiment_Based_Model_Datasets.

Author Contribution Statement

Abdulkadir Shehu Bichi: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Funding acquisition. Ibrahim Said Ahmad: Conceptualization, Methodology, Validation, Resources, Writing – review & editing, Supervision, Project administration, Funding acquisition. Amina Imam Abubakar: Methodology, Writing – review & editing. Fa’iz Ibrahim Jibiya: Writing – review & editing. Aisha Mustapha Ahmad: Writing – review & editing. Nur Bala Rabiu: Writing – review & editing.
