Lexicon–Sentiment-Based Model for Detecting Fake News
DOI: https://doi.org/10.47852/bonviewAIA52023972
Keywords: sentiment feature, lexical feature, unigram
Abstract
The spread of misinformation poses a challenge to social media platforms and calls for robust detection systems. Most existing approaches to detecting misinformation do not combine multiple feature types and are therefore tied to a specific topic domain. This study presents a new model, the Lexicon–Sentiment-Based Model (LSBM), which combines lexical and sentiment features with unigrams to improve the chances of detecting fake news from any domain. We applied our model to three heterogeneous datasets: Fake and Authentic News Articles (44K samples), Combined Corpus Dataset1 (80K samples), and Merged FA-KES and CoAID datasets (70K samples), which together cover politics, health, economics, and entertainment. The proposed approach applies feature selection to identify discriminative linguistic and emotional constructs, such as punctuation, sentiment value, and Term Frequency–Inverse Document Frequency (TF-IDF) weighted unigrams. These features were then evaluated with six classifiers: logistic regression, decision tree, random forest (RF), support vector machine (SVM), K-nearest neighbors, and Naive Bayes. The results showed that adding lexicon–sentiment features to unigrams substantially improved detection accuracy: SVM achieved 97% accuracy on the Combined Corpus Datasets, while RF achieved 88% accuracy on cross-domain data. Regarding cross-domain robustness, LSBM overcomes dataset constraints by exploiting domain-neutral features such as emotional tone and lexical diversity. Regarding feature fusion, the combination of sentiment, lexical, and unigram features performs better than single-feature approaches, increasing accuracy by more than 6% over models using only unigrams. Regarding interpretability, feature importance analysis identifies punctuation marks, ALL CAPS, and sentiment scores as primary indicators of falsity. This research improves fake news detection through a tested framework adaptable to different subjects and large-scale datasets.
Further work includes model expansion to multilingual scenarios and applying deep learning techniques for better analysis of semantics.
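The feature fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy sentiment lexicon, the function names, the exact feature definitions (punctuation counts, ALL-CAPS ratio, type-token ratio as a stand-in for lexical diversity), and the smoothed IDF formula are all assumptions chosen for illustration. The resulting rows could be passed to any of the classifiers named above.

```python
import math
import re
import string
from collections import Counter

# Hypothetical toy lexicon (an assumption); a real system would use a full
# sentiment lexicon or a dedicated sentiment analyzer.
TOY_SENTIMENT_LEXICON = {
    "shocking": -1.0, "outrage": -1.0, "hoax": -1.0, "terrible": -1.0,
    "good": 1.0, "great": 1.0, "safe": 1.0, "confirmed": 0.5,
}


def tokenize(text):
    """Return lowercase unigram tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_sentiment_features(text):
    """Compute a few domain-neutral lexical and sentiment features."""
    raw_words = re.findall(r"[A-Za-z']+", text)
    tokens = [w.lower() for w in raw_words]
    n = len(tokens) or 1  # avoid division by zero on empty input
    caps = [w for w in raw_words if len(w) > 1 and w.isupper()]
    punct = sum(ch in string.punctuation for ch in text)
    return {
        "exclamation_count": text.count("!"),
        "question_count": text.count("?"),
        "punctuation_ratio": punct / max(len(text), 1),
        "all_caps_ratio": len(caps) / n,
        "type_token_ratio": len(set(tokens)) / n,  # lexical diversity proxy
        "sentiment_score": sum(TOY_SENTIMENT_LEXICON.get(t, 0.0) for t in tokens) / n,
    }


def tfidf_unigrams(documents):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        weights.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


def build_feature_rows(documents):
    """Fuse TF-IDF unigram weights with lexical and sentiment features."""
    rows = []
    for text, weights in zip(documents, tfidf_unigrams(documents)):
        row = {f"unigram={term}": w for term, w in weights.items()}
        row.update(lexical_sentiment_features(text))
        rows.append(row)
    return rows


if __name__ == "__main__":
    docs = [
        "SHOCKING hoax exposed!!! You WON'T believe it",
        "The ministry confirmed the vaccine is safe.",
    ]
    for row in build_feature_rows(docs):
        print({k: round(v, 3) for k, v in row.items() if not k.startswith("unigram=")})
```

Keeping the hand-crafted features in the same row as the unigram weights lets a single classifier weigh both kinds of evidence, which is the fusion idea the abstract credits for the gain over unigram-only models.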
Received: 27 July 2024 | Revised: 11 June 2025 | Accepted: 4 July 2025
Conflicts of Interest
The authors declare that they have no conflicts of interest to this work.
Data Availability Statement
The data that support the findings of this study are openly available in Lexical_Sentiment_Based_Model_Datasets at https://github.com/asbichi362/Lexical_Sentiment_Based_Model_Datasets.
Author Contribution Statement
Abdulkadir Shehu Bichi: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Funding acquisition. Ibrahim Said Ahmad: Conceptualization, Methodology, Validation, Resources, Writing – review & editing, Supervision, Project administration, Funding acquisition. Amina Imam Abubakar: Methodology, Writing – review & editing. Fa’iz Ibrahim Jibiya: Writing – review & editing. Aisha Mustapha Ahmad: Writing – review & editing. Nur Bala Rabiu: Writing – review & editing.
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.