An Analytical Framework for Addressing Imbalance in Fake Review Detection Using Augmented Text and Swarm Optimized Classifier

Richa Gupta; Indu Kashyap; Vinita Jindal

doi:10.47852/bonviewAIA52026221

Authors

Richa Gupta, School of Engineering & Technology, Manav Rachna International Institute of Research & Studies, India, and Department of Computer Science, University of Delhi, India
Indu Kashyap, School of Engineering & Technology, Manav Rachna International Institute of Research & Studies, India
Vinita Jindal, Department of Computer Science, University of Delhi, India

Keywords:

fake review detection, machine learning analytics, swarm optimization, text data augmentation, neural network training

Abstract

The rapid spread of fake online reviews threatens consumer trust, business reputation, and confidence in e-commerce platforms. Detecting fake reviews is challenging because even human readers struggle to distinguish them from genuine ones. Researchers have applied various machine learning models to fake review detection. However, publicly available datasets often suffer from severe class imbalance, with significantly fewer fake reviews than real ones. Previous research has struggled with this imbalance, yielding biased results and thereby failing to detect fraudulent reviews effectively. Traditionally, imbalance has been handled with oversampling or undersampling, which can result in information loss and/or overfitting. To address this problem, GlOBiL, a novel framework that combines GPT-2 augmentation, GloVe embeddings, and an optimized Bi-LSTM classifier, is proposed. To optimize the Bi-LSTM, a novel staggered particle swarm optimization (SPSO) algorithm is also proposed. GlOBiL operates in three phases. Phase I generates contextually similar synthetic fraudulent reviews via GPT-2 augmentation. Phase II represents the augmented dataset using GloVe embeddings and optimizes the Bi-LSTM classifier with the proposed SPSO. Phase III trains the optimized classifier and classifies reviews as fake or real. Four benchmark review datasets (YelpZIP, YelpNYC, YelpCHI hotel, and YelpCHI restaurant) were used for experimentation. The results show that GlOBiL outperformed baseline methods and nine published approaches, with average accuracy rising to 95.37%, 96.49%, 97.67%, and 97.13% on the four datasets, respectively. Consequently, GlOBiL can help consumers and businesses detect fake reviews, enhancing trust and supporting informed decision-making on e-commerce platforms.
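
As an illustration of the swarm-based tuning step mentioned in the abstract, the following is a minimal sketch of standard (textbook) particle swarm optimization, not the authors' staggered SPSO variant, whose details are not given here. The objective `toy_validation_loss` is a hypothetical stand-in for the validation loss of a Bi-LSTM as a function of two hyperparameters (log learning rate and dropout); the function names, bounds, and coefficient values are all assumptions chosen for the example.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Positions start uniformly inside the bounds; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its personal best; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia term + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_loss(params):
    # Hypothetical surrogate: minimum at log10(lr) = -3.0, dropout = 0.3.
    log_lr, dropout = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.3) ** 2


best, loss = pso_minimize(toy_validation_loss, bounds=[(-5.0, -1.0), (0.0, 0.6)])
print(best, loss)
```

In a real setting the objective would train and validate the classifier for each candidate, which is far more expensive than this toy function, so the swarm size and iteration count would need to be chosen with that cost in mind.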

Received: 22 May 2025 | Revised: 12 August 2025 | Accepted: 22 October 2025

Conflicts of Interest

The authors declare that they have no conflicts of interest to this work.

Data Availability Statement

The data that support the findings of this study are openly available in Kaggle at https://doi.org/10.1145/2783258.2783370, reference number [8].

Author Contribution Statement

Richa Gupta: Conceptualization, Methodology, Software, Investigation, Resources, Writing – original draft. Indu Kashyap: Supervision, Writing – review & editing. Vinita Jindal: Resources, Writing – review & editing, Supervision.
