Transformer Attention-Driven Concept Extraction for Efficient Smishing Detection
DOI:
https://doi.org/10.47852/bonviewAIA62028760
Keywords:
concept representation, smishing, SMS phishing, BERT, explainable AI
Abstract
Short Message Service (SMS) phishing (smishing) is a form of phishing attack that uses mobile messaging as its delivery medium. Conventional detection methods, which rely on static rules or shallow linguistic features, fall short in identifying smishing messages because they resemble spam. In this study, we propose a transformer-based, attention-driven framework for detecting smishing. We develop a concept-level representation to improve both the accuracy and the explainability of the model. The approach extracts the message signatures found in smishing attacks and sorts them into three conceptual categories (textual, structural, and behavioral), each of which captures a different aspect of how smishing attacks attempt to achieve their goals. We use a pretrained Bidirectional Encoder Representations from Transformers (BERT) model to construct conceptual representations from SMS messages. By analyzing BERT’s attention weights over the smishing concept categories, we identify informative tokens and patterns that distinguish smishing from benign messages. Classification is performed with a fully connected neural network layer and, for comparison, three classical machine learning baseline models trained on the same features. The results show that our model achieves an F1-score of 98.71% and an accuracy of 99.32%, outperforming the baseline models. Ablation studies further confirm that each concept category contributes meaningfully to classification performance, with behavioral concept features having the greatest impact. This work highlights the potential of attention-driven concept modeling for robust and explainable smishing detection.
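As a rough illustration of the attention-driven idea summarized above, the sketch below ranks tokens by the attention they receive from the [CLS] position and then sums those weights per concept category. It is not the authors' implementation: the function names, the toy attention values, and the token-to-category lexicon are all hypothetical, and the choice of averaging the [CLS] row over heads is an assumption made only for this example. In practice the attention matrices might come from a pretrained BERT model (for instance, Hugging Face Transformers returns them when a model is called with `output_attentions=True`); here they are hard-coded so the block runs without external dependencies.

```python
from typing import Dict, List, Tuple

# Special tokens carry no message content, so they are excluded from the ranking.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def rank_tokens_by_cls_attention(
    tokens: List[str], attention: List[List[List[float]]]
) -> List[Tuple[str, float]]:
    """Average the [CLS] attention row over heads and rank non-special tokens.

    `attention` has shape [heads][seq_len][seq_len]; row 0 is assumed to be [CLS].
    """
    num_heads = len(attention)
    scores = []
    for j, tok in enumerate(tokens):
        if tok in SPECIAL_TOKENS:
            continue
        mean_weight = sum(head[0][j] for head in attention) / num_heads
        scores.append((tok, mean_weight))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def concept_scores(
    ranked: List[Tuple[str, float]], lexicon: Dict[str, str]
) -> Dict[str, float]:
    """Sum attention weights per concept category using a hypothetical lexicon."""
    totals = {category: 0.0 for category in set(lexicon.values())}
    for tok, weight in ranked:
        category = lexicon.get(tok)
        if category is not None:
            totals[category] += weight
    return totals


# Toy input: made-up values, not results from the paper.
tokens = ["[CLS]", "verify", "account", "http", "now", "[SEP]"]
uniform_row = [1 / 6] * 6  # filler rows; only the [CLS] row is used here
attention = [
    [[0.10, 0.40, 0.10, 0.20, 0.15, 0.05]] + [uniform_row] * 5,  # head 0
    [[0.10, 0.30, 0.10, 0.30, 0.15, 0.05]] + [uniform_row] * 5,  # head 1
]

# Hypothetical assignment of tokens to the three concept categories.
lexicon = {
    "verify": "behavioral",
    "now": "behavioral",
    "http": "structural",
    "account": "textual",
}

ranked = rank_tokens_by_cls_attention(tokens, attention)
totals = concept_scores(ranked, lexicon)
```

With these toy values, `ranked` orders the tokens by how strongly [CLS] attends to them, and `totals` gives one aggregate score per category that could, under the assumptions above, be fed to a downstream classifier as a concept-level feature.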
Received: 16 December 2025 | Revised: 3 February 2026 | Accepted: 25 February 2026
Conflicts of Interest
The authors declare that they have no conflicts of interest related to this work.
Data Availability Statement
The data that support the findings of this study are openly available in the UCI Machine Learning Repository at https://doi.org/10.1145/2034691.2034742, reference number [27].
Author Contribution Statement
Zahriya Lawal Hassan: Conceptualization, Methodology, Resources, Writing – original draft, Visualization. Nor Fazlida Mohd Sani: Validation, Writing – review & editing, Supervision, Project administration, Funding acquisition. Muhammad Daniel Hafiz Abdullah: Validation, Writing – review & editing, Supervision, Project administration. Norwati Mustapha: Writing – review & editing, Supervision, Project administration.
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.
Funding data
Universiti Putra Malaysia
Grant numbers UPM.RMC.800-2/1/2024/GP-IPS/9808600