Efficient Defense Against First Order Adversarial Attacks on Convolutional Neural Networks
DOI:
https://doi.org/10.47852/bonviewAIA52025957

Keywords:
adversarial attacks, machine learning model security, convolutional neural networks, fast gradient sign method (FGSM), projected gradient descent (PGD)

Abstract
Machine learning models, especially neural networks, are vulnerable to adversarial attacks, in which inputs are purposefully altered to induce incorrect predictions. These adversarial inputs closely resemble benign (unaltered) inputs, making them difficult to detect, and they pose significant security risks in critical applications such as autonomous vehicles, medical diagnostics, and financial transactions. Several methods exist to improve a model's performance against these adversarial attacks, typically by modifying the network architecture or the training procedure. Oftentimes, these adversarial training techniques provide robustness only against specific attack types and/or require substantial computational resources, making them impractical for real-world applications with limited resources. In this work, we propose a computationally efficient adversarial fine-tuning approach that enhances the robustness of Convolutional Neural Networks (CNNs) against adversarial attacks while attaining the same level of performance as conventional adversarial training. More specifically, we propose to identify the specific parts of the neural network model that are most vulnerable to adversarial attacks. Our analysis reveals that only a small portion of these vulnerable components accounts for the majority of the model's errors caused by adversarial attacks. We therefore selectively fine-tune these vulnerable components using different adversarial training methods, yielding an effective and resource-efficient approach to improving model robustness. We empirically validate the proposed approach with varying dataset and algorithm parameters and demonstrate that it achieves performance similar to that of the more resource-intensive conventional adversarial training method.
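To make the idea of selective adversarial fine-tuning concrete, the sketch below shows one possible PyTorch realization under stated assumptions: a ResNet-18 backbone, an FGSM perturbation budget of 0.03, and a hypothetical list of "vulnerable" layer names stand in for the paper's actual model and component-selection procedure, which are not reproduced here. It is a minimal illustration, not the authors' implementation.

# Illustrative sketch only: adversarial fine-tuning restricted to selected layers.
# The backbone, epsilon, and the vulnerable_layers list are assumptions for
# demonstration; the paper's vulnerability analysis is not reproduced here.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(num_classes=10)      # assumed CNN backbone
vulnerable_layers = ["layer4", "fc"]         # hypothetical vulnerable components

# Freeze every parameter except those in the selected components.
for name, param in model.named_parameters():
    param.requires_grad = any(name.startswith(v) for v in vulnerable_layers)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

def fgsm_example(model, x, y, epsilon=0.03):
    # FGSM: x_adv = clip(x + epsilon * sign(grad_x loss), 0, 1)
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def finetune_step(x, y):
    # One fine-tuning step on adversarial versions of a clean batch (x, y);
    # gradients update only the unfrozen (vulnerable) layers.
    model.train()
    x_adv = fgsm_example(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

A PGD variant of the same sketch would simply replace fgsm_example with an iterated, projected perturbation loop; the layer-freezing logic is unchanged.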
Received: 18 April 2025 | Revised: 4 September 2025 | Accepted: 12 November 2025
Conflicts of Interest
The authors declare that they have no conflicts of interest to this work.
Data Availability Statement
The data that support the findings of this study are openly available in github.io at https://doi.org/10.1109/5.726791, reference number [3].
Author Contribution Statement
Subah Karnine: Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization. Sadia Afrose: Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization. Hafiz Imtiaz: Conceptualization, Methodology, Formal analysis, Resources, Writing – original draft, Writing – review & editing, Supervision, Project administration.
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.