Efficient Defense Against First Order Adversarial Attacks on Convolutional Neural Networks
DOI:
https://doi.org/10.47852/bonviewAIA52025957

Keywords:
adversarial attacks, machine learning model security, convolutional neural networks, fast gradient sign method (FGSM), projected gradient descent (PGD)

Abstract
Machine learning models, especially neural networks, are vulnerable to adversarial attacks, in which inputs are purposefully altered to induce incorrect predictions. These adversarial inputs closely resemble benign (unaltered) inputs, making them difficult to detect, and they pose significant security risks in critical applications such as autonomous vehicles, medical diagnostics, and financial transactions. Several methods exist to improve a model's performance against these adversarial attacks, typically by modifying the network architecture or the training procedure. Oftentimes, these adversarial training techniques provide robustness only against specific attack types and/or require substantial computational resources, making them impractical for real-world applications with limited resources. In this work, we propose a computationally efficient adversarial fine-tuning approach that enhances the robustness of Convolutional Neural Networks (CNNs) against adversarial attacks while attaining the same level of performance as conventional adversarial training. More specifically, we propose to identify the specific parts of the neural network model that are most vulnerable to adversarial attacks. Our analysis reveals that only a small portion of these vulnerable components accounts for the majority of the model's errors caused by adversarial attacks. We therefore selectively fine-tune these vulnerable components using different adversarial training methods, yielding an effective and resource-efficient approach to improving model robustness. We empirically validate the proposed approach with varying dataset and algorithm parameters and demonstrate that it achieves performance similar to that of the more resource-intensive conventional adversarial training method.
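To make the idea of selective adversarial fine-tuning concrete, the sketch below shows one possible PyTorch realization under stated assumptions: a ResNet-18 backbone, an FGSM perturbation budget of 0.03, and a hypothetical list of "vulnerable" layer names stand in for the paper's actual model and component-selection procedure, which are not reproduced here. It is a minimal illustration, not the authors' implementation.

# Illustrative sketch only: adversarial fine-tuning restricted to selected layers.
# The backbone, epsilon, and the vulnerable_layers list are assumptions for
# demonstration; the paper's vulnerability analysis is not reproduced here.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(num_classes=10)      # assumed CNN backbone
vulnerable_layers = ["layer4", "fc"]         # hypothetical vulnerable components

# Freeze every parameter except those in the selected components.
for name, param in model.named_parameters():
    param.requires_grad = any(name.startswith(v) for v in vulnerable_layers)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

def fgsm_example(model, x, y, epsilon=0.03):
    # FGSM: x_adv = clip(x + epsilon * sign(grad_x loss), 0, 1)
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def finetune_step(x, y):
    # One fine-tuning step on adversarial versions of a clean batch (x, y);
    # gradients update only the unfrozen (vulnerable) layers.
    model.train()
    x_adv = fgsm_example(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

A PGD variant of the same sketch would simply replace fgsm_example with an iterated, projected perturbation loop; the layer-freezing logic is unchanged.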
Received: 18 April 2025 | Revised: 4 September 2025 | Accepted: 12 November 2025
Conflicts of Interest
The authors declare that they have no conflicts of interest to this work.
Data Availability Statement
The data that support the findings of this study are openly available in github.io at https://doi.org/10.1109/5.726791, reference number [3].
Author Contribution Statement
Subah Karnine: Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization. Sadia Afrose: Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization. Hafiz Imtiaz: Conceptualization, Methodology, Formal analysis, Resources, Writing – original draft, Writing – review & editing, Supervision, Project administration.
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.