Context-Free Word Importance Scores for Attacking Neural Networks
DOI:
https://doi.org/10.47852/bonviewJCCE2202406
Keywords:
neural networks, adversarial attacks, NLP
Abstract
Leave-One-Out (LOO) scores provide estimates of feature importance in neural networks and are used in adversarial attacks. In this work, we present context-free word scores as a query-efficient alternative. Experiments show that these approximations are effective for black-box attacks on neural networks trained for text classification, particularly CNNs. The model query count for this method scales as O(vocab_size * model_input_length) and is independent of the number of examples and features to be perturbed.
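The abstract does not spell out the scoring procedure, so the following is only an illustrative sketch of one reading that matches the stated query count: each vocabulary word is placed at each input position of an otherwise padded input, and the resulting change in the model output is averaged over positions. The names `context_free_scores`, `predict`, `PAD_ID`, and the toy model are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

PAD_ID = 0  # hypothetical padding token id (assumption, not from the paper)


def context_free_scores(
    predict: Callable[[List[int]], float],
    vocab_ids: Sequence[int],
    input_length: int,
    pad_id: int = PAD_ID,
) -> Tuple[Dict[int, float], int]:
    """Score each word in isolation, without reference to any real example.

    `predict` is a black-box function mapping a token-id sequence to the
    probability of the target class. Returns (scores, number_of_queries).
    """
    baseline = predict([pad_id] * input_length)  # all-padding reference input
    queries = 1
    scores: Dict[int, float] = {}
    for word in vocab_ids:
        total = 0.0
        for pos in range(input_length):
            tokens = [pad_id] * input_length
            tokens[pos] = word  # the word appears alone at this position
            total += predict(tokens) - baseline
            queries += 1
        scores[word] = total / input_length  # average effect over positions
    return scores, queries


# Toy stand-in for a black-box classifier: a position-independent linear model.
TOY_WEIGHTS = {0: 0.0, 1: 0.5, 2: -0.25, 3: 0.0}


def toy_predict(tokens: List[int]) -> float:
    return sum(TOY_WEIGHTS[t] for t in tokens)


scores, queries = context_free_scores(toy_predict, vocab_ids=[1, 2, 3], input_length=4)
# Words ranked by absolute score; an attacker would perturb the top ones first.
ranked = sorted(scores, key=lambda w: abs(scores[w]), reverse=True)
```

Under these assumptions the scores are computed once with vocab_size * model_input_length queries (plus one baseline query) and can then be reused for every example, whereas LOO scoring needs fresh queries for each example being attacked.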
Received: 13 July 2022 | Revised: 18 July 2022 | Accepted: 24 August 2022
Conflicts of Interest
The authors declare that they have no conflicts of interest to this work.
License
Copyright (c) 2022 Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.