Using Variational Autoencoders with Machine Learning Algorithms in Cyber Security Applications

Thomas Taylor; Amna Eleyan; Mohammed Al-Khalidi

doi:10.47852/bonviewAIA52024151

Authors

Thomas Taylor Department of Computing and Mathematics, Manchester Metropolitan University, UK
Amna Eleyan Department of Computing and Mathematics, Manchester Metropolitan University, UK https://orcid.org/0000-0002-2025-3027
Mohammed Al-Khalidi Department of Computing and Mathematics, Manchester Metropolitan University, UK https://orcid.org/0000-0002-1655-8514


Keywords

machine learning, cybersecurity, autoencoder, neural network, SVM, variational autoencoders, malware

Abstract

In the evolving field of cybersecurity, detecting malicious activity in high-dimensional network data remains a persistent challenge for traditional machine learning (ML) techniques. This study investigates the use of convolutional variational autoencoders (VAEs) to generate latent features that improve the performance of several ML classifiers on the NSL-KDD dataset. The classifiers, including Gaussian Naïve Bayes (GNB), support vector machines (SVMs) with a radial basis function (RBF) kernel, decision trees, and dense neural networks, were evaluated using accuracy, precision, recall, F1 score, and the Matthews correlation coefficient (MCC). To assess the effectiveness of VAEs, principal component analysis (PCA) was used as a baseline dimensionality reduction method for comparison. The best-performing model was an RBF-kernel SVM paired with PCA (threshold = 0.92) and a VAE with six latent features, which achieved an accuracy of 82.8%, an F1 score of 0.830, and an MCC of 0.682. The results indicate that VAE-derived features can substantially improve classifier performance, particularly for the GNB and SVM models, suggesting their value in developing more effective intrusion detection systems.
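The evaluation metrics and the PCA threshold mentioned in the abstract can be made concrete with a short sketch. It assumes a binary normal-versus-attack setting with "attack" as the positive class, and it assumes that "PCA (threshold = 0.92)" refers to a cumulative explained-variance threshold; the abstract states neither. The helpers `binary_metrics` and `components_for_threshold` are hypothetical, written in plain Python for illustration, and are not taken from the authors' code.

```python
import math


# Hypothetical helper: assumes binary classification with "attack" as the
# positive class. Ratios with a zero denominator are reported as 0.0.
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, F1 and MCC from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC uses all four cells of the matrix, so it stays informative
    # when the normal and attack classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical helper: assumes the PCA threshold is a cumulative
# explained-variance threshold, similar to how scikit-learn's PCA
# interprets a fractional n_components.
def components_for_threshold(explained_variance_ratio, threshold=0.92):
    """Smallest number of components whose cumulative ratio exceeds threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative > threshold:
            return k
    # Threshold never exceeded: keep every component.
    return len(explained_variance_ratio)


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the study.
    m = binary_metrics(tp=80, fp=10, fn=20, tn=90)
    print({name: round(value, 3) for name, value in m.items()})
    print(components_for_threshold([0.5, 0.3, 0.15, 0.05]))
```

With the illustrative counts above (80 true positives, 10 false positives, 20 false negatives, 90 true negatives), accuracy is 0.85, F1 is about 0.84, and MCC is about 0.70. MCC is lower than accuracy and F1 here because it penalises errors on both classes, which is one reason to report it alongside F1 for intrusion detection data.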

Received: 23 August 2024 | Revised: 3 June 2025 | Accepted: 15 July 2025

Conflicts of Interest

The authors declare that they have no conflicts of interest in relation to this work.

Data Availability Statement

The data that support the findings of this study, derived from the NSL-KDD dataset, are openly available at https://github.com/t-taylor/cvaes-research/tree/main/results.

Author Contribution Statement

Thomas Taylor: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Amna Eleyan: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Supervision, Project administration. Mohammed Al-Khalidi: Validation, Resources, Writing – original draft, Writing – review & editing, Visualization, Supervision.