Correlation-Based Feature Selection for Efficient Intrusion Detection: Comparative Evaluation of Machine Learning Models on CICIDS-2017
DOI:
https://doi.org/10.47852/bonviewJDSIS62027552
Keywords:
intrusion detection system, XGBoost, Random Forest, MLP, feature selection
Abstract
The rapid growth of network traffic has enabled digital communication and global connectivity, driving advances in e-commerce and cloud computing. However, this growth also increases the risk of cyberattacks, making effective and efficient intrusion detection systems (IDS) essential. Although many studies have applied machine learning (ML) and deep learning (DL) to improve detection accuracy, most pay little attention to computational efficiency, which is critical for real-time deployment. This study evaluates three widely used ML models, namely Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), on the CICIDS-2017 dataset, with and without correlation-based feature selection (CFS). Results show that RF and XGBoost achieved the highest accuracy (0.998) with very low false positive rates, while MLP lagged behind in both detection and runtime performance. Notably, applying CFS reduced RF's training time by 35% without sacrificing accuracy. These findings confirm that ensemble models, particularly RF and XGBoost, provide a strong balance between accuracy and efficiency. Moreover, feature selection emerges as a simple yet effective strategy for lowering computational cost, making IDS more practical for large-scale, real-time network environments.
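To illustrate the kind of correlation-based filtering the abstract refers to, the following is a minimal sketch only. The abstract does not state the exact CFS procedure or threshold used in the study, so this example swaps in a simple greedy pairwise-correlation filter; the function name `select_uncorrelated`, the 0.9 threshold, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def select_uncorrelated(X, threshold=0.9):
    """Return indices of columns kept by a greedy pairwise-correlation filter.

    Assumption: a feature is dropped when its absolute Pearson correlation
    with an already-kept feature reaches `threshold`. This is a simplified
    stand-in, not necessarily the CFS variant used in the study.
    """
    X = np.asarray(X, dtype=float)
    # Constant columns have undefined correlation, so exclude them first.
    stds = X.std(axis=0)
    candidates = [j for j in range(X.shape[1]) if stds[j] > 0]
    if len(candidates) < 2:
        return candidates
    corr = np.abs(np.corrcoef(X[:, candidates], rowvar=False))
    kept = []  # list of (position in corr matrix, original column index)
    for pos, col in enumerate(candidates):
        # Keep the feature only if it is not redundant with any kept one.
        if all(corr[pos, kept_pos] < threshold for kept_pos, _ in kept):
            kept.append((pos, col))
    return [col for _, col in kept]

# Synthetic demonstration data (hypothetical, not CICIDS-2017).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
near_copy_of_a = 2 * a + 0.01 * rng.normal(size=200)
X_demo = np.column_stack([a, near_copy_of_a, b, np.ones(200)])
kept_columns = select_uncorrelated(X_demo)
```

In this demonstration the near-duplicate of the first column and the constant column are discarded, so a downstream classifier would train on fewer inputs. Reducing the input width in this way is the general mechanism by which feature selection can shorten training time.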
Received: 1 September 2025 | Revised: 22 December 2025 | Accepted: 20 January 2026
Conflicts of Interest
Xuyang Shi is a specialist for Journal of Data Science and Intelligent Systems and was not involved in the editorial review or the decision to publish this article. The authors declare that they have no conflicts of interest to this work.
Data Availability Statement
The data that support the findings of this study are openly available in the Canadian Institute for Cybersecurity (CIC), University of New Brunswick (UNB), Canada, at http://cicresearch.ca/CICDataset/CIC-IDS-2017/.
Author Contribution Statement
Bilal Rafique: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Project administration. Sania Kanwal: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Project administration. Xuyang Shi: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration, Funding acquisition.
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.
Funding data
- National Natural Science Foundation of China, Grant number 62572406
- Sichuan Provincial Science and Technology Support Program, Grant number 23ZX7136