Correlation-Based Feature Selection for Efficient Intrusion Detection: Comparative Evaluation of Machine Learning Models on CICIDS-2017

Bilal Rafique; Sania Kanwal; Xuyang Shi

doi:10.47852/bonviewJDSIS62027552

Authors

Bilal Rafique School of Information and Control Engineering, Southwest University of Science and Technology, China https://orcid.org/0009-0008-9037-3148
Sania Kanwal School of Information and Control Engineering, Southwest University of Science and Technology, China https://orcid.org/0009-0003-5331-1259
Xuyang Shi School of Information and Control Engineering, Southwest University of Science and Technology, China

DOI:

https://doi.org/10.47852/bonviewJDSIS62027552

Keywords:

intrusion detection system, XGBoost, Random Forest, MLP, feature selection

Abstract

The rapid growth of network traffic has enabled digital communication and global connectivity, driving advances in e-commerce and cloud computing. However, this growth also increases the risk of cyberattacks, making effective and efficient intrusion detection systems (IDS) essential. Although many studies have applied machine learning (ML) and deep learning (DL) to improve detection accuracy, most pay less attention to computational efficiency, which is critical for real-time deployment. This study evaluates three widely used ML models, namely Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), on the CICIDS-2017 dataset, with and without correlation-based feature selection (CFS). Results show that RF and XGBoost achieved the highest accuracy (0.998) with very low false positive rates, while MLP lagged behind in both detection and runtime performance. Notably, applying CFS reduced RF's training time by 35% without sacrificing accuracy. The findings confirm that ensemble models, particularly RF and XGBoost, provide a strong balance between accuracy and efficiency. Moreover, feature selection emerges as a simple yet effective strategy to lower computational cost, making IDS more practical for large-scale, real-time network environments.
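
As an illustration of the idea behind correlation-based feature selection, the following minimal sketch applies a simple pairwise correlation filter to synthetic data and computes accuracy and false positive rate for a toy prediction. It is an assumption-laden example, not the procedure used in the study: the abstract does not state the CFS variant or any threshold, so the greedy filter, the 0.9 cutoff, the feature names (f_signal, f_duplicate, f_noise), and the helper functions are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, threshold=0.9):
    """Greedy filter (assumed variant): visit features by descending |corr with label|
    and keep one only if its |corr| with every already-kept feature is below threshold."""
    relevance = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    kept = []
    for name in sorted(columns, key=lambda n: relevance[n], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def accuracy_and_fpr(y_true, y_pred):
    """Accuracy and false positive rate for binary labels (1 = attack, 0 = benign)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    negatives = sum(t == 0 for t in y_true)
    fpr = fp / negatives if negatives else 0.0
    return correct / len(y_true), fpr


# Synthetic stand-in for flow features; not CICIDS-2017 data.
random.seed(0)
labels = [random.randint(0, 1) for _ in range(200)]
signal = [y + random.gauss(0, 0.3) for y in labels]
columns = {
    "f_signal": signal,                            # informative feature
    "f_duplicate": [2 * v + 1 for v in signal],    # linear copy of f_signal (redundant)
    "f_noise": [random.gauss(0, 1) for _ in range(200)],  # unrelated feature
}

kept = select_features(columns, labels)
print("kept features:", kept)

acc, fpr = accuracy_and_fpr([0, 0, 0, 1, 1], [0, 1, 0, 1, 1])
print("accuracy:", acc, "false positive rate:", fpr)
```

In this sketch only one of the two perfectly correlated features survives the filter, so a downstream classifier would train on fewer columns, which is the mechanism by which feature selection can reduce training time.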

Received: 1 September 2025 | Revised: 22 December 2025 | Accepted: 20 January 2026

Conflicts of Interest

Xuyang Shi is a specialist for the Journal of Data Science and Intelligent Systems and was not involved in the editorial review or the decision to publish this article. The authors declare that they have no conflicts of interest related to this work.

Data Availability Statement

The data that support the findings of this study are openly available from the Canadian Institute for Cybersecurity (CIC), University of New Brunswick (UNB), Canada, at http://cicresearch.ca/CICDataset/CIC-IDS-2017/.

Author Contribution Statement

Bilal Rafique: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Project administration. Sania Kanwal: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Project administration. Xuyang Shi: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration, Funding acquisition.
