Enhancing Intrusion Detection Using Random Forest and SMOTE on the NSL-KDD Dataset
Abstract
Intrusion Detection Systems (IDS) play a crucial role in identifying suspicious activity on computer networks. A major challenge in building machine learning-based IDS is class imbalance: attack classes are often in the minority, so classification models tend to overlook them. This study builds an intrusion detection system that combines the Random Forest algorithm with the Synthetic Minority Over-sampling Technique (SMOTE) to address this problem. The system is evaluated on the NSL-KDD dataset, with the data split into 80% for training and 30% for testing. The experiments include Random Forest-based feature selection and performance evaluation using accuracy, precision, recall, and F1-score. The combination of Random Forest and SMOTE achieves an accuracy of 99.78%, precision of 99.70%, recall of 99.88%, and an F1-score of 99.79%, and the confusion matrix shows very low rates of false positives and false negatives. Selecting the most influential features, such as src_bytes and dst_bytes, also improves model efficiency. These results indicate that integrating Random Forest with SMOTE increases detection sensitivity to attacks without compromising precision, and that the approach can support the development of adaptive, accurate, and deployable IDS for real-world network environments.
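The two ideas the abstract relies on, SMOTE oversampling and the four evaluation metrics, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `smote` and `binary_metrics`, the default `k=5`, and the toy data are assumptions made for illustration, and a real pipeline would more likely use an existing library such as imbalanced-learn together with a Random Forest classifier.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest *other* minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy two-feature data, made up for illustration (not NSL-KDD records).
    attacks = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]]
    normal_count = 10
    # Oversample the minority class until it matches the majority count.
    balanced = attacks + smote(attacks, n_new=normal_count - len(attacks))
    print(len(balanced))
    # Hypothetical confusion-matrix counts, not the paper's results.
    print(binary_metrics(tp=90, fp=5, fn=10, tn=95))
```

Oversampling of this kind is normally applied to the training portion only, so that the test set keeps the original class distribution and the reported precision and recall reflect real traffic proportions.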