Enhancing Intrusion Detection Using Random Forest and SMOTE on the NSL‑KDD Dataset

Febri Hidayat Saputra; Ilham Ilham; Muhammad Rizal; Wisda Wisda; First Wanita; Mursalim Mursalim; Arif Fadillah

doi:10.61628/jsce.v6i3.2056

Febri Hidayat Saputra Universitas Teknologi Akba Makassar
Ilham Ilham Universitas Teknologi Akba Makassar
Muhammad Rizal Universitas Teknologi Akba Makassar
Wisda Wisda Universitas Teknologi Akba Makassar
First Wanita Universitas Teknologi Akba Makassar
Mursalim Mursalim Universitas Teknologi Akba Makassar
Arif Fadillah IkesT Muhammadiyah Palembang

Keywords: Intrusion Detection System; Random Forest; SMOTE; Cybersecurity; NSL-KDD Dataset

Abstract

Intrusion Detection Systems (IDS) play a crucial role in identifying suspicious activity on computer networks. A major challenge in developing machine learning-based IDS is class imbalance: attacks form the minority classes and are therefore often overlooked by classification models. This study constructs an intrusion detection system based on the Random Forest algorithm, integrated with the Synthetic Minority Over-sampling Technique (SMOTE) to address this problem. The NSL-KDD dataset is used for evaluation, with the data split into 80% for training and 30% for testing. The experiments include Random Forest-based feature selection and performance evaluation using accuracy, precision, recall, and F1-score. The Random Forest–SMOTE combination achieves an accuracy of 99.78%, a precision of 99.70%, a recall of 99.88%, and an F1-score of 99.79%. The confusion matrix shows very low rates of false positives and false negatives. In addition, selecting the most influential features, such as src_bytes and dst_bytes, improves model efficiency. The integration of Random Forest and SMOTE is thus effective in increasing detection sensitivity toward attacks without compromising precision. The approach contributes to the development of adaptive, accurate IDS that can be deployed in real-world network environments.
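To make two of the ideas named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows the interpolation step at the core of SMOTE (a synthetic minority sample is placed on the line segment between a minority point and one of its nearest minority neighbours) and how accuracy, precision, recall, and F1-score follow from confusion-matrix counts. The function names `smote` and `binary_metrics`, the toy two-feature data, and the confusion-matrix counts are all hypothetical and chosen only for illustration; they are not taken from NSL-KDD or from the reported results. The Random Forest classifier and the train/test split are not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Simplified sketch of the SMOTE idea: pick a minority point, pick one of
    its k nearest minority neighbours, and sample a point on the segment
    between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy data: 20 "normal" points and 4 "attack" points.
normal = [(float(i), float(i % 3)) for i in range(20)]
attack = [(1.0, 10.0), (2.0, 11.0), (3.0, 10.5), (2.5, 12.0)]

# Oversample the minority class until both classes have the same size.
synthetic_attack = smote(attack, n_synthetic=len(normal) - len(attack), k=3)
balanced_attack = attack + synthetic_attack

# Hypothetical confusion-matrix counts, for illustration only.
metrics = binary_metrics(tp=90, fp=5, fn=10, tn=95)
print(len(balanced_attack), metrics)
```

In practice, oversampling of this kind is normally applied to the training portion only, so that synthetic samples do not leak into the test set and inflate the reported metrics.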
