Investigating Oversampling Techniques to Mitigate Class Imbalance in Network Intrusion Detection Datasets

  • Huy Minh Dinh *
  • Wei Zong
  • Yang-Wai Chow

Received: 14 Nov 2025 | Revised: 21 Dec 2025 | Accepted: 09 Jan 2026 | Published: 27 Jan 2026

Abstract

Network Intrusion Detection Systems (NIDS) play a crucial role in safeguarding computer networks against increasingly sophisticated cyber threats. However, the performance of machine learning–based NIDS is often constrained by severe class imbalance: benign traffic dominates and rare attack types are underrepresented, which biases learning and reduces detection of the minority classes that are most critical to identify. This study presents a comprehensive comparative analysis of traditional and deep learning–based oversampling methods for mitigating class imbalance and improving classification performance in NIDS. The evaluation is conducted on the UNSW-NB15 and TON_IoT benchmark datasets using a range of machine learning and deep learning classifiers, with performance assessed using metrics suitable for imbalanced data. Results show that traditional and hybrid oversampling methods provide stable and interpretable improvements, whereas deep generative approaches show strong potential but greater variability across classifiers. On the UNSW-NB15 dataset, severe class imbalance and class overlap limit the gains from resampling, while on the TON_IoT dataset, classifiers achieve strong baselines even without oversampling. XGBoost performs robustly and reliably across both datasets. Overall, KMeans-SMOTE, SMOTE-NC, and CVAE emerge as the most effective oversampling techniques under varying conditions. This study highlights the trade-offs between interpretability, stability, and detection performance, offering practical guidance for selecting oversampling strategies to improve rare attack detection in cybersecurity applications.
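
To make the idea of oversampling concrete, the following is a minimal sketch of SMOTE-style interpolation, the mechanism underlying several of the traditional methods named above. It is an illustration only and not the implementation used in the study: the function name `smote_sketch`, its parameters, and the toy feature values are all hypothetical, and it assumes purely continuous features (variants such as SMOTE-NC exist precisely because real traffic data also contains categorical fields).

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between `base` and `other`
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data: three rare "attack" flows with two made-up numeric features.
attacks = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
new_points = smote_sketch(attacks, n_new=5)
```

Because every synthetic point lies between two existing minority samples, the method cannot place samples outside the region the minority class already occupies, which is one plausible reason resampling helps less when classes overlap heavily.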

How to Cite
Dinh, H. M.; Zong, W.; Chow, Y.-W. Investigating Oversampling Techniques to Mitigate Class Imbalance in Network Intrusion Detection Datasets. Pragmatic Cybersecurity 2026, 1 (1), 4.
Copyright & License
Copyright (c) 2026 by the authors.