Classifying Imbalanced Data for DDoS Attack Detection

Citation

Alghamdi, Amal Ali. (2021-12). Classifying Imbalanced Data for DDoS Attack Detection. Theses and Dissertations Collection, University of Idaho Library Digital Collections. https://www.lib.uidaho.edu/digital/etd/items/alghamdi_idaho_0089n_12287.html

Title:
Classifying Imbalanced Data for DDoS Attack Detection
Author:
Alghamdi, Amal Ali
Date:
2021-12
Embargo Remove Date:
2023-12-21
Program:
Computer Science
Subject Category:
Computer science
Abstract:

In the first quarter of 2021, researchers witnessed over 2.8 million DDoS attacks, a 32% increase from the same period in 2020, as reported by Info-Security magazine on May 18, 2021. The magazine also noted that the number of attacks against educational institutions increased by 41% over the preceding three quarters. DDoS has become a serious issue for many organizations and individuals. The evolution of networks has ushered in a level of complexity that is the enemy of security. Attacks are now more prevalent and, at the same time, more noticeable due to the variety of features that exist on networks, a consequence of the constant escalation between attackers and defenders. Machine learning algorithms (MLAs) have become a tool to help thicken the layers of defense. To be effective, MLAs must be trained in ways that provide high confidence for detection and prevention, which boils down to precision and accuracy (i.e., low false positive and/or high true positive rates). This work developed a setup for establishing a measured intrusion detection system (IDS) that can help to better understand and identify the unique features of a network in order to better prevent DoS and DDoS attacks from succeeding.

The goal is to develop models that can predict (i.e., classify) different types of DoS/DDoS attacks with high precision and accuracy and with low false positive/negative rates. In addition to dealing with the multiclass classification and extreme class imbalance problems, the derived model leverages two feature selection techniques to reduce the number of features in the dataset and improve the model's execution time, thereby reducing the IDS complexity. A combination of under-sampling and class weight adjustment was applied to handle the imbalance problem. The extracted data was evaluated using supervised MLAs, including Random Forest, Decision Tree, Naive Bayes, Logistic Regression, and ensemble methods. Ensemble methods that combine the supervised outcomes aim to improve the overall classification performance. The experiments used the popular benchmark datasets NSL-KDD and CICIDS2017. Random Forest achieved the best performance, reducing training and test time by 37%. In addition to solving the imbalance problem caused by feature selection, it improved accuracy by 6.25% and the false positive rate (FPR) by 21%. The Random Forest model achieved 99% accuracy and a false positive rate of 0.0001. Furthermore, it can detect minority classes with more than 80% accuracy.
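
The abstract does not spell out how under-sampling and weight adjustment were combined, so the following is only a minimal sketch of one common approach, not the thesis's implementation. It caps each class at a fixed number of samples and then computes inverse-frequency class weights, n_samples / (n_classes * class_count), for the remaining data. The function names, the cap of 100, and the class labels are all hypothetical, chosen purely for illustration.

```python
import random
from collections import Counter


def undersample_majority(rows, labels, max_per_class, seed=0):
    """Randomly keep at most max_per_class rows for every class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) > max_per_class:
            # Only over-represented classes are reduced; small classes stay intact.
            members = rng.sample(members, max_per_class)
        out_rows.extend(members)
        out_labels.extend([label] * len(members))
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy, made-up label distribution (not taken from NSL-KDD or CICIDS2017).
labels = ["benign"] * 900 + ["dos_hulk"] * 90 + ["heartbleed"] * 10
rows = [[float(i)] for i in range(len(labels))]

sampled_rows, sampled_labels = undersample_majority(rows, labels, max_per_class=100)
weights = balanced_class_weights(sampled_labels)
```

Weights of this form can typically be passed to a classifier that accepts per-class weights, so that the classes left small after under-sampling still contribute proportionally to the training loss.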

Description:
masters, M.Engr., Computer Science -- University of Idaho - College of Graduate Studies, 2021-12
Major Professor:
Sheldon, Frederick
Committee:
Marshall, Xiaogang; Song, Jia; Soule, Terry
Defense Date:
2021-12
Identifier:
Alghamdi_idaho_0089N_12287
Type:
Text
Format Original:
PDF
Format:
application/pdf

Rights
Rights:
In Copyright - Educational Use Permitted. For more information, please contact University of Idaho Library Special Collections and Archives Department at libspec@uidaho.edu.
Standardized Rights:
http://rightsstatements.org/vocab/InC-EDU/1.0/