Detection of IoT Botnets using Decision Trees
International Data Corporation (IDC) data estimates that 152,200 Internet of things (IoT) devices will be connected to the Internet every minute by the year 2025. This rapid expansion in the utilization of IoT devices in everyday life leads to an increase in the attack surface for cybercriminals. IoT devices are frequently compromised and used for the creation of botnets. However, it is difficult to apply the traditional methods to counteract IoT botnets and thus calls for finding effective and efficient methods to mitigate such threats. In this work, the network snapshots of IoT traffic infected with two botnets, i.e., Mirai and Bashlite, are studied. Specifically, the collected datasets include network traffic from 9 different IoT devices such as baby monitor, doorbells, thermostat, web cameras, and security cameras. Each dataset consists of 115 stream aggregation feature statistics like weight, mean, covariance, correlation coefficient, standard deviation, radius, and magnitude with a timeframe decay factor, along with a class label defining the traffic as benign or anomalous.
The goal of the research is to identify a proper machine learning method that can detect IoT botnet traffic accurately and in real-time on IoT edge devices with low computation power, in order to form the first line of defense in an IoT network. The initial step is to identify the most important features that distinguish between benign and anomalous traffic for IoT devices. Specifically, the Input Perturbation Ranking algorithm with XGBoostis applied to find the 9 most important features among the 115 features. These 9 features can be collected in real time and be applied as inputs to any detection method. Next, a supervised predictive machine learning method, i.e., Decision Trees, is proposed for faster and accurate detection of botnet traffic. The advantage of using decision trees over other machine learning methodologies, is that it achieves accurate results with low computation time and power. Unlike deep learning methodologies, decision trees can provide visual representation of the decision making and detection process. This can be easily translated into explicit security policies in the IoT environment. In the experiments conducted, it can be clearly seen that decision trees can detect anomalous traffic with an accuracy of 99.997% and takes 59 seconds for training and 0.068 seconds for prediction, which is much faster than the state-of-art deep-learning based detector, i.e., Kitsune. Moreover, our results show that decision trees have an extremely low false positive rate of 0.019%. Using the 9 most important features, decision trees can further reduce the processing time while maintaining the accuracy. Hence, decision trees with important features are able to accurately and efficiently detect IoT botnets in real time and on a low performance edge device such as Raspberry Pi.