Statistics reveal a sharp increase in cyberattacks, leaving technology businesses more susceptible to data loss. With the growing application of machine learning across domains, studies have focused on building cognitive models for traffic anomaly detection in communication networks. These studies have produced datasets of network traffic data packets, usually captured with software such as Wireshark. The datasets contain high-dimensional data corresponding to benign data packets and attack data packets from known attacks. Recent research has mainly focused on developing machine learning architectures that can extract useful information from high-dimensional datasets to detect attack data packets in a network. Moreover, machine learning algorithms are currently trained to detect only documented attacks for which training data is available. With the proliferation of new cyberattacks and zero-day attacks, for which little to no training data exists, currently employed algorithms have little to no success in detecting new attacks. In this thesis, we focus on detecting rare attacks using transfer learning from a dataset containing information about known attacks.

The literature offers proof of concept for both classical machine learning and deep learning approaches to anomaly detection. We show that a deep learning approach outperforms approaches based on explicit statistical modeling by at least 21% on the dataset used. Before testing for transferability, we perform a preliminary survey of candidate deep learning architectures and propose a Convolutional Neural Network (CNN) architecture that is 99.65% accurate in classifying attack data packets.

To test for transferability, we train the proposed CNN architecture on a known attack and test its performance on attacks that are unknown to the network. To extract adequate information for transferability, the model requires a higher representation of attack data in the training dataset; attack data currently comprises only 20% of the dataset. To overcome the problem of small training sets, we employ several techniques to boost the number of attack data packets, including training on a novel synthetic dataset and training on a bootstrapped dataset.
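
As an illustration of the bootstrapping idea, the following minimal sketch resamples attack rows with replacement until they reach a chosen share of the training set. It is not the procedure used in this thesis: the function name `bootstrap_attack_rows`, the `target_fraction` parameter, and the toy 80/20 split are assumptions introduced here for the example only.

```python
import random


def bootstrap_attack_rows(rows, labels, attack_label=1, target_fraction=0.5, seed=0):
    """Resample attack rows with replacement until they form target_fraction of the data.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must lie strictly between 0 and 1")

    rng = random.Random(seed)  # seeded so the resampling is reproducible
    pairs = list(zip(rows, labels))
    attack = [p for p in pairs if p[1] == attack_label]
    benign = [p for p in pairs if p[1] != attack_label]
    if not attack:
        raise ValueError("no attack rows available to resample")

    # Attack count needed so that attack / (attack + benign) == target_fraction.
    needed = round(target_fraction * len(benign) / (1 - target_fraction))
    extra = max(0, needed - len(attack))

    # Draw the additional attack rows with replacement (the bootstrap step).
    combined = benign + attack + [rng.choice(attack) for _ in range(extra)]
    rng.shuffle(combined)
    return [r for r, _ in combined], [y for _, y in combined]


# Toy data mirroring the 20% attack share mentioned above: 80 benign, 20 attack.
rows = [[float(i)] for i in range(100)]
labels = [0] * 80 + [1] * 20

balanced_rows, balanced_labels = bootstrap_attack_rows(rows, labels, target_fraction=0.5)
attack_share = sum(balanced_labels) / len(balanced_labels)
print(len(balanced_labels), attack_share)
```

Resampling of this kind should be applied to the training split only, so that duplicated attack packets do not leak into the test data.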

Our study identifies training-testing attack pairs that show high learning transferability. Most of the strong and consistent correlations are observed among Denial of Service (DoS) training-testing attack pairs. Furthermore, we propose hypotheses for model generalization. Our results are validated by a study of dataset features and attack characteristics using the Recursive Feature Elimination (RFE) algorithm.
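
For readers unfamiliar with RFE, the sketch below shows how the algorithm ranks features by repeatedly fitting an estimator and discarding the weakest feature. It assumes scikit-learn is available and uses a synthetic dataset with a logistic regression estimator; the estimator, the dataset, and the number of retained features are assumptions made for this example and are not the configuration used in this thesis.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a traffic dataset: 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# RFE fits the estimator, drops the lowest-weighted feature (step=1),
# and repeats until only n_features_to_select features remain.
estimator = LogisticRegression(max_iter=1000)
selector = RFE(estimator, n_features_to_select=3, step=1)
selector.fit(X, y)

# support_ marks the retained features; ranking_ assigns rank 1 to each of them
# and larger ranks to features that were eliminated earlier.
selected = [i for i, keep in enumerate(selector.support_) if keep]
ranking = selector.ranking_.tolist()
print("selected feature indices:", selected)
print("feature ranking:", ranking)
```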