Malicious URLs pose serious cyber-security threats to Internet users. It is critical to detect malicious URLs so that they can be blocked before users access them. In the past few years, several techniques have been proposed to differentiate malicious URLs from benign ones with the help of machine learning. Machine learning algorithms learn trends and patterns in a data-set and use them to identify anomalies. In this work, we attempt to find generic features for detecting malicious URLs by analyzing two publicly available malicious URL data-sets. To this end, we identify a list of substantial features that can be used to classify all types of malicious URLs. Then, we select the most significant lexical features by using Chi-Square and ANOVA based statistical tests. The effectiveness of these feature sets is then tested by using a combination of single and ensemble machine learning algorithms. We build a machine learning based real-time malicious URL detection system as a web service to detect malicious URLs in a browser. We implement a Chrome extension that intercepts a browser’s URL requests and sends them to the web service for analysis. We also implement the web service, which classifies a URL as benign or malicious using the saved ML model. Finally, we evaluate the performance of the web service to test whether it is scalable.
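As a rough illustration of the lexical-feature and Chi-Square steps mentioned above, the following minimal sketch extracts a few features from a URL string and scores a binarized feature against benign/malicious labels with a Pearson Chi-Square statistic. The feature names, the function names `lexical_features` and `chi_square_2x2`, and the binarization are assumptions made for this example; the abstract does not list the actual features or say how the tests were applied.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Extract a few illustrative lexical features (hypothetical set)."""
    # Add a scheme when missing so urlparse can locate the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(c.isdigit() for c in url),
        "num_special": sum(c in "-_@?=&%" for c in url),
        # 1 when the host looks like a dotted IPv4 address, else 0.
        "has_ip_host": int(host.count(".") == 3
                           and host.replace(".", "").isdigit()),
    }


def chi_square_2x2(feature, labels) -> float:
    """Pearson Chi-Square statistic for a binary feature vs. binary labels.

    Both arguments are sequences of 0/1 values of equal length. A larger
    value suggests a stronger association between feature and label.
    """
    n = len(feature)
    if n == 0:
        return 0.0
    observed = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        observed[f][y] += 1
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected == 0:
                # Degenerate table (constant feature or label): no signal.
                return 0.0
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat
```

In such a sketch, features would be ranked by their score and only the highest-scoring ones passed on to the classifier; a continuous feature such as `url_length` would first need to be binned or thresholded before this 2x2 test applies.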
Degree Type
Master of Science in Electrical and Computer Engineering