USER ATTRIBUTION IN DIGITAL FORENSICS THROUGH MODELING KEYSTROKE AND MOUSE USAGE DATA USING XGBOOST
The growing use of digital devices has vastly increased the amount of data in use and, consequently, the availability and relevance of digital evidence. Typically, digital evidence helps to establish the identity of an offender by identifying the username or user account logged into the device at the time of the offense. Investigating officers must then establish the link between that user account and an actual person, which is difficult when computers are shared or compromised. In addition, the increasing amount of data in digital investigations necessitates advanced data analysis approaches such as machine learning, while keeping pace with constantly evolving techniques. It also requires reporting known error rates for these advanced techniques. Several research studies have explored the use of behavioral biometrics to support user attribution in digital forensics. However, the use of the state-of-the-art XGBoost algorithm has not yet been explored. This study builds on previous research by modeling user interaction with the XGBoost algorithm, using features related to keystroke and mouse usage, and verifying its performance for user attribution. With an F1 score and an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.95, the algorithm successfully attributes user events to the correct user. The XGBoost model also outperforms classifiers based on other algorithms, such as Support Vector Machines (SVM), Boosted SVM, and Random Forest.
- Doctor of Philosophy
- Computer and Information Technology
- West Lafayette