Purdue University Graduate School

File(s) under embargo

Reason: I am trying to get a paper published on the same work at a conference.

Data-based Explanations of Random Forest using Machine Unlearning

posted on 2023-12-03, 07:23 authored by Tanmay Laxman Surve

Tree-based machine learning models, such as decision trees and random forests, are among the most widely used machine learning models, primarily because of their predictive power in supervised learning tasks and their ease of interpretation. Despite their popularity and power, these models have been found to produce unexpected or discriminatory behavior. Given their overwhelming success on most tasks, it is of interest to identify the root causes of this unexpected and discriminatory behavior. However, little work has been done on understanding and debugging tree-based classifiers in the context of fairness. We introduce FairDebugger, a system that uses recent advances in machine unlearning research to determine the training data subsets responsible for model unfairness. Given a tree-based model learned on a training dataset, FairDebugger identifies the top-k training data subsets responsible for model unfairness, or bias, by measuring the change in model parameters when parts of the underlying training data are removed. We describe the architecture of FairDebugger and walk through real-world use cases to demonstrate how FairDebugger detects these patterns and generates explanations for them.
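The idea of ranking training data subsets by their effect on unfairness can be illustrated with a minimal sketch. It is not FairDebugger's implementation: it assumes retraining from scratch in place of machine unlearning, a one-threshold decision stump (`fit_stump`) in place of a random forest, and the statistical parity gap on a held-out set as the unfairness measure, rather than a change in model parameters. All function names, field names, and data below are hypothetical.

```python
from collections import defaultdict

def fit_stump(rows):
    """Hypothetical stand-in model: one threshold on `score` minimizing training error."""
    best_t, best_err = None, None
    for t in sorted({r["score"] for r in rows}):
        err = sum((r["score"] >= t) != bool(r["label"]) for r in rows)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda r: int(r["score"] >= best_t)

def parity_gap(predict, rows):
    """Difference between the highest and lowest positive-prediction rate across groups."""
    pos, tot = defaultdict(int), defaultdict(int)
    for r in rows:
        tot[r["group"]] += 1
        pos[r["group"]] += predict(r)
    rates = [pos[g] / tot[g] for g in tot]
    return max(rates) - min(rates)

def top_k_subsets(train, test, subsets, fit, k):
    """Rank candidate subsets by how much removing them reduces the parity gap.

    Assumption: each removal is handled by retraining from scratch; an
    unlearning method would update the existing model instead.
    """
    base = parity_gap(fit(train), test)
    scored = []
    for name, in_subset in subsets.items():
        kept = [r for r in train if not in_subset(r)]
        scored.append((base - parity_gap(fit(kept), test), name))
    scored.sort(reverse=True)
    return scored[:k]

# Toy training data: source "B" contributes rows labeled 0 despite mid-range scores.
train = (
    [{"group": 0, "source": "A", "score": s, "label": int(s >= 3)} for s in range(1, 7)]
    + [{"group": 1, "source": "B", "score": s, "label": 0} for s in (3, 3, 4, 4)]
)
# Toy held-out data: group 1 has lower scores, so a higher threshold hurts it more.
test = (
    [{"group": 0, "score": s} for s in (5, 6, 5, 6)]
    + [{"group": 1, "score": s} for s in (3, 4, 5, 6)]
)
# Candidate subsets, expressed as predicates over training rows.
subsets = {
    "source=B": lambda r: r["source"] == "B",
    "score<=2": lambda r: r["score"] <= 2,
}

ranked = top_k_subsets(train, test, subsets, fit_stump, k=1)
print(ranked)
```

In this toy setup, each entry of `ranked` pairs a subset name with the reduction in parity gap obtained by removing that subset. Retraining once per candidate is the costly step that an unlearning-based approach is meant to avoid.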


Degree Type

  • Master of Science

Department

  • Computer and Information Technology

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Romila Pradhan

Additional Committee Member 2

Julia Rayz

Additional Committee Member 3

John A Springer