Integrating Automated Reasoning with Machine Learning for Structured Prediction and Scientific Discovery
Structured data are ubiquitous in our daily lives. However, decision-making with models learned from such data remains a significant challenge when Machine Learning (ML) and Automated Reasoning (AR) are applied in isolation. Learning without reasoning often fails to generate outputs that satisfy combinatorial constraints, while reasoning without learning often yields rigid models that cannot adapt to evolving environments. Integrating ML and AR is therefore essential, yet it remains a largely unsolved problem.
My research focuses on embedding automated reasoning into machine learning to tackle challenging problems in structured prediction and AI-driven scientific discovery. My models generate valid outputs that satisfy complex combinatorial constraints, substantially outperforming purely learning-based approaches. Moreover, these models adapt to evolving training data distributions, addressing the limitations of purely reasoning-based algorithms. My learning algorithms come with tight theoretical guarantees and deliver significant empirical improvements in accuracy. Specifically, my contributions are:
(a) Combining AR and ML to ensure constraint satisfaction in machine learning: By embedding AR solvers as differentiable layers in neural network-based ML models, my work ensures that predicted outputs satisfy the required constraints across a variety of structured learning problems in operations research, combinatorial optimization, and natural language processing. Notably, in a data-driven vehicle dispatching task, our approach generates routes that satisfy all constraints 100% of the time, whereas fewer than 1% of the routes produced by previous approaches are valid.
(b) Combining AR and ML to accelerate AI-driven scientific discovery: By integrating reasoning inspired by the scientific method, my work accelerates the discovery of physical knowledge from experimental data. My approach significantly extends the capabilities of existing methods on datasets with multiple independent variables: it successfully discovers ground-truth scientific expressions involving up to 50 variables, whereas previous approaches struggle with equations of just three variables.
My vision is to build an AI ecosystem with safety and robustness guarantees by encoding physical knowledge and operational constraints into machine learning models through automated reasoning. At the same time, I aim to enable the automatic discovery of knowledge and constraints from data, creating systems that are adaptive, reliable, and capable of addressing complex real-world challenges.
Funding
This research was supported by NSF grant CCF-1918327, NSF CAREER Award IIS-2339844, and DOE Fusion Energy Sciences grant DE-SC0024583.
History
Degree Type
- Doctor of Philosophy
Department
- Computer Science
Campus location
- West Lafayette