Purdue University Graduate School

Integrating Automated Reasoning with Machine Learning for Structured Prediction and Scientific Discovery

thesis
posted on 2025-05-14, 18:43 authored by Nan Jiang

Structured data are ubiquitous in our daily lives. However, decision-making with models learned from such data remains a significant challenge when Machine Learning (ML) and Automated Reasoning (AR) are applied in isolation. Learning without reasoning often fails to generate outputs that satisfy combinatorial constraints, while reasoning without learning often yields rigid models that lack the flexibility to adapt to evolving environments. Integrating ML and AR is essential but remains largely unsolved.

My research focuses on embedding automated reasoning into machine learning to tackle challenging problems in structured prediction and AI-driven scientific discovery. My models generate valid outputs that satisfy complex combinatorial constraints, greatly surpassing pure learning-based approaches. Moreover, these models adapt to evolving training data distributions, addressing the limitations of pure reasoning algorithms. My learning algorithms offer tight theoretical guarantees and demonstrate substantial empirical improvements in accuracy. Specifically, my contributions are:

(a) Combining AR and ML to ensure constraint satisfaction in machine learning: By embedding AR solvers as differentiable layers in neural network-based ML models, my work ensures that predicted outputs satisfy the constraints across a variety of structured learning problems in operations research, combinatorial optimization, and natural language processing. Notably, in a data-driven vehicle dispatching task, 100% of the routes generated by my approach satisfy the constraints, whereas previous approaches produce fewer than 1% valid routes.

(b) Combining AR and ML to accelerate AI-driven scientific discovery: By integrating reasoning inspired by the scientific approach, my work accelerates the discovery of physical knowledge from experimental data. My approach significantly extends the capabilities of existing methods on datasets with multiple independent variables: it successfully discovers ground-truth scientific expressions involving up to 50 variables, whereas previous approaches struggle with equations of just three variables.

My vision is to build an AI ecosystem with safety and robustness guarantees, encoding physical knowledge and operational constraints into machine learning models through automated reasoning. Meanwhile, I aim to enable the discovery of knowledge and constraints automatically from data, creating systems that are adaptive, reliable, and capable of addressing complex real-world challenges.

Funding

This research was supported by NSF grant CCF-1918327, NSF CAREER Award IIS-2339844, and DOE Fusion Energy Sciences grant DE-SC0024583.

Degree Type

  • Doctor of Philosophy

Department

  • Computer Science

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Yexiang Xue

Additional Committee Member 2

Willem-Jan Van Hoeve

Additional Committee Member 3

Jean Honorio
