Purdue University Graduate School
Trustworthy Reinforcement Learning for Dynamic Pricing and Large Language Model Alignment

thesis
posted on 2025-07-09, 14:12 authored by Pangpang Liu
As personalized decision-making systems become more prevalent, ensuring their trustworthiness, in terms of robustness, fairness, and alignment with human intent, has become increasingly critical. This dissertation investigates trustworthy reinforcement learning (RL) in the context of dynamic pricing and human-in-the-loop learning.

In Chapter 2, we study the challenge of strategic buyer behavior in personalized pricing, where users may manipulate their reported features to obtain lower prices. Such strategic manipulation can significantly degrade a seller's revenue and undermine system integrity. We propose a dynamic pricing policy that anticipates this adversarial behavior and adapts accordingly, ensuring robust revenue performance under strategic interactions.

In Chapter 3, we extend the study of dynamic pricing by introducing fairness constraints. Contextual pricing decisions that produce group-based disparities, such as those by race or gender, can trigger negative perceptions or violate the law. We design a fairness-aware pricing framework that balances revenue optimization with social responsibility, even when buyers strategically manipulate their sensitive attributes.

In Chapter 4, we address the challenges of data efficiency and alignment in reinforcement learning from human feedback (RLHF), a key component in aligning large language models (LLMs) with human preferences. Because human feedback is often costly and noisy, we propose a dual active learning framework that identifies the most informative data points and labelers. We further develop a pessimistic RL approach that ensures safe and reliable policy learning from uncertain feedback.

Together, these three lines of work contribute to the foundation of trustworthy reinforcement learning in real-world decision-making systems, spanning economic robustness, ethical fairness, and human-centric learning.

History

Degree Type

  • Doctor of Philosophy

Department

  • Management

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Will Wei Sun

Additional Committee Member 2

J. George Shanthikumar

Additional Committee Member 3

Yichen Zhang

Additional Committee Member 4

Weibin Mo
