Purdue University Graduate School

MODEL-FREE ALGORITHMS FOR CONSTRAINED REINFORCEMENT LEARNING IN DISCOUNTED AND AVERAGE REWARD SETTINGS

thesis
posted on 2024-10-07, 17:12 authored by Qinbo Bai

Reinforcement learning (RL), which aims to train an agent to maximize its accumulated reward over time, has attracted much attention in recent years. Mathematically, RL is modeled as a Markov Decision Process, in which the agent interacts with the environment step by step. In practice, RL has been applied to autonomous driving, robotics, recommendation systems, and financial management. Although RL has been studied extensively in the literature, most proposed algorithms are model-based and therefore require estimating the transition kernel. Motivated by this, we study sample-efficient model-free algorithms under different settings.

First, we propose a conservative stochastic primal-dual algorithm for the infinite-horizon discounted reward setting. The proposed algorithm reformulates the original problem in the occupancy-measure space rather than the policy space, which turns the non-convex problem into a linear one. We then advocate the use of a randomized primal-dual approach to achieve $O(\epsilon^{-2})$ sample complexity, which matches the lower bound.
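To illustrate why the occupancy-measure view is linear, the textbook linear program for a constrained discounted MDP can be written as below. This is a standard formulation given only as a sketch; the symbols ($\lambda$ for the normalized occupancy measure, $r$ for the reward, $g_i$ for the constraint functions, $\rho$ for the initial state distribution, $\gamma$ for the discount factor, $P$ for the transition kernel) are assumptions and may not match the notation or the exact constraint form used in the thesis.

```latex
\begin{aligned}
\max_{\lambda \ge 0} \quad & \sum_{s,a} \lambda(s,a)\, r(s,a) \\
\text{s.t.} \quad & \sum_{s,a} \lambda(s,a)\, g_i(s,a) \ge 0, \quad i = 1, \dots, I, \\
& \sum_{a} \lambda(s',a) = (1-\gamma)\,\rho(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, \lambda(s,a) \quad \text{for all } s'.
\end{aligned}
```

Both the objective and the constraints are linear in $\lambda$, and a policy can be recovered from a feasible $\lambda$ via $\pi(a \mid s) = \lambda(s,a) / \sum_{a'} \lambda(s,a')$ wherever the denominator is positive.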

In the infinite-horizon average reward setting, however, the problem becomes more challenging: the interaction with the environment never ends and cannot be reset, so the reward samples are no longer independent. To address this, we design an epoch-based policy-gradient algorithm. In each epoch, the whole trajectory is divided into multiple sub-trajectories, with an interval between each consecutive pair. These intervals are long enough that the reward samples are asymptotically independent. By controlling the lengths of the trajectory and the intervals, we obtain a good gradient estimator and prove that the proposed algorithm achieves an $O(T^{3/4})$ regret bound.
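The sub-trajectory splitting described above can be sketched in a few lines of Python. This is only an illustration of the index bookkeeping under stated assumptions: the function names and the parameters `sub_length` and `gap` are hypothetical, and the per-segment reward average stands in for whatever gradient estimator the thesis actually uses.

```python
from typing import List, Sequence, Tuple


def split_into_subtrajectories(
    epoch_length: int, sub_length: int, gap: int
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of the sub-trajectories.

    Each range covers `sub_length` consecutive steps, and `gap` steps are
    skipped between consecutive ranges (the "interval" in the text).
    All names here are hypothetical, chosen for illustration only.
    """
    if sub_length <= 0 or gap < 0:
        raise ValueError("sub_length must be positive and gap non-negative")
    ranges = []
    start = 0
    # Keep only sub-trajectories that fit entirely inside the epoch.
    while start + sub_length <= epoch_length:
        ranges.append((start, start + sub_length))
        start += sub_length + gap
    return ranges


def segment_reward_averages(
    rewards: Sequence[float], sub_length: int, gap: int
) -> List[float]:
    """Average the rewards inside each sub-trajectory of one epoch."""
    ranges = split_into_subtrajectories(len(rewards), sub_length, gap)
    return [sum(rewards[a:b]) / (b - a) for a, b in ranges]
```

For example, an epoch of 10 steps with `sub_length=2` and `gap=2` uses steps 0-1, 4-5, and 8-9 and discards the steps in between. A larger `gap` weakens the dependence between segments at the cost of discarding more samples.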

Degree Type

  • Doctor of Philosophy

Department

  • Electrical and Computer Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Vaneet Aggarwal

Additional Committee Member 2

Xiaojun Lin

Additional Committee Member 3

Christopher G. Brinton

Additional Committee Member 4

Amrit Singh Bedi
