REINFORCEMENT LEARNING FOR CONCAVE OBJECTIVES AND CONVEX CONSTRAINTS
Formulating RL with MDPs work typically works for a single objective, and hence, they are not readily applicable where the policies need to optimize multiple objectives or to satisfy certain constraints while maximizing one or multiple objectives, which can often be conflicting. Further, many applications such as robotics or autonomous driving do not allow for violating constraints even during the training process. Currently, existing algorithms do not simultaneously combine multiple objectives and zero-constraint violations, sample efficiency, and computational complexity. To this end, we study sample efficient Reinforcement Learning with concave objective and convex constraints, where an agent maximizes a concave, Lipschitz continuous function of multiple objectives while satisfying a convex cost objective. For this setup, we provide a posterior sampling algorithm which works with a convex optimization problem to solve for the stationary distribution of the states and actions. Further, using our Bellman error based analysis, we show that the algorithm obtains a near-optimal Bayesian regret bound for the number of interaction with the environment. Moreover, with an assumption of existence of slack policies, we design an algorithm that solves for conservative policies which does not violate constraints and still achieves the near-optimal regret bound. We also show that the algorithm performs significantly better than the existing algorithm for MDPs with finite states and finite actions.
History
Degree Type
- Doctor of Philosophy
Department
- Electrical and Computer Engineering
Campus location
- West Lafayette