Reinforcement Learning-based Human Operator Decision Support Agent for Highly Transient Industrial Processes
Most industrial processes are not fully automated. Although reference tracking can be handled by low-level controllers, initializing and adjusting the reference, or setpoint, values are tasks commonly assigned to human operators. A major challenge that arises, though, is control policy variation among operators, which in turn results in inconsistencies in the final product. To guide operators toward better and more consistent performance, researchers have explored optimal control policies through a variety of approaches. Although the approaches differ across applications, an accurate process model remains crucial to nearly all of them. However, for a highly transient process (e.g., the startup of a manufacturing process), modeling is challenging and often inaccurate, so approaches that rely heavily on a process model may not work well. One such example, the startup of a twin-roll steel strip casting process, motivates this work.
In this dissertation, I propose three offline reinforcement learning (RL) algorithms in which the RL agent learns a control policy from a fixed dataset pre-collected by human operators during operation of the twin-roll casting process. In contrast to existing offline RL algorithms, the proposed algorithms focus on exploiting the best control policy already used by human operators rather than exploring new control policies constrained to stay near the existing ones. Moreover, existing offline RL algorithms give insufficient consideration to the imbalanced-dataset problem. In the second and third proposed algorithms, I leverage the idea of cost-sensitive learning to incentivize the RL agent to learn the most valuable control policy rather than the one most commonly represented in the dataset. Finally, since a process model is not available, I propose a performance metric that requires neither a process model nor a simulator for agent testing. The third proposed algorithm is compared with benchmark offline RL algorithms and achieves better and more consistent performance.
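The abstract does not give implementation details, but the cost-sensitive idea can be illustrated with a minimal return-weighted behavior-cloning sketch. Everything below (the tensor shapes, the softmax weighting, the network architecture) is an illustrative assumption, not the dissertation's actual algorithm:

```python
import torch
import torch.nn as nn

# Hypothetical offline dataset logged from operator runs:
# 8 process measurements per state and 2 setpoint adjustments per
# action are assumed purely for illustration.
states = torch.randn(1024, 8)
actions = torch.randn(1024, 2)
returns = torch.randn(1024)  # per-transition episode quality score (assumed)

# Cost-sensitive weights: a softmax over returns up-weights transitions
# from high-value episodes, so the agent imitates the *best* operator
# behavior rather than the behavior most common in the dataset.
temperature = 1.0
weights = torch.softmax(returns / temperature, dim=0) * len(returns)

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(200):
    pred = policy(states)
    per_sample = ((pred - actions) ** 2).mean(dim=1)  # imitation loss
    loss = (weights * per_sample).mean()              # value-weighted
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Under this kind of scheme, rare but high-value operator maneuvers receive large weights, which is one common way cost-sensitive learning counteracts dataset imbalance.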
Funding
- Castrip
Degree Type
- Doctor of Philosophy
Department
- Mechanical Engineering
Campus location
- West Lafayette