Social Reinforcement Learning
Many real-world applications involve a large number of interacting agents, e.g., viral marketing, personalized teaching, healthcare, recommendation systems, and online communication platforms. However, much of the existing work in Multi-Agent Reinforcement Learning (MARL) focuses on a small number of agents. The standard decentralized approach of training a complex model for each user is impractical for thousands of agents. Centralized learning is also infeasible due to the curse of dimensionality and the exponential growth of joint representations. There is an opportunity to exploit the interactions and correlations between agents to develop RL approaches that scale to a large number of agents. However, user interactions are typically sparse. In this dissertation, we define Social Reinforcement Learning as a subclass of MARL for domains with a large number of agents and relatively few (sparse) relations and interactions between them.
We consider the important task of fake news mitigation as an example to demonstrate the real-world applicability of our proposed Social RL approaches. First, we propose a centralized Social RL approach to estimate the incentives (interventions) required to promote the spread of true news in a social network, in order to mitigate the impact of fake news. We model news diffusion as a Multivariate Hawkes Process (MHP) and make interventions that are learnt via policy optimization in a Markov Decision Process (MDP). The key insight is to estimate the response a user will get from the social network upon sharing a post, as it indicates her impact on news diffusion and thus helps allocate incentives efficiently. Second, we develop an efficient centralized Social RL approach to address the challenges of computational complexity (associated with a large number of agents) and sparse interaction data. Our key idea is to reduce the model size by dynamically clustering users based on their payoff and contribution to the goal. Lastly, the centralized approaches proposed above can be applied when the environment is fully observable to all agents, with a common system shared between all agents. To develop solutions for scenarios where agents receive only a partial view of the environment and can also have separate individual goals, we propose a Social RL approach that is more decentralized. Our key idea is to use sequential parameter sharing and ego-network extrapolation to incorporate agent correlations and improve estimates of the partially hidden system information.
We evaluate our proposed approaches on two Twitter datasets. Our centralized learning methods outperform alternatives that do not consider estimates of user feedback when learning how to allocate incentives. Furthermore, by clustering users, we achieve faster convergence and learn more accurate estimates than baselines that do not model agent correlations or that use only static clusters. Additionally, our decentralized learning approach achieves performance equivalent to that of the centralized learning approach, and superior performance to other baselines that either assume complete system information is available to an agent or use other estimates of the hidden environment state.
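To make the Multivariate Hawkes Process mentioned above more concrete, the following is a minimal sketch of how the sharing intensity of each user could be computed from past events. It is an illustration under stated assumptions, not the dissertation's implementation: the exponential kernel, the function name hawkes_intensity, and the parameters mu, alpha, omega, and incentive are all hypothetical choices made here. In particular, modeling an intervention as an additive boost to a user's base rate is only one plausible way to connect incentives to the process.

```python
import numpy as np


def hawkes_intensity(t, events, mu, alpha, omega, incentive=None):
    """Conditional intensity of every user at time t (illustrative sketch).

    events    : list of (time, user_index) pairs for past shares
    mu        : base rates, shape (n_users,)
    alpha     : alpha[i, j] is how strongly a share by user j excites user i
    omega     : decay rate of the (assumed) exponential kernel
    incentive : optional per-user boost to the base rate (an assumption
                made for this example, not taken from the source)
    """
    lam = np.array(mu, dtype=float)
    if incentive is not None:
        lam = lam + np.asarray(incentive, dtype=float)
    for t_k, j in events:
        if t_k < t:
            # Each past share by user j raises the intensity of the users
            # it influences, with an effect that decays over time.
            lam = lam + alpha[:, j] * np.exp(-omega * (t - t_k))
    return lam


# Two users; user 1 shared a post at time 0.0.
mu = np.array([0.1, 0.2])
alpha = np.array([[0.0, 0.5],
                  [0.3, 0.0]])
history = [(0.0, 1)]

baseline = hawkes_intensity(1.0, history, mu, alpha, omega=1.0)
boosted = hawkes_intensity(1.0, history, mu, alpha, omega=1.0,
                           incentive=np.array([0.05, 0.0]))
print(baseline, boosted)
```

In this sketch the matrix alpha plays the role of the social network: a sparse alpha corresponds to the sparse relations between agents that Social RL is defined for.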