## File(s) under embargo

## 3

day(s)until file(s) become available

# Machine Learning Algorithms for Influence Maximization on Social Networks

With an increasing number of users spending time on social media platforms and engaging with family, friends, and influencers within communities of interest (such as in fashion, cooking, gaming, etc.), there are significant opportunities for marketing firms to leverage word-of-mouth advertising on these platforms. In particular, marketing firms can select sets of influencers within relevant communities to sponsor, namely by providing free product samples to those influencers so that so they will discuss and promote the product on their social media accounts.

The question of which set of influencers to sponsor is known as **influence maximization** (IM) formally defined as follows: "if we can try to convince a subset of individuals in a social network to adopt a new product or innovation, and the goal is to trigger a large cascade of further adoptions, which set of individuals should we target?'' Under standard diffusion models, this optimization problem is known to be NP-hard. This problem has been widely studied in the literature and several approaches for solving it have been proposed. Some approaches provide near-optimal solutions but are costly in terms of runtime. On the other hand, some approaches are faster but heuristics, i.e., do not have approximation guarantees.

In this dissertation, we study the influence maximization problem extensively. We provide efficient algorithms for solving the original problem and its important generalizations. Furthermore, we provide theoretical guarantees and experimental evaluations to support the claims made in this dissertation.

We first study the original IM problem referred to as the discrete influence maximization (DIM) problem where the marketer can either provide a free sample to an influencer or not, i.e., they cannot give fractional discounts like 10% off, etc. As already mentioned the existing solution methods (for instance, the simulation-based greedy algorithm) provide near-optimal solutions that are costly in terms of runtime and the approaches that are faster do not have approximation guarantees. Motivated by the idea of addressing this trade-off between accuracy and runtime, we propose a community-aware divide-and-conquer framework to provide a time-efficient solution to the DIM problem. The proposed framework outperforms the standard methods in terms of runtime and the heuristic methods in terms of influence.

We next study a natural extension of the DIM problem referred to as the fractional influence maximization (FIM) problem where the marketer may offer fractional discounts (as opposed to either providing a free sample to an influencer or not in the DIM problem) to the influencers. Clearly, the FIM problem provides more flexibility to the marketer in allocating the available budget among different influencers. The existing solution methods propose to use a continuous extension of the simulation-based greedy approximation algorithm for solving the DIM problem. This continuous extension suggests greedily building the solution for the given fractional budget by taking small steps through the interior of the feasible region. On the contrary, we first characterize the solution to the FIM problem in terms of the solution to the DIM problem. We then use this characterization to propose an efficient greedy approximation algorithm that only iterates through the corners of the feasible region. This leads to huge savings in terms of runtime compared to the existing methods that suggest iterating through the interior of the feasible region. Furthermore, we provide an approximation guarantee for the proposed greedy algorithm to solve the FIM problem.

Finally, we study another extension of the DIM problem referred to as the online discrete influence maximization (ODIM) problem, where the marketer provides free samples not just once but repeatedly over a given time horizon and the goal is to maximize the cumulative influence over time while receiving instantaneous feedback. The existing solution methods are based on semi-bandit instantaneous feedback where the knowledge of some intermediate aspects of how the influence propagates in the social network is assumed or observed. For instance, which specific individuals became influenced at the intermediate steps during the propagation? However, for social networks with user privacy, this information is not available. Hence, we consider the ODIM problem with full-bandit feedback where no knowledge of the underlying social network or diffusion process is assumed. We note that the ODIM problem is an instance of the stochastic combinatorial multi-armed bandit (CMAB) problem with submodular rewards. To solve the ODIM problem, we provide an efficient algorithm that outperforms the existing methods in terms of influence, and time and space complexities.

Furthermore, we point out the connections of influence maximization with a related problem of disease outbreak prevention and a more general problem of submodular maximization. The methods proposed in this dissertation can also be used to solve those problems.

## Funding

### This material is based upon work supported in part by the National Science Foundation under Grants No. 1742847, 2149588, and 2149617.

## History

## Degree Type

- Doctor of Philosophy

## Department

- Industrial Engineering

## Campus location

- West Lafayette