Statistical Learning and Model Criticism for Networks and Point Processes

Yang, Jiasen

doi:10.25394/PGS.8985725.v1

thesis

posted on 2019-08-16, 16:56, authored by Jiasen Yang

Networks and point processes provide flexible tools for representing and modeling complex dependencies in data arising from various social and physical domains. Graphs, or networks, encode relational dependencies between entities, while point processes characterize temporal or spatial interactions among events.

In the first part of this dissertation, we consider dynamic network data (such as communication networks) in which links connecting pairs of nodes appear continuously over time. We propose latent space point process models to capture two different aspects of the data: (i) communication occurs at a higher rate between individuals with similar latent attributes (i.e., homophily); and (ii) individuals tend to reciprocate communications from others, but in a varied manner. Our framework marries ideas from point process models, including Poisson and Hawkes processes, with ideas from latent space models of static networks. We evaluate our models on several real-world datasets and show that a dual latent space model, which accounts for heterogeneity in both homophily and reciprocity, significantly improves performance in various link prediction and network embedding tasks.
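As a rough illustration of how homophily and reciprocity might enter a single pairwise rate, the sketch below adds a Hawkes-style, exponentially decaying term (triggered by earlier messages in the opposite direction) to a baseline rate that shrinks with latent distance. The functional form, the function name `pairwise_intensity`, and the parameters `base_rate`, `jump`, and `decay` are assumptions made for illustration; the abstract does not give the dissertation's exact parameterization.

```python
import math

def pairwise_intensity(t, z_sender, z_receiver, reply_times,
                       base_rate=0.0, jump=0.5, decay=1.0):
    """Hypothetical rate at time t of messages from sender to receiver.

    z_sender, z_receiver: latent positions (sequences of floats).
    reply_times: times of earlier messages from receiver to sender.
    """
    # Homophily: the baseline rate decays with squared latent distance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_sender, z_receiver))
    baseline = math.exp(base_rate - sq_dist)

    # Reciprocity: each past message in the opposite direction adds a
    # bump that fades exponentially (a Hawkes-style excitation term).
    excitation = sum(jump * math.exp(-decay * (t - s))
                     for s in reply_times if s < t)
    return baseline + excitation

# Illustrative latent positions: b is close to a, c is far from a.
z_a, z_b, z_c = (0.0, 0.0), (0.1, 0.0), (2.0, 0.0)

rate_near = pairwise_intensity(5.0, z_a, z_b, [])
rate_far = pairwise_intensity(5.0, z_a, z_c, [])
rate_after_reply = pairwise_intensity(5.0, z_a, z_b, [4.5])
```

Under these assumptions, `rate_near` exceeds `rate_far` (homophily), and `rate_after_reply` exceeds `rate_near` (reciprocity). Letting `jump` depend on a second set of latent positions would be one way to make reciprocity heterogeneous across pairs, in the spirit of the dual latent space model mentioned above.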

In the second part of this dissertation, we develop nonparametric goodness-of-fit tests for discrete distributions and point processes that contain intractable normalization constants, providing the first generally applicable and computationally feasible approaches under those circumstances. Specifically, we propose and characterize Stein operators for discrete distributions, and construct a general Stein operator for point processes using the Papangelou conditional intensity function. Based on the proposed Stein operators, we establish kernelized Stein discrepancy measures for discrete distributions and point processes, which enable us to develop nonparametric goodness-of-fit tests for un-normalized density/intensity functions. We apply the kernelized Stein discrepancy tests to discrete distributions (including network models) as well as temporal and spatial point processes. Our experiments demonstrate that the proposed tests typically outperform two-sample tests based on the maximum mean discrepancy, which, unlike our goodness-of-fit tests, assume the availability of exact samples from the null model.
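To see why a Stein operator can sidestep the normalization constant, consider a one-dimensional discrete example. The sketch below assumes a cyclic difference operator on {0, ..., m-1}, which is one possible construction; the abstract does not specify the operator, and all names here are hypothetical. The score depends only on ratios of unnormalized weights, and the operator applied to any test function has mean zero under the target distribution. It does not implement the kernelized discrepancy or the test itself.

```python
import math

def score(weights, x):
    # Difference-based score: 1 - p(x+1)/p(x), with x+1 taken cyclically.
    # Only a ratio of weights appears, so the normalizer cancels.
    m = len(weights)
    return 1.0 - weights[(x + 1) % m] / weights[x]

def stein_apply(weights, f, x):
    # (A f)(x) = score(x) * f(x) - (f(x) - f(x-1)), with x-1 cyclic.
    m = len(weights)
    return score(weights, x) * f(x) - (f(x) - f((x - 1) % m))

def expectation(probs, g):
    # Exact expectation of g under a pmf given as a list of probabilities.
    return sum(p * g(x) for x, p in enumerate(probs))

# Unnormalized weights of a hypothetical target on {0, 1, 2, 3, 4}.
weights = [3.0, 1.0, 4.0, 1.0, 5.0]
target = [w / sum(weights) for w in weights]
uniform = [1.0 / len(weights)] * len(weights)

def f(x):
    # An arbitrary test function.
    return math.sin(x) + 0.5 * x

# Mean of A f under the target: zero up to rounding (Stein's identity).
mean_under_target = expectation(target, lambda x: stein_apply(weights, f, x))
# Mean of A f under a different distribution: generally nonzero.
mean_under_uniform = expectation(uniform, lambda x: stein_apply(weights, f, x))
```

A Stein discrepancy measures how far such means can deviate from zero over a class of test functions; a kernelized version takes that class to be the unit ball of a reproducing kernel Hilbert space.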

Funding

NSF IIS-1149789, IIS-1546488, IIS-1618690, CCF-0939370

Degree Type

Doctor of Philosophy

Department

Statistics

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Jennifer Neville

Additional Committee Member 2

Vinayak Rao

Additional Committee Member 3

Petros Drineas

Additional Committee Member 4

David Gleich

Additional Committee Member 5

Hao Zhang

Keywords

Network data, Point process, Model criticism, Stein's method, Kernel methods, Goodness-of-fit testing, Statistics

Licence

CC BY 4.0
