The generalization performance of neural networks can be improved by respecting invariances inherent in a problem. For instance, it is often the case that rotating an image

does not change its meaning; the target variable is likely invariant to rotations of the input. Models for graph and set data are typically designed to be permutation-invariant, since the order of set elements and isomorphisms of graphs usually do not change their meaning, but a unified paradigm for both types of data is lacking. Moreover, existing models are either insufficiently flexible or difficult to reliably train in practice. This dissertation presents Janossy pooling (JP), a novel framework for training neural networks that are (approximately) invariant to a prespecified finite group of transformations (e.g., permutations), with a focus on graphs and sets. Motivated by partial exchangeability and Janossy densities, we view transformation-invariant models as averages over all transformations of the input, each passed to a transformation-sensitive function. As the set of transformations can be very large, we advance three approximation strategies: π-Stochastic Gradient Descent (π-SGD), k-ary approximations, and poly-canonical orderings. Compared to state-of-the-art approaches, JP is capable of expressing a richer class of models than Message Passing Graph Neural Networks and does not necessitate the use of highly complex and discontinuous functions for modeling sets effectively. Empirical evidence, including experiments on molecular and protein-protein-interaction datasets, supports the theory we develop and the practical benefits of JP. However, on another note, it may be unclear whether enforcing invariances is appropriate in any given task, such as in the case of time-evolving graphs.

Accordingly, we propose a data-driven scheme where the extent of transformation sensitivity is determined by a regularization hyperparameter. This approach invokes the Birkhoff-von Neumann theorem, making it directly applicable to both graphs and sets, and establishes a link between existing methods and infinitely strong regularization. Finally, we verify this approach experimentally.