Understanding Deep Neural Networks and other Nonparametric Methods in Machine Learning

Xu, Yixi

doi:10.25394/PGS.8085005.v1

thesis

posted on 2019-08-02, 18:57 authored by Yixi Xu

It is a central problem in both statistics and computer science to understand the theoretical foundation of machine learning, especially deep learning. During the past decade, deep learning has achieved remarkable successes in solving many complex artificial intelligence tasks. The aim of this dissertation is to understand deep neural networks (DNNs) and other nonparametric methods in machine learning. In particular, three machine learning models have been studied: weight normalized DNNs, sparse DNNs, and the compositional nonparametric model.

The first chapter presents a general framework for norm-based capacity control for L_{p,q} weight normalized DNNs. We establish an upper bound on the Rademacher complexity of this family. In particular, with an L_{1,∞} normalization, we discuss properties of a width-independent capacity control, which depends on the depth only through a square-root term. Furthermore, if the activation functions are anti-symmetric, the bound on the Rademacher complexity is independent of both the width and the depth up to a log factor. In addition, we study weight normalized deep neural networks with rectified linear units (ReLU) in terms of functional characterization and approximation properties. In particular, for an L_{1,∞} weight normalized network with ReLU, the approximation error can be controlled by the L₁ norm of the output layer.
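As a rough illustration of the kind of norm this chapter refers to, the sketch below computes an L_{p,q} norm of a weight matrix in plain Python. The convention used here (L_p norm of each column, then the L_q norm of those column norms) is an assumption made for illustration and may differ from the definition in the dissertation; the function name `lpq_norm` and the example matrix are hypothetical.

```python
import math

def lpq_norm(weights, p, q):
    """L_{p,q} norm of a matrix given as a list of rows.

    Assumed convention (not confirmed by the abstract): take the L_p norm
    of each column, then the L_q norm of the vector of column norms.
    Either p or q may be math.inf.
    """
    def vec_norm(v, r):
        if r == math.inf:
            return max(abs(x) for x in v)
        return sum(abs(x) ** r for x in v) ** (1.0 / r)

    columns = list(zip(*weights))  # transpose rows into columns
    return vec_norm([vec_norm(col, p) for col in columns], q)

# Hypothetical 2x2 weight matrix.
W = [[0.5, -1.0],
     [-0.25, 2.0]]

l1_inf = lpq_norm(W, 1, math.inf)  # largest column-wise L_1 norm
frobenius = lpq_norm(W, 2, 2)      # p = q = 2 gives the Frobenius norm
```

Under this convention, constraining the L_{1,∞} norm bounds the L₁ norm of every column separately, regardless of how many columns (neurons) the layer has.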

In the second chapter, we study L_{1,∞} weight normalization for deep neural networks with bias neurons to achieve a sparse architecture. We theoretically establish generalization error bounds for both regression and classification under the L_{1,∞} weight normalization. The upper bounds are shown to be independent of the network width and to depend on the network depth k only through a factor of k^{1/2}. These results provide theoretical justification for using such weight normalization to reduce the generalization error. We also develop an easily implemented gradient projection descent algorithm to obtain a sparse neural network in practice. We perform various experiments to validate our theory and demonstrate the effectiveness of the resulting approach.
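The abstract does not spell out the gradient projection descent algorithm, so the following is only a minimal sketch of one common way such a step can be built: take a gradient step, then project each neuron's weight vector onto an L₁ ball using the standard sort-based Euclidean projection. The function names, the column-per-neuron layout, the radius, and the omission of bias terms are all assumptions, not the dissertation's method.

```python
def project_l1_ball(v, radius=1.0):
    """Euclidean projection of vector v onto {x : ||x||_1 <= radius}."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible, nothing to do
    # Sort magnitudes in decreasing order and find the soft threshold theta.
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj - t > 0:
            theta = t  # keep the threshold from the last index that passes
    # Soft-thresholding sets small entries exactly to zero (sparsity).
    return [(1 if x > 0 else -1) * max(abs(x) - theta, 0.0) for x in v]

def projected_gradient_step(columns, grads, lr, radius):
    """Gradient update followed by a per-neuron L_1 projection.

    `columns` and `grads` are lists of weight vectors, one per neuron
    (hypothetical layout chosen for this sketch).
    """
    return [
        project_l1_ball([w - lr * g for w, g in zip(col, grad)], radius)
        for col, grad in zip(columns, grads)
    ]

# Hypothetical single neuron with three incoming weights.
new_weights = projected_gradient_step(
    [[3.0, 1.0, -0.5]], [[0.0, 0.0, 0.0]], lr=0.1, radius=2.0
)
```

The projection zeroes out the smaller weights exactly, which is one way a norm constraint of this type can yield a sparse network.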

In the third chapter, we propose a compositional nonparametric method in which a model is expressed as a labeled binary tree of 2k+1 nodes, where each node is either a summation, a multiplication, or the application of one of the q basis functions to one of the m₁ covariates. We show that in order to recover a labeled binary tree from a given dataset, the sufficient number of samples is O(k log(m₁q) + log(k!)), and the necessary number of samples is Ω(k log(m₁q) − log(k!)). We further propose a greedy algorithm for regression in order to validate our theoretical findings through synthetic experiments.
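To make the tree representation concrete, here is a small sketch of how such a labeled binary tree could be encoded and evaluated. The tuple encoding, the choice of `sin` and `exp` as basis functions (q = 2), and the helper names are hypothetical and chosen only for illustration; the greedy fitting algorithm from the dissertation is not reproduced here.

```python
import math

# Hypothetical basis functions (q = 2); the dissertation's choice may differ.
BASIS = [math.sin, math.exp]

def evaluate(node, x):
    """Evaluate a labeled binary tree at covariate vector x.

    A leaf is ("leaf", basis_index, covariate_index); an internal node is
    ("sum", left, right) or ("prod", left, right).
    """
    kind = node[0]
    if kind == "leaf":
        _, basis_idx, cov_idx = node
        return BASIS[basis_idx](x[cov_idx])
    _, left, right = node
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if kind == "sum" else a * b

def count_nodes(node):
    """Total number of nodes: k internal nodes plus k + 1 leaves."""
    if node[0] == "leaf":
        return 1
    return 1 + count_nodes(node[1]) + count_nodes(node[2])

# k = 2 internal nodes, so 2k + 1 = 5 nodes: sin(x0) * (exp(x1) + sin(x2))
tree = ("prod", ("leaf", 0, 0), ("sum", ("leaf", 1, 1), ("leaf", 0, 2)))
value = evaluate(tree, [math.pi / 2, 0.0, 0.0])
```

Each leaf picks one of q basis functions and one of m₁ covariates, so there are m₁q possible leaf labels, which is where the log(m₁q) term in the sample bounds comes from.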

Degree Type

Doctor of Philosophy

Department

Statistics

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Dr. Xiao Wang

Additional Committee Member 2

Dr. Chuanhai Liu

Additional Committee Member 3

Dr. Jun Xie

Additional Committee Member 4

Dr. Lingsong Zhang

Keywords

Deep neural networks; Generalization; Overfitting; Rademacher complexity; Sparsity; Nonparametric statistics; Statistics

Licence

CC BY 4.0
