Purdue University Graduate School

SPARSE DEEP LEARNING FOR TIME SERIES DATA AND MAGNITUDE PRUNING OF LARGE PRETRAINED TRANSFORMER MODELS AND TEMPERING LEARNING

thesis
posted on 2025-05-02, 18:52 authored by Mingxuan Zhang

Sparse deep learning has proven to be an effective technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale model compression. While most existing research has focused on settings with independent and identically distributed (i.i.d.) observations, scenarios involving dependent data, such as time series and sequential data in natural language processing (NLP), remain largely unexplored. This work addresses that gap by establishing a theoretical foundation for sparse deep learning with dependent data. It shows that sparse recurrent neural networks (RNNs) can be consistently estimated and that their predictions are asymptotically normally distributed under suitable conditions, enabling accurate quantification of prediction uncertainty. Experimental results show that sparse deep learning outperforms state-of-the-art methods, such as conformal prediction, in quantifying uncertainty for time series data. The method also consistently identifies autoregressive orders in time series and surpasses existing approaches in large-scale model compression, with practical applications in fields such as finance, healthcare, and energy.
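
As a hedged illustration of the uncertainty-quantification idea (not the dissertation's exact procedure), the sketch below forms a two-sided Gaussian prediction interval from a point forecast and an estimated standard error, which is how asymptotic normality of the predictions can be turned into an interval; the forecast value and standard error used here are hypothetical.

    from scipy.stats import norm

    def normal_prediction_interval(y_hat, se, alpha=0.05):
        """Two-sided (1 - alpha) prediction interval for a forecast that is
        (asymptotically) normally distributed around its point prediction.

        y_hat : point forecast (e.g., from a sparse RNN)
        se    : estimated standard error of the prediction
        """
        z = norm.ppf(1.0 - alpha / 2.0)  # ~1.96 for alpha = 0.05
        return y_hat - z * se, y_hat + z * se

    # Hypothetical one-step-ahead forecast of 2.3 with standard error 0.4
    lower, upper = normal_prediction_interval(2.3, 0.4)
    print(f"95% prediction interval: ({lower:.2f}, {upper:.2f})")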

The success of pruning techniques in RNN-based language models has motivated exploring their applicability to modern large language models. Pretrained transformer models have revolutionized NLP with their state-of-the-art performance, but their massive parameter counts hinder real-world deployment. To tackle this issue, parameter pruning strategies have been explored, including magnitude- and sensitivity-based approaches. However, traditional magnitude pruning has shown limitations, particularly in transfer-learning scenarios for modern NLP tasks. A novel pruning algorithm, Mixture Gaussian Prior Pruning (MGPP), is introduced to address these challenges. By employing a mixture Gaussian prior for regularization, MGPP prunes non-expressive weights while preserving the model's expressive capabilities. Extensive evaluations on a variety of NLP tasks, including natural language understanding, question answering, and natural language generation, demonstrate the effectiveness of MGPP, particularly in high-sparsity settings. A theoretical analysis further establishes the consistency of sparse transformers, providing insight into why the approach succeeds. These advances help optimize large-scale language models for real-world applications, improving efficiency while maintaining performance.
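
As a hedged sketch of the regularization idea behind a mixture Gaussian prior (a sketch under assumptions, not MGPP's exact formulation), the penalty below is the negative log density of a two-component Gaussian mixture, with a narrow "spike" component that attracts prunable weights and a wide "slab" component that leaves expressive weights nearly untouched; the component scales sigma0 and sigma1 and the mixing proportion lam are illustrative values, not the settings used in the dissertation.

    import math
    import torch

    def mixture_gaussian_penalty(weights, lam=0.1, sigma0=1e-3, sigma1=1e-1):
        """Negative log density (up to an additive constant) of the mixture prior
        lam * N(0, sigma1^2) + (1 - lam) * N(0, sigma0^2), summed over all weights.
        Added to the training loss, it shrinks non-expressive weights toward the
        narrow sigma0 component so they can later be removed by magnitude."""
        log_slab = math.log(lam) - weights**2 / (2 * sigma1**2) - math.log(sigma1)
        log_spike = math.log(1 - lam) - weights**2 / (2 * sigma0**2) - math.log(sigma0)
        return -torch.logaddexp(log_slab, log_spike).sum()

    # Hypothetical usage inside a training step:
    # flat = torch.cat([p.view(-1) for p in model.parameters()])
    # loss = task_loss + mixture_gaussian_penalty(flat)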

State-space modeling has recently emerged as a powerful technique across various fields, including biology, finance, and engineering. However, its potential for training deep neural networks (DNNs) and its applicability to generative modeling remain underexplored. In this part of the dissertation, we introduce tempering learning, a novel algorithm that leverages state-space modeling to train deep neural networks. By manually constructing a tempering ladder, we transform the original learning problem into a data assimilation problem. Beyond its optimization advantages, tempering learning extends to one-step image generation through a diffusion-like process. Extensive experiments demonstrate the effectiveness of the approach on classical machine learning tasks and showcase its promise for one-step unconditional image generation on the CIFAR-10 and ImageNet datasets.
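
As a hedged illustration of what a manually constructed tempering ladder can look like (a sketch under assumptions, not the schedule used in the dissertation), the snippet below builds a geometric sequence of inverse temperatures that interpolates between a nearly flat objective and the full one; the endpoints and number of rungs are arbitrary choices for the example.

    import numpy as np

    def tempering_ladder(beta_min=1e-3, beta_max=1.0, num_rungs=10):
        """Geometric ladder of inverse temperatures beta_1 < ... < beta_K.
        Each rung k defines a tempered objective beta_k * loss(theta), so the
        model can be moved through the rungs sequentially, with each rung
        treated as one stage of a sequential (data-assimilation-style) update."""
        return np.geomspace(beta_min, beta_max, num_rungs)

    print(tempering_ladder())  # ten values from 0.001 up to 1.0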

History

Degree Type

  • Doctor of Philosophy

Department

  • Statistics

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Faming Liang

Advisor/Supervisor/Committee co-chair

Lingsong Zhang

Additional Committee Member 2

Qifan Song

Additional Committee Member 3

Jun Xie
