Multiple Learning for Generalized Linear Models in Big Data
thesisposted on 2021-12-19, 17:27 authored by Xiang LiuXiang Liu
Big data is an enabling technology in digital transformation. It perfectly complements ordinary linear models and generalized linear models, as training well-performed ordinary linear models and generalized linear models require huge amounts of data. With the help of big data, ordinary and generalized linear models can be well-trained and thus offer better services to human beings. However, there are still many challenges to address for training ordinary linear models and generalized linear models in big data. One of the most prominent challenges is the computational challenges. Computational challenges refer to the memory inflation and training inefficiency issues occurred when processing data and training models. Hundreds of algorithms were proposed by the experts to alleviate/overcome the memory inflation issues. However, the solutions obtained are locally optimal solutions. Additionally, most of the proposed algorithms require loading the dataset to RAM many times when updating the model parameters. If multiple model hyper-parameters needed to be computed and compared, e.g. ridge regression, parallel computing techniques are applied in practice. Thus, multiple learning with sufficient statistics arrays are proposed to tackle the memory inflation and training inefficiency issues.