Using a Scalable Feature Selection Approach For Big Data Regressions

Cheng, Qingdong

doi:10.25394/PGS.8796893.v1



thesis

posted on 2019-08-13, 16:54, authored by Qingdong Cheng

Logistic regression is a widely used statistical method in data analysis and machine learning. When the data volume is large, training models with the traditional approach is time-consuming and can even be infeasible. An efficient way to evaluate feature combinations and update learning models is therefore crucial. With the approach proposed by Yang, Wang, Xu, and Zhang (2018), a system can be represented by matrices small enough to be held in memory. These working sufficient statistics matrices can then be used to update logistic regression models. This study applies the working sufficient statistics approach to logistic regression to examine how it improves training performance, comparing it against the traditional approach in Spark's machine learning package. The experiments showed that the working sufficient statistics method could improve the performance of training logistic regression models when the input size was large.
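The abstract describes the idea only at a high level. As a minimal sketch, assuming that the working sufficient statistics are the iteratively reweighted least squares (IRLS) quantities X^T W X and X^T W z (an assumption; the exact formulation of Yang et al. (2018) may differ), the Python below illustrates why such an approach scales: each pass over the data reduces every partition to a p x p matrix and a length-p vector, which are summed and solved in memory. The functions `partition_stats`, `merge`, `solve`, and `fit_logistic` are hypothetical names, and the plain list of partitions only stands in for distributed Spark partitions; no Spark API is used.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Sketch only: assumes the working sufficient statistics are the IRLS
# quantities X^T W X and X^T W z; names here are hypothetical.
Row = Tuple[List[float], int]            # (feature vector, label in {0, 1})
Stats = Tuple[List[List[float]], List[float]]


def partition_stats(rows: Sequence[Row], beta: List[float]) -> Stats:
    """Local statistics (X^T W X, X^T W z) for one partition under current beta."""
    p = len(beta)
    xtwx = [[0.0] * p for _ in range(p)]
    xtwz = [0.0] * p
    for x, y in rows:
        eta = sum(b * xi for b, xi in zip(beta, x))   # linear predictor
        mu = 1.0 / (1.0 + math.exp(-eta))             # fitted probability
        w = max(mu * (1.0 - mu), 1e-10)               # IRLS weight, floored
        z = eta + (y - mu) / w                        # working response
        for i in range(p):
            xtwz[i] += w * x[i] * z
            for j in range(p):
                xtwx[i][j] += w * x[i] * x[j]
    return xtwx, xtwz


def merge(a: Stats, b: Stats) -> Stats:
    """Combine two partitions' statistics by element-wise addition."""
    return ([[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])],
            [u + v for u, v in zip(a[1], b[1])])


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(partitions: Sequence[Sequence[Row]], cols: Optional[List[int]] = None,
                 iters: int = 25, tol: float = 1e-8) -> List[float]:
    """IRLS in which each pass over the data only returns small p x p matrices."""
    p_all = len(partitions[0][0][0])
    cols = list(range(p_all)) if cols is None else cols
    # Keep only the candidate feature subset (hypothetical feature-selection hook).
    parts = [[([x[c] for c in cols], y) for x, y in part] for part in partitions]
    beta = [0.0] * len(cols)
    for _ in range(iters):
        stats = [partition_stats(part, beta) for part in parts]  # "map" step
        total = stats[0]
        for s in stats[1:]:
            total = merge(total, s)                              # "reduce" step
        new_beta = solve(*total)
        if max(abs(u - v) for u, v in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

For example, fitting the same rows split across three partitions returns the same coefficients as fitting them in one partition, because only the summed matrices enter the solve. Passing `cols=[...]` fits a candidate feature subset using that subset's smaller matrices, which is one plausible way to evaluate feature combinations cheaply.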

Degree Type

Master of Science

Department

Computer and Information Technology

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Baijian Yang

Additional Committee Member 2

Dominic Kao

Additional Committee Member 3

Tonglin Zhang

Keywords

Spark Big data Logistic Regression Applied Computer Science Statistics

Licence

CC BY 4.0