Sample Size Determination in Multivariate Parameters With Applications to Nonuniform Subsampling in Big Data High Dimensional Linear Regression

Wang, Yu

doi:10.25394/PGS.17158982.v1

Purdue_University_Thesis_Yu_f3.pdf (2.04 MB)

thesis

posted on 2021-12-20, 15:33 authored by Yu Wang

Subsampling is an important method in the analysis of Big Data. Subsample size determination (SSSD) plays a crucial part in extracting information from data and in overcoming the challenges that result from huge data sizes. In this thesis, (1) sample size determination (SSD) is investigated for multivariate parameters, and sample size formulas are obtained for the multivariate normal distribution; (2) sample size formulas are obtained based on concentration inequalities; (3) improved bounds for McDiarmid’s inequalities are obtained; (4) the obtained results are applied to nonuniform subsampling in Big Data high dimensional linear regression; and (5) numerical studies are conducted.
The sample size formula for the univariate normal distribution is a staple of elementary statistics. To the best of our knowledge, its generalization to the multivariate normal distribution (or, more generally, to multivariate parameters) has not received much attention. In this thesis, we introduce a definition for SSD and obtain explicit formulas for the multivariate normal distribution, in close analogy to the sample size formula for the univariate normal.
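For reference, the textbook univariate formula alluded to above can be written as follows. This is the standard result for estimating the mean of a normal population with known standard deviation; the symbols d (margin of error) and alpha (one minus the confidence level) are notation assumed here, not taken from the thesis, and the multivariate formulas obtained in the thesis are not reproduced.

```latex
% Smallest n such that the (1 - \alpha) confidence interval for the mean \mu
% of N(\mu, \sigma^2), with \sigma known, has half-width at most d:
n \;\ge\; \left( \frac{z_{\alpha/2}\,\sigma}{d} \right)^{2},
\qquad
z_{\alpha/2} = \Phi^{-1}\!\left(1 - \tfrac{\alpha}{2}\right).
```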
Commonly used concentration inequalities provide exponential rates, and sample sizes based on these inequalities are often loose. Talagrand (1995) provided the missing factor to sharpen these inequalities. We obtained the numeric values of the constants in the missing factor and slightly improved his results. Furthermore, we provided the missing factor in McDiarmid’s inequality. These improved bounds are used to give shrunken sample sizes
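To illustrate why sample sizes derived from concentration inequalities tend to be loose, the following minimal sketch compares the classical normal-based sample size with one derived from Hoeffding's inequality for observations bounded in an interval. Hoeffding's inequality is used here only as a familiar example of a concentration inequality; the function names and parameters are hypothetical, and this is not the thesis's method or its improved bounds.

```python
import math
from statistics import NormalDist


def normal_sample_size(sigma, margin, alpha=0.05):
    """Classical formula: smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)


def hoeffding_sample_size(lower, upper, margin, alpha=0.05):
    """Smallest n with 2 * exp(-2 * n * margin^2 / (upper - lower)^2) <= alpha.

    Assumes independent observations bounded in [lower, upper].
    """
    width = upper - lower
    return math.ceil(width ** 2 * math.log(2 / alpha) / (2 * margin ** 2))


# Observations in [0, 1] have standard deviation at most 0.5, so the two
# requirements below target the same margin and confidence level.
n_normal = normal_sample_size(sigma=0.5, margin=0.05, alpha=0.05)
n_hoeffding = hoeffding_sample_size(lower=0.0, upper=1.0, margin=0.05, alpha=0.05)
print(n_normal, n_hoeffding)
```

In this setting the Hoeffding-based requirement is roughly twice the normal-based one, which is the kind of gap that sharper inequalities aim to narrow.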

Degree Type

Doctor of Philosophy

Department

Mathematics

Campus location

Indianapolis

Advisor/Supervisor/Committee Chair

Hanxiang Peng

Additional Committee Member 2

Fang Li

Additional Committee Member 3

Jyoti Sarkar

Additional Committee Member 4

Fei Tan

Keywords

Sample Size Determination Statistics

Licence

CC BY 4.0
