Estimation and Uncertainty Quantification in Tensor Completion with Side Information

Ibriga, Somnooma Hilda Marie Bernadette

doi:10.25394/PGS.15079320.v1


thesis

posted on 2021-07-30, 11:37 authored by Somnooma Hilda Marie Bernadette Ibriga

This work aims to provide solutions to two significant issues in the effective use and practical application of tensor completion as a machine learning method. The first solution addresses the challenge of designing fast and accurate recovery methods for tensor completion when the data are highly sparse and largely missing. The second takes on the need for robust uncertainty quantification methods for the recovered tensor.

Covariate-assisted Sparse Tensor Completion

In the first part of the dissertation, we aim to provably complete a sparse and highly missing tensor in the presence of covariate information along tensor modes. Our motivation originates from online advertising, where users' click-through rates (CTR) on ads over various devices form a CTR tensor that can have up to 96% missing entries and has many zeros among its non-missing entries. These features make the standalone tensor completion method unsatisfactory. However, besides the CTR tensor, additional ad features or user characteristics are often available. We propose Covariate-assisted Sparse Tensor Completion (COSTCO) to incorporate covariate information in the recovery of the sparse tensor. The key idea is to jointly extract latent components from both the tensor and the covariate matrix to learn a synthetic representation. Theoretically, we derive the error bound for the recovered tensor components and explicitly quantify the improvements in both the reveal probability condition and the tensor recovery accuracy due to covariates. Finally, we apply COSTCO to an advertisement dataset from a major internet platform consisting of a CTR tensor and an ad covariate matrix, leading to a 23% accuracy improvement over the baseline methodology. An important by-product of our method is that clustering analysis on ad latent components from COSTCO reveals interesting new ad clusters that link different product industries and are not formed by existing clustering methods. Such findings could be directly useful for better ad planning procedures.
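The idea of jointly extracting latent components from a partially observed tensor and a covariate matrix can be illustrated with a generic coupled factorization. The sketch below is not the COSTCO algorithm from the thesis; it is a minimal alternating least squares example, assuming a rank-r CP model for the tensor, a covariate matrix that shares the first-mode factor, and a squared-error objective weighted by a hypothetical parameter `lam`. All function and variable names (`coupled_als`, `solve_rows`, `W`, `M`) are illustrative assumptions, and the data are synthetic.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    # Rank-r CP model: T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def loss(T, W, M, A, B, C, V, lam):
    # Squared error on observed tensor entries plus weighted covariate fit.
    resid_t = np.where(W, T - cp_reconstruct(A, B, C), 0.0)
    resid_m = M - A @ V.T
    return float(np.sum(resid_t ** 2) + lam * np.sum(resid_m ** 2))

def solve_rows(T, W, mode, F1, F2, F_old, X_extra=None, Y_extra=None):
    # Update one factor row by row; F1 and F2 are the factors of the two
    # remaining modes, in their original order.
    Tm = np.moveaxis(T, mode, 0)
    Wm = np.moveaxis(W, mode, 0)
    kr = F1[:, None, :] * F2[None, :, :]      # shape (p, q, r)
    F_new = F_old.copy()
    for i in range(Tm.shape[0]):
        obs = Wm[i]                           # boolean mask, shape (p, q)
        X = kr[obs]                           # design rows for observed entries
        y = Tm[i][obs]
        if X_extra is not None:               # append covariate equations
            X = np.vstack([X, X_extra])
            y = np.concatenate([y, Y_extra[i]])
        if X.shape[0] == 0:                   # nothing observed: keep old row
            continue
        F_new[i] = np.linalg.lstsq(X, y, rcond=None)[0]
    return F_new

def coupled_als(T, W, M, rank, lam=1.0, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    n1, n2, n3 = T.shape
    A = rng.standard_normal((n1, rank))
    B = rng.standard_normal((n2, rank))
    C = rng.standard_normal((n3, rank))
    V = rng.standard_normal((M.shape[1], rank))
    history = [loss(T, W, M, A, B, C, V, lam)]
    s = np.sqrt(lam)
    for _ in range(n_iter):
        # The shared factor A is fitted to the tensor and covariates together.
        A = solve_rows(T, W, 0, B, C, A, s * V, s * M)
        B = solve_rows(T, W, 1, A, C, B)
        C = solve_rows(T, W, 2, A, B, C)
        V = np.linalg.lstsq(A, M, rcond=None)[0].T
        history.append(loss(T, W, M, A, B, C, V, lam))
    return A, B, C, V, history

# Synthetic example: rank-2 tensor with roughly 30% of entries observed.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (10, 8, 6))
V0 = rng.standard_normal((4, 2))
T_full = cp_reconstruct(A0, B0, C0)
M = A0 @ V0.T                                 # covariates share the factor A0
W = rng.random(T_full.shape) < 0.3            # True where an entry is observed
A, B, C, V, history = coupled_als(T_full, W, M, rank=2)
print("initial loss:", history[0], "final loss:", history[-1])
```

Because each block update solves its least squares subproblem exactly, the objective should not increase from one sweep to the next. The covariate rows appended in the update of the shared factor are what allow that factor to be estimated even for slices with very few observed tensor entries.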

Uncertainty Quantification in Covariate-assisted Tensor Completion

In the second part of the dissertation, we propose a framework for uncertainty quantification for the imputed tensor factors obtained from completing a tensor with covariate information. We characterize the distribution of the non-convex estimator obtained from the COSTCO algorithm down to fine scales. This distributional theory in turn allows us to construct provably valid and tight confidence intervals for the unseen tensor factors. The proposed inferential procedure enjoys several important features: (1) it is fully adaptive to noise heteroscedasticity, (2) it is data-driven and automatically adapts to unknown noise distributions, and (3) in the high missing data regime, the inclusion of side information in the tensor completion model yields tighter confidence intervals than those obtained from standalone tensor completion methods.

History

Degree Type

Doctor of Philosophy

Department

Statistics

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Bruce Craig

Advisor/Supervisor/Committee co-chair

Wei Sun

Additional Committee Member 2

Jun Xie

Additional Committee Member 3

Anindya Badra

Keywords

Tensor Analysis; Machine Learning; Non Convex Optimization; Statistical Theory; Statistics; Optimisation; Applied Statistics

Licence

CC BY 4.0
