On Transfer Learning Techniques for Machine Learning

Das, Debasmit

doi:10.25394/PGS.12221597.v1

thesis

posted on 2020-04-30, 21:04, authored by Debasmit Das



Recent progress in machine learning has been mainly due to
the availability of large amounts of annotated data used for training complex
models with deep architectures. Annotating this training data becomes
burdensome and creates a major bottleneck in maintaining machine-learning
databases. Moreover, these trained models fail to generalize to new categories
or new varieties of the same categories, because the data distributions of
these new categories or varieties differ from the training data distribution.
To tackle these problems, this thesis proposes to develop a family of
transfer-learning techniques that can deal with different training (source) and
testing (target) distributions with the assumption that the availability of
annotated data is limited in the testing domain. This is done by transferring
useful knowledge from the auxiliary, data-abundant source domain and applying
it to the data-scarce target domain. This transferable
knowledge serves as a prior that biases target-domain predictions and prevents
the target-domain model from overfitting. Specifically, we explore structural
priors that encode relational knowledge between different data entities and
thus provide a more informative bias than traditional priors. The choice of
structural prior depends on the information available and on the similarity
between the two domains. Based on these two factors, we divide the transfer
learning problem into four major categories and propose a different structural
prior for each of these sub-problems.


This thesis first focuses on the
unsupervised-domain-adaptation problem, where we propose to minimize domain
discrepancy by transforming labeled source-domain data to be close to unlabeled
target-domain data. For this problem,
the categories remain the same across the two domains and hence we assume that
the structural relationship between the source-domain samples is carried over
to the target domain. Thus, a graph or hyper-graph is constructed as the
structural prior from both domains and a graph/hyper-graph matching formulation
is used to transform samples in the source domain to be closer to samples in
the target domain. An efficient optimization scheme is then proposed to tackle
the time and memory inefficiencies associated with the matching problem. The
few-shot learning problem is studied next, where we propose to transfer
knowledge from source-domain categories containing abundantly labeled data to
novel categories in the target domain, which contains only a few labeled samples. The
knowledge transfer biases the novel category predictions and prevents the model
from overfitting. The knowledge is encoded using a neural-network-based prior
that transforms a data sample to its corresponding class prototype. This neural
network is trained from the source-domain data and applied to the target-domain
data, where it transforms the few-shot samples to the novel-class prototypes
for better recognition performance. The few-shot learning problem is then
extended to the situation where we do not have access to the source-domain
data but only to the source-domain class prototypes. In this
limited-information setting, parametric neural-network-based priors would
overfit to the source-class prototypes, and hence we seek a non-parametric
prior based on manifolds. A piecewise linear manifold is used as a structural
prior to fit the source-domain class prototypes. This structure is extended to the
target domain, where the novel-class prototypes are found by projecting the
few-shot samples onto the manifold. Finally, the zero-shot learning problem is
addressed, which is an extreme case of the few-shot learning problem where we
do not have any labeled data in the target domain. However, we have high-level
information for both the source and target domain categories in the form of
semantic descriptors. We learn the relation between the sample space and the
semantic space using a regularized neural network, so that classification of
the novel categories can be carried out in a common representation space. This
same neural network is then used in the target domain to relate the two spaces.
In case we want to generate data for the novel categories in the target domain,
we can use a constrained generative adversarial network instead of a
traditional neural network. Thus, we use structural priors like graphs, neural
networks and manifolds to relate various data entities like samples, prototypes
and semantics for these different transfer learning sub-problems. We explore
additional post-processing steps like pseudo-labeling, domain adaptation and
calibration and enforce algorithmic and architectural constraints to further
improve recognition performance. Experiments on standard transfer-learning
image-recognition datasets produced results competitive with previous work.
Further experimentation and analysis of these methods also provided a better
understanding of machine learning.
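
As a rough illustration of the prototype-based few-shot recognition described
above, the following Python sketch averages the few labeled samples of each
novel class into a prototype and assigns a query to the nearest one. It is a
generic nearest-prototype baseline, not the learned neural-network or manifold
prior proposed in the thesis; the function names, the toy 2-D features, and the
choice of Euclidean distance are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def class_prototypes(samples, labels):
    """Average the few labeled feature vectors of each class into one prototype."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    # Column-wise mean of the feature vectors belonging to each class.
    return {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in grouped.items()
    }


def predict(query, prototypes):
    """Return the label whose prototype is nearest to the query (Euclidean)."""
    return min(prototypes, key=lambda y: dist(query, prototypes[y]))


# Toy 2-D features standing in for embeddings (hypothetical data):
# two novel classes with two labeled shots each.
support = [(0.0, 0.0), (0.0, 2.0), (4.0, 4.0), (6.0, 4.0)]
support_labels = ["cat", "cat", "dog", "dog"]

protos = class_prototypes(support, support_labels)
print(predict((1.0, 1.5), protos))
```

In the setting the abstract describes, the simple averaging step would be
replaced by a prior learned from the source domain, which maps the few-shot
samples to novel-class prototypes before the nearest-prototype comparison.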

Funding

NSF IIS-1813935

Degree Type

Doctor of Philosophy

Department

Electrical and Computer Engineering

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

C. S. George Lee

Additional Committee Member 2

Stanley H. Chan

Additional Committee Member 3

Guang Lin

Additional Committee Member 4

Guang Cheng

Keywords

Transfer learning; Computer Vision; Machine Learning; Domain Adaptation; Few-shot Learning; Zero-shot Learning; Knowledge Representation and Machine Learning

Licence

CC BY 4.0
