LB-CNN & HD-OC, DEEP LEARNING ADAPTABLE BINARIZATION TOOLS FOR LARGE SCALE IMAGE CLASSIFICATION
The computer vision task of classifying natural images is a primary driving force behind modern AI algorithms. Deep Convolutional Neural Networks (CNNs) demonstrate state of the art performance in large scale multi-class image classification tasks. However, due to the many layers and millions of parameters these models are considered to be black box algorithms. The decisions of these models are further obscured due to a cumbersome multi-class decision process. There exists another approach called class binarization in the literature which determines the multi-class prediction outcome through a sequence of binary decisions.The focus of this dissertation is on the integration of the class-binarization approach to multi-class classification with deep learning models, such as CNNs, for addressing large scale image classification problems. Three works are presented to address the integration.
In the first work, Error Correcting Output Codes (ECOCs) are integrated into CNNs by inserting a latent-binarization layer prior to the CNNs final classification layer. This approach encapsulates both encoding and decoding steps of ECOC into a single CNN architecture. EM and Gibbs sampling algorithms are combined with back-propagation to train CNN models with Latent Binarization (LB-CNN). The training process of LB-CNN guides the model to discover hidden relationships similar to the semantic relationships known apriori between the categories. The proposed models and algorithms are applied to several image recognition tasks, producing excellent results.
In the second work, Hierarchically Decodeable Output Codes (HD-OCs) are proposedto compactly describe a hierarchical probabilistic binary decision process model over the features of a CNN. HD-OCs enforce more homogeneous assignments of the categories to the dichotomy labels. A novel concept called average decision depth is presented to quantify the average number of binary questions needed to classify an input. An HD-OC is trained using a hierarchical log-likelihood loss that is empirically shown to orient the output of the latent feature space to resemble the hierarchical structure described by the HD-OC. Experiments are conducted at several different scales of category labels. The experiments demonstrate strong performance and powerful insights into the decision process of the model.
In the final work, the literature of enumerative combinatorics and partially ordered sets isused to establish a unifying framework of class-binarization methods under the Multivariate Bernoulli family of models. The unifying framework theoretically establishes simple relationships for transitioning between the different binarization approaches. Such relationships provide useful investigative tools for the discovery of statistical dependencies between large groups of categories. They are additionally useful for incorporating taxonomic information as well as enforcing structural model constraints. The unifying framework lays the groundwork for future theoretical and methodological work in addressing the fundamental issues of large scale multi-class classification.
- Doctor of Philosophy
- West Lafayette