Fine-Grained Bayesian Zero-Shot Object Recognition

posted on 03.01.2022, 01:10 by Sarkhan Badirli
Building machine learning algorithms that recognize objects in real-world tasks is a very challenging problem. As the number of classes grows, it becomes costly and impractical to collect samples for every class and obtain an exhaustive dataset for training the model. This labeled-data bottleneck is most pronounced for fine-grained object classes, some of which may lack any labeled representatives in the training data. A robust algorithm in this realistic scenario must classify samples from well-represented classes and also handle samples of unknown origin. In this thesis, we break this difficult task down into more manageable sub-problems and methodically explore novel solutions to each component in sequence.

We begin with the zero-shot learning (ZSL) scenario, where classes that lack any labeled images in the training data, i.e., unseen classes, are assumed to have semantic descriptions associated with them. The ZSL paradigm is motivated by analogy to the human learning process: humans can recognize new categories from semantic descriptions alone, without seeing any instances of these categories. We develop a novel hierarchical Bayesian classifier for the ZSL task. The two-layer architecture of the model is specifically designed to exploit the implicit hierarchy present among classes, which is particularly evident in fine-grained datasets. In the proposed method, latent classes define the class hierarchy in the image space, and semantic information is used to build the Bayesian hierarchy around these meta-classes. Our Bayesian model imposes local priors on semantically similar classes that share the same meta-class to realize knowledge transfer. Finally, we derive posterior predictive distributions to reconcile information about local and global priors and then blend them with the data likelihood for the final likelihood calculation. With its closed-form solution, our two-layer hierarchical classifier proves to be fast in training and flexible enough to model both fine-grained and coarse-grained datasets. In particular, for challenging fine-grained datasets, the proposed model can leverage the large number of seen classes to obtain a better local prior estimate without sacrificing seen-class accuracy.
Side information plays a critical role in ZSL, and ZSL models typically rely on the strong assumption that side information is strongly correlated with image features. Our model uses side information only to build the hierarchy; thus, no explicit correlation between side information and image features is assumed. This in turn makes the Bayesian model very resilient to various side information sources, as long as they are discriminative enough to define the class hierarchy.
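
The role of local priors and posterior predictive distributions described above can be illustrated with a deliberately simplified sketch. It is not the dissertation's model: it assumes isotropic Gaussians with fixed, known variances, and it assumes the local prior for an unseen class is the average image-space mean of its k nearest seen classes in side-information space. All function names, parameters, and toy values are hypothetical.

```python
import numpy as np

def local_prior_mean(unseen_attr, seen_attrs, seen_means, k=2):
    """Average the image-space means of the k seen classes closest in side-information space."""
    dists = np.linalg.norm(seen_attrs - unseen_attr, axis=1)
    nearest = np.argsort(dists)[:k]
    return seen_means[nearest].mean(axis=0)

def posterior_predictive(prior_mean, prior_var, noise_var, samples=None):
    """Return (mean, variance) of an isotropic Gaussian posterior predictive.

    Assumed toy model: class mean ~ N(prior_mean, prior_var * I),
    sample ~ N(class mean, noise_var * I).
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    if samples is None or len(samples) == 0:
        # Unseen class: the prediction rests on the prior alone.
        return prior_mean, prior_var + noise_var
    n = len(samples)
    xbar = np.mean(samples, axis=0)
    post_prec = 1.0 / prior_var + n / noise_var
    post_mean = (prior_mean / prior_var + n * xbar / noise_var) / post_prec
    return post_mean, 1.0 / post_prec + noise_var

def log_density(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) at x."""
    d = x.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * var) + np.sum((x - mean) ** 2, axis=-1) / var)

def classify(x, predictives):
    """Pick the class whose posterior predictive gives x the highest log density."""
    scores = [log_density(x, mean, var) for mean, var in predictives]
    return int(np.argmax(scores))

# Toy data (assumed): three seen classes with 1-D side information and 2-D image features.
seen_attrs = np.array([[0.0], [1.0], [10.0]])
seen_means = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])

# Unseen class whose side information lies between the first two seen classes.
prior = local_prior_mean(np.array([0.5]), seen_attrs, seen_means, k=2)
unseen_pred = posterior_predictive(prior, prior_var=1.0, noise_var=1.0)

# Seen class with two labeled samples and a global prior centered at the origin.
seen_pred = posterior_predictive(np.zeros(2), 1.0, 1.0,
                                 samples=np.array([[10.0, 10.0], [10.0, 10.0]]))

label = classify(np.array([0.6, 0.1]), [seen_pred, unseen_pred])
```

In this toy setting the test point lies close to the local prior of the unseen class, so `classify` selects index 1. Every step is in closed form, which mirrors the training-speed argument in the abstract, although the abstract does not give the actual distributions used.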

When dealing with thousands of classes, it becomes very difficult to obtain semantic descriptions for fine-grained classes. For example, in species classification, where classes display very similar morphological traits, it is impractical if not impossible to derive characteristic visual attributes that can distinguish thousands of classes. Moreover, it would be unrealistic to assume that an exhaustive list of visual attributes characterizing all object classes, both seen and unseen, can be determined based only on seen classes. We propose DNA as side information to overcome this obstacle and enable fine-grained zero-shot species classification. We demonstrate that 658-base-pair-long DNA barcodes can be sufficient to serve as a robust source of side information for a newly compiled insect dataset with more than a thousand classes. The experiments are further validated on the well-known CUB dataset, on which DNA attributes prove to be as competitive as word vectors. Our proposed Bayesian classifier delivers state-of-the-art results on both datasets while using DNA as side information.
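
The abstract does not say how a barcode is turned into a numeric attribute vector. As an illustration only, the sketch below shows one common first step: one-hot encoding a nucleotide sequence to a fixed length of 658 positions. The five-letter alphabet, the use of `N` for padding and for unrecognized symbols, and the function name are assumptions rather than details from the dissertation.

```python
import numpy as np

ALPHABET = "ACGTN"      # assumed alphabet; N stands for unknown or padded positions
BARCODE_LENGTH = 658    # barcode length quoted in the abstract

def one_hot_barcode(seq, length=BARCODE_LENGTH):
    """Encode a nucleotide string as a (length, 5) one-hot matrix.

    Sequences longer than `length` are truncated; shorter ones are padded with N.
    Any symbol outside ACGT (gaps, ambiguity codes) is mapped to N.
    """
    seq = seq.upper()[:length].ljust(length, "N")
    index = {base: i for i, base in enumerate(ALPHABET)}
    codes = np.array([index.get(base, index["N"]) for base in seq])
    encoded = np.zeros((length, len(ALPHABET)), dtype=np.float32)
    encoded[np.arange(length), codes] = 1.0
    return encoded

# Hypothetical short fragment; a real barcode would fill all 658 positions.
example = one_hot_barcode("ACGT-acgt")
```

A matrix like `example` could then be fed to whatever model learns the DNA attributes; that learning step is outside the scope of this sketch.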

The traditional ZSL framework, however, is not quite suitable for scalable species identification and discovery. For example, insects are one of the largest groups in the animal kingdom, with an estimated 5.5 million species, yet only 20% of them have been described. We extend traditional ZSL into a more practical framework in which no explicit side information is available for unseen classes. We transform our Bayesian model to utilize the taxonomical hierarchy of species to perform insect identification at scale. Our approach is the first to combine two different data modalities, namely image and DNA information, to perform insect identification with more than a thousand classes. Our algorithm not only classifies known species with an impressive 97% accuracy but also identifies unknown species and classifies them to their true genus with 81% accuracy.

Our approach can help address major societal issues related to climate change, such as tracking changing insect distributions and measuring biodiversity across the world. We believe this work can pave the way for more precise and, more importantly, scalable monitoring of biodiversity, and can become instrumental in offering objective measures of the impacts of the recent changes our planet has been going through.


National Science Foundation, 1252648 - ISS


Degree Type

Doctor of Philosophy


Computer Science

Campus location


Advisor/Supervisor/Committee Chair

Murat Dundar

Advisor/Supervisor/Committee co-chair

Clifton W. Bingham

Additional Committee Member 2

Bedrich Benes

Additional Committee Member 3

George Mohler