Learning Pose and State-Invariant Object Representations for Fine-Grained Recognition and Retrieval
Object recognition and retrieval is a fundamental problem in computer vision that involves recognizing objects and retrieving images of similar objects through visual queries. While deep metric learning is commonly employed to learn image embeddings for such problems, the representations learned by existing methods are not robust to changes in viewpoint, pose, and object state, especially in fine-grained recognition and retrieval tasks. To overcome these limitations, this dissertation aims to learn robust object representations that remain invariant to such transformations for fine-grained tasks. First, it focuses on learning dual pose-invariant embeddings that facilitate recognition and retrieval at both the category level and the finer object-identity level by simultaneously learning category-specific and object-identity-specific representations in separate embedding spaces. To this end, it introduces the PiRO framework, which uses an attention-based dual-encoder architecture and novel pose-invariant ranking losses for each embedding space to disentangle the category and object representations while learning pose-invariant features. Second, the dissertation introduces ranking losses that cluster multi-view images of an object together in both embedding spaces. At the same time, these losses pull the embeddings of two objects from the same category closer in the category embedding space, to learn fundamental category-specific attributes, and push them apart in the object embedding space, to learn discriminative features that distinguish between them. Third, the dissertation addresses state invariance and introduces a novel ObjectsWithStateChange dataset to facilitate research on recognizing fine-grained objects under state changes that involve structural transformations in addition to pose and viewpoint changes.
Fourth, it proposes a curriculum learning strategy to progressively sample object images that are harder to distinguish for training the model, enhancing its ability to capture discriminative features for fine-grained tasks amidst state changes and other transformations. Experimental evaluations demonstrate significant improvements in object recognition and retrieval performance compared to previous methods, validating the effectiveness of the proposed approaches across several challenging datasets under various transformations.
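The pull/push behavior of the ranking losses described in the abstract can be illustrated with a small sketch. This is not the dissertation's actual loss formulation, which the abstract does not specify. It is a minimal NumPy illustration under assumed choices: Euclidean distances, object centroids as summaries of the multi-view embeddings, and a hinge margin. The names `mean_intra_distance`, `dual_space_loss`, and `margin` are hypothetical.

```python
import numpy as np


def mean_intra_distance(views):
    """Mean Euclidean distance from each view embedding to the object's centroid."""
    centroid = views.mean(axis=0)
    return float(np.linalg.norm(views - centroid, axis=1).mean())


def dual_space_loss(cat_a, cat_b, obj_a, obj_b, margin=1.0):
    """Illustrative loss for two objects (a, b) from the same category.

    Each argument is an array of shape (num_views, dim) holding the
    multi-view embeddings of one object in one embedding space.
    The margin and the use of centroids are assumptions of this sketch.
    """
    # Cluster the multi-view embeddings of each object in both spaces.
    cluster = (
        mean_intra_distance(cat_a) + mean_intra_distance(cat_b)
        + mean_intra_distance(obj_a) + mean_intra_distance(obj_b)
    )
    # Category space: pull the two same-category objects together.
    pull = float(np.linalg.norm(cat_a.mean(axis=0) - cat_b.mean(axis=0)))
    # Object space: push the two objects at least `margin` apart (hinge).
    obj_dist = float(np.linalg.norm(obj_a.mean(axis=0) - obj_b.mean(axis=0)))
    push = max(0.0, margin - obj_dist)
    return cluster + pull + push


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two objects, three views each, 4-dimensional embeddings (random placeholders).
    cat_a, cat_b = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
    obj_a, obj_b = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
    print(dual_space_loss(cat_a, cat_b, obj_a, obj_b))
```

In this sketch the loss is zero only when every object's views coincide in both spaces, the two objects share a centroid in the category space, and their centroids are at least `margin` apart in the object space.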
Degree Type
- Doctor of Philosophy
Department
- Electrical and Computer Engineering
Campus location
- West Lafayette