Purdue University Graduate School

Learning Pose and State-Invariant Object Representations for Fine-Grained Recognition and Retrieval

thesis
Posted on 2024-07-11, 20:35, authored by Rohan Sarkar

Object recognition and retrieval is a fundamental problem in computer vision that involves recognizing objects and retrieving similar object images through visual queries. While deep metric learning is commonly employed to learn image embeddings for such problems, the representations learned by existing methods are not robust to changes in viewpoint, pose, and object state, especially for fine-grained recognition and retrieval tasks. To overcome these limitations, this dissertation aims to learn robust object representations that remain invariant to such transformations for fine-grained tasks. First, it focuses on learning dual pose-invariant embeddings to facilitate recognition and retrieval at both the category level and the finer object-identity level, by simultaneously learning category-specific and object-identity-specific representations in separate embedding spaces. For this, it introduces the PiRO framework, which uses an attention-based dual-encoder architecture and novel pose-invariant ranking losses for each embedding space to disentangle the category and object representations while learning pose-invariant features. Second, the dissertation introduces ranking losses that cluster multi-view images of an object together in both embedding spaces. At the same time, these losses pull the embeddings of two objects from the same category closer in the category embedding space, to learn fundamental category-specific attributes, and push them apart in the object embedding space, to learn discriminative features that distinguish between them. Third, the dissertation addresses state-invariance and introduces a novel ObjectsWithStateChange dataset to facilitate research in recognizing fine-grained objects with state changes that involve structural transformations in addition to pose and viewpoint changes. Fourth, it proposes a curriculum learning strategy that progressively samples object images that are harder to distinguish for training the model, enhancing its ability to capture discriminative features for fine-grained tasks amidst state changes and other transformations. Experimental evaluations demonstrate significant improvements in object recognition and retrieval performance compared to previous methods, validating the effectiveness of the proposed approaches across several challenging datasets under various transformations.
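
The dual-space idea in the abstract can be illustrated with a toy sketch. This is not the PiRO loss formulation, whose exact form is given in the dissertation itself. It is a minimal stand-in, assuming Euclidean distances, per-object centroids, and a hinge margin, and every function name and parameter below is hypothetical. Each object is a list of per-view embedding vectors. Both losses penalize the spread of an object's views, but the category-space loss also penalizes the gap between two same-category objects, while the object-space loss penalizes the two objects being closer than a margin.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(views):
    """Mean embedding over the multi-view embeddings of one object."""
    n = len(views)
    return [sum(v[i] for v in views) / n for i in range(len(views[0]))]

def spread(views):
    """Mean distance of each view to the object's centroid (0 when all views coincide)."""
    c = centroid(views)
    return sum(dist(v, c) for v in views) / len(views)

def category_space_loss(obj_a, obj_b):
    """Toy category-space term: cluster each object's views and
    pull two same-category objects toward each other."""
    gap = dist(centroid(obj_a), centroid(obj_b))
    return spread(obj_a) + spread(obj_b) + gap

def object_space_loss(obj_a, obj_b, margin=1.0):
    """Toy object-space term: cluster each object's views and
    push two same-category objects at least `margin` apart (hinge)."""
    gap = dist(centroid(obj_a), centroid(obj_b))
    return spread(obj_a) + spread(obj_b) + max(0.0, margin - gap)

# Two hypothetical objects of the same category, two views each.
mug_a = [[0.0, 0.0], [0.0, 0.2]]
mug_b = [[2.0, 0.0], [2.0, 0.2]]

print(category_space_loss(mug_a, mug_b))  # large: the objects are far apart
print(object_space_loss(mug_a, mug_b))    # small: the objects are already separated
```

For the same pair of objects, the two terms disagree on purpose: a large gap between the objects is costly in the category space and free in the object space, which is why the abstract describes learning the two representations in separate embedding spaces.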

Degree Type

  • Doctor of Philosophy

Department

  • Electrical and Computer Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Avinash Kak

Additional Committee Member 2

Samuel Midkiff

Additional Committee Member 3

Timothy Rogers

Additional Committee Member 4

Tanmay Prakash
