FAZT: FEW AND ZERO-SHOT FRAMEWORK TO LEARN TEMPO-VISUAL EVENTS FROM LITTLE OR NO DATA
thesisposted on 2021-12-20, 14:24 authored by Naveen MadapanaNaveen Madapana
Supervised classification methods based on deep learning have achieved great success in many domains and tasks that are previously unimaginable. Such approaches build on learning paradigms that require hundreds of examples in order to learn to classify objects or events. Thus, their immediate application to the domains with few or no observations is limited. This is because of the lack of ability to rapidly generalize to new categories from a few examples or from high-level descriptions of categories. This can be attributed to the significant gap between the way machines represent knowledge and the way humans represent categories in their minds and learn to recognize them. In this context, this research represents categories as semantic trees in a high-level attribute space and proposes an approach to utilize these representations to conduct N-Shot, Few-Shot, One-Shot, and Zero-Shot Learning (ZSL). This work refers to this paradigm as the problem of general classification (GCP) and proposes a unified framework for GCP referred to as the Few and Zero-Shot Technique (FAZT). FAZT framework is an end-to-end approach that uses trainable 3D convolutional neural networks and recurrent neural networks to simultaneously optimize for both the semantic and the classification tasks. Lastly, the problem of systematically obtaining semantic attributes by utilizing domain-specific ontologies is presented. The proposed framework is validated in the domains of hand gesture and action/activity recognition, however, this research can be applied to other domains such as video understanding, the study of human behavior, emotion recognition, etc. First, an attribute-based dataset for gestures is developed in a systematic manner by relying on literature in gestures and semantics, and crowdsourced platforms such as Amazon Mechanical Turk. To the best of our knowledge, this is the first ZSL dataset for hand gestures (ZSGL dataset). Next, our framework is evaluated in two experimental conditions: 1. Within-category (to test the attribute recognition power) and 2. Across-category (to test the ability to recognize an unknown category). In addition, we conducted experiments in zero-shot, one-shot, few-shot and continuous learning conditions in both open-set and closed-set scenarios. Results showed that our framework performs favorably on the ZSGL, Kinetics, UIUC Action, UCF101 and HMDB51 action datasets in all the experimental conditions.
GestureClean: A Touchless Interaction Language for the Operating Room
Agency for Healthcare Research and QualityFind out more...