File(s) under embargo
2
month(s)22
day(s)until file(s) become available
TOWARDS IMPROVED REPRESENTATIONS ON HUMAN ACTIVITY UNDERSTANDING
Human action recognition stands as a cornerstone in the domain of computer vision, with its utility spanning across emergency response, sign language interpretation, and the burgeoning fields of augmented and virtual reality. The transition from conventional video-based recognition to skeleton-based methodologies has been a transformative shift, offering a robust alternative less susceptible to environmental noise and more focused on the dynamics of human movement.
This body of work encapsulates the evolution of action recognition, emphasizing the pivotal role of Graph Convolution Network (GCN) based approaches, particularly through the innovative InfoGCN framework. InfoGCN has set a new precedent in the field by introducing an information bottleneck-based learning objective, a self-attention graph convolution module, and a multi-modal representation of the human skeleton. These advancements have collectively elevated the accuracy and efficiency of action recognition systems.
Addressing the prevalent challenge of occlusions, particularly in single-camera setups, the Pose Relation Transformer (PORT) framework has been introduced. Inspired by the principles of Masked Language Modeling in natural language processing, PORT refines the detection of occluded joints, thereby enhancing the reliability of pose estimation under visually obstructive conditions.
Building upon the foundations laid by InfoGCN, the Skeleton ODE framework has been developed for online action recognition, enabling real-time inference without the need for complete action observation. By integrating Neural Ordinary Differential Equations, Skeleton ODE facilitates the prediction of future movements, thus reducing latency and paving the way for real-time applications.
The implications of this research are vast, indicating a future where real-time, efficient, and accurate human action recognition systems could significantly impact various sectors, including healthcare, autonomous vehicles, and interactive technologies. Future research directions point towards the integration of multi-modal data, the application of transfer learning for enhanced generalization, the optimization of models for edge computing, and the ethical deployment of action recognition technologies. The potential for these systems to contribute to healthcare, particularly in patient monitoring and disease detection, underscores the need for continued interdisciplinary collaboration and innovation.
Funding
FW-HTF 1839971
History
Degree Type
- Doctor of Philosophy
Department
- Electrical and Computer Engineering
Campus location
- West Lafayette