APPLYING MULTIMODAL SENSING TO HUMAN MOTION TRACKING IN MOBILE SYSTEMS
Billions of “smart” things in our lives are now equipped with sensors. Current devices such as smartphones, smartwatches, tablets, and VR/AR headsets carry a variety of embedded sensors, e.g., accelerometers, gyroscopes, magnetometers, cameras, and GPS receivers. Based on data from these sensors, many technologies have been developed to track human motion at different granularities and to enable new applications. This dissertation examines two challenging problems in human motion tracking. The first is the ID association problem that arises when external sensors are used to track multiple people simultaneously. Although an “outside” system can track all human movement in a designated area, it must digitally associate each tracked trajectory with the corresponding person, or more precisely with the smart device carried by that person, in order to provide customized services based on the tracking results. The second is the inaccuracy caused by limited sensing information when only the sensors embedded in the tracked devices are used. Because sensor data inevitably contain noise and there is no external beacon to serve as a reference point for calibration, it is hard to track human motion accurately with internal sensors alone.
In this dissertation, we focus on applying multimodal sensing to human motion tracking in mobile systems. To address the two problems above, we present three works. (1) The first work seeks to enable public cameras to send personalized messages to people without knowing their phone addresses. We build a system that uses the users’ motion patterns, as captured by the cameras, as their communication addresses, and relies on each user’s smartphone to locally compare its own sensor data with the addresses and accept the matching messages. To protect user privacy, the system requires no data from the users and transforms the motion patterns into low-dimensional codes so that the motion itself is not leaked. (2) To enhance the distinguishability and scalability of the camera-to-human communication system, we introduce context features, which include both motion patterns and ambience features (e.g., magnetic field and Wi-Fi fingerprint), to identify people. The enhanced system achieves higher association accuracy and is demonstrated to work in a densely crowded retail store with a fixed-length packet overhead. These first two works explore the potential of widely deployed surveillance cameras and provide a generic underlay for various practical applications, such as automatic audio guides, indoor localization, and safety alerts. (3) We close this dissertation with a fine-grained motion tracking system that tracks the positions of two hand-held motion controllers in a mobile VR system. To achieve high tracking accuracy without external sensors, we introduce new types of information, e.g., ultrasonic ranging between the headset and the controllers, and a kinematic arm model. Effectively fusing this additional information with inertial sensing yields accurate controller positions in real time.
Compared with commodity mobile VR controllers, which support only rotational tracking, our system provides a more interactive VR experience by letting the user physically move the controllers within a VR scene. In summary, this dissertation shows that multimodal sensing can draw more out of sensor data and can advance sensor-based applications to the next generation of innovation.