Hao_Kang_PhD_Dissertation_V4.pdf (15.11 MB)
posted on 2019-08-14, 15:06 authored by Hao Kang
Camera view composing, as in photography, has become inseparable from everyday life. With the development of drone technology in particular, the flying mobile camera has become accessible and affordable and has been used to take impressive photos. However, acquiring the desired view requires manual searches and adjustments, which are usually time-consuming and tedious. The situation is exacerbated by the difficulty of controlling a mobile camera that has many degrees of freedom. Composing a well-framed view is complicated because experience, timing, and aesthetic judgment are all indispensable. Therefore, professional view composing with a mobile camera is not an easy task for most people. Powered by deep learning, recent breakthroughs in artificial intelligence have enabled machines to perform human-level automation in several tasks. These advances in automatic decision-making and autonomous control have the potential to improve the camera view composing process significantly.

We observe that (a) human-robot interaction can be made more intuitive and natural for photography tasks, and (b) drone photography tasks can be further automated by learning professional photo-taking patterns with data-driven methods. In this work, we present two novel frameworks for drone photography based on these two observations. First, we demonstrate FlyCam, a multi-touch gesture-controlled framework for gimbaled-drone photography. FlyCam abstracts the camera and the drone into a single flying camera object and supports the entire control process intuitively on a single mobile device with simple touch gestures. Second, we present Dr$^{3}$Cam, a region-of-interest-based, reinforced drone photography framework. The fully automated Dr$^{3}$Cam is built on state-of-the-art reinforcement learning research and enables the camera agent to seek good views and compose visually appealing photos intelligently. Results show that FlyCam can significantly reduce the workload and increase the efficiency of human-robot interaction, while Dr$^{3}$Cam performs human-level view composing automation for drone photography tasks.


Degree Type

Doctor of Philosophy


Department

Computer Graphics Technology

Campus location

West Lafayette
