Machine Learning and Deep Learning Approaches to Print defect Detection, Face Set Recognition, Face Alignment, and Visual Enhancement in Space and Time
thesisposted on 2021-07-21, 18:43 authored by Xiaoyu XiangXiaoyu Xiang
The research includes machine Learning and Deep Learning Approaches to Print Defect Detection, Face Set Recognition and Face Alignment, and Visual-Enhancement in Space and Time. This thesis consists of six parts which are related to 6 projects:
In Chapter 1, the first project focuses on detection of local printing defects including gray spots and solid spots. We propose a coarse-to-fine method to detect local defects in a block-wise manner and aggregate the blockwise attributes to generate the feature vector of the whole test page for a further ranking task. In the detection part, we first select candidate regions by thresholding a single feature. Then more detailed features of candidate blocks are calculated and sent to a decision tree that is previously trained on our training dataset. The final result is given by the decision tree model to control the false alarm rate while maintaining the required miss rate.
Chapter 2 introduces face set recognition and Chapter 3 is about face alignment. In order to reduce the computational complexity of comparing face sets, we propose a deep neural network that can compute and aggregate the face feature vectors with different weights. As for face alignment, our goal is to solve the jittering of landmark locations when applied on video. We propose metrics and corresponding methods around this goal.
In recent years, mobile photography has become increasingly prevalent in our lives with social media due to its high portability and convenience. However, many challenges still exist in distributing high-quality mobile images and videos under the limit of data capacity, hardware storage, and network bandwidth. Therefore, we have been exploring enhancement techniques to improve the image and video qualities, considering both effectiveness and efficiency for a wide variety of applications, including WhatsApp, Portal, TikTok, even the printing industry. Chapter 4 introduces single image super-resolution to handle real-world images with various degradations, and its influence on several downstream high-level computer vision tasks. Next, Chapter 5 studies on headshot image restoration with multiple references, which is an application of visual enhancement under more specific scenarios. Finally, as a step towards the temporal domain enhancement, the Zooming SlowMo framework for fast and accurate space-time video super-resolution will be introduced in Chapter 6.