UNRESTRICTED CONTROLLABLE ATTACKS FOR SEGMENTATION NEURAL NETWORKS
Despite the rapid development of adversarial attacks on machine learning models, many types of new adversarial examples remain unknown. Undiscovered types of adversarial attacks pose a
serious concern for the safety of the models, which raises the issue about the effectiveness of current adversarial robustness evaluation. Image semantic segmentation is a practical computer
vision task. However, segmentation networks’ robustness under adversarial attacks receives insufficient attention. Recently, machine learning researchers started to focus on generating
adversarial examples beyond the norm-bound restriction for segmentation neural networks. In this thesis, a simple and efficient method: AdvDRIT is proposed to synthesize unconstrained controllable adversarial images leveraging conditional-GAN. Simple CGAN yields poor image quality and low attack effectiveness. Instead, the DRIT (Disentangled Representation Image Translation) structure is leveraged with a well-designed loss function, which can generate valid adversarial images in one step. AdvDRIT is evaluated on two large image datasets: ADE20K and Cityscapes. Experiment results show that AdvDRIT can improve the quality of adversarial examples by decreasing the FID score down to 40% compared to state-of-the-art generative models such as Pix2Pix, and also improve the attack success rate 38% compared to other adversarial attack methods including PGD.