Purdue University Graduate School
Browse

File(s) under embargo

Reason: This work is currently under review by CVPR 2024 conference

5

month(s)

17

day(s)

until file(s) become available

FOCALSR: REVISITING IMAGE SUPER-RESOLUTION TRANSFORMERS WITH FFT-ENABLED CROSS ATTENTION LAYERS

thesis
posted on 2023-12-06, 13:51 authored by Botong OuBotong Ou

Motion blur arises from camera instability or swift movement of subjects within a scene. The objective of image deblurring is to eliminate these blur effects, thereby enhancing the image's quality. This task holds significant relevance, particularly in the era of smartphones and portable cameras. Yet, it remains a challenging issue, notwithstanding extensive research undertaken over many years. The fundamental concept in deblurring an image involves restoring a blurred pixel back to its initial state.

Deep learning (DL) algorithms, recognized for their capability to identify unique and significant features from datasets, have gained significant attention in the field of machine learning. These algorithms have been increasingly adopted in geoscience and remote sensing (RS) for analyzing large volumes of data. In these applications, low-level attributes like spectral and texture features form the foundational layer. The high-level feature representations derived from the upper layers of the network can be directly utilized in classifiers for pixel-based analysis. Thus, for enhancing the accuracy of classification using RS data, ensuring the clarity and quality of each collected data in the dataset is crucial for the effective construction of deep learning models.

In this thesis, we present the FFT-Cross Attention Transformer, an innovative approach amalgamating channel-focused and window-centric self-attention within a state-of-the-art(SOTA) Vision Transformer model. Augmented with a Fast Fourier Convolution Layer, this approach extends the Transformer's capability to capture intricate details in low-resolution images. Employing unified task pre-training during model development, we confirm the robustness of these enhancements through comprehensive testing, resulting in substantial performance gains. Notably, we achieve a remarkable 1dB improvement in the PSNR metric for remote sensing imagery, underscoring the transformative potential of the FFT-Cross Attention Transformer in advancing image processing and domain-specific vision tasks.

History

Degree Type

  • Master of Science

Department

  • Computer and Information Technology

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Baijian Yang

Advisor/Supervisor/Committee co-chair

Gang Shao

Additional Committee Member 2

Jin Wei-Kocsis

Usage metrics

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC