FOCALSR: REVISITING IMAGE SUPER-RESOLUTION TRANSFORMERS WITH FFT-ENABLED CROSS ATTENTION LAYERS

Ou, Botong

doi:10.25394/PGS.24712446.v1

File(s) under embargo

Reason: This work is currently under review by CVPR 2024 conference

4 month(s) 8 day(s) until file(s) become available

thesis

posted on 2023-12-06, 13:51, authored by Botong Ou

Motion blur arises from camera instability or rapid movement of subjects within a scene. Image deblurring aims to remove this blur and thereby improve image quality. The task is especially relevant in the era of smartphones and portable cameras, yet it remains challenging despite many years of research. The core idea of deblurring is to restore each blurred pixel to its original state.

Deep learning (DL) algorithms, known for their ability to learn distinctive and significant features from data, have attracted considerable attention in machine learning and are increasingly adopted in geoscience and remote sensing (RS) for analyzing large volumes of data. In these applications, low-level attributes such as spectral and texture features form the foundational layer, and the high-level feature representations derived from the upper layers of the network can be fed directly to classifiers for pixel-based analysis. Improving classification accuracy on RS data therefore depends on the clarity and quality of every sample in the dataset used to build the deep learning model.

In this thesis, we present the FFT-Cross Attention Transformer, an approach that combines channel-focused and window-centric self-attention within a state-of-the-art (SOTA) Vision Transformer model. Augmented with a Fast Fourier Convolution layer, the approach extends the Transformer's ability to capture fine details in low-resolution images. Using unified task pre-training during model development, we confirm the robustness of these enhancements through comprehensive testing and observe substantial performance gains. Notably, we achieve a 1 dB improvement in the PSNR metric on remote sensing imagery, which indicates the potential of the FFT-Cross Attention Transformer for image processing and domain-specific vision tasks.
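To put the reported 1 dB PSNR gain in context, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE), assuming 8-bit pixel values (MAX = 255). It is not taken from the thesis; the function names `psnr` and `mse_ratio_for_gain` and the toy pixel values are hypothetical and serve only as illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a PSNR gain of `gain_db` decibels."""
    return 10.0 ** (gain_db / 10.0)


# Hypothetical toy pixel rows, not data from the thesis.
reference = [52, 55, 61, 59, 79, 61, 76, 61]
restored = [50, 58, 60, 62, 75, 64, 73, 60]

print(f"PSNR of toy example: {psnr(reference, restored):.2f} dB")
print(f"MSE reduction factor for +1 dB: {mse_ratio_for_gain(1.0):.3f}")
```

Because PSNR is logarithmic in the mean squared error, a gain of 1 dB corresponds to dividing the MSE by 10^0.1, roughly 1.26, regardless of the starting PSNR.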

Degree Type

Master of Science

Department

Computer and Information Technology

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Baijian Yang

Advisor/Supervisor/Committee co-chair

Gang Shao

Additional Committee Member 2

Jin Wei-Kocsis

Keywords

Computer Vision Deep Learning Image Processing

Licence

CC BY 4.0
