IMAGE CAPTIONING USING TRANSFORMER ARCHITECTURE

Nanal, Wrucha A

doi:10.25394/PGS.21674945.v1

Wrucha_Nanal_Thesis_Work_PDF.pdf (1.44 MB)

thesis

posted on 2022-12-06, 02:38 authored by Wrucha A Nanal

Image captioning is the area of deep learning concerned with generating textual descriptions of images. The central idea is to identify the key features of an image and compose meaningful sentences that describe it. Currently popular approaches include Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM) based models and attention-based models. This research first identifies the drawbacks of existing image captioning models: sequential execution, the vanishing gradient problem, and a lack of context during training.

This work aims to resolve these problems by creating a Contextually Aware Image Captioning (CATIC) model. The Transformer architecture, which addresses the issues of vanishing gradients and sequential execution, forms the basis of the proposed model. To inject contextualized embeddings of the caption sentences, this work uses Bidirectional Encoder Representations from Transformers (BERT). This work uses the Remote Sensing Image Captioning Dataset. The results of the CATIC model are evaluated using BLEU, METEOR and ROUGE scores. The proposed model outperforms the CNN-LSTM model on all metrics. Compared with the attention-based model, the CATIC model scores higher on BLEU-2 and ROUGE and gives competitive results on the other metrics.
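
For readers unfamiliar with the BLEU family of scores named in the abstract, the following is a minimal sketch of sentence-level BLEU-2 (clipped unigram and bigram precision, combined by a geometric mean and scaled by a brevity penalty). It is an illustration only and is not taken from the thesis: the function names, the absence of smoothing, and the example captions are all assumptions, and the thesis may well have used a standard evaluation toolkit with different settings.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # Each candidate n-gram is credited at most as often as it appears
    # in the single reference where it is most frequent.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=2):
    """Sentence-level BLEU with uniform weights and no smoothing (assumed setup)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical captions, invented for illustration (not from the dataset).
reference = "many planes are parked next to a terminal".split()
candidate = "many planes are parked near a terminal".split()
print(f"BLEU-2: {bleu(candidate, [reference]):.4f}")
```

Because no smoothing is applied, a caption with no matching bigram scores zero in this sketch; library implementations usually offer smoothing options for short sentences.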

History

Degree Type

Master of Science

Department

Computer Science

Campus location

Fort Wayne

Advisor/Supervisor/Committee Chair

Mohommadreza Hajiarbabi

Additional Committee Member 2

Jin Soung Yoo

Additional Committee Member 3

Venkata Inokollu

Keywords

Transformer Architecture, Remote Sensing Images

Licence

CC BY 4.0
