Purdue University Graduate School

posted on 2022-09-08, 16:01 authored by Sami Naqvi

In recent years, the field of computer vision has evolved rapidly. Convolutional Neural Networks (CNNs) have become the default choice for computer vision tasks. Their popularity stems from how successfully they have performed well-known computer vision tasks such as image annotation and instance segmentation, with promising outcomes. However, CNNs have their caveats and need further research to become reliable machine learning algorithms. Their disadvantages become more evident once the way they break down an input image is understood. Convolutional neural networks group blobs of pixels to identify objects in a given image. This technique makes CNNs incapable of breaking down input images into sub-parts that could distinguish the orientation and transformation of objects and their parts. The functions in a CNN are competent at learning only the shift-invariant features of the object in an image. These limitations give researchers and developers a reason to further develop effective algorithms for computer vision.

The opportunity to improve has been explored through several distinct approaches, each tackling a unique set of issues in the convolutional neural network’s architecture. The Capsule Network (CapsNet) brings an innovative approach to resolving issues pertaining to affine transformations by sharing transformation matrices between the different levels of capsules. The Residual Network (ResNet), in turn, introduced skip connections, which allow deeper networks to be more powerful and solve the vanishing gradient problem.

Motivated by fusing the advantageous ideas of CapsNet and ResNet with the Squeeze and Excite (SE) Block from the Squeeze and Excite Network, this research work presents the SE-Residual Capsule Network (SE-RCN), an efficient neural network model. The proposed model replaces the traditional convolutional layer of CapsNet with skip connections and an SE Block to lower the complexity of the CapsNet. The performance of the model is demonstrated on the well-known datasets MNIST and CIFAR-10, and a substantial reduction in the number of training parameters is observed in comparison to similar neural networks. The proposed SE-RCN uses 6.37 million parameters with an accuracy of 99.71% on the MNIST dataset, and 10.55 million parameters with 83.86% accuracy on the CIFAR-10 dataset.
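
The abstract does not give the layer-level design of SE-RCN, so the following is only a generic sketch of the two ideas it combines: a skip connection wrapped around a Squeeze and Excite block. It uses plain NumPy on a single feature map. The function names, the tensor sizes, the reduction ratio `r`, and the `transform` stand-in for the convolutional layers are all assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Re-weight the channels of a feature map x with shape (C, H, W)."""
    # Squeeze: global average pool reduces each channel to one number.
    z = x.mean(axis=(1, 2))                  # shape (C,)
    # Excite: bottleneck with ReLU, then a sigmoid gate per channel.
    s = np.maximum(w1 @ z, 0.0)              # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))  # shape (C,), values in (0, 1)
    return x * gates[:, None, None]

def se_residual_block(x, transform, w1, w2):
    """Skip connection around `transform` followed by an SE block."""
    return x + squeeze_excite(transform(x), w1, w2)

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4  # hypothetical sizes; r is the reduction ratio
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

# Hypothetical stand-in for the convolutional layers: a fixed scaling.
out = se_residual_block(x, lambda t: 0.5 * t, w1, w2)
print(out.shape)
```

Because the input is added back unchanged, the block reduces to the identity whenever the transformed branch outputs zeros, which is the property that lets gradients pass through deep stacks of such blocks.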


Degree Type

  • Master of Science in Electrical and Computer Engineering

Department

  • Electrical and Computer Engineering

Campus location

  • Indianapolis

Advisor/Supervisor/Committee Chair

Mohamed El-Sharkawy

Additional Committee Member 2

Brian King

Additional Committee Member 3

Maher Rizkalla
