Exploring Lexical Sensitivities in Word Prediction Models: A Case Study on BERT
Thesis posted on 2020-12-01, 15:04, authored by Kanishka Misra
Estimating word probabilities in context is the most fundamental mechanism underlying the training of neural network-based language processing models.
Models pre-trained using this mechanism tend to learn task-independent representations that exhibit a variety of semantic regularities desirable for language processing.
While prediction-based tasks have become an important component of these models, much remains unknown about what kinds of information the models draw from context to inform their word probabilities.
The present work aims to advance the understanding of word prediction models by integrating perspectives from the psycholinguistic phenomenon of semantic priming and presents a case study analyzing the lexical properties of the pre-trained BERT model.
Using stimuli that elicit priming in humans, this thesis relates BERT's sensitivity to lexical cues to predictive contextual constraints and to finer-grained lexical relations.
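As an illustration only, one common way to operationalize a model's sensitivity to a lexical cue in a priming setup is to compare the log-probability the model assigns to a target word after a related prime versus after an unrelated prime. The abstract does not specify the exact measure used in the thesis, so the sketch below is an assumption: the function names `softmax` and `priming_effect` and the toy logits are hypothetical and are not BERT outputs. With a real model, the logits would instead come from BERT's masked-language-modeling head at the masked target position.

```python
import math


def softmax(logits):
    """Convert a dict of word -> logit into word -> probability."""
    m = max(logits.values())  # subtract the max logit for numerical stability
    exps = {w: math.exp(v - m) for w, v in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def priming_effect(related_logits, unrelated_logits, target):
    """Hypothetical measure: log P(target | related prime) - log P(target | unrelated prime).

    A positive value means the related prime raised the probability of the target.
    """
    p_related = softmax(related_logits)[target]
    p_unrelated = softmax(unrelated_logits)[target]
    return math.log(p_related) - math.log(p_unrelated)


# Hypothetical logits over a toy three-word vocabulary at the masked position.
related = {"nurse": 2.0, "table": 0.5, "river": 0.1}    # e.g. after the prime "doctor"
unrelated = {"nurse": 1.0, "table": 0.5, "river": 0.1}  # e.g. after the prime "carrot"

effect = priming_effect(related, unrelated, "nurse")
print(f"priming effect for 'nurse': {effect:.3f}")
```

Working in log-probabilities keeps the comparison on an additive scale, and computing the full softmax, rather than reading off raw logits, makes the result depend on how strongly the context favors the target relative to competing words.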
To augment the empirical methodology used to behaviorally analyze BERT, this thesis draws on the knowledge-rich paradigm of Ontological Semantics and on the fuzzy inferences supported by its practical realization, the Ontological Semantics Technology, to qualitatively relate BERT's predictive mechanisms to meaning interpretation in context.
The findings establish the importance of considering predictive constraint effects of context in studies that behaviorally analyze language processing models, and highlight possible parallels with human processing.