Text Normalization based on Error Type using Pre-trained Language Model

Ko, You Lim

doi:10.25394/PGS.14511942.v1

Thesis_YoulimKo_042921.pdf (4.07 MB)

thesis

posted on 2021-04-29, 18:39 authored by You Lim Ko

With the emergence of social media and its growing popularity, there has been substantial growth in User Generated Content (UGC), which holds great potential for extracting meaningful information. Because most Natural Language Processing (NLP) systems were originally developed for standard text, the dynamic, non-standard nature of social media content has caused their performance to degrade. To address this drop in performance, normalization was introduced as a pre-processing step applied to non-standard text before it is passed to downstream tasks. This thesis investigates the use of the pre-trained language model BERT in normalization and how the performance of normalization methods varies across different types of errors. In this study, BERT is used for candidate generation, and simple ranking methods are then applied to rank the candidates it generates. The candidate generation performance of BERT and the ranking performance of the different methods are analyzed by error type.
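
The two-stage idea described above (generate candidates, then rank them) can be illustrated with a minimal sketch. It is not the thesis's implementation: the candidate list is hard-coded and hypothetical, standing in for the output of a masked language model such as BERT, and the ranking function `rank_candidates` is an assumed example of a "simple ranking method" based on character-level similarity from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def rank_candidates(noisy_token, candidates):
    """Sort candidates by descending string similarity to the noisy token.

    This is an illustrative ranking rule, not the method used in the thesis.
    """
    def similarity(candidate):
        # ratio() returns a value in [0, 1]; higher means more shared characters in order
        return SequenceMatcher(None, noisy_token.lower(), candidate.lower()).ratio()

    return sorted(candidates, key=similarity, reverse=True)


# Hypothetical candidates, standing in for a masked language model's
# top predictions for the position of the non-standard token.
candidates = ["today", "tomorrow", "tonight", "together"]

ranked = rank_candidates("tmrw", candidates)
print(ranked[0])  # tomorrow
```

A similarity-based ranker like this would be expected to favor abbreviation-style errors, where the noisy token keeps most of the characters of the intended word, and to help less with errors that share few characters with their standard form.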

Degree Type

Master of Science

Department

Computer and Information Technology

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Eric T. Matson

Advisor/Supervisor/Committee co-chair

Baijian Yang

Additional Committee Member 2

Julia M. Rayz

Keywords

Natural language processing, text normalization, pre-trained language model

Licence

CC BY 4.0
