IMPROVING STANCE AND BIAS DETECTION IN TEXT BY MODELING SOCIAL CONTEXT
Thesis posted on 2021-04-30, 03:12, authored by Chang Li
Understanding the stance and bias reflected in text is an essential part of achieving machine intelligence. Successfully detecting them will not only provide a wealth of insight into public opinion and sentiment but also lay the foundation for serving the most reliable and accurate information to meet people's needs. Traditionally, this problem has been modeled merely as a text classification task. However, it is highly challenging because of the huge variation in how opinions are expressed and the need for background knowledge and commonsense reasoning. Meanwhile, just as we understand a word based on its context, a piece of text also has social context, including its author, its sharing pattern online, and its narrative about notable entities and events. These important factors have been largely ignored in previous work. In this dissertation, we tackle this problem by proposing three novel neural network models, each capturing one important social context that can provide rich signals for the detection of stance and bias. The first model aims at predicting the stance of posts from online debate forums. We propose a structured representation learning model that makes use of the authorship relation and conversational structure in debates. It takes advantage of both collective relational classification methods and distributed representation learning. Performance is further boosted by inference defined over the embedding space. The second model focuses on bias detection in news articles. We identify a social context available for many news articles: the engagement pattern over social media. We construct a social information graph involving news articles and apply a graph convolutional network (GCN) to aggregate local neighborhood information when generating graph representations. A joint text and graph model is then used to propagate information from both directions.
Experimental results show that even a small amount of social signal can lead to significant improvement. Last but not least, we explore the situation where we cannot obtain context information for test articles. In this case, we design pre-training strategies that inject external knowledge about entities, frames, and sharing users into the text model so that it can better identify relevant text spans for bias classification. We also show that larger performance gains can be achieved when supervision is limited, demonstrating the advantage of our model in such cases. Empirical results demonstrate that our models significantly outperform competitive baseline methods by more accurately regularizing the text representation given the additional signals available in the social context and by identifying the portions of the text where stance and bias are most readily perceptible.
- Doctor of Philosophy
- Computer Science
- West Lafayette