Purdue University Graduate School
Master_Thesis_DZ.pdf (786.16 kB)

Examination of Gender Bias in News Articles

Download (786.16 kB)
posted on 2021-12-19, 17:43 authored by Damin ZhangDamin Zhang
Reading news articles from online sources has become a major choice of obtaining information for many people. Authors who wrote news articles could introduce their own biases either unintentionally or intentionally by using or choosing to use different words to describe otherwise neutral and factual information. Such intentional word choices could create conflicts among different social groups, showing explicit and implicit biases. Any type of biases within the text could affect the reader’s view of the information. One type of biases in natural language is gender bias that had been discovered in a lot of Natural Language Processing (NLP) models, largely attributed to implicit biases in the training text corpora. Analyzing gender bias or stereotypes in such large corpora is a hard task. Previous methods of bias detection were applied to short text like tweets, and to manually built datasets, but little works had been done on long text like news articles in large corpora. Simply detecting bias on annotated text does not help to understand how it was generated and reproduced. Instead, we used structural topic modeling on a large unlabelled corpus of news articles, incorporated qualitative results and quantitative analysis to examine how gender bias was generated and reproduced. This research extends the prior knowledge of bias detection and proposed a method for understanding gender bias in real-world settings. We found that author gender correlated to the topic-gender prevalence and skewed media-gender distribution assist understanding gender bias within news articles.


Degree Type

  • Master of Science


  • Computer and Information Technology

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Julia Rayz

Additional Committee Member 2

John Springer

Additional Committee Member 3

Baijian Yang

Usage metrics



    Ref. manager