Reading news articles from online sources has become a primary way for many people to obtain information. Authors of news articles can introduce their own biases, intentionally or unintentionally, through the words they choose to describe otherwise neutral and factual information. Such word choices can express explicit or implicit biases and create conflicts among different social groups. Any bias within the text can affect the reader’s view of the information. One type of bias in natural language is gender bias, which has been found in many Natural Language Processing (NLP) models and is largely attributed to implicit biases in the training text corpora. Analyzing gender bias or stereotypes in such large corpora is a difficult task. Previous bias-detection methods have been applied to short texts such as tweets and to manually built datasets, but little work has been done on long texts such as news articles in large corpora. Simply detecting bias in annotated text does not explain how that bias is generated and reproduced. Instead, we applied structural topic modeling to a
large unlabelled corpus of news articles, combining qualitative and quantitative analyses to examine how gender bias is generated and reproduced. This research extends prior work on bias detection and proposes a method for understanding gender bias in real-world settings. We found that the correlation between author gender and topic-gender prevalence, together with the skewed media-gender distribution, helps to explain gender bias within news articles.
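
To make the prevalence analysis concrete, the following is a minimal sketch, not the paper's pipeline: it approximates the topical-prevalence component of a structural topic model by fitting plain LDA with gensim and regressing each document's topic proportions on author gender with statsmodels. The toy documents, the binary gender coding, and all parameter values are hypothetical assumptions for illustration only.

    import numpy as np
    import statsmodels.api as sm
    from gensim import corpora, models

    # Hypothetical pre-tokenized news articles and binary author-gender labels
    # (0 = male, 1 = female; an assumed coding, not the paper's).
    docs = [["senator", "vote", "bill"], ["actress", "film", "award"],
            ["coach", "team", "season"], ["minister", "policy", "reform"]]
    author_gender = np.array([0, 1, 0, 1], dtype=float)

    # Bag-of-words representation and a small LDA model.
    dictionary = corpora.Dictionary(docs)
    bow = [dictionary.doc2bow(d) for d in docs]
    lda = models.LdaModel(bow, num_topics=2, id2word=dictionary,
                          random_state=0, passes=10)

    # Document-topic proportions as a dense matrix (topics in id order).
    theta = np.array([[prob for _, prob in
                       lda.get_document_topics(d, minimum_probability=0.0)]
                      for d in bow])

    # For each topic, test whether author gender predicts its prevalence,
    # a crude stand-in for STM's topical-prevalence covariate.
    X = sm.add_constant(author_gender)
    for k in range(theta.shape[1]):
        fit = sm.OLS(theta[:, k], X).fit()
        print(f"topic {k}: gender coefficient = {fit.params[1]:+.3f} "
              f"(p = {fit.pvalues[1]:.3f})")

A positive gender coefficient for a topic would indicate that the topic is more prevalent in articles by female authors; an actual structural topic model estimates this covariate effect jointly with the topics rather than in a post-hoc regression.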