Purdue University Graduate School
Leveraging Linguistic Insights for Uncertainty Calibration of ChatGPT and Evaluating Crowdsourced Annotations

thesis
posted on 2024-07-09, 15:36 authored by Venkata Divya Sree Pulipati

The quality of crowdsourced annotations has long been a challenge due to variability in annotators' backgrounds, task complexity, the subjective nature of many labeling tasks, and other factors. It is therefore crucial to evaluate these annotations to ensure their reliability. Traditionally, human experts assess the quality of crowdsourced annotations, but this approach has its own challenges. This thesis proposes leveraging large language models such as ChatGPT-4 to evaluate the existing crowdsourced MAVEN dataset and explores their potential as an alternative solution. However, because of the stochastic nature of LLMs, it is important to discern when to trust and when to question their responses. To address this, we introduce a novel approach that applies Rubin's framework to identify linguistic cues within LLM responses and use them as indicators of the model's certainty level. Our findings reveal that ChatGPT-4 successfully identified 63% of the incorrect labels, highlighting the potential for improving label quality through human-AI collaboration on these identified inaccuracies. This study underscores the promising role of LLMs in evaluating crowdsourced data annotations, offering a way to enhance the accuracy and fairness of crowdsourced annotations while saving time and cost.
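To make the cue-based idea concrete, below is a minimal Python sketch of how linguistic certainty cues in an LLM response might be detected and used for triage. The cue lexicons and function name are illustrative assumptions for exposition; the thesis relies on Rubin's certainty framework, and its actual cue inventory and categories are not reproduced here.

```python
import re

# Hypothetical cue lexicons (assumptions, not the thesis's actual lists).
# Rubin's framework distinguishes levels of expressed certainty; here we
# approximate it with two small sets of marker words.
HIGH_CERTAINTY_CUES = {"definitely", "clearly", "certainly", "undoubtedly"}
LOW_CERTAINTY_CUES = {"might", "may", "possibly", "perhaps", "seems", "appears", "likely"}

def certainty_level(response: str) -> str:
    """Classify an LLM response as 'high', 'low', or 'unmarked' certainty
    based on the presence of linguistic cue words."""
    tokens = set(re.findall(r"[a-z']+", response.lower()))
    if tokens & LOW_CERTAINTY_CUES:
        return "low"       # hedged answer: flag for human review
    if tokens & HIGH_CERTAINTY_CUES:
        return "high"      # confidently worded answer: more trustworthy
    return "unmarked"      # no explicit cue found

# Example: triaging ChatGPT-4 judgments about crowdsourced labels
print(certainty_level("This label is definitely incorrect."))  # -> high
print(certainty_level("The label might be wrong here."))       # -> low
```

In a human-AI collaboration workflow of the kind the abstract describes, responses tagged "low" could be routed to human annotators, while "high" judgments are accepted or spot-checked, concentrating expert effort where the model itself signals doubt.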


History

Degree Type

  • Master of Science

Department

  • Computer and Information Technology

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Tianyi Li

Additional Committee Member 2

Tatiana Ringenberg

Additional Committee Member 3

Julia Rayz
