Purdue University Graduate School

AN ANALYSIS ON SHORT-FORM TEXT AND DERIVED ENGAGEMENT

thesis
posted on 2024-07-22, 04:39 authored by Ryan J Schwarz
Short text has historically proven challenging to work with in many Natural Language Processing (NLP) applications. Traditional tasks such as authorship attribution benefit from longer samples of work from which to derive features. Even newer tasks, such as synthetic text detection, struggle to distinguish between authentic and synthetic text in short form. Due to the widespread use of social media and the proliferation of freely available Large Language Models (LLMs), such as the GPT series from OpenAI and Bard from Google, there has been a deluge of short-form text on the internet in recent years. Short-form text has either become or remained a staple in several ubiquitous areas such as schoolwork, entertainment, social media, and academia. This thesis analyzes short text through the lens of NLP tasks such as synthetic text detection, LLM authorship attribution, derived engagement, and predicted engagement. The first focus explores detection in the binary case of determining whether tweets are synthetically generated and proposes a novel feature extraction technique to improve classifier results. The second focus further explores the challenges that short-form text presents for determining authorship, along with a range of related difficulties, and presents a potential workaround for those issues. The final focus attempts to predict social media engagement from NLP representations of comments and yields new understanding of the social media environment and the multitude of additional factors required for engagement prediction.

Degree Type

  • Doctor of Philosophy

Department

  • Computer Science

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Clifton W. Bingham

Advisor/Supervisor/Committee co-chair

Edward J. Delp

Additional Committee Member 2

Dan Goldwasser

Additional Committee Member 3

Lin Tan
