AN ANALYSIS ON SHORT-FORM TEXT AND DERIVED ENGAGEMENT
Short text has historically proven challenging to work with in many Natural Language
Processing (NLP) applications. Traditional tasks such as authorship attribution benefit
from longer writing samples from which to derive features. Even newer tasks, such as
synthetic text detection, struggle to distinguish between authentic and synthetic text
when samples are short. Because of the widespread use of social media and the proliferation of freely
available Large Language Models (LLMs), such as the GPT series from OpenAI and Bard
from Google, there has been a deluge of short-form text on the internet in recent years.
Short-form text has become, or has remained, a staple in domains as varied as
schoolwork, entertainment, social media, and academia. This thesis analyzes such
short text through the lens of NLP tasks including synthetic text detection, LLM authorship
attribution, derived engagement, and predicted engagement. The first focus explores the binary
detection task of determining whether a tweet is synthetically generated and proposes a novel
feature extraction technique that improves classifier results. The second focus examines the
challenges that short-form text poses for authorship attribution, along with a range of related
difficulties, and presents a potential workaround for those issues. The final focus attempts to
predict social media engagement from NLP representations of comments, yielding new insight into
the social media environment and the multitude of additional factors that engagement prediction
requires.
Degree Type
- Doctor of Philosophy
Department
- Computer Science
Campus location
- West Lafayette