Discourse Relation Dataset on Hands-on Engineering Tutorial Monologue Video Scripts Based on PDTB3 Scheme
My research focuses on developing a specialized discourse relation dataset from hands-on engineering tutorial monologue video transcripts, based on the Penn Discourse Treebank 3.0 (PDTB-3) annotation framework. The core objective of this work is to systematically analyze and annotate how knowledge is structured and conveyed in technical instructional videos, which are a growing resource for self-guided learning in engineering domains.
The study involves collecting transcripts from real-world engineering tutorials, such as those on plumbing, welding, electrical wiring, and HVAC systems, and annotating them with discourse relations that capture the logical, temporal, and causal connections between instructional steps. Using the PDTB-3 taxonomy, I categorized these relations into senses such as Contingency.Cause, Temporal.Synchronous, Expansion.Instantiation, and Comparison.Contrast.
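To make the annotation target concrete, the following is a minimal sketch of how one PDTB-3 style relation could be stored. The class name, field names, and the example sentence are illustrative assumptions and are not taken from the dataset itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscourseRelation:
    """One annotated relation (hypothetical record layout, not the dataset's schema)."""
    relation_type: str                # e.g. "Explicit", "Implicit", "AltLex"
    sense: str                        # dotted PDTB-3 sense label, e.g. "Contingency.Cause"
    arg1: str                         # text span of the first argument
    arg2: str                         # text span of the second argument
    connective: Optional[str] = None  # None when no connective is present in the text

    @property
    def top_level(self) -> str:
        """Return the Level-1 class, i.e. the part before the first dot."""
        return self.sense.split(".")[0]


# Invented example sentence, for illustration only.
example = DiscourseRelation(
    relation_type="Explicit",
    sense="Contingency.Cause",
    arg1="shut off the main valve",
    arg2="the line is still under pressure",
    connective="because",
)
print(example.top_level)
```

Keeping the sense as a dotted string makes it easy to evaluate at either the coarse class level or the finer sense level.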
What sets this research apart is its domain specificity and its incorporation of challenging discourse phenomena, including implicit connectives, non-adjacent argument spans, alternative lexicalizations (AltLex), and compound relations. These features reflect the nuanced ways expert instructors communicate procedural knowledge. To support this, I implemented a two-stage annotation process: AI-assisted segmentation first, followed by manual refinement and an inter-annotator agreement evaluation.
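The abstract does not say which agreement statistic was used. As one common choice for two annotators assigning categorical sense labels, Cohen's kappa could be computed as in the sketch below. The function name and the toy labels are assumptions for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels, invented for illustration.
annotator_1 = ["Cause", "Cause", "Contrast", "Synchronous"]
annotator_2 = ["Cause", "Contrast", "Contrast", "Synchronous"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for agreement expected by chance, which matters when a few senses dominate the label distribution.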
Additionally, the dataset was used to benchmark the discourse annotation capabilities of several Large Language Models (LLMs), such as ChatGPT-4, Claude 3 Sonnet, Gemini 2.0, and Grok 3.0. Metrics like accuracy and F1-score were used to compare their performance against human annotations.
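As a rough sketch of how model output could be scored against human labels, the code below computes accuracy and a macro-averaged F1. The abstract does not state which F1 averaging was used, so macro averaging here is an assumption, as are the function names and the toy labels.

```python
def accuracy(gold, pred):
    """Fraction of items where the predicted sense equals the gold sense."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); guard against an empty denominator.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration.
human = ["Cause", "Cause", "Contrast", "Synchronous"]
model = ["Cause", "Contrast", "Contrast", "Synchronous"]
acc = accuracy(human, model)
f1 = macro_f1(human, model)
print(acc, round(f1, 3))
```

Macro averaging weights rare senses equally with frequent ones, so it can differ noticeably from accuracy on skewed sense distributions.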
Overall, the research aims to contribute a high-quality resource that can be used to improve automated discourse understanding in technical education contexts. It bridges gaps in existing discourse datasets by targeting domain-specific, spoken-form, procedural knowledge, which makes it highly relevant for applications in AI-based tutoring systems, expert knowledge transfer, and educational content analysis.
Degree Type
- Master of Science in Industrial Technology
Department
- Computer and Information Technology
Campus location
- Hammond