<p dir="ltr">Detecting subtle and manipulative forms of online child exploitation, such as grooming, remains a major challenge for automated systems. Existing approaches typically operate at the conversation level, overlooking the dynamic and incremental nature of grooming, especially when it involves coercion, manipulation, or implicit social control. This dissertation reframes grooming detection as a segment-level social reasoning task and systematically evaluates large language models (LLMs) on their ability to identify grooming-related behaviors, with particular emphasis on coercive intent and pragmatic function. To support this investigation, a novel evaluation dataset was created by extending the PAN 2012 corpus with theory-driven, segment-level annotations that capture grooming presence, coercion strategies, and speech acts. These labels enabled rigorous benchmarking of three open-weight LLMs (LLaMA-2-7B, LLaMA-3.1-8B, and Mistral-7B) using a chunk-wise, memory-augmented generation framework. This setup mimics the cumulative reasoning process of human annotators by summarizing prior context and feeding it into the model alongside each new segment, allowing the LLM to track conversational dynamics over time.</p><p dir="ltr">The results show that grooming detection remains difficult for current LLMs, with low precision and inconsistent performance over time. In contrast, coercion detection was more tractable across models. Moreover, the models differed in adaptability: LLaMA-3.1-8B was the most consistent in coercion detection but overly conservative in speech act extraction, LLaMA-2-7B tended to overpredict, and Mistral-7B achieved the best balance across the grooming and coercion detection tasks. These findings highlight key limitations in current LLMs' ability to detect grooming, particularly their difficulty in recognizing behavior that unfolds gradually and relies on social context.
Addressing this gap will require not only strategy-aware models but also datasets that reflect the complexity of real-world abuse, annotated at a finer level and grounded in pragmatic and relational theories. Memory-augmented inference, combined with thoughtful dataset design, offers a path forward for building AI systems that can better identify the less obvious, yet no less harmful, forms of manipulation common in digital spaces.</p>
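The chunk-wise, memory-augmented inference loop described above can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names (`classify_segment`, `run_chunkwise`) and the pass-in `generate` callable are hypothetical stand-ins for an LLM call, and the running summary here is naive concatenation, whereas the actual framework would use an LLM-produced summary of prior context.

```python
def classify_segment(segment, memory, generate):
    """Label one conversation segment, conditioning on a running summary.

    `generate` is a stand-in for an LLM inference call (hypothetical here).
    Returns the model's label and the updated memory.
    """
    prompt = (
        f"Context summary so far: {memory or '(none)'}\n"
        f"Current segment:\n{segment}\n"
        "Does this segment show grooming-related behavior? Answer yes/no."
    )
    label = generate(prompt)
    # Fold the new segment into the running summary. A naive concatenation
    # is used for illustration; the framework would instead summarize with
    # the LLM so the memory stays within the context window.
    new_memory = (memory + " " + segment).strip()
    return label, new_memory


def run_chunkwise(segments, generate):
    """Process a conversation segment by segment, carrying memory forward."""
    memory = ""
    labels = []
    for seg in segments:
        label, memory = classify_segment(seg, memory, generate)
        labels.append(label)
    return labels
```

The key design point is that each segment is judged with access to a summary of everything before it, so behavior that only becomes suspicious cumulatively (e.g. gradual isolation or secrecy requests) can still be flagged at the segment level.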