<p dir="ltr">Software reliability is of vital importance given software's increasingly critical role in areas such as healthcare and transportation, where failures can lead to significant monetary losses and even the loss of human life. Unreliability can arise at any of the four primary stages of the Software Development Life Cycle (SDLC): Planning and Design, Implementation, Quality Assurance, and Deployment. Each stage presents unique challenges, including bridging the gap between informal documentation and formal specifications, generating reliable code, producing effective test cases, and auditing binaries for security. Traditional approaches to these challenges often struggle with scalability, generalizability, and practical effectiveness, particularly when tackling inherently complex or undecidable problems.</p><p><br></p><p dir="ltr">In response to these persistent challenges, advances in modern Natural Language Processing (NLP), including Large Language Models (LLMs), offer promising new directions. These models have shown strong capabilities in understanding and generating natural language and other structured artifacts. This thesis investigates the application of modern NLP techniques to improve software reliability. It presents novel applications of NLP techniques to key software engineering tasks throughout the SDLC and provides empirical evaluations that assess their capabilities and feasibility.</p><p><br></p><p dir="ltr">The thesis explores the application of NLP techniques through five core projects. It begins by demonstrating how NLP can bridge the gap between informal documentation and formal specifications through DocTer, a pattern-based technique that extracts input constraints from natural-language documentation to guide the testing of deep learning libraries.
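</p><p dir="ltr">As a minimal illustration of the pattern-based idea, a hypothetical sketch (not DocTer's actual rules or implementation) might map recurring phrases in API parameter descriptions to machine-checkable input constraints:</p>

```python
import re

# Hypothetical pattern table mapping documentation phrases to constraint
# categories; these phrases and categories are illustrative assumptions,
# not DocTer's actual rule set.
PATTERNS = {
    r"\bnon-negative\b": ("range", ">= 0"),
    r"\bpositive\b": ("range", "> 0"),
    r"\binteger\b": ("dtype", "int"),
    r"\bfloat(ing-point)?s?\b": ("dtype", "float"),
    r"\b(\d+)-?d(imensional)?\b": ("ndim", None),  # e.g. "2-D" -> ndim = 2
}

def extract_constraints(description):
    """Return {category: value} constraints found in a free-text description."""
    constraints = {}
    for pattern, (category, value) in PATTERNS.items():
        match = re.search(pattern, description, flags=re.IGNORECASE)
        if match:
            # Fixed-value patterns record the value directly; the ndim
            # pattern instead captures the number from the text itself.
            constraints[category] = value if value is not None else int(match.group(1))
    return constraints

# A docstring fragment of the kind such techniques target:
print(extract_constraints("size: a positive integer giving the 2-D tensor shape"))
# -> {'range': '> 0', 'dtype': 'int', 'ndim': 2}
```

<p dir="ltr">Constraints extracted this way can then seed a fuzzer with both conforming and violating inputs for the documented API.</p><p dir="ltr">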
This approach is then extended into CEDAR, a continuous testing framework that keeps pace with rapidly evolving deep learning libraries through scalable, automated testing across versions. The work also presents the first empirical study of the effectiveness of LLMs in generating formal specifications from software documentation and comments, comparing them with traditional methods. Furthermore, it introduces ReSym, a hybrid approach that combines LLMs with program analysis to recover symbolic information from stripped binaries, supporting reverse engineering and security analysis. Finally, the thesis presents CoRe, a benchmark that evaluates LLMs' code reasoning capabilities through fundamental static analysis tasks.</p><p><br></p><p dir="ltr">Together, these projects demonstrate how modern NLP techniques, encompassing both traditional pattern-based methods and LLMs, can be effectively leveraged across various stages of the SDLC to improve software reliability. The research shows that these approaches can be applied individually or combined with traditional techniques to build more reliable and scalable solutions. The empirical evaluations provide valuable insights into the feasibility, strengths, and current limitations of LLM-based methods for tasks such as specification generation and code reasoning. By highlighting both the practical applications of LLMs and the challenges that remain, this work guides future research toward more robust and dependable NLP-driven software engineering practices.</p>