<pre>Modern deep neural networks achieve state-of-the-art performance across numerous domains, yet this performance comes at the cost of substantial computational and energy resources. As AI systems continue to scale, the energy footprint of both training and inference has become increasingly unsustainable. This dissertation explores hardware-software co-design strategies to address these energy challenges by leveraging probabilistic bits (p-bits), also known as binary stochastic neurons (BSNs), which offer promising pathways toward efficient hardware implementations.<br>This work makes three primary contributions. First, I develop a building block for p-circuits and establish a systematic methodology for energy benchmarking of p-circuit-based neural networks. This framework provides metrics and design parameters that explicitly connect algorithmic design choices to hardware energy costs, enabling more informed co-design decisions.<br>Second, I introduce the Binary Stochastic Forward-Forward (BSFF) algorithm, a training approach that eliminates the need for backpropagation by employing p-bit activations throughout the network. This method replaces computationally expensive multiply-accumulate operations with more efficient index-and-accumulate primitives. Through analytical modeling, I show that this approach can substantially reduce activation compute energy while maintaining compatibility with existing weight quantization techniques.<br>Third, I propose an on-device inference refinement technique based on multi-sample integer stochastic evaluation. 
By averaging predictions across a small number of stochastic samples, this method improves the accuracy of quantized networks without requiring hardware modifications, and provides a tunable accuracy-energy trade-off that can be adjusted post-deployment.<br>I evaluate these methods across standard benchmarks including MNIST, CIFAR-10, and ImageNet, testing on architectures ranging from ResNets to Vision Transformers. The results demonstrate that probabilistic approaches can achieve competitive accuracy while offering significant energy advantages. While the majority of experiments are conducted in software simulation, the findings provide foundational insights for future hardware implementations of probabilistic activations and p-circuit primitives.</pre>
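The index-and-accumulate primitive mentioned above can be illustrated with a minimal NumPy sketch. This is not the dissertation's implementation; it only shows the underlying arithmetic identity: when activations are binary (0/1) p-bit samples, the product W @ a reduces to summing the columns of W selected by the active units, so no multiplications are needed. The function names (`pbit_sample`, `index_accumulate`) and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pbit_sample(logits, rng):
    """Binary stochastic (p-bit) activation: a_i = 1 with probability sigmoid(logit_i)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return (rng.random(p.shape) < p).astype(np.int8)

def index_accumulate(W, a):
    """With a in {0,1}^n, W @ a equals the sum of the columns of W at the
    indices where a_i == 1 -- an index-and-accumulate, no multiplies."""
    idx = np.nonzero(a)[0]
    return W[:, idx].sum(axis=1)

# Equivalence check against the dense multiply-accumulate:
W = rng.standard_normal((4, 8))
a = pbit_sample(rng.standard_normal(8), rng)
assert np.allclose(index_accumulate(W, a), W @ a)
```

The energy argument follows from this identity: column gathers and additions replace full multiply-accumulate operations, which is where the analytical savings in activation compute energy come from.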
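The multi-sample inference refinement can likewise be sketched in a few lines. The toy two-layer model below (shapes, weight names, and `num_samples` default are all hypothetical, not taken from the dissertation) shows the core idea: run several stochastic forward passes and average the class scores, so the sample count becomes a post-deployment knob trading inference energy for prediction variance.

```python
import numpy as np

rng = np.random.default_rng(1)

def stochastic_forward(x, W, rng):
    """One stochastic pass: linear layer, p-bit activation, linear readout."""
    h = 1.0 / (1.0 + np.exp(-(W["in"] @ x)))          # firing probabilities
    a = (rng.random(h.shape) < h).astype(np.float64)  # sampled binary activations
    return W["out"] @ a                               # class scores

def multi_sample_predict(x, W, rng, num_samples=8):
    """Average class scores over num_samples stochastic passes; a larger
    num_samples spends more inference energy for a lower-variance prediction."""
    scores = sum(stochastic_forward(x, W, rng) for _ in range(num_samples))
    return int(np.argmax(scores / num_samples))

# Toy weights (hypothetical shapes): 10 inputs -> 16 hidden p-bits -> 3 classes
W = {"in": rng.standard_normal((16, 10)), "out": rng.standard_normal((3, 16))}
x = rng.standard_normal(10)
pred = multi_sample_predict(x, W, rng)
```

Because averaging changes only how often the same quantized network is evaluated, the refinement needs no hardware modification, consistent with the claim above.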