<p dir="ltr">Machine learning and causal inference have largely developed as separate fields, but there is growing interest in transferring ideas between them. In particular, causality is closely connected to generalization and robustness in machine learning. This dissertation comprises two studies, one in each area. The first study presents a theoretical analysis of the convergence behavior of Stochastic Gradient Descent (SGD) under data contamination, specifying conditions under which it converges and cases in which it diverges. Guided by this analysis, a robust variant of SGD that balances batch size and learning rate is developed. Compared with the existing robust gradient descent method, which uses the full sample, the proposed approach greatly reduces computational cost while maintaining the same estimation error rate. The robust method is applied to an autoencoder neural network, enhancing robustness without modifying the network structure. Experiments on synthetic and image datasets demonstrate the effectiveness of the approach.</p><p dir="ltr">The second study clarifies a fundamental difference between causal inference and traditional statistical inference by formalizing a mathematical distinction between their respective parameters. It also connects two major approaches to causal inference that are typically studied separately: the potential outcomes framework and causal structure graphs. While the unconfoundedness assumption in the potential outcomes framework cannot be assessed from an observational dataset alone, causal structure graphs help explain when causal effects are identifiable through graphical models. A statistical test is proposed to assess the unconfoundedness assumption, which is equivalent to the absence of unmeasured confounding, by comparing two datasets: one from a randomized controlled trial and one from an observational study.
The test controls the Type I error probability, and its power under linear models is analyzed. The approach provides a practical method to evaluate when real-world data are suitable for causal inference.</p>