Purdue University Graduate School
panayotis-manganaris-thesis-final.pdf (3.99 MB)

Multi-fidelity Machine Learning for Perovskite Band Gap Predictions

posted on 2023-06-16, 18:02 authored by Panayotis Thalis Manganaris

A wide range of optoelectronic applications demand semiconductors optimized for purpose.

My research focused on data-driven identification of ABX3 halide perovskite compositions for optimum photovoltaic absorption in solar cells.

I trained machine learning models on previously reported datasets of halide perovskite band gaps based on first principles computations performed at different fidelities.

Using these models, I identified mixtures of candidate constituents at the A, B, or X sites of the perovskite supercell. The search exploited the way mixed perovskite band gaps deviate from the linear interpolation predicted by Vegard's law of mixing, and it yielded a selection of stable perovskites with band gaps in the ideal range of 1 to 2 eV for absorption of the visible light spectrum.
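
The deviation from Vegard's law is often written with a bowing term. As a sketch, assuming a binary mixture with mixing fraction x between two end-member compounds with band gaps E_g^{(1)} and E_g^{(2)}, and a bowing parameter b (a common parameterization, not necessarily the one used in this thesis):

```latex
\begin{align}
E_g^{\mathrm{Vegard}}(x) &= (1 - x)\,E_g^{(1)} + x\,E_g^{(2)} \\
E_g(x) &= E_g^{\mathrm{Vegard}}(x) - b\,x\,(1 - x)
\end{align}
```

With b = 0 the mixture follows the linear interpolation exactly. A nonzero b shifts the band gap of intermediate compositions away from it, with the largest shift, b/4, at x = 0.5.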

These models predict the perovskite band gap using the composition and inherent elemental properties as descriptors.

This enables accurate, high-fidelity prediction and screening of the much larger chemical space from which the data samples were drawn.

I utilized a recently published density functional theory (DFT) dataset of more than 1300 perovskite band gaps from four different levels of theory, added to an experimental perovskite band gap dataset of ~100 points, to train random forest regression (RFR), Gaussian process regression (GPR), and Sure Independence Screening and Sparsifying Operator (SISSO) regression models, with data fidelity added as one-hot encoded features.
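
As an illustration of what "data fidelity added as one-hot encoded features" can look like, here is a minimal sketch in plain Python. The fidelity labels, function names, and descriptor values are hypothetical placeholders; the abstract does not name the four levels of theory or list the descriptors.

```python
# Hypothetical fidelity labels: four DFT levels of theory plus experiment.
# The real names of the levels of theory are not given in the abstract.
FIDELITIES = ["dft_level_1", "dft_level_2", "dft_level_3", "dft_level_4", "experiment"]


def one_hot_fidelity(fidelity):
    """Return a list with 1 in the slot matching `fidelity` and 0 elsewhere."""
    if fidelity not in FIDELITIES:
        raise ValueError(f"unknown fidelity: {fidelity!r}")
    return [1 if name == fidelity else 0 for name in FIDELITIES]


def build_feature_row(descriptors, fidelity):
    """Append the one-hot fidelity flags to the numeric descriptors."""
    return list(descriptors) + one_hot_fidelity(fidelity)


# Illustrative descriptor values only (not taken from the dataset).
row = build_feature_row([0.5, 0.5, 1.0], "experiment")
print(row)  # [0.5, 0.5, 1.0, 0, 0, 0, 0, 1]
```

Rows built this way from every fidelity can be stacked into a single training table, so one regressor sees all the data and can be asked for a prediction at a chosen fidelity by setting the corresponding flag.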

I found that RFR yields the best model with a band gap root mean square error of 0.12 eV on the total dataset and 0.15 eV on the experimental points.
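
For reference, the root mean square error quoted here can be computed for the whole dataset and for a subset such as the experimental points. The sketch below uses hypothetical function names and made-up numbers; it is not the evaluation code from the thesis.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_by_group(y_true, y_pred, groups):
    """RMSE computed separately for each group label (e.g. each fidelity)."""
    buckets = {}
    for t, p, g in zip(y_true, y_pred, groups):
        true_vals, pred_vals = buckets.setdefault(g, ([], []))
        true_vals.append(t)
        pred_vals.append(p)
    return {g: rmse(ts, ps) for g, (ts, ps) in buckets.items()}


# Made-up band gaps in eV, for illustration only.
y_true = [1.5, 2.0, 1.0, 3.0]
y_pred = [1.4, 2.1, 1.2, 2.8]
groups = ["dft", "dft", "experiment", "experiment"]
print(rmse(y_true, y_pred))
print(rmse_by_group(y_true, y_pred, groups))
```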

SISSO provided compound features and functions for direct prediction of band gap, but errors were larger than from RFR and GPR.

Additional insights gained from Pearson correlation and Shapley additive explanation (SHAP) analysis of learned descriptors suggest the RFR models performed best because of (a) their focus on identifying and capturing relevant feature interactions and (b) their flexibility to represent nonlinear relationships between such interactions and the band gap.
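
The Pearson correlation used in this kind of analysis measures only the linear association between a descriptor and the band gap, which is one reason it is usually paired with a method such as SHAP. A minimal sketch, with a hypothetical function name and toy data:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Toy data: a descriptor that is symmetric about zero and its square.
descriptor = [-2.0, -1.0, 0.0, 1.0, 2.0]
target = [v ** 2 for v in descriptor]
print(pearson_r(descriptor, target))  # 0.0: a strong but nonlinear dependence goes undetected
```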

The best model was deployed to predict the experimental band gaps of 37785 hypothetical compounds.

Based on these predictions, we identified 1251 stable compounds with band gaps predicted to lie between 1 and 2 eV at experimental accuracy, successfully narrowing the candidates to about 3% of the screened compositions.
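
The screening step amounts to a filter on predicted band gap and stability. The sketch below assumes each candidate is a dictionary with hypothetical keys `formula`, `gap_ev`, and `stable`; the candidate entries are invented, and only the counts 1251 and 37785 come from the abstract.

```python
def screen(candidates, low_ev=1.0, high_ev=2.0):
    """Keep candidates flagged stable whose predicted gap lies in [low_ev, high_ev]."""
    return [c for c in candidates if c["stable"] and low_ev <= c["gap_ev"] <= high_ev]


# Invented candidates, for illustration only.
candidates = [
    {"formula": "hypothetical_1", "gap_ev": 1.4, "stable": True},
    {"formula": "hypothetical_2", "gap_ev": 2.6, "stable": True},
    {"formula": "hypothetical_3", "gap_ev": 1.7, "stable": False},
]
selected = screen(candidates)
print([c["formula"] for c in selected])  # ['hypothetical_1']

# Fraction of screened compositions retained, using the counts from the abstract.
fraction = 1251 / 37785
print(f"{fraction:.1%}")  # 3.3%
```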


Degree Type

  • Master of Science


Department

  • Materials Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Arun Mannodi-Kanakkithodi

Additional Committee Member 2

Alejandro Strachan

Additional Committee Member 3

Kendra Erk