Purdue University Graduate School
A Comprehensive Approach to Evaluating Usability and Hyperparameter Selection for Synthetic Data Generation

thesis
posted on 2024-07-20, 18:38, authored by Adriana Louise Watson

Data is the key component of every machine learning algorithm. Without sufficient quantities of quality data, the vast majority of machine learning algorithms fail to perform. Acquiring the data necessary to feed algorithms, however, is a universal challenge. Recently, synthetic data production methods have become increasingly relevant as a means of addressing a variety of data issues. Synthetic data allows researchers to produce supplemental data from an existing dataset. Furthermore, synthetic data anonymizes data without losing functionality. To advance the field of synthetic data production, however, measuring the quality of produced synthetic data is an essential step. Although methods for evaluating synthetic data quality exist, they tend to address finite aspects of data quality. Furthermore, synthetic data evaluation varies immensely from one study to another, which makes quality comparison even more difficult. Finally, although tools exist to automatically tune hyperparameters, these tools fixate on traditional machine learning applications. Thus, identifying ideal hyperparameters for individual synthetic data generation use cases is also an ongoing challenge.

Degree Type

  • Master of Science

Department

  • Engineering Technology

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Dr. Grant Richards

Additional Committee Member 2

Dr. Brittany Newell

Additional Committee Member 3

Dr. Alex Damarjian
