Purdue University Graduate School
General Resource Management for Computationally Demanding Scientific Software

thesis
posted on 2022-10-17, 19:29 authored by Xinchen Guo

Many scientific problems involve nonlinear systems of equations that require multiple iterations to converge. Software that solves them follows the bulk synchronous parallel model: each iteration is a superstep, which consists of computation on local data, global communication to update data for the next iteration, and synchronization between iterations. In modern HPC environments, MPI is used to distribute data and OpenMP is used to accelerate the computation on each portion of the data. More MPI processes increase the cost of communication and synchronization, whereas more OpenMP threads increase the overhead of multithreading. A proper combination of MPI processes and OpenMP threads is therefore critical to accelerating each superstep. Proper orchestration of MPI processes and OpenMP threads is also needed to use the underlying hardware resources efficiently.
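The trade-off between MPI processes and OpenMP threads can be illustrated with a toy cost model. The sketch below is not taken from the thesis: the function names, the linear communication cost, and the linear per-thread overhead are all assumptions chosen only to make the trade-off concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    processes: int  # number of MPI processes
    threads: int    # OpenMP threads per process
    time: float     # modeled superstep time (arbitrary units)


def superstep_time(work, processes, threads, comm_cost, thread_overhead):
    """Modeled time of one superstep (hypothetical model, not measured)."""
    # Computation: work is shared by all cores, inflated by an assumed
    # overhead that grows linearly with the threads per process.
    compute = work / (processes * threads) * (1.0 + thread_overhead * (threads - 1))
    # Communication and synchronization: assumed to grow linearly
    # with the number of MPI processes.
    communicate = comm_cost * (processes - 1)
    return compute + communicate


def best_split(total_cores, work, comm_cost, thread_overhead):
    """Try every processes x threads factorization of total_cores."""
    candidates = []
    for processes in range(1, total_cores + 1):
        if total_cores % processes:
            continue  # only splits that use every core exactly once
        threads = total_cores // processes
        t = superstep_time(work, processes, threads, comm_cost, thread_overhead)
        candidates.append(Split(processes, threads, t))
    return min(candidates, key=lambda s: s.time)


if __name__ == "__main__":
    print(best_split(total_cores=16, work=1000.0, comm_cost=1.0, thread_overhead=0.05))
```

In this model, a high communication cost pushes the optimum toward fewer processes with more threads, and a high threading overhead pushes it the other way. Real applications would need profiling data in place of the assumed cost terms.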

  

Purdue’s multi-purpose nanodevice simulation tool NEMO5 uses MPI to distribute the computation of independent spectral points and OpenMP threads to accelerate the computation of each spectral point. A few examples of resource utilization optimizations are presented. One type of simulation applies the non-equilibrium Green’s function method to accurately predict drug molecules. Our profiling results suggest that the optimum combination has more MPI processes and fewer OpenMP threads. However, NEMO5's memory usage spikes sharply during the calculation of each spectral point. Because HPC nodes lack swap space, the number of spectral points calculated concurrently must be limited to avoid out-of-memory failures.


A distributed resource management framework is proposed and developed to manage memory and CPU usage automatically and dynamically. The concurrent calculation of spectral points is pipelined so that their peak memory usage does not coincide. This allows more MPI processes and fewer OpenMP threads to be used, which gives higher parallel efficiency. Automatic CPU usage adjustment also reduces the time needed to fill and drain the calculation pipeline. The resource management framework requires minimal code intrusion and successfully speeds up the calculation. It can also be generalized to other simulation software.
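The effect of pipelining on peak memory can be sketched with a small calculation. The example below is an illustration only and does not reproduce the framework described in the thesis: the function name, the per-step memory profile, and the fixed stagger are hypothetical.

```python
def peak_total_memory(profile, n_tasks, stagger):
    """Peak of the summed memory when task k starts at step k * stagger.

    profile: memory used by one task at each step of its calculation.
    stagger: delay in steps between the starts of consecutive tasks;
             0 means all tasks start at the same time.
    """
    length = len(profile) + stagger * (n_tasks - 1)
    total = [0] * length
    for k in range(n_tasks):
        start = k * stagger
        for i, mem in enumerate(profile):
            total[start + i] += mem
    return max(total)


if __name__ == "__main__":
    # Hypothetical profile in GB: mostly low usage with one sharp spike.
    profile = [1, 1, 8, 1]
    print(peak_total_memory(profile, n_tasks=4, stagger=0))  # 32: all spikes coincide
    print(peak_total_memory(profile, n_tasks=4, stagger=1))  # 11: spikes are spread out
```

With this assumed profile, staggering four tasks by one step lowers the combined peak from 32 GB to 11 GB, so more tasks fit within the same memory limit. The cost is the extra time spent filling and draining the pipeline, which is the overhead the automatic CPU usage adjustment is described as reducing.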

Degree Type

  • Doctor of Philosophy

Department

  • Electrical and Computer Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Gerhard Klimeck

Advisor/Supervisor/Committee co-chair

Tillmann Kubis

Additional Committee Member 2

Milind Kulkarni

Additional Committee Member 3

Timothy Rogers
