Final PhD Defense for Tao Wang

December 17, 2019 - 2:00 pm - 4:00 pm

Title:  Compiler-based Auto-tuning and Synchronization Validation for HPC Applications

Examination Committee:
Dr. Guoliang Jin (Co-Chair)
Dr. Frank Mueller (Co-Chair)
Dr. James Tuck (Graduate School Representative)
Dr. Min Chi

All members of the university community are invited.

* * *
Abstract: Modern high performance computing (HPC) architectures feature multi-core processors with deep memory hierarchies, complex out-of-order instruction pipelines, powerful single instruction multiple data (SIMD) components, and heterogeneous accelerators. Because of these architectural complexities, performance portability is a serious problem in practice: programs tuned for one architecture usually achieve sub-optimal performance on another, which wastes energy and entails significant performance tuning effort. Automatic performance tuning techniques are therefore in high demand at DOE laboratories. Existing tools are limited in different ways. Traditional compiler-based auto-tuning approaches generate many functionally equivalent code variants and evaluate them on a tuning input to identify the best one. To generate a code variant, these approaches usually compile all program source files with the same compilation flags. Furthermore, after the best code variant is identified, the final performance evaluation is usually done on a small number of testing inputs. These experimental settings are limiting in two ways. First, different source modules may need specialized flags to achieve the best performance. Second, a program may be so sensitive to its input that the tuned executable yields sub-optimal performance on many other inputs in practice. Another problem is posed by correctness issues in multi-threaded HPC programs, such as deadlocks and data races, caused by ad hoc synchronizations that developers introduce for performance purposes. There is therefore also a need for novel tools that support bug detection and semantics validation for ad hoc synchronization constructs, e.g., ad hoc barriers.
State-of-the-art ad hoc synchronization analysis tools can only detect simple happens-before relationships between different program points. They cannot detect complex synchronization constructs, such as ad hoc barriers, nor can they enumerate the thread interleaving space to validate the correctness of the constructs' dynamic semantics.
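For readers unfamiliar with the term, the following is a minimal sketch of what an ad hoc counter-based barrier can look like. It is an illustration only and is not taken from the dissertation: the class name `AdHocBarrier`, the helper `run`, and the choice of Python are all assumptions made for this example. The barrier is built from a shared counter and a busy-wait loop rather than a library barrier, which is the kind of construct that is hard for analysis tools to recognize.

```python
import threading
import time


class AdHocBarrier:
    """Hypothetical one-shot barrier: a shared counter plus a spin loop."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0
        self.lock = threading.Lock()

    def wait(self):
        # Record this thread's arrival atomically.
        with self.lock:
            self.count += 1
        # Ad hoc synchronization: spin until every thread has arrived.
        # No library barrier or condition variable is involved.
        while self.count < self.parties:
            time.sleep(0)  # yield so other threads can make progress


def run(parties=4):
    """Run `parties` threads through the barrier and return the event log."""
    barrier = AdHocBarrier(parties)
    log = []
    log_lock = threading.Lock()

    def worker(i):
        with log_lock:
            log.append(("before", i))
        barrier.wait()
        with log_lock:
            log.append(("after", i))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In this sketch every "before" entry is logged before any "after" entry, which is the ordering a barrier is meant to enforce. The counter is never reset, so the barrier is safe for a single use only; reusing it would let threads pass without waiting, one example of the subtle semantics such hand-written constructs can get wrong.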
This dissertation addresses these limitations of auto-tuning and ad hoc synchronization analysis technologies. First, we propose a fine-grained compilation framework, FuncyTuner, that specializes compilation for the hot loops of HPC programs by using per-loop profiling information to search the extremely large compilation flag space. Compared to the state of the art, FuncyTuner improves the performance of modern parallelized scientific programs by 4.5% to 10.7% (geometric mean) relative to the baseline. We then propose CodeSeer to evaluate different types of program sensitivity and to build machine learning models that tackle the challenges presented by highly sensitive programs. Our experimental results show that all HPC programs exhibit some type of basic input sensitivity, so tuning inputs should be selected carefully. For programs with high sensitivity, CodeSeer's predictive models achieve 92% prediction accuracy while introducing less than 0.01 seconds of online prediction overhead. Second, we contribute a framework, BarrierFinder, to automatically identify complex ad hoc synchronizations and infer the order relationships they enforce. BarrierFinder combines several techniques, including program slicing and bounded symbolic execution, to efficiently explore the interleaving space of ad hoc synchronizations within multi-threaded programs and collect their traces. BarrierFinder then uses these traces to classify ad hoc synchronizations into different types, such as barriers. Our evaluation shows that BarrierFinder is both effective and efficient in its analysis. BarrierFinder also reliably detects deadlocks and atomicity violations in counter-based barrier implementations.
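The motivation for per-loop flag specialization can be illustrated with a small sketch. It is not FuncyTuner's algorithm: the timing table is made-up data, the function names `best_uniform` and `best_per_loop` are hypothetical, and the sketch assumes that loop runtimes are independent and simply add up, which real programs need not satisfy.

```python
# Hypothetical runtimes in seconds for three hot loops under three flag sets.
# The numbers are invented for illustration only.
timings = {
    "loop_a": {"-O2": 4.0, "-O3": 3.2, "-O3 -funroll-loops": 3.5},
    "loop_b": {"-O2": 2.0, "-O3": 2.6, "-O3 -funroll-loops": 1.8},
    "loop_c": {"-O2": 1.5, "-O3": 1.4, "-O3 -funroll-loops": 1.6},
}


def best_uniform(timings):
    """Pick the single flag set that minimizes total runtime over all loops."""
    flag_sets = list(next(iter(timings.values())))
    totals = {f: sum(per_loop[f] for per_loop in timings.values()) for f in flag_sets}
    best = min(totals, key=totals.get)
    return best, totals[best]


def best_per_loop(timings):
    """Pick the fastest flag set for each loop independently."""
    choice = {loop: min(t, key=t.get) for loop, t in timings.items()}
    total = sum(timings[loop][flags] for loop, flags in choice.items())
    return choice, total


uniform_flags, uniform_total = best_uniform(timings)
per_loop_flags, per_loop_total = best_per_loop(timings)
print(uniform_flags, round(uniform_total, 2))
print(per_loop_flags, round(per_loop_total, 2))
```

Under the additivity assumption, the per-loop choice can never be slower than the best uniform choice, because the uniform choice is one of the candidates for every loop. The search space, however, grows from the number of flag sets to that number raised to the number of loops, which is why an exhaustive search quickly becomes impractical.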

Venue

3300 EB2