Adding GPUs to a compute node greatly expands its computational capacity. However,
taking advantage of such nodes is challenging. This talk presents the Hybrid
Task Graph Scheduler (HTGS), an abstract execution model and framework, which
simplifies developing applications for multi-GPU nodes by modularizing a
program into compute kernels, memory management, data motion, and state maintenance.
Furthermore, HTGS maintains a task graph representation at runtime and collects
task-level profile data, which helps identify bottlenecks and supports performance
experimentation. We will present imaging applications that use HTGS to process
and analyze gigapixel images. We will also present two linear algebra benchmarks and
preliminary work on Radio Frequency Interference Mitigation, which demonstrate
the applicability of HTGS beyond imaging.