Stream Experiments: Toward Latency Hiding in GPGPU

S. Laosooksathit, C. Leangsuksun, A. Baggag, and C. Chandler (USA)


High performance computing, GPGPU, and latency hiding


In multithreaded programming on GPUs, data transfer between the CPU and GPUs is a major impediment that prevents the GPU from achieving its potential. Hence, we focus on the stream management framework, a latency-hiding strategy introduced by CUDA. Streaming allows kernel execution to overlap with the transfer of independent data between the CPU and GPUs, so the total execution time can potentially be reduced. In this paper, we introduce performance models in order to study the utilization of streams. Moreover, we study two methods used for timing operations in CUDA, namely CUDA calls and CUDA events. CUDA call functions are functions implemented in C++, while the CUDA events method is an API. Our findings show that the CUDA events method is more accurate than CUDA call functions for timing operations running on the GPU.
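
To make the overlap argument concrete, the following is a minimal sketch of an idealized timing model. It is not the performance model from the paper, which the abstract does not give. The function names (`serial_time`, `streamed_time`) and the assumptions are hypothetical: the work splits evenly across streams, the device has independent engines for host-to-device copies, kernels, and device-to-host copies, and there is no launch or synchronization overhead.

```python
def serial_time(h2d, kernel, d2h):
    """Total time with no overlap: copy in, compute, copy out."""
    return h2d + kernel + d2h


def streamed_time(h2d, kernel, d2h, n_streams):
    """Idealized total time when the work is split evenly over n_streams.

    Assumption (not from the paper): three independent engines, one per
    stage, each handling one chunk at a time in stream order.
    """
    # Per-chunk cost of each stage: H2D copy, kernel, D2H copy.
    stages = [h2d / n_streams, kernel / n_streams, d2h / n_streams]
    # Time at which each engine next becomes free.
    free_at = [0.0, 0.0, 0.0]
    for _ in range(n_streams):
        ready = 0.0  # when this chunk's previous stage finished
        for s, cost in enumerate(stages):
            # A stage starts once its input is ready and its engine is free.
            start = max(ready, free_at[s])
            ready = start + cost
            free_at[s] = ready
    return free_at[-1]


if __name__ == "__main__":
    print(serial_time(4.0, 8.0, 4.0))       # 16.0
    print(streamed_time(4.0, 8.0, 4.0, 4))  # 10.0
```

Under these assumptions, the streamed total approaches the cost of the slowest stage as the number of streams grows. On real hardware the gain would be smaller, since per-stream overhead and limited copy engines are ignored here.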
