A GPGPU PROGRAMMING FRAMEWORK BASED ON A SHARED-MEMORY MODEL

doi:10.2316/Journal.211.2013.3.211-1053

Kazuhiko Ohno, Dai Michiura, Masaki Matsumoto, Takahiro Sasaki, and Toshio Kondo

References

[1] J.D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A.E. Lefohn, and T.J. Purcell, A survey of general-purpose computation on graphics hardware, Computer Graphics Forum, 26 (1), 2007, 80–113.
[2] Gpgpu.org. http://www.gpgpu.org/.
[3] CUDA Zone. http://developer.nvidia.com/category/zone/cuda-zone.
[4] OpenCL. http://www.khronos.org/opencl/.
[5] NVIDIA Corporation. NVIDIA’s Next Generation CUDA Compute Architecture: Fermi, 1.1 edition, 2009.
[6] NVIDIA Corporation. NVIDIA CUDA C Programming Guide, 4.2 edition, April 2012.
[7] NVIDIA Corporation. CUDA C Best Practices Guide, 4.1 edition, January 2012.
[8] M. Baskaran, J. Ramanujam, and P. Sadayappan, Automatic C-to-CUDA code generation for affine programs. In Compiler construction, volume 6011 of Lecture notes in computer science (Berlin/Heidelberg: Springer, 2010) 244–263.
[9] S. Lee, S. Min, and R. Eigenmann, OpenMP to GPGPU: a compiler framework for automatic translation and optimization. SIGPLAN Notices, 44, 2009, 101–110.
[10] N. Sundaram, A. Raghunathan, and S.T. Chakradhar, A framework for efficient and scalable execution of domain-specific templates on GPUs, International Parallel and Distributed Processing Symposium, 0, 2009, 1–12.
[11] Y. Yang, P. Xiang, J. Kong, and H. Zhou, A GPGPU compiler for memory optimization and parallelism management, SIGPLAN Notices, 45, 2010, 86–97.
[12] S. Ueng, M. Lathara, S.S. Baghsorkhi, and W.W. Hwu, CUDA-Lite: reducing GPU programming complexity, in Languages and compilers for parallel computing (2008), 1–15.
[13] NVIDIA Corporation, Thrust quick start guide, March 2012.
[14] J. Protić, M. Tomašević, and V. Milutinović, Distributed shared memory: concepts and systems, IEEE Parallel and Distributed Technology, 4 (2), 1996, 63–79.
[15] S. Raina, Virtual shared memory: a survey of techniques and systems. Technical report, University of Bristol, 1992.
[16] I. Gelado et al., CUBA: an architecture for efficient CPU/co-processor data communication, Proc. 22nd Annual Int. Conf. on Supercomputing, ICS ’08, 2008, 299–308.
[17] B. Dreier, M. Zahn, and T. Ungerer, The Rthreads distributed shared memory system, Proc. 3rd Int. Conf. on Massively Parallel Computing Systems, 1998.
[18] I. Gray and N.C. Audsley, Exposing non-standard architectures to embedded software using compile-time virtualization, Proc. 2009 Int. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems, CASES ’09, 2009, 147–156.
[19] Q. Hou, K. Zhou, and B. Guo, BSGP: bulk-synchronous GPU programming, in ACM SIGGRAPH 2008 papers, SIGGRAPH ’08, 2008, 19:1–19:12.
[20] Himeno benchmark. http://accc.riken.jp/HPC_e/himenobmt_e.html.
[21] NVIDIA Corporation. NVIDIA’s next generation CUDA compute architecture: Kepler GK110, 1.0 edition, 2012.

From Journal (211) Parallel and Distributed Computing and Networks - 2013