A 64-Way SIMD Processing Architecture on an FPGA

R. Hoare, S. Tung, and K. Werger (USA)


SIMD, Architecture, Parallelism, FPGA


The architecture of an FPGA inherently allows for massive parallelism. Current FPGAs contain over one hundred thousand logic elements, over a thousand small memory banks, and over five hundred 4k-bit memory banks. These architectural features make FPGAs an ideal platform for experimenting with different types of massively parallel architectures. This paper focuses on a Single-Instruction Multiple-Data (SIMD) system in which each ALU operates on its own local memory. We present the design and performance of a simple ALU that is used to exploit the parallelism of an Altera Stratix FPGA. The performance and chip utilization of designs with 2, 4, 8, 16, 32, and 64 processing elements have been examined, and the results show significant room for scaling to even larger numbers of processors. Our experimental results show that I/O is the bottleneck in our current design: the 64-processor SIMD design uses less than 25% of the logic.
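The SIMD organization described above, with one instruction broadcast to many ALUs that each work on a private local memory, can be illustrated with a small software model. This is only a sketch under stated assumptions: the opcode names, the 16-word memory depth, and the names `ProcessingElement`, `SimdArray`, and `demo` are hypothetical and are not taken from the paper's hardware design.

```python
# Illustrative software model only. The opcode set, memory depth, and all
# class and function names are assumptions, not details from the paper.
from dataclasses import dataclass, field

MEM_WORDS = 16  # hypothetical local memory depth per processing element


@dataclass
class ProcessingElement:
    """One ALU paired with its own private local memory."""
    mem: list = field(default_factory=lambda: [0] * MEM_WORDS)

    def execute(self, op, dst, a, b):
        # Operands come only from this element's local memory.
        x, y = self.mem[a], self.mem[b]
        if op == "add":
            self.mem[dst] = x + y
        elif op == "sub":
            self.mem[dst] = x - y
        elif op == "and":
            self.mem[dst] = x & y
        else:
            raise ValueError(f"unknown opcode: {op}")


class SimdArray:
    """A set of processing elements driven by one instruction stream."""

    def __init__(self, n_pes):
        self.pes = [ProcessingElement() for _ in range(n_pes)]

    def broadcast(self, op, dst, a, b):
        # Single instruction, multiple data: every element receives the
        # same opcode and addresses but applies them to its own memory.
        for pe in self.pes:
            pe.execute(op, dst, a, b)


def demo(n_pes=64):
    array = SimdArray(n_pes)
    # Load different data into each local memory, one element at a time.
    for i, pe in enumerate(array.pes):
        pe.mem[0] = i
        pe.mem[1] = 10
    array.broadcast("add", 2, 0, 1)  # mem[2] = mem[0] + mem[1] in every PE
    return [pe.mem[2] for pe in array.pes]
```

In this model a single `broadcast` call updates all 64 local memories, while the loading loop in `demo` touches the elements one at a time. That serial loading step stands in, loosely, for the data movement that the abstract identifies as the I/O bottleneck of the current design.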
