Applying Data-Parallel and Scalar Optimizations for the Efficient Implementation of the G.729A and G.723.1 Speech Coding Standards

K. Koutsomyti, S.R. Parr, V.A. Chouliaras, and J. Nunez (UK)


Speech processing devices, Speech Coding, VoIP, Coprocessors, Embedded systems, RISC CPU.


This work quantifies the performance benefit of vectorized versions of the ITU-T G.729A and G.723.1 speech coding standards. Architecture-level experimentation with custom vector instructions indicates a reduction in the dynamic instruction count of the two workloads on the order of 51% for G.729A and 65% for G.723.1, at a vector register length of sixteen 16-bit elements. The identified vector instructions are encapsulated in a configurable vector accelerator that attaches to an open-source RISC CPU. The developed vector ISA is further extended with a number of custom scalar arithmetic instructions, which yield an additional benefit of 17% and 10%, respectively. We present a new implementation of the combined scalar-vector accelerator that maintains zero load-use latency while reducing the silicon footprint through dynamic allocation of the vector datapath to the scalar coprocessor.
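To illustrate why a vector register of sixteen 16-bit elements can cut the dynamic instruction count, the following is a minimal sketch of a fixed-point dot product, a kernel typical of speech codecs. It is not the authors' ISA: `vmul16` and `vredsum` are hypothetical stand-ins for custom vector instructions and are emulated here in plain C, and the saturating accumulate is an assumption modelled on common fixed-point codec arithmetic. In the scalar version every element costs one multiply-accumulate; in the vector version each loop iteration stands for two vector operations that cover sixteen elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLEN 16  /* elements per vector register, as stated in the abstract */

/* Saturating 32-bit add (assumed fixed-point accumulate behaviour). */
static int32_t sat_add32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Scalar reference: one multiply-accumulate per element. */
static int32_t dot_scalar(const int16_t *x, const int16_t *y, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = sat_add32(acc, (int32_t)x[i] * (int32_t)y[i]);
    return acc;
}

/* Hypothetical vector instruction: 16 lane-wise 16x16 -> 32-bit products. */
static void vmul16(const int16_t *x, const int16_t *y, int32_t prod[VLEN]) {
    for (int lane = 0; lane < VLEN; lane++)
        prod[lane] = (int32_t)x[lane] * (int32_t)y[lane];
}

/* Hypothetical vector instruction: in-order saturating reduction into acc.
 * Reducing in lane order keeps the result identical to the scalar loop. */
static int32_t vredsum(int32_t acc, const int32_t prod[VLEN]) {
    for (int lane = 0; lane < VLEN; lane++)
        acc = sat_add32(acc, prod[lane]);
    return acc;
}

/* Vectorized form: two vector operations per block of 16 elements. */
static int32_t dot_vector(const int16_t *x, const int16_t *y, size_t n) {
    assert(n % VLEN == 0);  /* sketch only: no tail handling */
    int32_t acc = 0;
    int32_t prod[VLEN];
    for (size_t i = 0; i < n; i += VLEN) {
        vmul16(x + i, y + i, prod);
        acc = vredsum(acc, prod);
    }
    return acc;
}
```

Because the reduction is performed in lane order, `dot_vector` returns the same value as `dot_scalar` even when the accumulator saturates. A real accelerator might instead reduce in a tree, in which case results could differ under saturation.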
