C2000 FFT: VCU, FPU or FixedPoint
From Texas Instruments Embedded Processors Wiki
Introduction
This page describes the different FFT implementations on a C28x.
FFT Implementations
- 32-bit Fixed-Point
- This implementation uses the on-chip 32-bit fixed-point math capabilities of the C28x fixed-point CPU. As a rule of thumb, an optimized 32-bit implementation takes ~20 cycles for each FFT butterfly.
- 16-bit Fixed-Point
- This implementation can use the C28x fixed-point CPU or the C28x with VCU enhancements (C28x+VCU).
- C28x Fixed-Point
- This implementation uses the 16-bit math capabilities of the C28x fixed-point CPU. On the C28x CPU core alone, it takes ~16 cycles for each FFT butterfly.
- C28x with VCU
- This implementation uses the 16-bit math capabilities of the C28x with VCU. The VCU provides optimized 16-bit complex math capabilities in addition to those of the fixed-point CPU. With the C28x+VCU enhancements, it takes ~5 cycles for each FFT butterfly.
- 32-bit Floating-Point
- This implementation uses the extended floating-point instruction set. It uses the 32-bit floating-point math capabilities of the C28x+FPU as well as the repeat block (RPTB) instruction. As a rule of thumb, it takes ~10 cycles for each FFT butterfly. If the implementation is on a floating-point device and 32 bits are required, then this is the preferred implementation.
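The per-butterfly figures above can be turned into a rough whole-FFT estimate. The sketch below assumes a radix-2 FFT, which has (N/2) * log2(N) butterflies for an N-point transform; the page does not state the radix, so treat that as an assumption. The function and constant names are hypothetical and are not part of any TI library. The estimate covers butterflies only and leaves out bit reversal, windowing, and magnitude or phase calculations.

```c
#include <assert.h>
#include <stdint.h>

/* Rule-of-thumb cycles per butterfly, taken from the list above. */
enum {
    CYCLES_FIXED32 = 20, /* 32-bit fixed-point on C28x */
    CYCLES_FIXED16 = 16, /* 16-bit fixed-point on C28x */
    CYCLES_VCU16   = 5,  /* 16-bit on C28x+VCU */
    CYCLES_FPU32   = 10  /* 32-bit floating-point on C28x+FPU */
};

/* Butterfly count of an n-point radix-2 FFT: (n / 2) * log2(n).
 * Assumption: n is a power of two and n >= 2. */
uint32_t fft_butterflies(uint32_t n)
{
    uint32_t stages = 0;
    uint32_t m;

    assert(n >= 2 && (n & (n - 1)) == 0);

    /* Count the stages, i.e. log2(n). */
    for (m = n; m > 1; m >>= 1) {
        stages++;
    }
    return (n / 2) * stages;
}

/* Rough cycle estimate for the butterflies only. */
uint32_t fft_cycle_estimate(uint32_t n, uint32_t cycles_per_butterfly)
{
    return fft_butterflies(n) * cycles_per_butterfly;
}
```

Under these assumptions, a 1024-point FFT has 5120 butterflies, which gives roughly 102,400 cycles for 32-bit fixed-point, 81,920 for 16-bit fixed-point, 25,600 for C28x+VCU, and 51,200 for C28x+FPU.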
Conclusions
- 16-bit Implementation
- The C28x+VCU implementation is the best option. Magnitude and phase calculations, however, perform the same as on a fixed-point device because the VCU has no enhancements for these algorithms.
- 32-bit Fixed-Point FFT Performance
- To improve the performance of a 32-bit fixed-point FFT:
- Consider using a floating-point device. The FPU can double the performance. In addition, magnitude and phase calculations are faster in floating-point than in 32-bit fixed-point math. The trade-off in resolution between a 32-bit fixed-point and a 32-bit floating-point implementation is negligible.
- If the application can tolerate a 16-bit implementation, then consider using the C28x+VCU. This is faster than a 32-bit fixed-point implementation. The VCU does not, however, have instructions to improve the performance of a magnitude or phase calculation. These operations are best done in floating-point.
- 32-bit FPU vs 16-bit VCU
- The performance difference between a 16-bit VCU and a 32-bit FPU implementation is not large. The VCU has no enhancements to improve the performance of magnitude and phase calculations.
- CLA
- While the CLA itself is not well suited for a full FFT algorithm, it could be considered for magnitude and phase calculations. This would offload these operations from the main CPU. For example, on a device like the 2806x, a floating-point FFT could be performed on the main C28x+FPU and the magnitude calculation performed on the CLA.
Abbreviations
This is a list of some of the terms and abbreviations used in this article. For a more complete list please visit C2000 Terms and Abbreviations.
- C28x
- On its own, C28x refers to the fixed-point CPU. It supports both 16-bit and 32-bit operations.
- C28x + FPU
- The C28x CPU with 32-bit floating-point extensions. Sometimes called C28x with FPU or shortened to FPU.
- C28x + VCU
- The C28x CPU with Viterbi, Complex Math and CRC extensions. Sometimes called C28x with VCU or shortened to VCU.
- C28x + FPU + VCU
- A C28x CPU with both the floating-point and VCU extensions.
- CLA
- Control Law Accelerator. Refer to:
- CPU
- Central Processing Unit. In C2000 we typically use this to refer to the main processor. For example, the C28x CPU.
- FPU
- Floating Point Unit. The FPU adds floating-point extensions to the main CPU instruction set.
- VCU
- Short for Viterbi, Complex Math and CRC Unit. The VCU adds instructions to support these operations to the main CPU instruction set.
