Using NEON and VFPv3 on Cortex-A8
The compiler supports two different options to control NEON and VFPv3.
The --float_support=VFPv3 option instructs the compiler to generate code that uses the VFPv3 coprocessor for both single and double precision floating point operations. It also enables the assembler to accept VFPv3 instructions in assembly source. Enabling VFPv3 also requires EABI mode, selected with the --abi=eabi option, because the calling convention for floating point parameters changes when VFPv3 is enabled, and that convention is supported only in EABI mode.
The --neon option instructs the compiler to automatically vectorize loops using NEON instructions. To benefit from this option, compile with --opt_level=2 or higher and optimize for performance by using the --opt_for_speed=[3-5] option.
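As an illustration of the kind of loop that automatic vectorization targets, here is a minimal sketch in C. The function names `saxpy` and `add_i32` are hypothetical and do not come from the compiler documentation. Whether the compiler actually vectorizes either loop depends on the options used and on its own analysis, so treat this as a candidate, not a guarantee.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for n single precision values.
 * 'restrict' promises that x and y do not overlap, which removes one
 * common obstacle to automatic vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical integer counterpart: out[i] = a[i] + b[i].
 * Simple counted loops with independent iterations like this one are
 * typical candidates for SIMD integer instructions. */
void add_i32(size_t n, const int *restrict a, const int *restrict b,
             int *restrict out)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Both loops have a simple trip count and no dependence between iterations. Building with --neon, --opt_level=2 or higher, and --opt_for_speed in the 3 to 5 range gives the compiler the opportunity to vectorize them.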
The TI ARM compiler supports four modes related to Cortex-A8, NEON, and VFPv3. By default, neither NEON nor VFPv3 is enabled. In addition to the default, the following three modes are supported:
- VFP enabled without NEON
  - The compiler generates VFPv3 instructions for single and double precision floating point operations.
- NEON enabled without VFP
  - In this mode the compiler generates NEON instructions for SIMD integer operations. It does not generate NEON instructions to vectorize floating point operations. Floating point NEON instructions are not allowed without VFP because an integer-only variant of NEON can be implemented. For the NEON unit to support floating point operations, the VFPv3 coprocessor must be present.
- NEON enabled and VFP enabled
  - In this mode the compiler generates a mix of NEON and VFP instructions. The NEON instructions can be either integer or floating point.
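The four modes can be summarized as a small truth table. The sketch below models it in C purely for illustration. The type `codegen_caps` and the function `caps_for_mode` are hypothetical names and are not part of the compiler or its runtime.

```c
#include <stdbool.h>

/* Instruction classes the compiler may emit in a given mode
 * (hypothetical model of the mode list, not a compiler API). */
typedef struct {
    bool vfp_scalar_fp; /* VFPv3 instructions for float/double operations */
    bool neon_integer;  /* NEON SIMD instructions for integer operations  */
    bool neon_float;    /* NEON SIMD instructions for floating point      */
} codegen_caps;

/* vfp_enabled models --float_support=VFPv3; neon_enabled models --neon. */
codegen_caps caps_for_mode(bool vfp_enabled, bool neon_enabled)
{
    codegen_caps c;
    c.vfp_scalar_fp = vfp_enabled;
    c.neon_integer  = neon_enabled;
    /* Floating point NEON is generated only when VFP is also enabled. */
    c.neon_float    = neon_enabled && vfp_enabled;
    return c;
}
```

The only combination that is not a direct copy of its inputs is `neon_float`, which reflects the rule that floating point vectorization requires both options.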
VFPv3 vs. NEON performance
A common question about the TI ARM compiler's NEON support is how to get more floating point operations executed on the NEON unit instead of the VFPv3 coprocessor. This is desirable because, on the Cortex-A8, the VFPv3 coprocessor is not pipelined but the NEON unit is. The compiler always uses VFP instructions for scalar floating point operations, even when the --neon option is used. However, the hardware can issue VFP instructions on the NEON coprocessor if the following conditions are met:
- The instruction must be a single precision data processing instruction
- The processor must be in flush-to-zero mode. In this mode the processor will treat all denormalized numbers as zero.
- The processor must be in default NaN mode. In this mode an operation that involves a NaN returns the default NaN regardless of the input NaN, whereas in full-compliance mode the returned NaN follows the rules in the ARM Architecture Reference Manual.
- The FPEXC.EX bit must be set to 0. This tells the processor that there is no additional state that must be handled by a context switch.
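The register-state conditions above can be expressed as a simple bit test. The sketch below is a portable C model that operates on plain 32-bit values. It assumes the bit positions from the ARM Architecture Reference Manual (FPSCR.FZ is bit 24, FPSCR.DN is bit 25, FPEXC.EX is bit 31), which this page does not state. The function names are hypothetical. Reading or writing the real registers on a target is toolchain specific and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the ARM Architecture Reference Manual. */
#define FPSCR_FZ (UINT32_C(1) << 24) /* flush-to-zero mode            */
#define FPSCR_DN (UINT32_C(1) << 25) /* default NaN mode              */
#define FPEXC_EX (UINT32_C(1) << 31) /* additional exceptional state  */

/* Returns true when the register state satisfies the listed conditions:
 * flush-to-zero on, default NaN on, and FPEXC.EX clear.
 * The per-instruction condition (single precision data processing)
 * is not checked here. */
bool state_allows_neon_issue(uint32_t fpscr, uint32_t fpexc)
{
    bool flush_to_zero = (fpscr & FPSCR_FZ) != 0;
    bool default_nan   = (fpscr & FPSCR_DN) != 0;
    bool ex_clear      = (fpexc & FPEXC_EX) == 0;
    return flush_to_zero && default_nan && ex_clear;
}

/* Returns the FPSCR value with both FZ and DN set, other bits unchanged. */
uint32_t fpscr_with_fz_dn(uint32_t fpscr)
{
    return fpscr | FPSCR_FZ | FPSCR_DN;
}
```

Note that flush-to-zero and default NaN modes change numerical results compared with full IEEE 754 behavior, so enabling them is a trade of compliance for speed.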