
WinCE-TIBSP Neon/VFP Considerations

From Texas Instruments Wiki

NEON Functionality in Combined BSP


  • NEON routines (in assembly form) that accelerate GDI/DDRAW operations can be found under BSP\COMMON\SRC\SOC\COMMON_TI_V1\COMMON_TI\DSS\DDGPE_NEON. A prebuilt library (omap35xx_ddgpe_neon.lib) can be found under the retail directory. These NEON routines are currently used by the display driver only (see omap_optblt.cpp).
  • Some of the routines had to be disabled (the Microsoft emulated routines are used instead) because enabling them caused some GDI CETK tests to fail. We believe this is due to boundary conditions that the GDI tests check. In real usage some of these corner cases are irrelevant, since the calling routines can handle some of the buffer allocations and similar details. Customers can re-enable these routines if need be.
  • Customers can add their own NEON routines, but they will need a way to assemble them. See the note below.

Compiler/Assembler limitation with WinCE6

  • The WinCE 6 compiler and assembler only support the ARMv4 architecture. The WinCE 6 assembler therefore does not support the NEON instructions, which are part of the ARMv7 architecture.
  • If the NEON assembly code provided in the BSP is modified, the customer needs to use either the WinCE 7 tools or the WinMo 6.5 tools to rebuild the library.

NEON and VFP Co-existence


When multiple threads use either NEON or VFP, the kernel has to save and restore the registers during context switches, when a VFP/NEON exception is raised. The OMAP3 uses VFPv3, which shares the same set of registers as NEON. See the ARM website for details.
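The register sharing mentioned above can be made concrete with a small sketch. It assumes the ARMv7 extension register layout (32 doubleword registers D0 to D31, with Q and S registers as views onto them); the function names are hypothetical and are not part of the BSP.

```c
#include <stdbool.h>

/* Assumed ARMv7 (VFPv3-D32 / NEON) layout: one bank of 32 x 64-bit registers. */
#define NUM_D_REGS 32   /* D0-D31 */
#define NUM_Q_REGS 16   /* Q0-Q15, 128 bits each (NEON view) */
#define NUM_S_REGS 32   /* S0-S31, 32 bits each (VFP single-precision view) */

/* Qn is the pair D(2n) (low half) and D(2n+1) (high half). */
int q_low_d_index(int q)  { return 2 * q; }
int q_high_d_index(int q) { return 2 * q + 1; }

/* S(2n) and S(2n+1) are the two halves of Dn, so S registers only reach D0-D15. */
int s_to_d_index(int s)   { return s / 2; }

/* True if Qn lives in D16-D31, the half of the bank beyond D0-D15. */
bool q_uses_upper_bank(int q) { return q_high_d_index(q) > 15; }
```

Under these assumptions, NEON code that touches Q8 to Q15 is using D16 to D31, while Q0 to Q7 overlap the registers that VFP code also uses.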


  • The WinCE6.0R2 (and R3) kernel only saves registers D0 to D15. Microsoft provides kernel hooks such as OEMSaveVFPCtrlRegs() and OEMRestoreVFPCtrlRegs() for saving extra VFP registers, but implementing these functions does not help: they only allow eight additional 32-bit registers to be saved, whereas NEON needs at least sixteen more registers (D16 to D31). Please refer to the details at MSDN.
  • If you are interested and have access to the Microsoft Shared Source program, look at how OEMSaveVFPCtrlRegs/OEMRestoreVFPCtrlRegs are used in %_WINCEROOT%\Private\WINCEOS\COREOS\NK\KERNEL\ARM\armtrap.s.
  • You might think you can simply increase the number of VFP registers in the %_WINCEROOT%\PUBLIC\COMMON\SDK\INC\winnt.h file. However, the kernel code would no longer match that definition unless it is rebuilt from the %_WINCEROOT%\PRIVATE source tree.
  • Even with VFP disabled, there is a corruption issue when Neon is used by multiple threads. This is a WinCE kernel related issue and we have a workaround implemented in the BSP to avoid corruption.
  • See the note on the VFP library supplied by ARM.
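The effect of saving only D0 to D15 can be illustrated with a small simulation. This is a sketch under stated assumptions, not kernel code: the register file, the ThreadFpContext structure and all function names are hypothetical, and the context switch is modelled as a plain copy of the first sixteen registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_D_REGS        32  /* assumed NEON/VFPv3 bank: D0-D31 */
#define KERNEL_SAVED_REGS 16  /* per the text: the kernel saves D0-D15 only */

/* Hypothetical per-thread save area holding only what the kernel saves. */
typedef struct { uint64_t d[KERNEL_SAVED_REGS]; } ThreadFpContext;

static uint64_t g_regs[NUM_D_REGS];  /* simulated hardware register file */

static uint64_t pattern(uint64_t tag, int i) { return (tag << 8) | (uint64_t)i; }

static void fill_regs(uint64_t tag)
{
    for (int i = 0; i < NUM_D_REGS; ++i)
        g_regs[i] = pattern(tag, i);
}

/* Model of a context switch that only covers D0-D15. */
static void context_switch(ThreadFpContext *from, const ThreadFpContext *to)
{
    memcpy(from->d, g_regs, sizeof from->d);
    memcpy(g_regs, to->d, sizeof to->d);
}

/* Thread A fills all registers, thread B runs, then A resumes.
   Returns how many of A's registers no longer hold A's values. */
int count_corrupted_after_round_trip(void)
{
    ThreadFpContext a = {{0}}, b = {{0}};
    int corrupted = 0;

    fill_regs(0xA);          /* thread A uses D0-D31 */
    context_switch(&a, &b);  /* A -> B */
    fill_regs(0xB);          /* thread B uses D0-D31 */
    context_switch(&b, &a);  /* B -> A */

    for (int i = 0; i < NUM_D_REGS; ++i)
        if (g_regs[i] != pattern(0xA, i))
            ++corrupted;
    return corrupted;
}

/* Bytes left unsaved (D16-D31) versus what eight 32-bit hook slots can hold. */
size_t unsaved_bytes(void)       { return (NUM_D_REGS - KERNEL_SAVED_REGS) * sizeof(uint64_t); }
size_t hook_capacity_bytes(void) { return 8 * sizeof(uint32_t); }
```

In this model, count_corrupted_after_round_trip() reports the sixteen registers D16 to D31, and the 128 unsaved bytes do not fit into the 32 bytes that eight 32-bit slots provide.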


  • We do not know whether a reliable solution exists, as Microsoft has no support for it in its kernel code. The Microsoft Shared Source program only allows people to look at the kernel code; no modifications are allowed.
  • The only available option with WinCE6 is to use either the ARM supplied VFPv2 library or NEON. NEON is used by the accelerated GDI functions in the display driver.
  • If you use the ARM supplied VFPv2 library, you must disable the co-proc kernel callback and allow the Microsoft kernel to save/restore the VFP context. Here is how:
In vfphandler.c, in OALVFPInitialize(), change these lines as follows:

    /* No OAL co-proc callbacks: the kernel saves/restores the VFP context itself */
    pOemGlobal->pfnInitCoProcRegs    = NULL;
    pOemGlobal->pfnSaveCoProcRegs    = NULL;
    pOemGlobal->pfnRestoreCoProcRegs = NULL;
    pOemGlobal->cbCoProcRegSize      = 0;
    pOemGlobal->fSaveCoProcReg       = FALSE;
  • To disable NEON blits in the display driver, set the relevant bit to 0 in dss.reg.
  • The TI DVSDK also uses NEON routines in certain scenarios. Make sure that the TI DVSDK is not used if NEON save/restore is disabled as suggested above.
  • Enabling the NEON co-proc callback functions for context save and restore while also using the VFPv2 library from ARM will lead to corrupted graphics or incorrect floating point results.
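The constraints in the list above can be summarised as a small consistency check. This is only an illustrative sketch: FpSetup, FpCheck and check_fp_setup() are hypothetical names that do not exist in the BSP, and the rules encode nothing beyond what the bullets state.

```c
#include <stdbool.h>

/* Hypothetical description of how a platform is configured. */
typedef struct {
    bool coproc_callbacks_enabled;  /* OAL NEON save/restore callbacks registered */
    bool uses_arm_vfpv2_library;    /* ARM supplied VFPv2 library in use */
    bool neon_blits_enabled;        /* NEON blits enabled in the display driver */
    bool uses_dvsdk;                /* TI DVSDK in use (may run NEON routines) */
} FpSetup;

typedef enum {
    FP_OK,
    FP_ERR_VFP_WITH_CALLBACKS,  /* VFPv2 library together with NEON callbacks */
    FP_ERR_NEON_WITHOUT_SAVE    /* NEON user present but NEON save/restore disabled */
} FpCheck;

FpCheck check_fp_setup(const FpSetup *s)
{
    /* VFPv2 library plus NEON callbacks: corrupted graphics or wrong FP results. */
    if (s->uses_arm_vfpv2_library && s->coproc_callbacks_enabled)
        return FP_ERR_VFP_WITH_CALLBACKS;

    /* With the callbacks disabled, nothing may use NEON: no blits, no DVSDK. */
    if (!s->coproc_callbacks_enabled && (s->neon_blits_enabled || s->uses_dvsdk))
        return FP_ERR_NEON_WITHOUT_SAVE;

    return FP_OK;
}
```

A setup with the VFPv2 library, callbacks disabled, NEON blits off and no DVSDK passes the check, as does a NEON-only setup with the callbacks enabled.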