C55x DSPLIB FAQ

From Texas Instruments Wiki

Content is no longer maintained and is being kept for reference only!


Overview

The DSP Library (DSPLIB) is a collection of high-level optimized DSP function modules for the C55x DSP platform. This source-code library includes C-callable functions (ANSI-C language compatible) for general signal processing math and vector functions that have been ported to C55x DSPs. The functions listed in the features section are specifically optimized for the C55x DSPs.

Functions

Please check the product documentation or the Manuals section for a detailed list of functions. The library includes functions in the following categories:

  • FFT
  • Filtering and convolution
  • Adaptive filtering
  • Correlation
  • Math
  • Trigonometric
  • Miscellaneous
  • Matrix

Download

Roadmap

  • Update to rev 3 of 55x core.
    • Note: the current library was originally written for rev 2.2 of the 55x core. Updating to rev 3 should yield some performance improvements. The rev 2.2 code should operate correctly on the rev 3 core.

Manuals


FAQ

Q: Can the DSPLIB implementation of FFT algorithms support 2048 or 4096 point complex FFTs?

  • The C55x DSP Library Programmer's Reference Guide says that the number of elements in the input vector must be between 8 and 1024. This is the default configuration for the FFT routines.
  • The DSPLIB FFT code already supports larger sizes. 1024 was chosen as the default based on customer requirements at the time the library was developed. The maximum size depends on the twiddle table that is built into the DSPLIB library. In the CCS folder at \C5500\dsplib\twiddle there are twiddle tables that support FFTs of up to 4096 points.
  • To support a larger size, rebuild the DSPLIB with the larger twiddle table by replacing the twiddle.asm file in the \dsplib\55x_src folder. That folder contains two files, twiddle.asm and twiddle.inc; twiddle.asm appears to be the one used by the FFT routines. Section 2.3 of the DSP Library Reference Guide describes how to rebuild the DSPLIB once twiddle.asm has been replaced in the source folder.

Q: Does the DSPLib use the FFT Coprocessor found in the 55x?

Q: Where can I learn more about the 55x HW accelerator for the FFT?

Q: How do I get a 2048 or 4096 point FFT using the HW accelerator?

  • See the HWAFFT application note (SPRABB6, tidoc:sprabb6), in particular section 8, "Computation of Large (Greater Than 1024-Point) FFTs on the VC5505 and C5505/15".

Q: Can the 55x support 32-bit FFT?

  • A: Yes, through the DSPLIB. Please see the cfft32, cifft32, rfft32, and rifft32 functions in DSPLIB.

Q: Why does the DSPLIB add function give incorrect results?

The add.asm file of DSPLIB v2.40.00 incorrectly sets circular addressing mode for the auxiliary register AR1. Circular addressing is not required for this function, and it causes AR1 to be set to zero for use as an offset pointer in a circular buffer implementation.


The routines in the DSPLIB are tested using the test routines that come with the library. The file add.asm passes these tests because the registers BSA01, which is used to compute the start address of the circular buffer, and BK03, which determines the buffer size, are both zero when the tests run. This setting effectively disables the circular "wrapping" of the addresses, so the addressing behaves as linear. If BSA01 and BK03 are non-zero, the test fails.


To correct this, change the following statement in add.asm:

ST2mask    .set  0000000000010010b

to

ST2mask    .set  0000000000010000b
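
The two mask values differ in a single bit. The C sketch below only compares the two constants from add.asm; the helper names are illustrative and not part of DSPLIB, and the reading of bit 1 as the AR1 circular-addressing flag is taken from the description above rather than verified against the assembly source.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values copied from the two add.asm statements. */
#define ST2MASK_ORIGINAL 0x0012u /* 0000000000010010b */
#define ST2MASK_FIXED    0x0010u /* 0000000000010000b */

/* Returns the bits that are set in `original` but not in `fixed`. */
uint16_t st2mask_cleared_bits(uint16_t original, uint16_t fixed)
{
    return (uint16_t)(original & (uint16_t)~fixed);
}

/* Returns 1 if bit `n` (0 = least significant) is set in `mask`. */
int bit_is_set(uint16_t mask, unsigned n)
{
    assert(n < 16u);
    return (int)((mask >> n) & 1u);
}
```

Calling st2mask_cleared_bits(ST2MASK_ORIGINAL, ST2MASK_FIXED) yields 0x0002, so the correction removes bit 1 only and leaves bit 4 set.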


This bug has been reported for C55x DSPLIB v2.40.00.

Q: How can I rebuild the DSPLIB v2.40.00 library for a newer CPU rev. 3.x?


Here is how to rebuild the DSP library package for CPU rev. 3.3 with options to choose memory model:

  1. Copy and rename Blt55xx.bat in the dsplib_2.40.00 folder. Open with a text editor.
  2. Comment out the call to c:\ti\dosrun.bat (a CCS3.3 relic). REM is the CMD comment (remark) keyword.
  3. Specify the path to the most current Code Gen Tools (CGT)
    Download Latest C55x Code Gen Tools here
  4. Add OPTIONS for C5505/15 (cpu:3.3, 5505, and 5515 are equivalent) and the desired memory model (use --ptrdiff_size=32 for the huge memory model only)
  5. Change the name of the output library to represent memory model (DSPLIB=55xdsph)
  6. Either double click the batch file or run it from Windows CMD line to see the output...
  7. Include the newly created library file in your project and test it with the DSPLIB examples


Here is how the Blt55xh.bat file should look for C5505/15(cpu:3.3) & huge mem model:

REM this file is for building DSPLIB in huge memory model on C5505/15
REM call c:\ti\dosrun.bat
set CGT_BIN=C:/Program Files/Texas Instruments/ccsv4/tools/compiler/C5500 Code Generation Tools 4.3.8/bin
set PATH=%PATH%;%CGT_BIN%
REM Equivalent OPTIONS for C5505/15, huge memory model:
set OPTIONS= -g -vcpu:3.3 --memory_model=huge --ptrdiff_size=32
REM set OPTIONS= -g -v5515 --memory_model=huge --ptrdiff_size=32
REM set OPTIONS= -g -v5505 --memory_model=huge --ptrdiff_size=32
REM Equivalent OPTIONS for C5505/15, large memory model:
REM set OPTIONS= -g -vcpu:3.3 --memory_model=large
REM set OPTIONS= -g -v5515 --memory_model=large
REM set OPTIONS= -g -v5505 --memory_model=large
REM set OPTIONS= -g -v5505 -ml
set DSPLIB=55xdsph
set SRC=55x_src
set FILES= *.asm
set EXT=asm
REM build library in SRC dir and then copy to root
del %DSPLIB%.src
del %DSPLIB%.lib
cd %SRC%
del *.obj
cl55 %OPTIONS% %FILES%
ar55 -r %DSPLIB%.src *.%EXT% ..\include\*.h
ar55 -r %DSPLIB%.lib *.obj
copy %DSPLIB%.src ..
copy %DSPLIB%.lib ..
del *.src
del *.lib
dir *.obj >> ..\junk.txt
del *.obj
cd ..

Related E2E Forum Post

Q: Is DSPLIB v2.40.00 optimized for C5515/5535 (Rev 3.0) cores?

  • A: No, it is not optimized for rev 3 cores. This is planned for DSPLib 3.0 which is tentatively planned for release in 1H2013 (as of Oct 16, 2012). This date is subject to change.

Q: I am having trouble compiling DSPlib under Windows...

  • A: This may happen because the DSPLIB files may have UNIX-style linefeeds, while the Windows utilities may need DOS-style linefeeds.
  • A: This is filed as TI bug #1485.
  • A: You can use a DOS batch file to fix linefeeds:

@echo off
cls
set DIR_PATH=%1
for /f %%i in ('cd') do set RET_DIR=%%i

for /r %DIR_PATH% %%i in (*.c; *.h; *.txt; *.asm; *.inc; *.cmd; *.bat; *.gel; *.tcf;) do (
    %%~di
    cd %%~dpi
    rem echo %%~dpi
    perl -p -e 's/\n/\r\n/' <%%~nxi> %%~nxi_temp
    if exist %%~nxi_temp del %%~nxi
    rem echo %%~nxi
    rem echo %%~nxi_temp
    if exist %%~nxi_temp ren %%~nxi_temp %%~nxi
)

cd %RET_DIR%
pause
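
For readers without Perl, the same conversion can be sketched in C. This is an illustrative alternative, not part of DSPLIB or the batch file above; the function name lf_to_crlf is hypothetical. Unlike the Perl one-liner, it leaves a line feed unchanged when it is already preceded by a carriage return, so a file that already has DOS linefeeds is not altered.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copies `len` bytes from `src` to `dst`, writing CR LF for every bare LF.
 * An LF already preceded by CR is copied unchanged.
 * `dst` must have room for 2 * len bytes.
 * Returns the number of bytes written to `dst`.
 */
size_t lf_to_crlf(const char *src, size_t len, char *dst)
{
    size_t out = 0;
    char prev = '\0';

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (c == '\n' && prev != '\r') {
            dst[out++] = '\r'; /* insert the missing carriage return */
        }
        dst[out++] = c;
        prev = c;
    }
    return out;
}
```

When using such a routine on Windows, the files should be opened in binary mode ("rb" and "wb") so that the C runtime does not translate line endings a second time.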

Q: Where can I get benchmarks for the algorithms?

v2.40.00 Errata

Rather than post the full code diffs, these notes are intended as brief documentation of errors that have been confirmed in the latest release of DSPLIB.

add alters 3 status registers without save and restore

The vector add() routine modifies ST0_55, ST1_55, and ST2_55 without saving the original values. Although an attempt is made to reverse the flag changes in one or two status registers, there's no guarantee that the original settings are restored. This could be fixed by pushing the original status register values on the stack.

mul32 alters T3 and AR5 without saving

The C language ABI for the C5000 requires that the callee preserve T2, T3, AR5, AR6 and AR7. mul32() alters T3 and AR5 without saving and restoring the initial value. PSH T3, PSHBOTH XAR5 and POPBOTH XAR5, POP T3 are all that need to be added to fix this.

sqrt_16 has off-by-one error and is coded inefficiently

The vector square root routine incorrectly rounds the results by adding 1, such that the square root of zero is one, and all results are off by one as well. This can be repaired by changing the bit shift count in the final instruction. In addition, the code does not make use of the ROUND instruction or the Round modifier for any other instruction, and thus the code is more complex than it needs to be. If optimizations are made, then no stack space is needed for intermediate values.

unpack writes outside the stack 50% of the time

The unpack() routine that is part of the rfft() macro uses AADD #-2, SP and MOV pair(T2), dbl(*SP(#0)) to save T2 and T3, followed by MOV dbl(*SP(#0)), pair(T2) and AADD #2, SP to restore them. However, if the stack pointer happens to be odd, then T3 will be written outside the stack, where it can be trashed by any hardware interrupt. The single instruction PSH T2, T3 accomplishes the required task in half the cycles and one fifth the code size, including the matching POP T2, T3.
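
The odd-address case can be modelled with a small C sketch. It assumes the C55x convention that a dbl() access stores its first word at the given address and its second word at the next address when the address is even, or at the preceding address when it is odd; that assumption, and all function names, are illustrative and should be checked against the CPU reference guide.

```c
#include <assert.h>
#include <stdint.h>

/* Word address that receives the second word (T3) of a dbl() store to `addr`.
 * Assumed rule: addr + 1 for an even address, addr - 1 for an odd one. */
uint32_t dbl_second_word_addr(uint32_t addr)
{
    assert(addr < UINT32_MAX);
    return (addr & 1u) ? addr - 1u : addr + 1u;
}

/* Returns 1 if the second word lands inside the two words reserved by
 * AADD #-2, SP, that is, at sp or sp + 1; returns 0 otherwise. */
int pair_store_stays_in_frame(uint32_t sp_after_aadd)
{
    uint32_t second = dbl_second_word_addr(sp_after_aadd);
    return second >= sp_after_aadd && second <= sp_after_aadd + 1u;
}
```

With an even stack pointer such as 0x1000 the second word goes to 0x1001, inside the reserved pair. With an odd stack pointer such as 0x1001 it goes to 0x1000, one word below the stack pointer, which is the location an interrupt would overwrite.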

twiddle table is not aligned

The documentation for cfft() states that the twiddle table must be aligned on a 32-bit word boundary, and yet the twiddle.asm source does nothing to ensure that this happens. Adding a single line with " .align 1024" between the .sect and twiddle: label makes everything kosher.

Support

Related