This article is part of a collection of articles describing the C6Accel included in DaVinci/OMAPL/OMAP3 devices.
This wiki page answers queries related to C6EZAccel, also known as C6Accel.
- 1 C6Accel FAQ
- 2 C6Accel General Questions
- 3 Installation
- 4 Application
- 4.1 What DSP kernels can I access through C6Accel
- 4.2 How can I access floating point kernels in C6Accel on a SoC device with a floating point DSP?
- 4.3 I am not aware of the iUniversal interface. How do I use C6Accel without having to learn the iUniversal interface
- 4.4 How do I extract the maximum performance out of C6Accel
- 4.5 What kind of performance can I expect from C6Accel
- 4.6 What is the inter-processor overhead involved in C6Accel ?
- 4.7 How can I add my own kernel to C6Accel and consume it in my ARM application
- 4.8 What debugging options do I have while adding functions to C6Accel
- 4.9 Are there any restrictions on the types of functions I can add to C6Accel
- 5 Build Errors
This page contains frequently asked questions (FAQs) about C6Accel and its usage, along with the corresponding answers.
C6Accel General Questions
What does C6Accel do?
C6Accel is a library of key DSP software kernels packaged as an XDAIS algorithm that can be invoked from the ARM side using simple API calls. The purpose of C6Accel is to give the ARM user the compute power of the DSP for computationally intensive tasks such as color space conversion, filtering, or image/signal processing algorithms. The library kernels wrapped in the algorithm are optimized for performance on the DSP and allow the ARM user to use the DSP as an accelerator for their application. By using these routines, the ARM application developer can achieve execution speeds considerably faster than equivalent C code written on the ARM. In addition, by providing ready-to-use DSP kernels, C6Accel can significantly shorten ARM application development time.
How can I download it?
- The latest version of the tool can be downloaded free of cost from here.
- C6Accel is also part of the software development kit for OMAPL and OMAP3, which can be downloaded from here. The advantage of downloading C6Accel with the DVSDK is that the kit provides all the dependencies of the package in a single download, so you avoid downloading individual dependencies from different locations.
What are the dependencies?
- Codec Engine: codec_engine_2_21 and higher
- XDCTools: XDCtools_3_10_03 and higher
- XDAIS: xdais_6_23
- LINUXUTILS: linuxutils_2_23_01
- CODE_GEN: cg6x_6_0_21
- DSP BIOS
- CodeSourcery tools
- Framework Components
- EDMA LLD
All these dependencies can be found in the DVSDK package for the target platform.
What platforms does this support?
C6Accel targets two-core heterogeneous SoC processors from Texas Instruments, specifically ARM+DSP devices which run Linux on the ARM. C6Accel has been tested on OMAP3530/DM3730, OMAPL138 and DM6467. The current C6Accel package supports the OMAP3 and OMAPL families of devices. Support for using C6Accel on DM6467 is described in the wiki article Using C6Accel on DM6467 with DVSDK 3.x.
The C6Accel algorithm is platform independent and can therefore also be used on the OMAPL137 and DM6446 devices, although testing on these platforms is still in progress.
Is there a test application that I can use as a starting point?
Yes, C6Accel comes with a test application that tests all the kernels in C6Accel for correctness and benchmarks the kernel API calls to provide performance estimates to a developer. The test application code can be found along the path $(C6ACCEL_INSTALL_DIR)/soc/app. Build the package as described in the Building C6Accel package section of the Reference guide to verify the test application on your device.
Is C6Accel an open source tool?
C6Accel is not licensed as an open source tool. It is shipped with the TI TSPA license, which allows for easy redistribution. The C6Accel algorithm that enables execution of code on the DSP is available in complete source and can be modified as per application requirements. However, the optimized DSP software libraries that are linked to the algorithm are provided only in binary form in the package. The source for these individual libraries is available for free and can be downloaded from ti.com.
- C64x+DSPLIB DSP Signal Processing Library version 2.10
- IMGLIB-2 DSP Image/Video Processing Library Release version 2.0.1
- IQMath Release version 2.1.3
- C67xFastRTS Fast Run Time Support Library Release: Version 1.03
- C674x DSPLIB Digital Signal Processing Library Release: Version 1.2
- C64P_LIBPLUS: This library contains kernels that are currently not part of the standard offerings and is included to provide additional functionality with C6Accel. (Unversioned library, not available in source.)
Note: Users who are interested in obtaining the source for this library can make their request on TI's e2e forums.
Installation
What OS do I need?
How do I install and run the package?
On a Windows host:
- Download the setup file C6Accel-1.0-Setup.exe from the target content page.
- Install the package by running the setup. The setup installs the package along the default path C:\Program files\C6Accel; however, it is recommended to install the package in the SDK_INSTALL_DIR. If you choose to install the package along any other path, you are expected to set the appropriate paths in the xdcpaths.dat file of the SDK before rebuilding the SDK so that the build tools can find the C6Accel codec.
On a Linux host:
1. Download the setup file C6Accel-1.0-Linux-x86-Install from the target content page into a temporary folder (e.g. /tmp).
2. To install the C6Accel package using the Linux installer, log in using a user account. The user account must have execute permission for all the installation files. To set it, switch user to “root” on the host Linux workstation, change directories to the temporary location where you downloaded the bin files, and change the execute permissions. Once you have changed the execute permissions, you can go back to a normal user.
host $ su root
host $ cd /tmp
host $ chmod +x *.bin
host $ exit
3. Execute the C6Accel installer that you previously downloaded from the SDK target content download page.
For example:
host $ cd /tmp
host $ ./C6Accel-1.0-Linux-x86-Install
4. Set the paths to the dependencies in the Rules.make file in the package. Ensure that C6ACCEL_INSTALL_DIR is set to the path where the C6Accel package is installed:
C6ACCEL_INSTALL_DIR = USER_DEF_PATH
5. Build the package by running the make command in the root directory of the package.
Where can I find the documentation?
Documentation for C6Accel can be found in $(C6ACCEL_INSTALL_DIR)/docs. The most recent version of the document is available here.
Application
What DSP kernels can I access through C6Accel?
For the list of kernels that can be accessed through C6Accel, refer to the C6Accel wrapper API reference guides.
How can I access floating point kernels in C6Accel on a SoC device with a floating point DSP?
C6Accel contains fixed point as well as floating point kernels. However, in order to maintain portability between C64x+ and C674x devices, the C6Accel package is configured by default to provide access to just the fixed point kernel library. To access the floating point kernels on C674x devices, the C6Accel package module provides a FLOAT Boolean flag that needs to be set. By default, this FLOAT flag is set to false.
To access the floating point kernels, add the following script to the .cfg file of the application.
var C6ACCEL = xdc.useModule('ti.c6accel.ce.C6ACCEL');
C6ACCEL.alg.FLOAT = true;
When the application builds the codec server with this FLOAT flag set, the C6Accel package is directed to pick the .l674 library, which contains the floating point kernels in addition to the fixed point kernels. A build error will be seen on a C64x+ device if the application configuration tries to set this flag.
Note: The C6ACCEL.alg.FLOAT is a legacy feature and applies only to C6Accel version 1.01.00.06 and earlier.
I am not aware of the iUniversal interface. How do I use C6Accel without having to learn the iUniversal interface?
The C6Accel package contains a wrapper library which is designed to abstract iUniversal and C6Accel design considerations from an ARM application developer. It provides an interface that appears like a simple function call within an application. The C6Accel wrapper library is available in source as well as in object form to be linked to the application. Refer to the package directory /soc/c6accelw to view the wrapper library source.
How do I extract the maximum performance out of C6Accel?
Accessing multiple kernels in C6Accel through the C6Accel wrapper API library creates multiple passes through the Codec Engine, so the Codec Engine overhead is added to the application every time a kernel is called. C6Accel supports chaining of API calls, whereby up to 16 kernels (depending on the number of input and output parameters) can be invoked using a single iUniversal API call. Accessing multiple kernels using a single API call spreads the Codec Engine overhead across all the chained kernels, thereby allowing the application to draw maximum performance out of C6Accel. However, to do this the user is expected to understand the design of C6Accel and the iUniversal interface of the Codec Engine. For chaining of APIs refer to Chaining_calls_to_kernels_in_a_single_API_call_to_C6ACCEL_codec
Note: Chaining of APIs should be used only when the kernels need to be called in succession within the application.
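The effect of chaining can be sketched with a simple cost model. The following is only an illustration and not part of the C6Accel API: the function names are hypothetical, and the model assumes a single fixed per-call overhead that is paid once per API call, ignoring any per-buffer costs.

```c
#include <stddef.h>

/* Hypothetical model, not a C6Accel function.
 * Total time when each kernel is issued as its own API call:
 * every call pays the fixed per-call overhead. */
double unchained_total_us(const double *kernel_us, size_t n, double overhead_us)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += kernel_us[i] + overhead_us;
    return total;
}

/* Hypothetical model, not a C6Accel function.
 * Total time when all kernels are chained into one API call:
 * the fixed overhead is paid only once. */
double chained_total_us(const double *kernel_us, size_t n, double overhead_us)
{
    double total = (n > 0) ? overhead_us : 0.0;
    for (size_t i = 0; i < n; ++i)
        total += kernel_us[i];
    return total;
}
```

With made-up numbers, four kernels of 50 microseconds each and a 100 microsecond per-call overhead, the model gives 600 microseconds unchained against 300 microseconds chained. Real figures depend on the device and buffer sizes and should be taken from measurements.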
What kind of performance can I expect from C6Accel?
These are the results from the benchmarking tests for kernels in C6Accel.
Benchmarking results for C6Accel functions called synchronously from ARM:
- File:Benchmarks sync dm6467.pdf
- File:Benchmarks sync omap3530.pdf
- File:Benchmarks sync dm3730.pdf
- Benchmarking on OMAPL137
- File:Benchmarks sync omapl138.pdf
- File:Benchmarks sync ti816x.pdf
Due to the inter-processor overhead involved in calling the DSP from the ARM, C6Accel generally performs better as the size of the data being processed increases. In order to interpret the benchmarking data accurately, please read the following section on the overheads involved in C6Accel.
Note: Benchmarks assume the scaling governor is set to userspace (with the device set to its maximum frequency) or to performance. [For applicable devices only]
What is the inter-processor overhead involved in C6Accel?
In a dual-core processor environment like OMAP3 and OMAPL, processing a buffer of data from the ARM on the DSP requires:
- Address translation from ARM-side virtual to DSP-side physical (fast)
- Transitioning execution from ARM to DSP-side processing (fast, < 100 microseconds)
- ARM side cache invalidation of buffer passed from the ARM to the DSP.
- Invalidating cache of the buffers so the DSP sees the right data (slow, especially with very large data buffers)
- Activating, processing, deactivating the C6Accel algorithm on the DSP (typically fast, but variable based on functionality invoked)
- Writing back the cache of the buffers so the ARM sees the right data (slow, especially with very large data buffers)
- Transitioning execution from DSP to ARM-side processing (fast, < 100 microseconds)
- Address translation back from DSP-side physical to ARM-side virtual (fast)
(More details in the Codec Engine Overhead article.)
Due to this overhead, C6Accel will not be able to provide satisfactory performance on small data buffer sizes. Here is an analysis of some key functions that compares C6Accel performance with that of native ARM running equivalent C code.
A general observation from this analysis is that any function that takes about 1 ms or more on the ARM performs much better on the DSP through C6Accel.
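The trade-off described above can be estimated with a rough break-even model. This is a sketch under stated assumptions, not a measured or official formula: the struct, the function names and the idea of a linear per-KiB cache cost are all hypothetical simplifications of the steps listed earlier.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cost parameters; real values must be measured on the target. */
typedef struct {
    double fixed_us;         /* address translation plus ARM<->DSP transitions */
    double cache_us_per_kib; /* cache invalidate and writeback cost per KiB of buffer */
} offload_cost;

/* Estimated time to run a kernel on the DSP, including the overhead
 * of getting the buffer there and back. */
double offload_time_us(const offload_cost *c, size_t buf_bytes, double dsp_us)
{
    return c->fixed_us
         + c->cache_us_per_kib * ((double)buf_bytes / 1024.0)
         + dsp_us;
}

/* True when offloading is expected to beat running the same work on the ARM. */
bool offload_is_faster(const offload_cost *c, size_t buf_bytes,
                       double arm_us, double dsp_us)
{
    return offload_time_us(c, buf_bytes, dsp_us) < arm_us;
}
```

For example, with an assumed 200 microsecond fixed cost and 2 microseconds per KiB, a 4 KiB buffer that needs 50 microseconds on the DSP costs about 258 microseconds in total. In this model offloading does not pay off if the ARM can do the same work in 150 microseconds, but it does if the ARM needs 1 ms, which is in line with the general observation above.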
How can I add my own kernel to C6Accel and consume it in my ARM application
What debugging options do I have while adding functions to C6Accel?
Since C6Accel uses the Codec Engine to run algorithms on the DSP, we can take advantage of the CE and DSPLink debug options to profile any C6Accel based application.
- Users can turn on the Codec Engine trace to profile their test application using the CE_DEBUG feature.
- Users can debug the DSP side of their application using the CE_DSPDEBUG option mentioned in the wiki article on Debugging DSP side of codec engine application.
Are there any restrictions on the types of functions I can add to C6Accel?
C6Accel is an eXpressDSP-compliant (XDAIS) algorithm, which allows it to execute via the Codec Engine alongside other algorithms such as codecs. Some general guidelines that users must keep in mind while adding functions to C6Accel are listed below. Following these guidelines will ensure that the modified C6Accel algorithm is XDAIS compliant.
- Algorithms must not allocate memory.
- Algorithms are not allowed to perform I/O.
- Algorithms must be reentrant and must, therefore, only reference reentrant functions.
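As an illustration of these guidelines, here is a minimal sketch of a kernel written in that style. The function example_movavg3 and its signature are hypothetical and are not part of C6Accel or its libraries. It only shows the pattern: the caller supplies every buffer, no I/O is performed, and no static or global state is used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: 3-tap moving average over 16-bit samples.
 * - No memory allocation: input and output buffers come from the caller.
 * - No I/O: errors are reported through the return value only.
 * - Reentrant: all working state lives in locals, nothing static or global.
 * Writes n - 2 samples to out. Returns 0 on success, -1 on bad arguments. */
int example_movavg3(const int16_t *in, int16_t *out, size_t n)
{
    if (in == NULL || out == NULL || n < 3)
        return -1;

    for (size_t i = 0; i + 2 < n; ++i) {
        /* Accumulate in 32 bits so the sum of three samples cannot overflow. */
        int32_t acc = (int32_t)in[i] + in[i + 1] + in[i + 2];
        out[i] = (int16_t)(acc / 3);
    }
    return 0;
}
```

Because the function touches only its arguments and locals, it can be called safely from more than one context at a time, which is the property the reentrancy guideline is asking for.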
Build Errors
Most build errors result from not setting the right paths in the Rules.make file. Ensure that all the paths for the package dependencies are set accurately. Also ensure that the DSPLink and CMEM modules have been built prior to building the package.