C6EZAccel ARM user Documentation

From Texas Instruments Embedded Processors Wiki


^ Up to main C6Accel Main Page

This article is part of a collection of articles describing the C6EZAccel included in DaVinci/OMAPL/OMAP3 devices. To navigate to the main page for the C6EZAccel reference guide, use the link above.


Using C6EZAccel in an ARM application

User classification based on Level of control

The C6EZAccel package is designed for two classes of ARM SoC users who want different levels of control.

The user experience of using C6EZAccel is the same for both classes of users and is described in the section Interfacing the C6EZAccel with the application.

Interfacing the C6EZAccel with the application.

The initial steps of interfacing C6EZAccel with the application vary with the user's experience with the tools offered by TI.


Initial configuration steps for users with no experience with XDCtools and Codec Engine

For users with no experience with XDC-based builds, the C6EZAccel package provides a prebuilt configuration that is easy to interface with. This is the simplest way of interfacing C6EZAccel to an ARM application. Prerequisite: Build the C6EZAccel package. Then add the following to the application's makefile:

# Location of Prebuilt configuration files
XDC_CFG		= $(C6ACCEL_INSTALL_DIR)/soc/app/c6accel_app_config
 
# Compiler options to be added to your compile step
XDC_CFLAGS	= $(XDC_CFG)/compiler.opt
 
# Linker file to be linked to your linker
XDC_LFILE	=  $(XDC_CFG)/linker.cmd 
 
# C6EZAccel ARM side Library
C6ACCEL_LIB += $(C6ACCEL_INSTALL_DIR)/soc/c6accelw/lib/c6accelw_$(PLATFORM).a470MV
# compiler.opt is a file of options, so expand its contents into CFLAGS
CFLAGS += $(shell cat $(XDC_CFLAGS))
$(TARGET):	$(OBJFILES) $(C6ACCEL_LIB) $(XDC_LFILE)

Note: For users of this mechanism, the alg name is c6accel and the engine name is the name of the platform (omapl138 or omap3530). This information is used in the application code to invoke C6accel_create().

After completing the initial configuration steps, go to the section Common Steps for all users of C6EZAccel for the steps needed to use C6EZAccel in application code.

Initial configuration steps for users familiar with XDCtools and Codec Engine

A codec server combines the components necessary to house the codecs (e.g. DSP/BIOS, Framework Components, link drivers, codecs, Codec Engine, etc.) and generates an executable. For any user-defined application, the user needs to integrate C6EZAccel into the user-defined codec server. For details of integrating C6EZAccel into the codec server, refer to the Codec Engine Server Integrator's Guide.

The C6EZAccel package comes with a prebuilt unit server, built specifically for the platform, that is used by the sample test application found under soc/app.

To integrate a codec server and invoke a codec the application must contain

The configuration file (.cfg) uses createFromServer() to integrate the specific server that needs to be invoked in the application:

var demoEngine = Engine.createFromServer(
    "omap3530",
    "./omap3530.x64P",
    "ti.c6accel_unitservers.omap3530"
    );

# C6EZAccel ARM side Library
C6ACCEL_LIB += $(C6ACCEL_INSTALL_DIR)/soc/c6accelw/lib/c6accelw_$(PLATFORM).a470MV

Link this library in the linker step to invoke the C6EZAccel ARM-side APIs.

# Example XDC build step in the makefile
$(XDC_LFILE) $(XDC_CFLAGS):	$(XDC_CFGFILE)
	@echo
	@echo ======== Building $(TARGET) ========
	@echo Configuring application using $<
	@echo
	$(VERBOSE) $(CONFIGURO) -o $(XDC_CFG) -t $(XDC_TARGET) -p $(PLATFORM_XDC) -b $(CONFIG_BLD) $(XDC_CFGFILE)

Once the engine is configured in the application it can be invoked from the application.

Note: The Makefile for the application must include the C6EZAccel install directory $(C6ACCEL_INSTALL_DIR).

Common Steps for all users of C6EZAccel.

1. Include Codec Engine header files

#include <ti/sdo/ce/Engine.h>
#include <ti/sdo/ce/CERuntime.h>
#include <ti/sdo/ce/osal/Memory.h>

Note: If the application uses DMAI, the Memory include can be replaced by a DMAI include. These includes are necessary because C6EZAccel, like all DSP codecs, expects the application developer to allocate contiguous memory for the parameters and for the input and output buffers/vectors being passed.

2. Include C6EZAccel application codec header file iC6accel_ti.h or the C6EZAccel wrapper API file c6accelw.h

#include "../c6accelw/c6accelw.h"

OR

#include "ti/c6accel/iC6accel_ti.h"

Note: The codec package path must be set as an include path.


3. Declare a C6accel Handle

C6accel_Handle hC6accel = NULL;

4. Define the engine name (same as configured in the .cfg file) and the alg name (default: c6accel). In this case:

#define ENGINENAME "omap3530"
#define ALGNAME "c6accel"

5. Before creating a C6EZAccel instance, the user must ensure that the Codec Engine runtime has been initialized. This can be done by calling the Codec Engine API CERuntime_init() before any of the C6EZAccel APIs are used in the code.

CERuntime_init();

6. Once Codec Engine is initialized, the user can call C6accel_create(), which returns the C6accel handle. The engine and alg names are the ones defined in step 4.

hC6accel = C6accel_create(ENGINENAME, NULL, ALGNAME, NULL);

Refer to C6accel_create() for details of the create API call.

7. Once C6accel_create() has been invoked successfully, basic users can call kernels in the codec using the API calls shown in the section C6EZAccel ARM side wrapper library Reference, and advanced users can use the chaining feature explained in the section Chaining calls to kernels in a single API call to C6Accel.

Note: All input and output buffer parameters used in these API calls need to be contiguous in memory.

8. Once the application has finished using C6EZAccel, the user is expected to tear down the codec using the C6accel_delete() API.

Note: C6EZAccel can work with heap as well as pool CMEM memory allocations.
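
The eight steps above can be sketched end to end as follows. This is a minimal sketch, not code from the package. So that it compiles off-target, the TI headers are replaced by small stand-ins: CERuntime_init, C6accel_create and C6accel_delete keep the names used in this article, but their argument types are simplified and their bodies only imitate the documented NULL-on-failure behavior. run_session is a hypothetical helper introduced for illustration. On a real target the stand-ins would be dropped in favor of the includes from steps 1 and 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- Stand-ins (ASSUMPTION): on a real target use the Codec Engine
 * headers and c6accelw.h instead. Types are simplified. ---- */
typedef struct C6accel_Object { int ready; } C6accel_Object;
typedef C6accel_Object *C6accel_Handle;

static int ce_initialized = 0;

static void CERuntime_init(void) { ce_initialized = 1; }

static C6accel_Handle C6accel_create(const char *engName, void *hEngine,
                                     const char *algName, void *hUni)
{
    C6accel_Handle h;

    /* Imitates the article: create fails without an engine (name or
     * handle) or without a way to obtain the universal handle. */
    if (!ce_initialized)
        return NULL;
    if (hEngine == NULL && engName == NULL)
        return NULL;
    if (hUni == NULL && algName == NULL)
        return NULL;

    h = malloc(sizeof *h);
    if (h != NULL)
        h->ready = 1;
    return h;
}

static int C6accel_delete(C6accel_Handle h)
{
    assert(h != NULL);
    free(h);
    return 0;
}
/* ---- End of stand-ins ---- */

#define ENGINENAME "omap3530"   /* step 4: must match the .cfg file */
#define ALGNAME    "c6accel"    /* step 4: default alg name */

/* Hypothetical helper: returns 0 on success, -1 if no handle was created. */
int run_session(const char *engineName, const char *algName)
{
    C6accel_Handle hC6accel = NULL;                              /* step 3 */

    CERuntime_init();                                            /* step 5 */
    hC6accel = C6accel_create(engineName, NULL, algName, NULL);  /* step 6 */
    if (hC6accel == NULL)
        return -1;

    /* step 7: wrapper kernel calls go here, on contiguous buffers */

    return C6accel_delete(hC6accel);                             /* step 8 */
}
```

Calling run_session(ENGINENAME, ALGNAME) walks through the whole sequence. Checking the handle against NULL before step 7 matters because every later call depends on it.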

Accessing floating point kernels in C6Accel 1.01.00.06 or earlier

C6EZAccel contains fixed point as well as floating point kernels. However, in order to maintain portability between C64+ and C674x devices, the C6EZAccel package is configured by default to provide access to just the fixed point kernel library. To access the floating point kernels on C674x devices, the C6EZAccel package module provides a FLOAT Boolean flag that must be set. By default this FLOAT flag is set to false.

In order to access the floating point kernels, add the following script to the .cfg file of the codec/unit server.

   var C6ACCEL = xdc.useModule('ti.c6accel.ce.C6ACCEL');
   C6ACCEL.alg.FLOAT=true;

For example, see the codec.cfg file in the omapl138 unit server included in the C6Accel package at $(C6ACCEL_INSTALL_DIR)/soc/packages/ti/c6accel_unitservers/omapl138.

When the application builds the codec server with this FLOAT flag set to true, the C6Accel package is directed to link in the .l674 library, which contains the floating point kernels in addition to the fixed point kernels. A build error will be seen on a C64+ device if the server configuration tries to set this flag.

Example Applications using C6EZAccel

C6EZAccel ARM side wrapper library Reference

The C6EZAccel wrapper library is designed to abstract away IUNIVERSAL and C6EZAccel design considerations and to provide an interface that appears like a simple function call within an application. The C6EZAccel wrapper library is available in source form as well as an object library to be linked into the application. Refer to the package directory /soc/c6accelw to view the wrapper library source.

Using C6EZAccel wrapper library in the application

In order to use the C6EZAccel wrapper APIs in the application, add the appropriate library from $(C6ACCEL_INSTALL_DIR)/c6accelw/lib to the application's Makefile.

For example, on the OMAP3530 platform add the following to the Makefile to include the C6EZAccel wrapper library in the application.

OBJFILES += $(C6ACCEL_INSTALL_DIR)/c6accelw/lib/c6accelw_omap3530.a470MV

Common wrapper calls and definitions

C6accel_Handle

C6accel_Handle is a handle to the C6Accel object. The C6Accel object is defined as

typedef struct C6accel_Object {
    Engine_Handle hEngine;
    UNIVERSAL_Handle hUni;
    E_CALL_TYPE callType;
} C6accel_Object;

The C6Accel object carries the Engine handle and the IUNIVERSAL handle required for the current instance. E_CALL_TYPE is a custom data type that can take the values ASYNC or SYNC, depending on whether the application needs to make asynchronous or synchronous calls to the DSP.

C6accel_create()

C6accel_Handle C6accel_create(String engName, Engine_Handle hEngine,String algName, UNIVERSAL_Handle hUniversal);

Arguments:

Return: The API returns a C6Accel handle if invoked successfully, or NULL if the create call failed.

Description: This API returns a C6Accel handle from the Engine handle and universal handle passed from the application.

Note: By default the C6Accel handle is configured to make synchronous calls to the DSP. To enable asynchronous calls to the DSP, refer to the section Making Asynchronous Calls to DSP kernels in C6EZAccel.

Details:

Case   | Engine Handle                 | Universal Handle  | Action
Case 1 | NULL                          | NULL              | Creates engine handle from engine name and universal handle from alg name, and returns C6Accel handle
Case 2 | Passed from app               | NULL              | Creates universal handle, passes existing engine handle to C6Accel object, and returns C6Accel handle
Case 3 | Passed from app               | Passed from app   | Passes engine and universal handles to C6Accel object and returns C6Accel handle
Case 4 | NULL (no engine name passed)  | X                 | Returns NULL
Case 5 | Passed from app               | NULL (no alg name)| Returns NULL


X: Don't care
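
The table can also be read as a decision procedure. The sketch below encodes it as a pure function so the five rows can be checked off-target. It is an illustration only, not the wrapper's source. CreatePlan, classify_create_args and check_documented_cases are hypothetical names, and argument combinations that the table does not list are reported as PLAN_NOT_IN_TABLE rather than guessed.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    PLAN_CREATE_BOTH,       /* Case 1 */
    PLAN_CREATE_UNIVERSAL,  /* Case 2 */
    PLAN_USE_BOTH,          /* Case 3 */
    PLAN_RETURN_NULL,       /* Cases 4 and 5 */
    PLAN_NOT_IN_TABLE       /* combination the table does not describe */
} CreatePlan;

/* Map the four create arguments onto the documented cases. */
CreatePlan classify_create_args(const char *engName, const void *hEngine,
                                const char *algName, const void *hUniversal)
{
    if (hEngine == NULL) {
        if (engName == NULL)
            return PLAN_RETURN_NULL;      /* Case 4: universal is "don't care" */
        if (hUniversal == NULL && algName != NULL)
            return PLAN_CREATE_BOTH;      /* Case 1 */
        return PLAN_NOT_IN_TABLE;
    }
    if (hUniversal != NULL)
        return PLAN_USE_BOTH;             /* Case 3 */
    if (algName == NULL)
        return PLAN_RETURN_NULL;          /* Case 5 */
    return PLAN_CREATE_UNIVERSAL;         /* Case 2 */
}

/* The five documented rows written as checks. */
void check_documented_cases(void)
{
    int e = 0, u = 0;  /* any non-NULL addresses stand in for real handles */

    assert(classify_create_args("omap3530", NULL, "c6accel", NULL) == PLAN_CREATE_BOTH);
    assert(classify_create_args("omap3530", &e, "c6accel", NULL) == PLAN_CREATE_UNIVERSAL);
    assert(classify_create_args("omap3530", &e, "c6accel", &u) == PLAN_USE_BOTH);
    assert(classify_create_args(NULL, NULL, "c6accel", &u) == PLAN_RETURN_NULL);
    assert(classify_create_args("omap3530", &e, NULL, NULL) == PLAN_RETURN_NULL);
}
```

An application would normally not need such a function. It is useful mainly as a checklist when deciding which arguments to pass as NULL.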

C6accel_delete()

int C6accel_delete(C6accel_Handle handle);

Arguments:

Description: This API tears down the C6accel instance by closing the codec engine and IUNIVERSAL interface using the C6accel Handle.

Error Codes

For all C6EZAccel wrapper API kernels that invoke functionality on the DSP, the error codes returned from the API are documented below.

This error is most likely to occur only if the buffers/vectors being passed to the codec are not assigned contiguous memory. It can also occur if the application makes an asynchronous call to C6EZAccel when there is already a pending asynchronous call.

Specific fail messages

This error is likely to occur when the parameters passed do not satisfy the parameter specifications of the underlying kernel. Check the wrapper API documentation of that specific kernel to learn the range of permissible parameters.

This error is likely when the application passes a wrong function ID to the codec. It is unlikely to occur when using the wrapper API calls, because the function ID is passed inside the wrapper code. If an advanced user comes across this error, verify that the function ID being passed is defined in the application codec interface header file iC6accel_ti.h.

Reference based on categories of kernels in C6EZAccel

C6EZAccel organizes kernels into seven functional categories: Digital Signal Processing, Image Processing, Math, Analytics, Medical, Audio/Speech Processing and Power/Control.

Note: The initial version of C6EZAccel provides kernels only in the Digital Signal Processing, Image Processing and Math categories.

The references for these functional categories have been furnished below:

C6Accel Signal Processing API Reference guide

C6Accel Image Processing API Reference guide

C6Accel Math API Reference guide

Making Asynchronous Calls to DSP kernels in C6EZAccel


Figure: C6Accel async.JPG

The asynchronous calling feature of C6EZAccel enables parallel processing on the ARM and the DSP. In order to switch between synchronous and asynchronous calling, C6EZAccel defines the following APIs:

int C6accel_setAsync(C6accel_Handle hC6accel)

This sets the calling mode to asynchronous.

int C6accel_setSync(C6accel_Handle hC6accel)

This sets the calling mode to synchronous.

CALL_TYPE C6accel_readCallType(C6accel_Handle hC6accel)

This returns the current calling mode set in the application.

int C6accel_waitAsyncCall(C6accel_Handle hC6accel)

Wait for Async call to complete. The result from the DSP code will only be available when the Async call completes.

ARM application can make an async call and then perform other processing until the Async call completes, thereby allowing it maximum headroom for adding new features and improving performance.
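
The calling pattern can be modelled off-target with a small state machine. The sketch below is a model for illustration, not the package API: every demo_ name is hypothetical, the "kernel" simply doubles an integer, and the only behaviors modelled are the mode switch, the rule from the Error Codes section that a new asynchronous call fails while one is pending, and the rule that the result is only available after the wait. Returning -1 from demo_wait when nothing is pending is a choice made for this model.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_SYNC, DEMO_ASYNC } DemoCallType;

typedef struct {
    DemoCallType callType;  /* SYNC by default, as for a new C6Accel handle */
    int pending;            /* 1 while an async call has not been waited on */
    int result;             /* value "computed on the DSP" */
    int resultValid;        /* 1 once the ARM may read result */
} DemoAccel;

void demo_init(DemoAccel *a)
{
    a->callType = DEMO_SYNC;
    a->pending = 0;
    a->result = 0;
    a->resultValid = 0;
}

void demo_setAsync(DemoAccel *a) { a->callType = DEMO_ASYNC; }
void demo_setSync(DemoAccel *a)  { a->callType = DEMO_SYNC; }

/* Issue the stand-in kernel. Returns 0 on success, -1 if an
 * asynchronous call is already pending. */
int demo_call(DemoAccel *a, int x)
{
    assert(a != NULL);
    if (a->pending)
        return -1;
    a->result = 2 * x;
    if (a->callType == DEMO_ASYNC) {
        a->pending = 1;
        a->resultValid = 0;   /* not available until demo_wait() */
    } else {
        a->resultValid = 1;   /* a synchronous call returns with the result */
    }
    return 0;
}

/* Complete the pending call; the result may be read afterwards. */
int demo_wait(DemoAccel *a)
{
    assert(a != NULL);
    if (!a->pending)
        return -1;
    a->pending = 0;
    a->resultValid = 1;
    return 0;
}
```

In a real application the ARM-side work would sit between the asynchronous call and the wait, which is where the parallelism comes from.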


Important Notes:

The performance improvement obtained from Asynchronous processing is depicted below. The test application code included in the package contains example code to showcase this asynchronous calling.



Figure: Async call.JPG

Advanced Features

Chaining calls to kernels in a single API call to C6Accel

Adding kernels/libraries to C6Accel

Integrating C6Accel in user defined codec server

Using C6Accel on DM6467

Benchmarking and Performance of the kernels

These are the results from the benchmarking tests for kernels in C6EZAccel

Benchmarking results for C6EZAccel functions called synchronously from ARM:

Due to the inter-processor overhead involved in calling the DSP from the ARM, C6EZAccel generally performs better as the size of the data being processed increases. In order to interpret the benchmarking data accurately, please read the following section on the overheads involved in C6EZAccel.

Note: Benchmarks assume the scaling governor is set to userspace (with the device at maximum frequency) or to performance. (For applicable devices only.)

C6EZAccel Overhead

In a dual-core processor environment like OMAP3 and OMAPL, there is some inherent overhead in processing a buffer on a remote core. As C6EZAccel builds on Codec Engine, the Codec Engine Overhead article describes overheads (and improvement strategies!) relevant to C6EZAccel as well.

In short, processing a buffer of data from the ARM on the DSP requires:

  1. Address translation from ARM-side virtual to DSP-side physical (fast)
  2. Transitioning execution from ARM to DSP-side processing (fast, < 100 microseconds)
  3. ARM side cache invalidation of buffer passed from the ARM to the DSP.
  4. Invalidating cache of the buffers so the DSP sees the right data (slow, especially with very large data buffers)
  5. Activating, processing, deactivating the C6EZAccel algorithm on the DSP (typically fast, but variable based on functionality invoked)
  6. Writing back the cache of the buffers so the ARM sees the right data (slow, especially with very large data buffers)
  7. Transitioning execution from DSP to ARM-side processing (fast, < 100 microseconds)
  8. Address translation back from DSP-side physical to ARM-side virtual (fast)


Due to this overhead, C6EZAccel will not be able to provide satisfactory performance on small data buffer sizes. Here is an analysis of some key functions that compares C6EZAccel performance with that of equivalent C code running natively on the ARM.



Figure: Performance of C6Accel vs Native ARM running 8 bit 3x3 mask convolution function


Figure: Performance of C6Accel vs Native ARM running 8K sample floating point FFT function

A general observation from this analysis is that any function that takes about 1 ms or more on the ARM performs much better on the DSP through C6Accel.
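
The trade-off described in this section can be put into a rough break-even estimate. The sketch below assumes a simple linear cost model, which is an assumption of this example rather than something measured in the article: two fixed ARM/DSP transitions, plus cache maintenance and processing costs proportional to buffer size. Address translation is ignored because the list above calls it fast. OffloadModel and the function names are hypothetical, and all rates must come from measurements on the target.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double transition_us;     /* one ARM<->DSP transition (article: < 100 us) */
    double cache_us_per_kib;  /* all cache maintenance, per KiB of buffer */
    double dsp_us_per_kib;    /* DSP processing, per KiB */
    double arm_us_per_kib;    /* equivalent C code on the ARM, per KiB */
} OffloadModel;

/* Estimated time to process `kib` KiB through the DSP. */
double offload_time_us(const OffloadModel *m, double kib)
{
    assert(m != NULL && kib >= 0.0);
    return 2.0 * m->transition_us      /* ARM to DSP and back */
         + m->cache_us_per_kib * kib   /* invalidate and write back */
         + m->dsp_us_per_kib * kib;    /* the kernel itself */
}

/* Estimated time to process `kib` KiB natively on the ARM. */
double arm_time_us(const OffloadModel *m, double kib)
{
    assert(m != NULL && kib >= 0.0);
    return m->arm_us_per_kib * kib;
}

/* Returns 1 if the model predicts the DSP path is faster, else 0. */
int offload_is_faster(const OffloadModel *m, double kib)
{
    return offload_time_us(m, kib) < arm_time_us(m, kib);
}
```

With purely illustrative rates (100 us per transition, 2 us/KiB cache, 2 us/KiB DSP, 10 us/KiB ARM), a 10 KiB buffer is predicted to be faster on the ARM and a 100 KiB buffer faster on the DSP, which matches the qualitative trend described above.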

C6EZAccel Advanced Users Guide

Information about using C6EZAccel in an ARM application can be found here



Comments

Comments on C6EZAccel ARM user Documentation


SmashedSqwurl said ...

In step 5 under the common steps, the line CE_Runtime_init(); should be CERuntime_init();

--SmashedSqwurl 13:56, 4 April 2012 (CDT)
