NOTICE: The Processors Wiki will End-of-Life in December of 2020. It is recommended to download any files or other content you may need that are hosted on the site. The site is now set to read only.


From Texas Instruments Wiki


This wiki is part of a "cloud HPC" series showing how to use c66x coCPU™ cards in commodity servers to achieve real-time, high capacity processing and analytics of multiple concurrent streams of media, signals and other data.

DirectCore® software is middleware in the cloud HPC software model (see diagram below), providing an interface between host CPU and coCPU applications. DirectCore® can be used in two modes:

  • Transparent mode, for high level Linux applications that abstract underlying HPC hardware
  • Non-transparent mode, for applications that need close interaction with hardware and c66x code

This wiki focuses on non-transparent mode, in which basic APIs are available to the programmer, including coCPU initialization, reset, code download, code symbol lookup, memory read/write, etc. Sections below describe DirectCore functionality, API, and give example source code.


Multiuser Operation

As expected in high level Linux applications, DirectCore allows true multiuser operation, without time-slicing or batch jobs. Multiple host and VM instances can allocate and utilize coCPU resources concurrently. How this works is described in sections below (see DirectCore Host and Guest Drivers and DirectCore Libraries).

TI-RTOS and Bare Metal Support

DirectCore supports both TI-RTOS and bare metal applications.

Debug Capabilities

  • Both local reset and hard reset methods supported. Hard reset can be activated as a backup method in situations where a c66x device has network I/O in the "Tx descriptor stuck" state, or DDR3 memory tests are not passing
  • Core dumps and exception handling statistics can be logged and displayed. For bare metal applications, exception handling is provided separately from standard TI-RTOS functionality
  • Execution Trace Buffer (ETB) readout and display (note: this capability is still in development)

Unified Core List

DirectCore merges all coCPU cards in the system, regardless of the number of coCPUs per card, and presents a unified list of cores. This is consistent with Linux multicore models and is required to support virtualization. Most DirectCore APIs accept a core list parameter (up to 64 cores), allowing API functionality to be applied to one or more cores as needed. Within one host or VM instance, for core lists longer than 64, multiple "logical card" handles can be opened using the DSAssignCard() API.

Symbol Lookup

DirectCore provides physical and logical addresses for coCPU source code symbols (variables, structures, arrays, etc). A symbol lookup cache reduces overhead for commonly used symbols.

DirectCore Host and Guest Drivers

DirectCore drivers interact with coCPU cards from either host or guest instances (VMs). Host instances use a "physical" driver and VMs use virtIO "front end" drivers. Drivers are usually loaded upon server boot, but can be loaded and unloaded manually. Some notable driver capabilities:

  • coCPU applications can share host memory with host applications, and also share between coCPU cores
  • multiple PCIe lanes are used concurrently when applicable
  • both 32-bit and 64-bit BAR modes are supported

DirectCore Libraries

DirectCore libraries provide a high level API for applications, and are used identically by both host and guest (VM) instances. Some notes about DirectCore APIs:

  • DirectCore libraries abstract all coCPU cores as a unified "pool" of cores, allowing multiple users / VM instances to share c66x resources, including coCPU card NICs. This applies regardless of the number of coCPU cards installed in the server
  • Most APIs have a core list argument, allowing the API to operate on one or more cores simultaneously. This applies to target memory reads and writes; in the case of reads a "read multiple" mode is supported that reads each core into an offset in the specified host memory
  • APIs are fully concurrent between applications. The physical driver automatically maximizes PCIe bus bandwidth across multiple coCPUs
  • APIs are mostly synchronous; reading/writing target memory supports an asynchronous mode where subsequent reads/writes to the same core(s) will block if prior reads/writes have not completed
  • Mailbox APIs are supported, allowing asynchronous communication between host and target CPUs

Software Model

Below is a diagram showing where DirectCore libs and drivers fit in the cloud HPC software architecture for coCPUs.


HPC software model diagram


Some notes about the above diagram:

  • Application complexity increases from left to right (command line, open source library APIs, user code APIs, heterogeneous programming)
  • All application types can run concurrently in host or VM instances (see below for VM configuration)
  • coCPUs can make direct DMA access to host memory, facilitating use of DPDK. The host memory DMA capability is also used to share data between coCPUs, for example in a TI c66x application such as H.265 (HEVC) encoding, where multiple c66x CPUs must work concurrently on the same data set
  • coCPUs are connected directly to the network. Received packets are filtered by UDP port and distributed to coCPU cores at wire speed

Minimum Host Application Source Code Example

For non-transparent, or "hands on" mode, the source example below uses the minimum set of DirectCore APIs needed to make a working program (sometimes called the Hello World program). From a code flow perspective, the basic sequence is:

 open			assign card handles
 load			download executable code
 communicate		read/write memory, mailbox messages, etc.
 close			free card handles

In the source code example below, some things to look for:

  • obtaining a card handle (hCard in the source code)
  • DSLoadFileCores() API, which downloads executable files to one or more coCPU cores, for example ELF format files produced by TI build tools. Different cores can run different executables
  • cimRunHardware() API, which runs code on one or more coCPU cores, including the host/target synchronization required to sync values of shared-memory C code variables and buffers
  • use of a "core list" parameter in most APIs. The core list can span multiple coCPUs

<syntaxhighlight lang='c'>
/*
  $Header: /root/Signalogic/DirectCore/apps/SigC641x_C667x/boardTest/cardTest.c

  Minimum application example showing use of DirectCore APIs

  host test program using DirectCore APIs and coCPU hardware

  Copyright (C) Signalogic Inc. 2014-2016

  Revision History
    Created Nov 2014 AKM
    Modified Jan 2016, JHB.  Simplify for web page presentation (remove declarations, local functions, etc).  Make easier to read
*/

#include <stdio.h>
#include <sys/socket.h>
#include <limits.h>
#include <unistd.h>
#include <sys/time.h>

/* Signalogic header files */

#include "hwlib.h"   /* DirectCore API header file */
#include "cimlib.h"  /* CIM lib API header file */

/* following header files required depending on application type */

#include "test_programs.h"
#include "keybd.h"

/* following shared host/target CPU header files required depending on app type */

#ifdef APP_SPECIFIC
  #include "streamlib.h"
  #include "video.h"
  #include "ia.h"
#endif

/* Global vars */

QWORD nCoreList = 0;        /* bitwise core list, usually given in command line */
bool fCoresLoaded = false;

/* Start of main() */

int main(int argc, char *argv[]) {

HCARD hCard = (HCARD)NULL;  /* handle to card. Note that multiple card handles can be opened */
CARDPARAMS CardParams;
int nDisplayCore, timer_count;
DWORD data[120];

/* Display program header */

  printf("DirectCore minimum API example for coCPU host and VM accelerators, Rev 2.1, Copyright (C) Signalogic 2015-2016\n");

/* Process command line for basic target CPU items: card type, clock rate, executable file */

  if (!cimGetCmdLine(argc, argv, NULL, CIM_GCL_DEBUGPRINT, &CardParams, NULL)) exit(EXIT_FAILURE);

/* Display card info */

  printf("coCPU card info: %s-%2.1fGHz, target executable file %s\n", CardParams.szCardDescription, CardParams.nClockRate/1e9, CardParams.szTargetExecutableFile);

/* Assign card handle, init cores, reset cores */

  if (!(hCard = cimInitHardware(CIM_IH_DEBUGPRINT, &CardParams))) {  /* use CIM_IH_DEBUGPRINT flag so cimInitHardware will print error messages, if any */
     printf("cimInitHardware failed\n");
     exit(EXIT_FAILURE);
  }

  nCoreList = CardParams.nCoreList;

/*
  If application specific items are being used, process the command line again using flags and
  structs as listed below (note -- this example gives NULL for the application specific struct)

  App                 Flag          Struct Argument (prefix with &)
  ---                 ----          -------------------------------
  VDI                 CIM_GCL_VDI   VDIParams
  Image Analytics     CIM_GCL_IA    IAParams
  Media Transcoding   CIM_GCL_MED   MediaParams
  Video               CIM_GCL_VID   VideoParams
  FFT                 CIM_GCL_FFT   FFTParams
*/

  if (!cimGetCmdLine(argc, argv, NULL, CIM_GCL_DEBUGPRINT, &CardParams, NULL)) goto cleanup;

/* Load executable file(s) to target CPU(s) */

  printf("Loading executable file %s to target CPU corelist 0x%llx\n", CardParams.szTargetExecutableFile, (unsigned long long)nCoreList);
  if (!(fCoresLoaded = DSLoadFileCores(hCard, CardParams.szTargetExecutableFile, nCoreList))) {
     printf("DSLoadFileCores failed\n");
     goto cleanup;
  }

/* Run target CPU hardware. If required, give application type flag and pointer to application property struct, as noted in comments above */

  if (!cimRunHardware(hCard, CIM_RH_DEBUGPRINT | (CardParams.enableNetIO ? CIM_RH_ENABLENETIO : 0), &CardParams, NULL)) {
     printf("cimRunHardware failed\n");  /* use CIM_RH_DEBUGPRINT flag so cimRunHardware will print any error messages */
     goto cleanup;
  }

  nDisplayCore = GetDisplayCore(nCoreList);
  DSSetCoreList(hCard, (QWORD)1 << nDisplayCore);
  printf("Core list used for results display = 0x%llx\n", (unsigned long long)((QWORD)1 << nDisplayCore));

/* Start data acquisition and display using RTAF components */

  setTimerInterval((time_t)0, (time_t)1000);
  printf("Timer running...\n");

  while (1) {  /* we poll with IsTimerEventReady(), and use timer events to wake up and check target CPU buffer ready status */

     ch = getkey();  /* look for interactive keyboard commands */

     if (ch >= '0' && ch <= '9') {
        nDisplayCore = ch - '0';
        DSSetCoreList(hCard, (QWORD)1 << nDisplayCore);
     }
     else if (ch == 'Q' || ch == 'q' || ch == ESC) goto cleanup;

     if (IsTimerEventReady()) {

     /* check to see if next data buffer is available */

        if ((new_targetbufnum[nDisplayCore] = DSGetProperty(hCard, DSPROP_BUFNUM)) != targetbufnum[nDisplayCore]) {

           targetbufnum[nDisplayCore] = new_targetbufnum[nDisplayCore];  /* update local copy of target buffer number */

           printf("Got data for core %d... count[%d] = %d\n", nDisplayCore, nDisplayCore, count[nDisplayCore]++);

           if (dwCode_count_addr != 0) {
              DSReadMem(hCard, DS_RM_LINEAR_PROGRAM | DS_RM_MASTERMODE, dwCode_count_addr, DS_GM_SIZE32, &timer_count, 1);
              printf("Timer count value = %d\n", timer_count);
           }

        /* read data from target CPUs, display */

           if (DSReadMem(hCard, DS_RM_LINEAR_DATA | DS_RM_MASTERMODE, dwBufferBaseAddr + nBufLen * 4 * hostbufnum[nDisplayCore], DS_GM_SIZE32, (DWORD*)&data, sizeof(data)/sizeof(DWORD))) {

              hostbufnum[nDisplayCore] ^= 1;  /* toggle buffer number, let host know */
              DSSetProperty(hCard, DSPROP_HOSTBUFNUM, hostbufnum[nDisplayCore]);

              for (int i=0; i<120; i+=12) {
                 for (int j=0; j<12; j++) printf("0x%08x ", data[i+j]);
                 printf("\n");
              }
           }
        }
     }
  }

cleanup:

  if (fCoresLoaded) SavecoCPULog(hCard);
  printf("Program and hardware cleanup, hCard = %d\n", hCard);

/* Hardware cleanup */

  if (hCard) cimCloseHardware(hCard, CIM_CH_DEBUGPRINT, nCoreList, NULL);

  return 0;
}


/* Local functions */

int GetDisplayCore(QWORD nCoreList) {

int nDisplayCore = 0;

/* find the lowest-numbered core set in the core list */

  do {
     if (nCoreList & 1) break;
     nDisplayCore++;
  } while (nCoreList >>= 1);

  return nDisplayCore;
}
</syntaxhighlight>

Mailbox Create Examples

Send and receive mailbox creation API examples are shown below. Send means host CPU cores are sending mail to target CPU cores, and receive means host CPU cores are receiving messages from target CPU cores.

<syntaxhighlight lang='c'>
/* Allocate send mailbox handle (send = transmit, or tx) */

  if (tx_mailbox_handle[node] == NULL) {
     tx_mailbox_handle[node] = malloc(sizeof(mailBoxInst_t));
     if (tx_mailbox_handle[node] == NULL) {
        printf("Failed to allocate Tx mailbox memory for node = %d\n", node);
        return -1;
     }
  }

/* Create send mailbox */

  mailBox_config.mem_start_addr = host2dspmailbox + (nCore * TRANS_PER_MAILBOX_MEM_SIZE);
  mailBox_config.mem_size = TRANS_PER_MAILBOX_MEM_SIZE;
  mailBox_config.max_payload_size = TRANS_MAILBOX_MAX_PAYLOAD_SIZE;

  if (DSMailboxCreate(hCard, tx_mailbox_handle[node], MAILBOX_MEMORY_LOCATION_REMOTE, MAILBOX_DIRECTION_SEND, &mailBox_config, (QWORD)1 << nCore) != 0) {
     printf("Tx DSMailboxCreate() failed for node: %d\n", node);
     return -1;
  }

/* Allocate receive mailbox handle (receive = rx) */

  if (rx_mailbox_handle[node] == NULL) {
     rx_mailbox_handle[node] = malloc(sizeof(mailBoxInst_t));
     if (rx_mailbox_handle[node] == NULL) {
        printf("Failed to allocate Rx mailbox memory for node = %d\n", node);
        return -1;
     }
  }

/* Create receive mailbox (mem_size and max_payload_size are reused from the send mailbox config) */

  mailBox_config.mem_start_addr = dsp2hostmailbox + (nCore * TRANS_PER_MAILBOX_MEM_SIZE);

  if (DSMailboxCreate(hCard, rx_mailbox_handle[node], MAILBOX_MEMORY_LOCATION_REMOTE, MAILBOX_DIRECTION_RECEIVE, &mailBox_config, (QWORD)1 << nCore) != 0) {
     printf("Rx DSMailboxCreate() failed for node: %d\n", node);
     return -1;
  }
</syntaxhighlight>


Mailbox Query and Read Examples

Source code excerpts with mailbox query and read API examples are shown below. Some code has been removed for clarity. These examples process "session" actions, as in a media transcoding application. Other application examples would include streams (video), nodes (analytics), etc.

<syntaxhighlight lang='c'>
/* query and read mailboxes on all active cores */

  nCore = 0;

  do {

     if (nCoreList & ((QWORD)1 << nCore)) {

        ret_val = DSQueryMailbox(hCard, (QWORD)1 << nCore);

        if (ret_val < 0) {
           fprintf(mailbox_out, "mailBox_query error: %d\n", ret_val);
        }

        while (ret_val-- > 0) {

           ret_val = DSReadMailbox(hCard, rx_buffer, &size, &trans_id, (QWORD)1 << nCore);

           if (ret_val < 0) {
              fprintf(mailbox_out, "mailBox_read error: %d\n", ret_val);
              break;  /* do not parse rx_buffer after a failed read */
           }

           memcpy(&header_in, rx_buffer, sizeof(struct cmd_hdr));

           if (header_in.type == DS_CMD_CREATE_SESSION_ACK) {
              /* ... */
           }
           else if (header_in.type == DS_CMD_DELETE_SESSION_ACK) {
              /* ... */
           }
           else if (header_in.type == DS_CMD_EVENT_INDICATION) {
              /* ... */
           }
        }
     }

     nCore++;  /* advance to next core; stop when no higher cores remain in the list */

  } while (nCore < 64 && (nCoreList >> nCore) != 0);
</syntaxhighlight>


CardParams Struct

The CardParams struct shown in the above "minimum" source code example is given here.

<syntaxhighlight lang='c'> typedef struct {

/* from command line */

 char          szCardDesignator[CMDOPT_MAX_INPUT_LEN];
 char          szTargetExecutableFile[CMDOPT_MAX_INPUT_LEN];
 unsigned int  nClockRate;
 QWORD         nCoreList;

/* derived from command line entries */

 char          szCardDescription[CMDOPT_MAX_INPUT_LEN];
 unsigned int  maxCoresPerCPU;
 unsigned int  maxCPUsPerCard;
 unsigned int  maxActiveCoresPerCard;
 unsigned int  numActiveCPUs;   /* total number of currently active CPUs (note: not max CPUs, but CPUs currently in use) */
 unsigned int  numActiveCores;  /* total number of currently active cores (note: not max cores, but cores currently in use) */
 bool          enableNetIO;     /* set if command line params indicate that network I/O is needed.  Various application-specific params are checked */
 WORD          wCardClass;
 unsigned int  uTestMode;
 bool          enableTalker;    /* not used for coCPU hardware */

} CARDPARAMS; /* common target CPU and card params */

typedef CARDPARAMS* PCARDPARAMS; </syntaxhighlight>

Installing / Configuring VMs

Below is a screen capture showing VM configuration for coCPU cards, using the Ubuntu Virtual Machine Manager (VMM) user interface:

VMM dialog showing VM configuration for coCPU cards

coCPU core allocation is transparent to the number of coCPU cards installed in the system; just like installing memory DIMMs of different sizes, coCPU cards can be mixed and matched.
