Control Law Accelerator (C2000 CLA) FAQ

From Texas Instruments Wiki

Introduction

This is a frequently asked questions list for the C2000 Control Law Accelerator. There are currently multiple CLA types found on the following MCU devices (with the type listed alongside):

  1. 2803x - Type 0
  2. 2805x - Type 0
  3. 2806x - Type 0
  4. 2837xD/S - Type 1

Other Resources

Forum Discussion

Documentation and Webpages

Training Videos

Frequently Asked Questions

Workshop Material

Section 9 of the Piccolo Multi-day workshop is dedicated to the Control Law Accelerator

Software

controlSUITE
The latest software for the CLA is included as part of controlSUITE (www.ti.com/controlSUITE). This includes:
  • DPLib CLA release
  • Example system in the HVPFCKit.
  • The CLAmath Macro Library: sin, cos, div, sqrt, 1/sqrt, atan, atan2
  • 2803x, 2805x, 2806x and 2837xD/S C/C++ Header Files and Peripheral Examples (device support):
      • ADC -> FIR example, both SARAM and flash based
      • CLA Compiler examples
Legacy Downloads
The following legacy downloads include projects ready for use on CCS V3.3

Architecture, Configuration

Q: What is the CLA?

The CLA is a 32-bit floating-point math accelerator that runs in parallel with the main CPU. There are currently 2 types:
  1. Type 0
  2. Type 1

Q: Is the CLA independent from the main CPU?

Yes. Once the CLA is configured by the main CPU it can execute algorithms independently of the main CPU. The CLA has its own bus structure, register set, pipeline and processing unit. In addition the CLA can access a host of peripheral registers directly. This makes it ideal for handling time-critical control loops but it can also be used for filtering or math algorithms. Please refer to your device specific documentation for the list of CLA accessible peripherals.

Q: Is the CLA interrupt driven?

Yes, the CLA can be configured to respond to various peripheral interrupts. Other devices may respond to other system interrupts. Please refer to your device specific documentation for information. Also see Tasks and Interrupts.

Q: How fast is the CLA interrupt response?

The CLA does not support nesting of interrupts. In addition, the CLA receives interrupts directly, not through the peripheral interrupt expansion block (PIE). Because of this, the CLA has a very low interrupt response delay. At the 7th cycle after an interrupt, the first instruction will be in the decode 2 (D2) phase of the pipeline. That is, from the time an interrupt (trigger) is received, it takes 4 cycles for the CLA to begin fetching the first instruction and another 3 cycles for that instruction to move through the pipeline to the D2 phase. The CLA can read the ADC result register in the same cycle that the ADC completes a sample conversion. The number of cycles for a complete sample/conversion depends on the ADC type; please refer to your device specific documentation for more information. Also see Tasks and Interrupts and Accessing Peripherals.

Q: Does the CLA have registers?

Yes, the CLA has its own independent set of registers. The CLA registers can be thought of in two groups:
Configuration Registers
Some of these registers are used by the main C28x CPU to configure the CLA. Other registers give the main CPU status information. For example, which interrupts have been flagged or which task is currently running.
Execution Registers
These include four floating-point result registers, two auxiliary registers, a status register and a program counter. These registers can be read by the main C28x CPU but not written to.

Q: Does the CLA have an Accumulator?

There is no single register designated as an accumulator - results of operations go into the result registers (MR0-MR3).

Q: What frequency does the CLA run at?

The CLA on 2803x, 2806x, 2805x and 2837xD/S devices runs at the same speed as the CPU (SYSCLKOUT). Other devices may differ. Please refer to your device specific documentation for information.

Q: At reset what state is the CLA in?

The clock to the CLA is disabled and all the CLA registers are cleared. The CLA will not start servicing interrupts until the main CPU has configured it to do so.

Q: How is the CLA configured?

The CLA is configured by the main CPU just as any other module or peripheral.

Development Tools, Debugging, etc.

Q: I would like to learn what code development tools are available and how I can debug code for the CLA.

Please refer to the Control Law Accelerator (C2000 CLA) Debug FAQ

Tasks and Interrupts

Q: What is a 'Task'?

A CLA task is an interrupt response routine executed by the CLA.

Q: How many interrupts are supported?

Both the Type 0 and Type 1 CLAs support 8 interrupts.

Q: Which interrupts can start a task?

Peripherals: Each task has specific peripheral interrupts that can trigger it. There are generally two ways to configure the triggers for a task: through the MPISRCSEL1 register (2803x, 2805x, 2806x), or through the system register CLA1TASKSRCSELx (2837xD/S).
It is very important to understand that the trigger source is just the mechanism by which the task is started. The trigger source does not limit what the task can do. For example, task 1 can read any/multiple ADC RESULT register(s) and modify any ePWM1, ePWM2, ePWM3...ePWM7 register even though it was started by EPWM1_INT.
Below the triggers available on 2803x and 2806x are shown. Other devices may differ. Please refer to your device specific documentation for information.
On 2803x the interrupt triggers are assigned as follows:
  • Interrupt 1 = Task 1 = ADCINT1 or EPWM1_INT or software only
  • Interrupt 2 = Task 2 = ADCINT2 or EPWM2_INT or software only
  • Interrupt 3 = Task 3 = ADCINT3 or EPWM3_INT or software only
  • Interrupt 4 = Task 4 = ADCINT4 or EPWM4_INT or software only
  • Interrupt 5 = Task 5 = ADCINT5 or EPWM5_INT or software only
  • Interrupt 6 = Task 6 = ADCINT6 or EPWM6_INT or software only
  • Interrupt 7 = Task 7 = ADCINT7 or EPWM7_INT or software only
  • Interrupt 8 = Task 8 = ADCINT8 or CPU Timer 0 or software only
On 2806x the interrupt triggers are assigned as follows:
  • Interrupt 1 = Task 1 = ADCINT1 or EPWM1_INT or software only
  • Interrupt 2 = Task 2 = ADCINT2 or EPWM2_INT or software only
  • Interrupt 3 = Task 3 = ADCINT3 or EPWM3_INT or software only
  • Interrupt 4 = Task 4 = ADCINT4 or EPWM4_INT or eQEP1/2 or ECAP1/2/3 or software only
  • Interrupt 5 = Task 5 = ADCINT5 or EPWM5_INT or eQEP1/2 or ECAP1/2/3 or software only
  • Interrupt 6 = Task 6 = ADCINT6 or EPWM6_INT or eQEP1/2 or ECAP1/2/3 or software only
  • Interrupt 7 = Task 7 = ADCINT7 or EPWM7_INT or eQEP1/2 or ECAP1/2/3 or software only
  • Interrupt 8 = Task 8 = ADCINT8 or CPU Timer or eQEP1/2 or ECAP1/2/3 or software only

Q: Can the main CPU start a task through software?

Yes! The main CPU can flag an interrupt at any time by using the IACK #16bit instruction. For example, IACK #0x0003 would flag interrupt 1 and interrupt 2. This is the same as setting bits in the force register (MIFRC).
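The mapping from task numbers to the IACK argument can be sketched in C. This is only an illustration of the bit arithmetic (interrupt 1 is bit 0, interrupt 8 is bit 7); the function names are hypothetical and are not part of any TI header.

```c
#include <assert.h>
#include <stdint.h>

/* Bit for one CLA task: task 1 -> bit 0 ... task 8 -> bit 7.
 * Hypothetical helper, not a TI API. */
static uint16_t cla_task_bit(unsigned task)
{
    assert(task >= 1 && task <= 8);   /* the CLA supports 8 tasks */
    return (uint16_t)(1u << (task - 1u));
}

/* OR together the bits for a list of tasks to build the 16-bit
 * value that would be passed to IACK (or written to MIFRC). */
static uint16_t cla_force_mask(const unsigned *tasks, unsigned count)
{
    uint16_t mask = 0;
    for (unsigned i = 0; i < count; i++) {
        mask |= cla_task_bit(tasks[i]);
    }
    return mask;
}
```

For tasks 1 and 2 this yields 0x0003, matching the IACK example above.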

Q: I'm trying to force tasks using the IACK instruction, but it isn't working. What could be wrong?

  • Make sure you've enabled this feature in the MICTL register.
  • Make sure the interrupt is enabled in the MIER register.
  • Make sure you are using the correct argument to IACK. For example IACK #0x0003 would flag interrupt 1 (bit 0) and interrupt 2 (bit 1).

Q: If two interrupts come in at the same time, which one is executed first?

The highest priority task that is both flagged (MIFR register) and enabled (MIER register) is the one to get executed. Interrupt 1/Task 1 has the highest priority and Interrupt 8/Task 8 the lowest priority.
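As a rough model of this rule, the C sketch below picks the next task from a pair of MIFR and MIER values. It only mirrors the priority order described above; the function name is hypothetical and the real selection is done in hardware.

```c
#include <stdint.h>

/* Returns the task number (1..8) that would run next, or 0 if no task
 * is both flagged (MIFR) and enabled (MIER). Task 1 has the highest
 * priority, so the lowest set bit wins. Hypothetical model only. */
static unsigned cla_next_task(uint16_t mifr, uint16_t mier)
{
    uint16_t pending = (uint16_t)(mifr & mier & 0x00FFu);
    for (unsigned task = 1; task <= 8; task++) {
        if (pending & (1u << (task - 1u))) {
            return task;
        }
    }
    return 0;
}
```

If interrupts 3 and 8 are flagged and all tasks are enabled, the model returns task 3; if only task 8 is enabled, it returns task 8.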

Q: Can you nest CLA interrupts?

No. A CLA task is executed until it is complete. Once a task is complete then the highest interrupt that is both flagged and enabled will automatically begin.

Q: Can the CLA interrupt the main CPU?

The CLA will send an interrupt to the PIE (peripheral interrupt expansion block) to let the main CPU know a task has completed. Each task has an associated vector in the PIE. This interrupt is automatically fired when the associated task completes. For example when task 1 completes, CLA1_INT1 in the PIE will be flagged.
There are also dedicated interrupts in the PIE for floating-point overflow and underflow conditions.
The Type 1 CLA has a Software Interrupt Capability, where a task can enable and force an interrupt to the main CPU. For example Task 1 can enable a software interrupt for task 2, by writing to the TASK2 bit of the CLA1SOFTINTEN register, and then forcing that interrupt by writing to the TASK2 bit of the CLA1SOFTINTFRC register. By enabling the software interrupt for any task, task 2 in the example, you are disabling its end-of-task interrupt capability.

Q: Can the main CPU terminate a task?

Yes. If an interrupt has been flagged, but the task has not yet run, the main CPU can clear the flag using the MICLR register.
If the task is already running then a soft reset (in MCTL) will terminate the task and clear the MIER register. If you want to clear all of the CLA registers you can use the hard reset option in the MCTL register.

Q: What is the starting address for each task? Is the starting address fixed?

The start address is configurable. Each task has an associated interrupt vector (MVECT1 to MVECT8). For type 0 CLAs, this vector holds the starting address (as an offset from the first program location) of the task. For all the other types, this vector holds the entire 16-bit address of the task.
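The difference between the two vector formats can be sketched as follows. The addresses used in the usage note are made-up values for illustration, and the helper names are hypothetical; consult your device headers for the actual symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Type 0: the vector holds the task address as an offset from the
 * first location of CLA program memory (12-bit program space). */
static uint16_t mvect_type0(uint32_t task_addr, uint32_t prog_start)
{
    assert(task_addr >= prog_start);
    uint32_t offset = task_addr - prog_start;
    assert(offset < 4096u);           /* must fit the 4096-word space */
    return (uint16_t)offset;
}

/* Other types: the vector holds the full 16-bit address of the task. */
static uint16_t mvect_full(uint32_t task_addr)
{
    assert(task_addr <= 0xFFFFu);     /* must lie in the lower 64K words */
    return (uint16_t)task_addr;
}
```

With an assumed program start of 0x9000 and a task at 0x9040, the Type 0 vector would be 0x0040, while the full-address form would be 0x9040.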

Q: Is there a size limit for a task?

There is no limit on an individual task other than the size of the program space. For type 0 CLAs, the program space is limited to 12 bits, or 4096 words. All CLA instructions are 32 bits, so within a 4k x 16 program space you can have ~2k CLA instructions.
For type 1 (and above) CLAs, the program space is 16-bits wide, leaving the lower 64K words of space available as program space. Please refer to your device specific documentation for information on how to configure/allocate the memory spaces to the CLA.
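The instruction counts follow directly from the address width, since each 32-bit CLA instruction occupies two 16-bit words. A minimal sketch of that arithmetic (helper names are hypothetical):

```c
#include <stdint.h>

/* Each CLA instruction is 32 bits, i.e. two 16-bit words. */
enum { CLA_WORDS_PER_INSTRUCTION = 2 };

/* Number of 16-bit words addressable with the given address width. */
static uint32_t cla_program_words(unsigned address_bits)
{
    return (uint32_t)1 << address_bits;
}

/* Upper bound on the number of instructions in that space. */
static uint32_t cla_max_instructions(unsigned address_bits)
{
    return cla_program_words(address_bits) / CLA_WORDS_PER_INSTRUCTION;
}
```

A 12-bit space gives 4096 words or 2048 instructions; a 16-bit space gives 65536 words or 32768 instructions.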

Q: How do I indicate the end of a task?

After a task begins, the CLA will execute instructions until it encounters an "MSTOP" instruction. MSTOP indicates the end of the task.

Q: Can the CLA itself flag another task?

The CLA can not write to its own configuration registers so it can not start a task by writing to the force register. It can, however, write to the ePWM registers so technically it could force an interrupt from one of the ePWM modules.
The main CPU can take an interrupt when the task is complete. You could, within this interrupt, start another task using the IACK instruction.

Q: If the CLA is configured to respond to ADCINT1, can the CPU also respond?

Yes. The interrupts are sent to both the CLA and the PIE so either or both can respond.

Accessing Peripherals

Q: Which peripherals can the CLA directly access?

The peripherals the CLA can access on several devices are listed below. Additional peripherals are available on some devices. Please refer to your device specific documentation for information.
2803x
The CLA has direct access to the ADC result, ePWM+HRPWM, and comparator registers.
2806x
The CLA has direct access to the ADC result, ePWM+HRPWM, eCAP, eQEP and comparator registers.
2807x
The CLA has direct access to the ADC module(s) (including results), ePWM+HRPWM, eCAP, eQEP, comparator subsystem, DAC subsystem, SPI, McBSP, uPP, EMIF and GPIO.
2837x
The CLA(s) have direct access to the ADC module(s) (including results), ePWM+HRPWM, eCAP, eQEP, comparator subsystem, DAC subsystem, SPI, McBSP, uPP, EMIF(s) and GPIO.



Q: Some of those registers you mentioned are EALLOW protected from rogue writes by the main CPU. Does the CLA have this protection as well?

There is a bit called MEALLOW in the CLA status register that enables/disables the protection for CLA writes. This is set and cleared by the MEALLOW/MEDIS CLA instructions. This protection is independent of the main CPU's EALLOW bit. That is, the main CPU can enable writes via EALLOW, but the register will still be protected from CLA writes via MEALLOW.

Q: How can the CLA read the ADC result register "Just-in-time"?

The ADC on 2803x can be configured to assert an interrupt after the sample window time. If the CLA is configured to respond to the ADC interrupt, then the task will begin while the conversion is in progress. The 8th instruction in the task will be just in time to read the ADC result register when it updates.
The ADC on the 2837xD/S can also be configured to assert an interrupt after the sample window time. If the CLA is configured to respond to the ADC interrupt, then the task will begin while the conversion is in progress. The conversion time, however, is not fixed and depends on two things:
  1. The ADCCLK
  2. The ADC mode i.e. 12-bit or 16-bit
For example, assume the ADC is set to run at a quarter of the system clock in 12-bit mode; the ADC is set to sample (acquire) for 15 SYSCLK cycles or 75ns. After the capacitor has sampled the analog value, the ADC will trigger the CLA task early. In 12-bit mode, the ADC will take 29.5 ADCCLKs to complete a conversion:
T_sys = 1/200MHz = 5ns
T_adc = 4*T_sys = 20ns
The ADC will take 29.5 * 4 or 118 SYSCLK cycles to complete a conversion.
+==============================================================================+
| ADC activity     | CLA Activity      | F1 | F2 | D1 | D2 | R1 | R2 | E  | W  |
+==============================================================================+
| Sample           |                   |    |    |    |    |    |    |    |    |
| Sample           |                   |    |    |    |    |    |    |    |    |
| ...              |                   |    |    |    |    |    |    |    |    |
| Sample           |                   |    |    |    |    |    |    |    |    |
| Conversion (1)   |Interrupt Received |    |    |    |    |    |    |    |    |
| Conversion (2)   |Task Startup       |    |    |    |    |    |    |    |    |
| Conversion (3)   |Task Startup       |    |    |    |    |    |    |    |    |
| Conversion (4)   |I1                 |I1  |    |    |    |    |    |    |    |
| Conversion (5)   |I2                 |I2  |I1  |    |    |    |    |    |    |
| ...              |                   |    |I2  |I1  |    |    |    |    |    |
| ...              |                   |    |    |I2  |I1  |    |    |    |    |
| ...              |                   |    |    |    |I2  |I1  |    |    |    |
| ...              |                   |    |    |    |    |I2  |I1  |    |    |
| ...              |                   |    |    |    |    |    |I2  |I1  |    |
| ...              |                   |    |    |    |    |    |    |I2  |I1  |
| ...              |                   |    |    |    |    |    |    |    |I2  |
| ...              |                   |    |    |    |    |    |    |    |    |
| ...              |                   |    |    |    |    |    |    |    |    |
| ...              |                   |    |    |    |    |    |    |    |    |
| Conversion (114) |I110               |I110|    |    |    |    |    |    |    |
| Conversion (115) |I111 Read ADCRESULT|I111|I110|    |    |    |    |    |    |
| Conversion (116) |                   |    |I111|I110|    |    |    |    |    |
| Conversion (117) |                   |    |    |I111|I110|    |    |    |    |
| Conversion (118) |                   |    |    |    |I111|I110|    |    |    |
| RESULT Latched   |                   |    |    |    |    |I111|I110|    |    |
| RESULT Available |                   |    |    |    |    |    |I111|    |    |
+==============================================================================+
So the CLA Task will have to wait 118 cycles before it is able to access the ADC result register; this means it has 118 cycles within which to perform setup or any other pre-calculations.
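The cycle count in this example can be reproduced with a short calculation. The sketch below only restates the arithmetic above (29.5 ADCCLKs per 12-bit conversion, ADCCLK at a quarter of SYSCLK); the function name is hypothetical, and the figure for other ADC modes should be taken from your device documentation.

```c
/* SYSCLK cycles needed for one conversion, given the number of ADCCLK
 * periods per conversion and the SYSCLK-to-ADCCLK divide ratio. */
static double adc_conversion_sysclks(double adcclks_per_conversion,
                                     unsigned sysclk_per_adcclk)
{
    return adcclks_per_conversion * (double)sysclk_per_adcclk;
}
```

With 29.5 ADCCLKs and a divide ratio of 4, the result is 118 SYSCLK cycles, which is the window the task has for setup work before reading the result.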

Q: If the CLA takes an ADC interrupt, can it then clear the ADC's interrupt flag?

No. The CLA can not access the ADC configuration registers so it can not clear the ADC interrupt flag. Here are three options for handling this:
Option 1
Place the ADC in continuous mode. In this mode the next conversion will start, when triggered, even if the flag is still set.
Option 2
Service the ADC interrupt with the main CPU as well as the CLA and have the main CPU clear the flag.
Option 3
Have the main CPU service the end of task interrupt from the CLA and clear the flag.

Q: You mentioned the CLA can access the ePWM registers. Is it all of the ePWM modules on the device?

Any task can access any of the ePWM modules. There are no restrictions on this.

Q: How can I interface to GPIO control from the CLA?

On most devices, the CLA does not have access to the GPIO control registers. GPIO control is typically handled by the main CPU. One thing that you could do is toggle an ePWM pin by accessing the ePWM registers with the CLA (see the ePWM's AQCSFRC register). If instead you want to toggle a GPIO pin at the end of the task, this could be done by the main CPU when it services the CLA interrupt.
On the 2837xD/S, the CLA can directly access the GPIO control and data registers; in fact, it takes ownership of the peripheral bus. For a given CPU subsystem, the CLA and DMA share secondary access to some peripherals, the primary owner being the subsystem CPU itself. The secondary ownership of the bus is determined by the CpuSysRegs.SECMSEL[VBUS32_x] bit. If it is set to 0, the CLA is the secondary owner. If it is set to 1, the DMA is the secondary owner. By default, at reset, the CLA is given the secondary ownership of the bus and, therefore, can access all the peripherals connected to it.

Q: If the CLA is using the ePWM or ADC Result registers, does it mean the main CPU can not?

No. Both the CLA and the main CPU can access the registers. The arbitration scheme for these registers can be found in the CLA Reference Guide. Keep in mind if the main CPU performs a read-modify-write operation on a register and between the read and the write the CLA modifies the same register, the changes made by the CLA can be lost. In general it is best to not have both processors writing to registers.
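The read-modify-write hazard can be illustrated with a small host-side simulation. This is not device code: a plain variable stands in for the shared register, and the CLA write is injected by hand between the CPU's read and write to show the ordering that causes the loss.

```c
#include <stdint.h>

static uint16_t shared_reg;   /* stand-in for a shared peripheral register */

/* Simulate a CPU read-modify-write (setting bits) with a CLA write
 * landing between the CPU's read and its write-back. Returns the
 * final register value. */
static uint16_t rmw_with_interleaved_write(uint16_t initial,
                                           uint16_t cpu_set_bits,
                                           uint16_t cla_value)
{
    shared_reg = initial;
    uint16_t cpu_copy = shared_reg;   /* CPU: read */
    shared_reg = cla_value;           /* CLA: writes the register */
    cpu_copy |= cpu_set_bits;         /* CPU: modify its stale copy */
    shared_reg = cpu_copy;            /* CPU: write back, CLA value lost */
    return shared_reg;
}
```

Starting from 0x0000, if the CPU sets bit 0 while the CLA writes 0x0100 in between, the register ends up as 0x0001 and the CLA's update is gone.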

Q: I want CLA Task 1 to access ePWM1 and ePWM2 registers, but Task 1 can only be triggered by EPWM1_INT.

The interrupt is only the method by which a task is started. It does not limit the resources the CLA can access during that task. Any CLA task can access all of the ADC RESULT, comparator and ePWM registers. For example, assume ADCINT1 triggers task 1. This single task could then read ADC RESULT0, perform a control algorithm, read ADC RESULT1, perform another control algorithm, etc.

Memory Access

Q: Does the CLA have access to all of the memory on the device?

There are specific blocks on the device that the CLA can use.
CLA Program Memory
On Piccolo (2803x and 2806x) this is a 4k x 16 block, single cycle (no wait states). This means it can hold 2048 CLA instructions (all CLA instructions are 32-bits). At reset, this block is mapped to the main CPU memory space and treated by the CPU like any other memory block. While mapped to CPU space, the main CPU can initialize the memory with the CLA program code. Once the memory is initialized the CPU maps it to the CLA program space.
On Delfino (2837xD/S) it can access the lower 64K word space, single cycle (no wait states). This means it can hold up to 32K CLA instructions (all CLA instructions are 32-bits). At reset, this space is mapped to the main CPU memory space and treated by the CPU like any other memory block. While mapped to CPU space, the main CPU can initialize the memory with the CLA program code. Once the memory is initialized the CPU maps it to the CLA program space.
CLA Data Memory 
On Piccolo (2803x and 2806x) each of these blocks is 1k x 16, single-cycle. At reset, the blocks are mapped to the main CPU memory space and treated by the CPU like any other memory block. While mapped to CPU space, the main CPU can initialize the memory with data tables and coefficients for the CLA to use. Once the memory is initialized with CLA data the main CPU maps it to the CLA space. (Each block can be independently mapped).
On Delfino (2837xD) the lower 64K word space can be used. At reset, the space is mapped to the main CPU memory space and treated by the CPU like any other memory block. While mapped to CPU space, the main CPU can initialize the memory with data tables and coefficients for the CLA to use. Once the memory is initialized with CLA data the main CPU maps it to the CLA space. (Each RAM block can be independently mapped).
2803x Data RAM
There are two CLA data memory blocks on the device.
Each data RAM belongs either to the CPU or the CLA. If it is mapped to the CLA, then the CPU will read 0x0000 if it attempts to access the block. Likewise, if the block is mapped to the CPU, the CLA will read 0x0000 if it tries to read from the block.
2806x Data RAM Access
There are three CLA data memory blocks on the device.
Access is the same as on 2803x except a second configuration bit has been added to allow the CPU to read/write to the memory block even when it is mapped to the CLA.
2837xD Data RAM Access
Each physical RAMLSx block (in lower 64KW space) can be configured as either CLA program or data space. The configuration is done within the memory configuration module and not the CLA config module.
Shared message RAMs 
There are two small memory blocks for data sharing and communication between the CLA and the main CPU. On Piccolo these are 128 x 16 words in size.
CLA to CPU Message RAM
CLA can read/write, main CPU can only read
CPU to CLA Message RAM
CPU can read/write, CLA can only read

Q: Can both the CLA and the CPU access the same message RAM at the same time?

The accesses will be arbitrated as described below.
  • CLA to CPU Message RAM: Priority of accesses are (highest priority first):
  1. CLA write
  2. CPU debug write
  3. CPU data read, program read, CPU debug read
  4. CLA data read
  • CPU to CLA Message RAM: Priority of accesses are (highest priority first):
  1. CLA read
  2. CPU data write, program write, CPU debug write
  3. CPU data read, CPU debug read
  4. CPU program read

Q: If I understand the data manual, CLA code can only be run from the CLA program RAM (RAML3 on most devices, RAMLSx on 2837xD/S). It can be loaded from flash and run in program RAM.

Yes. The CLA can only execute from memory designated as CLA program memory. On 2803x this is L3, while on the 2837xD/S any of the RAMLSx blocks can be configured to serve as CLA program space. During debug you can load the RAM directly using the debugger. In a stand-alone system the main CPU needs to initialize the RAM with the CLA code. Most likely the code will be stored in flash and copied to CLA program space. For an example, refer to the CLA FIR in flash project in controlSUITE, which shows how this is done:
  • For 2806x (C:\ti\controlSUITE\device_support\f2806x)
  • For 2803x (C:\ti\controlSUITE\device_support\f2803x)
  • For 2805x (C:\ti\controlSUITE\device_support\f2805x)
  • For 2837xD (C:\ti\controlSUITE\device_support\F2837xD)
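The copy step itself is an ordinary memory copy performed by the main CPU while the block is still mapped to CPU space. The sketch below shows the idea with plain arrays standing in for the flash load image and the program RAM; in a real project the load address, run address and size would normally come from linker-defined symbols, which are not shown here, and the helper name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy a CLA program image of `words` 16-bit words from its load
 * location (e.g. flash) to its run location (CLA program RAM).
 * Must be done before the RAM is mapped to CLA program space. */
static void cla_copy_program(uint16_t *run_addr,
                             const uint16_t *load_addr,
                             size_t words)
{
    memcpy(run_addr, load_addr, words * sizeof(uint16_t));
}
```

After the copy completes, the CPU maps the block to the CLA as described above.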

Q: Can I use a block other than L3 for CLA code?

On the 2803x and 2806x devices, only L3 can be used by the CLA for program memory.
On the 2837xD/S, any of the RAMLSx blocks (in lower 64K memory) can be used as CLA program memory.
For your specific device check the memory map in the data manual.

Q: L3 memory block is designated as CLA Program. Can I allocate a portion for the CLA and the rest for the CPU?

2803x: No - this memory block either belongs to the main CPU (default at reset) or the CLA. That is, either one or the other can access it. If the block is assigned to the CLA and the main CPU tries to fetch or read data, it will receive a 0x0000.
2805x and 2806x: Yes - Using the MMEMCFG[RAMxCPUE] bit, you can grant the CPU access to read from and write to the data memory. After changing the configuration always wait 2 SYSCLKOUT cycles before accessing the memory.
2837xD: Yes - You can configure any of the RAMLSx blocks as CLA program space by
  1. writing a 1 to the memory block’s MemCfgRegs.LSxMSEL[MSEL_LSx] bit
  2. and then specifying the memory block as CLA code memory by writing a 1 to the MemCfgRegs.LSxCLAPGM[CLAPGM_LSx] bit.
When a block is configured as CLA program space, the CLA can fetch from it but the CPU only has emulation read/write access. When configured as CLA data space, it is shared between the CPU and CLA, i.e. the CPU has read and write permissions, subject to arbitration, to the memory block.

Q: Can the message RAM be used as data memory for the CLA?

The CLA to CPU message RAM (128x16) can be read/written to by the CLA and can be used as a data RAM for the CLA.

Q: How can I initialize variables in the CLA to CPU message RAM?

Since the main CPU can not write to this memory, the CLA will need to initialize these variables. To do this, set up a task to perform the initialization and then trigger the task using the main CPU. The FIR example in controlSUITE for the 2806x (C:\ti\controlSUITE\device_support\f2806x) and 2803x (C:\ti\controlSUITE\device_support\f2803x) shows how this is done.

Q: In my application the CLA does not need data memory. Can this memory be used by the main CPU instead?

Yes - if the memory is not needed by the CLA then it can be used by the main CPU just like any other block. Also the two data memory blocks can be independently assigned to the CLA or main CPU so you can have one block belong to the main CPU and the other to the CLA.

Q: In my application the CLA does not need all the data RAMs. Can one block be assigned to the CLA and the other block(s) to the main CPU?

Yes. The data RAMs are mapped to the CPU or the CLA individually.

Q: I want to do a ping-pong scheme where the CLA uses a data RAM and then the main CPU uses it. Can this be done?

2803x
Yes - There are a few things you need to make sure of before changing the mapping of the data RAM:
  • After changing the mapping (via MMEMCFG) always wait 2 SYSCLKOUT cycles before accessing the memory.
  • Before changing the memory from CPU to CLA, make sure the CPU is not performing any accesses to the memory.
  • Before changing the memory from CLA to CPU, make sure the CLA is not performing any accesses. One way to check for this is to clear the MIER register, wait a couple of cycles and then check that MIRUN is cleared.
2806x
Yes - There are two options:
Option 1
Using the MMEMCFG[RAMxCPUE] bit, you can grant the CPU access to read from and write to the data memory. After changing the configuration always wait 2 SYSCLKOUT cycles before accessing the memory.
Option 2
Follow the same procedure as for the 2803x and re-map the memory to the C28x.
2837xD
Any RAMLSx block, when configured as CLA data space, is automatically shared between the CPU and CLA.
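The CLA-to-CPU hand-off described for the 2803x can be sketched as a small state model. This is a host-side illustration only: the struct fields stand in for MIER, MIRUN and the data RAM mapping, the wait loop is not cycle accurate, and all names are hypothetical rather than taken from the device headers.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { OWNER_CPU, OWNER_CLA } ram_owner_t;

/* Simplified stand-in for the CLA state relevant to the hand-off. */
typedef struct {
    uint16_t    mier;      /* interrupt enables, one bit per task */
    uint16_t    mirun;     /* nonzero while a task is running */
    ram_owner_t data_ram;  /* current mapping of the data RAM */
} cla_model_t;

/* Placeholder delay; on a device this would be a few NOPs. */
static void wait_cycles(unsigned n)
{
    for (volatile unsigned i = 0; i < n; i++) { }
}

/* Try to move the data RAM from the CLA to the CPU. Returns false if
 * a task is still running, in which case the caller retries later.
 * Note that MIER is left cleared; the caller re-enables tasks. */
static bool handoff_cla_to_cpu(cla_model_t *cla)
{
    cla->mier = 0;               /* stop new tasks from starting */
    wait_cycles(2);              /* wait a couple of cycles */
    if (cla->mirun != 0) {
        return false;            /* CLA may still be accessing the RAM */
    }
    cla->data_ram = OWNER_CPU;   /* change the mapping */
    wait_cycles(2);              /* wait 2 SYSCLKOUT cycles before access */
    return true;
}
```

The reverse direction follows the same pattern, with the CPU first making sure it has no accesses to the block in flight.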

Q: Is CLA memory protected by the Code Security Module (CSM)?

Yes - on 2803x devices, all CLA memory is protected by the CSM. The CLA configuration and result registers are also protected. On later devices, CLA memory can also be protected by the DCSM module. Please refer to the device TRM for more information on security.

Communication Between the CLA and main CPU

Q: How do the main CPU and the CLA communicate with each other?

Communication is handled through the message RAMs and interrupts.
  • The CLA can pass data to the main CPU through the CLA to CPU message RAM
  • The main CPU can pass data to the CLA through the CPU to CLA message RAM
  • The main CPU can flag a CLA interrupt/task in software if desired by using the IACK instruction.
  • The CLA can alert the main CPU that a task has completed through an interrupt to the PIE. There is one interrupt vector per task in the PIE. The main CPU does not have to service interrupts from the CLA if it is not required by the application.

Q: In my code, how can I share variables between the CLA and the main CPU?

Since the CLA and the C28x code are in the same project this is very easy. My suggestion is to:
  • Create a shared header file with common constants and variables. Include this file in both the C28x C and CLA .asm code.
  • Use data section pragma statements and the linker file to place the variables in the appropriate message RAM.
  • Define shared variables in your C code.
  • Initialize variables in the CPU to CLA message RAM with the main CPU.
  • Initialize variables in the CLA to CPU message RAM with a CLA task. This initialization task can be started via the main C28x software.
This method is described in the short video "Demo of the framework used in the header files and peripheral examples".
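A minimal sketch of these steps in C is shown below. The variable names are hypothetical, and the section names "CpuToCla1MsgRAM" and "Cla1ToCpuMsgRAM" are assumptions that must match whatever sections your linker command file actually defines. The pragmas are guarded so the file also builds with a non-TI compiler, and a plain C function stands in for the CLA initialization task.

```c
/* Place the shared variables in the message RAMs when building with
 * the TI compiler. Section names are assumptions; they must match
 * the linker command file. */
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_SECTION(cpu_to_cla_gain, "CpuToCla1MsgRAM")
#pragma DATA_SECTION(cla_to_cpu_result, "Cla1ToCpuMsgRAM")
#endif

float cpu_to_cla_gain;     /* written by the CPU, read by the CLA */
float cla_to_cpu_result;   /* written by the CLA, read by the CPU */

/* Main CPU initializes the variables it is allowed to write. */
void cpu_init_shared(void)
{
    cpu_to_cla_gain = 0.5f;
}

/* Stand-in for a CLA task that initializes the CLA-to-CPU variables;
 * on a device this would be CLA code started by the main CPU. */
void cla_init_task(void)
{
    cla_to_cpu_result = 0.0f;
}
```

In a real project the declarations would live in the shared header as `extern` and the definitions in one C file.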

Q: Can I use CLA data RAM as a message RAM?

Yes, but remember that at any given time each of the CLA data RAMs either belongs to the main CPU or to the CLA. Therefore for the other processor to see the data you will have to first make sure no accesses are taking place to the RAM and then change the mapping.

Instruction Set, Code Execution

Q: How fast are instructions executed by the CLA?

The CLA is clocked at the same rate as the main CPU (SYSCLKOUT) which is max 60 MHz on 2803x and 80 MHz on 2806x. All CLA instructions are single cycle. While a discontinuity (branch/call/return) instruction itself is single cycle, the time for a discontinuity to complete depends on how many "delay slots" around the instruction are used. If no delay slots are used, then the discontinuity completes in 7 cycles (worst case) whether taken or not. A typical time would be 4 cycles taken/not taken. In this case slots before the instruction are used, but slots after are not.
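To turn these cycle counts into time, divide by the clock frequency. The sketch below uses the cycle figures quoted above (7 cycles worst case and 4 cycles typical for a discontinuity); the constant and function names are hypothetical.

```c
/* Cycle counts for a branch/call/return as quoted in the text. */
enum {
    CLA_BRANCH_WORST_CYCLES   = 7,  /* no delay slots used */
    CLA_BRANCH_TYPICAL_CYCLES = 4   /* slots before the instruction used */
};

/* Convert a number of CLA cycles to nanoseconds at a given SYSCLKOUT. */
static double cla_cycles_to_ns(unsigned cycles, double sysclk_mhz)
{
    return (double)cycles * 1000.0 / sysclk_mhz;
}
```

At 80 MHz a single-cycle instruction takes 12.5 ns, a typical discontinuity 50 ns, and a worst-case discontinuity 87.5 ns.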

Q: In the examples I noticed there are 3 MNOP's after each MSTOP. Why is this done? Is it required?

There is a restriction that an MSTOP not end up within 3 instructions of a branch. The MNOPs have been added to make sure this requirement is always met even when the program RAM following a task is not initialized. If you know for sure there is not a branch within 3 instructions after the MSTOP, then you can remove the MNOPs.

Q: Does the CLA use floating-point math?

Yes, the CLA supports 32-bit (single-precision) floating-point math operations. These follow the IEEE 754 standard.

Q: Why float and not fixed-point?

Floating-point is easier to code. It is self saturating and thus removes the saturation and scaling burden from code. In addition it does not suffer from overflow/underflow sign inversion issues. Finally algorithms coded in float are more cycle efficient.

Q: Since the main CPU is fixed-point and the CLA is float, won't I have to convert my numbers?

Yes, the CLA makes this easy with instructions that convert data. If you are reading from memory this conversion can be done while the value is read so it is quite efficient. For example, you can read an ADC result register (Unsigned 16-bit) and convert it to 32-bit float as you read it.
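In C terms, the conversion amounts to a cast from an unsigned 16-bit integer to float. The sketch below shows that, plus an optional scaling step that assumes a 12-bit result with a full-scale code of 4095; both helper names and the 12-bit assumption are illustrative only.

```c
#include <stdint.h>

/* Equivalent of reading an unsigned 16-bit value and converting it
 * to 32-bit float as it is read. */
static float adc_result_to_float(uint16_t raw)
{
    return (float)raw;
}

/* Scale the result to a 0.0 .. 1.0 fraction of full scale.
 * Assumes a 12-bit ADC result (full-scale code 4095). */
static float adc_result_to_per_unit(uint16_t raw)
{
    return (float)raw / 4095.0f;
}
```

Once the value is in float form, the rest of the control algorithm can run without any fixed-point scaling.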

Q: Does the CLA support the repeat block (RPTB) instruction that is on the C28x+FPU?

No, but you can use a loop (branch or call) to execute a block of instructions multiple times. There are also no single repeat instructions for the CLA. (RPT ||...)

Q: Does the CLA support branches?

Yes, the CLA has its own branch (MBCNDD), call (MCCNDD) and return (MRCNDD). All of these are conditional and delayed.

Q: Are there any sub-hardware modules inside CLA where each component corresponds to some algorithm so that the user only need to set registers or is CLA just purely programmable in floating-point?

There are no algorithms built into the CLA in hardware. The CLA is completely programmable.

Q: What types of instructions does the CLA support?

For a full list of instructions, please refer to the reference guide.

The following table shows an overview of the types of instructions the CLA supports.

Type | Example(s) | Cycles
Load (conditional) | MMOV32 MRa,mem32{,CONDF} | 1
Store | MMOV32 mem32,MRa | 1
Load With Data Move | MMOVD32 MRa,mem32 | 1
Store/Load MSTF | MMOV32 MSTF,mem32 | 1
Floating Point: Compare, Min, Max; Absolute, Negative Value | MCMPF32 MRa,MRb; MABSF32 MRa,MRb | 1
Conversion: Unsigned Integer To Float; Integer To Float; Float To Integer; etc. | MUI16TOF32 MRa,mem16; MI32TOF32 MRa,mem32; MF32TOI32 MRa,MRb | 1
Floating Point Math: Add, Sub, Multiply; 1/X Estimate; 1/sqrt(x) Estimate | MMPYF32 MRa,MRb,MRc; MEINVF32 MRa,MRb; MEISQRTF32 MRa,MRb | 1
Integer Load/Store | MMOV16 MRa,mem16 | 1
Load/Store Auxiliary Register | MMOV16 MAR,mem16 | 1
Branch/Call/Return | MBCNDD 16bitdest {,CNDF} | 1-7
Integer Operations: AND, OR, XOR; Add, Sub; Shift | MAND32 MRa,MRb,MRc; MSUB32 MRa,MRb,MRc; MLSR32 MRa,#SHIFT | 1
Stop Instructions: End of Task; Breakpoint | MSTOP; MDEBUGSTOP | 1
Parallel Instructions: Multiply with parallel add/sub; Math with parallel load/store | MMPYF32 MRa,MRb,MRc || MSUBF32 MRd,MRe,MRf | 1/1
No Operation | MNOP | 1
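
The 1/X entry in the table is an estimate, so a full-precision reciprocal is normally obtained by refining it. The host-side sketch below shows the usual Newton-Raphson refinement, y = y * (2 - y * x). The function rough_reciprocal is a hypothetical stand-in that keeps only 8 mantissa bits. It is not a model of the documented accuracy of MEINVF32, and all names here are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Host-side stand-in for a hardware reciprocal estimate: keep only the top
 * 8 mantissa bits of 1/x. This is an assumption for illustration, not the
 * documented behaviour of MEINVF32. */
float rough_reciprocal(float x)
{
    float r = 1.0f / x;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF8000u;          /* sign + exponent + 8 mantissa bits */
    memcpy(&r, &bits, sizeof bits);
    return r;
}

/* One Newton-Raphson step for y ~ 1/x: y_next = y * (2 - y * x). */
float refine_reciprocal(float y, float x)
{
    return y * (2.0f - y * x);
}

/* Estimate followed by two refinement steps. */
float reciprocal(float x)
{
    float y = rough_reciprocal(x);
    y = refine_reciprocal(y, x);
    y = refine_reciprocal(y, x);
    return y;
}
```

Each refinement step roughly doubles the number of correct bits, which is why two steps are enough to go from a coarse estimate to single precision in this sketch.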

CLA 'C' Compiler

Q: Are signed integers (16-bit) data types supported?

The signed integer data type is supported (as of CGT 6.4.1), but not all operations on it are, because the CLA does not have the assembly instructions needed to operate on that data type efficiently. For example, integer modulus is not supported. Please see the compiler guide for the full list of restrictions.
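
Since integer modulus is not available, a common place it shows up, wrapping a buffer index, has to be written another way. The sketch below shows two possible workarounds. The names and the buffer length are hypothetical, and whether either form compiles efficiently for the CLA should be checked against the compiler guide.

```c
#include <stdint.h>

#define RING_SIZE 16u   /* hypothetical buffer length, must be a power of two */

/* Advance a ring-buffer index with a mask instead of idx % RING_SIZE. */
uint16_t ring_next_masked(uint16_t idx)
{
    return (uint16_t)((idx + 1u) & (RING_SIZE - 1u));
}

/* Advance a ring-buffer index of arbitrary length with compare-and-reset. */
uint16_t ring_next_compare(uint16_t idx, uint16_t length)
{
    uint16_t next = (uint16_t)(idx + 1u);
    return (next >= length) ? 0u : next;
}
```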

Q: How are pointers handled in the 'C' compiler?

Pointers are interpreted differently on the C28x and the CLA. The C28x treats them as 32-bit data types (the address bus is 22 bits wide, so an address only fits in a 32-bit data type), while the CLA has an address bus only 16 bits wide and treats pointers as 16-bit. Assume the following structure is declared in a shared header file (i.e. common to the C28x and the CLA), and that a variable of that type is defined and allocated to a memory section in a .c file:
/********************************************************************
Shared Header File
********************************************************************/
typedef struct{                                                      
  float a;                                                           
  float *b;                                                          
  float *c;                                                          
}foo;                                                                
/********************************************************************
main.c
********************************************************************/
#pragma DATA_SECTION(X,"CpuToCla1MsgRam") //Assign X to section CpuToCla1MsgRam
foo X;                                                               
/********************************************************************
test.cla
********************************************************************/
__interrupt void Cla1Task1 ( void )                                  
{                                                                    
  float f1,f2;                                                       
  f1 = *(X.b);                                                       
  f2 = *(X.c); //Pointer incorrectly dereferenced                    
               //Tries to access location 0x1503 instead             
               //of 0x1504                                           
}

Assume that the C28 compiler will allocate space for X at the top of the section CpuToCla1MsgRam as follows:

Element Address
X.a 0x1500
X.b 0x1502
X.c 0x1504

The CLA compiler will interpret this structure differently:

Element Address
X.a 0x1500
X.b 0x1502
X.c 0x1503

The CLA compiler treats pointers b and c as 16 bits wide and therefore incorrectly dereferences pointer c. The solution is to declare a new pointer type as follows:

/********************************************************************
Shared Header File
********************************************************************/
typedef union{
  float *ptr; //Aligned to lower 16-bits
  Uint32 pad; //32-bits
}CLA_FPTR;
 
typedef struct{
  float a;
  CLA_FPTR b;
  CLA_FPTR c;
}foo;
 
/********************************************************************
main.c
********************************************************************/
#pragma DATA_SECTION(X,"CpuToCla1MsgRam") //Assign X to section CpuToCla1MsgRam
foo X;
/********************************************************************
test.cla
********************************************************************/
__interrupt void Cla1Task1 ( void )
{
  float f1,f2;
  f1 = *(X.b.ptr);
  f2 = *(X.c.ptr); //Correct Access
}

The new type CLA_FPTR is a union of a 32-bit integer and a pointer to a float. The CLA compiler sizes the union to the larger of the two elements (the 32-bit integer) and therefore aligns the pointer to the lower 16 bits. Now both pointers b and c occupy 32-bit memory spaces, and any instruction that dereferences pointer c will access the correct address, 0x1504.
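
The address arithmetic behind the two tables can be checked with a small host-side model. This is a simplified sketch, not the compilers' actual allocation algorithm: every member is described by a size and an alignment in 16-bit words, and the names (member_layout, member_address and the three arrays) are hypothetical.

```c
#include <stdint.h>

/* Size and alignment of one struct member, in 16-bit words. */
typedef struct {
    uint16_t size_words;
    uint16_t align_words;
} member_layout;

/* Address of member `index` when members are laid out in order from `base`,
 * each rounded up to its own alignment. A simplified model only. */
uint16_t member_address(uint16_t base, const member_layout *members, uint16_t index)
{
    uint16_t addr = base;
    uint16_t i;
    for (i = 0; i <= index; i++) {
        uint16_t a = members[i].align_words;
        addr = (uint16_t)(((addr + a - 1u) / a) * a);  /* round up to alignment */
        if (i < index) {
            addr = (uint16_t)(addr + members[i].size_words);
        }
    }
    return addr;
}

/* foo as each side sees it: float a, then the two pointer members b and c. */
const member_layout foo_c28x[3]      = { {2, 2}, {2, 2}, {2, 2} }; /* 32-bit pointers */
const member_layout foo_cla_raw[3]   = { {2, 2}, {1, 1}, {1, 1} }; /* 16-bit pointers */
const member_layout foo_cla_union[3] = { {2, 2}, {2, 2}, {2, 2} }; /* CLA_FPTR union  */
```

With a base of 0x1500, member 2 (c) lands at 0x1504 for foo_c28x, at 0x1503 for foo_cla_raw, and at 0x1504 again for foo_cla_union, matching the addresses given above.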


CLA Compared to C28x+FPU (C28x plus floating-point unit)

Q: Are the basic CLA instructions also available for the 28x+FPU devices?

To make sure we are on the same page, let's define the following instruction sets:
C28x Instruction Set
This is the original fixed-point instruction set.
C28x+FPU Instruction Set
This is the C28x Instruction Set plus additional instructions to support native single-precision (32-bit) floating-point operations. While the additional instructions are mostly to support single-precision floating-point math, there are some other useful instructions like RPTB (repeat block) included. Since they are part of the superset, and only available on devices with the FPU, we still refer to them as part of the FPU instructions.
CLA Instruction Set
The CLA instruction set is largely a subset of the FPU instructions. A few FPU instructions are not supported on the CLA; for example, the repeat block is not supported. The CLA also has a few instructions that the FPU does not have; for example, the CLA has some native integer math instructions as well as a native branch/call/return.

Q: If the CLA instruction set is a subset of the FPU instruction set, can I assume benchmarks for floating-point div, sin, cos, etc are the same on both?

The CLA instructions are indeed a subset of the C28x+FPU and for the math instructions there are equivalents on each, but there are still differences that impact benchmarks. For example:
  • Cycle differences in multiply and conversion (see next question) as well as branches
  • Resource differences (ex: 8 floating-point result registers vs 4)

Q: Is the CLA faster at doing a floating-point multiply than the regular C28x+FPU?

Trick question! :) Consider the following:
  • C28x FPU: multiply or conversions take 2p cycles. This means that they take two cycles to complete, but remember you can put another instruction in that delay slot including another math instruction.
  • CLA: the math instructions and conversions take 1 cycle - no delay slot needed. So if you were not able to use that delay cycle on the FPU to do meaningful work, then you could say the CLA is faster if you are just counting cycles.
  • Devices with the CLA generally run at lower clock frequencies (60 MHz on 2803x and 80 MHz on 2806x) than the devices with the FPU (80-300 MHz).
So it depends on the FPU delay slot usage and the frequency of the device.
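
The trade-off can be put into numbers with a one-line conversion from cycles to time. The helper below is a hypothetical sketch, and the figures that follow use only the cycle counts and clock frequencies quoted in this FAQ.

```c
/* Time in nanoseconds for `cycles` clock cycles at `mhz` megahertz. */
float cycles_to_ns(float cycles, float mhz)
{
    return cycles * 1000.0f / mhz;
}
```

A single-cycle multiply on a 60 MHz CLA takes cycles_to_ns(1, 60), about 16.7 ns. A 2p multiply on a 150 MHz C28x+FPU takes cycles_to_ns(2, 150), about 13.3 ns if the delay slot is wasted, and effectively cycles_to_ns(1, 150), about 6.7 ns, if the delay slot does useful work.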

Q: What are the main differences between the CLA and the C28x+FPU?

The biggest thing to keep in mind is that the CLA is independent from the main CPU, whereas the FPU is a superset on top of the C28x fixed-point CPU. The following table shows other differences between the two:


Feature | 2803x CLA | C28x+FPU
Execution | Independent from the main CPU; executes floating-point in parallel with the main CPU | Part of the main CPU; FPU instructions do not execute in parallel with fixed-point
Floating-Point Result Registers | 4 (MR0 - MR3) | 8 (R0H - R7H)
Auxiliary Registers | 2, 16-bit (MAR0, MAR1); can access all of CLA data | 8, 32-bit (XAR0 - XAR7); shared with fixed-point instructions
Pipeline | 8-stage pipeline; completely independent from main CPU | Fetch and decode stages shared with fixed-point instructions; cannot execute in parallel with fixed-point
Single Step | Moves pipeline ahead 1 cycle | Completely flushes the pipeline
Addressing Modes | Uses only 2 addressing modes: direct and indirect with post-increment; no data page pointer | Uses all C28x addressing modes
Interrupts Serviced | 2803x: ADC, ePWM and CPU Timer 0 interrupts; 2806x: ADC, ePWM, CPU Timer 0, eCAP and eQEP | All available interrupts
Nesting Interrupts | Not supported; no stack pointer | Supported with stack pointer
Instruction Set | Independent instruction set; subset of FPU instructions; similar mnemonics to C28x+FPU but with leading 'M' (ex: MMPYF32 MR0, MR1, MR2) | Floating-point instructions are in addition (superset) to C28x fixed-point instructions
Repeated Instructions | No single repeat or repeat block | Repeat MACF32 & repeat block (RPTB)
Communication with main CPU | Through message RAMs and interrupts; main CPU can read CLA execution registers but not write to them | One CPU, but you can copy information between fixed and float registers (for example, copy R0H to ACC or copy STF flags to ST0)
Math and Conversion | Single cycle | 2p cycles (2 pipelined cycles)
Integer Operations | Native instructions for AND, OR, XOR, ADD, SUB, shifts etc. | Uses fixed-point instructions
Flow Control | Native branch/call/return; conditional, delayed | Uses fixed-point instructions; requires copy of float flags to fixed-point ST0
Branch/Call/Return | Conditional & delayed; 3 instructions after are always executed; performance can be improved by using delay cycles | Uses fixed-point flow control; branches are not delayed; instructions after are only executed if the branch is not taken
Memory Access | CLA program, data and message RAMs only; specific memory for CLA program; specific memory for CLA data | All memory on the device; program/data memory allocation is up to the user
Register Access | 2803x: ePWM+HRPWM, Comparator, ADC result; 2806x: ePWM+HRPWM, eQEP, eCAP, Comparator, ADC result | All registers on the device
Programming | CLA Assembly or CLA C Compiler (requires C28x codegen 6.1.0 or later) | C/C++ or Assembly
Operating Frequency (device dependent) | 2803x: Flash based devices up to 60MHz; 2806x: Flash based devices up to 80MHz | Flash based devices up to 150MHz (2833x); RAM only devices up to 300MHz (2834x)