Control Law Accelerator (C2000 CLA) FAQ

From Texas Instruments Embedded Processors Wiki


This is a list of frequently asked questions for the C2000 Control Law Accelerator (CLA) found on 2803x MCU devices.


Other Resources

Forum Discussion

Documentation and Webpages

Training Videos

Frequently Asked Questions

Workshop Material

Section 9 of the Piccolo Multi-day workshop is dedicated to the Control Law Accelerator

Software

controlSUITE
The latest software for the CLA is included as part of controlSUITE (www.ti.com/controlSUITE). This includes:
  • DPLib CLA release
  • Example system in the HVPFCKit.
  • The CLAmath Macro Library: sin, cos, div, sqrt, 1/sqrt, atan, atan2
  • 2803x C/C++ Header Files and Peripheral Examples: ADC -> FIR example, both SARAM and flash based
Legacy Downloads
The following legacy downloads include projects ready for use on CCS V3.3

Architecture, Configuration

Q: What is the CLA?

The CLA is a 32-bit floating-point math accelerator that runs in parallel with the main CPU.

Q: Is the CLA independent from the main CPU?

Yes. Once the CLA is configured by the main CPU it can execute algorithms independently of the main CPU. The CLA has its own bus structure, register set, pipeline and processing unit. In addition the CLA can access ePWM, comparator and ADC result registers directly. This makes it ideal for handling time-critical control loops but it can also be used for filtering or math algorithms.

Q: Is the CLA interrupt driven?

Yes, the CLA responds to the ADC, ePWM and CPU timer 0 interrupts. Also see Tasks and Interrupts.

Q: How fast is the CLA interrupt response?

The CLA does not service non-time-critical interrupts (e.g. communication ports) and there is no nesting of interrupts. In addition, the CLA receives interrupts directly, not through the peripheral interrupt expansion block (PIE). Because of this, the CLA has a very low interrupt response delay. At the 7th cycle after an interrupt, the first instruction will be in the decode 2 (D2) phase of the pipeline. In addition, the CLA can read the ADC result register as soon as it is available. Also see Tasks and Interrupts and Accessing Peripherals.

Q: Does the CLA have registers?

Yes, the CLA has its own independent set of registers. The CLA registers can be thought of in two groups:
  • Configuration Registers: Some of these registers are used by the main C28x CPU to configure the CLA. Other registers give the main CPU status information. For example, which interrupts have been flagged or which task is currently running.
  • Execution Registers: These include four floating-point result registers, two auxiliary registers, a status register and a program counter. These registers can be read by the main C28x CPU but not written to.

Q: Does the CLA have an Accumulator?

  • There is no single register designated as an accumulator - results of operations go into the result registers (MR0-MR3).

Q: What frequency does the CLA run at?

The CLA on 2803x devices runs at the same speed as the CPU (SYSCLKOUT).

Q: At reset what state is the CLA in?

The clock to the CLA is disabled and all the CLA registers are cleared. The CLA will not start servicing interrupts until it has been configured to do so by the main CPU.

Q: How is the CLA configured?

The CLA is configured by the main CPU just as any other module or peripheral.

Development Tools, Debugging, etc.

Q: I would like to learn what code development tools are available and how I can debug code for the CLA.

Please refer to the Control Law Accelerator (C2000 CLA) Debug FAQ

Tasks and Interrupts

Q: What is a 'Task'?

A CLA task is an interrupt response routine executed by the CLA.

Q: How many interrupts are supported?

The 2803x CLA supports 8 interrupts.

Q: Which interrupts can start a task?

Peripherals: Each task has specific peripheral interrupts that can trigger it. The main CPU selects which one is used in the MPISRCSEL1 register.
One thing that is very important to understand is that the trigger source is just the mechanism by which the task is started. The trigger source does not limit what the task can do. For example, task 1 can read any/multiple ADC RESULT register(s) and modify any ePWM1, ePWM2, ePWM3...ePWM7 register even though it was started by EPWM1_INT.
On 2803x the interrupt triggers are assigned as follows:
  • Interrupt 1 = Task 1 = ADCINT1 or EPWM1_INT or software only
  • Interrupt 2 = Task 2 = ADCINT2 or EPWM2_INT or software only
  • Interrupt 3 = Task 3 = ADCINT3 or EPWM3_INT or software only
  • Interrupt 4 = Task 4 = ADCINT4 or EPWM4_INT or software only
  • Interrupt 5 = Task 5 = ADCINT5 or EPWM5_INT or software only
  • Interrupt 6 = Task 6 = ADCINT6 or EPWM6_INT or software only
  • Interrupt 7 = Task 7 = ADCINT7 or EPWM7_INT or software only
  • Interrupt 8 = Task 8 = ADCINT8 or CPU Timer 0 or software only

Q: Can the main CPU start a task through software?

Yes! The main CPU can flag an interrupt at any time by using the IACK #16bit instruction. For example, IACK #0x0003 would flag interrupt 1 and interrupt 2. This has the same effect as setting the corresponding bits in the force register (MIFRC).

Q: I'm trying to force tasks using the IACK instruction, but it isn't working. What could be wrong?

  • Make sure you've enabled this feature in the MCTL register.
  • Make sure the interrupt is enabled in the MIER register.
  • Make sure you are using the correct argument to IACK. For example IACK #0x0003 would flag interrupt 1 (bit 0) and interrupt 2 (bit 1).
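The bit-to-interrupt mapping used for the IACK argument can be sketched in plain C. This is only an illustration that runs on any host; the helper name cla_task_mask is hypothetical and not part of any TI header. On the device, the resulting constant would be the operand of IACK or the value written to MIFRC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bit (task - 1) flags CLA interrupt/task `task`.
 * Valid task numbers on the 2803x CLA are 1..8. */
uint16_t cla_task_mask(unsigned task)
{
    assert(task >= 1u && task <= 8u);
    return (uint16_t)(1u << (task - 1u));
}
```

Combining cla_task_mask(1) and cla_task_mask(2) with a bitwise OR gives 0x0003, the value used in the example above.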

Q: If two interrupts come in at the same time, which one is executed first?

The highest priority task that is both flagged (MIFR register) and enabled (MIER register) is the one to get executed. Interrupt 1/Task 1 has the highest priority and Interrupt 8/Task 8 the lowest priority.
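The selection rule can be modelled in a few lines of C. This is a host-side sketch of the rule as described, not a description of the hardware implementation; the function name cla_next_task is hypothetical, and mifr and mier are plain values standing in for the contents of the MIFR and MIER registers.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the task number (1..8) that would run next, or 0 if no task is
 * both flagged and enabled. Lower task number means higher priority. */
int cla_next_task(uint16_t mifr, uint16_t mier)
{
    uint16_t pending = (uint16_t)(mifr & mier & 0x00FFu);
    int task;

    for (task = 1; task <= 8; ++task) {
        if (pending & (1u << (task - 1))) {
            return task;
        }
    }
    return 0;
}
```

For example, with interrupts 2 and 3 flagged and all tasks enabled, task 2 is selected; if only task 3 is enabled, task 3 is selected and the flag for interrupt 2 simply stays pending.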

Q: Can you nest CLA interrupts?

No. A CLA task executes until it is complete. Once a task completes, the highest-priority interrupt that is both flagged and enabled automatically begins.

Q: Can the CLA interrupt the main CPU?

The CLA will send an interrupt to the PIE (peripheral interrupt expansion block) to let the main CPU know a task has completed. Each task has an associated vector in the PIE. This interrupt is automatically fired when the associated task completes. For example when task 1 completes, CLA1_INT1 in the PIE will be flagged.
There are also dedicated interrupts in the PIE for floating-point overflow and underflow conditions.

Q: Can the main CPU terminate a task?

Yes. If an interrupt has been flagged, but the task has not yet run, the main CPU can clear the flag using the MICLR register.
If the task is already running then a soft reset (in MCTL) will terminate the task and clear the MIER register. If you want to clear all of the CLA registers you can use the hard reset option in the MCTL register.

Q: What is the starting address for each task? Is the starting address fixed?

The start address is configurable. Each task has an associated interrupt vector (MVECT1 to MVECT8). This vector holds the starting address (as an offset from the first program location) of the task.
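As a rough illustration of "offset from the first program location", the vector value is the difference between the task's address and the start of CLA program memory. The sketch below is host-side C; the helper name and the addresses used in the usage note are hypothetical. In a real project both addresses would typically come from linker symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value to place in MVECTn for a task located at
 * `task_addr`, given the first CLA program location `prog_start`.
 * Both are word addresses. */
uint16_t cla_vector_offset(uint32_t prog_start, uint32_t task_addr)
{
    assert(task_addr >= prog_start);
    return (uint16_t)(task_addr - prog_start);
}
```

With a made-up program start of 0x9000 and a task at 0x9040, the offset is 0x0040.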

Q: Is there a size limit for a task?

No limit other than all the instructions for all the tasks need to fit within the CLA program memory of the device. All CLA instructions are 32-bits, so within a 4k x 16 program space you can have ~2k CLA instructions.
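The arithmetic behind the instruction count can be written out as a small host-side C sketch. The function names are hypothetical; the sizes are the ones given above (4k x 16 program memory, 32-bit instructions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CLA_PROG_WORDS      4096u /* 4k x 16 program memory */
#define CLA_WORDS_PER_INSTR 2u    /* each CLA instruction is 32 bits */

/* Maximum number of CLA instructions that fit in program memory. */
uint32_t cla_max_instructions(void)
{
    return CLA_PROG_WORDS / CLA_WORDS_PER_INSTR;
}

/* True if the tasks, given as instruction counts, fit together. */
bool cla_tasks_fit(const uint32_t *instr_counts, size_t n)
{
    uint32_t total = 0;
    size_t i;

    assert(instr_counts != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        total += instr_counts[i];
    }
    return total <= cla_max_instructions();
}
```

The count includes every instruction placed in program memory, for example the MSTOP at the end of each task.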

Q: How do I indicate the end of a task?

After a task begins, the CLA will execute instructions until it encounters an "MSTOP" instruction. MSTOP indicates the end of the task.

Q: Can the CLA itself flag another task?

The CLA can not write to its own configuration registers so it can not start a task by writing to the force register. It can, however, write to the ePWM registers so technically it could force an interrupt from one of the ePWM modules.
The main CPU can take an interrupt when the task is complete. You could within this interrupt start another task using the IACK instruction.

Q: If the CLA is configured to respond to ADCINT1, can the CPU also respond?

Yes. The interrupts are sent to both the CLA and the PIE so either or both can respond.

Accessing Peripherals

Q: Which peripherals can the CLA directly access?

On 2803x devices the CLA has direct access to the ADC result, ePWM+HRPWM, and comparator registers.

Q: Some of those registers you mentioned are EALLOW protected from rogue writes by the main CPU. Does the CLA have this protection as well?

There is a bit called MEALLOW in the CLA status register that enables/disables the protection for CLA writes. This is set and cleared by the MEALLOW/MEDIS CLA instructions. This protection is independent of the main CPU's EALLOW bit. That is, the main CPU can enable writes via EALLOW, but the register will still be protected from CLA writes via MEALLOW.

Q: How can the CLA read the ADC result register "Just-in-time"?

The ADC on 2803x can be configured to assert an interrupt after the sample window time. If the CLA is configured to respond to the ADC interrupt, then the task will begin while the conversion is in progress. The 8th instruction in the task will be just in time to read the ADC result register when it updates.

Q: If the CLA takes an ADC interrupt, can it then clear the ADC's interrupt flag?

No. The CLA can not access the ADC configuration registers so it can not clear the ADC interrupt flag. Here are three options for handling this:
  • Option 1: Place the ADC in continuous mode. In this mode the next conversion will start, when triggered, even if the flag is still set.
  • Option 2: Service the ADC interrupt with the main CPU as well as the CLA and have the main CPU clear the flag.
  • Option 3: Have the main CPU service the end of task interrupt from the CLA and clear the flag.

Q: You mentioned the CLA can access the ePWM registers. Is it all of the ePWM modules on the device?

Any task can access any of the ePWM modules. There are no restrictions on this.

Q: How can I interface to GPIO control from the CLA?

On 2803x devices the CLA does not have access to the GPIO control registers. GPIO control is typically handled by the main CPU. One thing that you could do is toggle an ePWM pin by accessing the ePWM registers with the CLA. If you want to toggle a GPIO pin at the end of the task, this could be done by the main CPU when it services the CLA interrupt.

Q: If the CLA is using the ePWM or ADC Result registers, does it mean the main CPU can not?

No. Both the CLA and the main CPU can access the registers. The arbitration scheme for these registers can be found in the CLA Reference Guide. Keep in mind if the main CPU performs a read-modify-write operation on a register and between the read and the write the CLA modifies the same register, the changes made by the CLA can be lost. In general it is best to not have both processors writing to registers.

Q: I want CLA Task 1 to access ePWM1 and ePWM2 registers, but Task 1 can only be triggered by EPWM1_INT.

The interrupt is only the method by which a task is started. It does not limit the resources the CLA can access during that task. Any CLA task can access all of the ADC RESULT, comparator and ePWM registers. For example, assume ADCINT1 triggers task 1. This single task could then read ADC RESULT0, perform a control algorithm, read ADC RESULT1, perform another control algorithm, etc.

Memory Access

Q: Does the CLA have access to all of the memory on the device?

There are specific blocks on the device that the CLA can use.
  • CLA Program Memory:
On Piccolo (2803x) this is a 4k x 16 block, single cycle (no wait states). This means it can hold 2048 CLA instructions (all CLA instructions are 32-bits). At reset, this block is mapped to the main CPU memory space and treated by the CPU like any other memory block. While mapped to CPU space, the main CPU can initialize the memory with the CLA program code. Once the memory is initialized the CPU maps it to the CLA program space.
  • CLA Data Memory
There are two CLA data memory blocks on the device. On Piccolo each of these blocks is 1k x 16, single-cycle.
At reset, both blocks are mapped to the main CPU memory space and treated by the CPU like any other memory block. While mapped to CPU space, the main CPU can initialize the memory with data tables and coefficients for the CLA to use. Once the memory is initialized with CLA data the main CPU maps it to the CLA space. (Each block can be independently mapped).
  • Shared message RAMs
There are two small memory blocks for data sharing and communication between the CLA and the main CPU. On Piccolo these are 128 x 16 words in size.
  • CLA to CPU Message RAM: CLA can read/write, main CPU can only read
  • CPU to CLA Message RAM: CPU can read/write, CLA can only read

Q: Can both the CLA and the CPU access the same message RAM at the same time?

The accesses will be arbitrated as described below.
  • CLA to CPU Message RAM: Priority of accesses are (highest priority first):
  1. CLA write
  2. CPU debug write
  3. CPU data read, program read, CPU debug read
  4. CLA data read
  • CPU to CLA Message RAM: Priority of accesses are (highest priority first):
  1. CLA read
  2. CPU data write, program write, CPU debug write
  3. CPU data read, CPU debug read
  4. CPU program read

Q: If I understand the 2803x data manual right, CLA code can only be run from L3 (CLA program RAM). It can be loaded from flash and run in L3.

Yes. The CLA can only execute from L3 on the 2803x. During debug you can load the RAM directly using the debugger. In a stand-alone system the main CPU needs to initialize the RAM with the CLA code. Most likely the code will be stored in flash and copied to L3. For an example, refer to the CLA FIR in flash project in 2803x C/C++ Header Files and Peripheral Examples.

Q: On 2803x can I use a block other than L3 for CLA code?

No - on the 2803x devices, only L3 can be used by the CLA for program memory

Q: On 2803x the L3 memory block is designated as CLA Program. Can I allocate a portion for the CLA and the rest for the CPU?

No - this memory block either belongs to the main CPU (default at reset) or the CLA. That is, either one or the other can access it. If the block is assigned to the CLA and the main CPU tries to fetch or read data, it will receive a 0x0000.

Q: Can the message RAM be used as data memory for the CLA?

The CLA to CPU message RAM (128x16) can be read/written to by the CLA and can be used as a data RAM for the CLA.

Q: How can I initialize variables in the CLA to CPU message RAM?

Since the main CPU can not write to this memory, the CLA will need to initialize these variables. To do this, set up a task to perform the initialization and then trigger the task using the main CPU. The FIR example in the 2803x C/C++ Header Files and Peripheral Examples shows how this is done.

Q: In my application the CLA does not need data memory. Can this memory be used by the main CPU instead?

Yes - if the memory is not needed by the CLA then it can be used by the main CPU just like any other block. Also the two data memory blocks can be independently assigned to the CLA or main CPU so you can have one block belong to the main CPU and the other to the CLA.

Q: In my application the CLA does not need both data RAMs. Can one block be assigned to the CLA and the other block to the main CPU?

Yes.

Q: I want to do a ping-pong scheme where the CLA uses a data RAM and then the main CPU uses it. Can this be done?

Yes - There are a few things you need to make sure of before changing the mapping of the data RAM:
  • After changing the mapping (via MMEMCFG) always wait 2 SYSCLKOUT cycles before accessing the memory.
  • Before changing the memory from CPU to CLA, make sure the CPU is not performing any accesses to the memory.
  • Before changing the memory from CLA to CPU, make sure the CLA is not performing any accesses. One way to check for this is to clear the MIER register, wait a couple of cycles and then check that MIRUN is cleared.
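The CLA-to-CPU hand-over described in the last bullet can be sketched as follows. This is a host-side model under stated assumptions: ClaRegsModel is a stand-in struct, not the TI header layout, and ram_enable_bit is a hypothetical parameter for the MMEMCFG bit of the data RAM being released (take the real bit position from the reference guide).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the CLA registers named in the text (model only). */
typedef struct {
    volatile uint16_t MIER;    /* interrupt enable */
    volatile uint16_t MIRUN;   /* task currently running */
    volatile uint16_t MMEMCFG; /* memory mapping */
} ClaRegsModel;

/* Hand a CLA data RAM back to the main CPU. Returns false, leaving the
 * mapping unchanged, if a task is still running after `max_polls`
 * checks of MIRUN. */
bool cla_release_data_ram(ClaRegsModel *cla, uint16_t ram_enable_bit,
                          unsigned max_polls)
{
    assert(cla != NULL);

    cla->MIER = 0;                 /* no new task can start */
    while (cla->MIRUN != 0) {      /* wait for a running task to end */
        if (max_polls-- == 0) {
            return false;
        }
    }
    cla->MMEMCFG &= (uint16_t)~ram_enable_bit; /* map block to CPU */
    /* On hardware: wait 2 SYSCLKOUT cycles before accessing the RAM. */
    return true;
}
```

The caller is responsible for restoring MIER afterwards. The bounded poll is a design choice for this sketch so that a stuck task is reported instead of hanging the CPU.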

Q: Is CLA memory protected by the Code Security Module (CSM)?

Yes - on 2803x devices, all CLA memory is protected by the CSM. The CLA configuration and result registers are also protected.

Communication Between the CLA and main CPU

Q: How do the main CPU and the CLA communicate with each other?

Communication is handled through the message RAMs and interrupts.
  • The CLA can pass data to the main CPU through the CLA to CPU message RAM
  • The main CPU can pass data to the CLA through the CPU to CLA message RAM
  • The main CPU can flag a CLA interrupt/task in software if desired by using the IACK instruction.
  • The CLA can alert the main CPU that a task has completed through an interrupt to the PIE. There is one interrupt vector per task in the PIE. The main CPU does not have to service interrupts from the CLA if it is not required by the application.

Q: In my code, how can I share variables between the CLA and the main CPU?

Since the CLA and the C28x code are in the same project this is very easy. My suggestion is to:
  • Create a shared header file with common constants and variables. Include this file in both the C28x C and CLA .asm code.
  • Use data section pragma statements and the linker file to place the variables in the appropriate message RAM.
  • Define shared variables in your C code.
  • Initialize variables in the CPU to CLA message RAM with the main CPU.
  • Initialize variables in the CLA to CPU message RAM with a CLA task. This initialization task can be started via the main C28x software.
This method is described in this short video Demo of the framework used in the header files and peripheral examples
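A minimal sketch of the C side of these steps is shown below. It assumes the TI C2000 compiler's DATA_SECTION pragma; the section names CpuToCla1MsgRAM and Cla1ToCpuMsgRAM and the variable names are assumptions and must match whatever the project's linker command file defines. The pragma is guarded so the file also compiles on a host, where the variables are ordinary globals.

```c
#include <assert.h>

#ifdef __TMS320C2000__
/* TI compiler only. Section names are assumptions; they must match
 * the sections defined in the linker command file. */
#pragma DATA_SECTION(cpu_to_cla_gain, "CpuToCla1MsgRAM")
#pragma DATA_SECTION(cla_to_cpu_result, "Cla1ToCpuMsgRAM")
#endif

volatile float cpu_to_cla_gain;   /* written by CPU, read by CLA */
volatile float cla_to_cpu_result; /* written by CLA, read by CPU */

/* Run on the main CPU before the CLA tasks are enabled. Only the
 * CPU-to-CLA variable is written here, because the main CPU cannot
 * write the CLA to CPU message RAM; that variable would be set up
 * by an initialization task on the CLA. */
void shared_vars_cpu_init(float gain)
{
    assert(gain == gain); /* reject NaN */
    cpu_to_cla_gain = gain;
}
```

The same variable names would then be declared in the shared header file so that both the C28x C code and the CLA .asm code refer to them.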

Q: Can I use CLA data RAM as a message RAM?

Yes, but remember that at any given time each of the CLA data RAMs belongs either to the main CPU or to the CLA. Therefore, for the other processor to see the data, you will have to first make sure no accesses are taking place to the RAM and then change the mapping.

Instruction Set, Code Execution

Q: How fast are instructions executed by the CLA?

The 2803x CLA is clocked at the same rate as the main CPU (SYSCLKOUT), which is 60 MHz maximum. All CLA instructions are single cycle. While a discontinuity (branch/call/return) instruction itself is single cycle, the time for a discontinuity to complete depends on how many "delay slots" around the instruction are used. If no delay slots are used, then the discontinuity completes in 7 cycles (worst case) whether taken or not. A typical time would be 4 cycles taken/not taken. In this case slots before the instruction are used, but slots after are not.

Q: In the examples I noticed there are 3 MNOP's after each MSTOP. Why is this done? Is it required?

There is a restriction that an MSTOP must not end up within 3 instructions of a branch. The MNOPs have been added to make sure this requirement is always met even when the program RAM following a task is not initialized. If you know for sure there is not a branch within 3 instructions after the MSTOP, then you can remove the MNOPs.

Q: Does the CLA use floating-point math?

Yes, the CLA supports 32-bit (single-precision) floating-point math operations. These follow the IEEE 754 standard.

Q: Why float and not fixed-point?

Floating-point is easier to code. It is self saturating and thus removes the saturation and scaling burden from code. In addition it does not suffer from overflow/underflow sign inversion issues. Finally algorithms coded in float are more cycle efficient.

Q: Since the main CPU is fixed-point and the CLA is float, won't I have to convert my numbers?

Yes, the CLA makes this easy with instructions that convert data. If you are reading from memory this conversion can be done while the value is read so it is quite efficient. For example, you can read an ADC result register (Unsigned 16-bit) and convert it to 32-bit float as you read it.
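In C terms, the read-and-convert step corresponds to casting an unsigned 16-bit value to float. The sketch below is a host-side illustration only; the function name is hypothetical, the 12-bit range reflects the 2803x ADC resolution, and full_scale (for example 3.3f for volts) is an assumption supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 12-bit ADC result (0..4095), held in an unsigned 16-bit
 * register value, to a float scaled to `full_scale`. */
float adc_result_to_float(uint16_t raw, float full_scale)
{
    assert(raw <= 4095u);
    return (float)raw * (full_scale / 4096.0f);
}
```

On the CLA the conversion itself is done as part of the load by an instruction such as MUI16TOF32 MRa,mem16, so only the scaling multiply remains as a separate step.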

Q: Does the CLA support the repeat block (RPTB) instruction that is on the C28x+FPU?

No, but you can use a loop (branch or call) to execute a block of instructions multiple times. There are also no single repeat instructions for the CLA. (RPT ||...)

Q: Does the CLA support branches?

Yes, the CLA has its own branch (MBCNDD), call (MCCNDD) and return (MRCNDD). All of these are conditional and delayed.

Q: Are there hardware sub-modules inside the CLA that each implement a particular algorithm, so that the user only needs to set registers, or is the CLA purely programmable in floating-point?

There are no algorithms built into the CLA in hardware. The CLA is completely programmable.

Q: What types of instructions does the CLA support?

For a full list of instructions, please refer to the TMS320x2803x Piccolo Control Law Accelerator (CLA) Reference Guide (SPRUGE6). The following table shows an overview of the types of instructions the CLA supports.
Type                               Example(s)                   Cycles

Load (conditional)                 MMOV32 MRa,mem32{,CONDF}     1

Store                              MMOV32 mem32,MRa             1

Load With Data Move                MMOVD32 MRa,mem32            1

Store/Load MSTF                    MMOV32 MSTF,mem32            1

Floating Point                     MCMPF32 MRa,MRb              1
  Compare, Min, Max                MABSF32 MRa,MRb
  Absolute, Negative Value

Conversion                         MUI16TOF32 MRa,mem16         1
  Unsigned Integer To Float        MI32TOF32 MRa,mem32
  Integer To Float                 MF32TOI32 MRa,MRb
  Float To Integer
  etc.

Floating Point Math                MMPYF32 MRa,MRb,MRc          1
  Add, Sub, Multiply               MEINVF32 MRa,MRb
  1/X Estimate                     MEISQRTF32 MRa,MRb
  1/sqrt(x) Estimate

Integer Load/Store                 MMOV16 MRa,mem16             1

Load/Store Auxiliary Register      MMOV16 MAR,mem16             1

Branch/Call/Return                 MBCNDD 16bitdest {,CNDF}     1-7

Integer Operations                 MAND32 MRa,MRb,MRc           1
  AND, OR, XOR                     MSUB32 MRa,MRb,MRc
  Add, Sub                         MLSR32 MRa,#SHIFT
  Shift

Stop Instructions                  MSTOP                        1
  End of Task                      MDEBUGSTOP
  Breakpoint

Parallel Instructions              MMPYF32 MRa,MRb,MRc          1/1
  Multiply with parallel add/sub   || MSUBF32 MRd,MRe,MRf
  Math with parallel load/store

No Operation                       MNOP                         1



CLA Compared to C28x+FPU (C28x plus floating-point unit)

Q: Are the basic CLA instructions also available for the 28x+FPU devices?

To make sure we are on the same page - within the C28x family there are currently three instruction sets:
  • C28x Instruction Set:
This is the original fixed-point instruction set.
  • C28x+FPU Instruction Set:
This is the C28x Instruction Set plus additional instructions to support native single-precision (32-bit) floating-point operations. While the additional instructions are mostly to support single-precision floating-point math, there are some other useful instructions like RPTB (repeat block) included. Since they are part of the superset, and only available on devices with the FPU, we still refer to them as part of the FPU instructions.
  • CLA Instruction Set:
The CLA instruction set is a subset of the FPU instructions. A few FPU instructions are not supported on CLA - for example the repeat block is not supported. The CLA also has a few instructions that the FPU does not have. For example: the CLA has some native integer math instructions as well as a native branch/call/return.

Q: If the CLA instruction set is a subset of the FPU instruction set, can I assume benchmarks for floating-point div, sin, cos, etc are the same on both?

The CLA instructions are indeed a subset of the C28x+FPU and for the math instructions there are equivalents on each, but there are still differences that impact benchmarks. For example:
  • Cycle differences in multiply and conversion (see next question) as well as branches
  • Resource differences (ex: 8 floating-point result registers vs 4)

Q: Is the CLA faster at doing a floating-point multiply than the regular C28x+FPU?

Trick question! :) Consider the following:
  • C28x FPU: multiply or conversions take 2p cycles. This means that they take two cycles to complete, but remember you can put another instruction in that delay slot including another math instruction.
  • CLA: the math instructions and conversions take 1 cycle - no delay slot needed. So if you were not able to use that delay cycle on the FPU to do meaningful work, then you could say the CLA is faster if you are just counting cycles.
  • Devices with the CLA run much slower (60MHz) than the devices with the FPU unit (150-300MHz).
So it depends on the FPU delay slot usage and the frequency of the device.
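The trade-off can be put into numbers with a small host-side C sketch. The helper name is hypothetical; the clock rates and cycle counts in the usage note are the ones quoted in the bullets above.

```c
#include <assert.h>

/* Time in nanoseconds taken by `cycles` clock cycles at `clock_mhz`. */
double cycles_to_ns(unsigned cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return (double)cycles * 1000.0 / clock_mhz;
}
```

A 1-cycle CLA multiply at 60 MHz takes about 16.7 ns. A C28x+FPU multiply at 150 MHz takes about 13.3 ns if the delay slot is wasted (2 cycles) and effectively about 6.7 ns if the delay slot does useful work (1 cycle). In raw cycle counts the CLA wins when the delay slot cannot be filled, but in elapsed time the higher clock rate still favors the FPU device in this comparison.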

Q: What are the main differences between the CLA and the C28x+FPU?

The biggest thing to keep in mind is that the CLA is independent from the main CPU, whereas the FPU is a superset on top of the C28x fixed-point CPU. The following table shows other differences between the two:


Each entry below lists the 2803x CLA first and the C28x+FPU second.

Execution
  • 2803x CLA: Independent from the main CPU. Executes floating-point in parallel with the main CPU.
  • C28x+FPU: Part of the main CPU. FPU instructions do not execute in parallel with fixed-point.
Floating-Point Result Registers
  • 2803x CLA: 4 (MR0 - MR3)
  • C28x+FPU: 8 (R0H - R7H)
Auxiliary Registers
  • 2803x CLA: 2, 16-bits, (MAR0, MAR1). Can access all of CLA data.
  • C28x+FPU: 8, 32-bits, (XAR0 - XAR7). Shared with fixed-point instructions.
Pipeline
  • 2803x CLA: 8-stage pipeline. Completely independent from main CPU.
  • C28x+FPU: Fetch and decode stages shared with fixed-point instructions. Can not execute in parallel with fixed-point.
Single Step
  • 2803x CLA: Moves pipeline ahead 1 cycle.
  • C28x+FPU: Completely flushes the pipeline.
Addressing Modes
  • 2803x CLA: Uses only 2 addressing modes: direct & indirect with post increment. No data page pointer.
  • C28x+FPU: Uses all C28x addressing modes.
Interrupts Serviced
  • 2803x CLA: ADC, ePWM and CPU Timer 0 interrupts.
  • C28x+FPU: All available interrupts.
Nesting Interrupts
  • 2803x CLA: Not supported. No stack pointer.
  • C28x+FPU: Supported with stack pointer.
Instruction Set
  • 2803x CLA: Independent instruction set. Subset of FPU instructions. Similar mnemonics to C28x+FPU but with leading 'M', ex: MMPYF32 MR0, MR1, MR2.
  • C28x+FPU: Floating-point instructions are in addition (superset) to C28x fixed-point instructions.
Repeated Instructions
  • 2803x CLA: No single repeat or repeat block.
  • C28x+FPU: Repeat MACF32 & repeat block (RPTB).
Communication with main CPU
  • 2803x CLA: Through message RAMs and interrupts. Main CPU can read CLA execution registers but not write to them.
  • C28x+FPU: One CPU, but you can copy information between fixed and float registers. For example, copy R0H to ACC or copy STF flags to ST0.
Math and Conversion
  • 2803x CLA: Single cycle.
  • C28x+FPU: 2p cycles (2 pipelined cycles).
Integer Operations
  • 2803x CLA: Native instructions for AND, OR, XOR, ADD, SUB, shifts etc.
  • C28x+FPU: Uses fixed-point instructions.
Flow Control
  • 2803x CLA: Native branch/call/return, conditional delayed.
  • C28x+FPU: Uses fixed-point instructions. Requires copy of float flags to fixed-point ST0.
Branch/Call/Return
  • 2803x CLA: Conditional & delayed. 3 instructions after are always executed. Performance can be improved by using delay cycles.
  • C28x+FPU: Uses fixed-point flow control. Branches are not delayed. Instructions after are only executed if the branch is not taken.
Memory Access
  • 2803x CLA: CLA program, data and message RAMs only. Specific memory for CLA program. Specific memory for CLA data.
  • C28x+FPU: All memory on the device. Program/data memory allocation is up to the user.
Register Access
  • 2803x CLA: ePWM+HRPWM, Comparator, ADC result.
  • C28x+FPU: All registers on the device.
Programming
  • 2803x CLA: CLA Assembly.
  • C28x+FPU: C/C++ or Assembly.
Operating Frequency (device dependent)
  • 2803x CLA: Flash based devices up to 60MHz.
  • C28x+FPU: Flash based devices up to 150MHz. RAM only devices up to 300MHz.

For technical support please post your questions at http://e2e.ti.com.