Control Law Accelerator (C2000 CLA) FAQ
The CLA FAQs are being migrated to E2E FAQs. Please refer to https://e2e.ti.com/support/microcontrollers/c2000/f/171/t/786227
As the FAQs are migrated they will be deleted from the wiki.
- 1 Architecture, Configuration
- 1.1 Q: How fast is the CLA interrupt response?
- 1.2 Q: Some of those registers you mentioned are EALLOW protected from rogue writes by the main CPU. Does the CLA have this protection as well?
- 1.3 Q: How can the CLA read the ADC result register "Just-in-time"?
- 1.4 Q: How can I interface to GPIO control from the CLA?
- 2 Memory Access
- 3 Instruction Set, Code Execution
- 3.1 Q: How fast are instructions executed by the CLA?
- 3.2 Q: In the examples I noticed there are 3 MNOP's after each MSTOP. Why is this done? Is it required?
- 3.3 Q: Does the CLA use floating-point math?
- 3.4 Q: Why float and not fixed-point?
- 3.5 Q: Since the main CPU is fixed-point and the CLA is float, won't I have to convert my numbers?
- 3.6 Q: Does the CLA support the repeat block (RPTB) instruction that is on the C28x+FPU?
- 3.7 Q: Does the CLA support branches?
- 3.8 Q: Are there any sub-hardware modules inside CLA where each component corresponds to some algorithm so that the user only need to set registers or is CLA just purely programmable in floating-point?
- 4 CLA 'C' Compiler
- 5 CLA Compared to C28x+FPU (C28x plus floating-point unit)
- 5.1 Q: Are the basic CLA instructions also available for the 28x+FPU devices?
- 5.2 Q: If the CLA instruction set is a subset of the FPU instruction set, can I assume benchmarks for floating-point div, sin, cos, etc are the same on both?
- 5.3 Q: Is the CLA faster at doing a floating-point multiply than the regular C28x+FPU?
- 5.4 Q: What are the main differences between the CLA and the C28x+FPU?
Architecture, Configuration
Q: How fast is the CLA interrupt response?
- The CLA does not support nesting of interrupts. In addition, the CLA receives interrupts directly, not through the peripheral interrupt expansion block (PIE). Because of this, the CLA has a very low interrupt response delay. On the 7th cycle after an interrupt, the first instruction is in the decode 2 (D2) phase of the pipeline: from the time an interrupt (trigger) is received, it takes 4 cycles for the CLA to begin fetching the first instruction and another 3 cycles for that instruction to move through the pipeline to the D2 phase. The CLA can read the ADC result register in the same cycle that the ADC completes a sample conversion. The number of cycles for a complete sample/conversion depends on the ADC type; please refer to your device-specific documentation for more information. Also see Tasks and Interrupts and Accessing Peripherals.
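The cycle counts above can be turned into a time figure with a few lines of C. This is only a sketch of the arithmetic: the function names are made up for illustration, and the clock frequency is whatever SYSCLKOUT your device actually runs at.

```c
#include <assert.h>

/* Cycle counts quoted in the answer above. */
enum {
    CLA_CYCLES_TO_FIRST_FETCH = 4, /* trigger received -> first fetch begins */
    CLA_CYCLES_FETCH_TO_D2    = 3  /* first fetch -> decode 2 (D2) phase     */
};

/* Cycles from the trigger until the first instruction is in D2. */
int cla_cycles_to_d2(void)
{
    return CLA_CYCLES_TO_FIRST_FETCH + CLA_CYCLES_FETCH_TO_D2;
}

/* Convert a cycle count to nanoseconds for a given SYSCLKOUT (in Hz). */
double cla_cycles_to_ns(int cycles, double sysclk_hz)
{
    assert(cycles >= 0 && sysclk_hz > 0.0);
    return (double)cycles * 1.0e9 / sysclk_hz;
}
```

For example, `cla_cycles_to_ns(cla_cycles_to_d2(), 60.0e6)` gives roughly 116.7 ns for a 60 MHz device.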
Q: Some of those registers you mentioned are EALLOW protected from rogue writes by the main CPU. Does the CLA have this protection as well?
- There is a bit called MEALLOW in the CLA status register that enables/disables the protection for CLA writes. This is set and cleared by the MEALLOW/MEDIS CLA instructions. This protection is independent of the main CPU's EALLOW bit. That is, the main CPU can enable writes via EALLOW, but the register will still be protected from CLA writes via MEALLOW.
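The independence of the two protection bits can be pictured with a small host-side model. Everything here (the struct, the field names, the two write functions) is hypothetical and exists only to illustrate the behavior described above; it is not TI header code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model only: none of these names come from TI headers. */
typedef struct {
    bool     eallow;   /* main CPU write enable (EALLOW / EDIS)  */
    bool     meallow;  /* CLA write enable (MEALLOW / MEDIS)     */
    uint16_t value;    /* contents of one protected register     */
} protected_reg_model;

/* A main-CPU write succeeds only while EALLOW is set. */
bool cpu_write(protected_reg_model *r, uint16_t v)
{
    assert(r != NULL);
    if (!r->eallow) {
        return false;
    }
    r->value = v;
    return true;
}

/* A CLA write succeeds only while MEALLOW is set, regardless of EALLOW. */
bool cla_write(protected_reg_model *r, uint16_t v)
{
    assert(r != NULL);
    if (!r->meallow) {
        return false;
    }
    r->value = v;
    return true;
}
```

In this model, setting `eallow` alone lets `cpu_write` succeed while `cla_write` is still rejected, which mirrors the last sentence of the answer.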
Q: How can the CLA read the ADC result register "Just-in-time"?
- The ADC on 2803x can be configured to assert an interrupt after the sample window time. If the CLA is configured to respond to the ADC interrupt, then the task will begin while the conversion is in progress. The 8th instruction in the task will be just in time to read the ADC result register when it updates.
- The ADC on the 2837xD/S can also be configured to assert an interrupt after the sample window time. If the CLA is configured to respond to the ADC interrupt, then the task will begin while the conversion is in progress. The conversion time, however, is not fixed; it depends on two things:
- The ADCCLK
- The ADC mode i.e. 12-bit or 16-bit
- For example, assume the ADC is set to run at a quarter of the system clock in 12-bit mode, and the ADC is set to sample (acquire) for 15 SYSCLK cycles, or 75ns. After the capacitor has sampled the analog value, the ADC will trigger the CLA task early. In 12-bit mode, the ADC will take 29.5 ADCCLKs to complete a conversion:
- T_sys = 1/200MHz = 5ns
- T_adc = 4*T_sys = 20ns
- The ADC will take 29.5 * 4 or 118 SYSCLK cycles to complete a conversion
+==============================================================================+
| ADC activity     |CLA Activity       |F1  |F2  |D1  |D2  |R1  |R2  |E   |W   |
+==============================================================================+
| Sample           |                   |    |    |    |    |    |    |    |    |
| Sample           |                   |    |    |    |    |    |    |    |    |
| ...              |                   |    |    |    |    |    |    |    |    |
| Sample           |                   |    |    |    |    |    |    |    |    |
| Conversion (1)   |Interrupt Received |    |    |    |    |    |    |    |    |
| Conversion (2)   |Task Startup       |    |    |    |    |    |    |    |    |
| Conversion (3)   |Task Startup       |    |    |    |    |    |    |    |    |
| Conversion (4)   |I1                 |I1  |    |    |    |    |    |    |    |
| Conversion (5)   |I2                 |I2  |I1  |    |    |    |    |    |    |
| ...              |                   |    |I2  |I1  |    |    |    |    |    |
| ...              |                   |    |    |I2  |I1  |    |    |    |    |
| ...              |                   |    |    |    |I2  |I1  |    |    |    |
| ...              |                   |    |    |    |    |I2  |I1  |    |    |
| ...              |                   |    |    |    |    |    |I2  |I1  |    |
| ...              |                   |    |    |    |    |    |    |I2  |I1  |
| ...              |                   |    |    |    |    |    |    |    |I2  |
| ...              |                   |    |    |    |    |    |    |    |    |
| ...              |                   |    |    |    |    |    |    |    |    |
| ...              |                   |    |    |    |    |    |    |    |    |
| Conversion (114) |I110               |I110|    |    |    |    |    |    |    |
| Conversion (115) |I111 Read ADCRESULT|I111|I110|    |    |    |    |    |    |
| Conversion (116) |                   |    |I111|I110|    |    |    |    |    |
| Conversion (117) |                   |    |    |I111|I110|    |    |    |    |
| Conversion (118) |                   |    |    |    |I111|I110|    |    |    |
| RESULT Latched   |                   |    |    |    |    |I111|I110|    |    |
| RESULT Available |                   |    |    |    |    |    |I111|    |    |
+==============================================================================+
- So the CLA Task will have to wait 118 cycles before it is able to access the ADC result register; this means it has 118 cycles within which to perform setup or any other pre-calculations.
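The arithmetic in this example can be reproduced with a short C sketch. The function names are invented for illustration; the inputs (29.5 ADCCLKs per conversion, an ADCCLK of SYSCLK/4, a 200 MHz SYSCLK) are the ones assumed in the example above and will differ for other configurations.

```c
#include <assert.h>

/* SYSCLK cycles for one conversion: ADCCLK count times the SYSCLK/ADCCLK ratio. */
double adc_conversion_sysclks(double adcclks, int sysclk_per_adcclk)
{
    assert(adcclks > 0.0 && sysclk_per_adcclk > 0);
    return adcclks * (double)sysclk_per_adcclk;
}

/* Duration in nanoseconds of a number of SYSCLK cycles at sysclk_hz. */
double sysclks_to_ns(double sysclks, double sysclk_hz)
{
    assert(sysclks >= 0.0 && sysclk_hz > 0.0);
    return sysclks * 1.0e9 / sysclk_hz;
}
```

With the numbers above, `adc_conversion_sysclks(29.5, 4)` yields 118 SYSCLK cycles, and `sysclks_to_ns(118.0, 200.0e6)` puts the conversion at about 590 ns; the 15-cycle sample window comes out at 75 ns.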
Q: How can I interface to GPIO control from the CLA?
- On most devices, the CLA does not have access to the GPIO control registers. GPIO control is typically handled by the main CPU. One thing that you could do is toggle an ePWM pin by accessing the ePWM registers with the CLA (see the ePWM's AQCSFRC register). If instead you want to toggle a GPIO pin at the end of the task, this could be done by the main CPU when it services the CLA interrupt.
- On the 2837xD/S, the CLA can directly access the GPIO control and data registers; in fact, it takes ownership of the peripheral bus. For a given CPU subsystem, the CLA and DMA share secondary access to some peripherals, the primary owner being the subsystem CPU itself. The secondary ownership of the bus is determined by the CpuSysRegs.SECMSEL[VBUS32_x] bit. If it is set to 0, the CLA is the secondary owner. If it is set to 1, the DMA is the secondary owner. By default, at reset, the CLA is given the secondary ownership of the bus and, therefore, can access all the peripherals connected to it.
Memory Access
Q: L3 memory block is designated as CLA Program. Can I allocate a portion for the CLA and the rest for the CPU?
- 2803x: No - this memory block either belongs to the main CPU (default at reset) or the CLA. That is, either one or the other can access it. If the block is assigned to the CLA and the main CPU tries to fetch or read data, it will receive a 0x0000.
- 2805x and 2806x: Yes - Using the MMEMCFG[RAMxCPUE] bit, you can grant the CPU access to read from and write to the data memory. After changing the configuration always wait 2 SYSCLKOUT cycles before accessing the memory.
- 2837xD: Yes - You can configure any of the RAMLSx blocks as CLA program space by
- writing a 1 to the memory block’s MemCfgRegs.LSxMSEL[MSEL_LSx] bit
- and then specifying the memory block as CLA code memory by writing a 1 to the MemCfgRegs.LSxCLAPGM[CLAPGM_LSx] bit.
- When a block is configured as CLA program space, the CLA can fetch from it but the CPU only has emulation read/write access. When configured as CLA data space, it is shared between the CPU and CLA, i.e. the CPU has read and write permissions, subject to arbitration, to the memory block.
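The 2837xD configuration steps can be sketched as a small host-side model. The struct below is a simplified stand-in for the LSxMSEL and LSxCLAPGM bit fields, not the device header file; the block count of 6 and the treatment of MSEL = 0 as "CPU only" are assumptions made for this illustration, so check the device TRM for the real register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LS_BLOCKS 6 /* assumption for this model; check the device TRM */

/* Simplified model of the two configuration fields per RAMLSx block. */
typedef struct {
    uint8_t msel[NUM_LS_BLOCKS];   /* models LSxMSEL[MSEL_LSx]: 1 = CLA has access   */
    uint8_t clapgm[NUM_LS_BLOCKS]; /* models LSxCLAPGM[CLAPGM_LSx]: 1 = CLA program  */
} ls_cfg_model;

typedef enum { LS_CPU_ONLY, LS_CLA_DATA_SHARED, LS_CLA_PROGRAM } ls_role;

/* CLA data space: give the CLA access, leave the block as data. */
void ls_make_cla_data(ls_cfg_model *cfg, int block)
{
    assert(cfg != NULL && block >= 0 && block < NUM_LS_BLOCKS);
    cfg->msel[block]   = 1;
    cfg->clapgm[block] = 0;
}

/* CLA program space: the two writes described in the answer above. */
void ls_make_cla_program(ls_cfg_model *cfg, int block)
{
    assert(cfg != NULL && block >= 0 && block < NUM_LS_BLOCKS);
    cfg->msel[block]   = 1; /* step 1: write 1 to MSEL_LSx   */
    cfg->clapgm[block] = 1; /* step 2: write 1 to CLAPGM_LSx */
}

/* Report how a block is currently used in this model. */
ls_role ls_get_role(const ls_cfg_model *cfg, int block)
{
    assert(cfg != NULL && block >= 0 && block < NUM_LS_BLOCKS);
    if (cfg->msel[block] == 0) {
        return LS_CPU_ONLY; /* assumed meaning of MSEL = 0 */
    }
    return cfg->clapgm[block] ? LS_CLA_PROGRAM : LS_CLA_DATA_SHARED;
}
```

On real hardware the equivalent writes go to MemCfgRegs; the model only makes the three resulting roles explicit.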
Q: I want to do a ping-pong scheme where the CLA uses a data RAM and then the main CPU uses it. Can this be done?
- 2803x: Yes - There are a few things you need to make sure of when changing the mapping of the data RAM:
- After changing the mapping (via MMEMCFG) always wait 2 SYSCLKOUT cycles before accessing the memory.
- Before changing the memory from CPU to CLA, make sure the CPU is not performing any accesses to the memory.
- Before changing the memory from CLA to CPU, make sure the CLA is not performing any accesses. One way to check for this is to clear the MIER register, wait a couple of cycles and then check that MIRUN is cleared.
- 2805x and 2806x: Yes - There are two options:
- Option 1
- Using the MMEMCFG[RAMxCPUE] bit, you can grant the CPU access to read from and write to the data memory. After changing the configuration always wait 2 SYSCLKOUT cycles before accessing the memory.
- Option 2
- Follow the same procedure as for the 2803x and re-map the memory to the 28x.
- 2837xD: Yes - Any RAMLSx block, when configured as CLA data space, is automatically shared between the CPU and CLA.
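The CLA-to-CPU handover that relies on re-mapping (the 2803x procedure) can be sketched as the following sequence. This is a host-side model under stated assumptions: `cla_model`, `wait_cycles` and `handover_cla_to_cpu` are hypothetical names, the "wait" is only counted rather than timed, and the real MIER, MIRUN and MMEMCFG registers are accessed through the device headers instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { RAM_OWNER_CPU, RAM_OWNER_CLA } ram_owner;

/* Host-side stand-in for the relevant CLA state; not a TI register map. */
typedef struct {
    uint16_t  mier;   /* task interrupt enables (models MIER)         */
    uint16_t  mirun;  /* nonzero while a task is running (MIRUN)      */
    ram_owner owner;  /* current data RAM mapping (models MMEMCFG)    */
    unsigned  waited; /* cycles spent in the wait placeholder         */
} cla_model;

/* Placeholder for a cycle delay; on a device this would be NOPs or similar. */
static void wait_cycles(cla_model *m, unsigned n)
{
    m->waited += n;
}

/* Returns true if the data RAM was handed to the CPU, false if a task was
 * still running and the caller should retry later. */
bool handover_cla_to_cpu(cla_model *m)
{
    assert(m != NULL);
    m->mier = 0;               /* stop new tasks from starting          */
    wait_cycles(m, 2);         /* "wait a couple of cycles"             */
    if (m->mirun != 0) {
        return false;          /* CLA may still be accessing the RAM    */
    }
    m->owner = RAM_OWNER_CPU;  /* change the mapping                    */
    wait_cycles(m, 2);         /* 2 SYSCLKOUT cycles before any access  */
    return true;
}
```

The sketch deliberately does not restore MIER afterwards; when and how tasks are re-enabled depends on the application.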
Q: Is CLA memory protected by the Code Security Module (CSM)?
- Yes - on 2803x devices, all CLA memory is protected by the CSM. The CLA configuration and result registers are also protected. On later devices, CLA memory can also be protected by the DCSM. Please refer to the device TRM for more information on security.
Instruction Set, Code Execution
Q: How fast are instructions executed by the CLA?
- The CLA is clocked at the same rate as the main CPU (SYSCLKOUT), which is at most 60 MHz on 2803x and 80 MHz on 2806x. All CLA instructions are single cycle. While a discontinuity (branch/call/return) instruction itself is single cycle, the time for a discontinuity to complete depends on how many "delay slots" around the instruction are used. If no delay slots are used, the discontinuity completes in 7 cycles (worst case), whether taken or not. A typical time is 4 cycles, taken or not taken; in this case the slots before the instruction are used, but the slots after are not.
Q: In the examples I noticed there are 3 MNOP's after each MSTOP. Why is this done? Is it required?
- There is a restriction that an MSTOP must not end up within 3 instructions of a branch. The MNOPs have been added to make sure this requirement is always met, even when the program RAM following a task is not initialized. If you know for sure that there is no branch within 3 instructions after the MSTOP, then you can remove the MNOPs.
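The rule can be expressed as a simple check over a list of instructions. The sketch below is a host-side illustration, not a TI tool: the `cla_op` classification is invented, and it only checks the three locations after each MSTOP, which is the case the MNOPs guard against. Locations beyond the end of the list are treated as unknown (like uninitialized program RAM) and reported as a violation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Coarse instruction classes, enough to express the restriction. */
typedef enum { OP_OTHER, OP_MNOP, OP_MSTOP, OP_BRANCH } cla_op;

/* Returns true if none of the 3 instructions following any MSTOP is a
 * branch. Contents past the end of the array are unknown and therefore
 * count as a violation. */
bool mstop_spacing_ok(const cla_op *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (prog[i] != OP_MSTOP) {
            continue;
        }
        for (size_t k = 1; k <= 3; ++k) {
            if (i + k >= n) {
                return false;              /* unknown memory contents */
            }
            if (prog[i + k] == OP_BRANCH) {
                return false;              /* branch too close        */
            }
        }
    }
    return true;
}
```

A task ending in MSTOP followed by three MNOPs passes the check; the same task without the MNOPs does not, because nothing is known about what follows.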
Q: Does the CLA use floating-point math?
- Yes, the CLA supports 32-bit (single-precision) floating-point math operations. These follow the IEEE 754 standard.
Q: Why float and not fixed-point?
- Floating-point is easier to code. It is self-saturating and thus removes the saturation and scaling burden from code. In addition, it does not suffer from overflow/underflow sign inversion issues. Finally, algorithms coded in float are more cycle efficient.
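The sign inversion issue is easy to reproduce on a host compiler. The sketch below contrasts a 16-bit two's-complement add that wraps with a single-precision add that overflows; the function names are invented, and the float behavior shown is that of an IEEE 754 host, so consult the device documentation for the exact CLA overflow behavior.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit two's-complement add with wraparound, written without relying on
 * signed overflow (which is undefined behavior in C). */
int16_t add_wrap16(int16_t a, int16_t b)
{
    uint16_t sum  = (uint16_t)((uint16_t)a + (uint16_t)b);
    int32_t  wide = (sum >= 0x8000u) ? (int32_t)sum - 65536 : (int32_t)sum;
    assert(wide >= -32768 && wide <= 32767);
    return (int16_t)wide;
}

/* Single-precision add. On an IEEE 754 host an overflowing result becomes
 * infinity with the sign preserved, so it never flips sign. */
float add_f32(float a, float b)
{
    float r = a + b;
    return r;
}
```

Adding 30000 and 10000 with `add_wrap16` produces -25536, a large positive result turned negative, whereas `add_f32(3.0e38f, 3.0e38f)` stays positive.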
Q: Since the main CPU is fixed-point and the CLA is float, won't I have to convert my numbers?
- Yes, the CLA makes this easy with instructions that convert data. If you are reading from memory this conversion can be done while the value is read so it is quite efficient. For example, you can read an ADC result register (Unsigned 16-bit) and convert it to 32-bit float as you read it.
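In C, the conversion described here is just a cast from the unsigned 16-bit value to float. The sketch below adds a scaling step to volts purely for illustration; the 12-bit range and the 3.3 V full scale are assumptions, not properties of any particular device, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only: 12-bit result, 3.3 V full scale. */
#define ADC_FULL_SCALE_COUNTS 4096.0f
#define ADC_VREF_VOLTS        3.3f

/* Unsigned 16-bit register value to 32-bit float, then scaled to volts. */
float adc_counts_to_volts(uint16_t adc_result)
{
    assert(adc_result < 4096u);
    float counts = (float)adc_result; /* the unsigned-16 to float conversion */
    return counts * (ADC_VREF_VOLTS / ADC_FULL_SCALE_COUNTS);
}
```

Under these assumptions a mid-scale reading of 2048 maps to about 1.65 V.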
Q: Does the CLA support the repeat block (RPTB) instruction that is on the C28x+FPU?
- No, but you can use a loop (branch or call) to execute a block of instructions multiple times. There is also no single repeat instruction (RPT || ...) for the CLA.
Q: Does the CLA support branches?
- Yes, the CLA has its own branch (MBCNDD), call (MCCNDD) and return (MRCNDD). All of these are conditional and delayed.
Q: Are there any sub-hardware modules inside CLA where each component corresponds to some algorithm so that the user only need to set registers or is CLA just purely programmable in floating-point?
- There are no algorithms built into the CLA in hardware. The CLA is completely programmable.
CLA 'C' Compiler
Q: Are signed integers (16-bit) data types supported?
- The signed integer data type is supported (as of CGT 6.4.1), but not all operations on it are, because the CLA does not have the assembly instructions needed to operate on that data type efficiently. For example, integer modulus is not supported. Please see the compiler guide for the full list of restrictions:
- Compiler Guide: TMS320C28x Optimizing C/C++ Compiler v6.4 (SPRU514).
CLA Compared to C28x+FPU (C28x plus floating-point unit)
Q: Are the basic CLA instructions also available for the 28x+FPU devices?
- To make sure we are on the same page, let's define the following instruction sets:
- C28x Instruction Set
- This is the original fixed-point instruction set.
- C28x+FPU Instruction Set
- This is the C28x Instruction Set plus additional instructions to support native single-precision (32-bit) floating-point operations. While the additional instructions are mostly to support single-precision floating-point math, there are some other useful instructions like RPTB (repeat block) included. Since they are part of the superset, and only available on devices with the FPU, we still refer to them as part of the FPU instructions.
- CLA Instruction Set
- The CLA instruction set is a subset of the FPU instructions. A few FPU instructions are not supported on CLA - for example the repeat block is not supported. The CLA also has a few instructions that the FPU does not have. For example: the CLA has some native integer math instructions as well as a native branch/call/return.
Q: If the CLA instruction set is a subset of the FPU instruction set, can I assume benchmarks for floating-point div, sin, cos, etc are the same on both?
- The CLA instructions are indeed a subset of the C28x+FPU and for the math instructions there are equivalents on each, but there are still differences that impact benchmarks. For example:
- Cycle differences in multiply and conversion (see next question) as well as branches
- Resource differences (ex: 8 floating-point result registers vs 4)
Q: Is the CLA faster at doing a floating-point multiply than the regular C28x+FPU?
- Trick question! :) Consider the following:
- C28x FPU: multiply or conversions take 2p cycles. This means that they take two cycles to complete, but remember you can put another instruction in that delay slot including another math instruction.
- CLA: the math instructions and conversions take 1 cycle - no delay slot needed. So if you were not able to use that delay cycle on the FPU to do meaningful work, then you could say the CLA is faster if you are just counting cycles.
- Devices with the CLA run much slower (60 MHz on 2803x and 80 MHz on 2806x) than the devices with the FPU unit (80-300 MHz).
- So it depends on the FPU delay slot usage and the frequency of the device.
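The trade-off can be put into numbers with a rough model. This is a deliberate simplification, not a benchmark: it charges the FPU 2 cycles per multiply when the delay slot is wasted and 1 cycle when it is filled with useful work, charges the CLA 1 cycle, and ignores everything else (loads, stores, branches). The function name is made up for the example.

```c
#include <assert.h>

/* Time in nanoseconds to issue n back-to-back multiplies.
 * cycles_per_op: 1 for the CLA, 2 for the FPU with the delay slot wasted,
 * 1 for the FPU with the delay slot filled by useful work. */
double mpy_time_ns(unsigned n, unsigned cycles_per_op, double clk_hz)
{
    assert(cycles_per_op > 0 && clk_hz > 0.0);
    return (double)n * (double)cycles_per_op * 1.0e9 / clk_hz;
}
```

For 100 multiplies, a 60 MHz CLA takes about 1667 ns in this model. A 150 MHz FPU takes about 1333 ns with every delay slot wasted and about 667 ns with every slot filled. At equal clock rates, the CLA only comes out ahead when the FPU delay slots go unused.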
Q: What are the main differences between the CLA and the C28x+FPU?
- The biggest thing to keep in mind is that the CLA is independent from the main CPU, whereas the FPU is a superset on top of the C28x fixed-point CPU. The following table shows other differences between the two:
| Feature | 2803x CLA | C28x+FPU |
| Execution | Independent from the main CPU; executes floating-point in parallel with the main CPU | Part of the main CPU; FPU instructions do not execute in parallel with fixed-point |
| Floating-Point Result Registers | 4 (MR0 - MR3) | 8 (R0H - R7H) |
| Auxiliary Registers | 2, 16-bits (MAR0, MAR1); can access all of CLA data | 8, 32-bits (XAR0 - XAR7); shared with fixed-point instructions |
| Pipeline | 8-stage pipeline; completely independent from main CPU | Fetch and decode stages shared with fixed-point instructions; cannot execute in parallel with fixed-point |
| Single Step | Moves pipeline ahead 1 cycle | Completely flushes the pipeline |
| Addressing Modes | Uses only 2 addressing modes: direct & indirect with post increment; no data page pointer | Uses all C28x addressing modes |
| Interrupts Serviced | 2803x: ADC, ePWM and CPU Timer 0 interrupts; 2806x: ADC, ePWM, CPU Timer 0, eCAP and eQEP; refer to device specific TRM as this list has increased over time | All available interrupts |
| Nesting Interrupts | CLA type 0, 1: not supported, no stack pointer; CLA type 2: supports 1 background task | Supported with stack pointer |
| Instruction Set | Independent instruction set; subset of FPU instructions; similar mnemonics to C28x+FPU but with leading 'M' (ex: MMPYF32 MR0, MR1, MR2) | Floating-point instructions are in addition (superset) to C28x fixed-point instructions |
| Repeated Instructions | No single repeat or repeat block | Repeat MACF32 & repeat block (RPTB) |
| Communication with main CPU | Through message RAMs and interrupts; main CPU can read CLA execution registers but not write to them | One CPU, but you can copy information between fixed and float registers; for example, copy R0H to ACC or copy STF flags to ST0 |
| Math and Conversion | Single cycle | 2p cycles (2 pipelined cycles) |
| Integer Operations | Native instructions for AND, OR, XOR, ADD, SUB, shifts etc. | Uses fixed-point instructions |
| Flow Control | Native branch/call/return | Uses fixed-point instructions; requires copy of float flags to fixed-point ST0 |
| Branch/Call/Return | Conditional & delayed; 3 instructions after are always executed; performance can be improved by using delay cycles | Uses fixed-point flow control; branches are not delayed; instructions after are only executed if the branch is not taken |
| Memory Access | CLA program, data and message RAMs only; specific memory for CLA program; specific memory for CLA data | All memory on the device; program/data memory allocation is up to the user |
| Register Access | 2803x: ePWM+HRPWM, Comparator, ADC result; 2806x: ePWM+HRPWM, eQEP, eCAP, Comparator, ADC result; refer to specific datasheet or TRM as this list has grown over time | All registers on the device |
| Programming | CLA Assembly or CLA C Compiler (requires C28x codegen 6.1.0 or later) | C/C++ or Assembly |
| Operating Frequency | 2803x: Flash based devices up to 60MHz; 2806x: Flash based devices up to 80MHz; refer to specific device family TRM and data sheet | Flash based devices up to 150MHz (2833x); RAM only devices up to 300MHz (2834x) |