OMAP-L1x/C674x/AM1x SoC Level Optimizations

From Texas Instruments Wiki

Master Priority and Interconnect Arbitration

The system interconnect, or Switched Central Resource (SCR), provides priority-based arbitration to select the connection between master and slave peripherals. This arbitration is based on the priority value of each master port.
Each master can have a priority value between 0 and 7:

  • 0 is the highest priority
  • 7 is the lowest priority


Interconnect Arbitration Scheme

At any given instant, read and write requests from multiple masters may compete for the same end point (an end point can be a slave peripheral, on- or off-chip memory, or an infrastructure component such as a bridge/SCR that connects to multiple slave peripherals). The SCRs use the following arbitration scheme to resolve contention:

  1. At each priority level, one request from each master is selected in a round robin manner.
  2. The highest priority request is selected.
  3. Arbitration occurs at burst size boundaries (or lower).
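
The scheme above can be sketched as a small software model. This is only an illustration under stated assumptions, not the SCR hardware implementation: the names `arbiter_t`, `arbiter_select`, and `NUM_MASTERS` are hypothetical, each master holds at most one pending request, and each call stands for one arbitration point (a burst boundary).

```c
#include <assert.h>

#define NUM_MASTERS 4   /* hypothetical number of masters in this model */

/* One pending request per master; a priority of -1 means "no request". */
typedef struct {
    int priority[NUM_MASTERS];  /* 0 (highest) .. 7 (lowest), or -1 if idle */
    int last_granted;           /* master granted most recently, or -1 */
} arbiter_t;

/* Returns the index of the master to grant next, or -1 if none is requesting. */
int arbiter_select(arbiter_t *arb)
{
    int best = -1;

    /* Scan starting just after the last granted master, so that masters at
     * the same priority level take turns (round robin). */
    for (int i = 1; i <= NUM_MASTERS; i++) {
        int m = (arb->last_granted + i) % NUM_MASTERS;
        int p = arb->priority[m];

        if (p < 0) {
            continue;           /* this master has nothing pending */
        }
        assert(p <= 7);

        /* A strictly lower number wins; on a tie, the master found first in
         * the rotated scan order is kept. */
        if (best < 0 || p < arb->priority[best]) {
            best = m;
        }
    }

    if (best >= 0) {
        arb->last_granted = best;
    }
    return best;
}
```

In this model, two masters at priority 2 alternate grants while a priority 5 master waits, and a master raised to priority 0 is granted at the next arbitration point.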

Recommendations

  • Check the default priority values for each master on a given device. For OMAP-L1x/C674x, the default priorities are held in the MSTPRIn registers of the SYSCFG module. These registers are read/write, so they can also be used to change the priority of each master.
  • Change the default priorities to suit the application requirements. For example, on OMAP-L1x/C674x the default priority for the LCDC is 5, which is lower than the default priority of many other masters on the device. To meet the real-time requirements of the LCDC, it is recommended that its priority be raised to the highest in the system (0) relative to competing masters.

NOTE: The EMIFB controller on OMAP-L1x7/C6747/C6745 and the DDR2/mDDR memory controller on OMAP-L1x8/C6748/C6746/C6742 are slave endpoints that may re-order commands irrespective of master priority. For EMIF command re-ordering, see the next sub-section.
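
A priority change of this kind amounts to a read-modify-write of one field in an MSTPRIn register. The helper below is a minimal sketch: the function name `mstpri_set_field` is hypothetical, the 3-bit field width is inferred from the 0 to 7 priority range, and the bit position of each master's field is an assumption that must be taken from the device's SYSCFG documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Returns reg_value with the 3-bit priority field at bit position `shift`
 * replaced by `priority` (0 = highest, 7 = lowest).
 * ASSUMPTION: each master's field is 3 bits wide; `shift` is hypothetical
 * and must come from the device documentation. */
uint32_t mstpri_set_field(uint32_t reg_value, unsigned shift, unsigned priority)
{
    assert(shift <= 29u);      /* a 3-bit field must fit in 32 bits */
    assert(priority <= 7u);    /* valid priority range is 0..7 */

    uint32_t mask = UINT32_C(0x7) << shift;

    /* Clear the old field, then insert the new priority; all other masters'
     * fields in the same register are left unchanged. */
    return (reg_value & ~mask) | ((uint32_t)priority << shift);
}
```

On hardware, the result would be written back through a volatile pointer to the relevant MSTPRIn register, for example `*mstpri = mstpri_set_field(*mstpri, lcdc_shift, 0);`, where `mstpri` and `lcdc_shift` are placeholders for the register address and field position of the LCDC.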

External Memory Interface Controller (EMIFB SDRAM/mDDR/DDR2) Command Re-Ordering (BPRIO/PBBPR setting)

The external memory interface controller can re-order commands to optimize its efficiency. This command re-ordering is performed by the EMIF controller irrespective of the priority of the master sending these commands.

NOTE: The command re-ordering and associated details do not apply to the EMIFA SDRAM interface on this device family.

Impact of Command Re-ordering Scheme

EMIF command re-ordering may:

  • Delay the servicing of a higher priority master request.
  • Result in the blocking of commands from one master by the continuous commands from a higher priority master.
  • Block commands to a memory row when there are pending commands to another row that is already open in the same bank.


Recommendations

  • Adjust the "priority raise old counter" value in the EMIF controller to suit the application/system requirements. This counter value is given in terms of the number of bus word transfers completed before the external memory controller will elevate the priority of the oldest command in the command FIFO. It can be thought of as a "timeout" mechanism for the command re-ordering scheme. The setting can be adjusted to ensure that critical accesses to external memory are not blocked for a prolonged time period.
OMAP-L137/C6747/C6745/AM17x:
EMIFB BPRIO register (offset 0x20), PRIO_RAISE value.
Bus Word: 4 bytes (32-bit bus internally connecting to the EMIFB controller)

OMAP-L138/C6748/C6746/C6742/AM18x:
mDDR/DDR2 PBBPR register (offset 0x20), PR_OLD_COUNT value.
Bus Word: 8 bytes (64-bit bus internally connecting to the mDDR/DDR2 controller)
  • The setting should be lowered from the default value of 0xFF (256 bus words) for applications that have more than one master accessing external memory (via the EMIFB or mDDR/DDR2 controller).
  • The priority raise old counter should be used as a system tuning parameter. Its optimal value depends on factors such as system traffic and the number of masters accessing the external memory. The value should be determined and/or verified by system stress testing to ensure that critical masters are not vulnerable to bandwidth starvation when accessing external memory. Values in the range 0x20 to 0x30 have been used extensively and have been observed to work for a wide range of applications. A value of 0 would imply that all commands destined for the EMIF are serviced in the order in which they are received. This is not recommended because it completely disables the EMIF's ability to perform command optimization.
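
Because the counter is expressed in bus words, the same value corresponds to a different amount of data on the two device families. The sketch below shows that arithmetic and a field update. The function names are hypothetical, the counter is assumed to occupy the low 8 bits of the register (inferred from the 0xFF default), and the counter value is treated directly as a number of bus words, which should be checked against the device documentation.

```c
#include <assert.h>
#include <stdint.h>

#define EMIFB_BUS_WORD_BYTES 4u  /* OMAP-L137 family: 32-bit internal bus */
#define DDR_BUS_WORD_BYTES   8u  /* OMAP-L138 family: 64-bit internal bus */

/* Approximate number of bytes transferred before the controller raises the
 * priority of the oldest command in the command FIFO. */
uint32_t prio_raise_window_bytes(uint8_t count, uint32_t bus_word_bytes)
{
    assert(bus_word_bytes == EMIFB_BUS_WORD_BYTES ||
           bus_word_bytes == DDR_BUS_WORD_BYTES);
    return (uint32_t)count * bus_word_bytes;
}

/* Returns reg_value with the counter field replaced by `count`.
 * ASSUMPTION: the counter occupies bits 7..0 of BPRIO/PBBPR. */
uint32_t prio_raise_set_count(uint32_t reg_value, uint8_t count)
{
    return (reg_value & ~UINT32_C(0xFF)) | count;
}
```

For example, a setting of 0x20 corresponds to 32 bus words, which is 128 bytes through the EMIFB controller and 256 bytes through the mDDR/DDR2 controller under these assumptions.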


EMIF Arbitration/Command Re-Ordering Scheme Details (Advanced Reading)

Stage 1 (for command from each master):

  • For each master, the oldest command is selected.
  • For the same master, if the reads are to a different block (2048 bytes) than the writes, or if the reads are of equal or higher priority than the writes, the read commands are selected (even if a write command was the oldest).

Stage 2 (for all commands in the command FIFO):

  • Among the selected reads, a read from an already open page/row is selected.
  • Among the selected writes, a write to an already open page/row is selected.

Stage 3

  • If there are several read commands, the highest priority “oldest” read is selected.
  • If there are several write commands, the highest priority “oldest” write is selected.

Stage 4

  • If the Read Data FIFO has space, the read command is executed first.
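
Stages 2 and 3 can be illustrated with a simplified selection model for commands of one type (for example, reads only). This is a sketch under assumptions, not the controller's actual logic: the names `emif_cmd_t` and `emif_pick` are hypothetical, Stage 1 and Stage 4 are not modeled, and Stage 3 is read here as "highest priority first, then oldest among equals".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one pending command of a single type (read or write). */
typedef struct {
    bool     row_open;  /* targets a page/row that is already open */
    unsigned priority;  /* 0 (highest) .. 7 (lowest) */
    unsigned age;       /* larger value = has waited longer */
} emif_cmd_t;

/* Returns true if command a should be selected ahead of command b. */
static bool cmd_preferred(const emif_cmd_t *a, const emif_cmd_t *b)
{
    if (a->row_open != b->row_open) {
        return a->row_open;                 /* Stage 2: open page/row first */
    }
    if (a->priority != b->priority) {
        return a->priority < b->priority;   /* Stage 3: highest priority */
    }
    return a->age > b->age;                 /* Stage 3: oldest among equals */
}

/* Returns the index of the selected command, or -1 if the list is empty. */
int emif_pick(const emif_cmd_t *cmds, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        assert(cmds[i].priority <= 7u);
        if (best < 0 || cmd_preferred(&cmds[i], &cmds[(size_t)best])) {
            best = (int)i;
        }
    }
    return best;
}
```

In this model a priority 0 command to a closed row loses to a priority 5 command to an open row, which is the kind of delay to higher priority masters that the priority raise old counter is meant to bound.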



C674x DSP Related Optimizations

The following sub-sections are specific to the C674x megamodule within the OMAP-L1x/C674x device family. They are primarily relevant to data transfers and accesses to the L1/L2 memories internal to the C674x DSP (from components within the megamodule or from masters outside it), and they highlight some megamodule-specific features and optimization guidelines.


EDMA3 vs IDMA vs C674x CPU

Internal Direct Memory Access (IDMA) Controller

The internal direct memory access (IDMA) controller is used to perform fast block transfers between any two memories local to the C674x DSP. Local memories include Level 1 program (L1P), Level 1 data (L1D), and Level 2 (L2) memories. The IDMA is optimized for rapid burst transfers of memory blocks (contiguous data). The intent of the IDMA is to relieve the C674x DSP of on-chip memory (to/from L1D/L2) data movement tasks. For more details on the IDMA controller, see the TMS320C674x DSP Megamodule Reference Guide.

NOTE: The IDMA cannot be used for transfers to/from system memory external to the DSP megamodule, including Shared RAM, ARM RAM, or external memory.

Enhanced Direct Memory Access (EDMA3) Controller

The EDMA3 is the device/system DMA controller. Its primary purpose is to service user-programmed data transfers between internal memory (DSP L1/L2, Shared RAM), external memory (SDRAM, DDR2/mDDR, or flash memory), and memory-mapped peripherals (such as serial ports and MMC/SD). Apart from linear transfers, it also enables advanced features such as linking, chaining, and 2-D transfers.

Things to note

The following points should be kept in mind when choosing between the EDMA, the C674x CPU, and the IDMA for data transfers:

  • The IDMA will have better cycle/word performance than the EDMA for on-chip memory (to/from L1D/L2) transfers. Because the IDMA is local to the memories, it operates at a higher clock rate and uses a wider bus.
  • For certain on-chip memory (L1D/L2 to/from L2/L1D) transfer scenarios, it is possible that the IDMA and CPU will have nearly identical cycle/word efficiency. However, offloading the tasks of data transfers to the IDMA will allow more efficient usage of CPU bandwidth to perform other critical tasks.

In summary, when the transfer geometry is fairly simple (i.e., 1-D transfers) and performance is the main concern, the IDMA makes the most sense. When extra flexibility and features (e.g., linking, chaining, 2-D transfers) matter more than performance, the EDMA should be used. Note that competing accesses to these memories (by multiple masters) will degrade IDMA performance.


Bandwidth Management Module

The bandwidth management module (BWM) in the C674x megamodule is responsible for arbitrating requests from multiple requestors into the DSP L1/L2 memories. This allows better overall bandwidth allocation of the DSP memories and ensures that some requestors do not block the resources/memories for extended periods of time. The potential requestors for C674x megamodule memories are C674x CPU initiated transfers (data/program accesses), C674x cache controller accesses, the IDMA, and accesses initiated from outside the DSP via the DSP SDMA port (system DMA and master peripherals such as EMAC and UHPI). The BWM has programmable priority control and conflict counters that can be tuned to suit the system requirements.

In general, the default values for the priorities and conflict counters (maxwait period) do not need to be changed. When dealing with chip/system level bottlenecks and resource issues, it is recommended to focus on the other system level optimization knobs mentioned on this page.

For additional details on the BWM, please see the TMS320C674x DSP Megamodule Reference Guide.