OMAP-L1x/C674x/AM1x Multichannel Audio Serial Port (McASP) Throughput and Optimization Techniques
From Texas Instruments Embedded Processors Wiki
This article presents throughput measurements and optimization techniques for the Multichannel Audio Serial Port (McASP) of OMAP-L1x/C674x/AM1x devices. Several variables were explored to assess their impact on McASP performance, and the maximum throughput that the McASP can achieve is also presented.
The information in this article deals mainly with OMAP-L1x8/C674m/AM18xx devices (where m is an even number). However, the data can also be applied to OMAP-L1x7/C674n/AM17xx devices (where n is an odd number), because both sets of devices share a similar SOC architecture.
McASP Basics
The following sections provide a high-level overview of the McASP. Detailed information on the McASP can be obtained from the TMS320C674x/OMAP-L1x Multichannel Audio Serial Port (McASP) User's Guide (SPRUFM1).
Features
The McASP is a general-purpose serial port optimized for multichannel audio applications. It supports time-division multiplexed (TDM) streams, the Inter-IC Sound (I2S) protocol, and direct connection to analog-to-digital converters (ADCs) and digital-to-analog converters (DACs). The McASP includes independent transmit and receive sections with separate master clocks, bit clocks, and frame syncs. It provides up to 16 serializers that can be individually enabled for either transmit or receive operation; the number of serializers supported on each McASP varies from device to device (see your device data sheet). Each McASP also includes a 256-byte Read FIFO and a 256-byte Write FIFO.
McASP Block Diagram
Data Flow
The McASP generates transmit/receive DMA events (AXEVT/AREVT) when data needs to be transferred to or from its serializer registers. When the FIFOs are enabled, the Read and Write FIFOs service these requests directly. When a specific amount of data is available in the Read FIFO, or a specific amount of space is available in the Write FIFO, the FIFO in turn generates a transmit/receive DMA event (AXEVT/AREVT), and the EDMA must service the McASP FIFOs whenever these events are generated. The amount of data requested by the FIFOs depends on the number of serializers in use and on the FIFO configuration (see below).
FIFO Configuration
The McASP Write and Read FIFOs temporarily hold serializer data.
- 256 bytes = up to four 32-bit words per serializer in the case of 16 active serializers
- 256 bytes = up to 64 32-bit words in the case of one active serializer
The WNUMDMA and WNUMEVT parameters configure the Write FIFO. Similarly, RNUMDMA and RNUMEVT configure the Read FIFO.
WNUMDMA specifies the write word count per McASP transfer and must equal the number of McASP serializers used as transmitters. On each transmit DMA event from the McASP, WNUMDMA words are transferred from the Write FIFO to the McASP. WNUMEVT specifies the write word count per DMA event and must be a non-zero integer multiple of the number of serializers enabled as transmitters. When the Write FIFO has space for at least WNUMEVT words, the FIFO generates an AXEVT (transmit DMA event) to the host/DMA controller. RNUMDMA and RNUMEVT are the receive (Read FIFO) counterparts of these two parameters.
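The rules above can be captured in a small validation routine. The following C sketch is illustrative only: `mcasp_fifo_config_valid()` and `mcasp_fifo_words_per_serializer()` are hypothetical helpers, not part of any TI library, and they do not access McASP registers. The 64-word upper bound on W/RNUMEVT is an assumption derived from the 256-byte FIFO size given above.

```c
#include <stdbool.h>

/* The Write and Read FIFOs are 256 bytes deep, i.e. 64 32-bit words. */
#define MCASP_FIFO_WORDS 64u

/*
 * Hypothetical helper (not a TI API): checks a NUMDMA/NUMEVT pair for one
 * FIFO against the rules above. 'serializers' is the number of serializers
 * enabled in that direction (transmitters for the Write FIFO, receivers for
 * the Read FIFO).
 */
static bool mcasp_fifo_config_valid(unsigned serializers,
                                    unsigned numdma,
                                    unsigned numevt)
{
    /* At least one serializer, and no more words per transfer than the FIFO holds. */
    if (serializers == 0u || serializers > MCASP_FIFO_WORDS)
        return false;

    /* NUMDMA: one word per serializer per McASP transfer. */
    if (numdma != serializers)
        return false;

    /* NUMEVT: non-zero integer multiple of the serializer count. */
    if (numevt == 0u || numevt % serializers != 0u)
        return false;

    /* Assumed upper bound: a threshold larger than the FIFO could never be met. */
    return numevt <= MCASP_FIFO_WORDS;
}

/* FIFO words available per serializer (4 with 16 serializers, 64 with 1). */
static unsigned mcasp_fifo_words_per_serializer(unsigned serializers)
{
    return serializers ? MCASP_FIFO_WORDS / serializers : 0u;
}
```

For example, four transmit serializers with WNUMDMA = 4 and WNUMEVT = 8 pass the check, while WNUMEVT = 6 fails because it is not a multiple of 4.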
McASP FIFO Use
When the FIFO is disabled, the EDMA transfers data directly to and from the serializers, and WNUMDMA, WNUMEVT, RNUMDMA, and RNUMEVT are ignored.
The TX/RX EVT DMA RATIO is the multiplication factor between W/RNUMEVT and W/RNUMDMA:
- WNUMEVT = TX EVT DMA RATIO * WNUMDMA
- RNUMEVT = RX EVT DMA RATIO * RNUMDMA
The TX/RX EVT DMA RATIO determines how often the McASP FIFO generates DMA requests. If TX EVT DMA RATIO = 1, the FIFO triggers a DMA request on every transmit event.
If TX EVT DMA RATIO = 4, the FIFO triggers a DMA request only once every four transmit events. Fewer DMA requests give the EDMA more time to service each one, which makes it easier to meet real-time deadlines.
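To illustrate how the ratio trades request rate for latency tolerance, the following C sketch estimates the EDMA request rate for a stream. It assumes that the McASP raises one transmit (or receive) event per active time slot and that the FIFO forwards one request every TX/RX EVT DMA RATIO events; the helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a TI API). Assumes one McASP event per active
 * time slot (one word per serializer) and one EDMA request from the FIFO
 * every 'evt_dma_ratio' McASP events.
 */
static double edma_requests_per_second(uint32_t frame_rate_hz,
                                       uint32_t slots_per_frame,
                                       uint32_t evt_dma_ratio)
{
    if (evt_dma_ratio == 0u)
        return 0.0;
    double mcasp_events_per_second = (double)frame_rate_hz * (double)slots_per_frame;
    return mcasp_events_per_second / (double)evt_dma_ratio;
}

/* Interval between consecutive EDMA requests, in microseconds. */
static double edma_request_interval_us(uint32_t frame_rate_hz,
                                       uint32_t slots_per_frame,
                                       uint32_t evt_dma_ratio)
{
    double rate = edma_requests_per_second(frame_rate_hz, slots_per_frame,
                                           evt_dma_ratio);
    return rate > 0.0 ? 1.0e6 / rate : 0.0;
}
```

For a two-slot I2S stream at 48 kHz, a ratio of 1 gives 96,000 EDMA requests per second (about 10.4 µs apart), while a ratio of 4 gives 24,000 requests per second (about 41.7 µs apart).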
Clocking
The McASP serial clocks (ACLKR and ACLKX) can be generated internally or supplied externally. With an internal clock, ACLKR and ACLKX are derived from the high-frequency clocks (AHCLKX and AHCLKR) through clock dividers (CLKXDIV and CLKRDIV). The high-frequency clocks are either supplied externally on the AHCLKX and AHCLKR pins or generated internally from AUXCLK through clock dividers (HCLKXDIV and HCLKRDIV). With an external clock, ACLKR and ACLKX are sourced directly from McASP pins.
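The internal clock path is a chain of integer dividers, so the resulting bit clock and frame rate follow from simple arithmetic. The C sketch below treats the dividers as plain divide ratios (how they are encoded in the register fields is described in SPRUFM1) and uses hypothetical helper names; the 24.576 MHz AHCLKX input in the usage note is an assumed example value.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a TI API) modelling the internal transmit clock
 * path: bit clock (ACLKX) = high-frequency clock / CLKXDIV ratio, where the
 * high-frequency clock is either an external AHCLKX input (pass hclk_div = 1)
 * or AUXCLK divided by the HCLKXDIV ratio.
 */
static uint32_t mcasp_bit_clock_hz(uint32_t src_hz, uint32_t hclk_div,
                                   uint32_t clk_div)
{
    if (hclk_div == 0u || clk_div == 0u)
        return 0u;
    return (src_hz / hclk_div) / clk_div; /* integer division truncates */
}

/* Frame (sample) rate for a frame of 'slots' time slots of 'slot_bits' bits. */
static uint32_t mcasp_frame_rate_hz(uint32_t bit_clock_hz, uint32_t slots,
                                    uint32_t slot_bits)
{
    uint32_t bits_per_frame = slots * slot_bits;
    return bits_per_frame ? bit_clock_hz / bits_per_frame : 0u;
}
```

For example, an external 24.576 MHz AHCLKX divided by 8 gives a 3.072 MHz bit clock, which corresponds to a 48 kHz frame rate for two 32-bit slots: `mcasp_frame_rate_hz(mcasp_bit_clock_hz(24576000u, 1u, 8u), 2u, 32u)` returns 48000.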
McASP Transmit Clock Generator Block Diagram
McASP Receive Clock Generator Block Diagram
The McASP supports independent transmit and receive clock zones. However, the McASP receiver can also be configured to operate synchronously with the transmitter clock and frame sync signals (ACLKX and AFSX).
McASP Throughput Characterization
A large amount of throughput data was collected on the McASP. Several test variables were varied to build a full picture of McASP throughput under different scenarios.
One important variable considered during these measurements was the impact of activity from other masters on McASP throughput. To simulate this activity, a dummy continuous EDMA transfer was set up to compete for access to external memory. Several aspects of this background activity were also varied.
| Test Variable | Options |
|---|---|
| DSP/ARM Frequency | 300, 200, and 100 MHz |
| Pass/Fail Criteria | The test fails if the EDMA is not able to complete the transfer of data from the McASP or FIFO to the destination buffers |
| McASP Data Location | L2, Shared RAM, DDR2 (132 MHz) |
Test Parameters
Notes:
- The TX/RX EVT DMA RATIO value shown in the data slides is the minimum value for which the test always passes for a given set of test parameters; a lower ratio causes the test to fail.
- All data collected using low-level software; BIOS and Linux were not used.
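The minimum ratio described in the first note can be found with a simple upward search. The following C sketch is not the code used for these measurements: `ratio_test_fn` stands for a hypothetical hook that runs one throughput test and reports pass/fail, `example_threshold_test()` is a stand-in for illustration only, and the search limit assumes that W/RNUMEVT (ratio × serializers) cannot exceed the 64-word FIFO.

```c
#include <stdbool.h>

/* The FIFO holds 64 32-bit words, which caps ratio * serializers. */
#define MCASP_FIFO_WORDS 64u

/* Hypothetical hook: runs one throughput test at the given ratio and returns
 * true if the EDMA completed every transfer (the pass criterion above). */
typedef bool (*ratio_test_fn)(unsigned evt_dma_ratio, void *ctx);

/*
 * Hypothetical search: returns the smallest TX/RX EVT DMA ratio that passes
 * 'repeats' consecutive runs, or 0 if no ratio up to the FIFO limit does.
 */
static unsigned min_passing_ratio(unsigned serializers, unsigned repeats,
                                  ratio_test_fn run, void *ctx)
{
    if (serializers == 0u || serializers > MCASP_FIFO_WORDS || repeats == 0u)
        return 0u;

    unsigned max_ratio = MCASP_FIFO_WORDS / serializers;
    for (unsigned ratio = 1u; ratio <= max_ratio; ++ratio) {
        bool always_passed = true;
        for (unsigned i = 0u; i < repeats && always_passed; ++i)
            always_passed = run(ratio, ctx);
        if (always_passed)
            return ratio;
    }
    return 0u;
}

/* Example hook for illustration only: passes once the ratio reaches *ctx. */
static bool example_threshold_test(unsigned evt_dma_ratio, void *ctx)
{
    return evt_dma_ratio >= *(const unsigned *)ctx;
}
```

Requiring every repeat to pass mirrors the "always passes" criterion; a single passing run at a given ratio is not enough.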
Test Environment
The following list describes the test environment under which the McASP throughput data was collected:
- All data was collected with the DSP & ARM running at three different frequencies: 300, 200, and 100 MHz.
- L2, Shared RAM, and DDR2 memory were used to hold the source (SRC) and destination (DST) data buffers.
- No drivers or high-level operating systems (BIOS, Linux, etc.) were used for this testing. All data was collected using low-level software.
Summary of McASP Throughput Data
The subsections below summarize a large amount of data collected by varying the parameters listed in the Throughput Data Test Variables table. To see the specific impact of each test parameter on McASP performance, refer to the backup slides in the presentations at the end of this wiki article.
NOTE: "Max throughput" is given on a per-serializer basis.
Standalone Transfers, No EDMA Background Activity
| Test Scenario | SRC and DST Buffers | TX/RX EVT DMA Ratio | Max Throughput (Mbps) |
|---|---|---|---|
| Standalone test, element size = 32 bits, FIFO enabled | L2 | 1 | 49 |
| Standalone test, element size = 32 bits, FIFO enabled | L3 (Shared RAM) | 1 | 49 |
| Standalone test, element size = 32 bits, FIFO enabled | DDR2 | 1 | 49 |
| Standalone test, element size = 16 bits, FIFO enabled | L2 | 2 | 32 |
| Standalone test, element size = 16 bits, FIFO enabled | L3 (Shared RAM) | 2 | 32 |
| Standalone test, element size = 16 bits, FIFO enabled | DDR2 | 3 | 32 |
| Standalone test, element size = 8 bits, FIFO enabled | L2 | 2 | 16 |
| Standalone test, element size = 8 bits, FIFO enabled | L3 (Shared RAM) | 2 | 16 |
| Standalone test, element size = 8 bits, FIFO enabled | DDR2 | 2 | 16 |
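| Test Scenario | SRC and DST Buffers | TX/RX EVT DMA Ratio | Max Throughput (Mbps) |
|---|---|---|---|
| Standalone test, element size = 32 bits, FIFO disabled | L2 | N/A | 21.5 |
| Standalone test, element size = 32 bits, FIFO disabled | L3 (Shared RAM) | N/A | 21.5 |
| Standalone test, element size = 32 bits, FIFO disabled | DDR2 | N/A | 21.5 |
| Standalone test, element size = 32 bits, FIFO enabled | L2 | 2 | 32 |
| Standalone test, element size = 32 bits, FIFO enabled | L3 (Shared RAM) | 2 | 32 |
| Standalone test, element size = 32 bits, FIFO enabled | DDR2 | 2 | 32 |
| Standalone test, element size = 16 bits, FIFO disabled | L2 | N/A | 10.66 |
| Standalone test, element size = 16 bits, FIFO disabled | L3 (Shared RAM) | N/A | 10.66 |
| Standalone test, element size = 16 bits, FIFO disabled | DDR2 | N/A | 10.66 |
| Standalone test, element size = 16 bits, FIFO enabled | L2 | 2 | 16 |
| Standalone test, element size = 16 bits, FIFO enabled | L3 (Shared RAM) | 2 | 16 |
| Standalone test, element size = 16 bits, FIFO enabled | DDR2 | 2 | 16 |
| Standalone test, element size = 8 bits, FIFO disabled | L2 | N/A | 4.8 |
| Standalone test, element size = 8 bits, FIFO disabled | L3 (Shared RAM) | N/A | 4.8 |
| Standalone test, element size = 8 bits, FIFO disabled | DDR2 | N/A | 4.8 |
| Standalone test, element size = 8 bits, FIFO enabled | L2 | 2 | 8 |
| Standalone test, element size = 8 bits, FIFO enabled | L3 (Shared RAM) | 2 | 8 |
| Standalone test, element size = 8 bits, FIFO enabled | DDR2 | 2 | 8 |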
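Because the throughput figures above are per serializer, a planned stream can be checked against them one serializer at a time. The following C sketch computes the bit rate a single serializer must carry and compares it with a measured maximum; the helper names are hypothetical, and treating the table's element size as the slot size is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit rate (bps) one serializer must carry: frame rate x slots x bits per slot. */
static uint64_t serializer_bit_rate_bps(uint32_t frame_rate_hz, uint32_t slots,
                                        uint32_t slot_bits)
{
    return (uint64_t)frame_rate_hz * slots * slot_bits;
}

/* True if the required rate does not exceed a measured per-serializer maximum (Mbps). */
static bool fits_per_serializer_max(uint64_t required_bps, double max_mbps)
{
    return (double)required_bps <= max_mbps * 1.0e6;
}
```

For example, 16 slots of 16 bits at 48 kHz require 12.288 Mbps on one serializer. That exceeds the 10.66 Mbps measured for 16-bit elements with the FIFO disabled but fits within the 16 Mbps measured with the FIFO enabled in the same table.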
Competing Traffic, EDMA Background Activity Using Different TC
Summary of McASP Concurrent Data, 4 TX/RX Serializers, Different TC Used for Background Activity
NOTE: All of the test scenarios above include background activity consisting of continuous 4 KB data transfers.
Summary of McASP Concurrent Data, 8 TX/RX Serializers, Different TC Used for Background Activity
NOTE: All of the test scenarios above include background activity consisting of continuous 4 KB data transfers.
Competing Traffic, EDMA Background Activity Using Same TC
Summary of McASP Concurrent Data, 4 TX/RX Serializers, Same TC Used for Background Activity
NOTE: All of the test scenarios above include background activity consisting of continuous 4 KB data transfers.
Summary of McASP Concurrent Data, 8 TX/RX Serializers, Same TC Used for Background Activity
NOTE: All of the test scenarios above include background activity consisting of continuous 4 KB data transfers.
Factors Affecting McASP Throughput
The following table describes the main factors affecting McASP throughput and general recommendations for handling those factors.
| Factor | Impact | General Recommendation |
|---|---|---|
| SRC/DST Buffer Location | Different memories have different access latencies. The longer the access latency, the lower the sustainable McASP throughput. | Internal memory (L2 and Shared RAM) has lower access latencies than external memory (DDR). In general, keep McASP SRC/DST buffers in internal memory to meet McASP real-time requirements. |
| EDMA Queue/TC Assignment for McASP | Assigning McASP transmit and receive events to the same queue/TC might add delay in servicing individual events. | For general usage, assign McASP transmit and receive EDMA events to the same queue/TC to save EDMA queue/TC resources, because the performance impact is not significant. |
| McASP FIFO Use | Using the FIFO increases the McASP data rate and decreases the real-time burden on the EDMA. | Use the McASP FIFO whenever possible. Increase the ratio between W/RNUMEVT and W/RNUMDMA as much as possible to take full advantage of the FIFOs and, in turn, increase McASP performance. |
| Large, Parallel Data Transfers on the Same EDMA Queue/TC | Assigning large paging transfers to the same EDMA queue/TC as McASP transfers degrades McASP performance. | Move large paging transfers with non-real-time requirements to a different queue/TC, and give that TC a lower priority than the TC used for McASP transfers. |
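The queue/TC recommendations can be expressed as a check over a planned channel assignment. The following C sketch uses a hypothetical `edma_channel_plan` description rather than real EDMA3 register settings, and it assumes a convention in which a smaller priority value means a higher priority; confirm this convention against your device documentation before relying on it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one EDMA channel's queue/TC assignment. */
typedef struct {
    const char *name;
    unsigned    queue;        /* EDMA event queue index (maps to one TC) */
    bool        mcasp;        /* true for McASP AXEVT/AREVT channels */
    bool        large_paging; /* true for large, non-real-time transfers */
} edma_channel_plan;

/*
 * Hypothetical check (not a TI API): no large paging transfer shares a queue
 * with a McASP channel, and every queue carrying paging has a lower priority
 * than every queue carrying McASP traffic. Assumes queue_priority[q] uses a
 * "smaller value = higher priority" convention.
 */
static bool plan_follows_recommendations(const edma_channel_plan *ch, size_t n,
                                         const unsigned *queue_priority)
{
    for (size_t i = 0; i < n; ++i) {
        if (!ch[i].mcasp)
            continue;
        for (size_t j = 0; j < n; ++j) {
            if (!ch[j].large_paging)
                continue;
            if (ch[j].queue == ch[i].queue)
                return false; /* paging shares the McASP queue/TC */
            if (queue_priority[ch[j].queue] <= queue_priority[ch[i].queue])
                return false; /* paging queue is not lower priority */
        }
    }
    return true;
}
```

McASP transmit and receive channels may share a queue, as recommended above; the check only rejects plans in which large paging transfers share that queue or run at equal or higher priority.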
McASP Throughput Presentation
The following presentation and backup slides summarize the results of all throughput measurements conducted on the OMAP-L1x8/C674x/AM18xx class of devices.
Omapl1x8_c674x_am18xx_mcaspThroughput.zip
Omapl1x8_c674x_am18xx_mcaspThroughput_backupSlides_Part1.zip
Omapl1x8_c674x_am18xx_mcaspThroughput_backupSlides_Part2a.zip
Omapl1x8_c674x_am18xx_mcaspThroughput_backupSlides_Part2b.zip
Omapl1x8_c674x_am18xx_mcaspThroughput_backupSlides_Part3.zip
Comments
Comments on OMAP-L1x/C674x/AM1x Multichannel Audio Serial Port (McASP) Throughput and Optimization Techniques

Would it be possible to obtain the source code for these tests? Thanx.
--Ockie 07:57, 16 February 2011 (CST)