Tuning Audio Latency on C6747
From Texas Instruments Embedded Processors Wiki
Introduction
Latency is an important consideration in any DSP audio system. This article describes techniques for tuning your system to balance the competing requirements of responsiveness and general processing efficiency. We'll approach the problem from three starting points:
- An example application that comes with the C6747 PSP driver package
- An audio application generated by the C6Flo graphical development tool
- A low-level audio application that controls the audio and EDMA devices without any intermediate driver
PSP Audio Example Application
For more information on how to download and install the PSP drivers, please refer to the C6747 getting started guide. You should be able to find the PSP audio example application in a path similar to this:
C:\OMAPL137_dsp_1_XX_XX_XX\pspdrivers_01_XX_XX_XX\packages\ti\pspiom\examples\evm6747\audio\build\audioSample.pjt
All of the changes we'll make are in a single source file, audioSample_io.c. Take a moment to familiarize yourself with the contents of this file. The only processing done by this application is a simple memcpy call, which could easily be replaced by "real" processing between the input and output buffers.
Cutting Out Extra Audio Buffers
Before we do anything else, make sure that the macro NUM_BUFS is set to 2.
#define NUM_BUFS 2 /* Num Bufs to be issued and reclaimed */
The default value, 4, adds a lot of extra latency without any real benefit. Using 2 buffers is the classic "ping pong" arrangement, where one buffer is filled/consumed by the audio peripheral while the second is being processed. In this application, the audio driver and EDMA handle the first buffer without any CPU loading.
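The ping-pong arrangement can be sketched in portable C. This is only an illustration of the ownership swap, not the driver's implementation: the PingPong type and both helper functions are hypothetical names, and in the real application the swap happens implicitly through the SIO issue/reclaim calls.

```c
#define NUM_BUFS 2   /* ping and pong */
#define BUFLEN   8   /* samples per buffer; kept tiny for illustration */

/* Hypothetical stand-in for a double-buffered audio stream. */
typedef struct {
    int bufs[NUM_BUFS][BUFLEN];
    int io_index;    /* buffer currently owned by the peripheral/EDMA */
} PingPong;

/* Buffer the CPU may process while the peripheral uses the other one. */
static int *pingpong_cpu_buffer(PingPong *pp)
{
    return pp->bufs[pp->io_index ^ 1];
}

/* Called once per buffer period: hand the processed buffer to the
   peripheral and take ownership of the one it just finished. */
static void pingpong_swap(PingPong *pp)
{
    pp->io_index ^= 1;
}
```

With only two buffers, every sample waits at most one buffer period on each side of the processing step. Each additional buffer queued in the driver adds roughly one more buffer period of delay.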
Changing the Audio Buffer Size
The most basic tradeoff between latency and performance is the selection of an audio buffer size. In this application, the buffer length is controlled by another macro, BUFLEN.
#define BUFLEN 2560 /* number of samples in the frame */
#define BUFALIGN 128 /* alignment of buffer for L2 cache */
Note: Do not change BUFALIGN! 128-byte alignment is required by the audio driver for any buffer size.
Since we're operating at a sampling frequency of 44.1 kHz, the default buffer length works out to over 58 milliseconds. This is a pretty long time in audio terms. Our overall latency can be described with a simple formula:
L = 2 * T + d
Where L is our audio latency, T is our buffer period, and d is some additional delay introduced by the audio hardware itself (codec chip, etc.). Assuming zero hardware delay (unlikely), our large audio buffers mean we'll see lag of at least 116 ms. This is very noticeable, even for basic applications. The following table shows the actual audio latency you can achieve with this application simply by reducing the buffer length to smaller values.
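The formula is easy to evaluate for any candidate buffer length. The helpers below are a minimal sketch; the function names are illustrative and not part of the PSP package, and the hardware delay d must be supplied by the caller because the article does not give a value for it.

```c
/* Buffer period T in milliseconds for `samples` samples at `fs_hz`. */
static double buffer_period_ms(unsigned samples, double fs_hz)
{
    return 1000.0 * (double)samples / fs_hz;
}

/* L = 2 * T + d, with the hardware delay d (in ms) supplied by the caller. */
static double latency_ms(unsigned samples, double fs_hz, double hw_delay_ms)
{
    return 2.0 * buffer_period_ms(samples, fs_hz) + hw_delay_ms;
}
```

For the default BUFLEN of 2560 at 44.1 kHz, buffer_period_ms returns about 58.05 ms, and latency_ms with d = 0 returns about 116.1 ms, matching the figures quoted above.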
| Buffer Length | Latency (ms) | CPU Load |
|---|---|---|
| 512 | 25.8 | 1.86% |
| 256 | 14.2 | 2.46% |
| 128 | 8.24 | 4.03% |
| 64 | 5.4 | 8.66% |
| 32 | 3.88 | 13.3% |
| 16 | 3.2 | 25.3% |
Note that the smallest buffer size listed is 16 samples. This is the smallest buffer size allowed by our audio driver. The table also includes the CPU load required just to maintain the audio stream at each buffer size. The smaller your buffers become, the more often the DSP needs to call the SIO APIs that keep the buffers loaded.
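To see how quickly the servicing rate grows, the sketch below computes how many buffers per second must be handled at a given buffer length, and converts a measured CPU load into an average CPU time per buffer. Both function names are illustrative, and the second function simply divides the table's load figure by the buffer rate; it assumes the load is spread evenly over the buffers.

```c
/* Number of buffers that must be serviced per second (per stream). */
static double buffers_per_second(double fs_hz, unsigned samples)
{
    return fs_hz / (double)samples;
}

/* Average CPU time per buffer in microseconds, given a measured CPU
   load expressed as a fraction (e.g. 0.253 for 25.3%). */
static double cpu_time_per_buffer_us(double load_fraction, double fs_hz,
                                     unsigned samples)
{
    return 1.0e6 * load_fraction / buffers_per_second(fs_hz, samples);
}
```

At 44.1 kHz, 16-sample buffers have to be serviced about 2756 times per second, compared with about 86 times per second for 512-sample buffers. Using the loads from the table, that works out to roughly 92 microseconds of CPU time per 16-sample buffer.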
Disabling the McASP Hardware FIFO
There's one additional change we can make to reduce our latency. The audio data enters our application via the McASP peripheral. It collects in a FIFO before being copied to normal memory via EDMA. If we disable the FIFO, audio samples will be copied directly to memory instead. This can slightly improve our latency at the cost of smaller, more frequent EDMA transactions. You can disable the FIFO usage very easily by changing one element in each of the McASP input/output channel parameter structs.
Mcasp_ChanParams mcasp_chanparam[Audio_NUM_CHANS]=
{
    {
        0x0001,
        {Mcasp_SerializerNum_0, },
        (Mcasp_HwSetupData *)&mcaspRcvSetup,
        TRUE,
        Mcasp_OpMode_TDM,
        Mcasp_WordLength_32,
        NULL,
        0,
        NULL,
        NULL,
        1,
        Mcasp_BufferFormat_INTERLEAVED,
        FALSE, /* McASP FIFO enable */
        TRUE
    },
    {
        0x0001,
        {Mcasp_SerializerNum_5,},
        (Mcasp_HwSetupData *)&mcaspXmtSetup,
        TRUE,
        Mcasp_OpMode_TDM,
        Mcasp_WordLength_32,
        NULL,
        0,
        NULL,
        NULL,
        1,
        Mcasp_BufferFormat_INTERLEAVED,
        FALSE, /* McASP FIFO enable */
        TRUE
    }
};
With this change, we can achieve the latency listed in the following table. (Note that CPU load is not affected by this change; only EDMA load.)
| Buffer Length | Latency (ms) | CPU Load |
|---|---|---|
| 512 | 24.2 | 1.86% |
| 256 | 12.6 | 2.46% |
| 128 | 6.78 | 4.03% |
| 64 | 3.88 | 8.66% |
| 32 | 2.40 | 13.3% |
| 16 | 1.72 | 25.3% |
C6Flo Audio Application
For more information on installing and using the C6Flo tool, please refer to the C6Flo main page. For this article, we'll start with the C6747 audio filter example application and cut out the processing blocks between the audio input and audio output blocks. The diagram should look like this before you generate your application code:
Note: You may want to "Save As..." before generating code to avoid overwriting the original diagram
We'll be making changes to two source files: c6747_audio_app_blocks.c and c6747_audio_app_threads.c.
Cutting Out Extra Audio Buffers
Similar to the PSP example application, our C6Flo application starts out using 4 buffers per audio driver handle (input and output). We'll need to change that to 2 buffers to get good audio latency in our system. Look for the following 6 lines in c6747_audio_app_blocks.c and change the number 4 to 2 in each one:
int ti_c6flo_evmc6747_audioin_v1_create(ti_c6flo_evmc6747_audioin_v1_hdl blockp)
{
    // ...
    sio_attrs.nbufs = 2; // was 4
    // ...
    for (i = 0; i < 2; i++) // was 4
    // ...
    for (i = 0; i < 2; i++) // was 4
    // ...
}

int ti_c6flo_evmc6747_audioout_v1_create(ti_c6flo_evmc6747_audioout_v1_hdl blockp)
{
    // ...
    sio_attrs.nbufs = 2; // was 4
    // ...
    for (i = 0; i < 2; i++) // was 4
    // ...
    for (i = 0; i < 2; i++) // was 4
    // ...
}
Changing the Audio Buffer Size
Changing buffer sizes in a C6Flo application can be done in the GUI by adjusting the buffer_size and buffer_length parameters. However, if we change the parameters and re-generate our application code, we'll overwrite the changes we just made to cut out our extra audio buffers. Fortunately, it's also easy to change these parameters in our C application code. Look for the following lines in c6747_audio_app_threads.c:
// Thread parameter structs
C6Flo_std_thread_obj thread0_obj =
{
    /* buffer size (bytes) = */ 1024,
    /* buffer length (elements) = */ 256,
    /* buffer alignment (bytes) = */ 128,
    /* thread index = */ 0
};
Note that buffer_size is equal to four times buffer_length because we're representing our data as single-precision floating point (i.e. 4 bytes per sample). When changing one value, be sure to change the other so that they maintain this ratio.
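One way to avoid the two values drifting apart is to derive the byte size from the element count. The helper below is a small sketch of that idea; BYTES_PER_SAMPLE and buffer_size_bytes are hypothetical names that do not appear in the generated C6Flo code, and the 4-byte sample width is taken from the note above.

```c
#include <stddef.h>

/* Assumption: one 4-byte (single-precision) sample per buffer element. */
#define BYTES_PER_SAMPLE 4u

/* Buffer size in bytes that matches a given buffer length in elements. */
static size_t buffer_size_bytes(size_t buffer_length)
{
    return buffer_length * BYTES_PER_SAMPLE;
}
```

For example, a buffer_length of 256 gives a buffer_size of 1024, which matches the generated struct, and a buffer_length of 64 would need a buffer_size of 256.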
This application uses a slightly different McASP configuration and does a little more work between audio input and audio output, so our latency and CPU load look a little different than they did for the PSP example application. The following table summarizes the performance of our C6Flo application for different buffer sizes.
| Buffer Length | Latency (ms) | CPU Load |
|---|---|---|
| 512 | 28.2 | 2.82% |
| 256 | 16.6 | 4.24% |
| 128 | 10.8 | 6.56% |
| 64 | 7.16 | 11.2% |
| 32 | 4.34 | 19.4% |
| 16 | 2.98 | 36.9% |
Priming the Audio Driver
Due to our McASP configuration, we can't just turn off the FIFO for this application. However, there is one more trick we can use to lower our latency. The C6Flo application code as generated follows a somewhat convoluted process to "prime" the audio input and output buffers at the start of the application:
1. Allocate audio input buffers
2. Create audio input driver handle
3. Prime audio input driver handle
4. Allocate audio output buffers
5. Create audio output driver handle
6. Prime audio output driver handle
The separation between steps 3 and 6 introduces unnecessary latency to our application, so it's in our best interest to move them closer together. Fortunately, there's a pretty easy way to do this thanks to the structure of our application. All C6Flo-generated applications begin by calling "create" functions for each block, followed by "init" functions for each block. Currently, steps 1-6 take place in the create functions. We can move steps 3 and 6 to the init functions with a little work (and without breaking anything). Here's what you should end up with when you're done:
int ti_c6flo_evmc6747_audioin_v1_create(ti_c6flo_evmc6747_audioin_v1_hdl blockp)
{
    SIO_Attrs sio_attrs;
    int size, count, align, status, i;

    // get max buffer length, alignment
    count = blockp->std.thread->buffer_length;
    align = blockp->std.thread->buffer_align;

    // internal buffers must be big enough to hold pairs of 16-bit values (i.e. count * 2 * 2 bytes)
    size = count << 2;

    // initialize audio driver
    audio_evm_init();

    // create driver handle
    sio_attrs = SIO_ATTRS;
    sio_attrs.nbufs = 2;
    sio_attrs.align = align;
    sio_attrs.model = SIO_ISSUERECLAIM;
    blockp->stream_hdl = SIO_create("/dioAudioIN", SIO_INPUT, size, &sio_attrs);
    if (blockp->stream_hdl == NULL)
    {
        LOG_printf(&trace,"[audioin]: could not create driver handle");
        return C6Flo_EGENERIC;
    }

    // allocate internal buffers
    for (i = 0; i < 2; i++)
    {
        C6Flo_MEM_alloc(&(blockp->buffers[i]), C6Flo_MEM_NORMAL, C6Flo_MEM_PERSIST, size, align, &blockp->std);
        if (blockp->buffers[i] == NULL)
        {
            LOG_printf(&trace,"[audioin]: buffer allocation error");
            return C6Flo_EALLOC;
        }
    }

    return C6Flo_EOK;
}

int ti_c6flo_evmc6747_audioout_v1_create(ti_c6flo_evmc6747_audioout_v1_hdl blockp)
{
    SIO_Attrs sio_attrs;
    int size, count, align, status, i;

    // get max buffer length, alignment
    count = blockp->std.thread->buffer_length;
    align = blockp->std.thread->buffer_align;

    // internal buffers must be big enough to hold pairs of 16-bit values (i.e. count * 2 * 2 bytes)
    size = count << 2;

    // initialize audio driver
    audio_evm_init();

    // create driver handle
    sio_attrs = SIO_ATTRS;
    sio_attrs.nbufs = 2;
    sio_attrs.align = align;
    sio_attrs.model = SIO_ISSUERECLAIM;
    blockp->stream_hdl = SIO_create("/dioAudioOUT", SIO_OUTPUT, size, &sio_attrs);
    if (blockp->stream_hdl == NULL)
    {
        LOG_printf(&trace,"[audioout]: could not create driver handle");
        return C6Flo_EGENERIC;
    }

    // allocate (and clear) internal buffers
    for (i = 0; i < 2; i++)
    {
        C6Flo_MEM_alloc(&(blockp->buffers[i]), C6Flo_MEM_NORMAL, C6Flo_MEM_PERSIST, size, align, &blockp->std);
        // check the allocation before clearing the buffer
        if (blockp->buffers[i] == NULL)
        {
            LOG_printf(&trace,"[audioout]: buffer allocation error");
            return C6Flo_EALLOC;
        }
        memset(blockp->buffers[i], 0, size);
    }

    return C6Flo_EOK;
}

int ti_c6flo_evmc6747_audioin_v1_init(ti_c6flo_evmc6747_audioin_v1_hdl blockp)
{
    int size, count, status, i;

    count = blockp->std.thread->buffer_length;
    size = count << 2;

    // prime driver (issue internal buffers)
    for (i = 0; i < 2; i++)
    {
        status = SIO_issue(blockp->stream_hdl, blockp->buffers[i], size, NULL);
        if (status != SYS_OK)
        {
            LOG_printf(&trace,"[audioin]: buffer issue error (prime)");
            return C6Flo_EALLOC;
        }
    }

    return C6Flo_EOK;
}

int ti_c6flo_evmc6747_audioout_v1_init(ti_c6flo_evmc6747_audioout_v1_hdl blockp)
{
    int size, count, status, i;

    count = blockp->std.thread->buffer_length;
    size = count << 2;

    // prime driver (issue internal buffers)
    for (i = 0; i < 2; i++)
    {
        status = SIO_issue(blockp->stream_hdl, blockp->buffers[i], size, NULL);
        if (status != SYS_OK)
        {
            LOG_printf(&trace,"[audioout]: buffer issue error (prime)");
            return C6Flo_EALLOC;
        }
    }

    return C6Flo_EOK;
}
This change will improve your latency to the values listed in the following table without affecting your CPU load at all.
| Buffer Length | Latency (ms) | CPU Load |
|---|---|---|
| 512 | 24.9 | 2.82% |
| 256 | 13.3 | 4.24% |
| 128 | 7.52 | 6.56% |
| 64 | 4.56 | 11.2% |
| 32 | 3.12 | 19.4% |
| 16 | 2.40 | 36.9% |
Low-Level Audio Application
Unlike the previous two examples, this application is not part of (or generated by) any standard software release from Texas Instruments. To get started, download the application source code from the following URL:
This package contains a CCS3.3 project file (*.pjt), a DSP/BIOS 5 configuration file (*.tcf), and several C source and header files. The application is self-contained; you don't need any other libraries or software packages to build and run.
Note: The following data reflects version 3 of the application.
Changing the Audio Buffer Size
This application is much closer to achieving optimal performance as-is, so there's little we can do to improve it. However, like all audio applications, our overall latency and CPU load will depend on our audio buffer size. The buffer size can be adjusted in this application using a macro near the top of the audio.c source file:
#define SAMPLES_PER_BUF 128
The following table lists latency and CPU load for several possible buffer lengths. Note that this application allows even smaller buffer sizes than the previous examples. The minimum sample count in those applications was set by the PSP driver. Since this application does not use the PSP driver, it does not share that limitation.
| Buffer Length | Latency (ms) | CPU Load |
|---|---|---|
| 512 | 22.2 | 0.82% |
| 256 | 11.5 | 0.91% |
| 128 | 6.20 | 1.07% |
| 64 | 3.52 | 1.40% |
| 32 | 2.20 | 2.05% |
| 16 | 1.52 | 3.36% |
| 8 | 1.20 | 5.88% |
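When choosing a value for a new design, it can help to work backwards from a latency budget. The sketch below applies the formula L = 2 * T + d from earlier in this article to find the largest power-of-two buffer length that fits a budget, since larger buffers cost less CPU. The function name and all parameters are hypothetical, and the hardware delay d is an input you would need to measure; the measured tables above remain the better guide for these specific applications.

```c
/* Returns the largest buffer length in {min_len, 2*min_len, 4*min_len, ...}
   not exceeding max_len for which 2 * T + d fits within budget_ms.
   Returns 0 if no candidate fits or if min_len is 0. */
static unsigned max_buffer_for_budget(double budget_ms, double fs_hz,
                                      double hw_delay_ms,
                                      unsigned min_len, unsigned max_len)
{
    unsigned len, best = 0;

    if (min_len == 0)
        return 0;

    for (len = min_len; len <= max_len; len *= 2) {
        /* L = 2 * T + d, with T = len / fs converted to milliseconds */
        double latency = 2.0 * 1000.0 * (double)len / fs_hz + hw_delay_ms;
        if (latency <= budget_ms)
            best = len;
        if (len > max_len / 2)
            break;  /* next doubling would exceed max_len (or overflow) */
    }
    return best;
}
```

For instance, with a 5 ms budget, a 44.1 kHz sample rate, and an assumed hardware delay of 1 ms, searching between 8 and 512 samples selects a buffer length of 64.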
Comments
Another improvement in order to decrease the audio latency could be the increase of the default AIC sample frequency (44.1 kHz). If we set the AIC to its maximum rate of 96 kHz we'll obtain a Latency of 11ms for a 512 Buffer Length.
--Gschelotto 07:03, 26 January 2012 (CST)