Codec Engine Profiling

From Texas Instruments Embedded Processors Wiki

Introduction

System integrators often need to profile the CPU activity in their applications. In ARM+DSP environments especially, it can be quite difficult to pinpoint, for example, the reason for a dropped frame. Many developers do not even have access to JTAG emulators or DSP IDEs such as Code Composer Studio.

Codec Engine does, however, contain hooks for collecting key profiling data. Often this can be done from the command line, i.e. without rebuilding either the DSP or the GPP code.

This topic attempts to collect all such Codec Engine profiling techniques under one roof.

We encourage contributions since this is an active area: everybody has their own profiling 'tricks'.

Profiling Techniques

Various techniques are described here as sub-topics, for example understanding the time-deltas reported by CE_DEBUG, pinpointing the duration of a process() call, and analyzing cache impact.

Profiling DSP-side process() calls in an ARM+DSP environment

A typical question is: "How do I benchmark my codec process() call?"

The easiest way is to simply leverage CE_DEBUG.

For example, on the DM6446EVM platform we ran the following command: -

CE_DEBUG=1 CE_DSP0TRACE="CV=5;GT_time=2" ./decode -v davincieffect_ntsc.mpeg4 -t 3 

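Before we look at the output, let's understand the command details: CE_DEBUG=1 turns on Codec Engine tracing, and CE_DSP0TRACE sets the trace mask used on the DSP side. Within that mask, CV=5 enables class 5 trace for the CV module (the module that prints the VISA_enter/VISA_exit lines) and GT_time=2 selects the microsecond time-delta format of the timestamps.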

Now let's look at a snapshot of the output trace: -

[DSP] @+003,829us: [+5 T:0x8fa52584] CV - VISA_enter(visa=0x8fa51fb0): algHandle = 0x8fa51fe0
[DSP] @+000,023us: [+5 T:0x8fa52584] CV - VISA_exit(visa=0x8fa51fb0): algHandle = 0x8fa51fe0
[DSP] @+003,508us: [+5 T:0x8fa52584] CV - VISA_enter(visa=0x8fa51fb0): algHandle = 0x8fa51fe0
[DSP] @+016,335us: [+5 T:0x8fa52584] CV - VISA_exit(visa=0x8fa51fb0): algHandle = 0x8fa51fe0

These VISA_enter/VISA_exit calls bracket the control() and process() calls of the algorithm interface. Control calls are typically short, so in the above example it is clear that the second VISA_exit timestamp (16,335 us) represents the enter -> process() -> exit duration, while the first pair (23 us) brackets a control() call.

So...is the benchmark time for the MPEG4 decode processing 16335 microseconds? Unfortunately there are a couple more steps to get the absolute time: -

Since Codec Engine uses a standard tracing format you could also script this calculation.
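As a starting point for such a script, here is a minimal sketch in C that pulls the raw time-delta out of each VISA_exit trace line. It assumes the line layout shown in the snapshot above (an "@+ddd,dddus" field with comma-grouped digits); the function names parse_delta_us and visa_exit_delta_us are hypothetical helpers, not part of Codec Engine. It only extracts the raw delta, so any further corrections needed to reach the absolute time still have to be applied to the result.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse the "@+ddd,dddus" time-delta field of one trace line.
 * Returns 1 and stores the delta in microseconds on success, 0 otherwise.
 * (Hypothetical helper; the field layout is taken from the trace above.) */
int parse_delta_us(const char *line, long *delta_us)
{
    const char *p;
    long value = 0;
    int digits = 0;

    assert(line != NULL && delta_us != NULL);

    p = strstr(line, "@+");
    if (p == NULL) {
        return 0;
    }
    /* Accumulate digits, skipping the comma used as a thousands separator. */
    for (p += 2; isdigit((unsigned char)*p) || *p == ','; p++) {
        if (*p != ',') {
            value = value * 10 + (*p - '0');
            digits++;
        }
    }
    /* The field must contain at least one digit and end in "us". */
    if (digits == 0 || strncmp(p, "us", 2) != 0) {
        return 0;
    }
    *delta_us = value;
    return 1;
}

/* For a VISA_exit line, the delta is the time since the previous trace line,
 * which is the matching VISA_enter as long as nothing else was printed in
 * between. Returns 0 for any line that is not a VISA_exit line. */
int visa_exit_delta_us(const char *line, long *delta_us)
{
    if (strstr(line, "VISA_exit(") == NULL) {
        return 0;
    }
    return parse_delta_us(line, delta_us);
}
```

A host-side tool could call visa_exit_delta_us() on each line read with fgets() from a captured log and collect the deltas, for instance to compute an average over many frames.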

Instrumenting your own code with GT trace profiling

Let's say you want more detailed instrumentation, or you just want to verify that the numbers in the above section are accurate. How can you add your own benchmarking and make it appear in the CE_DEBUG trace log? Step 1 is to read up on "Overriding stubs and skeletons" and "Printing in stubs and skeletons". The skeleton (the DSP-side wrapper for VISA calls) is what we care about in this ARM+DSP profiling scenario.

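// benchmarking the process() DSP call from within the skeleton
// (t0, t1 and cpuCycles are assumed to be declared as LgUns, timeAbsolute as an integer)
t0 = CLK_gethtime();    /* high-resolution timer count before the call */
 
VISA_enter((VISA_Handle)handle);
retVal = fxns->process(alg, inBufs, outBufs, inArgs, outArgs);
VISA_exit((VISA_Handle)handle);
 
t1 = CLK_gethtime();    /* high-resolution timer count after the call */
/* convert high-resolution timer counts to CPU cycles */
cpuCycles = (LgUns)((t1 - t0) * CLK_cpuCyclesPerHtime());
/* calculate absolute time in milliseconds: GBL_getFrequency() returns kHz, so cycles / kHz = msec */
timeAbsolute = cpuCycles / GBL_getFrequency();
 
GT_1trace(CURTRACE, GT_5CLASS, "BENCHMARK> process() call : delta=%ld\n", (t1-t0));
GT_1trace(CURTRACE, GT_5CLASS, "BENCHMARK> process() call : abs time in msec=%d\n", timeAbsolute);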

Again, to understand the details: -

To make this specific GT_1trace output appear in the trace log, you can run: -

CE_DEBUG=1 CE_DSP0TRACE="CV=5;ti.sdo.ce.video.VIDDEC=5;GT_time=2" ./decode -v davincieffect_ntsc.mpeg4 -t 2

The addition to the CE_DSP0TRACE mask (ti.sdo.ce.video.VIDDEC=5) allows us to see the output of our additional GT_1trace instrumentation, resulting in: -

[DSP] @+004,348us: [+5 T:0x8fa52584] CV - VISA_enter(visa=0x8fa51fb0): algHandle = 0x8fa51fe0
[DSP] @+000,023us: [+5 T:0x8fa52584] CV - VISA_exit(visa=0x8fa51fb0): algHandle = 0x8fa51fe0
[DSP] @+003,426us: [+5 T:0x8fa52584] CV - VISA_enter(visa=0x8fa51fb0): algHandle = 0x8fa51fe0
[DSP] @+016,383us: [+5 T:0x8fa52584] CV - VISA_exit(visa=0x8fa51fb0): algHandle = 0x8fa51fe0
[DSP] @+000,086us: [+5 T:0x8fa52584] ti.sdo.ce.video.VIDDEC - BENCHMARK> process() call : delta=4236411
[DSP] @+000,001us: [+5 T:0x8fa52584] ti.sdo.ce.video.VIDDEC - BENCHMARK> process() call : abs time in msec=7

As per the calculations in the previous section, you can see that the abs time in msec=7 matches up nicely (7040 microseconds above). Hence we have now validated the Codec Engine DSP tracing benchmarks and shown how to add arbitrary benchmark segments.
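To make the unit handling in the instrumentation explicit, the conversion can be sketched as a stand-alone C function that runs on any host. The function name htime_delta_to_ms is hypothetical, and the example values in the usage note (1.0 CPU cycles per timer count and a 594000 kHz CPU clock) are assumptions chosen for illustration, not values reported by the trace. The sketch relies on the frequency being expressed in kHz, as returned by GBL_getFrequency(), so that cycles divided by kHz yields milliseconds.

```c
#include <assert.h>

/* Convert a high-resolution timer delta to whole milliseconds, mirroring
 * the arithmetic used in the instrumented skeleton:
 *   cycles = delta * cyclesPerHtime;  msec = cycles / frequencyInKHz.
 * (Hypothetical host-side helper; integer division truncates the result.) */
unsigned long htime_delta_to_ms(unsigned long htime_delta,
                                double cycles_per_htime,
                                unsigned long cpu_freq_khz)
{
    unsigned long cpu_cycles;

    assert(cpu_freq_khz != 0);

    /* Timer counts -> CPU cycles. */
    cpu_cycles = (unsigned long)(htime_delta * cycles_per_htime);

    /* Cycles / (cycles per millisecond) -> milliseconds, truncated. */
    return cpu_cycles / cpu_freq_khz;
}
```

With the delta of 4236411 from the trace and the assumed values above, htime_delta_to_ms(4236411UL, 1.0, 594000UL) returns 7. Because of the integer division, sub-millisecond detail is lost; tracing cpuCycles itself, or scaling to microseconds before dividing, would keep more resolution.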
