
C6000 Compiler: Recommended Compiler Options

From Texas Instruments Wiki

Introduction

This article discusses the best C/C++ compiler options to use for performance on the C6000, though there are occasional nods to compiling for smaller code size.

Definitely use the following:

  • -o[2|3]. Optimization level. This option is critical for generating efficient code sequences. There are four levels of optimization: -o3, -o2, -o1, -o0. The main difference among these levels is how much of the source code is considered at once when making optimization decisions. At -o3, file-level optimization is performed. At -o2, function-level optimization is performed. At -o1, optimization is performed on blocks of C/C++ statements. At -o0, optimization is performed at the single statement level. Note that -o defaults to -o2.

By default, the -o switch optimizes for performance. This can increase code size. If code size is an issue, do not reduce the level of optimization. Instead, use the -ms switch to adjust the optimization goal (performance versus code size).

When safe, consider using the following:

  • -mt. Assume no pointer-based parameter writes to a memory location that is read by any other pointer-based parameter to the same function. This option is generally safe except for in-place transforms (where modified data is written back to the same memory location from which it was initially read). Most users avoid in-place transforms for performance reasons.

For example, consider the following function:

extern int myglobal[];  /* defined elsewhere */

void selective_copy(int *input, int *output, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (myglobal[i]) output[i] = input[i];
}

-mt is safe when the memory ranges pointed to by "input" and "output" do not overlap.

Be aware of the limitations of -mt. This option applies only to pointer-based function parameters.

  • It says nothing about the relationship between parameters and other pointers accessed in the function (for example, "myglobal" and "output").
  • It says nothing about non-parameter pointers used in the function.
  • It says nothing about pointers that are members of structures, even when the structures are parameters.
  • It says nothing about pointers that are dereferenced via multiple levels of indirection.

To overcome these limitations, use the Restrict Type Qualifier.

In addition, -mt is a broad assertion: it applies to all of the code compiled with the option, whether that is a single file or the entire project. Use the Restrict Type Qualifier to make the same assertion with more precision.

  • -mh<num>. Permit the compiler to fetch (but not store) array elements beyond either end of an array by <num> bytes. This option (known as speculative loads) provides the compiler with extra flexibility when scheduling loops. It can lead to better performance, especially for "while" loops. It can lead to smaller code size for both "while" loops and "for" loops. If -mh is used without <num>, there is no limit to the number of bytes read past either end of the arrays. The software-pipelined loop information in the compiler-generated assembly file notes when adding -mh<num> (or using -mh with a larger value) might improve performance or code size. For example, suppose a (function containing a) loop is compiled without -mh or with -mh<num> where <num> is less than 56. The compiler might output a message similar to:

;*      Minimum required memory pad : 0 bytes
;*
;*      For further improvement on this loop, try option -mh56

This message communicates that currently the compiler is fetching 0 bytes beyond the end (or beginning) of any array. However, if the loop is rebuilt with -mh<num> where <num> is at least 56, there might be better performance and/or smaller code size.

When using this option, ensure that there is a buffer of <num> bytes on both ends of all sections that contain array data. This is the user's responsibility. Padding can be implemented by shrinking the associated memory region in the linker command file. For example, suppose the original memory region is defined as:

MEMORY {
    myregion: origin = 1000, length = 4000
}

If the goal is to pad the beginning and end of the region by 56 bytes, the region must be shrunk by 2 * 56. The new origin is 1000 + 56 = 1056 and the new length is 4000 - 2*56 = 3888:

MEMORY {
    /* pad (reserved):  origin = 1000, length = 56   */
    myregion:           origin = 1056, length = 3888
    /* pad (reserved):  origin = 4944, length = 56   */
}

The comments provide a reminder not to put other array data in those regions. Alternatively, one can use other memory areas (code or independent data) as pad regions, provided there is no conflict with EDMA transfers and/or cache-based operations.

If the source code contains many functions that are never executed, consider using:

  • -mo. Place each function in its own input (sub-)section. Normally, input sections contain multiple functions. By default, all code (that is, all processor instructions) is placed into an input section called ".text". The linker then groups input sections into output sections (various ranges of memory) as defined by the linker command file. If all functions in an input section are never executed (that is, if an input section contains only dead code), then the input section is omitted from the executable.

When using the -mo option, each function is put into its own sub-section; for example, the function dotp() is put into section ".text:_dotp". Because there is only one function per input section, the linker can be more aggressive with respect to the removal of functions that are never executed. This can reduce the memory footprint of the resulting executable and, hence, reduce memory cycles as well.

This benefit comes at a cost. On the C6000, each input section must be aligned on a 32-byte boundary. The more input sections there are, the more space is potentially wasted on alignment. Hence, the benefit of this option depends on the percentage of dead source code and the original grouping of functions into input sections. Thus, this option may improve the performance and/or code size of some applications while hurting others. However, the differences can be significant, so give it a try.

If code size is a concern, consider using the following:

  • -ms[0-3]. Adjust the optimization goal. Higher values increasingly favor code size over performance. Recommended for use in conjunction with -o2 or -o3. Try -ms0 or -ms1 with performance-critical code. Consider -ms2 or -ms3 for seldom-executed code (such as initialization routines). Note that -ms defaults to -ms0.

In CGT >= 6.1, also consider the -mf[0-5] option, which provides more granularity in the code size versus performance tradeoff. See -mf_compiler_option for details.

Do not use the following:

  • -g. Support full symbolic debug. Great for debugging. Do not use in production code. -g inhibits code reordering across source line boundaries and limits optimizations around function boundaries. This results in less parallelism, more NOPs, and generally less efficient schedules. It can cause a 30-50% performance degradation for control code, and generally somewhat less, but still significant, degradation for performance-critical code. Moreover, beginning with CCStudio 3.0 (C6000 compiler version 5.0), basic function-level profiling support is provided by default.
  • -gp. Provide support for function-level profiling. Obsolete. Provided by default.
  • -ss. Interlist source code into the assembly file. As with -g, this option can negatively impact performance.
  • -ml3. Compile all calls as far calls. Obsolete. Beginning with CCStudio 3.0 (C6000 compiler version 5.0), the linker automatically fixes up near calls that do not reach by using trampolines. In most cases, few calls need trampolines, so removing -ml3 usually makes code a few percent smaller and faster. By default, scalar data (pointers, integers, etc.) is near and aggregates (arrays, structs) are far. This default works well for most applications.
  • -mu. Turn off software pipelining, which is a key optimization for achieving good performance on the C6000 processor. Great for debugging. Do not use in production code.

Options that reduce tuning time by providing additional analysis information, while having no effect on performance or code size:

  • -s -o[2|3]. This option is incredibly helpful in understanding the compiler-generated assembly. Output a copy of what the source code looks like after high-level optimization. This output, known as optimizer comments, looks much like the original C/C++, except that all inlining, transformations, and other optimizations from this phase have been applied. Optimizer comments are interlisted with assembly code in the assembly file. The assembly file name is the same as the C/C++ source file name with the extension changed to .asm.

  • -mw. Output extra information about software-pipelined loops. As with -s, this information appears as comments in the compiler-generated assembly file.
  • -on2 -o3. Create a .nfo file with the same base name as the .obj file. This file contains summary information regarding the high-level optimizations that have been applied, as well as advice.

Reference

This article is derived from the application note Hand Tuning Loops and Control Code on the TMS320C6000.


