-mf compiler option
From Texas Instruments Embedded Processors Wiki
This discussion applies to the C6000 compiler.
Compiler options are generally documented in the Optimizing Compiler User's Guide (SPRU187).
With the release of Code Generation Tools 6.1.0 via Update Advisor, a new compiler option, -mf, was introduced.
At the time of writing, it is not documented in SPRU187o or in the online help (check for newer revisions, which may have been published after this article was written).
However, because it appears prominently in the "Basic" category of the project build options in Code Composer Studio, this article summarizes the available information about the new option.
-mf[=0-5] Optimize for speed (Default:4)
How -ms (--opt_for_space) maps to -mf (--opt_for_speed)
The -ms and -mf options map roughly onto each other, in reverse order. Here is the mapping:
- -mf0 is equivalent to -ms3
- -mf1 is equivalent to -ms2
- -mf2 is equivalent to -ms1
- -mf3 is equivalent to -ms0
- -mf4 is equivalent to "no -ms specified on the command line"
- -mf5 is a new level that does not exist on the -ms scale. (As of late 2011, few additional optimizations are activated at the -mf5 level in 6.1.x, 7.2.x, or 7.3.x. The code generation tools team may add new optimizations at -mf5 in the future.)
-ms and -mf are not meant to be used together. If both are specified, only the last one specified is honored. -mf needs no enabling mechanism: no other option has to be specified to activate it.
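The mapping and the last-one-wins rule above can be restated as a small Python sketch. It is a hypothetical model for illustration only, not part of the TI tools: the names `ms_to_mf` and `effective_mf` are invented here, the sketch assumes the `=` is optional for both -ms and -mf, and it ignores the long `--opt_for_space`/`--opt_for_speed` spellings.

```python
import re

# Hypothetical model of the documented -ms/-mf relationship; not part of the TI tools.
# Assumption: only the short forms -msN, -ms=N, -mfN, -mf=N are recognized.
_OPT = re.compile(r"-m([sf])=?([0-9])")

DEFAULT_MF = 4  # no -ms/-mf on the command line behaves like -mf4


def ms_to_mf(ms_level: int) -> int:
    """Map an -ms level (0-3) to the roughly equivalent -mf level (3-0)."""
    if not 0 <= ms_level <= 3:
        raise ValueError("-ms accepts levels 0-3")
    return 3 - ms_level  # -ms0 -> -mf3, ..., -ms3 -> -mf0


def effective_mf(args: list[str]) -> int:
    """Return the -mf level in effect; the last -ms/-mf option given wins."""
    level = DEFAULT_MF
    for arg in args:
        m = _OPT.fullmatch(arg)
        if m is None:
            continue  # not a size/speed option (e.g. -o3)
        kind, value = m.group(1), int(m.group(2))
        if kind == "s":
            level = ms_to_mf(value)
        elif value <= 5:
            level = value
        else:
            raise ValueError("-mf accepts levels 0-5")
    return level
```

For example, `effective_mf(["-o3", "-ms0", "-mf5"])` returns 5, because -mf5 is the last size/speed option and -o3 is ignored as an independent flag.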
The -mf option is not available in versions 6.0.x or earlier.
At present, there are no plans to deprecate -ms#.
-mf scale
The advantage of the -mf# scale is that it is more straightforward, with a clear risk/reward profile. The scale is split roughly at the midpoint: -mf[0-2] optimizes for code size and -mf[3-5] optimizes for performance/speed. The default is -mf4. Remember that -mf is completely independent of the -o flag.
- -mf0 optimizes for code size, with a HIGH RISK that performance/speed is very adversely impacted.
- -mf1 optimizes for code size, with a MEDIUM RISK that performance/speed is adversely impacted.
- -mf2 optimizes for code size, with a LOW RISK that performance/speed is adversely impacted.
- -mf3 optimizes for performance/speed, with a LOW RISK that code size is adversely impacted.
- -mf4 optimizes for performance/speed, with a MEDIUM RISK that code size is adversely impacted.
- -mf5 optimizes for performance/speed, with a HIGH RISK that code size is very adversely impacted.
Using this scale, users can specify exactly what level of risk they are willing to accept, and the compiler activates the set of optimizations appropriate to the selected risk profile. Note that "speed" here does NOT mean compile speed but the application's overall CPU cycle performance.
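The symmetric structure of the scale (size-oriented below the midpoint, speed-oriented above it, with risk rising toward either end) fits in a few lines. This is an illustrative Python sketch: `Profile` and `mf_profile` are hypothetical names, and the strings only restate the list above; they are not values reported by the compiler.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    favors: str   # what this -mf level optimizes for
    risk: str     # how likely the other metric is to suffer
    at_risk: str  # the metric that may suffer


# Risk grows toward both ends of the scale; the tuple index is the -mf level.
_RISK = ("HIGH", "MEDIUM", "LOW", "LOW", "MEDIUM", "HIGH")


def mf_profile(level: int) -> Profile:
    """Return the documented risk/reward profile for an -mf level (0-5)."""
    if not 0 <= level <= 5:
        raise ValueError("-mf accepts levels 0-5")
    if level <= 2:  # lower half of the scale: optimize for code size
        return Profile("code size", _RISK[level], "performance/speed")
    # upper half of the scale: optimize for performance/speed
    return Profile("performance/speed", _RISK[level], "code size")
```

For the default level, `mf_profile(4)` returns `Profile(favors='performance/speed', risk='MEDIUM', at_risk='code size')`.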
Because of cache constraints, it may be advisable in some cases to reduce code size in exchange for better memory access times and thus better overall application performance. For this reason, the general recommendation is to use -ms0 (or -mf3) together with -o3.
How does the use of a -mf switch affect my code?
Most compiler optimizations on C6000 involve a tradeoff between code size and performance.
For all C6000 devices, the following optimizations are varied according to the -mf setting:
- Loop unrolling: The amount, location, and type of loop unrolling are varied
- Procedure inlining: The amount and location of inlining are varied
- Software pipelining: Software pipelining is turned off at -mf0 and -mf1, except where the loop buffer (SPLOOP) can be used; this exception applies to C64x+/C674x/C66x only
- Cloning of certain loops to facilitate generation of a safe software-pipelined loop
- Speculation of instructions out of blocks to improve if-conversion
- Interblock instruction scheduling: Where and how it is performed (this optimization may move and copy instructions from one "block" to another to increase performance)
- Collapsing of prolog and epilog stages of a software-pipelined loop (collapsing that could decrease performance is allowed at -mf3 and lower)
- Other items that have less of an impact on the code size/performance tradeoff
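Two of the rules in this list are stated precisely enough to write down as predicates. The Python sketch below is hypothetical: the function names and the `sploop_usable` flag are invented for illustration, and it models only the documented thresholds, not the compiler's actual decision logic, which weighs many other factors.

```python
def _check_mf(level: int) -> None:
    # -mf accepts levels 0-5
    if not 0 <= level <= 5:
        raise ValueError("-mf accepts levels 0-5")


def software_pipelining_allowed(mf_level: int, sploop_usable: bool) -> bool:
    """Software pipelining is off at -mf0/-mf1 unless the loop can use SPLOOP.

    sploop_usable (hypothetical flag) should be True only for a loop that can
    use the loop buffer on one of the device families that have it.
    """
    _check_mf(mf_level)
    return mf_level >= 2 or sploop_usable


def perf_reducing_collapse_allowed(mf_level: int) -> bool:
    """Prolog/epilog collapsing that could cost cycles is allowed at -mf3 and lower."""
    _check_mf(mf_level)
    return mf_level <= 3
```

Note that the two thresholds differ: software pipelining is restricted only at the two most size-oriented levels, while performance-reducing collapsing stays allowed up to -mf3.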
For C64x+/C674x/C66x devices, the compiler additionally varies the following according to the -mf setting:
- Register allocation: Selected operands of "compressible" instructions are biased toward, or restricted to, particular registers so that those instructions can be encoded as 16-bit instructions.
- Frame code: When code size is more important, different instructions are used to take advantage of 16-bit instructions. RTS functions may be called to save/restore save-on-entry registers and to increment/decrement the stack pointer.
- CALLP: When not compiling for all-out performance, the CALLP instruction will be used exclusively for (near) function calls.
- Instruction selection
- Certain compressible instructions are allowed only on certain functional units, to aid compression
- If-conversion: Whether and where if-conversion is performed
- Some very specific instruction transformations that involve small sequences of very specific instructions
These lists are not exhaustive, but they cover most of the major behaviors that change with the -mf setting.
