ARM compiler optimizations
From Texas Instruments Embedded Processors Wiki
The TI ARM compiler has been optimized for use with our Cortex microcontrollers. Optimizations have been developed and tuned using a wide variety of customer benchmarks and code. Key characteristics of these benchmarks include:
- Dominated by control code (i.e., code consisting mostly of function calls and conditional branches, with few loops)
- Auto-generated code
- Bitfield manipulations
- 16-bit arithmetic
- Single-precision floating-point operations
Key optimizations
- High-level optimizations critical for auto-generated code, including common subexpression elimination, value propagation, and copy propagation.
- Removal of unneeded sign extension instructions when performing 16-bit arithmetic
- Using 16x16 multiplication instructions
- Using MOVW and MOVT instructions for loading literals to avoid flash memory latencies (enabled at -mf3 and higher)
- Utilizing the bitfield manipulation instructions available on Cortex devices, such as UBFX, SBFX, and BFI
- Utilizing predicated instructions to avoid branch latencies.
- Improving floating-point performance by utilizing the VFP instruction set and providing a relaxed floating-point mode (--fp_mode=relaxed) that trades accuracy for performance.
- Link-time optimization (-o4), which allows the compiler to optimize across file boundaries. This makes many optimizations more effective; the key opportunities for ARM are:
  - Increased opportunities for function inlining.
  - Global variable grouping, which reduces the number of variable address loads and can dramatically improve both code size and performance.
  - Function specialization.
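As an illustration of the source patterns several of these optimizations target, here is a minimal C sketch. The status-word layout and all function names are hypothetical, chosen only for this example. Whether a given compiler release actually emits UBFX, SBFX, BFI, a 16x16 multiply, or predicated instructions for these functions depends on the target, the options used, and the surrounding code, so treat the comments as candidates to check in the generated assembly rather than guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status word layout, for illustration only:
 *   bits [7:4]  unsigned mode field
 *   bits [11:8] signed 4-bit offset field
 */

/* Shift-and-mask extraction: a candidate for a single UBFX. */
static uint32_t get_mode(uint32_t status)
{
    return (status >> 4) & 0xFu;
}

/* Extract a 4-bit field and sign-extend it: a candidate for SBFX. */
static int32_t get_offset(uint32_t status)
{
    int32_t field = (int32_t)((status >> 8) & 0xFu);
    return (field ^ 0x8) - 0x8;   /* portable sign extension of 4 bits */
}

/* Clear a field and insert a new value: a candidate for BFI. */
static uint32_t set_mode(uint32_t status, uint32_t mode)
{
    return (status & ~(0xFu << 4)) | ((mode & 0xFu) << 4);
}

/* Both operands are 16-bit, so a 16x16 multiply instruction may be
 * usable, and redundant sign extensions can potentially be removed. */
static int32_t scale(int16_t sample, int16_t gain)
{
    return (int32_t)sample * (int32_t)gain;
}

/* Short conditional assignments: candidates for predicated
 * instructions instead of branches. */
static int16_t clamp_add(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Small self-check exercising the functions above. */
static int check_examples(void)
{
    assert(get_mode(0x00000F5Au) == 5u);
    assert(get_offset(0x00000F5Au) == -1);
    assert(set_mode(0x00000F5Au, 3u) == 0x00000F3Au);
    assert(scale(-300, 200) == -60000);
    assert(clamp_add(30000, 10000) == INT16_MAX);
    return 0;
}
```

To see which of these patterns the compiler recognizes, one approach is to build the file at the optimization level of interest and compare the generated assembly listing for each function against the instructions named in the list above.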
