MCU Compiler v15

MCU CGT v15.12.0.LTS

Long Term Support (LTS) Release

The long-term support (LTS) release concludes the v15 release series and incorporates all the new features added in the earlier v15 releases. It will be supported with bug-fix patches for approximately two years.


New features available in this release

DWARF4

This release introduces the option to use the DWARF 4 Debugging Format. DWARF 3 is still enabled by default, but DWARF 4 may be enabled by using --symdebug:dwarf_version=4. The RTS still uses DWARF 3. DWARF versions 2, 3, and 4 may be intermixed safely.

When DWARF 4 is enabled, type information will be placed in the new .debug_types section. At link time, duplicate type information will be removed. This method of type merging is superior to those used in DWARF 2 or 3 and will result in a smaller executable. In addition, the size of intermediate object files will be reduced in comparison to DWARF 3.

For more information, see: http://processors.wiki.ti.com/index.php/DWARF_4

Aggregate data in subsections

The compiler will now place all aggregate data (arrays, structs, and unions) into subsections. This gives the linker more granularity for removing unused data during the final link step. The behavior can be controlled using the --gen_data_subsections=on,off option. The default is on.

New Object File Display option to display stack usage information

The -cg option for the object file display utility prints function stack usage and callee information in XML format.

Additional boot hook functions

The following boot hooks are now provided to improve customization of the boot process.

  void _system_post_cinit(void); 

_system_post_cinit() is invoked during C/C++ environment setup, after C/C++ global data is initialized but before any C++ constructors are called. This hook function is supported on all MCU targets.

  void __mpu_init(void);

__mpu_init() provides an interface for initializing the MPU, if MPU support is included. It is invoked after the stack pointer is initialized, but before C/C++ environment setup is performed. This hook function was previously only supported on MSP, but is now also supported on ARM.

The RTS provides default versions of these functions, which may be overridden with customized versions. Note that the TI-RTOS operating system uses custom versions of the boot hook functions for system setup. Therefore, care must be taken when overriding these functions in TI-RTOS projects.
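As an illustration of overriding a hook, the following minimal sketch defines an application version of _system_post_cinit(). The flag and the accessor function are hypothetical names used only for this example. Because the hook runs after C/C++ global data is initialized, the sketch assumes it is safe to rely on the initial value of the flag.

```c
/* Hypothetical flag recording that the hook has run. */
static volatile int hyp_post_cinit_ran = 0;

/* Application-supplied version of the boot hook. On a TI MCU target the
 * boot code is expected to call this after global data is initialized and
 * before any C++ constructors run; it replaces the RTS default. */
void _system_post_cinit(void)
{
    hyp_post_cinit_ran = 1;
}

/* Hypothetical accessor so that later code can check that the hook ran. */
int hyp_post_cinit_was_called(void)
{
    return hyp_post_cinit_ran;
}
```

In a TI-RTOS project, check whether the operating system already supplies this hook before adding a version like the one above.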

Intrinsics for saturated addition/subtraction of signed short/long types (MSP430)

New intrinsics have been added for performing saturated addition and subtraction that use fewer instructions and cycles. When the result of the mathematical operation falls outside the range of the return type, these intrinsics return the minimum or maximum value of the return type instead of wrapping around.

    short       __saturated_add_signed_short(short, short);
    long        __saturated_add_signed_long (long,  long);
    short       __saturated_sub_signed_short(short, short);
    long        __saturated_sub_signed_long (long,  long);
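The saturating behavior can be sketched with a portable reference model. The functions below are not the intrinsics themselves; the model_ names are hypothetical, and the code only mirrors the clamping described above for the short variants.

```c
#include <limits.h>

/* Portable model of saturated short addition: compute the sum in a wider
 * type, then clamp it to the range of short. */
short model_saturated_add_short(short a, short b)
{
    long sum = (long)a + (long)b;

    if (sum > SHRT_MAX) return SHRT_MAX;
    if (sum < SHRT_MIN) return SHRT_MIN;
    return (short)sum;
}

/* Portable model of saturated short subtraction, same clamping rule. */
short model_saturated_sub_short(short a, short b)
{
    long diff = (long)a - (long)b;

    if (diff > SHRT_MAX) return SHRT_MAX;
    if (diff < SHRT_MIN) return SHRT_MIN;
    return (short)diff;
}
```

A model like this can serve as a fallback or as a host-side check when code written for the MSP430 intrinsics is built with another compiler.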


C2000 Unsigned Integer Division Intrinsics

The following C2000 intrinsic provides efficient support for signed-by-unsigned long division and modulus. This variant, better suited to control applications, differs from the C language standard in that the quotient does not round toward 0 and the "remainder" is a true modulus.

  long quotient = __euclidean_div_i32byu32(long numerator, unsigned long denominator, unsigned long &remainder);
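The difference from standard C division can be sketched with a portable model. This is not the intrinsic: the function name is hypothetical, fixed-width types stand in for the 32-bit C2000 long, the remainder is assumed to be non-negative, and a zero denominator is not handled.

```c
#include <stdint.h>

/* Portable model of Euclidean division by a positive denominator.
 * Assumption: the remainder is always in the range [0, den), so the
 * quotient rounds toward negative infinity rather than toward zero.
 * The denominator must be nonzero. */
int32_t model_euclidean_div(int32_t num, uint32_t den, uint32_t *rem)
{
    int64_t n = num;
    int64_t d = (int64_t)den;
    int64_t q = n / d;   /* standard C: truncates toward zero */
    int64_t r = n % d;   /* standard C: takes the sign of n    */

    if (r < 0) {         /* shift into the non-negative range  */
        q -= 1;
        r += d;
    }
    *rem = (uint32_t)r;
    return (int32_t)q;
}
```

For example, standard C computes -7 / 2 as -3 with remainder -1, while this model yields quotient -4 and remainder 1.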

The following intrinsics are also provided and map directly to hardware instructions <SUBCUL ACC, loc32> and its repeat version.

Repeat conditional long subtraction used in unsigned modulus division:

  unsigned long quotient = __rpt_subcul(unsigned long numerator, unsigned long denominator, unsigned long &remainder, int rpt_cnt);

Conditional long subtraction instruction:

  unsigned long quotient = __subcul(unsigned long numerator, unsigned long denominator, unsigned long &remainder);
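For background, unsigned division by repeated conditional subtraction can be sketched in portable C as a shift-and-subtract loop. This is a generic illustration of the technique only; it does not claim to reproduce the exact register-level behavior of SUBCUL or of the intrinsics above, and the function name is hypothetical.

```c
#include <stdint.h>

/* Generic restoring division: one conditional subtraction per quotient
 * bit, 32 steps in total. The denominator must be nonzero. */
uint32_t model_cond_subtract_div(uint32_t num, uint32_t den, uint32_t *rem)
{
    uint64_t r = 0;   /* running remainder, wide enough for the shift */
    uint32_t q = 0;   /* quotient built up one bit at a time          */
    int i;

    for (i = 31; i >= 0; --i) {
        r = (r << 1) | ((num >> i) & 1u);  /* bring down the next bit */
        q <<= 1;
        if (r >= den) {                    /* conditional subtraction */
            r -= den;
            q |= 1u;
        }
    }
    *rem = (uint32_t)r;
    return q;
}
```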


C2000 Byte Peripherals Support

The C2000 architecture has 16-bit words. Some peripherals, however, are 8-bit byte addressable. The byte peripherals bridge translates addresses between the CPU and the byte peripherals by treating the address as a byte address. Therefore, only some C2000 addresses map correctly to the byte peripherals. Even and odd addresses to 16-bit data will both map to the same data element on the byte peripheral. The same is true for addresses to 32-bit data. Thus, addresses for 16-bit accesses must be 32-bit aligned and those for 32-bit accesses must be 64-bit aligned.
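The alignment rule can be expressed as a simple check. The sketch below assumes that addresses count 16-bit words, so that "32-bit aligned" means a multiple of 2 and "64-bit aligned" means a multiple of 4. That reading, and the function names, are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* addr is a C2000 address, assumed here to count 16-bit words. */

/* 16-bit byte peripheral accesses: address must be 32-bit aligned. */
bool hyp_bp16_addr_ok(uint32_t addr)
{
    return (addr % 2u) == 0u;
}

/* 32-bit byte peripheral accesses: address must be 64-bit aligned. */
bool hyp_bp32_addr_ok(uint32_t addr)
{
    return (addr % 4u) == 0u;
}
```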

Driver libraries and bitfield header files are provided to access peripherals. To support these libraries and enable customers to extend them or write their own, we introduce a source-level intrinsic and type attributes to access byte peripheral data.

The C2000 driver library accesses byte peripheral data at the correct starting address. However, on C2000, operations on 32-bit data are often broken up into two operations on 16-bit data because these are more efficient on the architecture. Accesses to 32-bit byte peripheral data cannot be broken up in this way because the starting offset of the second 16-bit access would be incorrect. The __byte_peripheral_32 intrinsic can therefore be used to access a 32-bit byte peripheral data address while preventing the access from being broken up. The intrinsic returns a reference to an unsigned long and can be used both to read and write data.

  unsigned long &__byte_peripheral_32(unsigned long *x);
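A minimal usage sketch follows. The HYP_BP32 macro and the helper functions are hypothetical names. The __TMS320C2000__ check is assumed to identify the C2000 compiler, and on any other compiler the macro falls back to a plain volatile access so the example can be built and exercised on a host.

```c
#if defined(__TMS320C2000__)
/* On C2000, route the access through the intrinsic so that it is not
 * split into two 16-bit accesses. */
#define HYP_BP32(p) __byte_peripheral_32((unsigned long *)(p))
#else
/* Host stand-in (assumption): an ordinary volatile access. */
#define HYP_BP32(p) (*(volatile unsigned long *)(p))
#endif

/* Read a 32-bit byte peripheral register. */
unsigned long hyp_read32(unsigned long *reg)
{
    return HYP_BP32(reg);
}

/* Set bits in a 32-bit byte peripheral register using one read and one
 * write through the same access path. */
void hyp_set_bits32(unsigned long *reg, unsigned long mask)
{
    unsigned long value = HYP_BP32(reg);
    HYP_BP32(reg) = value | mask;
}
```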

For bitfield support, 2 type attributes have been introduced that can be applied to typedefs of unsigned ints and unsigned longs. For example:

  typedef unsigned int bp_16 __attribute__((byte_peripheral)) ; 
  typedef unsigned long bp_32 __attribute__((byte_peripheral));

The typedef names do not matter. The attributes automatically apply the volatile keyword and handle alignment. All struct members for byte peripheral structs, whether bitfields or not, must have these attributes applied via typedefs to ensure proper alignment of the struct accesses. Please note that struct layout will be different due to differences in alignment, so the bitfields cannot always be accessed via the same container types as in regular structs. See the Compiler Readme or User Guide for examples.

CLA-only Object File Compatibility (C2000)

CLA-only object files will now be compatible with any C2000 object files regardless of device support options for FPU and TMU. Previously, when CLA object files were compiled, any C2000 device support options limited linking compatibility even though code for those devices was not present in the CLA object.

GCC common symbol bug workaround

There is a bug in the GNU linker (Bugzilla 18614) where common symbols that are STB_LOCAL are not handled correctly. The GNU linker fails to allocate space for them. To work around this bug, the TI tools will no longer use common for uninitialized local symbols. This should have very little impact on the final executable.

Available as of v15.9.0.STS

Improved support for placing/running functions in RAM

The ramfunc attribute can be used to specify that a function will be placed in and executed out of RAM. This allows the compiler to optimize functions for RAM execution, as well as to automatically copy functions to RAM on flash-based devices.

The attribute is applied to a function with GCC attribute syntax, as follows:

  __attribute__((ramfunc))
  void f(void) { ... }

The --ramfunc=on option can be used to indicate that the attribute should be applied to all functions in a translation unit, eliminating the need for source code modification.
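When the same source must also build with other compilers, the attribute can be hidden behind a macro. The sketch below is one possible arrangement, not a TI-provided header; HYP_RAMFUNC and hyp_checksum are hypothetical names, and the __TI_COMPILER_VERSION__ check is assumed to identify the TI compiler.

```c
#if defined(__TI_COMPILER_VERSION__)
/* TI compiler: place the function in RAM and run it from there. */
#define HYP_RAMFUNC __attribute__((ramfunc))
#else
/* Other compilers: the macro expands to nothing. */
#define HYP_RAMFUNC
#endif

/* Example of a routine that might be run from RAM: a simple byte sum. */
HYP_RAMFUNC
unsigned int hyp_checksum(const unsigned char *buf, unsigned int len)
{
    unsigned int sum = 0;
    unsigned int i;

    for (i = 0; i < len; ++i) {
        sum += buf[i];
    }
    return sum;
}
```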

For more information, see: http://processors.wiki.ti.com/index.php/Ramfunc_Attribute

Improved code generation of 32x32=>64 multiplies (C2000, MSP430)

For 32x32=>64-bit multiplies:

 long a, b;
 long long x = (long long)a * (long long)b; 

the Optimizer previously performed a transformation that prevented the code generator from matching native machine instructions. Therefore, when Optimization was enabled, library calls would be generated instead, resulting in worse performance. This issue has been fixed so that code like that shown above matches the correct native instructions to perform the operation efficiently. (Please note that to be semantically correct, at least one of the 32-bit operands must be cast to a 64-bit value. Otherwise, a 32-bit multiply is performed and then the result is cast to 64-bits. See [[1]] for more information.)
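The effect of where the cast is placed can be sketched in portable C. The example uses fixed-width types so that it behaves the same on a host where long is 64 bits; the function names are hypothetical. The truncating variant uses unsigned operands because overflow of a signed 32-bit multiply is undefined behavior in C.

```c
#include <stdint.h>

/* Operands are widened before the multiply: a true 32x32=>64 product. */
int64_t hyp_mul_widen_first(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* The multiply is done in 32 bits and only the (already truncated)
 * result is widened: the upper 32 bits of the product are lost. */
uint64_t hyp_mul_then_widen(uint32_t a, uint32_t b)
{
    return (uint64_t)(uint32_t)(a * b);
}
```

With both operands equal to 65536, the first function returns 4294967296, while the second returns 0 because the 32-bit product wraps around.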


New compiler option to treat .c files as CLA files (C2000)

The --cla_default option can be used to treat .c files as CLA files.

Removed language constraints from the CLA compiler (C2000)

Integer divide, modulus and unsigned integer compares are now permitted in CLA code. If the compiler can't eliminate or transform these non-native operations through optimization, performance advice will be issued indicating that the operations might lead to poor performance.

_Bool support for CLA (C2000)

Previously, _Bool was only defined if the --c99 compiler option was selected. Now, _Bool can be used without C99 support by including the stdbool.h header file.


C2000 EALLOW, EDIS intrinsics

The following intrinsics are now supported for C2000:

  void __eallow(void);  // generates EALLOW instruction
  void __edis(void);    // generates EDIS instruction
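A typical use is to bracket a write to a protected register. The sketch below assumes that EALLOW enables and EDIS disables access to protected registers; the macro names, the helper function, and the host-side stand-in flag are hypothetical and exist only so the example can be built outside the C2000 compiler.

```c
#if defined(__TMS320C2000__)
#define HYP_EALLOW() __eallow()
#define HYP_EDIS()   __edis()
#else
/* Host stand-in (assumption): track the protection state in a flag. */
static int hyp_protection_open = 0;
#define HYP_EALLOW() (hyp_protection_open = 1)
#define HYP_EDIS()   (hyp_protection_open = 0)
#endif

/* Write a value to a protected register, keeping the window in which
 * protected writes are allowed as short as possible. */
void hyp_write_protected(volatile unsigned int *reg, unsigned int value)
{
    HYP_EALLOW();
    *reg = value;
    HYP_EDIS();
}
```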

ARM and MSP430 COFF ABI no longer supported

As of version 15.9.0.STS of the ARM and MSP430 CGT, COFF ABI support is discontinued. If COFF ABI support is needed for your application, please use ARM CGT 5.2.x or MSP430 CGT version 4.4.x.

MSP430 performance improvements

v15.9.0.STS introduces two MSP430 performance improvements.

With --opt_level=3 or 4, and --opt_for_speed=5, the compiler will attempt to unroll loops for performance improvement.

With --opt_level=3 or 4, and --opt_for_speed=4 or 5, the compiler will use a slightly more aggressive function inliner algorithm. At other optimization settings the previous inlining behavior has not been changed.

Function inlining can additionally be controlled with the existing FUNC_ALWAYS_INLINE and FUNC_CANNOT_INLINE pragmas, as well as the --auto_inline, --no_inlining, and --single_inline options. See the MSP430 Optimizing C/C++ Compiler User's Guide for more details.
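The pragmas can be sketched as follows. The function names are hypothetical, the pragma placement (before the first declaration of the named function) is an assumption to be checked against the User's Guide, and the pragmas are guarded so that other compilers ignore them.

```c
#if defined(__TI_COMPILER_VERSION__)
/* Assumed C syntax: the argument names the affected function. */
#pragma FUNC_ALWAYS_INLINE(hyp_scale)
#pragma FUNC_CANNOT_INLINE(hyp_log_error)
#endif

static int hyp_error_count = 0;

/* Small, hot helper: a candidate for forced inlining. */
static int hyp_scale(int x)
{
    return x * 3;
}

/* Rarely executed error path: kept out of line to save code size. */
static void hyp_log_error(void)
{
    ++hyp_error_count;
}

int hyp_process(int x)
{
    if (x < 0) {
        hyp_log_error();
        return 0;
    }
    return hyp_scale(x);
}
```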

Module summary in linker map file

The linker map file now contains a module summary view. This view organizes the object files by directory or library and displays the code, read-write, and read-only size that the file contributed to the resulting executable.

Sample output:

MODULE SUMMARY

       Module                     code    ro data   rw data
       ------                     ----    -------   -------
    .\Application\
       file1.obj                  1146    0         920    
       file2.obj                  316     0         0      
    +--+--------------------------+-------+---------+---------+
       Total:                     1462    0         920 

    mylib.lib
       libfile1.obj               500     0         0      
       libfile2.obj               156     4         0      
       libfile3.obj               122     0         20      
    +--+--------------------------+-------+---------+---------+
       Total:                     778     4         20

       Heap:                      0       0         0      
       Stack:                     0       0         1024   
       Linker Generated:          424     200       0      
    +--+--------------------------+-------+---------+---------+
       Grand Total:               2664    204       1964


Generate REV, REVSH, REV16 instructions from native C (ARM)

The _rev, _revsh, and _rev16 intrinsic functions have always been available to use these instructions. The compiler can now also generate them from native C expressions. This is ideal for customers who wish to write source code that is portable across ISAs and tool chains. The instructions can now be generated based on the following source code sequences:

REV:

(((word << 24U) & 0xFF000000U) |
 ((word << 8U)  & 0x00FF0000U) |
 ((word >> 8U)  & 0x0000FF00U) |
 ((word >> 24U) & 0x000000FFU))

REV16:

((word >> 8) & 0x00FF00FF) | ((word << 8) & 0xFF00FF00)

REVSH:

((word << 24) >> 16) | ((word >> 8) & 0xFF)
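For reference, the three expressions can be wrapped in portable C functions to check what they compute. The function names, the fixed-width types, and the casts are additions made for this sketch; whether the compiler recognizes these exact typed forms is not confirmed here. The REVSH variant relies on an arithmetic right shift of a negative value, which is implementation-defined in C.

```c
#include <stdint.h>

/* REV pattern: reverse all four bytes of a 32-bit word. */
uint32_t hyp_rev(uint32_t word)
{
    return ((word << 24) & 0xFF000000u) |
           ((word << 8)  & 0x00FF0000u) |
           ((word >> 8)  & 0x0000FF00u) |
           ((word >> 24) & 0x000000FFu);
}

/* REV16 pattern: swap the two bytes within each 16-bit half. */
uint32_t hyp_rev16(uint32_t word)
{
    return ((word >> 8) & 0x00FF00FFu) | ((word << 8) & 0xFF00FF00u);
}

/* REVSH pattern: swap the bytes of the low halfword and sign-extend. */
int32_t hyp_revsh(uint32_t word)
{
    int32_t high = (int32_t)(word << 24) >> 16;  /* sign-extending shift */
    return high | (int32_t)((word >> 8) & 0xFFu);
}
```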


Available as of v15.3.0.STS

(This STS version was not released separately for all MCU targets.)

STLport C++ RTS

v15.3.0.STS introduced the STLport C++03 RTS. The move to STLport will break ABI compatibility with previous C++ RTS releases. Attempting to link old C++ object code with the new RTS will result in a link-time error. Suppressing this error will likely result in undefined symbols or undefined behavior during execution. Breakages are known to occur in particular for object code using locale, iostream, and string.

In most cases, recompiling old source code with the new RTS should be safe. However, for non-standard API extensions to the C++ library, compatibility is not guaranteed. This includes usage of hash_map, slist, and rope.

Dependence between locale and iostream is increased in STLport. Usage of one will likely cause the other to be linked as well. This may cause an additive increase in both code size and initialization time.

C ABI compatibility will not be affected by this change.

Math library improvements

  • C99 math support for C2000

The RTS Math Library has been changed. C99 math support is now available for C2000, including long double (64-bit) and float versions of floating point math routines. (Currently, float is the same size as double on C2000, so there is no advantage in using the float versions of library calls.)

See [this page] for a list of C99 math routines.

  • Improved performance for ARM and MSP430

Below is a table comparing all the C99 single precision math routines on a TM4C1294XL board. Similar improvements exist in the double precision routines, but they are not shown here since TM4C1294XL only supports single precision floating point. These results are dependent on the test environment and board configuration.

    benchmark     5.2          15.9         % improvement
    ---------     ---          ----         -------------
    acosf         282.64       210.784      25.42
    acoshf        384.528      297.024      22.76
    asinf         245.392      163.472      33.38
    asinhf        320.864      213.856      33.35
    atan2f        243.904      212.912      12.71
    atanf         190.048      113.792      40.12
    atanhf        371.552      353.488      4.86
    cbrtf         424.576      149.632      64.76
    ceilf         72.144       53.152       26.33
    copysignf     42.48        29.952       29.49
    cosf          403.68       141.744      64.89
    coshf         963.824      180.288      81.29
    erfcf         450.992      126.512      71.95
    erff          168.176      116.32       30.83
    exp2f         875.84       538.496      38.52
    expf          842.544      122.592      85.45
    expm1f        318.608      128.784      59.58
    fdimf         50.016       101.856      -103.65
    floorf        74.24        53.072       28.51
    fmaxf         56           68.016       -21.46
    fminf         46.176       68.768       -48.93
    fmodf         4652.128     549.072      88.20
    frexpf        118.128      67.936       42.49
    hypotf        314.192      82.128       73.86
    ldexpf        162.32       89.232       45.03
    lgammaf       654.496      643.6        1.66
    llrintf       292.096      311.056      -6.49
    llroundf      209.856      205.344      2.15
    log10f        305.056      207.2        32.08
    log1pf        348.272      257.472      26.07
    log2f         275.2        202.112      26.56
    logf          270          202.688      24.93
    lrintf        267.696      271.936      -1.58
    lroundf       183.344      128.416      29.96
    modff         65438560     65438400     0.00
    powf          1470.864     735.92       49.97
    remainderf    6558.688     663.024      89.89
    remquof       6386.336     536.8        91.59
    rintf         225.136      61.904       72.50
    roundf        126.656      105.04       17.07
    scalblnf      166.56       107.056      35.73
    scalbnf       166.56       99.952       39.99
    sinf          404.048      135.44       66.48
    sinhf         411.744      125.584      69.50
    sqrtf         26.96        45.968       -70.50
    tanf          390.464      126.976      67.48
    tanhf         322.512      82.816       74.32
    tgammaf       1450.976     26663.04     -1737.59
    truncf        102.576      56.448       44.97
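The "% improvement" figures appear to be the relative reduction from the 5.2 measurement to the 15.9 measurement, so a negative value means the routine became slower. The formula below is inferred from the table values rather than stated in the release notes, and the function name is hypothetical.

```c
/* Relative improvement in percent: positive when the new measurement is
 * smaller than the old one, negative when it is larger. */
double hyp_percent_improvement(double old_value, double new_value)
{
    return (old_value - new_value) / old_value * 100.0;
}
```

For the acosf row, this gives (282.64 - 210.784) / 282.64 * 100, which is approximately 25.42.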

Aliased Memory Ranges

Some devices have a region of RAM that can be addressed by two different memory buses (a system bus and an instruction bus).

To use this capability, the linker must be aware of both memory ranges. Use the following syntax to declare ALIAS memory ranges. All ranges in an ALIAS group must have the same length.


  MEMORY
  {
      ...
      ALIAS
      {
        SRAM_CODE (RWX) : origin = 0x01000000
        SRAM_DATA (RW)  : origin = 0x20000000
      } length = 0x0001000
      ...
  }


C28x compiler is publicly available

The C28x compiler is now available with limited restrictions. It is available without a click-wrap license or export control restrictions and can be redistributed freely.