System Analyzer Tutorial 8B
Tutorial 8B: Periodically Logging Performance Monitoring Counters
This tutorial shows how to read some of the Performance Monitoring Counters (PMCs) provided by various silicon modules within a KeyStone architecture device, log the count values periodically as UIA events, and graph them over time using System Analyzer. Reading the performance counters on the target is preferable to having a host script read them via the emulation driver, because target-side reads are not limited by the CCS Debug Server polling rate, work with remote targets as well as locally connected ones, and allow target-synchronous real-time accesses.
In order to easily correlate these PMC values with the performance measurements logged by CP_Tracer, we will need to write a target-side ISR that periodically samples the PMCs and logs the values to STM via LoggerSTM. This ISR is triggered by an interrupt from a CP_Tracer module that is raised when the CP_Tracer's sliding window expires, which allows us to better align the performance counter statistics with the statistics from MSMC CP_Tracer modules.
The ISR will:
- disable interrupts
- save the context of the MPAX registers
- reprogram the MPAX registers to access the performance counters
- read the counters
- restore the MPAX register context
- restore the interrupt enable state
- log the PMC count values as UIA events
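The sequence above can be sketched in C as follows. This is a minimal, host-runnable sketch and not the tutorial's actual ISR: `MpaxPair`, `PmcHooks`, `pmc_sample_isr` and the `demo_*` helpers are hypothetical names, and the interrupt and logging operations are passed in as function pointers so that target-specific calls stay out of the sketch. It also assumes four counters and that a single MPAX register pair is borrowed for the access; a real target may need additional steps (such as memory fences) that are not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_PMCS 4u /* assumption: four counters are sampled per interrupt */

/* One MPAX segment register pair (low and high word). Hypothetical model. */
typedef struct {
    volatile uint32_t lo;
    volatile uint32_t hi;
} MpaxPair;

/* Platform hooks supplied by the application. All names are hypothetical. */
typedef struct {
    uint32_t (*irq_disable)(void);         /* returns the previous enable state */
    void     (*irq_restore)(uint32_t key); /* restores that state */
    void     (*log_values)(const uint32_t *values, size_t count);
} PmcHooks;

void pmc_sample_isr(const PmcHooks *hooks,
                    MpaxPair *segment,       /* MPAX pair borrowed for the access */
                    uint32_t window_lo,      /* mapping that exposes the counters */
                    uint32_t window_hi,
                    const volatile uint32_t *counters)
{
    uint32_t values[NUM_PMCS];
    size_t i;

    uint32_t key = hooks->irq_disable();     /* disable interrupts */

    uint32_t saved_lo = segment->lo;         /* save the MPAX context */
    uint32_t saved_hi = segment->hi;

    segment->hi = window_hi;                 /* reprogram MPAX for the counters */
    segment->lo = window_lo;

    for (i = 0; i < NUM_PMCS; i++)           /* read the counters */
        values[i] = counters[i];

    segment->hi = saved_hi;                  /* restore the MPAX context */
    segment->lo = saved_lo;

    hooks->irq_restore(key);                 /* restore the interrupt enable state */

    hooks->log_values(values, NUM_PMCS);     /* log after the critical section */
}

/* Host-side stand-ins used only to exercise the sketch. */
static uint32_t demo_irq_state = 1u;
static uint32_t demo_logged[NUM_PMCS];

static uint32_t demo_irq_disable(void)
{
    uint32_t key = demo_irq_state;
    demo_irq_state = 0u;
    return key;
}

static void demo_irq_restore(uint32_t key) { demo_irq_state = key; }

static void demo_log(const uint32_t *values, size_t count)
{
    size_t i;
    for (i = 0; i < count && i < NUM_PMCS; i++)
        demo_logged[i] = values[i];
}
```

On the target, the hooks would wrap the real interrupt disable/restore operations and the UIA logging call, `segment` would point at the chosen MPAX register pair, and `counters` at the logical address through which the temporary mapping exposes the PMCs. Logging is done last so that the time spent with interrupts disabled and the MPAX mapping altered stays short.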
- XMC: External Memory Controller
- MPAX: XMC Memory Protection and Address Extension
- MAR: Memory Attributes Registers
- EMIF: External Memory Interface
- A 64-bit EMIF interface is provided for accessing off-chip DDR3 SDRAM, which can be used as data or program memory. Although this interface supports a 64-bit data bus, it can also be configured to operate using a 32-bit or 16-bit data bus.
- EMIF16: External Memory Interface peripheral utilizing a 16-bit bus
- The EMIF16 module is intended to provide a glue-less interface to a variety of asynchronous memory devices like ASRAM, NOR and NAND memory. A total of 256M bytes of any of these memories can be accessed at any given time via four chip selects with 64M byte access per chip select. NOR Flash can be used for boot purposes. These memories can also be used for Data Logging purposes.
- Synchronous memories such as DDR1 SDRAM, SDR SDRAM and Mobile SDR are not supported.
- Provides information about the XMC (External
- C66x CorePac User's Guide (sprugw0b) : Info about MPAX registers
- Throughput Performance Guide for C66x Keystone Devices (sprabk5a)
- Multicore Shared Memory Controller (MSMC) for Keystone Devices User Guide (sprugw7a)
- DDR3 Memory Controller for KeyStone Architecture (sprugv8c)
- TMS320C6678 Data Manual (sprs691c)
- The PDK and CSL packages (installed as part of the MCSDK) for your board
C6678 Memory Map: Performance Monitoring Registers
|Logical Start Address (32b)||Logical End Address (32b)||Physical Start Address (36b)||Physical End Address (36b)||Bytes||Description|
|08000000||0800FFFF||0 08000000||0 0800FFFF||64K||Extended memory controller (XMC) configuration|
|0BC00000||0BCFFFFF||0 0BC00000||0 0BCFFFFF||1M||Multicore shared memory controller (MSMC) config|
|21000000||210001FF||1 00000000||1 000001FF||512||DDR3 EMIF configuration|
The MPAX Registers
With XMC’s MPAX feature, C66x CorePac supports systems with address widths up to 36 bits, despite only supporting 32-bit addresses internally. It accommodates these large memory systems by extending addresses on external requests with its Memory Protection and Address eXtension (MPAX) unit.
The MPAX combines memory protection and address extension into one unified process. The memory protection step determines what types of accesses are permitted on various address ranges within C66x CorePac’s 32-bit address map. The address extension step projects those accesses onto a larger 36-bit address space.
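The two steps can be modelled in a few lines of C. This is only a sketch of the idea for a single segment, not the hardware's implementation: `MpaxSegment`, `mpax_translate` and the `MPAX_ACCESS_*` flags are hypothetical names, the flag values are not the hardware's permission encoding, and the supervisor/user distinction is left out.

```c
#include <stdint.h>

/* Hypothetical access flags for the model (not the hardware bit encoding). */
enum {
    MPAX_ACCESS_READ    = 1,
    MPAX_ACCESS_WRITE   = 2,
    MPAX_ACCESS_EXECUTE = 4
};

/* Software model of one segment. */
typedef struct {
    uint32_t logical_base;  /* 32-bit logical base, aligned to the segment size */
    uint64_t physical_base; /* 36-bit replacement base, aligned to the size */
    unsigned size_log2;     /* segment size is 2^size_log2 bytes, 12..32 */
    unsigned allowed;       /* OR of MPAX_ACCESS_* flags permitted in the range */
} MpaxSegment;

/*
 * Returns 1 and stores the extended address in *phys when addr lies inside
 * the segment and the requested access is permitted; returns 0 otherwise.
 */
int mpax_translate(const MpaxSegment *seg, uint32_t addr,
                   unsigned access, uint64_t *phys)
{
    uint64_t offset_mask = ((uint64_t)1 << seg->size_log2) - 1u;

    /* Segment match: the upper address bits must equal the logical base. */
    if (((uint64_t)addr & ~offset_mask) != (uint64_t)seg->logical_base)
        return 0;

    /* Memory protection step: every requested access type must be allowed. */
    if ((access & seg->allowed) != access)
        return 0;

    /* Address extension step: keep the offset, replace the upper bits. */
    *phys = (seg->physical_base & ~offset_mask) | ((uint64_t)addr & offset_mask);
    return 1;
}
```

With the DDR3 EMIF configuration row of the memory map above as example values (logical base `0x21000000`, physical base `0x100000000`, and a 4 KB segment), a read of logical address `0x21000080` is projected onto physical address `0x100000080`.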
Each core has its own set of MPAX Registers:
- They translate between logical and physical addresses
- 16 registers (64 bits each) control up to 16 memory segments
- Each register translates logical memory into physical memory for its segment
- Segment definition in the MPAX registers:
  - Segment size – 5 bits – a power of 2, from a minimum of 4 KB up to 4 GB
  - Logical base address – (up to 20 bits) the upper bits of the logical segment base address. The lower N bits are zero, where N is determined by the segment size
    - For segment size 4 KB, N = 12 and the base address uses 20 bits
    - For segment size 8 KB, N = 13 and the base address uses only 19 bits
    - For segment size 1 GB, N = 30 and the base address uses only 2 bits
  - Physical (replacement) base address – (up to 24 bits) the upper bits of the physical (replacement) segment base address. The lower N bits are zero, where N is determined by the segment size
    - For segment size 4 KB, N = 12 and the base address uses up to 24 bits
    - For segment size 8 KB, N = 13 and the base address uses up to 23 bits
    - For segment size 1 GB, N = 30 and the base address uses up to 6 bits
  - Permission – access types allowed in this address range
    - Three bits are dedicated for supervisor mode (write, read, execute)
    - Three bits are dedicated for user mode (write, read, execute)
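The segment definition can be turned into a pair of 32-bit register values with a small helper. The sketch below is illustrative only: `MpaxValue` and `mpax_encode` are hypothetical names, and the bit positions are assumptions that should be checked against the C66x CorePac User's Guide before use. It assumes the high word holds the logical base in bits 31:12 and the size code in bits 4:0, the low word holds bits 35:12 of the physical base in bits 31:8 and the six permission bits in bits 5:0, and the size code is log2(segment size) minus 1. Reserved bits are written as zero.

```c
#include <stdint.h>

/* Encoded register pair; a hypothetical container for the two 32-bit words. */
typedef struct {
    uint32_t lo; /* replacement (physical) base and permission bits */
    uint32_t hi; /* logical base and segment size code */
} MpaxValue;

/*
 * size_log2: 12 (4 KB) up to 32 (4 GB).
 * perm6:     six permission bits, passed through unchanged.
 * Returns 1 on success, 0 if an argument is out of range or misaligned.
 * Field positions are assumptions, see the text above.
 */
int mpax_encode(uint32_t logical_base, uint64_t physical_base,
                unsigned size_log2, uint32_t perm6, MpaxValue *out)
{
    uint64_t offset_mask;

    if (size_log2 < 12u || size_log2 > 32u)
        return 0;
    if ((physical_base >> 36) != 0u || perm6 > 0x3Fu)
        return 0;

    /* The lower N bits of both bases must be zero (N = size_log2). */
    offset_mask = ((uint64_t)1 << size_log2) - 1u;
    if (((uint64_t)logical_base & offset_mask) != 0u ||
        (physical_base & offset_mask) != 0u)
        return 0;

    /* Logical base in bits 31:12, size code (log2 - 1) in bits 4:0. */
    out->hi = (logical_base & 0xFFFFF000u) | (uint32_t)(size_log2 - 1u);

    /* Physical address bits 35:12 in bits 31:8, permissions in bits 5:0. */
    out->lo = (uint32_t)((physical_base >> 12) << 8) | perm6;
    return 1;
}
```

For example, under these assumptions a 4 KB segment that maps logical `0x21000000` onto physical `0x100000000` with all six permission bits set encodes as high word `0x2100000B` and low word `0x1000003F`.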
Overview: DDR3 Performance Monitoring
The DDR3 controller provides a set of performance counter registers which can be used to monitor or calculate the bandwidth and efficiency of the DDR traffic. The counters can be configured to count events such as the total number of SDRAM accesses, SDRAM activates, reads, writes and so on.
- The Performance Counter 1 and 2 Registers (PERF_CNT_1 and PERF_CNT_2) act as two 32-bit counters that are able to count events independent of each other.
- To provide more granularity the counters can also be configured to filter events originating from a particular master or address space.
- The events to be counted and filter enabled are programmed in the Performance Counter Config Register (PERF_CNT_CFG).
- The actual value of the filter is programmed in Performance Counter Master Region Select Register (PERF_CNT_SEL).
- The counters start counting the events independently when commands enter the Command FIFO.
The CNTRN_CFG (N=1,2) fields in the PERF_CNT_CFG register are used to select the event for the counter to count. The PERF_CNT_CFG also includes options to enable or disable the master (CNTRN_MSTID_EN) and address space (CNTRN_REGION_EN) filters for each counter. The filters are disabled by default.
If the respective filters are enabled, the master ID value and region select options can be programmed in the PERF_CNT_SEL register. It should be noted that the master ID and region select filters apply only to a certain subset of events that can be counted. The table below shows the events for which the filters are applicable.
|Offset||Acronym||Register Description||Width (bits)||Field Descriptions|
|0x80||PERF_CNT_1||Performance Counter 1 Register||32||COUNTER_1: 32-bit counter that can be programmed as specified in the PERF_CNT_CFG and PERF_CNT_SEL registers.|
|0x84||PERF_CNT_2||Performance Counter 2 Register||32||COUNTER_2: 32-bit counter that can be programmed as specified in the PERF_CNT_CFG and PERF_CNT_SEL registers.|
|0x88||PERF_CNT_CFG||Performance Counter Config Register||32b Bitfield||CNTR2_MSTID_EN (b31), CNTR2_REGION_EN (b30), CNTR2_CFG (b19-16), CNTR1_MSTID_EN (b15), CNTR1_REGION_EN (b14), CNTR1_CFG (b3-0)|
|0x8C||PERF_CNT_SEL||Performance Counter Master Region Select Register||32b Bitfield||MSTID2 (b31-24), REGION_SEL2 (b19-16), MSTID1 (b15-8), REGION_SEL1 (b3-0)|
|0x90||PERF_CNT_TIM||Performance Counter Time Register||32||TOTAL_TIME: 32-bit counter continuously counts number of DDR clock cycles elapsed after the controller is brought out of reset.|
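The bit positions in the table above can be used to build the configuration words and to turn raw counter readings into per-interval counts. The following C sketch is illustrative: `Ddr3CounterSetup` and the `ddr3_*` functions are hypothetical names (not CSL APIs), the event codes are left to the caller because the event table is not reproduced here, and the delta helper assumes the 32-bit counters wrap around modulo 2^32.

```c
#include <stdint.h>

/* Register offsets from the DDR3 EMIF configuration base (see table). */
#define DDR3_PERF_CNT_1   0x80u
#define DDR3_PERF_CNT_2   0x84u
#define DDR3_PERF_CNT_CFG 0x88u
#define DDR3_PERF_CNT_SEL 0x8Cu
#define DDR3_PERF_CNT_TIM 0x90u

/* Settings for one counter. Hypothetical helper type. */
typedef struct {
    uint32_t event;      /* CNTRn_CFG event code, 4 bits */
    int      mstid_en;   /* non-zero enables the master ID filter */
    int      region_en;  /* non-zero enables the address space filter */
    uint32_t mstid;      /* MSTIDn filter value, 8 bits */
    uint32_t region_sel; /* REGION_SELn filter value, 4 bits */
} Ddr3CounterSetup;

/* Build the PERF_CNT_CFG value for counter 1 (c1) and counter 2 (c2). */
uint32_t ddr3_perf_cnt_cfg(const Ddr3CounterSetup *c1,
                           const Ddr3CounterSetup *c2)
{
    uint32_t value = 0u;

    value |= (c1->event & 0xFu);                   /* CNTR1_CFG       b3-0   */
    value |= (c1->region_en ? 1u : 0u) << 14;      /* CNTR1_REGION_EN b14    */
    value |= (c1->mstid_en ? 1u : 0u) << 15;       /* CNTR1_MSTID_EN  b15    */
    value |= (c2->event & 0xFu) << 16;             /* CNTR2_CFG       b19-16 */
    value |= (c2->region_en ? 1u : 0u) << 30;      /* CNTR2_REGION_EN b30    */
    value |= (c2->mstid_en ? 1u : 0u) << 31;       /* CNTR2_MSTID_EN  b31    */
    return value;
}

/* Build the PERF_CNT_SEL value holding the filter values of both counters. */
uint32_t ddr3_perf_cnt_sel(const Ddr3CounterSetup *c1,
                           const Ddr3CounterSetup *c2)
{
    uint32_t value = 0u;

    value |= (c1->region_sel & 0xFu);              /* REGION_SEL1 b3-0   */
    value |= (c1->mstid & 0xFFu) << 8;             /* MSTID1      b15-8  */
    value |= (c2->region_sel & 0xFu) << 16;        /* REGION_SEL2 b19-16 */
    value |= (c2->mstid & 0xFFu) << 24;            /* MSTID2      b31-24 */
    return value;
}

/* Events counted between two readings; unsigned arithmetic absorbs one wrap. */
uint32_t ddr3_counter_delta(uint32_t previous, uint32_t current)
{
    return current - previous;
}
```

Dividing a counter delta by the PERF_CNT_TIM delta over the same interval gives an events-per-DDR-clock figure, which is one way to express the bandwidth or efficiency mentioned above.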
XMC Performance Monitoring Counters
The following counters are provided:
|Address||Register||Description|
|08000310||XPFAC0[19:0]||Prefetch Sent Count|
|08000314||XPFAC1[19:0]||Prefetch Canceled Count|
|08000318||XPFAC2[19:0]||Prefetch Hit Count|
|0800031C||XPFAC3[19:0]||Prefetch Miss Count|
All four counters stop when any one of them saturates. A counter can saturate in as little as 2 ms, but it may take longer. The counters can be halted without being reset, which allows the count to be suspended.
These counters indicate how efficiently the GEM prefetch system is working (ratio of Prefetch Hit to (Hit+Miss)), the amount of bandwidth wasted (Miss count), and the actual bus bandwidth consumed (Sent count). There are a number of interesting composite statistics you can derive from these four totals:
- Total demand fetches = (hit + miss)
- Total prefetches = (sent - canceled)
- Total bandwidth = (sent - canceled + miss)
- Wasted bandwidth = (sent - canceled - hit)
- Bandwidth amplification % = ((sent - canceled - hit) / (hit + miss)) * 100%
- Hit rate % = (hit / (hit + miss)) * 100%
- Cancel rate % = (canceled / sent) * 100%
- Stream (re)starts = (sent - hit) / 2
- (Note that "stream (re)starts" is only useful in "data only" counting mode.)
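The formulas above translate directly into C. The sketch below is a hedged illustration: `XmcPrefetchCounts`, `XmcPrefetchStats` and `xmc_prefetch_stats` are hypothetical names, the raw values are masked to 20 bits because the counters are listed as XPFACn[19:0], and a percentage whose denominator is zero is reported as 0 (a choice made for this sketch).

```c
#include <stdint.h>

#define XPFAC_COUNT_MASK 0xFFFFFu /* counters are 20 bits wide: XPFACn[19:0] */

/* Raw register readings from XPFAC0..XPFAC3. */
typedef struct {
    uint32_t sent;
    uint32_t canceled;
    uint32_t hit;
    uint32_t miss;
} XmcPrefetchCounts;

/* Composite statistics; the "bandwidth" fields are request counts. */
typedef struct {
    int32_t demand_fetches;
    int32_t prefetches;
    int32_t total_bandwidth;
    int32_t wasted_bandwidth;
    int32_t stream_restarts;   /* only meaningful in "data only" mode */
    double  amplification_pct;
    double  hit_rate_pct;
    double  cancel_rate_pct;
} XmcPrefetchStats;

/* 100 * num / den, or 0 when the denominator is zero. */
static double xmc_percent(int32_t num, int32_t den)
{
    return (den != 0) ? (100.0 * (double)num) / (double)den : 0.0;
}

XmcPrefetchStats xmc_prefetch_stats(const XmcPrefetchCounts *raw)
{
    int32_t sent     = (int32_t)(raw->sent & XPFAC_COUNT_MASK);
    int32_t canceled = (int32_t)(raw->canceled & XPFAC_COUNT_MASK);
    int32_t hit      = (int32_t)(raw->hit & XPFAC_COUNT_MASK);
    int32_t miss     = (int32_t)(raw->miss & XPFAC_COUNT_MASK);
    XmcPrefetchStats s;

    s.demand_fetches    = hit + miss;
    s.prefetches        = sent - canceled;
    s.total_bandwidth   = sent - canceled + miss;
    s.wasted_bandwidth  = sent - canceled - hit;
    s.stream_restarts   = (sent - hit) / 2;
    s.amplification_pct = xmc_percent(s.wasted_bandwidth, s.demand_fetches);
    s.hit_rate_pct      = xmc_percent(hit, s.demand_fetches);
    s.cancel_rate_pct   = xmc_percent(canceled, sent);
    return s;
}
```

As a worked example, readings of sent = 1000, canceled = 100, hit = 600 and miss = 200 give 800 demand fetches, 900 prefetches, a total bandwidth of 1100 requests of which 300 are wasted, a bandwidth amplification of 37.5%, a hit rate of 75% and a cancel rate of 10%.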
This information can be used by developers to help tune code. For example, some applications get slower with prefetch enabled due to bandwidth amplification effects. One may be able to diagnose that situation, as well as assess tuning effectiveness, by looking at the bandwidth amplification and cancel rates.