Cache Analysis Using Simulator
A cache is a small block of fast RAM used for temporary storage of data that is likely to be used again. A CPU can have several caches, for example to speed up instructions in loops or to hold frequently accessed data. Reading data from cache is much faster than reading it from main memory.
This page describes the performance issues caused by inefficient usage of the cache subsystem and ways to identify and remove them.
Cache thrashing occurs when two or more data or program items that are frequently accessed map to the same cache address. Each access replaces the other item in the cache, resulting in continuous misses. Cache thrashing can also happen if the items you access do not fit into the cache.
The causes of thrashing can be summarized as follows:
- Linear access of a data structure or program whose size exceeds the cache size. For example, executing 32 KB of code linearly with an 8 KB program cache leads to continuous misses. These misses are categorized as capacity misses. Nothing can be done about them unless the code or data section is restructured to fit the cache.
- Two or more data structures are accessed in parallel and map to the same cache lines, so that each one replaces the other.
- Two or more code sections run sequentially and map to the same set of cache lines, so that each one replaces the other.
The last two categories are termed conflict misses.
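The difference between these cases can be reproduced with a toy cache model. The sketch below is not the simulator's implementation: it assumes a hypothetical 8 KB, 2-way cache with 32-byte lines and LRU replacement, and the address traces are made up for illustration.

```python
from collections import OrderedDict

def count_misses(addresses, cache_bytes, ways, line_bytes):
    """Count misses for an address trace in a set-associative LRU cache."""
    num_sets = cache_bytes // (ways * line_bytes)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_sets
        tag = line // num_sets
        resident = sets[index]
        if tag in resident:
            resident.move_to_end(tag)          # hit: mark as most recently used
        else:
            misses += 1
            if len(resident) == ways:
                resident.popitem(last=False)   # evict the least recently used line
            resident[tag] = True
    return misses

# Assumed geometry: 8 KB, 2-way, 32-byte lines -> 128 sets, 4 KB per way.
CACHE = dict(cache_bytes=8 * 1024, ways=2, line_bytes=32)

# Three items 4 KB apart share one set; a 2-way set cannot hold all three.
conflicting = [0x0000, 0x1000, 0x2000] * 100
# The same three items shifted by one line each land in different sets.
spread = [0x0000, 0x1020, 0x2040] * 100
# Two linear passes over 32 KB, one access per line: exceeds the 8 KB capacity.
linear = list(range(0, 32 * 1024, 32)) * 2

conflict_misses = count_misses(conflicting, **CACHE)
spread_misses = count_misses(spread, **CACHE)
capacity_misses = count_misses(linear, **CACHE)

print(conflict_misses, spread_misses, capacity_misses)
```

In this model every access in the conflicting and linear traces misses, while the spread trace only takes the three initial misses. Moving the items fixes the conflicting trace without changing the amount of data accessed, which is why conflict misses can be removed by placement alone.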
How to identify and remove cache thrashing issues using the simulator
Before starting the analysis, select the device you want to simulate. The simulators, in order of preference, are:
- Device cycle accurate simulator
- Megamodule cycle accurate simulator
- Device functional simulator (which supports cache events)
- Megamodule functional simulator (which supports cache events)
For example, to analyse the performance of the cache in the c6416 device, select the c6416 device cycle accurate simulator.
How to identify cache conflict misses
Follow the steps below to identify cache conflict misses within the application for the device:
- Run the application on the selected simulator with functional profiling enabled for all functions.
- Select the cycle.CPU and cycle.Total events in the profile data output.
- Identify the functions which have a large value for (cycle.Total - cycle.CPU). These are the functions with a large number of memory stalls.
- Run the application on the simulator again with profiling enabled only for the selected functions.
- Select the cache events for all levels of cache, which may include the level 1 data and program caches, the level 2 unified cache, etc.
- Identify the functions which have a large value for cache.miss.conflict. These are the functions which can be optimized for the cache through memory placement alone, without restructuring the code or data section.
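The first pass of the steps above amounts to ranking functions by stall cycles. A minimal sketch follows, assuming the profile data has been exported by hand into a dictionary; the function names and cycle counts are hypothetical, and the simulator's actual export format is not shown here.

```python
# Hypothetical profile rows: function name -> (cycle.Total, cycle.CPU)
profile = {
    "func0": (120_000, 40_000),
    "func1": (90_000, 85_000),
    "func2": (200_000, 60_000),
}

def rank_by_memory_stall(profile):
    """Return (function, stall cycles) pairs, largest stall first."""
    stalls = {name: total - cpu for name, (total, cpu) in profile.items()}
    return sorted(stalls.items(), key=lambda item: item[1], reverse=True)

ranking = rank_by_memory_stall(profile)
for name, stall in ranking:
    print(f"{name}: {stall} stall cycles")
```

The functions at the top of this ranking are the ones worth profiling again with the cache events enabled.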
How to remove cache conflict misses
The steps below can be used to remove program cache conflicts:
- Identify the address range used by the functions which have a high value for program cache conflicts. Most probably these functions are conflicting with each other.
- Relocate the functions to different addresses so that their cache sets do not match.
For example, take an application with three functions func0, func1 and func2, and a device with a 32 KB, 2-way program cache with 16-byte lines. This gives 1024 sets, so the address fields are 31:14 - Tag, 13:4 - Set, 3:0 - Offset.
function(start address, end address) : func0(0x4000400, 0x4000800), func1(0x8000400, 0x8000800), func2(0x9000400, 0x9000800).
All three functions map to the same sets, 0x40 through 0x80. Because the cache is 2-way, only two of the three can be resident in a set at a time, so they will conflict with each other.
Remap the functions to different locations so that they use different sets (func0: 0x40 through 0x80, func1: 0x140 through 0x180, func2: 0x240 through 0x280):
function(start address, end address) : func0(0x4000400, 0x4000800), func1(0x8001400, 0x8001800), func2(0x9002400, 0x9002800).
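The placement check in this example can be sketched as follows, assuming the 32 KB, 2-way, 16-byte-line geometry (1024 sets) and treating the end addresses as inclusive. The helper names are illustrative and not part of any tool.

```python
from itertools import combinations

LINE_BYTES = 16
WAYS = 2
NUM_SETS = 32 * 1024 // (WAYS * LINE_BYTES)   # 1024 sets

def sets_touched(start, end):
    """Cache sets covered by the address range [start, end] (end inclusive)."""
    first_line = start // LINE_BYTES
    last_line = end // LINE_BYTES
    return {line % NUM_SETS for line in range(first_line, last_line + 1)}

def conflicting_pairs(layout):
    """Pairs of functions whose address ranges share at least one cache set."""
    sets = {name: sets_touched(*span) for name, span in layout.items()}
    return [(a, b) for a, b in combinations(layout, 2) if sets[a] & sets[b]]

before = {
    "func0": (0x4000400, 0x4000800),
    "func1": (0x8000400, 0x8000800),
    "func2": (0x9000400, 0x9000800),
}
after = {
    "func0": (0x4000400, 0x4000800),
    "func1": (0x8001400, 0x8001800),
    "func2": (0x9002400, 0x9002800),
}

print("before:", conflicting_pairs(before))
print("after: ", conflicting_pairs(after))
```

A shared set is only a problem when more functions compete for it than the cache has ways, so with two functions in a 2-way cache an overlap reported by this check would not by itself cause thrashing.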
The steps below can be used to remove data cache conflicts:
- Identify the functions which have a high value for data cache conflicts.
- Identify the data structures which are frequently accessed within each function.
- Identify whether each data structure conflicts with other variables within this function or within other functions with high data conflicts.
- Map the data structure to a different address.
Identifying data conflict addresses using the Tag RAM viewer
- Put a breakpoint at the data structure access point within the function which is giving conflict misses.
- Identify the cache set number the data structure maps to. The set number can be calculated if the cache size, cache line size and associativity are known.
- On hitting the breakpoint, check which addresses are present in this set. These are the conflicting addresses.
- For details of data cache optimizations, see Data Cache Demo.
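The set number calculation mentioned in the steps above can be sketched as follows. The geometry used here (16 KB, 2-way, 64-byte lines) and the addresses are assumptions chosen for illustration; substitute the values for the data cache of the device being simulated. The sketch also assumes that the line size and the number of sets are powers of two.

```python
def cache_fields(cache_bytes, line_bytes, ways):
    """Return (offset_bits, set_bits) for a power-of-two cache geometry."""
    num_sets = cache_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    set_bits = num_sets.bit_length() - 1        # log2(number of sets)
    return offset_bits, set_bits

def set_number(addr, cache_bytes, line_bytes, ways):
    """Extract the set index field from an address."""
    offset_bits, set_bits = cache_fields(cache_bytes, line_bytes, ways)
    return (addr >> offset_bits) & ((1 << set_bits) - 1)

# Assumed data cache geometry, not taken from a device manual.
GEOMETRY = dict(cache_bytes=16 * 1024, line_bytes=64, ways=2)

# Hypothetical address of the data structure at the breakpoint.
target = 0x80001240
target_set = set_number(target, **GEOMETRY)

# Hypothetical addresses of other frequently accessed buffers.
candidates = [0x80003240, 0x80001280, 0x80005240]
same_set = [a for a in candidates if set_number(a, **GEOMETRY) == target_set]

print(hex(target_set), [hex(a) for a in same_set])
```

Addresses that fall into the same set as the target are the ones to look for in that set in the Tag RAM viewer.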