Keystone Error Detection and Correction EDC ECC

From Texas Instruments Wiki

Keystone Error Detection and Correction - EDC and Error Correcting Codes - ECC

Many applications have very stringent requirements to detect faults in the memory system of a processor to avoid failures of the end-system that could lead to dangerous situations for the end-user, or high requirements regarding the availability of the end-system. There are many mechanisms that could lead to faults in the memory of a processor. Some of these could lead to permanent faults and others to transient faults.

Detection of transient faults while the system is running is very important for critical applications. Permanent faults can be equally important, but they are usually significantly less likely to occur than transient faults, and in many cases they can be detected by running appropriate test algorithms at application startup or shutdown. Transient faults are predominantly introduced by soft errors. The major contributors are alpha radiation from the materials used in the chip package and neutrons from cosmic rays. These can flip bits in memories or change the state of a flip-flop.

The Keystone architecture provides multiple mechanisms to detect such faults and, in specific instances, to correct some of them.


Keystone Error Detection and Correction - EDC

C66x L1P - EDC Implementation

The L1P error detection logic can detect single-bit errors for accesses that hit within L1P RAM or L1P cache. While the error detection logic is enabled, all 64-bit DMA writes update and store the parity and valid bits. Writes narrower than 64 bits, or non-aligned writes, update the parity RAM to indicate ‘invalid parity.’ L1P checks parity on every program fetch, since all program fetches are 256-bit aligned. For DMA/IDMA read accesses to L1P memory, the parity check occurs only when the data size is 64 bits or a multiple of 64 bits.
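The parity and valid-bit behaviour described above can be illustrated with a small software model. This is only a sketch: the class name `L1PModel`, its methods, the even-parity convention, the one-parity-bit-per-64-bit-word granularity, and the choice to skip words whose parity is marked invalid are all assumptions made for illustration, not details taken from the C66x CorePac UG.

```python
WORD_BYTES = 8  # one parity bit per 64-bit word (assumed granularity)


def parity64(value):
    """Even-parity bit of a 64-bit value (the convention is an assumption)."""
    return bin(value & (2**64 - 1)).count("1") & 1


class L1PModel:
    """Hypothetical model of L1P data words plus parity and valid bits."""

    def __init__(self, n_words):
        self.data = [0] * n_words
        self.parity = [0] * n_words
        self.valid = [False] * n_words

    def dma_write(self, byte_addr, value, size_bytes):
        if size_bytes == WORD_BYTES and byte_addr % WORD_BYTES == 0:
            # Aligned 64-bit write: store data, parity, and mark parity valid.
            idx = byte_addr // WORD_BYTES
            self.data[idx] = value & (2**64 - 1)
            self.parity[idx] = parity64(value)
            self.valid[idx] = True
            return
        # Narrow or non-aligned write: merge bytes (little-endian) and mark
        # every touched word as having invalid parity.
        for i in range(size_bytes):
            addr = byte_addr + i
            idx, shift = addr // WORD_BYTES, (addr % WORD_BYTES) * 8
            byte = (value >> (8 * i)) & 0xFF
            self.data[idx] = (self.data[idx] & ~(0xFF << shift)) | (byte << shift)
            self.valid[idx] = False

    def fetch_check(self, byte_addr):
        """Check one 256-bit aligned fetch; return word indices that fail."""
        assert byte_addr % 32 == 0, "program fetches are 256-bit aligned"
        first = byte_addr // WORD_BYTES
        return [i for i in range(first, first + 4)
                if self.valid[i] and parity64(self.data[i]) != self.parity[i]]


mem = L1PModel(8)
for w in range(4):
    mem.dma_write(w * 8, 0x1111 * (w + 1), 8)
mem.data[2] ^= 1 << 17           # inject a single-bit soft error
print(mem.fetch_check(0))        # the corrupted word index is reported
```

In this model, a later 32-bit write to the same word clears its valid bit, so the fetch check no longer reports it. That mirrors the "invalid parity" state in the text, under the assumptions listed above.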

For Full Details on EDC Implementation in L1P on Keystone Devices, please see the C66x CorePac UG

C66x L1D

No error detection or correction is implemented in L1D SRAM/cache. The L1D is normally configured entirely as cache, so its contents are usually temporary, and in the rare instance that a bit flip occurs it typically would not result in a system crash.

C66x L2 - EDC Implementation

The L2 memory controller provides EDC with a Hamming code capable of detecting double-bit errors and correcting single-bit errors within each 128-bit word. EDC is supported for both L2 RAM and L2 cache accesses. All 128-bit writes to L2 memory update the stored parity and valid bits in L2 RAM regardless of whether the EDC logic is enabled or disabled. The L2 memory controller always performs a full Hamming code check on 128-bit reads of L2, regardless of whether the fetch is from L1P, L1D, IDMA, or DMA. Writes narrower than 128 bits update the parity RAM in L2 to indicate invalid parity and zero the parity values, regardless of whether EDC is enabled or disabled. All 128-bit reads are parity-checked when the EDC logic is enabled. The L2 memory controller also applies EDC to L2 victims. Error detection is performed on all L2 data fetches by the L1D cache, without any correction.
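As a rough guide to what single-error-correct, double-error-detect protection costs, the minimum number of check bits for a textbook extended Hamming code can be computed from the data width. The helper name `secded_check_bits` is hypothetical, and the result is the theoretical minimum for this code family; the parity width actually stored by the L2 hardware is not stated here and may differ.

```python
def secded_check_bits(data_bits):
    """Minimum check bits for an extended Hamming (SECDED) code."""
    r = 0
    # Hamming bound for single-error correction: 2**r >= data_bits + r + 1
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall-parity bit adds double-error detection


for width in (64, 128, 256):
    print(width, secded_check_bits(width))
# 64 -> 8, 128 -> 9, 256 -> 10
```

The 64-bit case gives 8 check bits, which matches the (72,64) code used for DDR3 ECC later in this page.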

For Full Details on EDC Implementation in L2 on Keystone Devices, please see the C66x CorePac UG.


Keystone MSMC RAM - EDC Implementation

The MSMC has error detection and correction hardware to protect the contents of the MSMC memory storage against corruption due to transient (soft) errors. The level of protection provided and the scheme used are the same as those of the C66x CorePacs (that is, one-bit error correction and two-bit error detection), with the parity codes calculated over a 256-bit datum.

The MSMC EDC HW also provides a scrubbing engine which periodically cycles through each location of each memory bank in the MSMC, reading and correcting the data, recalculating the parity bits for the data, and storing the data and parity information. Each such “scrubbing cycle” consists of a series of read-modify-write “scrub bursts” to the memory banks.
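The value of scrubbing is that single-bit errors are repaired before a second error can land in the same word and make it uncorrectable. The following toy simulation shows one scrubbing cycle over a small memory. It is a sketch only: it uses 8-bit data words protected by a textbook (13,8) extended Hamming code for brevity rather than the 256-bit datum of the MSMC, and the names `encode`, `check`, and `scrub` are hypothetical.

```python
N = 13                                   # bit 0: overall parity, bits 1..12: Hamming
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]   # non-power-of-two positions hold data


def syndrome(code):
    """XOR of the positions of all set bits; 0 for a clean codeword."""
    syn = 0
    for p in range(1, N):
        if (code >> p) & 1:
            syn ^= p
    return syn


def encode(byte):
    code = 0
    for i, p in enumerate(DATA_POS):
        if (byte >> i) & 1:
            code |= 1 << p
    syn = syndrome(code)
    for p in (1, 2, 4, 8):               # set check bits so the syndrome is 0
        if syn & p:
            code |= 1 << p
    if bin(code).count("1") & 1:         # make overall parity even
        code |= 1
    return code


def check(code):
    """Return (possibly corrected codeword, status)."""
    syn = syndrome(code)
    odd = bin(code).count("1") & 1
    if syn == 0 and not odd:
        return code, "ok"
    if odd and syn < N:                  # odd flip count: treat as single-bit error
        return code ^ (1 << syn), "corrected"
    return code, "uncorrectable"         # even flip count with non-zero syndrome


def scrub(memory):
    """One scrubbing cycle: read, correct, and write back every word."""
    stats = {"ok": 0, "corrected": 0, "uncorrectable": 0}
    for addr in range(len(memory)):
        memory[addr], status = check(memory[addr])
        stats[status] += 1
    return stats


memory = [encode(b) for b in range(16)]
memory[3] ^= 1 << 5                      # first soft error
print(scrub(memory))                     # repaired during this cycle
memory[3] ^= 1 << 9                      # second soft error, same word
print(scrub(memory))                     # still correctable, because of the scrub
```

If both bit flips had accumulated in word 3 before any scrub ran, `check` would report the word as uncorrectable instead.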

For Full Details on EDC Implementation in MSMC, please see the device's MSMC UG:
- Keystone I MSMC UG
- Keystone II MSMC UG


Keystone DDR3 Error Correcting Code - ECC

For data integrity, the DDR3 memory controller supports ECC on the data written to or read from the ECC-protected address ranges in memory. Eight-bit ECC is calculated over 64-bit data quanta and provides SECDED (single error correction, double error detection) for each quantum. The system must ensure that any burst access starting in an ECC-protected region does not cross over into an unprotected region, and vice versa.
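The burst restriction can be checked in software before a transfer is issued. The sketch below is illustrative only: the function name `burst_ok`, the half-open `(start, end)` range representation, and the example addresses are assumptions, and protected ranges are assumed not to overlap or touch each other.

```python
def burst_ok(start, length, protected_ranges):
    """True if a burst stays entirely inside or entirely outside ECC ranges.

    protected_ranges is a list of half-open (lo, hi) byte-address ranges.
    """
    end = start + length                      # exclusive end of the burst
    for lo, hi in protected_ranges:
        overlaps = start < hi and end > lo    # burst touches this range
        inside = lo <= start and end <= hi    # burst fully contained in it
        if overlaps and not inside:
            return False                      # burst straddles a boundary
    return True


# Hypothetical protected window used only for this example.
ecc_ranges = [(0x8000_0000, 0x9000_0000)]

print(burst_ok(0x8FFF_FFC0, 64, ecc_ranges))  # True: ends exactly at the boundary
print(burst_ok(0x8FFF_FFE0, 64, ecc_ranges))  # False: crosses out of the window
print(burst_ok(0x7FFF_FFE0, 64, ecc_ranges))  # False: crosses into the window
```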

The ECC algorithm used in the EMIF is the industry-standard (72,64) Hamming SECDED code.
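To make the (72,64) arithmetic concrete, here is a minimal sketch of a textbook extended Hamming code with 64 data bits and 8 check bits. The bit layout (check bits at power-of-two positions, overall parity in bit 0) is the classic construction and is an assumption; the layout used by the EMIF hardware is not described here and may differ. The names `encode` and `decode` are hypothetical.

```python
N_BITS = 72  # bit 0: overall parity; bits 1..71: Hamming positions


def _is_pow2(p):
    return (p & (p - 1)) == 0


DATA_POS = [p for p in range(1, N_BITS) if not _is_pow2(p)]   # 64 data positions
CHECK_POS = [p for p in range(1, N_BITS) if _is_pow2(p)]      # 1, 2, 4, ..., 64


def _syndrome(code):
    """XOR of the positions of all set bits; 0 for a clean codeword."""
    syn = 0
    for p in range(1, N_BITS):
        if (code >> p) & 1:
            syn ^= p
    return syn


def encode(data):
    """Encode a 64-bit value into a 72-bit SECDED codeword."""
    code = 0
    for i, p in enumerate(DATA_POS):
        if (data >> i) & 1:
            code |= 1 << p
    syn = _syndrome(code)
    for p in CHECK_POS:                 # set check bits so the syndrome is 0
        if syn & p:
            code |= 1 << p
    if bin(code).count("1") & 1:        # make overall parity even
        code |= 1
    return code


def decode(code):
    """Return (data, status); data is unreliable when uncorrectable."""
    syn = _syndrome(code)
    odd = bin(code).count("1") & 1
    if syn == 0 and not odd:
        status = "ok"
    elif odd and syn < N_BITS:          # odd flip count: single-bit error at syn
        code ^= 1 << syn
        status = "corrected"
    else:                               # even flip count, non-zero syndrome
        status = "uncorrectable"
    data = 0
    for i, p in enumerate(DATA_POS):
        if (code >> p) & 1:
            data |= 1 << i
    return data, status


word = 0x0123456789ABCDEF
cw = encode(word)
print(decode(cw ^ (1 << 40)))               # single-bit error: corrected
print(decode(cw ^ (1 << 3) ^ (1 << 40))[1]) # double-bit error: detected only
```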

For Full Details on ECC Implementation in DDR3 on Keystone Devices, please see the DDR3 UG for the device:
- Keystone I DDR3 UG
- Keystone II DDR3 UG


ARM-A15 Error Detection and Correction (ECC) Keystone Support

Memory | Protection | Notes
L1 Data RAM | ECC (per 32 bits) | 1-bit: evict corrected line to L2, treat as L1D miss and refetch from L2; 2-bit: detect
L2 Data RAM | ECC (per 64 bits) | 1-bit: inline correct to reader and evict (corrected); 2-bit: detect
L1 Instruction Data RAM | Parity (per 16 bits) | 1-bit: detect, invalidate, and treat as cache miss (fetch from L2/DDR)
L1 Instruction Tag RAM | Parity | 1-bit: detect and treat as cache miss (fetch from L2/DDR)
L1 Instruction BTB RAM | Parity | 1-bit: detect and treat as branch predictor miss
L1 Instruction GHB RAM | None | Error looks like a prefetch of the wrong address, effectively a predictor miss
L1 Instruction Indirect Predictor RAM | None | Error looks like a prefetch of the wrong address, effectively a predictor miss
L1 Data Tag RAM | ECC | 1-bit: evict corrected line to L2, treat as L1D miss and refetch from L2; 2-bit: detect
L2 TLB RAM | Parity | TLB entry invalidate, trigger page walk
L2 Tag RAM | ECC | 1-bit: correct (read-correct-write), replay lookup; 2-bit: detect
L2 Snoop Tag RAM | ECC | 1-bit: correct (read-correct-write), replay lookup; 2-bit: detect
L2 Dirty RAM | ECC | 1-bit: evict cache line to DDR, replay load; 2-bit: detect
L2 Prefetch RAM | Parity | 1-bit: invalidate line


For full details on the ARM A15 implementation, please see the ARM Cortex-A15 TRM (Technical Reference Manual) documentation.