Raw NAND ECC

From Texas Instruments Wiki


Introduction

On OMAP3 (including AM/DM37xx) and AM35xx devices the General Purpose Memory Controller (GPMC) supports ECC calculation and error detection for Hamming (1b correction) and BCH (4b/8b), while error correction is performed in software when needed. OMAP4, Netra (DM816x/AM387x), and later devices also support up to 16b ECC detection along with an Error Location Module (ELM) that gives the location of detected errors for BCH-4/8/16.

GPMC on some devices has an erratum affecting BCH-4 calculation. A software workaround is available. This erratum affects OMAP34xx/35xx (all revisions), AM35xx 1.0, AM/DM37xx 1.0, and Netra (all revisions). The erratum does not affect Netra's ELM.

OMAPL1xx/C674x/AM1xxx devices use the EMIFA module instead of GPMC. EMIFA supports 1b (Hamming) and 4b (Reed-Solomon) ECC, including correction. Older DaVinci parts also use EMIFA, but some support only 1b correction. Please see here for details.

With NAND Flash manufacturers moving to smaller process technologies, they are now requiring 8b ECC correction on SLC NAND and will eventually move to higher ECC requirements.

ECC support by device

Hardware Boot ROM Code Driver Solution
Error Detection Error Location Error Correction Error Correction
1b 4b 8b 16b 1b 4b 8b 16b 1b 4b 8b 16b 4b 8b 16b
DM33x/DM35x/DM36x
AM1xxx/C674x/OMAP-L1xx
EMIFA EMIFA Y Y
OMAP34xx/35xx
AM35xx 1.0
AM/DM37xx 1.0
G G G G Y Y
AM35xx 1.1
AM/DM37xx 1.1+
GPMC G Y Y
AM389x/C6A816x/DM816x G G GPMC G ELM Y Y
AM387x/DM814x
AM335x
AM437x
GPMC G ELM Y Y

Note: GPMC abbreviated to G in narrow columns and grayed out if affected by the BCH-4 erratum.

What is ECC

An error-correcting code (ECC) is redundant data added to the original data. In the event of errors, the combined data allows the original data to be recovered. The number of errors that can be recovered depends on the algorithm used. See the Wikipedia article on ECC for more details.
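To make the idea of redundant data concrete, here is a minimal sketch of a textbook Hamming(7,4) code, which adds 3 parity bits to 4 data bits and can correct any single flipped bit. It is only a toy illustration: the function names are hypothetical, and the Hamming code used by TI hardware operates on 512-byte chunks, not on 4-bit words.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4          # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over codeword positions 4, 5, 6, 7
    # Codeword positions 1..7 are: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data_bits, error_position); position 0 means no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based error position.
    position = s1 | (s2 << 1) | (s3 << 2)
    if position:
        c[position - 1] ^= 1   # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1               # simulate a single bit error at position 5
    print(hamming74_correct(word))
```

Two or more flipped bits in the same codeword would be miscorrected by this code, which is the small-scale version of using a NAND that needs stronger ECC than the controller provides.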


Why is ECC required for NANDs?

Data stored in NANDs can get corrupted (randomly). There is an upper limit on the number of bit errors expected in a given amount of data, which depends on the NAND process and technology. SLC NANDs have lower ECC requirements than MLC NANDs. The NAND datasheet gives the ECC requirement for the NAND device. For SLC NANDs, requirements of 1 or 4 bits per 512 bytes are currently common. For MLC, devices with ECC requirements of 4, 8, or 16 bits per 512 bytes are on the market.

 

What are the various algorithms used to implement ECC, and how do they differ?


The various algorithms used in TI ECC hardware are:

  1. Hamming: For 1 bit (GPMC and EMIFA)
  2. Reed-Solomon: For up to 4 bits (EMIFA)
  3. BCH: For 4 bits or more (GPMC, with ELM on newer devices)



Where is ECC stored in the NANDs?

Extra memory (called the "spare memory area" or "spare bytes region") is provided at the end of each page in NAND and can be used to store the ECC. This area is similar to the main page and is susceptible to the same errors. For the present explanation, assume that the page size is 2048 bytes and the ECC requirement is 4 bits per 512 bytes. Also assume that the algorithm generates 16 bytes of redundant data per 512 bytes. For a 2048-byte page, 64 bytes of redundant data will be generated. (In current TI devices, the ECC data is generated for every 512 bytes.)

There are two ways to store the redundant data:

  1. After every 512 bytes of data. This will cause some original data to be stored in the spare area. In RBL terminology, this is called "compatible mode".
  2. Completely within the spare memory area. Here, the original data is stored in the page followed by all of the redundant data for that page together. This is called "non-compatible mode".
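The two storage modes can be sketched as a byte-offset calculation, using the example figures from this section (2048-byte page, 16 ECC bytes per 512 bytes). The function name and the `interleaved` flag are hypothetical, and the sketch ignores the bad-block marker and any other reserved spare bytes, so the offsets are illustrative and not the exact ROM layout.

```python
SECTOR_BYTES = 512


def ecc_layout(page_size, ecc_per_sector, interleaved):
    """Return a list of (data_offset, ecc_offset) pairs, one per 512-byte sector.

    interleaved=True  models "compatible mode" (ECC follows each data chunk).
    interleaved=False models "non-compatible mode" (all ECC after the page data).
    """
    offsets = []
    for i in range(page_size // SECTOR_BYTES):
        if interleaved:
            data_offset = i * (SECTOR_BYTES + ecc_per_sector)
            ecc_offset = data_offset + SECTOR_BYTES
        else:
            data_offset = i * SECTOR_BYTES
            ecc_offset = page_size + i * ecc_per_sector
        offsets.append((data_offset, ecc_offset))
    return offsets


if __name__ == "__main__":
    print(ecc_layout(2048, 16, interleaved=True))
    print(ecc_layout(2048, 16, interleaved=False))
```

In the interleaved case the last data chunk starts at offset 1584 and ends at 2096, past the 2048-byte boundary, which is how some original data ends up in the spare area.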




Does ECC have to be calculated on a 512-byte data chunk?

The hardware ECC implementation in TI devices calculates ECC on 512-byte data chunks. As the page size is always a multiple of 512, this gives a generic way to calculate the ECC for a whole page in parts while reusing the same IP for NAND devices of various sizes. However, the particular BCH family used by GPMC and ELM requires that the data size including ECC bits is at most 8191 bits, i.e. just under 1KB, which effectively limits it to 512 bytes + ECC + some optional auxiliary data.

GPMC is however able to keep track of up to eight BCH calculations, allowing pages of up to 4KB to be read/written in a single pass. Various layouts are supported for the spare bytes.
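The size limit can be checked with a little arithmetic. The sketch below assumes a binary BCH code over GF(2^13), which matches the 8191-bit limit quoted above (2^13 - 1), and assumes 13 parity bits per correctable bit, a standard property of such codes that is not stated on this page. The helper names are hypothetical.

```python
FIELD_BITS = 13                            # assumed: BCH over GF(2^13)
MAX_CODEWORD_BITS = 2 ** FIELD_BITS - 1    # 8191 bits
MAX_TRACKED_SECTORS = 8                    # GPMC tracks up to eight BCH calculations


def parity_bits(correctable_bits):
    """Parity bits needed for a t-bit-correcting code (assumed 13 bits per t)."""
    return FIELD_BITS * correctable_bits


def codeword_fits(data_bytes, correctable_bits, aux_bytes=0):
    """True if data + auxiliary data + parity fits in one BCH codeword."""
    total = (data_bytes + aux_bytes) * 8 + parity_bits(correctable_bits)
    return total <= MAX_CODEWORD_BITS


if __name__ == "__main__":
    for t in (4, 8, 16):
        print(t, parity_bits(t), codeword_fits(512, t), codeword_fits(1024, t))
    print("largest single-pass page:", MAX_TRACKED_SECTORS * 512, "bytes")
```

A 1024-byte chunk already exceeds the limit before any parity is added, while a 512-byte chunk leaves ample room for parity and auxiliary data.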



Is it possible to use any ECC algorithm for any NAND?

Stronger or more complex ECC schemes generate more bytes of ECC signature, which is stored in the OOB/spare region of the NAND. Hence the choice of ECC scheme is limited by the size of the OOB/spare region available per page.

Apart from ECC signature, some OOB/spare region needs to be reserved for storing:

  • Bad-block marker (2 bytes), and
  • File-system metadata.

So if the NAND device has an OOB/spare region that can accommodate all of the above, it can use the given ECC scheme. As a general rule [1]:

OOB Area (spare region) >= B * ( Page_Size / 512 ) + 2 + FileSystem_metadata
where
B =  8 bytes for BCH4
B = 14 bytes for BCH8 
B = 26 bytes for BCH16

Generally, NAND pages are 2048 bytes (with 64 bytes of OOB/spare region) or 4096 bytes (with 224 bytes of OOB/spare region). Thus, using BCH8 with the UBIFS file system on a NAND device with 2048-byte pages requires:

  • 14*(2048/512) = 56bytes of ECC
  • 2 bytes for Bad-Block marker
  • 0 bytes for metadata (as UBIFS does not use the OOB/spare region for storing its metadata).

Total = 56 + 2 + 0 = 58 bytes, which can be accommodated in the 64 bytes of OOB/spare region of the NAND. Hence, the BCH8 ECC scheme can be used with UBIFS on a 2048/64 NAND device.
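The same budget check can be written as a short helper. This is a sketch of the inequality given above using the per-512-byte ECC sizes listed for BCH4, BCH8, and BCH16; the function and constant names are hypothetical.

```python
ECC_BYTES_PER_512 = {"BCH4": 8, "BCH8": 14, "BCH16": 26}
BAD_BLOCK_MARKER_BYTES = 2


def oob_required(page_size, scheme, fs_metadata_bytes=0):
    """Spare bytes needed: B * (Page_Size / 512) + 2 + FileSystem_metadata."""
    sectors = page_size // 512
    return ECC_BYTES_PER_512[scheme] * sectors + BAD_BLOCK_MARKER_BYTES + fs_metadata_bytes


def scheme_fits(page_size, oob_size, scheme, fs_metadata_bytes=0):
    """True if the ECC scheme fits in the OOB/spare region of this NAND."""
    return oob_required(page_size, scheme, fs_metadata_bytes) <= oob_size


if __name__ == "__main__":
    for scheme in ECC_BYTES_PER_512:
        print(scheme, oob_required(2048, scheme), scheme_fits(2048, 64, scheme))
```

With these figures, BCH16 needs 106 bytes on a 2048-byte page and so does not fit in 64 bytes of OOB, while on a 4096/224 device it needs 210 bytes and fits.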





What will happen if an 8-bit ECC NAND is used with our 4-bit ECC capable devices?


In this scenario, if more than 4 errors occur in a 512-byte chunk, the errors cannot be corrected. This can have serious consequences, including boot failure. It is advisable to regularly correct ECC errors in the designated read-only/boot sections of the NAND, before they accumulate beyond what can be corrected, to reduce the chances of boot failure.



What is required to support 4b/8b ECC NAND devices?

ECC Correction

  • The OMAP35x, AM35x, and AM/DM37x devices do not support 4b or 8b correction in hardware. However, they do support 1b, 4b (not OMAP35x), and 8b hardware detection. Error correction must therefore be done in software.
  • The AM335x and AM437x devices support 4b, 8b, and 16b detection and error location. The software only needs to flip the error bits at the locations provided by the ELM.
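Once the error locations are known, the correction step itself is a handful of bit flips. The sketch below assumes the locations have already been translated into bit offsets from the start of the data buffer, with bit 0 being the least significant bit of byte 0. That convention and the function name are hypothetical; the ELM reports positions in its own format, which a real driver must convert.

```python
def apply_error_locations(buf, bit_positions):
    """Flip the bits at the given positions in a mutable buffer, in place.

    Positions beyond the end of the buffer are assumed to fall in the ECC
    bytes and are skipped. Returns the number of bits flipped in the data.
    """
    flipped = 0
    for position in bit_positions:
        byte_index, bit_index = divmod(position, 8)
        if byte_index < len(buf):
            buf[byte_index] ^= 1 << bit_index
            flipped += 1
    return flipped


if __name__ == "__main__":
    sector = bytearray(512)
    sector[1] ^= 0x02                       # simulate an error at bit position 9
    print(apply_error_locations(sector, [9]), sector == bytearray(512))
```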


Boot Support

  • Currently the OMAP35x, AM35x, and AM/DM37x devices support only 1b error correction in the ROM for NAND boot. However, many NAND devices requiring 4- or 8-bit ECC are specified such that the first block can be used with 1-bit ECC for a certain number of program/erase cycles, e.g. 1000.
  • The AM335x and AM437x boot ROM code allows support for up to 16b ECC SLC NAND devices.



Software Performance:

Raw performance testing of the software ECC correction algorithm was done on the AM37x EVM. Below are the results of the test with artificially inserted errors.

  • 1-bit correction mode: 120 timer ticks or 4.61us to correct 1 error
  • 4-bit correction mode: 234,000 timer ticks or 9.00ms to correct 1 error with no meaningful performance difference when correcting 2 to 4 errors
  • 8-bit correction mode: 244,000 timer ticks or 9.38ms to correct 1 error with no meaningful performance difference when correcting 2 to 8 errors
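The figures above can be cross-checked with simple arithmetic. The sketch below derives the timer frequency implied by each tick/time pair (about 26 MHz, which is an inference and is not stated in the test description) and the relative cost of BCH correction compared with 1-bit correction. All names are hypothetical.

```python
# (timer ticks, seconds) per corrected chunk, taken from the list above.
MEASUREMENTS = {
    "1-bit": (120, 4.61e-6),
    "4-bit": (234_000, 9.00e-3),
    "8-bit": (244_000, 9.38e-3),
}


def implied_timer_hz(mode):
    """Timer frequency implied by the tick count and elapsed time."""
    ticks, seconds = MEASUREMENTS[mode]
    return ticks / seconds


def slowdown(mode, baseline="1-bit"):
    """How many times slower a mode is than the baseline mode."""
    return MEASUREMENTS[mode][1] / MEASUREMENTS[baseline][1]


if __name__ == "__main__":
    for mode in MEASUREMENTS:
        print(mode, round(implied_timer_hz(mode) / 1e6, 2), "MHz",
              round(slowdown(mode)), "x")
```

On these numbers, a software BCH correction costs roughly two thousand times as much as a 1-bit correction, although it is only incurred on chunks that actually contain errors.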



What are the options?

If you are planning a new design, with any of the above mentioned devices that only support 1b ECC for ROM boot, you can utilize any of the options below.  For simplicity, we recommend the "eMMC/eSD NAND option".   For existing designs, choose the option below that best suits your application.


Use a NAND device that only requires 1 bit ECC

  • Some NAND Flash manufacturers are ending production of NAND devices that require only 1 bit correction, but other manufacturers have no current plan to EOL their 1 bit ECC devices.
  • Hynix is one NAND manufacturer that currently continues to support 1b ECC NAND devices and is utilized on the AM37x EVM (TMDXEVM3715).


Boot with 1 bit ECC correction and run with NAND 4b correction in NAND flash

  • Some of the NAND Flash providers have started producing NAND devices with built in or on-die ECC for detection and correction.
  • For some of the NANDs, the built in ECC correction defaults to an "off state" upon power-on of the NAND device.
  • The NAND flash is specified such that the first block only requires 1-bit ECC correction.

At boot time, ROM Code will use 1 bit ECC algorithm to boot from the NAND device.

  • The ECC in the device (OMAP35x, AM35x, AM/DM37x) must then be disabled after boot (e.g. in XLOADER)
  • Then the built-in ECC in the NAND device can be enabled (e.g. again in XLOADER)

Note:

If the device (OMAP35x, AM35x, AM/DM37x) undergoes a warm reset, the ROM code will start over and use 1b ECC correction. Because the NAND still has on-die ECC enabled, the ECC data in the spare area will conflict: the ROM code spare area mapping and the NAND on-die mapping overlap. Once on-die ECC is enabled, the only ways to disable it are to send a command to disable it or to power cycle the NAND. As a result, the system will either fail to boot, because the ROM code tries to correct the "errors" that it sees from the conflicting data, or it will boot, because there are too many errors and the ROM code just passes the raw data through without correcting it.

One option is for the system to cycle the power on the NAND when the PMIC receives the warm reset.  This will allow the NAND to go back to the on-die ECC off state and the system will boot as normal.

Note:

Precautions should be taken to ensure the ECC area doesn't conflict with the Flash File System area.


Boot with 1 bit ECC correction and run with 4b/8b correction in software

  • Some NAND providers are providing devices which guarantee that the 1st block only requires 1 bit correction.  Because the OMAP35x, AM35x, and AM/DM37x devices support 1b correction in ROM, booting can be done from the NAND device. After boot is complete, software ECC correction should be utilized.


eMMC/eSD NAND

  • Managed NAND has built-in ECC and is connected via the MMC/SD interface. Because error correction is handled inside the managed NAND device, the issues with GPMC 4b/8b correction are not relevant. Memories compatible with MMC 4.2 and SD 2.1 will work seamlessly with these processors.
  • The following managed NAND devices have been tested with OMAP35x, AM35x, and AM/DM37x devices:
Sandisk – SDIN2C2
Samsung – KMAFN0000M-S998


OneNAND

  • OneNAND has hardware ECC built in, which eliminates the need for error correction to be done by the GPMC.
  • OneNAND is interfaced through the GPMC as a muxed NOR flash device.

Note: OneNAND devices are offered by Samsung, e.g. KFN8G16Q4M-AEB10


(AM35x only) Secondary Boot from SPI EEPROM

  • Boot from another type of device like NOR or SPI and then continue using NAND with 4b/8b ECC software correction.
  • SPI boot is only available on AM35x.


Secure a lifetime buy for current NAND device or utilize a pin for pin compatible solution that supports 1 bit ECC

  • Customers with existing designs that use a NAND device approaching EOL should make arrangements with their NAND suppliers to secure supply for the lifetime of their product(s), or utilize one of the above options.

  1. http://processors.wiki.ti.com/index.php/TI81XX_PSP_UBOOT_User_Guide#BCH_Flash_OOB_Layout