
Early Boot and Late Attach in Linux

From Texas Instruments Wiki

Introduction

DRA7xx SoCs have multiple processor cores: Cortex-A15, C66x DSPs and ARM Cortex-M4 cores. The A15 typically runs an HLOS such as Linux, QNX or Android, and the remotecores (the DSPs and M4s) run an RTOS. In the normal boot flow, the boot loader (U-Boot/SPL) loads and boots the HLOS on the A15, and the A15 then boots the DSP and M4 cores. In this sequence, the interval between power-on reset and the start of remotecore execution depends on the HLOS initialization time. This delay may be unacceptable for use cases with tight timing constraints, e.g. a rear view camera.

Figure: Normal Boot Flow

To address this use case, the boot loader can boot a remotecore before booting the Linux kernel on the A15, i.e. boot the remotecore early. The kernel then attaches to the already running remotecore for further communication, i.e. connects to it later in its own execution. We refer to this feature as "Early Boot - Late Attach". The "Early Boot" functionality is provided by the boot loader; the "Late Attach" functionality is a feature of the Linux kernel.

Figure: Early Boot Flow

Late attach functionality has been validated for IPU1, IPU2, DSP1 and DSP2 on the DRA7xx platform. The following sections describe how to use this feature and how to troubleshoot issues with early boot and late attach.

Using Early Boot/Late Attach

Early Boot/Late Attach functionality is supported on the TI GLSDK, PSDKLA and Android SDK from the same codebase. There are minor differences in the build procedure and in the location where the binaries are stored, depending on the SDK and the kernel/U-Boot version.

  1. Kernel 3.14/u-boot 2014.07

    1. When using GLSDK, the configuration used for building u-boot is dra7xx_evm.h and the remotecore binaries are stored in the FAT partition of the SD/eMMC card.

    2. When using Android SDK, the configuration used for building u-boot is dra7xx_evm_android.h and the remotecore binaries are stored in individual partitions of the eMMC storage on the EVM.

  2. Kernel 4.4/u-boot 2016.05

    1. When using PSDKLA, the remotecore binaries are stored in the FAT partition of the SD/eMMC card.

    2. When using Android SDK, the remotecore binaries are stored in individual partitions of the eMMC storage on the EVM.

    In both the Android SDK and PSDKLA, early boot support in U-Boot is enabled using Kconfig options. See:

    http://processors.wiki.ti.com/index.php?title=Processor_SDK_Linux_Automotive_Software_Developers_Guide#Configuring_U-Boot

Pre-flight checks

  1. Before attempting to early boot a remotecore from u-boot, please ensure that the remotecore binary can be loaded by Linux without any issues. This ensures that the memory map and MMU configuration are done correctly.

  2. For the carveouts specified in the resource table, MLO allocates memory from the same memory pools as the kernel, using the same allocation strategy. The location of each remotecore's memory pool is hardcoded in MLO to the kernel defaults. If the memory allocations in the kernel are modified, U-Boot must be modified to match the kernel configuration.

    The u-boot source file to modify is board/ti/dra7xx/lateattach.c.

    #define DRA7_RPROC_CMA_BASE_IPU1             0x9d000000
    #define DRA7_RPROC_CMA_BASE_IPU2             0x95800000
    #define DRA7_RPROC_CMA_BASE_DSP1             0x99000000
    #define DRA7_RPROC_CMA_BASE_DSP2             0x9f000000
     
    #define DRA7_RPROC_CMA_SIZE_IPU1             0x02000000
    #define DRA7_RPROC_CMA_SIZE_IPU2             0x03800000
    #define DRA7_RPROC_CMA_SIZE_DSP1             0x04000000
    #define DRA7_RPROC_CMA_SIZE_DSP2             0x00800000

    The definitions above should match the definitions in the file arch/arm/boot/dts/dra7-evm.dts in the kernel source.

    ipu1_cma_pool: ipu1_cma@9d000000 {
        compatible = "shared-dma-pool";
        reg = <0x9d000000 0x2000000>;
        reusable;
        status = "okay";
    };
    
    ipu2_cma_pool: ipu2_cma@95800000 {
        compatible = "shared-dma-pool";
        reg = <0x95800000 0x3800000>;
        reusable;
        status = "okay";
    };
    
    dsp1_cma_pool: dsp1_cma@99000000 {
        compatible = "shared-dma-pool";
        reg = <0x99000000 0x4000000>;
        reusable;
        status = "okay";
    };
    
    dsp2_cma_pool: dsp2_cma@9f000000 {
        compatible = "shared-dma-pool";
        reg = <0x9f000000 0x800000>;
        reusable;
        status = "okay";
    };

    If the allocations do not match, MLO execution will fail when trying to allocate memory for the carveouts.

    You can use the binary print_macros_for_mlo from the dra7xx-earlyboot-utils repository to parse the dtb and print the macros that need to be included in the MLO.

  3. In case of Kernel 4.4/u-boot 2016.05, there is an additional memory area that needs to be reserved. This area is used for storing the MMU page tables of the individual cores. It is specified in the late attach device tree as shown below.

    &reserved_mem {
        latea_pagetbl: late_pgtbl@bfc00000 {
            reg = <0x0 0xbfc00000 0x0 0x100000>;
            no-map;
            status = "okay";
        };
    };

    For each core, 16 KB is reserved for the Level 1 page table and an additional 16 KB for the Level 2 page tables, i.e. 32 KB per core. These locations are passed to the boot loader through the following macros in board/ti/dra7xx/lateattach.c.

    #define DRA7_PGTBL_BASE_IPU1                 0xbfc00000
    #define DRA7_PGTBL_BASE_IPU2                 0xbfc08000
    #define DRA7_PGTBL_BASE_DSP1                 0xbfc10000
    #define DRA7_PGTBL_BASE_DSP2                 0xbfc18000

    Early boot needs only 4 x 32 KB = 128 KB of memory for the page tables, while the carveout is 1 MB in size. The carveout can be reduced to 128 KB if the system is memory constrained.

    You can use the binary print_macros_for_mlo from the dra7xx-earlyboot-utils repository to parse the dtb and print the base address of the IPU1 page table location. Page table addresses for other cores are placed at a linear offset starting with this address.

  4. MLO first loads the remotecore binaries from the storage media to a temporary DDR address. It then parses the binaries and copies the code/data sections to their final locations. Ensure that the physical addresses used by the remotecore binaries at run time do not overlap these temporary load addresses.

    The location of the macros controlling these temporary load locations is listed below.

    U-boot 2014.07: Temporary load addresses for early boot binaries
    Core  Load address   Defined in
    IPU1  IPU_LOAD_ADDR  include/configs/dra7xx_evm.h
    DSP1  DSP_LOAD_ADDR  include/configs/dra7xx_evm.h
    IPU2  IPU_LOAD_ADDR  include/configs/dra7xx_evm.h
    DSP2  DSP_LOAD_ADDR  include/configs/dra7xx_evm.h

    In u-boot 2016.05, each core has its own temporary load address, and the macros are defined in a different header file.

    U-boot 2016.05: Temporary load addresses for early boot binaries
    Core  Load address    Defined in
    IPU1  IPU1_LOAD_ADDR  include/configs/ti_omap5_common.h
    DSP1  DSP1_LOAD_ADDR  include/configs/ti_omap5_common.h
    IPU2  IPU2_LOAD_ADDR  include/configs/ti_omap5_common.h
    DSP2  DSP2_LOAD_ADDR  include/configs/ti_omap5_common.h

Enabling Early Boot

Early Boot functionality needs to be explicitly enabled in U-Boot.

U-Boot 2014.07

This is done by defining the flag CONFIG_LATE_ATTACH in the configuration file. For GLSDK, change the following lines in include/configs/dra7xx_evm.h from

#undef CONFIG_LATE_ATTACH

#ifdef CONFIG_LATE_ATTACH

to

#define CONFIG_LATE_ATTACH

#ifdef CONFIG_LATE_ATTACH

and rebuild u-boot using the configuration dra7xx_evm_config.

U-Boot 2016.05

Enable the configuration option CONFIG_LATE_ATTACH in the file configs/dra7xx_evm_defconfig. Change the line

CONFIG_LATE_ATTACH=n

to

CONFIG_LATE_ATTACH=y

and rebuild u-boot using the configuration dra7xx_evm_config.

Choosing the cores to early boot

The array cores_to_boot in the file common/spl/spl.c:board_init_r() controls which cores are loaded by MLO.

#ifdef CONFIG_LATE_ATTACH
    u32 cores_to_boot[] = { IPU2, DSP1, DSP2, IPU1 };
#endif

This array can be modified to specify which cores MLO loads and the order in which they are loaded.

Binary Placement

When building with the GLSDK/PSDKLA configuration, the binaries are expected to be present on the same partition as the boot loader on the SD card or on the eMMC card.

Binary Locations for GLSDK
Core Binary name Location
DSP1 dra7-dsp1-fw.xe66 Boot partition(FAT)
DSP2 dra7-dsp2-fw.xe66 Boot partition(FAT)
IPU1 dra7-ipu1-fw.xem4 Boot partition(FAT)
IPU2 dra7-ipu2-fw.xem4 Boot partition(FAT)

The binaries can be stripped of debug symbols to reduce their size.

Customizing Early Boot for a Usecase

The Early boot code in U-Boot does the necessary configuration to bring up a remotecore. This includes the timers and the MMUs. It does not configure any other peripherals by default. Some usecases may require additional peripheral configuration before running the remotecore. U-Boot includes placeholder functions that can be populated for this purpose. These can be found in the file board/ti/dra7xx/lateattach.c.

/*
 * If the remotecore binary expects any peripherals to be setup before it has
 * booted, configure them here.
 *
 * These functions are left empty by default as their operation is usecase
 * specific.
 */
 
u32 ipu1_config_peripherals(u32 core_id, struct rproc *cfg)
{
    return 0;
}
 
u32 ipu2_config_peripherals(u32 core_id, struct rproc *cfg)
{
    return 0;
}
 
u32 dsp1_config_peripherals(u32 core_id, struct rproc *cfg)
{
    return 0;
}
 
u32 dsp2_config_peripherals(u32 core_id, struct rproc *cfg)
{
    return 0;
}

As an example, an audio playback usecase running on DSP1 needs McASP3, I2C1, I2C2 and a PLL configuration. These can be configured in the board/ti/dra7xx/lateattach.c:dsp1_config_peripherals() function.

u32 dsp1_config_peripherals(u32 core_id, struct rproc *cfg)
{
    u32 reg = 0;
 
    /* enable I2C1: set MODULEMODE (bits 1:0) to 0x2 (explicitly enabled) */
    reg = __raw_readl(CM_L4PER_I2C1_CLKCTRL);
    __raw_writel((reg & ~0x00000003) | 0x2, CM_L4PER_I2C1_CLKCTRL);
 
    /* enable I2C2 */
    reg = __raw_readl(CM_L4PER_I2C2_CLKCTRL);
    __raw_writel((reg & ~0x00000003) | 0x2, CM_L4PER_I2C2_CLKCTRL);
 
    /* enable McASP3 */
    reg = __raw_readl(CM_L4PER2_MCASP3_CLKCTRL);
    __raw_writel((reg & ~0x00000003) | 0x2, CM_L4PER2_MCASP3_CLKCTRL);
 
    /* PLL configuration for 44.1 kHz audio */
    dpll_abe_opp_config(44100);
    return 0;
}

Testing early boot

  1. Place the MLO built with early boot enabled and the remotecore binaries in the specified locations and power on the EVM.
  2. The MLO should locate the remotecore binary and proceed to load it and then jump to U-Boot or Kernel.

An easy way to verify that early boot is working is to stop A15 execution at the U-Boot prompt and connect to the remotecore via JTAG. If connecting to a remotecore via JTAG does not work, refer to the "Debugging Early Boot" section later in this document.
Another way to check the functionality is to run the command below after the kernel has booted. When the core is bootstrapped from U-Boot, the values of 'Time at reset()', 'Time at startup()' and 'Time at main()' should be below 60,000 ticks, since loading the firmware typically takes less than 2 s. When the firmware is bootstrapped from the kernel, these values are above 300,000.

root@dra7xx-evm:~# cat /sys/kernel/debug/remoteproc/remoteproc0/trace0
[0][      0.000] Watchdog enabled: TimerBase = 0x68824000 SMP-Core = 0 Freq = 19200000
[0][      0.000] Watchdog enabled: TimerBase = 0x68826000 SMP-Core = 1 Freq = 19200000
[0][      0.000] Watchdog_restore registered as a resume callback
[0][      0.000] 18 Resource entries at 0x3000
[0][      0.000] messageq_single.c:main: MultiProc id = 2
[0][      0.000] Time at reset() is 51615 ticks
[0][      0.000] Time at startup()  is 51726 ticks
[0][      0.000] Time at main()  is 51804 ticks
[0][      0.000] registering rpmsg-proto:rpmsg-proto service on 61 with HOST
[0][      0.000] tsk1Fxn: created MessageQ: SLAVE_IPU1; QueueID: 0x20080
[0][      0.000] Awaiting sync message from host...

In the next section, we describe the kernel modifications necessary to allow it to connect to a remotecore already loaded by MLO.

Enabling Late attach

Loading the remotecores in the kernel is done by the remoteproc module. Each remotecore requires timers for the OS tick and the watchdog, and MMUs for mapping virtual addresses to physical addresses. The remoteproc module uses the device tree to determine the timers and MMUs used by each remotecore.

The device tree nodes for each of the cores are listed below. The allocation of timers to remotecores is taken from the files arch/arm/boot/dts/dra7-evm.dts and arch/arm/boot/dts/dra7.dtsi in the kernel source tree.

Core Remotecore node OS timer node Watch dog timer node(s) MMU node(s)
IPU2 ipu2 timer3 timer4,timer9 mmu_ipu2
IPU1 ipu1 timer11 timer7,timer8 mmu_ipu1
DSP2 dsp2 timer6 mmu0_dsp2,mmu1_dsp2
DSP1 dsp1 timer5 timer10 mmu0_dsp1,mmu1_dsp1

During the normal boot flow, the Linux kernel resets, idles and configures all functional blocks to bring them to a known initial state. This would terminate execution on a remotecore started by the boot loader. To prevent this, the following attributes must be set on each device tree node corresponding to the remotecore.

  1. ti,late-attach
  2. ti,no-idle-on-init
  3. ti,no-reset-on-init

These three attributes together signal to the kernel that

  1. Late attach feature is in use for the remote core.
  2. The remotecore and other nodes have been configured and are in use before the kernel boot. These should not be reset or idled during kernel boot.

The example below shows the device tree modifications needed to late attach to IPU2. Note that the attributes are set on the ipu2 node as well as on the timer and MMU nodes used by IPU2.

&ipu2 {
    ti,late-attach;
    ti,no-idle-on-init;
    ti,no-reset-on-init;
};

&timer3 {
    ti,late-attach;
    ti,no-idle-on-init;
    ti,no-reset-on-init;
};

&timer4 {
    ti,late-attach;
    ti,no-idle-on-init;
    ti,no-reset-on-init;
};

&timer9 {
    ti,late-attach;
    ti,no-idle-on-init;
    ti,no-reset-on-init;
};

&mmu_ipu2 {
    ti,late-attach;
    ti,no-idle-on-init;
    ti,no-reset-on-init;
};

You can use the binary check_late_props from the dra7xx-earlyboot-utils repository to parse the dtb and check whether late attach properties are applied on each remotecore node.

Validation

Early Boot/Late attach functionality has been validated with binaries for different usecases.

  1. The IPC messageq_single sample application has been used to validate late attach for all the cores as a first level test. All the cores were loaded and functionality verified. On PSDKLA, these binaries can be found prebuilt in the target file system.

    http://processors.wiki.ti.com/index.php?title=Processor_SDK_Linux_Automotive_Software_Developers_Guide#User_space_sample_application

    On GLSDK, one can build these applications using the SDK top level makefile.

  2. IPU2 late attach has been verified using the IPUMM firmware (dra7-ipu2-fw.xem4) supplied in GLSDK.

  3. DSP1, DSP2 and IPU1 binaries from Vision SDK were used to verify late attach.

In the next sections, we will discuss how to debug any issues encountered with Early Boot and Late Attach.

Debugging Early Boot

  1. Ensure that Linux kernel is able to boot the remotecore using the same binary.

  2. Ensure that the MLO has been built with the CONFIG_LATE_ATTACH macro enabled in the right configuration file.

  3. Ensure that the desired cores are entered in the cores_to_boot array.

  4. If you are facing issues with early loading multiple cores, try early loading one core at a time to isolate issues.

  5. In case of PSDKLA/GLSDK, MLO looks for binaries in the FAT partition of the eMMC or of the SD card. The binaries are expected to have the same names as expected by the Linux Kernel. Please confirm the name and the location of the binaries.

    Binary Names
    Core  Binary name
    DSP1  dra7-dsp1-fw.xe66
    DSP2  dra7-dsp2-fw.xe66
    IPU1  dra7-ipu1-fw.xem4
    IPU2  dra7-ipu2-fw.xem4

  6. The remotecore binary must contain a resource table, which MLO uses to set up the MMU mappings and copy the ELF sections to the desired locations. If the kernel can load the remotecore binary in the normal manner, this is already taken care of. For more information on the resource table, refer to

    1. http://lwn.net/Articles/489009/
    2. http://processors.wiki.ti.com/index.php/IPC_Resource_customTable

    You can use the binary dump_rsc_table from the dra7xx-earlyboot-utils repository to parse the remotecore binary and print the resource table in human readable form.

  7. Ensure that the location of the memory pools specified in the kernel device tree is the same as the locations specified in MLO. Please see the above section on "Pre-Flight Checks" for details.

  8. Once the above steps are taken care of, MLO should load and start the remotecore binaries. To trace the load process, enable debug prints in the following files.

    1. board/ti/dra7xx/lateattach.c
    2. common/elf_remoteproc.c
    3. common/spl/spl.c

    Debug prints can be enabled either by adding

    #define DEBUG

    at the top of the source file, before common.h is included, or by setting the debug flag in the Makefile of the corresponding directory. For lateattach.c, the following change can be made in board/ti/dra7xx/Makefile.

    CFLAGS_lateattach.o := -DDEBUG

Once the remotecore is loading and running, there are two kinds of faults that can be observed.

  1. MMU Faults - These faults are mostly due to incorrect or incomplete resource table configuration.
  2. Peripheral configuration faults - These faults are due to a subsystem or peripheral required by the remotecore not being powered on/configured by u-boot.

Debugging MMU Faults

In the normal flow, the Linux kernel handles the MMU error interrupts, prints the MMU fault address and reloads the core. When the remotecore is loaded from MLO, the MMU registers must be checked for faults manually. Faults are reported in the MMU_IRQSTATUS register of the MMU; in case of a translation fault, the faulting address can be read from the MMU_FAULT_AD register.

As the MMU error handling is normally done on the A15, MLO configures the MMU to not generate an interrupt into the remotecore. As we are polling the MMU registers for errors, it is possible that the captured MMU fault address is not the first one that occurred.

As a temporary debug measure, we can configure the MMU to generate an interrupt to the remotecore itself by changing the below lines in board/ti/dra7xx/lateattach.c from

reg = __raw_readl(mmu_base + 0x88);
 
/* enable bus-error back */
__raw_writel(reg | 0x1, mmu_base + 0x88);

to

/* offset 0x88: MMU_GP_REG; bit 0 is the bus-error back enable */
reg = __raw_readl(mmu_base + 0x88);
 
/* disable bus-error back (clear bit 0) */
__raw_writel((reg & (~0x1)), mmu_base + 0x88);

This causes the remotecore to jump to its exception handler, which can prevent later MMU faults from overwriting the address of the first fault.

Once the MMU fault address is known and it is a valid address, the fault can be fixed by adding a mapping for it to the resource table. This is usually needed when the remotecore accesses certain L3 and L4 regions or physical memory locations.

Code changes for debugging

If the above approach does not help, one can try to

  1. Put a while() loop in the main function of the remotecore.

    volatile int ccs_dbg_flag = 0;
    void wait_for_debugger(void)
    {
        ccs_dbg_flag = 1;
        System_printf("Entering Infinite loop\n");
        /* spin until ccs_dbg_flag is cleared from the debugger over JTAG */
        while (ccs_dbg_flag == 1)
            asm(" NOP");
        return;
    }
     
    Int main(Int argc, Char* argv[])
    {
        wait_for_debugger();
        /* ... rest of main() ... */
    }

  2. Load the remotecore via MLO.

  3. Stop A15 execution before it reaches the kernel.

  4. Connect to the remotecore via JTAG and step through the remotecore execution until the MMU fault is encountered.

If the remotecore faults even before reaching main(), one can add SYSBIOS startup hooks that call user-defined functions when the core comes out of reset, and place the while() loop in those functions.

In the SYSBIOS .cfg file, do the following.

var Startup = xdc.useModule('xdc.runtime.Startup');
Startup.resetFxn = "&dbgResetFxn";
Startup.firstFxns[Startup.firstFxns.length++] = "&dbgUserStartupFxn";

Add the below functions to your source.

void dbgUserStartupFxn(void)
{
        wait_for_debugger();
}
 
void dbgResetFxn(void)
{
        wait_for_debugger();
}

With these changes, one should be able to determine the code causing the MMU fault.

Debugging Peripheral configuration Faults

Peripheral configuration faults can be easily avoided by making a list of the peripherals or subsystems accessed by the remotecore and powering them on in MLO. MLO includes placeholder functions that are invoked before starting a remotecore. These functions can be populated to power on the required peripherals.

Debugging Late Attach

  1. Ensure that all three late attach attributes are set on the device tree nodes corresponding to the remotecore being loaded from the boot loader. Otherwise the kernel will reset and reload the remotecore as in the normal boot flow.
  2. Ensure that the three late attach attributes are set only on the device tree nodes corresponding to the remotecore being loaded from the boot loader. Otherwise the kernel will try to communicate with a remotecore that is not loaded, resulting in an error or, in the worst case, a crash.
  3. Ensure that the peripherals accessed by the remotecore are not being handled by the kernel. This can be accomplished by removing the corresponding nodes from the device tree.

Handling various peripherals/usecases

GPIO

If you find that the kernel turns off the GPIO clocks, there are two ways to keep them on. The example below uses GPIO2.

  1. Add the attributes below to the GPIO node in the device tree. This is only necessary if the GPIO2 pins are also accessed from the kernel.

    &gpio2 {
        ti,no-idle;
        ti,no-idle-on-init;
        ti,no-reset-on-init;
    };
  2. Delete the GPIO node from the device tree.

    /delete-node/ &gpio2;

In both cases, the GPIO2 instance remains on.

root@dra7xx-evm:~# omapconf read 0x4a009760
00020001
root@dra7xx-evm:~# omapconf read 0x48055134
FFFFFFFE

Vision SDK

Vision SDK uses DSS, VIP and VPE from the M4 core. Customers typically early boot the M4 binary to bring up the display and camera quickly. In such scenarios, the peripherals used by Vision SDK must be turned on before starting the M4 and DSP. The patch below can be used as a starting point for the required code changes.

No. URL Headline
1 http://review.omapzoom.org/38438 spl: dra7xx: early boot: enable peripherals for vision sdk