Please note: as of Wednesday, August 15th, 2018, this wiki has been set to read-only. If you are a TI employee and require edit access, please contact x0211426 from the company directory.

Debugging AM335x Suspend-Resume Issues

From Texas Instruments Wiki

Introduction

This page is geared toward Linux developers using the AM335x family of embedded processors. Many of the topics contained in the page, however, will still apply to users of other operating systems.

The goal of this page is to give users some insight into common issues they might bump into, as well as tools to help identify issues.

Hardware Setup

First and foremost you must be certain that your hardware is configured appropriately. In general:

  • Consult the data manual for a general diagram of how the memory should be connected to the AM335x.
  • When using a single 16-bit external memory device, there is no need for VTT termination on the address/control lines. VTT termination adds cost and complexity to the board, so if you are using only a single 16-bit memory you should not use it.
  • If you're using dual 8-bit memories with VTT termination, pay very careful attention to the terminations: some signals are pulled to VTT while others are pulled to VDDS_DDR.
  • If you are using VTT termination and want to control the LDO (TPS51200 or equivalent), you must use a pin from GPIO bank 0 (bank 0 is in the wakeup domain, which stays powered in DeepSleep0; banks 1-3 are in the PER domain).
  • A pull-up on DDR_RESETn is not recommended since we want reset to be asserted (low) during power-up.
  • Since most peripherals are turned off during DeepSleep0, it is necessary to specify the state of each and every pin associated with the PER domain. There is a very helpful spreadsheet focusing on this topic on this wiki page: Optimizing AM335x IO Power in DeepSleep0
  • Any GPIO wakeup sources must use GPIO bank 0.
  • For your main clock (e.g. 24 MHz) you can use either a crystal or an LVCMOS square-wave clock. There is a power benefit to using a crystal, because hardware inside the chip can shut off the crystal entirely during DeepSleep0. With a square-wave clock there is unfortunately no mechanism for automatically turning the clock off and on, which results in additional current consumption.


Software

Configuring the I/O values for DeepSleep0

The dts file allows you to specify separate pin values for normal operation and for sleep. The actual run-time change is driver dependent: a driver usually calls pinctrl_pm_select_sleep_state from its suspend handler, generally as the last thing done before sleep. The IP should therefore already be idle and ready to have its clocks disabled by the time this call is made.

As a general note, the mux mode does NOT need to be changed for DS0. A different mux mode needs to be chosen only when the spreadsheet from Optimizing AM335x IO Power in DeepSleep0 dictates that a mux change is needed; in that case, use the spreadsheet to find a mux mode that it "likes" in order to achieve the necessary state on the pin. Changing the mux mode does not reduce power consumption, so there is no general need to do it. The only reason to do it is a conflict, e.g. the pin drives low during DS0 but you need it to be high. The pin muxes were not designed specifically for run-time transitions, so there is a small possibility of introducing a glitch on the pin during the transition. In general, therefore, mux changes should be avoided.


VTT considerations

SDK 6.00
  • The VTT toggle GPIO is hard-coded to GPIO0_7 in sleep33xx.S (line 262), although the code uses evm_id to decide whether to enable this feature.
Processor SDK 1.00 and later
  • See arch/arm/boot/dts/am335x-evmsk.dts for reference. Look for the "wkup_m3" entry.

Controlling the Power Management IC during DeepSleep0

Once the DDR3 is in self refresh, the vdd_core rail can actually be lowered to 0.95V to save even more power. This can be achieved through the "scale-data.bin" files that are part of the firmware.

More in-depth documentation can be found in the Linux Core Power Management User's Guide, section "Deep Sleep Voltage Scaling".

Patches for Sitara SDK 6.00

BeagleBone Black Suspend/Resume Fix

Patch to fix suspend/resume for BeagleBone Black

diff --git a/arch/arm/mach-omap2/sleep33xx.S b/arch/arm/mach-omap2/sleep33xx.S
index 1ee0b1a..8e4aa19 100644
--- a/arch/arm/mach-omap2/sleep33xx.S
+++ b/arch/arm/mach-omap2/sleep33xx.S
@@ -789,6 +789,10 @@ sdram_ref_ctrl:
        str     r4, [r3, #EMIF4_0_SDRAM_REF_CTRL_SHADOW]
 pmcr:
        ldr     r4, emif_pmcr_val
+       bic     r4, r4, #0x0700
+       orr     r4, r4, #0x0200
+       str     r4, [r3, #EMIF4_0_SDRAM_MGMT_CTRL]
+       bic     r4, r4, #0x0700
        str     r4, [r3, #EMIF4_0_SDRAM_MGMT_CTRL]
 pmcr_shdw:
        ldr     r4, emif_pmcr_shdw_val
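For readers less familiar with ARM assembly, the register writes added by this patch can be modeled in C as below. This is only a sketch of the bit manipulation, not code to run on the target: the function name bbb_pmcr_writes is hypothetical, and reading bits [10:8] as the EMIF low-power mode field (0x2 = self-refresh) is an assumption based on the AM335x TRM description of SDRAM_MGMT_CTRL that you should verify against your TRM revision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits [10:8] of the value written to EMIF4_0_SDRAM_MGMT_CTRL.
 * Assumption: this is the EMIF low-power-mode field, where 0x2 selects self-refresh. */
#define PMCR_LP_MODE_MASK 0x0700u
#define PMCR_LP_MODE_SR   0x0200u

/* Fills writes[0] and writes[1] with the two values that the patched code
 * stores to SDRAM_MGMT_CTRL, in order, starting from emif_pmcr_val. */
void bbb_pmcr_writes(uint32_t emif_pmcr_val, uint32_t writes[2])
{
    uint32_t r4 = emif_pmcr_val;          /* ldr r4, emif_pmcr_val */

    assert(writes != NULL);
    r4 &= ~PMCR_LP_MODE_MASK;             /* bic r4, r4, #0x0700 */
    r4 |= PMCR_LP_MODE_SR;                /* orr r4, r4, #0x0200 */
    writes[0] = r4;                       /* str added by the patch */

    r4 &= ~PMCR_LP_MODE_MASK;             /* bic r4, r4, #0x0700 */
    writes[1] = r4;                       /* the original str */
}
```

For an example emif_pmcr_val of 0xA1, the patched code writes 0x2A1 followed by 0xA1 to SDRAM_MGMT_CTRL.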

Patch for Processor SDK 1.00

The DDR IOCTRL registers are corrupted during the suspend/resume sequence. The following patch adds code to save and restore those registers across suspend/resume. This improves on the older approach, in which the M3 wrote hard-coded values. With this patch, customers with customized IOCTRL values (e.g. based on IBIS analysis) will have their values preserved.

0001-CM3-ddr-Split-335x-and-437x-ddr-io-ctrl-handling.patch

Patch for Processor SDK 2.00

Processor SDK 2.00.00 through 2.00.02 all use the same patch as Processor SDK 1.00:

0001-CM3-ddr-Split-335x-and-437x-ddr-io-ctrl-handling.patch

Patch for Processor SDK 3.00

The resolution to this issue ("Split 335x and 437x ddr io ctrl handling") is part of Proc SDK 3.01. Here is the official patch, which should be applied to the firmware from Proc SDK 3.00:

http://git.ti.com/cgit/cgit.cgi/processor-firmware/ti-amx3-cm3-pm-firmware.git/patch/?id=97c2c32d0bc8ca0254710dcb5df055aa9a569ae6

Alternatively, just grab the binary from Proc SDK 3.01.

Validating Suspend/Resume

There are two main things to validate:

  1. That all of the domains powered down as expected.
  2. That the IO values correlate with your spreadsheet.

Verifying all domains transitioned

When you begin the suspend sequence you see something like this:

root@am335x-evm:~# echo mem > /sys/power/state
[ 3394.834101] PM: Syncing filesystems ... done.
[ 3395.466312] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 3395.476099] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 3395.485132] Suspending console(s) (use no_console_suspend to debug)

It's actually during resume that you can see whether or not all the power domains transitioned as expected. After hitting a key on the console to resume you will see something like this in the case of a successful suspend/resume sequence:

[ 3395.628781] PM: suspend of devices complete after 136.014 msecs
[ 3395.631267] PM: late suspend of devices complete after 2.446 msecs
[ 3395.634411] PM: noirq suspend of devices complete after 3.097 msecs
[ 3395.634427] PM: Successfully put all powerdomains to target state
[ 3395.634427] PM: Wakeup source UART
[ 3395.668907] PM: noirq resume of devices complete after 34.296 msecs
[ 3395.671091] PM: early resume of devices complete after 1.911 msecs
[ 3395.672008] net eth0: initializing cpsw version 1.12 (0)
[ 3395.753887] net eth0: phy found : id is : 0x4dd074
[ 3395.754026] libphy: PHY 4a101000.mdio:01 not found
[ 3395.754044] net eth0: phy 4a101000.mdio:01 not found on slave 1
[ 3396.103275] PM: resume of devices complete after 432.127 msecs
[ 3396.172099] Restarting tasks ... done.

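If you see the message "PM: Successfully put all powerdomains to target state", things appear to be in good shape. In the case of a failed suspend/resume sequence, you will instead see "PM: Could not transition all powerdomains to target state", as in this log: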

root@am335x-evm:~# echo mem > /sys/power/state
[  615.163344] PM: Syncing filesystems ... done.
[  615.176086] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  615.185179] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  615.194341] Suspending console(s) (use no_console_suspend to debug)
[  615.319638] PM: suspend of devices complete after 117.927 msecs
[  615.321581] PM: late suspend of devices complete after 1.905 msecs
[  615.324047] PM: noirq suspend of devices complete after 2.423 msecs
[  615.324064] PM: Could not transition all powerdomains to target state
[  615.324064] PM: Wakeup source UART
[  615.342318] PM: noirq resume of devices complete after 18.079 msecs
[  615.343951] PM: early resume of devices complete after 1.390 msecs
[  615.344757] net eth0: initializing cpsw version 1.12 (0)
[  615.347859] net eth0: phy found : id is : 0x7c0f1
[  615.347975] libphy: PHY 4a101000.mdio:01 not found
[  615.347992] net eth0: phy 4a101000.mdio:01 not found on slave 1
[  615.692217] PM: resume of devices complete after 348.214 msecs
[  615.763381] Restarting tasks ... done.

If not all power domains transitioned, you will need to use JTAG to determine which power domains and clocks are still active. Please refer to the section "Checking the state of the power domains and clocks" further down the page; a script has been developed to make this job much easier.
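If you capture the console output to a file, a check like the following can flag a failed transition automatically. It is a minimal sketch: check_suspend_log and enum pd_result are hypothetical names, and the two message strings are copied from the SDK logs shown above, so other kernel versions may word them differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum pd_result { PD_UNKNOWN, PD_OK, PD_FAILED };

/* Scans captured console output for the power-domain status line printed
 * during resume. A failure anywhere in the log takes precedence, so a log
 * covering several suspend cycles reports PD_FAILED if any cycle failed. */
enum pd_result check_suspend_log(const char *log)
{
    assert(log != NULL);
    if (strstr(log, "PM: Could not transition all powerdomains to target state") != NULL)
        return PD_FAILED;
    if (strstr(log, "PM: Successfully put all powerdomains to target state") != NULL)
        return PD_OK;
    return PD_UNKNOWN; /* no status line found, e.g. the log was cut short */
}
```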

Verifying the I/O State

The other major source of power consumption is the I/Os. If, for example, an external device is driving an I/O pin high while the internal pull-down is enabled, you will have constant leakage on that pin. Hopefully you got all of this right when filling out the spreadsheet from Optimizing AM335x IO Power in DeepSleep0, and everything has been entered correctly into the device tree so that the pins are in the proper state.

More often than not, given so many pins, a few mistakes are lurking. The best way to find them is to measure each pin with a multimeter and record the corresponding voltage in Column U of the spreadsheet. Do all the readings match the expected state? If a reading doesn't match, or shows an in-between level, revisit that particular pin. The mistake could be on the board, in your spreadsheet, or in the software, and these mistakes all have a cumulative impact.

Note: Floating pins consume significant current. For any pin where you disable the internal pull during DeepSleep0, you need to be absolutely sure that the pin is driven to a known state by an external device. When probing a floating pin with a multimeter, the meter's ~1 MΩ impedance to ground is often enough to pull the pin low and hide the fact that it is floating.
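When comparing your multimeter readings against Column U, a simple classification like this can help flag in-between levels. This is a sketch only: classify_pin and pin_matches are hypothetical helpers, and the 20%/80% thresholds relative to the pin's I/O supply voltage are illustrative assumptions rather than datasheet limits. Per the note above, a LOW reading does not rule out a floating pin.

```c
#include <assert.h>

enum pin_level { PIN_LOW, PIN_HIGH, PIN_IN_BETWEEN };

/* Classifies a multimeter reading relative to the pin's I/O supply (vdds_v).
 * The 20% / 80% thresholds are illustrative assumptions, not datasheet values. */
enum pin_level classify_pin(double measured_v, double vdds_v)
{
    assert(vdds_v > 0.0);
    if (measured_v <= 0.2 * vdds_v)
        return PIN_LOW;
    if (measured_v >= 0.8 * vdds_v)
        return PIN_HIGH;
    return PIN_IN_BETWEEN; /* revisit this pin: board, spreadsheet, or software issue */
}

/* Returns 1 when the reading matches the state expected by the spreadsheet. */
int pin_matches(double measured_v, double vdds_v, enum pin_level expected)
{
    return classify_pin(measured_v, vdds_v) == expected;
}
```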

Finding Underlying Issues

Investigating a mystery hang with console_suspend disabled

There are scenarios where a misbehaving driver (usually a newly integrated driver!) can introduce issues. If the crash happens while the UART is disabled, you won't see any of the output. One way to check for this type of issue is to set console_suspend to "no":

echo no > /sys/module/printk/parameters/console_suspend

Note: With this configuration, the console cannot be used as a wakeup source!

Instead, on PG2.1 silicon, for example, you can use the RTC as a wakeup source (it is broken on PG1.0). The following command uses the RTC to wake back up after 10 seconds:

rtcwake -d /dev/rtc0 -m mem -s 10

Alternatively you could use a GPIO bank 0 wakeup, etc. In this way, if a driver crashes during the suspend/resume sequence, you should be able to see the kernel panic, which will help you track down the issue.

Inspecting the SoC state immediately prior to entering DeepSleep0

One of the main difficulties of debugging issues pertaining to DeepSleep0 is that you have limited visibility once everything (or nearly everything) is turned off! When entering DeepSleep0, the Cortex A8 turns off many clocks, but the final steps of turning off the various power domains are handled by the wakeup controller (i.e. the Cortex M3). After it has turned off all the power domains, the last thing it does is turn off the clock to the wakeup domain itself. Once that clock is off, you no longer have JTAG connectivity.

The main goal of this section is to enable you to connect to the M3 right after it has attempted to turn off all the power domains, but before the wakeup clock is turned off, so that you can still use JTAG. To do this, we need to modify the power management firmware to put a spin loop at the point where we want to connect with JTAG.

Downloading the Power Management Firmware

If you look in the software manifest for a Linux SDK, there is a line that tells the corresponding firmware version. For example, in Processor SDK 2.00, it mentions amx3-cm3 version 1.9.1 from git://git.ti.com/ti-cm3-pm-firmware/amx3-cm3.git;protocol=git;branch=ti-v4.1.y. So you can do:

mkdir ~/git-clones
cd ~/git-clones
git clone git://git.ti.com/ti-cm3-pm-firmware/amx3-cm3.git
cd amx3-cm3/
git checkout -b PSDK_2.00 origin/ti-v4.1.y

So now you have a local version of the Cortex M3 firmware.

Building/Editing the Code

Make the following edit:

diff --git a/src/pm_services/pm_handlers.c b/src/pm_services/pm_handlers.c
index d4bd911..f9a04f2 100644
--- a/src/pm_services/pm_handlers.c
+++ b/src/pm_services/pm_handlers.c
@@ -83,6 +83,7 @@ void a8_lp_ds0_handler(struct cmd_data *data)
        unsigned int per_st;
        unsigned int mpu_st;
        int temp;
+       volatile int stop=1;

        if (cmd_handlers[cmd_global_data.cmd_id].do_ddr)
                ds_save();
@@ -138,6 +139,8 @@ void a8_lp_ds0_handler(struct cmd_data *data)
                }
        }

+       while(stop) {}; // connect with JTAG and set stop=0 to continue
+
        /* TODO: wait for power domain state change interrupt from PRCM */
        clkdm_sleep(CLKDM_WKUP);
 }

Next, go to the root directory of Processor SDK 2.00:

cd ~/ti-processor-sdk-linux-am335x-evm-02.00.00.00

Edit the Makefile in that base directory. Right after the linux_clean rule, insert these rules:

cm3:
        @echo =================================
        @echo     Building the M3 Firmware
        @echo =================================
        $(MAKE) -C ~/git-clones/amx3-cm3 ARCH=arm CROSS_COMPILE=$(CROSS_COMPILE) all

cm3_clean:
        @echo =================================
        @echo     Cleaning the M3 Firmware
        @echo =================================
        $(MAKE) -C ~/git-clones/amx3-cm3 ARCH=arm CROSS_COMPILE=$(CROSS_COMPILE) clean

Note: Make sure the indented lines begin with TABS, not SPACES, or these rules won't work!

Now from your SDK base directory you can clean and build with these commands:

make cm3_clean
make cm3

That will generate the executable amx3-cm3/bin/am335x-pm-firmware.elf, which should be copied into the /lib/firmware directory of the AM335x root file system.

M3 Firmware Binaries (with JTAG spin loop)

For quick use, a pre-built firmware with the spin loop integrated is available. Copy it into the /lib/firmware directory of the AM335x file system.

Connecting to the M3 during suspend

Once your modified M3 firmware is in the file system you can initiate a suspend to DeepSleep0 (DS0):

echo mem > /sys/power/state

In CCS perform the following steps:

  1. If you have not already created a Target Configuration File, go to File -> New -> Target Configuration File. Select your JTAG probe and your device.
  2. In the Target Configurations window (View -> Target Configurations), right-click on your Target Configuration and select Launch Selected Configuration.
    [Image: Am335x-suspend-target-config.png]
  3. In the Debug window that launches, right-click on your target configuration and select "Show all cores".
    [Image: Am335x-suspend-show-all-cores.png]
  4. Right-click on M3_wakeupSS_0 and select "Connect Target".
    [Image: Am335x-suspend-M3-connect.png]


Note: If you're having trouble connecting to a core, it might be because the debug clock has been turned off. In that case, reboot your board and, from the console, poke the CM_WKUP_DEBUGSS_CLKCTRL register by executing "devmem2 0x44e00414 w 0x12500002" before suspending.
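If devmem2 is not available in your root file system, the same single 32-bit write can be done with a small program that maps /dev/mem. This is a hedged sketch of the general technique (mmap of a page-aligned physical address), not TI's tool; devmem_write32 and split_phys_addr are hypothetical names, and the program needs root privileges and a kernel that permits /dev/mem access to this region.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Splits a physical address into a page-aligned base (required by mmap)
 * and the offset of the register within that page. */
void split_phys_addr(uint32_t addr, uint32_t page_size,
                     uint32_t *base, uint32_t *offset)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    *base = addr & ~(page_size - 1);
    *offset = addr - *base;
}

/* Writes one 32-bit value to a physical address, similar in spirit to
 * "devmem2 <addr> w <value>". Returns 0 on success, -1 on failure. */
int devmem_write32(uint32_t addr, uint32_t value)
{
    long page = sysconf(_SC_PAGESIZE);
    uint32_t base, offset;
    void *map;
    int fd;

    if (page <= 0)
        return -1;
    split_phys_addr(addr, (uint32_t)page, &base, &offset);

    fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open /dev/mem");
        return -1;
    }
    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, (off_t)base);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (map == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    /* volatile so the store is actually issued to the device register */
    *(volatile uint32_t *)((uint8_t *)map + offset) = value;
    munmap(map, (size_t)page);
    return 0;
}
```

Calling devmem_write32(0x44e00414u, 0x12500002u) is intended to perform the same write as the devmem2 command above.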

At this point you will be connected to the M3, and ideally almost the entire chip should be powered down. Here's a diagram:

[Image: Am335x-suspend-interconnect-view.png]

We are connected to the highlighted Cortex M3, and the goal at this point is to interrogate the Control Module and the PRCM in order to verify that everything is truly powered down, and to find sources of power consumption.


Validating the State of the DDR Bus

One of the more complex areas managed by software is the DDR interface. There are many registers related to this configuration and so a script was written to help scrape these values from memory and decode them to provide insight.

Once you have managed to halt the M3 and connect with CCS during the suspend path, you can follow these steps:

  1. Download am335x-ddr-analysis.dss.
  2. Launch CCS.
  3. Create an appropriate target configuration file for connecting to your board.
    • File -> New -> Target Configuration File
    • Supply a name/location for the file.
    • View -> Target Configurations to see the available target configurations (yours should now be among them!).
    • Double-click your file in the Target Configurations panel to open it for editing.
    • Select your emulator and processor. Save.
  4. Launch the debugger, but do not connect to any CPUs.
    • In the Target Configurations window, right-click on your ccxml file and select "Launch Selected Configuration".
  5. Launch the scripting console by going to View -> Scripting Console.
  6. Load am335x-ddr-analysis.dss in the scripting console by executing "loadJSFile <path-to-dss-file>/am335x-ddr-analysis.dss".
  7. The script uses the Debug Access Port (DAP) unobtrusively behind the scenes, so the Cortex A8 is never halted. It generates an am335x-ddr-analysis_yyyy-mm-dd_hhmmss.txt file on your desktop.

Things to check in the output file:

  • DDR_PHY_CTRL_1[20] reg_phy_enable_dynamic_pwrdn should be enabled all the time (active and sleep).
  • The ddr_cmd0/1/2_ioctrl and ddr_data0/1_ioctrl registers should have the same value before suspend and after resume.
  • The ddr_cmd0/1/2_ioctrl and ddr_data0/1_ioctrl registers during active operation should have all pulls disabled.
  • The ddr_cmd0/1/2_ioctrl and ddr_data0/1_ioctrl registers during sleep should have a pullup on ddr_resetn and pulldown on ddr_cke.
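If you copy register values out of the analysis file, the first two checks can be scripted along these lines. This is a sketch: the helper names are hypothetical, bit 20 comes from the DDR_PHY_CTRL_1 item above, and the order of the IOCTRL array (cmd0, cmd1, cmd2, data0, data1) is just a convention chosen for this example. The sleep-state pull settings are not checked here because their bit positions are not given on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DDR_PHY_CTRL_1 bit 20: reg_phy_enable_dynamic_pwrdn. */
#define DDR_PHY_CTRL_1_DYN_PWRDN (1u << 20)

/* Returns 1 if dynamic power-down is enabled in a DDR_PHY_CTRL_1 value. */
int dynamic_pwrdn_enabled(uint32_t ddr_phy_ctrl_1)
{
    return (ddr_phy_ctrl_1 & DDR_PHY_CTRL_1_DYN_PWRDN) != 0;
}

/* Compares IOCTRL values captured before suspend and after resume,
 * e.g. ordered ddr_cmd0, ddr_cmd1, ddr_cmd2, ddr_data0, ddr_data1.
 * Returns the index of the first mismatch, or -1 if all values match. */
int first_ioctrl_mismatch(const uint32_t *before, const uint32_t *after, size_t n)
{
    size_t i;

    assert(before != NULL && after != NULL);
    for (i = 0; i < n; i++) {
        if (before[i] != after[i])
            return (int)i;
    }
    return -1;
}
```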

Checking the state of the power domains and clocks

  1. Download am335x-ds0-analysis.dss and am335x-ctt.dss.
  2. Launch CCS.
  3. Create an appropriate target configuration file for connecting to your board.
    • File -> New -> Target Configuration File
    • Supply a name/location for the file.
    • View -> Target Configurations to see the available target configurations (yours should now be among them!).
    • Double-click your file in the Target Configurations panel to open it for editing.
    • Select your emulator and processor. Save.
  4. Launch the debugger, but do not connect to any CPUs.
    • In the Target Configurations window, right-click on your ccxml file and select "Launch Selected Configuration".
  5. Launch the scripting console by going to View -> Scripting Console.
  6. Load am335x-ds0-analysis.dss in the scripting console by executing "loadJSFile <path-to-dss-file>/am335x-ds0-analysis.dss".
  7. Two files will be created on your desktop:
    • am335x-ds0-analysis_yyyy-mm-dd_hhmmss.txt
    • am335x-ds0-padconf_yyyy-mm-dd_hhmmss.csv
  8. Analysis of am335x-ds0-analysis_yyyy-mm-dd_hhmmss.txt:
    • Cortex A8, Graphics, and PER are supposed to all be off. Generally if there's a problem, it's something in "PER".
    • In some cases, this file might tell you precisely what failed (e.g. "TIMER7 active"). In other cases (e.g. "L3 active") you will need to continue with the following steps to identify the precise peripheral.
  9. The file am335x-ds0-padconf_yyyy-mm-dd_hhmmss.csv is intended to facilitate entering data into the spreadsheet from Optimizing AM335x IO Power in DeepSleep0.
  10. Load am335x-ctt.dss in the scripting console by executing "loadJSFile <path-to-dss-file>/am335x-ctt.dss".
  11. A file named am335x-ctt_yyyy-mm-dd_hhmmss.rd1 will be created on your desktop. This is an ASCII file containing a register dump of all the important PRCM registers. It is useful for identifying which peripherals are enabled that shouldn't be.
  12. Take the output file (am335x-ctt_yyyy-mm-dd_hhmmss.rd1) and perform a diff against a file taken from a "good" suspend sequence. (You might need to make one on the TI EVM if you cannot ever get your hardware to suspend.) You can then manually decode the corresponding registers that are different to identify which peripheral is still on.
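The comparison in step 12 can be automated once both dumps are loaded into memory. The sketch below assumes a hypothetical in-memory form (struct reg_entry with an address and a value) and that both dumps list the same registers in the same order; parsing the .rd1 file itself is not shown. Decoding which peripheral each differing register belongs to is still a manual step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One register from a dump (hypothetical in-memory representation). */
struct reg_entry {
    uint32_t addr;
    uint32_t value;
};

/* Prints every register whose value differs between a known-good dump and
 * the dump under test (pass out == NULL to only count). Both arrays must
 * list the same addresses in the same order. Returns the number of diffs. */
size_t diff_reg_dumps(const struct reg_entry *good, const struct reg_entry *test,
                      size_t n, FILE *out)
{
    size_t i, diffs = 0;

    for (i = 0; i < n; i++) {
        assert(good[i].addr == test[i].addr);
        if (good[i].value != test[i].value) {
            diffs++;
            if (out != NULL)
                fprintf(out, "0x%08lx: good=0x%08lx test=0x%08lx\n",
                        (unsigned long)good[i].addr,
                        (unsigned long)good[i].value,
                        (unsigned long)test[i].value);
        }
    }
    return diffs;
}
```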

Finish Suspend/Resume with JTAG

Once you've finished collecting all the data you want, allow the M3 core to finish the suspend sequence.

CCS 6.1.0 with Processor SDK 2.00 was having an issue with the ELF file (NOTE: fixed in CCS 6.1.1) and so the variables window was not showing the proper value. It was necessary to use a memory window set to address "SP" to modify the "stop" variable. Here's a picture:

[Image: Am335x-suspend-resume.png]

To allow the AM335x to suspend all the way, simply change the memory location shown above to zero and then disconnect from the M3. Disconnecting starts the M3 running again, and it immediately completes the transition to DeepSleep0. We choose "disconnect" instead of "Run" to prevent a "loss of connection" message in CCS, which should lead to a more stable debug experience. At this point you should be able to resume as usual, e.g. by pressing a key on the console.

Stepping through C Code - Load Symbols

It is possible to step through the C code executing on any of the processors of the AM335x by loading symbols. To step through the M3 firmware after halting execution with the while loop described above, make sure the M3 core is selected in the Debug panel, then from the File menu select Tools->Load Symbols->Load Symbols.

[Image: LoadSymbols.png]

In the Load Symbols dialog box, select am335x-pm-firmware.elf.

[Image: LoadSymbols-pm-firmware.png]

On the first attempt, the code editor panel will need to be pointed at the M3 firmware source tree; select the folder containing the source file.

Investigating a failed resume

Initial investigation should check the hardware:

  1. Check vdd_mpu and vdd_core at 3 points: before suspend, during suspend, after failed resume.
    • Normally these rails will drop to 0.95V during suspend, and they should return back to their original voltages upon resume.
  2. Has the main clock turned back on after the failed resume?
  3. Check DDR_RESETn and DDR_CKE on a scope to make sure there's not any issues with the levels of these signals. There can be issues here if there are improperly configured internal pullups/pulldowns or improper external pullups/pulldowns.
    • DDR_RESETn should remain high (i.e. VDDS_DDR voltage) the entire time.
    • DDR_CKE should be low during suspend. It should be high after resume.
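The voltage check in item 1 reduces to two comparisons per rail. In this sketch the helper names are hypothetical, 0.95V is the suspend voltage mentioned above, and the tolerance is an illustrative assumption; take real limits from your PMIC and board specifications.

```c
#include <assert.h>
#include <stddef.h>

/* Voltages measured on one rail (e.g. vdd_core) at the three check points. */
struct rail_readings {
    double before_suspend;
    double during_suspend;
    double after_resume;
};

static int within(double v, double target, double tol)
{
    return v >= target - tol && v <= target + tol;
}

/* Returns 1 if the rail dropped to about suspend_v while suspended and came
 * back to its pre-suspend value after resume. tol is an assumed tolerance. */
int rail_ok(const struct rail_readings *r, double suspend_v, double tol)
{
    assert(r != NULL && tol >= 0.0);
    return within(r->during_suspend, suspend_v, tol) &&
           within(r->after_resume, r->before_suspend, tol);
}
```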

Next you can move on to JTAG debug:

  1. Can you connect to the Cortex A8? If so, where is it executing? Be careful if the MMU is turned on; you can see this in the status bar at the bottom of CCS.
  2. Connect to the DAP and open a memory window to the base of DDR (0x80000000):
    • Launch the debugger.
    • In the "Debug" window, right-click on the ccxml file and select "Show all cores" (the same way as shown earlier for the M3).
    • Expand the "Non Debuggable Devices" entry that comes up.
    • Right-click on the one that says CS_DAP_DebugSS and choose "Connect".
    • Go to View -> Memory and enter address 0x80000000.
  3. Disconnect from all cores (e.g. DAP and Cortex A8). Follow the directions from the earlier section "Checking the state of the power domains and clocks" to get a better view into what's on or off.

Video Tutorials

Successful suspend/resume

Failed suspend/resume

Finding the root cause of a failed suspend

Other Resources