DSP BIOS Debugging Tips
This topic lists common mistakes and problems that users run into when using DSP/BIOS, along with ways to diagnose and resolve them.
- 1 Problem: Software instability, i.e. code jumps to random location, crashes, behaves erratically
- 1.1 Cause: Stack overflow
- 1.2 Cause: Improper use of the interrupt keyword
- 1.3 Cause: Invoking BIOS APIs from the wrong context
- 1.4 Cause: Uninitialized global variables
- 1.5 Cause: MBX message size does not correspond to size of message buffer
- 1.6 Cause: Non-reentrant RTS functions only callable from TSK
- 1.7 Cause: Improper register/stack handling in hand-written assembly
- 2 Problem: The Real Time Analysis Tools Aren't Working!
- 3 Problem: Program hits a SYS call, but I can't see any output/error message.
- 4 Problem: Performance is really bad
- 5 Problem: My code isn't behaving the way I think it should.
Problem: Software instability, i.e. code jumps to random location, crashes, behaves erratically
BIOS itself is extremely stable software. However, there are several common mistakes that can cause instability. Luckily, most of these mistakes can be avoided or else detected/corrected.
Cause: Stack overflow
Description: From a software perspective, stack problems are the most common cause of stability issues. These issues can result from a stack overflow, a stack corruption or a stack-pointer (SP) corruption. They can affect the system stack used by HWIs and SWIs as well as the individual task stacks.
How to diagnose (Method 1):
- After code has gotten into a bad state, halt the processor and open the DSP/BIOS Kernel Object Viewer (KOV). In CCS 3.3 go to DSP/BIOS -> Kernel Object View. In CCS 4.x go to Tools -> ROV. Note that this does not require any kind of special instrumentation to be built into BIOS such as RTA or RTDX. Furthermore absolutely no code changes are required to use this feature.
- The Kernel Object Viewer will inspect all of the BIOS stacks (i.e. task stacks and system stack) to ensure that the watermarks are still present. If one of the stacks has overflowed, this is indicated by an exclamation point on the corresponding task/KNL in KOV.
- Here are some screenshots demonstrating a normal scenario as well as an overflow:
- In certain cases you may need to try and "catch" the issue sooner. For example, if the runaway code corrupts the emulation logic, you may experience a CCS disconnect after the issue happens. In this scenario you might use hardware such as Advanced Event Triggering (AET) to set a watchpoint on the end of the stack in order to catch the problem while the CPU is still in a good state. This method is described in the article Checking for Stack Overflow using Unified Breakpoint Manager. You might also try "method 2" or "method 3" as follows.
How to diagnose (Method 2):
- Update your BIOS configuration to check the stacks on every context switch.
- BIOS fills the stack with a known value (0xBE on the 6x). The TSK_checkstacks() function checks the top of the stack for both the "from" and the "to" task to make sure the top element still matches this cookie. Note that it is possible for a stack to overflow without overwriting the top element: for example, a local variable or struct that the function never modifies could overlay the top of the stack and leave the cookie intact. Even so, this check will catch most stack overflows. The overhead is very small since it only performs a couple of quick compares.
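The fill-and-check idea behind this method can be sketched in portable C. This is only an illustration, not the BIOS implementation: the demo_stack_t type and the demo_* functions are hypothetical names, the stack is assumed to grow down toward the lowest address, and the 0xBE fill value is the one quoted above for the 6x.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_STACK_FILL 0xBE  /* fill value quoted in the text for the 6x */

/* Hypothetical stand-in for a task stack. The stack is assumed to grow
 * down, so base[0] is the last element reached before an overflow. */
typedef struct {
    unsigned char *base;
    size_t size;
} demo_stack_t;

/* Fill the whole stack with the known value before the task runs. */
void demo_stack_init(demo_stack_t *s, unsigned char *mem, size_t size)
{
    assert(mem != NULL && size > 0);
    s->base = mem;
    s->size = size;
    memset(mem, DEMO_STACK_FILL, size);
}

/* Cheap check (one compare): is the end-of-stack cookie still intact? */
int demo_stack_cookie_intact(const demo_stack_t *s)
{
    return s->base[0] == DEMO_STACK_FILL;
}

/* Slower check: count untouched bytes from the overflow end to estimate
 * the peak number of bytes the task has used so far. */
size_t demo_stack_peak_usage(const demo_stack_t *s)
{
    size_t untouched = 0;
    while (untouched < s->size && s->base[untouched] == DEMO_STACK_FILL) {
        untouched++;
    }
    return s->size - untouched;
}
```

The single-compare cookie check is what makes a per-context-switch test cheap. The peak-usage scan is closer to what a watermark viewer reports and is better suited to occasional use, for example from an idle function.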
How to diagnose (Method 3):
- Extract Tsk_switch_dbg.tar.gz at your location of choice.
- Add the attached C to your project, i.e. Project -> Add Files to Project.
- The C file is calling LOG_printf and storing to a LOG called "trace" so you need to have defined a LOG object called trace for everything to build/run.
- To add the TCI just add the following to your TCF config file
- This will show the currTask and nextTask stack pointers at each TSK switch.
- tsk_switch_dbg.c: TSK switch hook function useful for Stack Ptr debug
- tsk_switch_dbg.tci: TSK switch hook function config file for Stack Ptr debug
How to resolve:
- Go back to your BIOS configuration file and increase the size of the overflowing stack.
- System stack usage can be reduced by preventing SWI and HWI threads from nesting:
- Making SWI threads the same priority prevents them from preempting one another.
- The default mask for HWIs in BIOS is "self" meaning "can't preempt/interrupt myself but can be interrupted by other ISRs". If you want to prevent nesting of interrupts you should set the mask to "all" in the interrupt configuration as follows:
- This FAQ gives more detail on the BIOS interrupt masks: Kbase FAQ 31530
Cause: Improper use of the interrupt keyword
Description: CLK functions and ISRs that use the interrupt dispatcher should not use the interrupt keyword.
How to diagnose:
- Extract List_interrupt.tar.gz at a location of your choice.
- Run the Perl script list_interrupt.pl on an executable (.out) to find out whether any functions were declared with the interrupt keyword. It works only for codegen tools >= v5.1.6 for C6000, v3.2.3 for C55, v4.1.1 for C54, v4.1.2 for C2000. You will need to install the cg_xml package (available at https://www-a.ti.com/downloads/sds_support/applications_packages/cg_xml/index.htm, my.ti.com free account registration required) for this to work. Requires ActiveState Perl 5.8.3 or above. Type perldoc list_interrupt.pl at the command prompt for more help on how to run it.
- One should use EITHER the HWI_dispatcher OR the interrupt keyword, but never both.
Cause: Invoking BIOS APIs from the wrong context
Description: The BIOS API Guide has an appendix called "Function Callability Table". The table lists all BIOS APIs and specifies from what thread context it is legal to call them. Calling an API from the wrong context can cause stability issues.
- If you choose NOT to use the dispatcher then you are not allowed to call certain BIOS APIs from the ISR (i.e. cannot call SWI_post, SEM_pend, etc.).
- SEM_pend should not be called from a HWI or SWI except with a timeout of 0.
- LCK_* functions should not be called from a HWI or SWI.
Cause: Uninitialized global variables
Description: By default global variables are not automatically initialized. This can be a problem because the user may assume that these global variables start out as zero.
Avoiding the issue:
- In your code make sure you always assign an initial value to variables before using them.
- Have startup code zero out the sections where global variables are allocated.
- Extract Init_utility.tar.gz at your location of choice.
- Load zeroInitSectionsCmdFile.tci and call the function zeroInitSectionsCmdFile in your .tcf file.
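As a rough illustration of what "zero out the sections" means at startup, the sketch below clears a byte range in plain C. It is not the contents of the Init_utility package: demo_zero_region is a hypothetical helper, and on a real target the start and end addresses would come from linker-defined symbols for the section, which are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero every byte in [start, end). Hypothetical helper: on target the two
 * addresses would be the bounds of the section holding the globals. */
void demo_zero_region(void *start, void *end)
{
    unsigned char *s = (unsigned char *)start;
    unsigned char *e = (unsigned char *)end;

    assert(s != NULL && e != NULL && s <= e);
    memset(s, 0, (size_t)(e - s));
}
```

A routine like this has to run before any code reads the affected globals, and it must not be applied to a section that holds explicitly initialized data, or those initial values would be lost.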
Cause: MBX message size does not correspond to size of message buffer
Related Symptoms/Problems: data missing in message buffer received
Description: All MBX objects have a message size property that is either set statically via BIOS configuration or passed dynamically through MBX_create. It is important for this size to match the size of the buffer that is passed in through MBX_pend and MBX_post, since these APIs do a memcpy internally to copy the message to and from the mailbox's own internal buffers. If there is a size mismatch, problems could occur in the following two scenarios:
- mbx message size > buffer size: MBX_pend would copy more data than there is space available in the buffer, corrupting memory immediately following the buffer. Since message buffers are often allocated on the stack, this could cause stack corruption.
- mbx message size < buffer size: MBX_post would copy only part of the data in the buffer into the mailbox's internal buffer, causing truncation of the message.
How to diagnose:
- Use MBX_validate()
- Extract Mbx_validate.tar.gz at a location of your choice
- Call MBX_validate before each call to MBX_pend and MBX_post. If an assertion is hit inside the MBX_validate function, then we know there is a mismatch. Follow the instructions in MBX_validate.h.
- It is recommended to set the mbx message size = buffer size at all times.
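The size-matching rule can be illustrated with a small host-side model of a mailbox. This is a sketch under stated assumptions, not the DSP/BIOS MBX module and not the code in Mbx_validate.tar.gz: demo_mbx_t, the demo_mbx_* functions and the DEMO_MBX_* macros are all hypothetical names. The model copies exactly the configured message size, as the description above says the real APIs do, and the wrapper macros compare that size against sizeof the caller's buffer before copying.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MBX_MAX_MSG 32

/* Hypothetical single-slot mailbox used only to model the copy behavior. */
typedef struct {
    size_t msg_size;                      /* configured message size */
    unsigned char slot[DEMO_MBX_MAX_MSG]; /* internal message storage */
    int full;
} demo_mbx_t;

void demo_mbx_init(demo_mbx_t *m, size_t msg_size)
{
    assert(msg_size > 0 && msg_size <= DEMO_MBX_MAX_MSG);
    m->msg_size = msg_size;
    m->full = 0;
    memset(m->slot, 0, sizeof m->slot);
}

/* Returns 1 when the caller's buffer size equals the configured size. */
int demo_mbx_size_ok(const demo_mbx_t *m, size_t buf_size)
{
    return buf_size == m->msg_size;
}

/* Copies msg_size bytes in, whatever the caller's buffer size really is. */
int demo_mbx_post(demo_mbx_t *m, const void *msg)
{
    if (m->full) {
        return 0;
    }
    memcpy(m->slot, msg, m->msg_size);
    m->full = 1;
    return 1;
}

/* Copies msg_size bytes out, whatever the caller's buffer size really is. */
int demo_mbx_pend(demo_mbx_t *m, void *msg)
{
    if (!m->full) {
        return 0;
    }
    memcpy(msg, m->slot, m->msg_size);
    m->full = 0;
    return 1;
}

/* Call-site wrappers: refuse to copy when the sizes do not match. */
#define DEMO_MBX_POST(m, buf) \
    (demo_mbx_size_ok((m), sizeof(buf)) && demo_mbx_post((m), &(buf)))
#define DEMO_MBX_PEND(m, buf) \
    (demo_mbx_size_ok((m), sizeof(buf)) && demo_mbx_pend((m), &(buf)))
```

Taking sizeof at the call site only works when the macro is given the buffer object itself rather than a pointer to it. With a pointer the macro would compare against the pointer size, so pointer-based callers would need to pass the size explicitly.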
Cause: Non-reentrant RTS functions only callable from TSK
Description: Some RTS functions are not reentrant. The bulk of these functions are the file I/O functions (printf, fprintf, scanf, etc.). These RTS functions call lock(), which in a BIOS application gets linked to LCK_pend and LCK_post, so they cannot be called in the context of a SWI or HWI thread. For a complete list of RTS functions callable from TSK threads only, you can search the rts sources for "lock".
How to diagnose:
- Extract Lck.tar.gz at your location of choice.
- A special version of LCK_pend and LCK_post has been provided that asserts stop-word opcodes if they are called from a HWI or SWI. Add this file to your build. See the DSP/BIOS API guide for the API calling convention.
Other note on C I/O functions: Regardless of whether or not BIOS is being used, the C I/O functions from the RTS library make use of the heap. If you have not defined a heap or have defined too small of a heap the C I/O functions will not operate correctly.
Cause: Improper register/stack handling in hand-written assembly
Description: Hand-written assembly is often developed and tested in isolation, so issues related to the handling of the stack and registers go undetected until the algorithm gets integrated.
Detection: One method of detecting these issues is simply bracketing the algorithm process call with _disable_interrupts() and _restore_interrupts().
- Bracketing your algorithms with the disable/restore interrupts functions is intended only to help you track down the issue.
- Once you determine the cause and make the appropriate correction then you can remove the bracketing.
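A minimal sketch of the bracketing technique is shown below. Several things here are assumptions: demo_run_bracketed, demo_algo_fn and demo_sum are hypothetical names, with demo_sum standing in for the hand-written assembly routine. The _disable_interrupts() and _restore_interrupts() calls are assumed to be provided by the TI compiler with the usual "return previous state / restore previous state" signatures. When the code is built with any other compiler, host-only stand-ins are used so the sketch can be compiled off-target.

```c
#include <assert.h>

#ifdef __TI_COMPILER_VERSION__
/* On target: use the compiler-provided calls named in the text. Depending
 * on the device family, a compiler header may be needed to declare them. */
#define DEMO_INT_DISABLE()    _disable_interrupts()
#define DEMO_INT_RESTORE(key) _restore_interrupts(key)
#else
/* Host-only stand-ins (hypothetical) that just track a flag. */
unsigned int demo_int_enabled = 1;

unsigned int demo_host_disable(void)
{
    unsigned int prev = demo_int_enabled;
    demo_int_enabled = 0;
    return prev;
}

void demo_host_restore(unsigned int prev)
{
    demo_int_enabled = prev;
}

#define DEMO_INT_DISABLE()    demo_host_disable()
#define DEMO_INT_RESTORE(key) demo_host_restore(key)
#endif

/* Signature of the routine under suspicion (hypothetical). */
typedef int (*demo_algo_fn)(int *data, int n);

/* Stand-in for the hand-written assembly routine being debugged. */
int demo_sum(int *data, int n)
{
    int i;
    int total = 0;
    for (i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Run the routine with interrupts held off, then restore the prior state. */
int demo_run_bracketed(demo_algo_fn algo, int *data, int n)
{
    unsigned int key;
    int rc;

    assert(algo != 0);
    key = DEMO_INT_DISABLE();
    rc = algo(data, n);
    DEMO_INT_RESTORE(key);
    return rc;
}
```

If the instability disappears while the call is bracketed, that points toward the routine's register or stack handling being disturbed when an interrupt arrives in the middle of it. Holding interrupts off for the whole call adds interrupt latency, which is one reason the bracketing is meant to be temporary.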
Problem: The Real Time Analysis Tools Aren't Working!
Cause: Improper setup of BIOS configuration file
Cause: Necessary interrupts for RTDX not enabled
For a deep dive into this topic please see the RTA troubleshooting wiki page.
Problem: Program hits a SYS call, but I can't see any output/error message.
Common Symptoms: Program gets stuck spinning in UTL_halt, can't see output message from SYS_printf
Description: When SYS_abort or SYS_error is called, the error/system message is added to the system log. Other SYS functions such as SYS_printf and SYS_vprintf write their output to the memory area labeled _SYS_PUTCBEG.
How to diagnose:
- You can view the system messages by selecting "Execution Graph Details" in the drop-down box of the message log window.
- Any messages written with SYS_printf, SYS_putchar, etc. will appear in the memory area labeled _SYS_PUTCBEG. You can view this area using the Memory Window in CCS by setting it to display in Character mode.
Problem: Performance is really bad
Description: Performance is much worse than you think it should be (orders of magnitude).
Cause 1: Improper cache configuration can cause serious performance degradation.
Solution 1: Follow the directions in the article Enabling 64x+ Cache. Also, make sure the cache coherency is being maintained (see this page). Newer C6000 processors have device drivers that already take this effect into account.
Cause 2: Suboptimal code/data placement in the internal/external memory.
Solution 2: See this FAQ for tips on which sections to place in internal memory.
Problem: My code isn't behaving the way I think it should.
- If something strange happens, such as the program exiting unexpectedly, you should look at the kernel's message log. In CCS 3.3 go to DSP/BIOS -> Message Log -> Execution Graph Details. If, for example, a hardware exception occurred (e.g. illegal opcode), it would be logged here.
- When using Code Composer Studio, keep in mind that DSP/BIOS is command-line in nature (apart from the Graphical Configuration Tool). Therefore carefully check the Build window in Code Composer Studio for different or strange paths used by the tools - this can potentially indicate errors in include paths, especially after installing a new tool release.
- Make sure the environment variable BIOS_INSTALL_DIR points to the correct directory. CCS usually issues a warning if the project explicitly uses this variable. However, no warning is generated if the variable is used inside a specific source file or if the code is being built using the command-line tools.
<TODO> Add usage of LOG_printf, execution graph, message log to see what's happening.