Dsptop

From Texas Instruments Wiki

Dsptop example 1.png

Introduction

dsptop, similar to the Linux top utility, provides visibility into usage data for TI multicore DSP-ARM SoC devices such as the 66AK2Hxx family.

The example window (Dsptop example 1.png) shows dsptop's usage display. Above the interactive command line (>) dsptop provides the operating mode (Moving Average Mode), the DSP and memory usage aggregated across all DSPs, and the device temperature (for supported devices). If a memory type is not allocated, the memory usage for that type is not displayed. The Accuracy% value reflects the percentage of total run time dsptop is not collecting ULM messages (see Operation Overview for details).

Below the interactive command line, dsptop shows the usage, run time and idle time for each DSP. Moving Average Mode provides usage, run time and idle time over the last N seconds (the sample window). The default Moving Average sample window is 3 seconds, and it can be set to any value between 0.5 and 900 seconds (--sample/-s value). Setting the sample window to 0 changes dsptop's mode to Total Mode, which provides usage, run time and idle time over an infinite sample window. The process option (--process/-p) stops the display update when the current DSP process completes. When a new DSP process starts, the display is automatically cleared (values set to 0) and display updates resume until that DSP process completes. This is very useful in a development environment where you launch individual DSP processes.
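
The difference between Moving Average Mode and Total Mode can be illustrated with a small C sketch. This is not dsptop's implementation. The sample_t type, the window_usage_percent() function, the fixed-interval samples, and the definition of usage as run / (run + idle) are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interval sample: time spent running and idle. */
typedef struct {
    uint32_t run_us;
    uint32_t idle_us;
} sample_t;

/* Usage % over the most recent `window` samples.
 * window == 0 (or a window larger than n) covers every sample,
 * which mimics Total Mode; any other value mimics a moving window. */
double window_usage_percent(const sample_t *s, size_t n, size_t window)
{
    assert(s != NULL || n == 0);

    size_t start = (window == 0 || window > n) ? 0 : n - window;
    uint64_t run = 0, idle = 0;

    for (size_t i = start; i < n; i++) {
        run  += s[i].run_us;
        idle += s[i].idle_us;
    }

    uint64_t total = run + idle;
    return total == 0 ? 0.0 : 100.0 * (double)run / (double)total;
}
```

Under these assumptions, with one sample every 0.1 seconds a 3 second window corresponds to window = 30, and window = 0 corresponds to -s0.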

The display update rate can be set to any value between 0.5 and 900 seconds (--delay/-d value). The default is 3 seconds. The screen update iteration count can be used to terminate dsptop after N display updates (--num/-n value). Its default value is 0 (run forever).

While the display is showing DSP Usage data, the usage data can also be sampled and saved to a file and/or plotted to a graph. See Saving/Graphing Usage Data for details.

Logging mode (--logging/-l <first|last>) disables the usage display (see Operation Overview for details) and collects ULM messages until dsptop is terminated. ULM messages are collected in a trace buffer of limited size, so if the buffer fills you choose which messages are kept: the first N messages collected (recording stops when the buffer is full) or the last N messages collected (the buffer wraps). ULM messages collected in Logging Mode are written to stdout by default, or optionally to a file (--out/-o filename). While logging, the display shows the percentage of the trace buffer that has filled. If quiet operation (--quiet/-q) is selected, the dsptop display and interactive command line are disabled. See Logging Notes for an example and additional details.
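
The first|last choice can be pictured as two retention policies on a fixed-size buffer. The C sketch below models only that policy. It is not the ETB hardware or dsptop code, and the names trace_buf_t, trace_put(), trace_get(), KEEP_FIRST, KEEP_LAST and the tiny CAPACITY are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY 4  /* deliberately tiny, for illustration only */

typedef enum { KEEP_FIRST, KEEP_LAST } keep_mode_t;

typedef struct {
    uint32_t    slots[CAPACITY];
    size_t      head;   /* index of the oldest stored message */
    size_t      count;  /* number of stored messages */
    keep_mode_t mode;
} trace_buf_t;

void trace_init(trace_buf_t *b, keep_mode_t mode)
{
    assert(b != NULL);
    b->head = 0;
    b->count = 0;
    b->mode = mode;
}

/* Returns 1 if the message was stored, 0 if it was dropped
 * (which only happens in KEEP_FIRST mode once the buffer is full). */
int trace_put(trace_buf_t *b, uint32_t msg)
{
    assert(b != NULL);
    if (b->count < CAPACITY) {
        b->slots[(b->head + b->count) % CAPACITY] = msg;
        b->count++;
        return 1;
    }
    if (b->mode == KEEP_FIRST)
        return 0;                       /* stop on full */
    b->slots[b->head] = msg;            /* wrap: overwrite the oldest */
    b->head = (b->head + 1) % CAPACITY;
    return 1;
}

/* i-th stored message, oldest first. */
uint32_t trace_get(const trace_buf_t *b, size_t i)
{
    assert(b != NULL && i < b->count);
    return b->slots[(b->head + i) % CAPACITY];
}

/* Fill level as a percentage, comparable to the figure shown while logging. */
unsigned trace_fill_percent(const trace_buf_t *b)
{
    assert(b != NULL);
    return (unsigned)(100 * b->count / CAPACITY);
}
```

Feeding six messages into a KEEP_FIRST buffer of four slots retains messages 1 to 4, while a KEEP_LAST buffer retains messages 3 to 6.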

Interactive commands include:

  • c - Clear the display
  • h - Interactive help, display operating parameters (including ARM and DSP frequencies), and dsptop version
  • d - Change the display update rate
  • r - Change the usage sampling resolution (when saving usage data to a file)
  • s - Change the moving average sample window size
  • w - Write the current operating parameters to the .dsptoprc file

When dsptop starts execution, options are first read from the .dsptoprc file in the current directory and then from the command line. See the dsptop man page for complete help or use -h for a short explanation of command line options.

Note: This wiki page applies to dsptop version 1.1.x (where x is the bug fix level).

Note: DSP application code must be integrated with ULMLib (Usage & Load Monitor) for dsptop to provide usage data. TI DSP environments integrated with ULMLib include:

  • OpenCL
  • OpenMP Accelerator Model

Since these frameworks are already integrated with ULMLib, no additional ULMLib integration with your DSP application is required. For DSP applications that do not use one of the ULM supported frameworks, see ULMLib Integration Notes for details.


Examples

Note: The following examples assume you have the OpenCL Examples installed. See OpenCL Runtime for details.
  1. Measure DSP usage of the OpenCL Mandelbrot example with dsptop set for a one second moving average sample window, a one second display update, and stop the display update when the Mandelbrot DSP processes terminate.
A. In a terminal start dsptop:
$ dsptop -s1 -d1 -p
B. In a second terminal execute the OpenCL Mandelbrot example (for ssh use -X option):
$ cd /usr/share/ti/examples/opencl/mandelbrot
$ ./mandelbrot
Dsptop mandelbrot s1d1p.png
Dsptop mandelbrot.png


  2. Measure DSP usage of the OpenCL Mandelbrot example with dsptop set for Total Mode, a one second display update, and stop the display update when the Mandelbrot DSP processes terminate.
A. In a terminal start dsptop:
$ dsptop -s0 -d1 -p
B. In a second terminal execute the OpenCL Mandelbrot example (for ssh use -X option):
$ cd /usr/share/ti/examples/opencl/mandelbrot
$ ./mandelbrot
Dsptop mandelbrot s0d1p.png
Dsptop mandelbrot.png


Saving/Graphing Usage Data

Usage data per DSP and device temperature data can be sampled and written to a file by providing an output filename (--output/-o <filename>). The file format (--format/-f) can be text, csv, gnuplot or gnuplot_wxt. The default is text. A file extension is applied automatically (filename.txt, filename.csv, or filename.gp_cmd and filename.gp_dat). For gnuplot and gnuplot_wxt, two files are created: filename.gp_cmd and filename.gp_dat. For gnuplot_wxt, gnuplot is also launched to display the usage and temperature data. Text and csv files include Usage %, Run Time and Idle Time per DSP, plus the device temperature. Gnuplot file formats provide just the Usage % per DSP and the device temperature.

The resolution at which data is captured can be changed to any value between .001 seconds and 1 second (--resolution/-r value OR interactively with the 'r' command). The default is .1 seconds.

If --process/-p is used, the usage files (and, in the gnuplot_wxt case, the gnuplot display) are updated each time the current DSP process exits. Only usage data for the last DSP process is retained in the usage file. If --process/-p is not selected, the usage file is updated when dsptop is terminated, and in the gnuplot_wxt case gnuplot is then launched to show the usage data.

Using the previous example, to display gnuplot data, simply replace the dsptop command line with the following:

$ dsptop -s0 -d1 -p -f gnuplot_wxt -o myplot -r .01
Dpstop mandelbrot gnuplot.png


This example creates two files, myplot.gp_cmd and myplot.gp_dat. The data can then be displayed at any time with:

$ gnuplot myplot.gp_cmd


Automation

Execution of Linux commands can be synchronized with the execution of dsptop through the dsptop_sync.sh bash script (installed with dsptop). The script takes a Linux command as an argument, executes that command after dsptop has started, and terminates dsptop once the command completes. When using dsptop_sync.sh, the .dsptoprc file must be used to configure dsptop.

For example, to automate the generation of gnuplot data for the Mandelbrot OpenCL example, create a .dsptoprc file with the following contents:

-s0
-d1
-p
-f gnuplot
-o mandelbrot_usage
-r .01

And then execute:

dsptop_sync.sh "pushd /usr/share/ti/examples/opencl/mandelbrot && ./mandelbrot > /dev/null && popd"

You can then review the plots at a later time with:

gnuplot mandelbrot_usage.gp_cmd

Operation Overview

Q. Is there any impact on IPC traffic between the ARM and DSPs by dsptop?

A. Dsptop decouples ULM message traffic from all other ARM/DSP IPC traffic by using the Embedded Trace Buffer (ETB) to collect ULMLib generated messages. ULM messages are transported to the ETB through the System Trace Module (STM). The combination of small message sizes and low overhead makes this mechanism very efficient with respect to the DSP cycles required, and therefore much faster than traditional IPC or printf calls.

Q. Why do I occasionally experience unexpected dsptop behavior?

A. Unfortunately, the ETB cannot be read by the ARM while it is collecting data, so small gaps can occur in the ULM data. dsptop checks the ETB data size every 5 milliseconds and empties the ETB any time it is found to be 50% full. In cases where the DSPs are constantly sending ULM messages, some messages will be lost to the ETB recording gaps. In cases where the DSPs send a sparse number of ULM messages over a long period of time, the chances of ETB gaps are significantly reduced, but gaps can still happen. The Accuracy value simply shows what percentage of dsptop execution time ULM data is not collected because dsptop is reading the ETB (this value can change depending on how busy the system is).

Q. Why can't I see usage data while logging?

A. When logging (--logging/-l first|last), the ETB is not sampled and is read only once, when the user terminates dsptop. This ensures there is no data loss due to gaps caused by sampling. The ETB is of limited size, so the user records either the first N contiguous messages or the last N contiguous messages (in which case the ETB wraps). The ULM message density in the ETB varies with the type of ULM message and, if the ETB wraps, with message alignment. For ulm_put_statemsg() calls, the most common ULMLib function used by OpenCL and OpenMP, typical message counts are in the 1700-2300 range.


Execution Notes

Since dsptop uses ncurses, if dsptop must be killed (kill -9 pid), the "reset" command must be executed to restore terminal functionality.

If the ARM core running dsptop has a significant load, it may be necessary to raise the scheduling priority of dsptop by lowering its nice value (e.g. $ nice -n -20 dsptop) to achieve correct operation.


Installation

Ubuntu Installation:

  1. Add the mcsdk-hpc PPA to your system if you haven’t already.
A. Navigate to https://launchpad.net/~ti-keystone-team/+archive/ubuntu/keystone-hpc-3.0.1.5 in a browser.
B. Under the “Adding this PPA to your system” section, select the “Technical details about this PPA” link.
C. Select your Ubuntu version to update the sources.list entry. The text should be something similar to the following:
deb http://ppa.launchpad.net/ti-keystone-team/keystone-hpc-3.0.1.5/ubuntu trusty main
deb-src http://ppa.launchpad.net/ti-keystone-team/keystone-hpc-3.0.1.5/ubuntu trusty main
D. Add the two lines from the sources.list entry to the /etc/apt/sources.list file in your Ubuntu file system.
E. Run “sudo apt-get update”.
  2. Run “sudo apt-get install dsptop”.
  3. Insert kernel modules by rebooting or by running the following commands.
A. sudo modprobe debugss_kmodule
B. sudo modprobe temperature_kmodule
  4. Run “dsptop” to confirm installation.

Note: To update the debugss and temperature kernel modules, run the following. Typically, this is only needed when updating dsptop or specifically updating the kernel modules.
A. sudo apt-get install debugss-mod-dkms
B. sudo apt-get install temperature-mod-dkms
C. Next you can either reboot to re-insert modules or re-insert the modules manually (steps D-H).
D. To re-insert kernel modules manually, quit dsptop, kill any process using the ulm library, and run steps E-H.
E. sudo modprobe -r debugss_kmodule
F. sudo modprobe -r temperature_kmodule
G. sudo modprobe debugss_kmodule
H. sudo modprobe temperature_kmodule

Installing on EVM:

  1. Review the MCSDK-HPC Getting Started Guide (http://processors.wiki.ti.com/index.php/MCSDK_HPC_3.x_Getting_Started_Guide#Install_MCSDK).
  2. Download and install the mcsdk-hpc package (http://software-dl.ti.com/sdoemb/sdoemb_public_sw/mcsdk_hpc/latest/index_FDS.html).
  3. Copy IPK files from [mcsdk-hpc install dir]/mcsdk_hpc_<version>/images to EVM target filesystem.
  4. Log on to the EVM as a root user and install IPKs using opkg:
A. opkg install debugs-mod-dkms_<ver>_cortexa15fh-vfp-neon.ipk
B. opkg install temperature-mod-dkms_<ver>_cortexa15fh-vfp-neon.ipk
C. opkg install dsptop_<ver>_cortexa15fh-vfp-neon.ipk
  5. For an Ubuntu x86 hosted file system use dpkg to install ipks.
A. sudo dpkg -x debugs-mod-dkms_<ver>_cortexa15fh-vfp-neon.ipk /evmk2h_nfs
B. sudo dpkg -x temperature-mod-dkms_<ver>_cortexa15fh-vfp-neon.ipk /evmk2h_nfs
C. sudo dpkg -x dsptop_<ver>_cortexa15fh-vfp-neon.ipk /evmk2h_nfs
  6. Insert kernel modules by running the following commands.
A. insmod /lib/modules/<kernel ver>/extra/debugss_kmodule.ko
B. insmod /lib/modules/<kernel ver>/extra/temperature_kmodule.ko
  7. Run “dsptop” to confirm installation.

Note: To update the debugss and temperature kernel modules, run the following. Typically, this is only needed when updating dsptop or specifically updating the kernel modules.
A. Install kernel module IPKs using steps 4 or 5.
B. Next you can either reboot to re-insert modules or re-insert the modules manually (steps C-G).
C. To re-insert kernel modules manually, quit dsptop, kill any process using the ulm library, and run steps D-G as root.
D. rmmod /lib/modules/<kernel ver>/extra/debugss_kmodule.ko
E. rmmod /lib/modules/<kernel ver>/extra/temperature_kmodule.ko
F. insmod /lib/modules/<kernel ver>/extra/debugss_kmodule.ko
G. insmod /lib/modules/<kernel ver>/extra/temperature_kmodule.ko

ULMLib Integration Notes

DSP applications that do not use a ULM supported framework (OpenCL or OpenMP Accelerator Model) must be integrated with the ULMLib state functions. The DSP build of ULMLib provides the following two ULM state message transport functions for use in DSP applications:

void ulm_put_state(ulm_state_t state);
void ulm_put_statemsg(ulm_state_t state, uint32_t taskid, uint32_t value);

Where state will typically be set to one of the following:

ULM_STATE_IDLE     /* For use with overhead code (setup code or in an idle loop), and is reflected in the dsptop idle time column */
ULM_STATE_RUN      /* For use with application processing code, and is reflected in the dsptop run time column */
ULM_STATE_EXIT     /* For use to terminate a dsp process and stop both idle time and run time advancement (if dsptop --process/-p used) */

And for the ulm_put_statemsg() function, "taskid" and "value" (tracking id) are user definable.
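
As a rough sketch of how these calls might bracket application work, consider the C fragment below. The ULMLib header name is not given on this page, so none is included. Instead, the ulm_state_t enum and the two function bodies are host-side stand-ins (assumptions, including the enum ordering) that only record the states passed in, so the fragment builds without the DSP toolchain. process_block(), run_task() and finish_process() are hypothetical application functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Host-side stand-ins: NOT the real ULMLib ---------------------------
 * On the DSP, remove this section, include the ULMLib header and link
 * against the DSP build of ULMLib. Enum values here are placeholders. */
typedef enum { ULM_STATE_IDLE, ULM_STATE_RUN, ULM_STATE_EXIT } ulm_state_t;

#define TRACE_MAX 16
static ulm_state_t trace[TRACE_MAX];  /* states recorded for inspection */
static size_t trace_len = 0;

void ulm_put_state(ulm_state_t state)
{
    if (trace_len < TRACE_MAX)
        trace[trace_len++] = state;
}

void ulm_put_statemsg(ulm_state_t state, uint32_t taskid, uint32_t value)
{
    (void)taskid;   /* the stand-in ignores the user-defined fields */
    (void)value;
    ulm_put_state(state);
}
/* ------------------------------------------------------------------------ */

/* Hypothetical application work: sums a buffer. */
static uint32_t process_block(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Mark the processing as run time, then return to idle.
 * taskid and value are user definable; here value carries the block size. */
uint32_t run_task(uint32_t taskid, const uint32_t *data, size_t n)
{
    assert(data != NULL);
    ulm_put_statemsg(ULM_STATE_RUN, taskid, (uint32_t)n);
    uint32_t result = process_block(data, n);
    ulm_put_statemsg(ULM_STATE_IDLE, taskid, (uint32_t)n);
    return result;
}

/* Signal the end of the DSP process (relevant when dsptop runs with -p). */
void finish_process(void)
{
    ulm_put_state(ULM_STATE_EXIT);
}
```

Calling run_task() once followed by finish_process() emits the state sequence RUN, IDLE, EXIT.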

When dsptop decodes a state message packet (generated with either ulm_put_state() or ulm_put_statemsg()) it simply accumulates time for that state until it encounters a state change in the message packet stream.
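
The accumulation described above can be sketched in C as follows. This is an illustration of the idea only, not dsptop source code. The state_event_t and usage_t types, the ST_* names, the microsecond timestamps, and the usage formula run / (run + idle) are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative states, standing in for the ULM state values. */
typedef enum { ST_IDLE, ST_RUN, ST_EXIT } state_t;

/* Hypothetical decoded state message: when a state was entered. */
typedef struct {
    uint64_t timestamp_us;
    state_t  state;
} state_event_t;

typedef struct {
    uint64_t run_us;
    uint64_t idle_us;
} usage_t;

/* Charge the time between consecutive events to the state that was in
 * effect at the start of the interval. Time after the final event is
 * not counted, and nothing accumulates while in ST_EXIT. */
usage_t accumulate_usage(const state_event_t *ev, size_t n)
{
    usage_t u = {0, 0};
    assert(ev != NULL || n == 0);

    for (size_t i = 0; i + 1 < n; i++) {
        uint64_t dt = ev[i + 1].timestamp_us - ev[i].timestamp_us;
        if (ev[i].state == ST_RUN)
            u.run_us += dt;
        else if (ev[i].state == ST_IDLE)
            u.idle_us += dt;
    }
    return u;
}

/* Usage % as run / (run + idle); an assumed definition. */
double usage_percent(usage_t u)
{
    uint64_t total = u.run_us + u.idle_us;
    return total == 0 ? 0.0 : 100.0 * (double)u.run_us / (double)total;
}
```

For events IDLE at 0, RUN at 100, IDLE at 400 and EXIT at 500 (microseconds), this yields 300 us of run time and 200 us of idle time.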


Logging Notes

The advantage of the ulm_put_statemsg() function over the ulm_put_state() function is that, when logging (dsptop --logging/-l), it provides a timestamped message with the state, "taskid" and "value" (tracking id) in a text format:

[ULM_STATE_IDLE] = "Idle, task id %d, tracking id %d",
[ULM_STATE_RUN] = "Running, task id %d, tracking id %d",
[ULM_STATE_EXIT] = "Exit, task id %d, tracking id %d",

The ulm_put_state() function simply provides the timestamp and state in a text format.


Logging Example

To log the first N ULM messages generated by Mandelbrot to a file use the following dsptop command line (ULM message recording is stopped when the trace buffer (ETB) is full):

$ dsptop -l first -o mandelbrot.txt

Then execute the OpenCL Mandelbrot example in another terminal.

While mandelbrot is executing, the dsptop display shows the percentage of the trace buffer (ETB) that has been filled. In this case once the trace buffer fills, dsptop will terminate automatically and display the following message.

$ dsptop -l first -o mandelbrot.txt
Terminating logging mode, 32768 ULM message bytes collected, Trace Buffer 100% full: Trace buffer stopped on full

Note that if the trace buffer is not filled or the last option is used, dsptop must be terminated manually (either with the q command or Ctrl-C).

The following shows the first messages from mandelbrot.txt. The first column is the timestamp and the second column is the delta time from the previous message, followed by the core that generated the message and then the message itself. Time is shown as HH:MM:SS.microseconds.

00:00:00.000000 00:00:00.000000 arm_0, Internal Data Total 4864 KB, 67.2% Free, 32.8% Used
00:00:00.000003 00:00:00.000003 arm_0, External Data Total 0 KB, 67.2% Free, 32.8% Used
00:00:00.000006 00:00:00.000003 arm_0, External Code/Data Total 1474560 KB, 67.2% Free, 32.8% Used
00:00:00.424679 00:00:00.424673 arm_1, External Code/Data Total 1474560 KB, 99.9% Free, 0.1% Used
00:00:00.424680 00:00:00.000001 arm_1, External Data Total 0 KB, 0.0% Free, 100.0% Used
00:00:05.294267 00:00:04.869587 arm_3, External Code/Data Total 1474560 KB, 99.9% Free, 0.1% Used
00:00:05.294404 00:00:00.000137 dsp_2, OpenCL NDR Cache Coherence Complete, Kernel id -2, Work Group id 0
00:00:05.294404 00:00:00.000000 dsp_1, OpenCL NDR Cache Coherence Complete, Kernel id -2, Work Group id 0
00:00:05.294404 00:00:00.000000 dsp_4, OpenCL NDR Cache Coherence Complete, Kernel id -2, Work Group id 0
00:00:05.294404 00:00:00.000000 dsp_6, OpenCL NDR Cache Coherence Complete, Kernel id -2, Work Group id 0
00:00:05.294404 00:00:00.000000 dsp_0, OpenCL NDR Cache Coherence Complete, Kernel id -2, Work Group id 0
00:00:05.294404 00:00:00.000000 dsp_3, OpenCL NDR Cache Coherence Complete, Kernel id -2, Work Group id 0
00:00:05.294404 00:00:00.000000 dsp_5, OpenCL NDR Cache Coherence Complete, Kernel id -2, Work Group id 0
00:00:05.294405 00:00:00.000001 dsp_7, OpenCL NDR Cache Coherence Complete, Kernel id -2, Work Group id 0
00:00:05.294568 00:00:00.000163 dsp_0, OpenCL NDR Overhead, Kernel id 0, Work Group id 0
00:00:05.294577 00:00:00.000009 dsp_5, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 0
00:00:05.294578 00:00:00.000001 dsp_7, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 1
00:00:05.294580 00:00:00.000002 dsp_4, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 2
00:00:05.294581 00:00:00.000001 dsp_2, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 3
00:00:05.294583 00:00:00.000002 dsp_1, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 4
00:00:05.294585 00:00:00.000002 dsp_6, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 5
00:00:05.294587 00:00:00.000002 dsp_3, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 6
00:00:05.295097 00:00:00.000510 dsp_5, OpenCL NDR Kernel Complete, Kernel id 0, Work Group id 0
00:00:05.295099 00:00:00.000002 dsp_5, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 7
00:00:05.295126 00:00:00.000027 dsp_7, OpenCL NDR Kernel Complete, Kernel id 0, Work Group id 1
00:00:05.295127 00:00:00.000001 dsp_4, OpenCL NDR Kernel Complete, Kernel id 0, Work Group id 2
00:00:05.295128 00:00:00.000001 dsp_2, OpenCL NDR Kernel Complete, Kernel id 0, Work Group id 3
00:00:05.295129 00:00:00.000001 dsp_7, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 8
00:00:05.295130 00:00:00.000001 dsp_4, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 9
00:00:05.295131 00:00:00.000001 dsp_1, OpenCL NDR Kernel Complete, Kernel id 0, Work Group id 4
00:00:05.295132 00:00:00.000001 dsp_2, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 10
00:00:05.295133 00:00:00.000001 dsp_6, OpenCL NDR Kernel Complete, Kernel id 0, Work Group id 5
00:00:05.295134 00:00:00.000001 dsp_3, OpenCL NDR Kernel Complete, Kernel id 0, Work Group id 6
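
For post-processing, log lines in the format shown above can be split into fields with sscanf. The following C sketch assumes every line follows the sample layout exactly (two HH:MM:SS.microseconds columns with six-digit microseconds, a core name, a comma, then the message). The log_entry_t type and the parse_log_line() and entry_is_from_dsp() functions are hypothetical helpers, not part of dsptop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint64_t timestamp_us;  /* first column, in microseconds */
    uint64_t delta_us;      /* second column, in microseconds */
    char     core[16];      /* e.g. "dsp_5" or "arm_0" */
    char     message[256];  /* remainder of the line */
} log_entry_t;

static uint64_t to_us(unsigned h, unsigned m, unsigned s, unsigned us)
{
    return ((uint64_t)h * 3600u + (uint64_t)m * 60u + s) * 1000000u + us;
}

/* Returns 1 on success, 0 if the line does not match the expected layout. */
int parse_log_line(const char *line, log_entry_t *out)
{
    unsigned h1, m1, s1, u1, h2, m2, s2, u2;
    assert(line != NULL && out != NULL);

    /* Field widths keep core and message within their buffers. */
    int n = sscanf(line, "%u:%u:%u.%u %u:%u:%u.%u %15[^,], %255[^\n]",
                   &h1, &m1, &s1, &u1, &h2, &m2, &s2, &u2,
                   out->core, out->message);
    if (n != 10)
        return 0;

    out->timestamp_us = to_us(h1, m1, s1, u1);
    out->delta_us = to_us(h2, m2, s2, u2);
    return 1;
}

/* True if the entry was generated by a DSP core rather than an ARM core. */
int entry_is_from_dsp(const log_entry_t *e)
{
    assert(e != NULL);
    return strncmp(e->core, "dsp_", 4) == 0;
}
```

A program could read mandelbrot.txt line by line with fgets, pass each line to parse_log_line(), and skip lines for which it returns 0.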


Logging to a Pipe

If -o is not used to select an output file, the message data is written to stdout when message collection is complete, and can be piped to additional Linux utilities (normally you will also want to use the --quiet/-q option to silence dsptop's status output).

$ dsptop -q -l first | grep dsp_2
00:00:09.929293 00:00:00.000170 dsp_2, OpenCL NDR Cache Coherence Complete, Kernel id -2, Work Group id 0
00:00:09.929520 00:00:00.000002 dsp_2, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 6
00:00:09.930010 00:00:00.000000 dsp_2, OpenCL NDR Kernel Complete, Kernel id 0, Work Group id 6
00:00:09.930013 00:00:00.000000 dsp_2, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 13
00:00:09.930495 00:00:00.000001 dsp_2, OpenCL NDR Kernel Complete, Kernel id 0, Work Group id 13
00:00:09.930497 00:00:00.000001 dsp_2, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 21
00:00:09.930982 00:00:00.000001 dsp_2, OpenCL NDR Kernel Complete, Kernel id 0, Work Group id 21
00:00:09.930984 00:00:00.000000 dsp_2, OpenCL NDR Kernel Start, Kernel id 0, Work Group id 29
...
00:00:10.395917 00:00:00.000378 dsp_2, OpenCL NDR Kernel Complete, Kernel id 1, Work Group id 343
00:00:10.395919 00:00:00.000002 dsp_2, OpenCL NDR Kernel Start, Kernel id 1, Work Group id 351
00:00:10.397447 00:00:00.000298 dsp_2, OpenCL NDR Kernel Complete, Kernel id 1, Work Group id 351
00:00:10.397448 00:00:00.000001 dsp_2, OpenCL NDR Kernel Start, Kernel id 1, Work Group id 359
00:00:10.398161 00:00:00.000209 dsp_2, OpenCL NDR Kernel Complete, Kernel id 1, Work Group id 359
00:00:10.398163 00:00:00.000002 dsp_2, OpenCL NDR Kernel Start, Kernel id 1, Work Group id 367
00:00:10.399031 00:00:00.000236 dsp_2, OpenCL NDR Kernel Complete, Kernel id 1, Work Group id 367
00:00:10.399033 00:00:00.000002 dsp_2, OpenCL NDR Kernel Start, Kernel id 1, Work Group id 375

Note that if logging "last" and piping the message output to another Linux utility (dsptop -q -l last | grep dsp_2), dsptop's data collection must be terminated manually by sending SIGINT to dsptop with the kill command (kill -INT pid). If you attempt to use Ctrl-C from the terminal instead, dsptop will terminate with a "Broken pipe" or "Error while writing log file" error.