MIDAS Ultrasound v4.0 Demo

From Texas Instruments Wiki

Overview

The MIDAS Ultrasound v4.0 demo showcases an example system-level implementation of midend and backend ultrasound signal processing on Texas Instruments (TI) multicore devices. Although the demo integrates signal processing algorithms specific to ultrasound, it can serve as a reference for any application that requires partitioning and stringing together algorithms/codecs across multiple cores.

This version of the demo leverages the homogeneous C6678 eight-core Multicore DSP and the heterogeneous OMAP3530 ARM+DSP System-on-Chip. The TI C6678 Low-cost EVM is interfaced via an Ethernet-to-Ethernet connection to the TI OMAP3530 Mistral EVM. Though this demo is developed and run on these devices, much of the code can be used as a reference when porting to other TI multicore offerings.

The C6678 is the industry’s highest performing multicore DSP in production today, featuring eight 1.25-GHz C66x DSP cores and delivering 160 single-precision GFLOPS and 60 double-precision GFLOPS in just 10W. The OMAP3530 consists of a 720MHz ARM Cortex-A8 general-purpose processor and a 520MHz C64x+ DSP core, which together provide a power-efficient solution to handle both system controller and back-end processing functions in diagnostic ultrasound imaging systems.

The figure below showcases the function of the cores on the C6678 and the OMAP3530. As shown in the diagram, the B-mode Processing Unit (BPU), which includes envelope detection and log compression, and the Doppler Processing Unit (DPU), which includes ensemble aggregation, wall filter and flow estimation, are implemented on separate cores on the C6678 device. The Scan Conversion Unit (SCU) is implemented on the OMAP’s C64x+ DSP core. All the algorithm kernels implemented in this system are from TI's Embedded Processor Software Toolkit for Medical Imaging, which includes both the source code and detailed documentation for these algorithm modules. In addition, the demo showcases the use of TI's DMAI APIs on ARM as well as the use of Qt and tslib to create a touchscreen-based Graphical User Interface (GUI) on the SoC that allows users to interact with the ARM and DSP in real time.

US4 Demo Functional Overview.PNG

The sample raw data used in the demo is from a scan of a carotid artery, consisting of 69 frames of post-RF-demodulated data. Each frame's worth of input data includes B-mode data (256 scanlines, 512 samples/scanline) and Doppler (color flow) data (48 scanlines, 256 samples/scanline, 10 ensembles). This input data is initially stored on the NFS/SD card on the OMAP3530. During initialization, all 69 frames of input data are sent to the C6678, which stores them in DDR. When processing starts, the input data is fed in at a set acquisition rate (20 fps in this demo, but this can be customized). Each frame is then processed through the BPU and DPU (on the C6678) and the SCU (on the OMAP3530's DSP) as shown above, and the final scan-converted image is displayed via the OMAP3530's DVI output on an external 7-inch LCD screen.
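The frame geometry above translates directly into buffer sizes and timing. The following is a minimal sketch of that arithmetic, not code from the demo. The function names are hypothetical, and the number of bytes per sample is left as a parameter because the text does not state the sample format.

```c
#include <assert.h>
#include <stdint.h>

/* Frame geometry quoted in the text above. */
enum {
    NUM_FRAMES      = 69,
    BMODE_SCANLINES = 256,
    BMODE_SAMPLES   = 512,  /* samples per B-mode scanline */
    COLOR_SCANLINES = 48,
    COLOR_SAMPLES   = 256,  /* samples per color scanline */
    COLOR_ENSEMBLES = 10
};

/* Number of B-mode samples in one frame. */
uint32_t bmode_samples_per_frame(void)
{
    return (uint32_t)BMODE_SCANLINES * BMODE_SAMPLES;
}

/* Number of color flow samples in one frame, across all ensembles. */
uint32_t color_samples_per_frame(void)
{
    return (uint32_t)COLOR_SCANLINES * COLOR_SAMPLES * COLOR_ENSEMBLES;
}

/* Size in bytes of the whole 69-frame dataset.
 * bytes_per_sample is an assumption supplied by the caller. */
uint64_t dataset_bytes(uint32_t bytes_per_sample)
{
    uint64_t per_frame = (uint64_t)bmode_samples_per_frame()
                       + color_samples_per_frame();
    assert(bytes_per_sample > 0);
    return per_frame * NUM_FRAMES * bytes_per_sample;
}

/* Time between input frames, in milliseconds, for a given frame rate. */
uint32_t acquisition_interval_ms(uint32_t fps)
{
    assert(fps > 0);
    return 1000u / fps;
}
```

With these figures, one frame holds 131,072 B-mode samples and 122,880 color flow samples, and the 20 fps acquisition rate corresponds to a 50 ms interval between frames.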

This wiki document covers various aspects of the demo, including a discussion of the software design as well as step-by-step instructions to obtain the source code and set up your development environment to build and run the demo.

Source Code Hosting (GForge Project Portal)

MIDAS source code is hosted on TI's GForge project portal.

The MIDAS project page is located at https://gforge.ti.com/gf/project/med_ultrasound/

Software Implementation

This section discusses the software implementation for Ultrasound v4.0 and showcases how TI's software components including the Multicore Software Development Kit (MCSDK), Software Development Kit for OMAP3530 (DVSDK), Codec Engine (CE), Digital Media Application Interface (DMAI) and iUniversal APIs can be leveraged by developers to create applications for homogeneous and heterogeneous multicore systems like the C6678 and OMAP3530 respectively.

Both the SoC and Multicore DSP software implementations leverage TI's Codec Engine (CE), a standard software architecture and interface for algorithm execution. CE eases multicore programming and, more specifically in the case of an SoC like the OMAP3530, abstracts the DSP for GPP (ARM) programmers. CE is based on a client-server architecture: for the SoC implementation, the ARM acts as the client and the DSP as the server (hosting one or more algorithms), while for the multicore DSP, one DSP core (the master core) acts as the client and the other DSP cores in use act as servers. CE allows easy algorithm plug-in, where DSP developers supply XDAIS-compliant algorithms, and GPP (ARM) developers integrate these DSP algorithms and make remote procedure calls from their ARM applications. Interprocessor communication and DSP resources (like DMA and memory) are managed by CE under the hood, and no manual coding of IPC is required.

TI's SYS/BIOS 6.x is a highly configurable, real-time operating system that caters to a variety of embedded processors and is included as part of TI’s Code Composer Studio integrated development environment. SYS/BIOS provides some key features that enable easy memory management, preemptive multitasking and real-time analysis. Based on the application's requirements, developers can optimize their final runtime image by including/excluding specific SYS/BIOS modules. 

The Multicore Software Development Kit (MCSDK) includes key components that ease multicore development, including the chip support library, low level drivers, platform software (PDK), Network Developer's Kit (NDK), etc. The Codec Engine, which we describe in more detail later, provides a framework and APIs to easily plug-and-play algorithms, and handles Inter-Processor Communication (IPC) under the hood.

The figure below showcases the TI software components available for development on the System-on-Chip, that the software application on OMAP3530 leverages:

US3 Demo SoC Components.PNG


Multicore DSP (Mid End)

The software application that runs on the C6678 is based on a Master/Slave model, where Core 0 acts as the centralized controlling core (the Master core), and Core 1 and Core 2 act as the Slave cores. Although this demo uses only three cores, since they are enough to demonstrate our use case, the same programming model can of course be extended to use all eight cores. The processing modules are statically assigned to Core 1 (BPU) and Core 2 (DPU), while Core 0, which serves as the master, takes care of synchronization and sets up the buffer pointers. Note, however, that even though distribution is done statically, the assignment of algorithms to cores is done outside of the main application. This allows easy reconfiguration, and at the application software level the developer can be agnostic to which core is running which algorithm.

Functional Overview

The software application on C6678's Core 0 is designed to integrate the following functional blocks: Front End Interface, Mid End Controller, Mid End Processing and Back End Interface.

US3 Demo Application Design.PNG

Mid End Controller

As the name suggests, the Mid End Controller is responsible for initializing and initiating the other blocks. Commands received from the OMAP3530 are also interpreted here.

Front End Interface

The Front End Interface serves two primary functions: it provides periodic events that mark the availability of incoming input data, and it provides functions to access the data that has arrived. Since this version of the demo, Ultrasound v4.0, showcases processing blocks post IQ demodulation and has no front-end implementation, it becomes necessary to mimic the function of the ultrasound front end in a real system, where input frames would be continuously received at a set acquisition frame rate. The Front End Interface serves this role, wherein it fires an INPUT_RDY event every (1/acquisition rate) seconds. The clock ticks are derived from the SYS/BIOS Timer module. Note that though this design uses a frame-based processing model, where the frame boundary defines the input block size, it is also possible to have partial frames as input boundaries.
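The periodic INPUT_RDY behaviour described above can be sketched in plain C as a tick counter. This is an illustration only, not the demo's Front End Interface: the FrontEndSim type and its functions are hypothetical names, and the sketch assumes the timer tick rate is an integer multiple of the acquisition rate. It also wraps the frame index so that a stored dataset is replayed in a loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for a simulated front end. */
typedef struct {
    uint32_t ticks_per_frame; /* timer ticks between INPUT_RDY events */
    uint32_t tick_count;      /* ticks elapsed since the last event */
    uint32_t frame_index;     /* index of the next frame to hand out */
    uint32_t num_frames;      /* number of frames in the stored dataset */
} FrontEndSim;

/* tick_hz: timer tick rate; acq_fps: acquisition frame rate. */
void FrontEndSim_init(FrontEndSim *fe, uint32_t tick_hz,
                      uint32_t acq_fps, uint32_t num_frames)
{
    assert(acq_fps > 0 && num_frames > 0 && tick_hz >= acq_fps);
    fe->ticks_per_frame = tick_hz / acq_fps;
    fe->tick_count = 0;
    fe->frame_index = 0;
    fe->num_frames = num_frames;
}

/* Call once per timer tick. Returns 1 and stores the frame index in
 * *frame when an INPUT_RDY event is due; returns 0 otherwise. */
int FrontEndSim_tick(FrontEndSim *fe, uint32_t *frame)
{
    fe->tick_count++;
    if (fe->tick_count < fe->ticks_per_frame) {
        return 0;
    }
    fe->tick_count = 0;
    *frame = fe->frame_index;
    /* Wrap around so the dataset is replayed continuously. */
    fe->frame_index = (fe->frame_index + 1) % fe->num_frames;
    return 1;
}
```

With a 1 kHz tick and a 20 fps acquisition rate, the event fires on every 50th tick. In a SYS/BIOS application the tick function would be driven by a timer callback and the return value replaced by posting an event.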

Mid End Processing

The Mid End Processing function block pends on the INPUT_RDY event from the Front End Interface and, as soon as a new frame is “received,” it initiates processing on that input block. Mid End Processing acts as the Client in the Codec Engine (CE) framework and uses the iUniversal interface to call upon Algorithm Servers that correspond to various functions within the ultrasound midend processing signal chain. In this implementation, there are two algorithm servers, one for each slave core (Core 1 and Core 2).

Let us now look at the primary execution threads that define the data flow through the Mid End Processing block. The figure below shows four primary tasks that use the MessageQ IPC module for message passing. The messages provide pointers to the data and trigger the execution of tasks in the receiving functions. The actual message buffer is set up in shared memory that both the message sender and the message receiver can access. In this case, the MidEnd_scatterTask() pends on a new input frame. When data becomes available, the MidEnd_scatterTask() allocates memory for a message from the heap and assigns the message pointer to the B-mode input data. It similarly allocates memory for a message that points to the Color flow input data. Using the MessageQ_put call, the MidEnd_scatterTask() passes the input data pointers for B-mode and Color to the BPUcluster_task() and DPUcluster_task(). The BPUcluster_task() and DPUcluster_task() use the corresponding MessageQ_get calls to pend on the incoming B-mode and Color data frames. Once a new frame is received, these tasks call the UNIVERSAL_process API provided by the iUniversal interface to invoke the BPU and DPU processing algorithms on Core 1 and Core 2 respectively. Once the B-mode and Color data are processed, the BPUcluster_task() and DPUcluster_task() use the MessageQ_put API to pass the output data pointers to the MidEnd_gatherTask(). The MidEnd_gatherTask() ensures that data atomicity is maintained, so that the B-mode and Color data that correspond to a particular frame always stay together.
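The pairing role of the gather task can be illustrated without any IPC code. The sketch below is not the demo's MidEnd_gatherTask(). It assumes that each output message carries a frame identifier and that at most GATHER_SLOTS frames are in flight at once; the Gather and FramePair types and the slot count are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GATHER_SLOTS 4u  /* hypothetical limit on frames in flight */

/* The two processed halves of one frame. */
typedef struct {
    uint32_t    frame_id;
    const void *bmode;   /* BPU output, NULL until it arrives */
    const void *color;   /* DPU output, NULL until it arrives */
} FramePair;

typedef struct {
    FramePair slots[GATHER_SLOTS];
} Gather;

void Gather_init(Gather *g)
{
    for (size_t i = 0; i < GATHER_SLOTS; i++) {
        g->slots[i].frame_id = 0;
        g->slots[i].bmode = NULL;
        g->slots[i].color = NULL;
    }
}

/* Record one processed output. Returns 1 and fills *out once both the
 * B-mode and Color outputs of frame_id have arrived; returns 0 while
 * the frame is still incomplete. */
int Gather_put(Gather *g, uint32_t frame_id, int is_color,
               const void *data, FramePair *out)
{
    FramePair *s = &g->slots[frame_id % GATHER_SLOTS];

    assert(data != NULL);
    if (s->bmode == NULL && s->color == NULL) {
        s->frame_id = frame_id;      /* slot is free: claim it */
    }
    assert(s->frame_id == frame_id); /* too many frames in flight */

    if (is_color) {
        s->color = data;
    } else {
        s->bmode = data;
    }
    if (s->bmode == NULL || s->color == NULL) {
        return 0;
    }
    *out = *s;        /* both halves present: release them together */
    s->bmode = NULL;
    s->color = NULL;
    return 1;
}
```

Because a pair is only released once both pointers for the same frame identifier are present, B-mode and Color outputs of different frames cannot be mixed even if the two processing cores finish in a different order.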

Note that the control tasks that handle data scattering and gathering, viz. MidEnd_scatterTask() and MidEnd_gatherTask(), run on Core 0, the Master core, while the processing tasks, viz. BPUcluster_task() and DPUcluster_task(), run on Core 1 and Core 2, respectively. The IPC between cores is handled under the hood via CE; by using MessageQ, the developer need not be aware of it and can simply call the MessageQ APIs to pass data pointers between cores.

US3 Demo Data Flow.PNG

Back End Interface

Once the output B-mode and Color data are ready for scan conversion, it is time to pass the data to the OMAP3530, which handles the backend processing and display. The Mid End application's Back End Interface block provides functions to communicate with the OMAP3530 back end. For this example implementation, we use the Ethernet ports on both device EVMs to interface the two together. To implement the communication protocol in software, we use RDSP, an application written on top of the MCSDK's Network Developer's Kit (NDK), which allows easy passing of data and parameters between the C6678 and the OMAP3530.

CE Implementation Details

To better understand how the master core invokes algorithms on the other cores, we delve into a brief discussion on CE. The CE framework is essentially a set of APIs used to instantiate and run XDAIS-compliant algorithms. XDAIS is an algorithm standard that DSP programmers should follow to ensure that their algorithms easily plug-and-play with other algorithms and can be called using CE APIs. CE requires two essential components to operate in tandem: a CE Client and a CE Algorithm Server. In this demo, the master core, Core 0, serves as the CE Client and uses CE APIs to make “remote procedure calls” to CE Algorithm Server executables that reside on DSP Core 1 and Core 2. Essentially, the CE Algorithm Server combines the core codec (BPU in the case of Core 1 and DPU for Core 2) along with the other infrastructure pieces (SYS/BIOS, IPC, etc.) to produce an executable (.x64P) that is callable by Core 0, the CE Client. The application on Core 0 invokes the BPU and DPU remote algorithms on Core 1 and Core 2 using the iUniversal interface, which is a set of APIs that provide an easy way for XDAIS-compliant, non-VISA (Video, Image, Speech, Audio) algorithms to run using CE.
CE provides some unique features that significantly ease the multicore DSP software development process. One of the primary advantages is that CE eliminates the need for the developer to manually code any Inter-Processor Communication (IPC) details. Once the developer configures memory for IPC, CE takes care of the rest of IPC under the hood. This is illustrated later in this document with code snippets. CE also captures some key TI hardware features, in that resource management for memory and EDMA is done via CE. CE also enables code reuse and faster time to market. For more information on CE, please click on the relevant links in the References section.

System-on-Chip (Back End)

This section discusses the software design for the System-on-Chip application and showcases how TI's SDK for OMAP3530 including the Codec Engine, DMAI and iUniversal APIs can be used to ease SoC software development. As previously mentioned, the SoC's role is to do backend processing and display, and provide mechanisms for data input/output.
The figure below summarizes the implementation.

US4 SoC Application Design.PNG

As shown, the OMAP3530's ARM receives data that has undergone B-mode estimation and Color Flow processing from the C6678. The OMAP3530’s C64x+ DSP core runs the DSP/BIOS real-time operating system and runs the Scan Converter Unit (SCU) algorithm on this received data. The SCU converts both B-mode and Color Flow data from the acquisition coordinates (polar or Cartesian) to the Cartesian coordinates of the display. Just like in the case of the Multicore DSP implementation, the algorithm module (SCU in this case) is packaged into a 'DSP Algorithm Server' that is managed by the Codec Engine and executed on the DSP core. On the ARM, the demo application uses Codec Engine APIs to make a remote procedure call to this 'Algorithm Server'.
The ARM core runs the Linux operating system, and all peripherals are controlled through Linux device drivers, which are part of the PSP. The ARM application resides on this Linux filesystem and uses a multithreaded framework to achieve data FIFO management.

Functional Overview

The ARM application consists of a main function and four execution threads running in parallel, viz. the acquisition, process, control and display threads, which are configured for preemptive, priority-based scheduling.
• The main function completes initialization and becomes the event handling loop
• The acquire thread (initiated as part of the RDSP application) reads the raw input ultrasound data (from C6678) into the SoC
• The process thread engages the C64x+ DSP for scan conversion
• The display thread transfers ultrasound image frames to the frame buffer of the FBDev display driver
Fixed-size buffers are exchanged between acquisition-process, and process-display threads for data movement. The thread where the data originates creates and maintains the set of data buffers, and FIFOs are used to put and get buffer pointers. In the following subsections we will delve into each of the threads discussed above. It might be useful to follow along with the source code as we discuss these elements.
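The put/get exchange of buffer pointers described above can be pictured as a small ring buffer of pointers. The sketch below is a single-threaded illustration with hypothetical names (PtrFifo, FIFO_DEPTH), not the FIFO used in the demo source. A FIFO shared between real threads would additionally need locking and a way for the consumer to block while the FIFO is empty.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_DEPTH 4u  /* hypothetical number of buffers in circulation */

/* Fixed-depth FIFO that carries buffer pointers, not buffer contents. */
typedef struct {
    void  *items[FIFO_DEPTH];
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of entries currently queued */
} PtrFifo;

void PtrFifo_init(PtrFifo *f)
{
    f->head = 0;
    f->count = 0;
}

/* Queue a buffer pointer. Returns 0 on success, -1 if the FIFO is full. */
int PtrFifo_put(PtrFifo *f, void *buf)
{
    assert(buf != NULL);
    if (f->count == FIFO_DEPTH) {
        return -1;
    }
    f->items[(f->head + f->count) % FIFO_DEPTH] = buf;
    f->count++;
    return 0;
}

/* Remove and return the oldest buffer pointer, or NULL if empty. */
void *PtrFifo_get(PtrFifo *f)
{
    void *buf;

    if (f->count == 0) {
        return NULL;
    }
    buf = f->items[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return buf;
}
```

Only pointers travel through the FIFO, so passing a frame from one thread to the next costs the same regardless of the frame size. The thread that owns the buffers reuses one once the consumer has released it.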

Main

The main function in main.cpp performs the necessary initialization tasks, which include initiating a connection between the C6678 and the OMAP3530, sending the input files via TFTP from the SD card on the OMAP3530 to the C6678's DDR, sending configuration parameters for the algorithm modules (BPU, DPU) on the C6678, initiating the acquisition, process, display and control threads, and setting up the user interface. Once the main window is set up, this main thread becomes the event handling loop that makes function callbacks in response to touchscreen events. All trigger events (signals, in Qt terminology) and their associated callback functions (slots) are defined in mainwindow.cpp. The image() function is the most important of these callbacks in mainwindow.cpp; it is triggered at the onset, once data has been sent to the C6678 and it is time to begin image processing.

Acquisition

The acquisition thread is responsible for receiving raw, pre-scan-converted ultrasound data from the C6678. The Mid End Interface consists of the MidEndIf_getBuffer and MidEndIf_putBuffer functions. The RDSP client application layer calls these functions in the acquisition thread it spawns. MidEndIf_getBuffer allocates buffer space for the incoming data. All data buffers that are shared between the OMAP3530's ARM and DSP need to be allocated in physically contiguous blocks of memory, which is what the BufTab_create() API call helps achieve for these input buffers. The MidEndIf_putBuffer call pushes the buffer pointers into the Acq2ProcFifo FIFO. Once the process thread finishes processing the input frame, it releases the used buffer, which the acquisition thread can then reuse for the next input frame. Note that the acquisition thread continues to receive data at the acquisition frame interval defined in the C6678 application's Front End Interface. This mimics how an actual ultrasound system works, where the acquisition rate and the display rate are independent of each other.

Process

The process thread function defined in UsProcess.c interacts with the C64x+ and engages the DSP to run the SCU processing algorithm module. As with the acquire thread, since the output buffers will be shared between ARM and DSP, the process thread also allocates contiguous memory for them. When imaging starts, the UsProcess_thrFxn begins and starts receiving input buffers from the acquisition thread on the Acq2ProcFifo/2 FIFOs. It passes the input and output buffer pointers to the UsProcess_scanConvertB and UsProcess_scanConvertColor calls for the B-mode and Color data respectively. Once scan conversion is complete, the process thread places the pointer to the final output buffer on the Proc2DispFifo. The UsProcess_scanConvertB and UsProcess_scanConvertColor functions invoke the DSP-side scan conversion method using the IUNIVERSAL UNIVERSAL_process() API, which in turn invokes the corresponding functions on the DSP side. When the process thread is fully primed, it releases the display thread so that it can start accepting buffers on the Proc2DispFifo. Once free running, the process thread continues to process input buffers as fast as it can and sends them to the display thread.

Display

The Display thread's function defined in UsDisplay.c transfers the final scan-converted ultrasound image frames that it receives on the Proc2DispFifo to the frame buffer of the FBDev display device driver. The Display_create() DMAI method opens the display device driver, and the Display thread uses a handle to this to copy output buffers to the driver. To ensure execution efficiency, the Display thread uses the H/W resizer to copy the output buffer to the display device driver instead of using memcpy(). DMAI’s Framecopy API is used to perform this buffer copy. The display refresh rate is set at 60 fps, with logic that skips a frame if frames are received from the process thread too fast, or repeats a frame if there is no new frame available from the process thread. Once the Display thread is triggered, the acquire, process and display threads continue to run in a loop. Since this is a demo with only a limited number of frames, the C6678 loops through the same input dataset, which is continuously processed and output in real time.
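The skip/repeat decision made at each display refresh can be sketched as a pure function. The text does not spell out the exact policy in UsDisplay.c, so the sketch assumes a "newest frame wins" rule: if several frames arrived since the last refresh, all but the newest are skipped, and if none arrived, the last frame is shown again. Display_selectFrame and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Choose the frame to show at one display refresh.
 *
 * pending     frames that arrived since the last refresh, oldest first
 * n_pending   number of entries in pending
 * last_shown  frame displayed at the previous refresh
 * n_skipped   receives the number of frames dropped at this refresh
 */
const void *Display_selectFrame(const void *const *pending,
                                size_t n_pending,
                                const void *last_shown,
                                size_t *n_skipped)
{
    assert(n_skipped != NULL);
    if (n_pending == 0) {
        *n_skipped = 0;
        return last_shown;          /* nothing new: repeat the frame */
    }
    assert(pending != NULL);
    *n_skipped = n_pending - 1;     /* older frames are dropped */
    return pending[n_pending - 1];  /* show the newest frame */
}
```

With a 20 fps acquisition rate and a 60 fps refresh rate, a policy like this would repeat each frame on roughly two out of every three refreshes and would only skip frames when the process thread delivers a burst.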

DSP Module Integration

In this section we outline some of the steps we took to integrate the BPU, DPU and SCU algorithm modules to plug-and-play with the Codec Engine (CE) using IUNIVERSAL. CE application developers can use simple API calls to pass data and configuration parameters between the CE Client and CE Server cores.

The BPU, DPU and SCU modules, as provided in the Embedded Software Toolkit for Medical Imaging, are all XDAIS compliant. You can find many articles on this TI EP wiki that guide you through the process of making your codecs XDAIS compliant.

Since the steps for BPU, DPU and SCU modules are similar, we will focus on the SCU module to illustrate how IUNIVERSAL is setup in this scenario where the CE Client is the ARM core on OMAP3530 and the CE Server is the DSP core on the OMAP3530.

The application capitalizes on the IUNIVERSAL API's capability to make a remote procedure call from the ARM to the DSP algorithm, without the need to write any system software for the C64x+. To initiate the ARM-DSP communication, the ARM creates a Codec Engine instance with an Engine_open() call that resets, loads and starts the DSP Engine and returns a handle to the same. Using this handle hEng, the UNIVERSAL_create() API creates an SCU algorithm instance using parameters from the ISCU_Params structure that specify the size of memory to allocate on the DSP. It is important to note that the SCU header file at “midas/ultrasound/algos/scu/src/scu.h” is shared between the DSP and the ARM, which makes it possible for the two cores to share the same interpretation of SCU-specific structures and datatypes, even when both are compiled using different compilers. The UNIVERSAL_create() call returns a handle to the IUNIVERSAL algorithm instance hAlg as:

hAlg = UNIVERSAL_create(hEng, algName, (IUNIVERSAL_Params*)&ISCU_ALLOC_PARAMS);

Next, the ARM populates the SCU configuration structure, scuConfig_t, with parameter values from the user-provided configuration file(s), and with the tissue and flow color mapping lookup tables specified in "midas/ultrasound/userdata/LUTs". The SCU configuration structure is also defined in the SCU header file that the ARM and DSP share. To send this configuration information to the DSP, the ARM passes a pointer to the scuConfig_t structure and the handle to the SCU algorithm instance hAlg, using the UNIVERSAL_control() API as shown in the code section below. UNIVERSAL_control() calls the corresponding SCU configuration function SCU_TI_control() on the DSP.

universalStatus.data.numBufs = 1;
universalStatus.data.descs[0].bufSize = sizeof(scuConfig_t);
universalStatus.data.descs[0].buf = (XDAS_Int8 *)(hUsProcess->pScuConfigB);
status = UNIVERSAL_control(hUsProcess->hAlg, XDM_SETPARAMS, &universalDynParams, &universalStatus);

It is important to note here that any buffers that the ARM shares with the DSP, it allocates in contiguous memory. This is necessary because unlike the ARM, the DSP does not have a virtual memory manager and therefore assumes that the buffer is aligned to a 64-bit boundary and is contiguous. In this demo implementation, the allocated buffers reside in CMEM, a contiguous memory manager by TI. When the ARM allocates contiguous memory using the Memory_contigAlloc() or BufTab_create() DMAI API, a CMEM pool that fits the buffer size requested is reserved for this buffer. The number and total size of CMEM pools is defined in the 'loadmodules.sh' script (from /opt/midas/. on the target), which the ARM application runs during initialization. Since both ARM and DSP have access to this CMEM memory space, they only need to exchange buffer pointers.
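Two ideas from the paragraph above, the 64-bit alignment assumption and the choice of a pool that fits the requested size, can be illustrated in plain C. This sketch does not use or reproduce the CMEM API. The Pool type, Pool_select and is_aligned64 are hypothetical, and the smallest-fit rule is an assumption about what "a pool that fits" means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if ptr sits on a 64-bit (8-byte) boundary, else 0. */
int is_aligned64(const void *ptr)
{
    return ((uintptr_t)ptr & 7u) == 0;
}

/* Hypothetical description of one pool of equally sized buffers. */
typedef struct {
    size_t   buf_size;   /* size of each buffer in the pool, in bytes */
    unsigned free_bufs;  /* buffers in the pool that are still unused */
} Pool;

/* Returns the index of the smallest pool that still has a free buffer
 * of at least size bytes, or -1 if no pool can satisfy the request. */
int Pool_select(const Pool *pools, size_t n_pools, size_t size)
{
    int best = -1;

    assert(pools != NULL || n_pools == 0);
    for (size_t i = 0; i < n_pools; i++) {
        if (pools[i].free_bufs == 0 || pools[i].buf_size < size) {
            continue;  /* pool is exhausted or its buffers are too small */
        }
        if (best < 0 || pools[i].buf_size < pools[(size_t)best].buf_size) {
            best = (int)i;
        }
    }
    return best;
}
```

A request that no pool can hold fails outright, which is why the pool sizes configured at initialization have to cover the largest frame buffer the application will allocate.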

Once the SCU algorithm instance on the DSP is configured with the parameters it requires, it is ready to begin scan conversion processing. The ARM uses the UNIVERSAL_process() API to call the DSP-side SCU_TI_process() function; based on the scan conversion mode in the configuration, the DSP runs the corresponding processing function. Again, all input and output buffer pointers that the ARM and DSP exchange point to buffers allocated in CMEM.

status = UNIVERSAL_process(hUsProcess->hAlg, &inpBufDesc, &outBufDesc, NULL, &inArgs, &outArgs);

Finally, when the application exits, the UNIVERSAL_delete() API deletes the SCU algorithm instance hAlg. This deallocates all the dynamic memory that was associated with the hAlg instance. The algorithm instance deletion is accompanied with an Engine_close() call which deletes the Codec Engine instance created for ARM-DSP interaction as:

UNIVERSAL_delete(hUsProcess->hAlg);
Engine_close(hUsProcess->hEng);

To summarize, the CE and IUNIVERSAL APIs allow application developers to seamlessly plug XDAIS-compliant algorithm modules into their ARM application. As illustrated in the previous section, IUNIVERSAL and CE are used in the same way on the C6678 to ease multicore programming.

Get Ultrasound v4.0

Both the prebuilt executables and the source code are available for MIDAS Ultrasound v4.0.

If you would like to just run the demo as-is for starters, you can use the prebuilt SD card image and executables. The instructions for this are outlined in the section "Using Prebuilt Executables" below.

If you would like to download the source code to study and experiment with, and set up your development environment to build the demo from scratch, the source code package and build instructions are available, as described in the section "Build from Source".

Requirements

Common

  1. Mistral OMAP3530 EVM Rev G w/ power adapter
    (http://focus.ti.com/docs/toolsw/folders/print/tmdsevm3530.html)
  2. C6678 EVM w/ power adapter
    (http://www.ti.com/tool/tmdxevm6678)
  3. Gigabit Network Switch
  4. 2 x Ethernet Cables
  5. USB cable for on-board emulator OR XDS560 Emulator w/ power adapter and cable
  6. HDMI to DVI cable
  7. Lilliput 7-inch LCD screen w/ power adapter
  8. Windows PC w/ CCSv5 installed
  9. RS-232 cable

Additionally, if using prebuilt executables and SD card,

  1. SD card (2GB or larger)
  2. SD card reader
  3. RS-232 cable

Additionally, if building from source,

  1. Linux PC with Ubuntu 10.04 LTS

Using Prebuilt Executables

Follow the guidelines in this section if you would like to run the demo quickly and use it as-is for starters. At the end of this section, you'll have a bootable SD card for the OMAP3530 and the prebuilt executables for the C6678.

If you would like to build from source, skip to the next section.

The SD card that you write in this process includes the bootable filesystem and executables for the OMAP3530 side, and also contains the prebuilt executables and CCS target configurations to connect to the C6678 EVM and load the executables on the three cores that will be used.

You can use either a Windows PC or a Linux PC to write the SD Card. Both options are described here. 

Option 1: Using a Windows PC

If you are using a Windows PC to write the SD card follow these instructions:

1. Download and install Cygwin from here

2. Create a new folder with the name 'SDCardImages' at the path C:/cygwin

3. To obtain the SD card image, click here and download the file 'midas_usound_demo4_sdcard.tar.gz' from the MIDASUltrasound4.0 release package into the 'SDCardImages' folder

4. Double click the Cygwin icon on your Desktop to start Cygwin and type the following commands:

host $ cd /cygdrive/c/cygwin/SDCardImages
host $ tar xzf midas_usound_demo4_sdcard.tar.gz

5. WARNING! It is important you perform these steps very carefully. Not doing so can lead to complete data loss!
Plug in your SD card reader with the SD card (min. size of 2GB)
Click on Start --> Control Panel --> Administrative Tools --> 'Computer Management.' On the left hand side, click on the 'Disk Management' tab.
The SD card should show up as “Disk1” or “Disk2”. If it shows up as Disk 1 replace <partition> in the command below with “sdb”; if it shows up as Disk 2 use “sdc” and so on. Double check that the <partition> does correspond to the SD card, else you may risk writing over your hard disk and lose your data.

host $ dd bs=4096 if=midas_usound_demo4_sdcard.img of=/dev/<partition>

This process can take quite some time to complete and while the image is copying, the terminal does not show any progress or update information.

Once the SD card is written successfully, you will see a message on the terminal saying that ~2GB has been copied.

Skip the next section, and proceed directly to 'Setup and Run the Demo'

Option 2: Using a Linux PC

If you are using a Linux PC to write the SD card follow these instructions:

1. Create a directory named 'SDCardImages' on the Linux host where you would like to download the SD card image:

host$ mkdir SDCardImages

2. To obtain the SD card image, click here to download the file 'midas_usound_demo4_sdcard.tar.gz.' Make sure you place this into the 'SDCardImages' folder

3. Uncompress this file as below. Once this uncompresses, you will see a file 'midas_usound_demo4_sdcard.img' in this directory.

host$ cd SDCardImages
host$ tar -xzf midas_usound_demo4_sdcard.tar.gz

4. WARNING!! It is important you perform this step very carefully. Not doing so can lead to complete data loss!
Plug in your SD card reader with the SD card (min. size of 2GB) and write the image 'midas_usound_demo4_sdcard.img' to the card. Replace <partition> in the command below with sdb, sdc, sdd, etc. depending on where your SD card is mounted. Double check that the <partition> does correspond to the SD card, else you may risk writing over your hard disk and lose your data.

host $ sudo dd bs=4096 if=midas_usound_demo4_sdcard.img of=/dev/<partition>

This process can take quite some time to complete, and while the image is copying the terminal does not show any progress or update information. Once the SD card is written successfully, you will see a message on the console saying that ~2GB has been copied.

Skip the next section, and proceed directly to 'Setup and Run the Demo'

Build from Source

Follow the guidelines in this section if you would like to download the source code and set up a development environment to build the demo from source.

Multicore DSP C6678 (Mid End)

All development for the C6678 is done on the Windows PC.

  1. TI Code Composer Studio IDE (CCS)
    This version of the demo has been built and verified with CCS version 5.0.3. Later versions of CCS should work, but previous versions of CCS do not have C66x support and will therefore not be compatible with this demo release. You can follow the link to download CCS v5.0.3 from here and install under C:\ti\ccsv5.
    If you install in this default location, your directory structure should be as below. Note the snapshot is of directory 'C:\ti\ccsv5\ccsv5'.

    CCS5 Installation Snapshot.png

  2. PERL and 7-Zip
    Perl and 7-Zip are required to run the automated build script that will be discussed later.
    a. Install ActivePerl from http://www.perl.org/get.html. It installs to C:\Perl by default.
    b. Once installation is complete, please ensure that C:\Perl\site\bin and C:\Perl\bin are in your Path. To check this, right click on My Computer --> Properties --> Advanced tab --> Environment Variables. The system variable Path should include these paths.
    c. Install 7-Zip from http://www.7-zip.org/. It installs to C:\Program Files\7-Zip by default. Note that if you install 7-Zip in a different location, you must update the 'build6678.pl' Perl script introduced later.

  3. Environment Variables
    The following environment variables assume that the install packages will/have been installed in the recommended locations
    a. Windows XP: Right-click on My Computer --> Properties --> Advanced tab --> Environment Variables --> New
    b. Define new user variable, where variable name is TI_INSTALL_DIR and variable value is C:\ti
    c. Define another new user variable, where variable name is IQMATH_LIB_DIR and variable value is C:\ti\c64xplus-iqmath_2_01_04_00
    d. Define new user variable, where variable name is CCS5_INSTALL_DIR and variable value is C:\ti\ccsv5. CCS5_INSTALL_DIR represents the location where CCS5 is installed. To verify what CCS5_INSTALL_DIR is, double check to see that CCS5_INSTALL_DIR\ccsv5\eclipse exists.
    e. Add the following paths, if they are not already present, to 'Path' under Environment Variables --> System Variables:
        C:\ti\ccsv5\ccsv5\utils\gmake
        C:\Program Files\7-Zip
        C:\Perl\site\bin 
        C:\Perl\bin

  4. TI Software Components Setup
    Please note that the demo has been built and verified with the version numbers listed below. Please ensure that you download the correct versions.
    It is assumed in the following steps that C:\ti is TI_INSTALL_DIR

    a. MCSDK 2.0.0.11
    http://software-dl.ti.com/sdoemb/sdoemb_public_sw/bios_mcsdk/02_00_00_11/index_FDS.html
    Install as C:\ti\mcsdk_2_00_00_11
    When asked to choose components, leave default setting i.e. all components checked

    b. Code Generation Tools v7.2.1
    Download from https://www-a.ti.com/downloads/sds_support/TICodegenerationTools/download.htm
    Once you have installed Code Generation tools, verify that it is recognized by CCS. To do this, go to Window -> Preferences -> CCS -> Code Generation Tools.
    If you don't see it listed, click on the 'Add' button and point to the path where you have installed the tools

    US4 CodeGenerationToolsSnapshot.PNG

    c. XDC Tools 3.22.00.09
    http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/rtsc/3_22_00_09/index_FDS.html
    Install as C:\ti\xdctools_3_22_00_09

    d. Codec Engine 3.21.00.19
    http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/ce/3_21_00_19/index_FDS.html
    Extract as C:\ti\codec_engine_3_21_00_19

    e. SYS/BIOS 6.31.05.31
    http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/sysbios/6_31_05_31/index_FDS.html
    Install as C:\ti\bios_6_31_05_31

    f. Framework Components 3.21.02.32
    http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/fc/3_21_02_32/index_FDS.html
    Extract as C:\ti\framework_components_3_21_02_32

    g. IQ Math Library 2.01.04.00
    http://focus.ti.com/docs/toolsw/folders/print/sprc542.html
    Install as C:\ti\c64xplus-iqmath_2_01_04_00. Note that this corresponds to IQMATH_LIB_DIR defined above.

    h. NDK 2.20.03.24
    http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/ndk/ndk_2_20_03_24/index_FDS.html
    Install as C:\ti\ndk_2_20_03_24

    i. IPC 1.22.05.27
    This should install as part of MCSDK. However, in case it doesn't exist for some reason, you can download it from the link below:
    http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/ipc/1_22_05_27/index_FDS.html
    Install as C:\ti\ipc_1_22_05_27 

  5. Download the source code package. Click here to download the package from the MIDAS GForge project hosting portal

  6. Unzip midas_usound_demo4_rel.zip in TI_INSTALL_DIR. If using recommended paths, this would be unzipped as C:\ti\midas_usound_demo4_rel

    At the end of these steps, your TI_INSTALL_DIR (C:\ti if using recommended paths) should look similar to the snapshot below:

    US4 TI INSTALL DIR Snapshot.png

  7. Start CCSv5.
    Choose C:\ti\midas_usound_demo4_rel\ccsv5_workspace as your workspace
    Go to Window->Preferences->CCS->RTSC->Products. Add C:/ti (or equivalent TI_INSTALL_DIR) under "Tool Discovery Path"
    This should populate "Discovered Tools." with the installed components.
    Close CCSv5.

    US4 CCS5 Window Preferences RTSC Products Snapshot.png

  8. We will now build the BPU and DPU packages. Open a command window (Start -> Run -> 'cmd') and type the following commands (w/o quotation marks)
    'cd C:\ti\midas_usound_demo4_rel'
    'makescript.bat'

  9. In this step we will run the autobuild script to build the executables for the three cores on 6678. This script imports the three CCS projects corresponding to the cores, and builds the executable corresponding to the 'Release' profile. 
    Type the following command on the prompt:
    'perl build6678.pl'

  10. Start CCSv5. Choose the same workspace as before, i.e. C:\ti\midas_usound_demo4_rel\ccsv5_workspace.
    At this point you should see all three projects in the 'C/C++ Projects' window in CCSv5, and all three executables should be available as:
    Core 0: 'C:\ti\midas_usound_demo4_rel\miDAS\ultrasound\demo3\midend\midendapp\ccsProj_C6678\Release\midendapp_C6678.out'
    Core 1: 'C:\ti\midas_usound_demo4_rel\miDAS\ultrasound\demo3\midend\servers\server1\ccsProj_C6678\Release\server1_C6678.out'
    Core 2: 'C:\ti\midas_usound_demo4_rel\miDAS\ultrasound\demo3\midend\servers\server2\ccsProj_C6678\Release\server2_C6678.out'
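
Before moving on, it can help to confirm that the environment variables, Path entries, component directories and built executables from steps 3 to 10 are all in place. The following is a minimal sketch of such a check in Python; it is not part of the demo release. The function name check_setup and its message wording are assumptions made for illustration, and the directory names are the recommended install locations listed above.

```python
import ntpath

# Names and locations below are the recommended ones from the steps above.
REQUIRED_VARS = ["TI_INSTALL_DIR", "IQMATH_LIB_DIR", "CCS5_INSTALL_DIR"]
REQUIRED_PATH_ENTRIES = [
    r"C:\ti\ccsv5\ccsv5\utils\gmake",
    r"C:\Program Files\7-Zip",
    r"C:\Perl\site\bin",
    r"C:\Perl\bin",
]
COMPONENT_DIRS = [
    "mcsdk_2_00_00_11",
    "xdctools_3_22_00_09",
    "codec_engine_3_21_00_19",
    "bios_6_31_05_31",
    "framework_components_3_21_02_32",
    "c64xplus-iqmath_2_01_04_00",
    "ndk_2_20_03_24",
    "ipc_1_22_05_27",
    "midas_usound_demo4_rel",
]
DEMO_ROOT = r"midas_usound_demo4_rel\miDAS\ultrasound\demo3\midend"
EXECUTABLES = [
    r"midendapp\ccsProj_C6678\Release\midendapp_C6678.out",
    r"servers\server1\ccsProj_C6678\Release\server1_C6678.out",
    r"servers\server2\ccsProj_C6678\Release\server2_C6678.out",
]


def check_setup(env, exists):
    """Return a list of problems found; an empty list means every check passed.

    env is a mapping of environment variables and exists is a callable such
    as os.path.exists, passed in so the checks can be exercised off-target.
    """
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append("environment variable not set: " + name)

    # Windows paths are case-insensitive; ignore case and trailing backslashes.
    on_path = {p.strip().rstrip("\\").lower() for p in env.get("Path", "").split(";")}
    for entry in REQUIRED_PATH_ENTRIES:
        if entry.lower() not in on_path:
            problems.append("Path is missing: " + entry)

    # Step 3d: CCS5_INSTALL_DIR\ccsv5\eclipse should exist.
    ccs = env.get("CCS5_INSTALL_DIR")
    if ccs and not exists(ntpath.join(ccs, "ccsv5", "eclipse")):
        problems.append("CCS5_INSTALL_DIR does not contain ccsv5\\eclipse")

    # Component directories and the three Release executables under TI_INSTALL_DIR.
    root = env.get("TI_INSTALL_DIR") or r"C:\ti"
    for name in COMPONENT_DIRS + [ntpath.join(DEMO_ROOT, exe) for exe in EXECUTABLES]:
        full = ntpath.join(root, name)
        if not exists(full):
            problems.append("not found: " + full)
    return problems
```

On the Windows PC this could be called as check_setup(os.environ, os.path.exists) after importing os, printing each returned line. An empty list suggests the recommended layout is in place; it does not verify component versions.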


System-on-Chip OMAP3530 (Back End)

All development for the System-on-Chip (OMAP3530) is done on the Linux PC (Ubuntu 10.04).

  1. DVSDK Setup
    A. -- Download --

    File is 'dvsdk_omap3530-evm_4_01_00_09_setuplinux' from http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/dvsdk/DVSDK_4_00/4_01_00_09/index_FDS.html
    B. -- Install --
    'linuxHost$ chmod +x dvsdk_omap3530-evm_4_01_00_09_setuplinux'
    'linuxHost$ ./dvsdk_omap3530-evm_4_01_00_09_setuplinux'
    Follow instructions as directed to setup DVSDK with Network Filesystem. If using defaults, this should install as ~/ti-dvsdk_omap3530-evm_4_01_00_09.
    C. -- Rebuild DVSDK and Setup Environment --

    'linuxHost$ cd ~/ti-dvsdk_omap3530-evm_4_01_00_09'
    'linuxHost$ make clean && make'
    'linuxHost$ ./setup.sh'
    After this step, among other development environment necessities, you should have NFS setup at ~/workdir/filesys, and minicom setup on your linux host with access to the OMAP terminal.

  2. IQMath Setup
    Download IQMath from http://focus.ti.com/docs/toolsw/folders/print/sprc542.html and install

  3. Download MIDAS package and unzip
    Go to https://gforge.ti.com/gf/project/med_ultrasound/frs/, and download MIDASUltrasound4.0 source code package (midas_usound_demo4_rel.zip)
    'linuxHost$ cd ~'
    'linuxHost$ unzip midas_usound_demo4_rel.zip'

  4. Setup Environment
    Edit Paths as per your environment in file ~/midas_usound_demo4_rel/miDAS/ultrasound/demo3/backend/Paths.mak

  5. Build Executable for Backend
    'linuxHost$ cd ~/midas_usound_demo4_rel/miDAS/ultrasound/demo3/backend'
    'linuxHost$ make setup'
    'linuxHost$ make backend'

    This will build both the 'server.x64P' and the OMAP application 'ultrasound' and will copy them to the ${EXEC_INSTALL_DIR} specified in Paths.mak.

Setup and Run the Demo

Connect EVMs via Switch

  1. Connect the power adapter to the network switch
  2. Connect one Ethernet cable from the switch to the Ethernet port on the 6678 EVM
  3. Connect one Ethernet cable from the switch to the Ethernet port on the OMAP3530 EVM

6678 Setup

Ensure that the 6678 EVM switch settings are set for 'No Boot' and 'Static IP'.

Switch   Positions   Settings
SW3      1-4         off, on, on, on
SW4      1-4         on, on, on, on
SW5      1-4         on, on, on, on
SW6      1-4         on, on, on, on
SW9      1-2         off, off

  1. Connect power adapter to the 6678 EVM
  2. Connect emulator to 6678 EVM.
  3. If using the SD card, insert it into a card reader. The first partition (74MB) of the SD card contains the folder '6678', which has two subfolders: 'PrebuiltExecutables' and 'CCSTargetConfigurations'.
  4. If building from source, download the 'CCSTargetConfigurations6678' and 'PrebuiltExecutables6678' zip files from the MIDASUltrasound4.0 release package.
  5. The 'PrebuiltExecutables' subfolder contains the .out file to load on core x (x = 0, 1 or 2). Copy this folder to the Windows PC that has CCSv5 installed.
  6. Copy the .ccxml files in the 'CCSTargetConfigurations' subfolder to 'C:\Documents and Settings\<<username>>\user\CCSTargetConfigurations'.
    The 'CCSTargetConfigurations' folder contains the .ccxml target configuration files for three different emulator options, which allow you to connect to the 6678 EVM and load the cores with the corresponding executables.
  7. Start CCSv5, select a workspace directory as desired. Click on View -> Target Configurations. You should see the target configurations as shown in the snapshot:

    US4 CCSTargetConfigurationsSnapshot.PNG

  8. Use the target configuration that corresponds to your emulator. If you do not have one of the XDS560 external emulators, use 'xds100_onboard_6678.ccxml'. Right-click the desired target configuration and click 'Launch Selected Configuration'.
  9. Select the first three cores (C66x_0, C66x_1, C66x_2), right-click and choose 'Connect Target':

    US4 ConnectTargetSnapshot.PNG

  10. Right click on each core one at a time, and load the corresponding .out files on respective cores. 
    Use the .out files from the 'PrebuiltExecutables' subfolder you copied earlier

    US4 LoadProgramSnapshot.PNG

  11. Run cores 0, 1, and 2 of the 6678. The Console window should display the following messages, after which the 6678 waits for a message from the OMAP EVM:
[C66xx_0] HEAPID = 3
[C66xx_0] IPADRESS is = 192.168.3.202
[C66xx_0] Mid End process started
[C66xx_0] MidEnd_gatherTask started
[C66xx_0] Successfully created BPU instance
[C66xx_0] Successfully created DPU instance
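
If you save the Console output to a text file, a short script can confirm that core 0 reached the waiting state and report the IP address it printed. The following is a minimal sketch in Python, not part of the demo release; the function name parse_startup is an assumption, and the expected strings are copied from the messages above.

```python
import re

# Messages printed by core 0 during startup, as listed in step 11.
EXPECTED_STARTUP = [
    "Mid End process started",
    "MidEnd_gatherTask started",
    "Successfully created BPU instance",
    "Successfully created DPU instance",
]


def parse_startup(console_text):
    """Return (ip_address_or_None, list_of_missing_startup_messages)."""
    # The log spells the label 'IPADRESS'; match it exactly as printed.
    ip_match = re.search(r"IPADRESS is = (\d+\.\d+\.\d+\.\d+)", console_text)
    missing = [msg for msg in EXPECTED_STARTUP if msg not in console_text]
    return (ip_match.group(1) if ip_match else None), missing
```

An empty list of missing messages, together with an IP address that the OMAP EVM can reach through the switch, suggests the 6678 side is ready for the OMAP3530 setup that follows.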

OMAP3530 Setup

1. Connect the DVI-HDMI cable between the OMAP EVM and the 7-inch external LCD screen. Connect the power adapter for the 7-inch screen.

2. There are two power ports on the OMAP3530 EVM. Connect the power adapter to the power port on the shorter side of the EVM.

3. Connect the RS-232 cable between the EVM's UART1/2 port and your computer's serial port.

4. Open a terminal program such as HyperTerminal or Tera Term on Windows, or minicom on Linux. If you are building from source, you should already have minicom set up. The terminal connection should use the following settings: Bits per second: 115200, Data bits: 8, Parity: None, Stop bits: 1, Flow control: None

5. If using SD card, insert the SD card in the OMAP EVM’s SD card slot.
Ensure that the switch settings for the EVM are set for SD card boot. 

SD card boot switch settings for EVMs with Micron NAND:

Switch   8    7    6    5    4    3    2    1
State    OFF  OFF  OFF  OFF  OFF  ON   ON   ON

SD card boot switch settings for EVMs with Samsung NAND:

Switch   8    7    6    5    4    3    2    1
State    OFF  OFF  OFF  ON   ON   OFF  OFF  ON


6. The first time you turn on the OMAP3530 EVM for this demo, you need to modify the default boot arguments, because the demo needs 720p output via DVI instead of the default 480p LCD output. Turn the power switch on the EVM to the 'ON' position and keep the terminal window (HyperTerminal or Tera Term) open. You will start to see messages on the console. Once U-Boot starts, you will see a countdown during which you can abort the autoboot sequence. Once you do this, you should see the U-Boot prompt.

Enter the following commands at the U-Boot prompt. Note that copying and pasting directly from the wiki can cause issues. Also note that the bootargs given here are for an OMAP3530 EVM with 256MB RAM. If you have a board with 128MB RAM, remove the second 'mem=' argument ('mem=100M@0x89300000') from the bootargs below.
To verify that the bootargs are set as desired, print them with 'printenv' (abbreviated 'pri') after setting them:

If using SD card boot:

OMAPEVM# setenv bootargs console=ttyS0,115200n8 rw mem=99M@0x80000000 mpurate=720 mem=100M@0x89300000 omap_vout.vid1_static_vrfb_alloc=y omapfb.vram=0:3M omapfb.mode=dvi:1280x720@60 root=/dev/mmcblk0p2 rootfstype=ext3 rootwait ip=off

OMAPEVM# setenv bootcmd 'mmc init;fatload mmc 0 0x82000000 uImage;bootm 0x82000000'

OMAPEVM# saveenv

OMAPEVM# boot

If building from source (NFS boot):
(Note: In the commands below, replace <ipaddress of linux host> and <username> with appropriate values)

OMAPEVM# setenv bootargs console=ttyS0,115200n8 rw mem=99M@0x80000000 mpurate=720 mem=100M@0x89300000 omap_vout.vid1_static_vrfb_alloc=y omapfb.vram=0:3M root=/dev/nfs nfsroot=<ipaddress of linux host>:/home/<username>/filesys ip=dhcp omapfb.mode=dvi:1280x720@60

OMAPEVM# setenv bootcmd 'dhcp;setenv serverip <ipaddress of linux host>;tftpboot;bootm'

OMAPEVM# saveenv

OMAPEVM# boot
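
The SD card and NFS boot arguments share most of their fields and differ only in the root filesystem and IP settings. Because pasting long lines into a serial terminal is error-prone, it can help to generate the strings and compare them with what 'printenv' reports. The following is a minimal sketch in Python, not part of the demo release. The function names and the example host address 192.168.3.100 are assumptions for illustration, and the second memory region is written as 100M in both variants, as in the SD card command.

```python
# Arguments common to both boot methods, in the order used above.
COMMON = [
    "console=ttyS0,115200n8",
    "rw",
    "mem=99M@0x80000000",
    "mpurate=720",
    "mem=100M@0x89300000",
    "omap_vout.vid1_static_vrfb_alloc=y",
    "omapfb.vram=0:3M",
]
DVI_MODE = "omapfb.mode=dvi:1280x720@60"  # 720p output via DVI


def sd_bootargs():
    """Boot arguments for booting from the second SD card partition."""
    tail = [DVI_MODE, "root=/dev/mmcblk0p2", "rootfstype=ext3", "rootwait", "ip=off"]
    return " ".join(COMMON + tail)


def nfs_bootargs(host_ip, username):
    """Boot arguments for mounting the root filesystem from the Linux host over NFS."""
    nfsroot = "nfsroot={}:/home/{}/filesys".format(host_ip, username)
    return " ".join(COMMON + ["root=/dev/nfs", nfsroot, "ip=dhcp", DVI_MODE])


def nfs_bootcmd(host_ip):
    """Boot command that fetches the kernel from the Linux host via TFTP."""
    return "dhcp;setenv serverip {};tftpboot;bootm".format(host_ip)
```

For example, print("setenv bootargs " + nfs_bootargs("192.168.3.100", "user")) produces a single line that can be typed or compared at the U-Boot prompt. Generating the string in one place also keeps the two variants from drifting apart.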

7. If using SD card, the program (/opt/midas/ultrasound) will autostart once boot is complete. 
   If using NFS boot, you will need to manually launch the program as:

target$ cd /opt/midas
target$ ./ultrasound -qws

8. The following messages should appear on the CCSv5 console for the 6678. They indicate that the OMAP has established a connection with the 6678 and has started sending data and commands. Note that loading the input data via TFTP can take a few minutes. Once you see these messages, the output ultrasound images should appear on the external display.

[C66xx_0] got 4 cseq=0
[C66xx_0] got 1 cseq=1
[C66xx_0] handling setup request DHM
[C66xx_0] APP STUB: got a setup DHM
[C66xx_0] got 6 cseq=2
[C66xx_0] FrontEndIf_loadParams(), starting to read 532 bytes from bpuConfig.bin
[C66xx_0] FrontEndIf_loadParams(): Read 532 bytes from bpuConfig.bin
[C66xx_0] FrontEndIf_loadParams(), starting to read 2078 bytes from dpuConfig.bin
[C66xx_0] FrontEndIf_loadParams(): Read 2078 bytes from dpuConfig.bin
[C66xx_0] FrontEndIf_loadInputData(), starting to read 36175872 bytes from userdata/InputData/TI_Carotid_tissdata_512x256x69.bin
[C66xx_0] FrontEndIf_loadInputData(): Read 36175872 bytes from userdata/InputData/TI_Carotid_tissdata_512x256x69.bin
[C66xx_0] FrontEndIf_loadInputData(), starting to read 33914880 bytes from userdata/InputData/TI_Carotid_velturbdata_48x256x10x69.bin
[C66xx_0] FrontEndIf_loadInputData(): Read 33914880 bytes from userdata/InputData/TI_Carotid_velturbdata_48x256x10x69.bin
[C66xx_0] Completed rdspcb_set
[C66xx_0] got 2 cseq=3
[C66xx_0] APP RDSPIF: got a play 0
[C66xx_0] MidEnd_controller: received cmd = 2 PLAY
[C66xx_0] Starting FrontEndIf
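
The byte counts in these messages can be cross-checked against the input file names. Each count equals the product of the dimensions in the file name multiplied by four (512 x 256 x 69 x 4 = 36175872 and 48 x 256 x 10 x 69 x 4 = 33914880), which is consistent with four bytes per element. That element size is inferred from the numbers, not stated in the release. The following is a minimal sketch in Python of this check; the function names are assumptions for illustration.

```python
import re

# Assumption inferred from the log above, not documented by the release.
BYTES_PER_ELEMENT = 4

# Matches the completion lines, e.g. "...loadInputData(): Read 123 bytes from a/b.bin"
READ_LINE = re.compile(r"FrontEndIf_loadInputData\(\): Read (\d+) bytes from (\S+)")


def expected_bytes(filename, bytes_per_element=BYTES_PER_ELEMENT):
    """Multiply the dimensions embedded in a name such as '..._512x256x69.bin'."""
    match = re.search(r"_(\d+(?:x\d+)+)\.bin$", filename)
    if match is None:
        raise ValueError("no dimensions found in " + filename)
    count = 1
    for dim in match.group(1).split("x"):
        count *= int(dim)
    return count * bytes_per_element


def check_input_reads(console_text):
    """Return (filename, bytes_read, bytes_expected) for each completed read."""
    return [
        (name, int(nbytes), expected_bytes(name))
        for nbytes, name in READ_LINE.findall(console_text)
    ]
```

If bytes_read and bytes_expected differ for a file, the TFTP transfer may have been truncated, or the file may use a different element size than assumed here.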

Known Issues

  • Minor: The directory structure of the source code package 'midas_usound_demo4_rel.zip' uses the directory name 'demo3' instead of 'demo4', i.e. 'midas_usound_demo4_rel/miDAS/ultrasound/demo3' instead of 'midas_usound_demo4_rel/miDAS/ultrasound/demo4'.


Previous Versions of MIDAS

See http://processors.wiki.ti.com/index.php/Medical_Imaging_Demo_Application_Starter_(MIDAS) for a comparison between various versions of MIDAS


Useful References

For more on DSP and ARM application development, application notes, white papers and information on TI's latest offerings, including the C66x keystone devices, the following are some useful references:

Multicore DSPs

Medical Imaging

ARM and DSP Development


Support and Questions

As noted in the disclaimer, software is licensed under BSD and is provided "as is". Please post questions to http://e2e.ti.com.


Disclaimer

System and equipment manufacturers and designers are responsible to ensure that their systems (and any TI devices incorporated in their systems) meet all applicable safety, regulatory and system-level performance requirements. All application-related information on this website (including application descriptions, suggested TI devices and other materials) is provided for reference only. This information is subject to customer confirmation, and TI disclaims all liability for system designs and for any applications assistance provided by TI. Use of TI devices in life support and/or safety applications is entirely at the buyer's risk, and the buyer agrees to defend, indemnify and hold harmless TI from any and all damages, claims, suits or expense resulting from such use.

All software is licensed under BSD with the following terms and conditions:

Copyright (C) 2011 Texas Instruments Incorporated - http://www.ti.com/
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of Texas Instruments Incorporated nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.