
DSP/BIOS on Multi-Core multipleimage

From Texas Instruments Wiki

Multiple Applications Independent on Each Core

The multipleimage example application shows how different applications can run on multiple cores completely independently. This section contains information on how to allocate memory, load, and run such applications. The intention is to illustrate how to avoid memory collisions when running independent applications on each core.

The shared memory in the C6472 is partitioned such that each partition is "owned" by one and only one core. Figure 1 shows how six different applications will be allocated in memory to run on the six cores of the C6472, using Local L2 and partitions of the Shared L2 and DDR memory.

Figure 1. Multiple Application Images Independent on Each Core of a C6472 Device

Multipleimage Example Applications

The example for this scenario comprises the six multipleimage_appN [N=0..5] project folders. The same basic application code from the singleimage project is used for this example, except that the number of writer tasks is N+1 for CoreN and multipleimage_appN. Like singleimage, this application uses a QUE (DSP/BIOS queue) to send messages and a SEM (DSP/BIOS semaphore) to synchronize one or more writer() tasks with a single reader() task. The reader task, the writer task(s), and the semaphore DSP/BIOS objects are statically created in the DSP/BIOS configuration file in each project folder. The memory locations for the code and data are also set in the DSP/BIOS configuration file in each project folder to implement the arrangement shown in Figure 1.

Each of the identical writer tasks will load and send a message through the DSP/BIOS QUE mechanism, then will post a SEM to tell the reader task that a message is ready. The reader task will respond to the SEM by pulling a message off the QUE object and reporting the results using a LOG_printf message.
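As a rough illustration of this handshake, the following plain-JavaScript model counts what the reader should receive on CoreN. It is only a sketch: it does not call the DSP/BIOS QUE or SEM APIs, it does not model task scheduling (all writers run before the reader here), and runCore and its variables are hypothetical names. The figure of three messages per writer is the count given for the mi_appN.c files.

```javascript
// Plain-JavaScript model of the writer/reader handshake (not DSP/BIOS code).
// An array stands in for the QUE object and a counter for the SEM object.
function runCore(coreNum, msgsPerWriter) {
  const numWriters = coreNum + 1; // CoreN runs N+1 writer tasks
  const queue = [];               // stands in for the QUE object
  let semCount = 0;               // stands in for the semaphore count
  const received = [];

  // Each writer loads a message, puts it on the queue, then posts the semaphore.
  for (let w = 0; w < numWriters; w++) {
    for (let m = 0; m < msgsPerWriter; m++) {
      queue.push({ writer: w, seq: m });
      semCount++;
    }
  }

  // The reader takes exactly one message off the queue per semaphore post.
  while (semCount > 0) {
    semCount--;
    received.push(queue.shift());
  }
  return received;
}

// Number of messages the reader handles on each core, at 3 messages per writer.
for (let n = 0; n < 6; n++) {
  console.log("Core" + n + ": " + runCore(n, 3).length + " messages");
}
```

Because the semaphore is posted once per queued message, the reader never tries to pull from an empty queue in this model.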

The application should be built using DSP/BIOS 5.41.07 or greater and CCSv4.2.0.09 or greater. With these releases, you will be able to observe the results of the execution by each core by using the RTA->Printf Logs window. Earlier releases of DSP/BIOS and CCSv4 support the Printf Logs poorly or not at all; with them, you would have to use the ROV tool and click down into the log buffers to observe the results.

Each project folder has a slightly different C source file and a slightly different DSP/BIOS Configuration .tcf file. The mi_appN.c [N=0..5] files differ only in the number of messages allocated: three for each of the writer tasks defined in the .tcf file.

The multipleimage_appN.tcf DSP/BIOS Configuration files define N+1 writer tasks for CoreN, define separate partitions in both SL2RAM and DDR2, and define a DDR2_COMMON memory partition that is identical for all cores. The configuration files also place different parts of the application in different memory partitions. This partitioning is what allows the cores to use a physical shared memory resource (Shared L2 and DDR2) at the same time without interfering with each other. There are some duplicate code segments stored in shared memory for each core in this case, but if the applications' differences were greater, then there would be a greater need to keep these segments separate. Using this method, there is no need to worry about where the various sections of BIOS and application code and data are placed: each core's sections are kept separate from those of the other cores.

Shared Memory Partitioning

Figure 1 shows how Shared L2 (SL2RAM) and DDR2 memory are divided into multiple partitions. Any partition that is allocated to CoreX will not be available to CoreY.

SL2RAM Partitions

The C6472 SL2RAM is 768KB in size. This divides evenly into 6 equal partitions of 128KB each. Some applications may find it useful to have an uneven set of partitions, but for this example and for many applications the even division into partitions will work well. SL2RAM starts at address 0x00200000, so each partition starts at 0x00200000+(0x00020000*N) for CoreN, N=0..5.
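The partition arithmetic above can be sketched in a few lines of plain JavaScript. The base address, total size, and core count come from the text; the function and constant names are hypothetical and are not part of any TI tool.

```javascript
// Sketch of the SL2RAM partition arithmetic (names are illustrative only).
const SL2RAM_BASE = 0x00200000;               // start of Shared L2, from the text
const SL2RAM_SIZE = 768 * 1024;               // 768KB of Shared L2
const NUM_CORES = 6;
const SL2_PART_LEN = SL2RAM_SIZE / NUM_CORES; // 128KB = 0x00020000

// Base address of the partition owned by CoreN.
function sl2PartitionBase(coreNum) {
  return SL2RAM_BASE + SL2_PART_LEN * coreNum;
}

// Format a number as an 8-digit hex literal, as used in the tcf files.
function hex8(value) {
  return "0x" + value.toString(16).padStart(8, "0");
}

for (let n = 0; n < NUM_CORES; n++) {
  console.log("Core" + n + ": base " + hex8(sl2PartitionBase(n)) +
              ", len " + hex8(SL2_PART_LEN));
}
```

For Core1 this gives a base of 0x00220000 and a length of 0x00020000.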

As an example, the following lines were added to multipleimage_app1.tcf to implement the SL2RAM partition for Core1. These changes could also be done through the DSP/BIOS Configuration Tool, but it was easier for me to do that once and then copy the lines to the other 5 tcf files and manually edit the addresses.

bios.MEM.instance("SL2RAM").comment = "128K = 1/6th 768K Shared L2 RAM";
bios.MEM.instance("SL2RAM").base = 0x00220000;
bios.MEM.instance("SL2RAM").len = 0x00020000;
bios.MEM.instance("SL2RAM").createHeap = 1;
bios.MEM.instance("SL2RAM").heapSize = 0x00000800;

DDR2 Partitions

For this implementation, the DDR2 memory is 256MB in size. This is divided into six equal partitions of 32MB each plus one 64MB partition (6 x 32MB + 64MB = 256MB). Some applications may find it useful to have a different sizing for the partitions, but for this example this configuration will work well. DDR2 starts at address 0xE0000000, so each partition starts at 0xE0000000+(0x02000000*N) for CoreN, N=0..5.

As an example, the following lines were added to multipleimage_app1.tcf to implement the unique 32MB DDR2 partition for Core1.

bios.MEM.instance("DDR2").comment = "Core1's 32MBytes of DDR2";
bios.MEM.instance("DDR2").base = 0xe2000000;
bios.MEM.instance("DDR2").len = 0x02000000;
bios.MEM.instance("DDR2").createHeap = 1;
bios.MEM.instance("DDR2").heapSize = 0x00000800;
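Rather than editing the addresses by hand in each of the six tcf files, the per-core lines could be generated. The following is a minimal sketch in plain JavaScript that only builds the text of the lines shown above; partitionLines, MEMORIES, and hex8 are hypothetical helper names, and the per-core comment line is left out because its wording differs from core to core.

```javascript
// Sketch: generate the per-core MEM partition lines as text.
// Base addresses and partition lengths are the ones given in this section.
const MEMORIES = {
  SL2RAM: { base: 0x00200000, len: 0x00020000 }, // 128KB per core
  DDR2:   { base: 0xe0000000, len: 0x02000000 }, // 32MB per core
};

// Format a number as an 8-digit hex literal.
function hex8(value) {
  return "0x" + value.toString(16).padStart(8, "0");
}

// Return the tcf lines that give CoreN its private partition of memName.
function partitionLines(memName, coreNum) {
  const mem = MEMORIES[memName];
  const inst = 'bios.MEM.instance("' + memName + '")';
  return [
    inst + ".base = " + hex8(mem.base + mem.len * coreNum) + ";",
    inst + ".len = " + hex8(mem.len) + ";",
    inst + ".createHeap = 1;",
    inst + ".heapSize = 0x00000800;",
  ];
}

// Print the lines for Core1; they match the snippets shown above.
console.log(partitionLines("SL2RAM", 1).join("\n"));
console.log(partitionLines("DDR2", 1).join("\n"));
```

The output can be pasted into the matching multipleimage_appN.tcf file, which keeps every core on the same formula and avoids copy-and-edit mistakes.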

The following lines were added to multipleimage_appN.tcf to implement the shared 64MB DDR2 common partition for CoreN, N=0..5. This logically shared region, DDR2_COMMON, is defined exactly the same for all of the cores so they may all use this common space. DDR2_COMMON is not used in this application note.

bios.MEM.create("DDR2_COMMON");
bios.MEM.instance("DDR2_COMMON").base = 0xec000000;
bios.MEM.instance("DDR2_COMMON").len = 0x04000000;
bios.MEM.instance("DDR2_COMMON").createHeap = 0;
bios.MEM.instance("DDR2_COMMON").space = "code/data";
bios.MEM.instance("DDR2_COMMON").comment = "64MBytes common DDR2 for all cores";
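A quick way to confirm that the six private partitions and DDR2_COMMON neither overlap nor leave gaps is to check that they sit back to back across the 256MB range. This sketch uses the addresses and sizes given in this section; ddr2Regions and tilesExactly are hypothetical names.

```javascript
// Sketch: verify that the DDR2 partitions tile the 256MB range exactly.
const MB = 1024 * 1024;
const DDR2_START = 0xe0000000;
const DDR2_SIZE = 256 * MB;

// Six 32MB private partitions plus the 64MB common partition.
function ddr2Regions() {
  const regions = [];
  for (let n = 0; n < 6; n++) {
    regions.push({ name: "Core" + n, base: DDR2_START + n * 32 * MB, len: 32 * MB });
  }
  regions.push({ name: "DDR2_COMMON", base: 0xec000000, len: 0x04000000 });
  return regions;
}

// True when the regions, sorted by base, run back to back from
// 'start' to 'start + size' with no overlap and no gap.
function tilesExactly(regions, start, size) {
  const sorted = regions.slice().sort((a, b) => a.base - b.base);
  let next = start;
  for (const r of sorted) {
    if (r.base !== next) return false;
    next = r.base + r.len;
  }
  return next === start + size;
}

console.log(tilesExactly(ddr2Regions(), DDR2_START, DDR2_SIZE));
```

The same check can be reused after changing the partition sizes, since a different sizing is allowed as long as the regions stay disjoint.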

Code and Data Placement

One way to build and run multiple independent applications would be to place all code and data in local L2 (LL2RAM) so that each core is completely isolated from the other cores. That method would work, but it ignores the additional memory space available in the Shared L2 (SL2RAM) on-chip memory and the DDR2 off-chip memory. Larger applications that do not fit into LL2RAM would need a different method that makes use of this additional memory.

Instead, the multipleimage applications use the memory partitions described above. Each multipleimage_appN [N=0..5] application uses different locations for its program and data. This is not a common way to use the partitions, but it provides an example of the possibilities and flexibility of the CCSv4 tools. The multipleimage applications do not share any code or data space with the other cores. This technique allows you to load and run independent applications on all cores while isolating the cores from each other. See Figure 1 for details.

The default DSP/BIOS configuration platform file for the C6472 is set up with all the initialized sections (program code and constant data) in the Shared L2 space and all of the uninitialized and unique (data and interrupt vectors) sections in the Local L2 space. The multipleimage_appN.tcf [N=0..5] configuration files must be modified to change from these defaults to the program code and data placement that has been defined in Figure 1.

As you see in Figure 1, the code and data placement plan is systematic. For Core0, Core1, and Core2, all code and all data are placed in Local L2 (private physical memory), Shared L2 (private partition), and DDR2 (private partition), respectively. For Core3, Core4, and Core5, code and data are split to different private memory areas and the placement is changed from one core to the next.

Code Placement

In a DSP/BIOS-based application, the .text and .bios sections contain almost all the code needed for the application. The multipleimage applications have no inherent reason to prefer LL2RAM, SL2RAM, or DDR2 for code; any of those could be chosen for performance or space availability reasons. Each of the multipleimage applications uses a different choice of where to locate the program and initialized data. This allows a simple benchmarking or comparison of performance when the code is placed in one memory or another.

The following memory sections comprise the initialized code and data sections. These are defined in the tcf file to be placed as specified in Figure 1.

  • BIOS Data: .gblinit
  • BIOS Code: .bios, .sysinit, .hwi, .rtdx_text
  • Compiler Sections: .text, .switch, .cinit, .pinit, .const/.printf, .data

Data Placement

The multipleimage applications likewise have no inherent reason to prefer LL2RAM, SL2RAM, or DDR2 for uninitialized data; any of those could be chosen for performance or space availability reasons. Each of the multipleimage applications uses a different choice of where to locate the uninitialized data. This allows a simple benchmarking or comparison of performance when the data is placed in one memory or another.

The following memory sections comprise the uninitialized data sections (and .hwi_vec). These are defined in the tcf file to be placed as specified in Figure 1.

  • Segment For DSP/BIOS Objects
  • Segment For malloc() / free()
  • BIOS Data: .args, .stack, .trcdata, .sysdata, DSP/BIOS Conf Sections
  • BIOS Code: .hwi_vec
  • Compiler Sections: .bss, .far, .cio

Configuration File Updates for Code and Data Placement

The following lines were added at the end of the multipleimage_app5.tcf file to place all of the code into DDR2 and all of the uninitialized data sections (plus .hwi_vec) into SL2RAM. The first part allocates the memory segments and the second part puts all of the BIOS Objects into SL2RAM instead of the default LL2RAM.

/* allocate memory segments */
bios.MEM.BIOSOBJSEG = prog.get("SL2RAM");
bios.MEM.MALLOCSEG = prog.get("SL2RAM");
bios.MEM.ARGSSEG     = prog.get("SL2RAM");  /* BIOS Uninit .args          def: LL2 */
bios.MEM.STACKSEG    = prog.get("SL2RAM");  /* BIOS Uninit .stack         def: LL2 */
bios.MEM.TRCDATASEG  = prog.get("SL2RAM");  /* BIOS Uninit .trcdata       def: LL2 */
bios.MEM.SYSDATASEG  = prog.get("SL2RAM");  /* BIOS Uninit .sysdata       def: LL2 */
bios.MEM.OBJSEG      = prog.get("SL2RAM");  /* BIOS Uninit .*obj?         def: LL2 */
bios.MEM.HWIVECSEG   = prog.get("SL2RAM");  /* BIOS Init'd .hwi_vec       def: LL2 */
bios.MEM.GBLINITSEG  = prog.get("DDR2");    /* BIOS Init'd .gblinit       def: SL2 */
bios.MEM.BIOSSEG     = prog.get("DDR2");    /* BIOS Init'd .bios          def: SL2 */
bios.MEM.SYSINITSEG  = prog.get("DDR2");    /* BIOS Init'd .sysinit       def: SL2 */
bios.MEM.HWISEG      = prog.get("DDR2");    /* BIOS Init'd .hwi           def: SL2 */
bios.MEM.RTDXTEXTSEG = prog.get("DDR2");    /* BIOS Init'd .rtdx_text     def: SL2 */
bios.MEM.BSSSEG      = prog.get("SL2RAM");  /* Cmpl Uninit .bss           def: LL2 */
bios.MEM.FARSEG      = prog.get("SL2RAM");  /* Cmpl Uninit .far           def: LL2 */
bios.MEM.CIOSEG      = prog.get("SL2RAM");  /* Cmpl Uninit .cio           def: LL2 */
bios.MEM.TEXTSEG     = prog.get("DDR2");    /* Cmpl Init'd .text          def: SL2 */
bios.MEM.SWITCHSEG   = prog.get("DDR2");    /* Cmpl Init'd .switch        def: SL2 */
bios.MEM.CINITSEG    = prog.get("DDR2");    /* Cmpl Init'd .cinit         def: SL2 */
bios.MEM.PINITSEG    = prog.get("DDR2");    /* Cmpl Init'd .pinit         def: SL2 */
bios.MEM.CONSTSEG    = prog.get("DDR2");    /* Cmpl Init'd .const/.printf def: SL2 */
bios.MEM.DATASEG     = prog.get("DDR2");    /* Cmpl Init'd .data          def: LL2 */

/* select memory segments for all the objects, def: LL2 */
bios.BUF.OBJMEMSEG = prog.get("SL2RAM");
bios.SYS.TRACESEG = prog.get("SL2RAM");
bios.LOG.OBJMEMSEG = prog.get("SL2RAM");
bios.LOG.instance("LOG_system").bufSeg = prog.get("SL2RAM");
bios.STS.OBJMEMSEG = prog.get("SL2RAM");
bios.CLK.OBJMEMSEG = prog.get("SL2RAM");
bios.PRD.OBJMEMSEG = prog.get("SL2RAM");
bios.SWI.OBJMEMSEG = prog.get("SL2RAM");
bios.TSK.OBJMEMSEG = prog.get("SL2RAM");
bios.TSK.instance("reader0").stackMemSeg = prog.get("SL2RAM");
bios.TSK.instance("TSK_idle").stackMemSeg = prog.get("SL2RAM");
bios.TSK.instance("writer0").stackMemSeg = prog.get("SL2RAM");
bios.TSK.instance("writer1").stackMemSeg = prog.get("SL2RAM");
bios.TSK.instance("writer2").stackMemSeg = prog.get("SL2RAM");
bios.TSK.instance("writer3").stackMemSeg = prog.get("SL2RAM");
bios.TSK.instance("writer4").stackMemSeg = prog.get("SL2RAM");
bios.TSK.instance("writer5").stackMemSeg = prog.get("SL2RAM");
bios.IDL.OBJMEMSEG = prog.get("SL2RAM");
bios.SEM.OBJMEMSEG = prog.get("SL2RAM");
bios.MBX.OBJMEMSEG = prog.get("SL2RAM");
bios.QUE.OBJMEMSEG = prog.get("SL2RAM");
bios.LCK.OBJMEMSEG = prog.get("SL2RAM");
bios.DIO.OBJMEMSEG = prog.get("SL2RAM");
bios.DHL.OBJMEMSEG = prog.get("SL2RAM");
bios.RTDX.RTDXDATASEG = prog.get("SL2RAM");
bios.HST.OBJMEMSEG = prog.get("SL2RAM");
bios.HST.instance("RTA_fromHost").bufSeg = prog.get("SL2RAM");
bios.HST.instance("RTA_toHost").bufSeg = prog.get("SL2RAM");
bios.PIP.OBJMEMSEG = prog.get("SL2RAM");
bios.SIO.OBJMEMSEG = prog.get("SL2RAM");
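Since the bios.MEM assignments follow one pattern (uninitialized data sections plus .hwi_vec to one segment, code and initialized data to another), they could be generated from a two-value plan per core. The sketch below is plain JavaScript that only builds the text of those lines; memPlacementLines and the two list names are hypothetical, while the property names are the ones used in the listing above.

```javascript
// Sketch: build the bios.MEM placement lines from a (code, data) plan.
// Properties that hold uninitialized data, plus .hwi_vec.
const DATA_PROPS = [
  "BIOSOBJSEG", "MALLOCSEG", "ARGSSEG", "STACKSEG", "TRCDATASEG",
  "SYSDATASEG", "OBJSEG", "HWIVECSEG", "BSSSEG", "FARSEG", "CIOSEG",
];
// Properties that hold code and initialized data.
const CODE_PROPS = [
  "GBLINITSEG", "BIOSSEG", "SYSINITSEG", "HWISEG", "RTDXTEXTSEG",
  "TEXTSEG", "SWITCHSEG", "CINITSEG", "PINITSEG", "CONSTSEG", "DATASEG",
];

// Return one tcf assignment line per MEM property.
function memPlacementLines(codeSeg, dataSeg) {
  const line = (prop, seg) => "bios.MEM." + prop + ' = prog.get("' + seg + '");';
  return DATA_PROPS.map((p) => line(p, dataSeg))
    .concat(CODE_PROPS.map((p) => line(p, codeSeg)));
}

// Core5 plan from the text: code in DDR2, data in SL2RAM.
console.log(memPlacementLines("DDR2", "SL2RAM").join("\n"));
```

Calling memPlacementLines("LL2RAM", "LL2RAM") would give the all-in-Local-L2 plan described for Core0. The per-module OBJMEMSEG, bufSeg, and stackMemSeg lines are not covered by this sketch and would still need to be set as in the second part of the listing.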

Installing the Applications

Download the Randyp_multipleimage.zip file from here to a temporary location on your computer. Use the CCSv4 File->Import command to import the multipleimage_appN projects into your workspace using these steps:

  1. From the CCSv4 main menu, select the File->Import command
  2. Select as source type CCS:Existing CCS/CCE Eclipse Projects
  3. Select an archive file
  4. Browse to the Randyp_multipleimage.zip file or enter the path to it
  5. Select all of the multipleimage_appN projects
  6. Click Finish

Since these examples were built and archived from a workspace using CCSv4.2.4.00033, DSP/BIOS 5.41.07.24, Code Generation Tools 7.0.3, and the on-board XDS100USB emulator, there may be some "housekeeping" needed to make your build and debug experience run smoothly. Please follow these steps for each project to make sure the right tools are selected:

  1. In the C/C++ Projects window, right-click on the project folder name and select Build Properties..., select CCS Build in the left pane then the General tab
  2. Click the drop-down arrow for Code Generation tools and select the latest version of CGT 7.0.x that you have. Earlier versions like 6.0.x may work but have not been tested (please comment here if you test them). If your latest version is not yet in the list, click More then Select tool from file-system and browse to the folder, such as C:\Program Files\Texas Instruments\C6000 Code Generation Tools 6.1.17, then click OK and OK.
  3. Click the drop-down arrow for DSP/BIOS version and select the latest version of BIOS 5.xx that you have. Earlier versions like 5.33 may work but have not been tested, and they will not display the RTA Printf Logs correctly. BIOS 6.x will not be compatible without some time to do the migration. If your latest version is not in the list, click More then Select tool from file-system and browse to the folder, such as C:\Program Files\Texas Instruments\bios_5_41_07_24, then click OK and OK.
  4. Click OK to save these changes and select Apply changes to existing build configuration in the Save Build Configuration Settings window if it comes up.
  5. Repeat the steps above for each of the multipleimage_appN projects.
  6. Repeat the steps above for the Release configuration when you want to build your final application with optimization. Use the Debug configuration for evaluation and functional debug.

Building the Applications

For convenience, dependencies may be used so that all of the projects are built and kept up-to-date together with the click of one button. To make this work, follow the steps below to get a successful build of the projects.

  1. Select the multipleimage_app0 project, right-click and Set as Active Project.
  2. On the multipleimage_app0 project, right-click and Build Properties..., select CCS Build in the left pane then the Dependencies tab. Make sure there are no dependencies, remove any that may be there, then click OK. This is to avoid the compiler running through all the projects if there are any setup problems.
  3. On the main icon row click the Build Active Project icon (see the picture below).
  4. After a successful build, on the multipleimage_app0 project, right-click and Build Properties..., select CCS Build in the left pane then the Dependencies tab. Now add all of the other multipleimage_appN projects, as shown in the picture below, then click OK.
  5. From now on, for this set of projects, simply click the Build Active Project icon and all of the projects will be built successfully. You should have no errors or warnings. If you change a line in mi_app3.c, click Build Active Project and you will have multipleimage_app3 compile & link, plus multipleimage_app0 will re-link.

Randyp BIOS Multicore mi build.png

Loading and Running the Applications

For the multipleimage example, each core needs to have a different application image loaded onto it. You can use CCSv4 to load each application one-at-a-time, and you can run each core individually or use the CCSv4 Synchronous Mode to start all cores simultaneously.

After a successful build has been completed for all cores and each time after starting CCSv4 for these projects, follow these steps in the Debug View to load and run the multipleimage applications (use the menu commands View->C/C++ Projects and View->Debug to make both windows visible, if needed):

Randyp BIOS Multicore mi Debug.png

  1. Launch TI Debugger to load the Target Configuration
  2. On the Debug window icon row, click Enable Synchronous Mode
  3. On the main icon row, click Connect Target to connect to all the cores, making sure that all the cores successfully connected and indicate (Suspended)
  4. On the Debug window icon row, click Disable Synchronous Mode
  5. On the Debug icon row click Collapse All (for ease).
  6. Highlight Core0 and on the main icon row, click Load Program then Browse Project and select multipleimage_app0.out. Click OK and OK to load it to Core0. Repeat for all of the cores and their associated multipleimage_appN.out files. (it may be easiest to start at the bottom with Core5 because of the auto-expansion that occurs; and you can double-click on a .out file in the Browse Project window rather than selecting and clicking OK)
  7. On the Debug window icon row, click Enable Synchronous Mode
  8. On the CCS menu bar, go to Tools->RTA->Printf Logs
  9. (This step may not be required with all releases) On the Debug icon row click Collapse All (for ease), then select each core one-at-a-time and make sure in the Printf Logs window that the "Stream RTA data" icon is selected and highlighted.
  10. On the Debug window icon row, click Run, wait 5 seconds and click Halt.
  11. Observe the Printf Logs window for each of the cores. Use Collapse All if it makes it easier to go through the list of cores. When Core1 is selected in the Debug window, the display will be as shown here. Note the Core N number on some lines. In each Printf Log window, click the Auto Fit Columns button to show all of the text.

Printf Logs for Core1

Some points to know:

  • Once you have loaded all the cores, when you later make changes and do a Rebuild, CCSv4 will let you select the option to Reload the program automatically when it detects that a loaded .out file has changed. Checking Remember my decision will save having to answer this every time.
  • After the load or reload, the processor will automatically run to main() and halt. If it is stuck running, click Halt and then Restart (or Reload Program, or CPU Reset and then Reload Program).
  • You can associate a project's .out file with a single CPU core, if you want to. I have not used this method, but it could prove helpful in some cases. In the C/C++ Projects window, right-click on the project folder name and select Debug Properties... and make sure you are on the Debugger tab. For all of the *_appN projects, click on Connect to exact CPU and use the drop-down box to select the matching CoreN.

HINT: With the Synchronous Mode enabled, you can click Reload Program on the main icon row to reload all of the application images to their associated cores. Once they are all loaded, click Run, wait, then click Halt.

WHY THIS: Compare the CPU cycles reported to execute each of the different appN instances. Why are the averages different from one core to the next? Are the different execution times consistently different, when considering the location of the program and data in memory? Why are the cycle counts different on the same core from the first writer to the second and others?

TRY THIS: In the C/C++ Projects window, find and open mi_app5.c and scroll down to the beginning of main(). Edit the text in the LOG_printf there, for example change "app5 started" to "app5a started". On the main icon row, click Build Active Project. The changes for multipleimage_app5 will be compiled, all cores will be checked, and all cores will be loaded. You may need to enable the automatic reloading once, but it will always happen after that.

FIX THIS: You may notice that for app5, the first few LOG_printf lines do not show. Open the app5 tcf file and find the place where the LOG trace object is created. You can increase the buffer size there. Then re-build and run the application again to see if you get all of the data displayed this time.

Return to Using DSP/BIOS on Multi-Core DSP Devices.