SysLink 2.00.00.78 DataSheet TI816x

Introduction

This document provides performance data for the SysLink modules on the TI816X platform.

Terms and Abbreviations

Abbreviation Description
CCS Code Composer Studio
IPC Inter-Processor Communication
GPP General Purpose Processor e.g. ARM
DSP Digital Signal Processor e.g. C64X
EVM Evaluation Module
API Application Programming Interface
SFQ Single Frame Queue
MFQ Multiple Frame Queue

Processor Information

Processor core Speed
ARM (Cortex A8) 986 MHz
DSP (C674x) 800 MHz
Video-M3 (Cortex M3) 250 MHz (timer runs at 2x)
VPSS-M3 (Cortex M3) 250 MHz (timer runs at 2x)

Note: Performance numbers are reported only for the Cortex-A8, DSP and Video-M3 cores. VPSS-M3 numbers are not published because that core is identical to the Video-M3.
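
The tables in this datasheet report cycle counts, and the source does not state which clock those cycles were counted against. As a rough aid, the sketch below converts a cycle count to microseconds for a clock rate supplied by the reader. The function name `cycles_to_us` and the sample count are hypothetical and are not part of SysLink.

```python
def cycles_to_us(cycles, clock_mhz):
    """Convert a cycle count to microseconds for a clock given in MHz.

    Which clock a given measurement was counted against is an ASSUMPTION
    the caller has to make; the datasheet does not state it.
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return cycles / clock_mhz


if __name__ == "__main__":
    # Hypothetical count of 986000 cycles, assumed to tick at the 986 MHz ARM clock.
    print(cycles_to_us(986_000, 986))  # 1000.0
    # Same count against a timer at 2 x 250 MHz, as noted for the M3 cores.
    print(cycles_to_us(986_000, 500))  # 1972.0
```

Passing the clock explicitly keeps the assumption visible at every call site instead of hiding it in a lookup table.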

Setup details

  • EVM and Silicon details
    • TI816X EVM (Rev C)
    • DDR2 interface
    • EVM Serial No: NEB1011040
  • Internal memory configuration
    • L1 and L2 cache for DSP is configured as follows:
      • L1D: 32K
      • L1P: 32K
      • L2 : 32K
    • Shared Regions configuration:
      • Cached on the RTOS, non-cached on Linux

Build details

The performance numbers were obtained with the following build configurations:

  • IPC product build and SysLink RTOS build (SYS/BIOS side)
    • Release profile
    • Disable asserts
    • Disable logger
    • Non-instrumented
  • SysLink HLOS build (Linux side)
    • Optimized build (SYSLINK_BUILD_OPTIMIZE = 1)
    • Release mode (SYSLINK_BUILD_DEBUG = 0)
    • Disable all the traces (SYSLINK_TRACE_ENABLE = 0)
  • Linux kernel
    • Default configuration with kernel debugging disabled

Resource Usage

Notify

  • Total available events = 32
  • Usage by different modules is as follows:
Module Event IDs used
FrameQBufMgr 0
FrameQ 1
MessageQ (TransportShm) 2
RingIO 3
NameServerRemoteNotify 4
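
The table above leaves part of the event range unused. A minimal sketch of listing the free IDs follows, assuming that the 32 events are numbered 0 to 31 and that no other component reserves events (the source states neither). The names `TOTAL_EVENTS`, `RESERVED` and `free_event_ids` are illustrative.

```python
TOTAL_EVENTS = 32  # from the list above

# Event IDs used by SysLink modules, copied from the table above.
RESERVED = {
    "FrameQBufMgr": 0,
    "FrameQ": 1,
    "MessageQ (TransportShm)": 2,
    "RingIO": 3,
    "NameServerRemoteNotify": 4,
}


def free_event_ids():
    """Return the event IDs not claimed in RESERVED (ASSUMES IDs run 0..31)."""
    taken = set(RESERVED.values())
    return [event for event in range(TOTAL_EVENTS) if event not in taken]


if __name__ == "__main__":
    ids = free_event_ids()
    print(len(ids), ids[0], ids[-1])  # 27 5 31
```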

Gate Hardware Spinlocks

  • Total number of Gate hardware spinlocks = 64
  • Usage by different modules is as follows:
Module Number of spin locks used
Shared Region 0 1
Frame Queue instance 2
Frame Queue Buffer Manager instance 2
RingIO instance 2

Note: The Frame Queue, Frame Queue Buffer Manager and RingIO instances use the Gate hardware spinlocks listed above only if the specified gate type is GateMP_RemoteProtect_SYSTEM.
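
The per-instance figures above can be turned into a simple budget check. The sketch below assumes every instance is created with GateMP_RemoteProtect_SYSTEM and that nothing outside the table consumes spinlocks. The function and parameter names are hypothetical.

```python
TOTAL_SPINLOCKS = 64          # from the list above
SHARED_REGION_0_LOCKS = 1     # from the table above
LOCKS_PER_INSTANCE = 2        # FrameQ, FrameQBufMgr and RingIO each use 2


def spinlocks_used(frameq=0, frameq_bufmgr=0, ringio=0):
    """Spinlocks consumed, ASSUMING all instances use the SYSTEM gate type."""
    instances = frameq + frameq_bufmgr + ringio
    return SHARED_REGION_0_LOCKS + LOCKS_PER_INSTANCE * instances


def spinlocks_remaining(**instances):
    """Spinlocks left; a negative value means the configuration does not fit."""
    return TOTAL_SPINLOCKS - spinlocks_used(**instances)


if __name__ == "__main__":
    # Hypothetical configuration: 4 frame queues, 2 buffer managers, 3 RingIO instances.
    print(spinlocks_used(frameq=4, frameq_bufmgr=2, ringio=3))       # 19
    print(spinlocks_remaining(frameq=4, frameq_bufmgr=2, ringio=3))  # 45
```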

Performance data

Notify

ARM to DSP round trip time
The time (round trip) taken for a notification to travel from ARM to DSP and back to ARM is measured. Here is the procedure followed to get the round trip time:

  • On ARM side, send notification from ARM to DSP (Capture the time stamp "T1" before calling Notify send API)
  • On DSP side, in Notify callback function, send notification to ARM
  • On ARM side, receive the notification from DSP (Capture the time stamp "T2" after get() API on ARM side)
  • Measure the time elapsed "T2-T1"
Round trip time: 106192 cycles


ARM to Video-M3 round trip time
The time (round trip) taken for a notification to travel from ARM to Video-M3 and back to ARM is measured. Here is the procedure followed to get the round trip time:

  • On ARM side, send notification from ARM to Video-M3 (Capture the time stamp "T1" before calling Notify send API)
  • On Video-M3 side, in Notify callback function, send notification to ARM
  • On ARM side, receive the notification from Video-M3 (Capture the time stamp "T2" after get() API on ARM side)
  • Measure the time elapsed "T2-T1"
Round trip time: 102642 cycles
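
The T1/T2 procedure used in both measurements above can be sketched on a single machine. This is only an illustration of the timing pattern: two Python queues and a thread stand in for the ARM and the remote core, and no SysLink API is involved. All names are hypothetical.

```python
import queue
import threading
import time


def measure_round_trips(iterations=100):
    """Return a list of round-trip times in nanoseconds (T2 - T1)."""
    to_remote = queue.Queue()  # stand-in for the ARM-to-remote path
    to_host = queue.Queue()    # stand-in for the remote-to-ARM path

    def remote_side():
        # Stand-in for the remote callback: echo every event straight back.
        while True:
            event = to_remote.get()
            if event is None:  # sentinel: stop the worker
                break
            to_host.put(event)

    worker = threading.Thread(target=remote_side)
    worker.start()

    samples = []
    for i in range(iterations):
        t1 = time.perf_counter_ns()   # T1: just before sending
        to_remote.put(i)
        reply = to_host.get()         # blocks until the echo arrives
        t2 = time.perf_counter_ns()   # T2: just after receiving
        if reply != i:
            raise RuntimeError("unexpected reply")
        samples.append(t2 - t1)

    to_remote.put(None)
    worker.join()
    return samples


if __name__ == "__main__":
    samples = measure_round_trips()
    print("average round trip (ns):", sum(samples) / len(samples))
```

Averaging over many iterations, as the sketch does, reduces the effect of scheduling jitter on any single sample.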

Message Queue

ARM to DSP round trip time
The time (round trip) taken for a message to travel from ARM to DSP and back to ARM is measured. Here is the procedure followed to get the round trip time:

  • Transfer the message from ARM to DSP (Capture the time stamp "T1" before calling put() API on ARM side)
  • Receive the message on the DSP and send it back to ARM on another MessageQ
  • Receive the message on the ARM (Capture the time stamp "T2" after get() API on ARM side)
  • Measure the time elapsed "T2-T1"
Message Size Average Round Trip Cycles
64 bytes 75330
128 bytes 74640
1 KB 73950
10 KB 95543
100 KB 207651

ARM to Video-M3 round trip time
The time (round trip) taken for a message to travel from ARM to Video-M3 and back to ARM is measured. Here is the procedure followed to get the round trip time:

  • Transfer the message from ARM to Video-M3 (Capture the time stamp "T1" before calling put() API on ARM side)
  • Receive the message on the Video-M3 and send it back to ARM on another MessageQ
  • Receive the message on the ARM (Capture the time stamp "T2" after get() API on ARM side)
  • Measure the time elapsed "T2-T1"
Message Size Average Round Trip Cycles
64 bytes 103332
128 bytes 103135
1 KB 109446
10 KB 124532
100 KB 213469
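
In both tables above the round-trip cost is nearly flat for small messages and grows for the two largest sizes. A rough way to quantify that growth is the slope between the 10 KB and 100 KB rows. The sketch assumes 1 KB = 1024 bytes, which the source does not state, and the helper name is illustrative.

```python
KB = 1024  # ASSUMPTION: the tables' "KB" means 1024 bytes

# (message size in bytes, average round-trip cycles) from the tables above.
ARM_DSP = [(64, 75330), (128, 74640), (1 * KB, 73950),
           (10 * KB, 95543), (100 * KB, 207651)]
ARM_VIDEO_M3 = [(64, 103332), (128, 103135), (1 * KB, 109446),
                (10 * KB, 124532), (100 * KB, 213469)]


def cycles_per_extra_byte(rows):
    """Slope between the two largest message sizes, in cycles per byte."""
    (size_a, cycles_a), (size_b, cycles_b) = rows[-2], rows[-1]
    return (cycles_b - cycles_a) / (size_b - size_a)


if __name__ == "__main__":
    print(f"ARM to DSP:      {cycles_per_extra_byte(ARM_DSP):.2f} cycles/byte")       # 1.22
    print(f"ARM to Video-M3: {cycles_per_extra_byte(ARM_VIDEO_M3):.2f} cycles/byte")  # 0.97
```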

Frame Queue

All Frame Queue tests use frame buffers of size 345600 bytes with 2 frame buffers in each frame.

API profiling (DSP)
The frames are allocated and transferred (put) through the frame queue one after the other. The frames are then received (get) and freed one after the other in the same thread. Each API is profiled while frames are transferred within the same processor.

  • Frame transfer using SFQ within the DSP with Notify disabled
API Average (cycles)
FrameQ_alloc 4211
FrameQ_put 6232
FrameQ_get 4393
FrameQ_free 5820
Total time 20656
  • Frame transfer using MFQ within the DSP with Notify disabled (16 frame pools and internal queues)
API Average (cycles)
FrameQ_allocv 27669
FrameQ_putv 43583
FrameQ_getv 44068
FrameQ_freev 49131
Total time 164451


API profiling (Video-M3)
The frames are allocated and transferred (put) through the frame queue one after the other. The frames are then received (get) and freed one after the other in the same thread. Each API is profiled while frames are transferred within the same processor.

  • Frame transfer using SFQ within the Video-M3 with Notify disabled
API Average (cycles)
FrameQ_alloc 3319
FrameQ_put 5099
FrameQ_get 4656
FrameQ_free 4996
Total time 18070
  • Frame transfer using MFQ within the Video-M3 with Notify disabled (16 frame pools and internal queues)
API Average (cycles)
FrameQ_allocv 36281
FrameQ_putv 53524
FrameQ_getv 50402
FrameQ_freev 55176
Total time 195383

API profiling (ARM to DSP)
The frames are allocated and transferred (put) from ARM to DSP, and on the DSP side they are received (get) and freed one after the other. The same procedure is repeated from DSP to ARM. The APIs are profiled during the above transfers.

  • ARM side
API Average (cycles)
FrameQ_alloc 16663
FrameQ_put 22875
FrameQ_get 20312
FrameQ_free 21791
Total time 81641
  • DSP side
API Average (cycles)
FrameQ_alloc 5466
FrameQ_put 20622
FrameQ_get 12920
FrameQ_free 8837
Total time 47845


API profiling (ARM to Video-M3)
The frames are allocated and transferred (put) from ARM to Video-M3, and on the Video-M3 side they are received (get) and freed one after the other. The same procedure is repeated from Video-M3 to ARM. The APIs are profiled during the above transfers.

  • ARM side
API Average (cycles)
FrameQ_alloc 15973
FrameQ_put 26228
FrameQ_get 20213
FrameQ_free 21495
Total time 83909
  • Video-M3 side
API Average (cycles)
FrameQ_alloc 13888
FrameQ_put 30490
FrameQ_get 19529
FrameQ_free 20783
Total time 84690

API profiling (Inter Ducati)

  • Frame transfer using SFQ between Video-M3 and VPSS-M3 with Notify enabled
API Average (cycles)
FrameQ_alloc 8950
FrameQ_put 18643
FrameQ_get 12064
FrameQ_free 14817
Total time 31101
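
Each Frame Queue table above lists a "Total time" row next to the four per-API averages. The sketch below checks whether each listed total equals the sum of its rows, using the values copied from the tables. For all tables except the Inter Ducati one the rows add up. There the rows sum to 54474 while the listed total is 31101, and the source does not say how that total was derived. The names used are illustrative.

```python
# (alloc, put, get, free, listed total) copied from the Frame Queue tables above.
TABLES = {
    "DSP SFQ": (4211, 6232, 4393, 5820, 20656),
    "DSP MFQ": (27669, 43583, 44068, 49131, 164451),
    "Video-M3 SFQ": (3319, 5099, 4656, 4996, 18070),
    "Video-M3 MFQ": (36281, 53524, 50402, 55176, 195383),
    "ARM to DSP, ARM side": (16663, 22875, 20312, 21791, 81641),
    "ARM to DSP, DSP side": (5466, 20622, 12920, 8837, 47845),
    "ARM to Video-M3, ARM side": (15973, 26228, 20213, 21495, 83909),
    "ARM to Video-M3, Video-M3 side": (13888, 30490, 19529, 20783, 84690),
    "Inter Ducati SFQ": (8950, 18643, 12064, 14817, 31101),
}


def mismatches(tables):
    """Return {name: (row_sum, listed_total)} for tables whose rows do not add up."""
    result = {}
    for name, (*rows, listed) in tables.items():
        row_sum = sum(rows)
        if row_sum != listed:
            result[name] = (row_sum, listed)
    return result


if __name__ == "__main__":
    print(mismatches(TABLES))  # {'Inter Ducati SFQ': (54474, 31101)}
```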

RingIO

Data transfer from ARM to DSP
The numbers are captured while transferring 1 KB of data from ARM to DSP.

  • ARM
APIs Cycles
Create() 6514502
Open() 5180444
Acquire() 34510
Release() 24650
SetAttributes() 22678
Close() 782884
Delete() 736542
  • DSP
APIs Cycles
Create() 142479
Open() 116874
Acquire() 7238
Release() 21257
SetAttributes() 8465
Close() 8641
Delete() 33570
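
The two tables above mix one-time costs (Create, Open, Close, Delete) with per-buffer costs (Acquire, Release). A rough way to compare them is to ask how many Acquire/Release pairs cost as much as one Create plus Open. The sketch treats Acquire plus Release as the per-transfer cost, which is an assumption about usage rather than something the source states. The names are illustrative.

```python
# Cycle counts copied from the ARM-to-DSP RingIO tables above.
RINGIO = {
    "ARM": {"Create": 6514502, "Open": 5180444, "Acquire": 34510,
            "Release": 24650, "SetAttributes": 22678,
            "Close": 782884, "Delete": 736542},
    "DSP": {"Create": 142479, "Open": 116874, "Acquire": 7238,
            "Release": 21257, "SetAttributes": 8465,
            "Close": 8641, "Delete": 33570},
}


def setup_cycles(side):
    """One-time cost of Create() plus Open()."""
    return RINGIO[side]["Create"] + RINGIO[side]["Open"]


def per_transfer_cycles(side):
    """ASSUMED steady-state cost: one Acquire() plus one Release()."""
    return RINGIO[side]["Acquire"] + RINGIO[side]["Release"]


if __name__ == "__main__":
    for side in ("ARM", "DSP"):
        ratio = setup_cycles(side) // per_transfer_cycles(side)
        print(side, setup_cycles(side), per_transfer_cycles(side), ratio)
    # ARM 11694946 59160 197
    # DSP 259353 28495 9
```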


Data transfer from ARM to Video-M3
The numbers are captured while transferring 1 KB of data from ARM to Video-M3.

  • ARM
APIs Cycles
Create() 6577606
Open() 5634990
Acquire() 29580
Release() 23664
SetAttributes() 23664
Close() 797674
Delete() 763164


  • Video-M3
APIs Cycles
Create() 205712
Open() 235082
Acquire() 7208
Release() 25576
SetAttributes() 12584
Close() 32636
Delete() 67200

Proc Manager

DSP
The time taken to load and start the DSP image from ARM is captured. The size of the DSP image is 11.2 MB and the file is loaded from a ramdisk.

APIs Average Cycles
Proc load 5858812
Proc start 13804

Video-M3
The time taken to load and start the Video-M3 image from ARM is captured. The size of the Video-M3 image is 8.6 MB and the file is loaded from a ramdisk.

APIs Average Cycles
Proc load 10894314
Proc start 16762
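
The two load figures above can be compared independently of image size by normalizing to cycles per byte of image file. The sketch assumes 1 MB = 1,000,000 bytes and uses the file size as the denominator, although the amount of data actually written to the target may be smaller. Both are assumptions, and the names are illustrative.

```python
MB = 1_000_000  # ASSUMPTION: the source's "MB" means 10**6 bytes

# (image size in MB, Proc load cycles) copied from the tables above.
LOADS = {
    "DSP": (11.2, 5858812),
    "Video-M3": (8.6, 10894314),
}


def load_cycles_per_byte(core):
    """Proc load cycles divided by the image file size in bytes."""
    size_mb, cycles = LOADS[core]
    return cycles / (size_mb * MB)


if __name__ == "__main__":
    for core in LOADS:
        print(f"{core}: {load_cycles_per_byte(core):.3f} cycles/byte")
    # DSP: 0.523 cycles/byte
    # Video-M3: 1.267 cycles/byte
```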