SysLink 2.00.00.78 DataSheet TI814x

Introduction

This document provides performance data for the SysLink modules on the TI814X platform.

Terms and Abbreviations

Abbreviation Description
CCS Code Composer Studio
IPC Inter-Processor Communication
GPP General Purpose Processor e.g. ARM
DSP Digital Signal Processor e.g. C64X
EVM Evaluation Module
API Application Programming Interface
SFQ Single Frame Queue
MFQ Multiple Frame Queue

Processor Information

Processor core Speed
ARM (Cortex A8) 598 MHz
DSP (C674x) 500 MHz
Video-M3 (Cortex M3) 200 MHz (timer runs at 2x)
VPSS-M3 (Cortex M3) 200 MHz (timer runs at 2x)

Note: Performance numbers are published for the Cortex-A8, DSP and Video-M3 cores only. VPSS-M3 numbers are not published because they are identical to the Video-M3 numbers.
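The results in this document are reported as cycle counts. The following is a minimal sketch of converting a cycle count to time for a given clock rate. The source does not state which core's clock each count is referenced to, so the clock passed in is an assumption the reader must supply, and the function name is illustrative only.

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle count to microseconds for a counter ticking at clock_hz."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return cycles * 1_000_000 / clock_hz


# Clock rates taken from the Processor Information table.
ARM_HZ = 598_000_000
DSP_HZ = 500_000_000

# Example: a count of 58245 cycles, ASSUMING it was taken against the
# 598 MHz ARM clock (the source does not confirm the reference clock).
print(round(cycles_to_microseconds(58245, ARM_HZ), 1))
```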

Setup details

  • EVM and Silicon details
  * TI814X EVM (Rev B)
  * DDR2 interface
  * PG1.0 Silicon
  • Internal memory configuration
  * L1 and L2 cache for DSP is configured as follows:
   * L1D: 32K
   * L1P: 32K
   * L2 : 32K
  * Shared Regions configuration:
   * Cached on the RTOS side, non-cached on Linux

Build details

The performance numbers were obtained with the following build configurations:

  • IPC product build and SysLink RTOS build (SYS/BIOS side)
    • Release profile
    • Disable asserts
    • Disable logger
    • Non-instrumented
  • SysLink HLOS build (Linux side)
    • Optimized build (SYSLINK_BUILD_OPTIMIZE = 1)
    • Release mode (SYSLINK_BUILD_DEBUG = 0)
    • Disable all the traces (SYSLINK_TRACE_ENABLE = 0)
  • Linux kernel
    • Default configuration with kernel debugging disabled

Resource Usage

Notify

  • Total available events = 32
  • Usage by different modules is as follows:
Module Event Ids used
FrameQBufMgr 0
FrameQ 1
MessageQ (TransportShm) 2
RingIO 3
NameServerRemoteNotify 4
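As an illustration of the table above, the sketch below lists the event IDs that are not claimed by any of the listed modules. Whether an application may actually use every remaining ID depends on the Notify configuration, which this document does not cover; the names used here are illustrative.

```python
TOTAL_EVENTS = 32

# Event IDs taken by SysLink modules, copied from the table above.
USED_EVENT_IDS = {
    "FrameQBufMgr": 0,
    "FrameQ": 1,
    "MessageQ (TransportShm)": 2,
    "RingIO": 3,
    "NameServerRemoteNotify": 4,
}


def free_event_ids(total=TOTAL_EVENTS, used=USED_EVENT_IDS):
    """Return the event IDs in range(total) not claimed by any module in `used`."""
    taken = set(used.values())
    return [event_id for event_id in range(total) if event_id not in taken]


print(free_event_ids())
```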

Gate Hardware Spinlocks

  • Total number of Gate hardware spinlocks = 64
  • Usage by different modules is as follows:
Module Number of spin locks used
Shared Region 0 1
Frame Queue instance 2
Frame Queue Buffer Manager instance 2
RINGIO instance 2

Note: The Frame Queue, Frame Queue Buffer Manager and RINGIO instances use the Gate hardware spinlocks listed above only if the gate type specified is GateMP_RemoteProtect_SYSTEM.
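The per-instance figures above can be turned into a simple budget check. This is a sketch only: it assumes every instance is created with GateMP_RemoteProtect_SYSTEM and that no other software on the system consumes hardware spinlocks, neither of which the source guarantees. The dictionary keys and function names are hypothetical.

```python
TOTAL_SPINLOCKS = 64

# Spinlocks used per instance, from the table above.
LOCKS_PER_INSTANCE = {
    "SharedRegion0": 1,
    "FrameQ": 2,
    "FrameQBufMgr": 2,
    "RingIO": 2,
}


def spinlocks_needed(instances):
    """Total spinlocks for a mapping of module name -> number of instances."""
    return sum(LOCKS_PER_INSTANCE[name] * count for name, count in instances.items())


def fits_budget(instances, total=TOTAL_SPINLOCKS):
    """True if the requested instances fit within the available spinlocks."""
    return spinlocks_needed(instances) <= total


# Hypothetical configuration used purely as an example.
plan = {"SharedRegion0": 1, "FrameQ": 4, "FrameQBufMgr": 2, "RingIO": 3}
print(spinlocks_needed(plan), fits_budget(plan))
```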

Performance data

Notify

ARM to DSP round trip time
The time (round trip) taken for a notification to travel from ARM to DSP and back to ARM is measured. Here is the procedure followed to get the round trip time:

  • On ARM side, send notification from ARM to DSP (Capture the time stamp "T1" before calling Notify send API)
  • On DSP side, in Notify callback function, send notification to ARM
  • On ARM side, receive the notification from DSP (Capture the time stamp "T2" after get() API on ARM side)
  • Measure the time elapsed "T2-T1"
Round trip time: 58245 cycles
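The T1/T2 procedure above follows a common ping-pong timing pattern, sketched below. This is not SysLink code: two Python threads stand in for the ARM and the remote core, semaphores stand in for the notifications, and the result is wall-clock nanoseconds rather than cycles. All names are hypothetical.

```python
import threading
import time


def measure_round_trip(send, wait_for_reply, iterations=100):
    """Average round-trip time in nanoseconds over `iterations` exchanges."""
    total_ns = 0
    for _ in range(iterations):
        t1 = time.perf_counter_ns()  # T1: just before the send call
        send()
        wait_for_reply()
        t2 = time.perf_counter_ns()  # T2: just after the reply is received
        total_ns += t2 - t1
    return total_ns / iterations


# Stand-in for the remote core: a thread that echoes every notification back.
request = threading.Semaphore(0)
reply = threading.Semaphore(0)


def remote_echo(count):
    for _ in range(count):
        request.acquire()  # the remote side's "callback" fires
        reply.release()    # the remote side notifies the sender back


N = 50
worker = threading.Thread(target=remote_echo, args=(N,))
worker.start()
average_ns = measure_round_trip(request.release, reply.acquire, iterations=N)
worker.join()
print(f"average round trip: {average_ns:.0f} ns")
```

Averaging over many iterations, as done here, smooths out scheduling jitter on the measuring side; the source does not say how many iterations its figures are based on.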


ARM to Video-M3 round trip time
The time (round trip) taken for a notification to travel from ARM to Video-M3 and back to ARM is measured. Here is the procedure followed to get the round trip time:

  • On ARM side, send notification from ARM to Video-M3 (Capture the time stamp "T1" before calling Notify send API)
  • On Video-M3 side, in Notify callback function, send notification to ARM
  • On ARM side, receive the notification from Video-M3 (Capture the time stamp "T2" after get() API on ARM side)
  • Measure the time elapsed "T2-T1"
Round trip time: 52564 cycles

Message Queue

ARM to DSP round trip time
The time (round trip) taken for a message to travel from ARM to DSP and back to ARM is measured. Here is the procedure followed to get the round trip time:

  • Transfer the message from ARM to DSP (Capture the time stamp "T1" before calling put() API on ARM side)
  • Receive the message on the DSP and send it back to ARM on another MessageQ
  • Receive the message on the ARM (Capture the time stamp "T2" after get() API on ARM side)
  • Measure the time elapsed "T2-T1"
Message Size Average Round Trip Cycles
64 bytes 56271
128 bytes 55374
1 KB 55733
10 KB 67275
100 KB 189446
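The table above suggests a roughly constant cost up to 1 KB and a size-dependent cost beyond that. A rough way to quantify this is the slope between the smallest and largest message sizes, sketched below. It assumes 1 KB means 1024 bytes, which the source does not state, and a two-point slope is only a coarse estimate.

```python
# (message size in bytes, average round-trip cycles) from the table above.
# ASSUMPTION: 1 KB = 1024 bytes.
ARM_DSP_ROUND_TRIP = [
    (64, 56271),
    (128, 55374),
    (1024, 55733),
    (10 * 1024, 67275),
    (100 * 1024, 189446),
]


def marginal_cycles_per_byte(samples):
    """Slope in cycles per byte between the smallest and largest message sizes."""
    small_size, small_cycles = min(samples)
    large_size, large_cycles = max(samples)
    return (large_cycles - small_cycles) / (large_size - small_size)


print(f"{marginal_cycles_per_byte(ARM_DSP_ROUND_TRIP):.2f} cycles per additional byte")
```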

ARM to Video-M3 round trip time
The time (round trip) taken for a message to travel from ARM to Video-M3 and back to ARM is measured. Here is the procedure followed to get the round trip time:

  • Transfer the message from ARM to Video-M3 (Capture the time stamp "T1" before calling put() API on ARM side)
  • Receive the message on the Video-M3 and send it back to ARM on another MessageQ
  • Receive the message on the ARM (Capture the time stamp "T2" after get() API on ARM side)
  • Measure the time elapsed "T2-T1"
Message Size Average Round Trip Cycles
64 bytes 90955
128 bytes 89401
1 KB 94723
10 KB 106503
100 KB 196502

Frame Queue

All Frame Queue tests use frame buffers of size 345600 bytes with 2 frame buffers in each frame.

API profiling (DSP)
The frames are allocated and transferred (put) through the frame queue one after the other; the frames are then received (get) and freed one after the other in the same thread. Each API is profiled during the transfer of frames within the same processor.

  • Frame transfer using SFQ within the DSP with Notify disabled
API Average (cycles)
FrameQ_alloc 3975
FrameQ_put 5971
FrameQ_get 4319
FrameQ_free 6007
Total time 20272
  • Frame transfer using MFQ within the DSP with Notify disabled (16 frame pools and internal queues)
API Average (cycles)
FrameQ_allocv 25188
FrameQ_putv 41783
FrameQ_getv 40187
FrameQ_freev 47096
Total time 154254
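The two DSP tables above can be compared per frame. The sketch below sums the per-API averages and divides the MFQ total by 16, under the assumption that each vectored call handles one frame per pool (16 frames per call). The source states only "16 frame pools and internal queues", so treat the per-frame figure as an estimate.

```python
# Average cycles per API on the DSP, from the tables above.
SFQ_DSP = {"FrameQ_alloc": 3975, "FrameQ_put": 5971,
           "FrameQ_get": 4319, "FrameQ_free": 6007}
MFQ_DSP = {"FrameQ_allocv": 25188, "FrameQ_putv": 41783,
           "FrameQ_getv": 40187, "FrameQ_freev": 47096}

# ASSUMPTION: one frame per pool is handled by each vectored (v) call.
FRAMES_PER_VECTOR_CALL = 16


def total_cycles(profile):
    """Sum of the per-API averages for one alloc/put/get/free sequence."""
    return sum(profile.values())


def cycles_per_frame(profile, frames):
    """Total cycles divided by the number of frames moved per sequence."""
    return total_cycles(profile) / frames


print(total_cycles(SFQ_DSP), cycles_per_frame(SFQ_DSP, 1))
print(total_cycles(MFQ_DSP), cycles_per_frame(MFQ_DSP, FRAMES_PER_VECTOR_CALL))
```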


API profiling (Video-M3)
The frames are allocated and transferred (put) through the frame queue one after the other; the frames are then received (get) and freed one after the other in the same thread. Each API is profiled during the transfer of frames within the same processor.

  • Frame transfer using SFQ within the Video-M3 with Notify disabled
API Average (cycles)
FrameQ_alloc 6346
FrameQ_put 9287
FrameQ_get 8902
FrameQ_free 11107
Total time 35642
  • Frame transfer using MFQ within the Video-M3 with Notify disabled (16 frame pools and internal queues)
API Average (cycles)
FrameQ_allocv 70576
FrameQ_putv 115974
FrameQ_getv 105787
FrameQ_freev 114659
Total time 406996

API profiling (ARM with Linux to DSP)
The frames are allocated and transferred (put) from ARM to DSP, and on the DSP side they are received (get) and freed one after the other. The same procedure is repeated from DSP to ARM. The APIs are profiled during the above transfers.

  • ARM side
API Average (cycles)
FrameQ_alloc 14591
FrameQ_put 20093
FrameQ_get 16266
FrameQ_free 17521
Total time 68586
  • DSP side
API Average (cycles)
FrameQ_alloc 5769
FrameQ_put 20888
FrameQ_get 12928
FrameQ_free 8590
Total time 48174


API profiling (ARM with Linux to Video-M3)
The frames are allocated and transferred (put) from ARM to Video-M3, and on the Video-M3 side they are received (get) and freed one after the other. The same procedure is repeated from Video-M3 to ARM. The APIs are profiled during the above transfers.

  • ARM side
API Average (cycles)
FrameQ_alloc 13874
FrameQ_put 22246
FrameQ_get 16325
FrameQ_free 17820
Total time 70383
  • Video-M3 side
API Average (cycles)
FrameQ_alloc 16813
FrameQ_put 33065
FrameQ_get 21094
FrameQ_free 22940
Total time 93913

API profiling (Inter Ducati)

  • Frame transfer using SFQ between Video-M3 and VPSS-M3 with Notify enabled
API Average (cycles)
FrameQ_alloc 11419
FrameQ_put 19793
FrameQ_get 13600
FrameQ_free 16239
Total time 61051

RingIO

Data transfer from ARM(Linux) to DSP
The numbers are captured while transferring 1 KB of data from ARM to DSP.

  • ARM
APIs Cycles
Create() 6636604
Open() 5469069
Acquire() 23920
Release() 19255
SetAttributes() 16146
Close() 623235
Delete() 668205
  • DSP
APIs Cycles
Create() 118618
Open() 93666
Acquire() 4202
Release() 11662
SetAttributes() 7965
Close() 16393
Delete() 31478
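In the ARM table above, Create(), Open(), Close() and Delete() are far more expensive than Acquire() and Release(). The sketch below shows how the one-time cost amortizes over a number of transfers. It assumes the four lifecycle calls are paid once on the same side and that each transfer costs one Acquire() plus one Release(); SetAttributes() is left out because the source does not say how often it is called.

```python
# ARM-side RingIO cycle counts (ARM to DSP test), from the table above.
ARM_RINGIO = {
    "Create": 6636604, "Open": 5469069, "Acquire": 23920, "Release": 19255,
    "SetAttributes": 16146, "Close": 623235, "Delete": 668205,
}

# ASSUMPTION: which calls are one-time and which are per transfer.
SETUP_APIS = ("Create", "Open", "Close", "Delete")
PER_TRANSFER_APIS = ("Acquire", "Release")


def amortized_cycles_per_transfer(profile, transfers):
    """Per-transfer cost plus the one-time cost spread over `transfers`."""
    if transfers < 1:
        raise ValueError("transfers must be at least 1")
    setup = sum(profile[name] for name in SETUP_APIS)
    per_transfer = sum(profile[name] for name in PER_TRANSFER_APIS)
    return per_transfer + setup / transfers


for n in (1, 100, 10000):
    print(n, round(amortized_cycles_per_transfer(ARM_RINGIO, n)))
```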

Data transfer from ARM(Linux) to Video-M3
The numbers are captured while transferring 1 KB of data from ARM to Video-M3.

  • ARM
APIs Cycles
Create() 6883339
Open() 5830978
Acquire() 16504
Release() 23800
SetAttributes() 19255
Close() 625986
Delete() 679328


  • Video-M3
APIs Cycles
Create() 226870
Open() 174207
Acquire() 7302
Release() 29939
SetAttributes() 13672
Close() 36177
Delete() 73272

Proc Manager

DSP
The time taken to load and start the DSP image from ARM (Linux) is captured. The size of the DSP image is 11.4 MB and the file is loaded from a ramdisk.

APIs Cycles
Proc load 4863534
Proc start 25714


Video-M3
The time taken to load and start the Video-M3 image from ARM (Linux) is captured. The size of the Video-M3 image is 8.7 MB and the file is loaded from a ramdisk.

APIs Cycles
Proc load 8538842
Proc start 14352