SysLink 2.10.03.20 qnx DataSheet TI814x


Introduction

The purpose of this document is to provide the performance data for QNX SysLink on the TI814X platform.

Terms and Abbreviations

Abbreviation    Description
CCS             Code Composer Studio
IPC             Inter-Processor Communication
GPP             General Purpose Processor, e.g. ARM
DSP             Digital Signal Processor, e.g. C64X
EVM             Evaluation Module
API             Application Programming Interface
SFQ             Single Frame Queue
MFQ             Multiple Frame Queue

Processor Information

Processor core          Speed
ARM (Cortex-A8)         598 MHz
DSP (C674x)             500 MHz
Video-M3 (Cortex-M3)    200 MHz (timer runs at 2x)
VPSS-M3 (Cortex-M3)     200 MHz (timer runs at 2x)

Note: For all modules, performance numbers are published only for the Cortex-A8, DSP and Video-M3 cores. VPSS-M3 numbers are not published because that core is identical to the Video-M3.
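The results in this datasheet are reported in cycles, so a reader may want to convert them to time. The following is a minimal sketch of that conversion. This page does not state which counter each measurement used, so the clock frequency passed in is an assumption the reader must supply. `CORE_MHZ` restates the speeds listed above, and `cycles_to_us` is a hypothetical helper name.

```python
# Core clock speeds in MHz, as listed in the Processor Information table.
CORE_MHZ = {
    "ARM": 598.0,
    "DSP": 500.0,
    "Video-M3": 200.0,
    "VPSS-M3": 200.0,
}

def cycles_to_us(cycles, clock_mhz):
    """Convert a cycle count to microseconds for a counter ticking at clock_mhz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    # A counter at N MHz ticks N times per microsecond.
    return cycles / clock_mhz

# Example: 1000 ticks of a counter running at the DSP core clock.
print(cycles_to_us(1000, CORE_MHZ["DSP"]))           # 2.0
# Assumption: the M3 timer ticks at 2x the core clock, per the table note.
print(cycles_to_us(1000, 2 * CORE_MHZ["Video-M3"]))  # 2.5
```

Keeping the clock as an explicit argument avoids baking in a guess about which counter produced a given figure.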

Setup details

  • EVM and Silicon details
    • TI814X EVM (Rev B)
    • DDR2 interface
    • PG1.0 Silicon
  • Internal memory configuration
    • L1 and L2 cache for DSP is configured as follows:
      • L1D: 32K
      • L1P: 32K
      • L2: 32K
    • Shared Regions configuration:
      • Cached on the RTOS side, non-cached on the QNX side

Build details

The performance numbers were obtained with the following build configurations:

  • IPC product build and SysLink RTOS build (SYS/BIOS side)
    • Release profile
    • Disable asserts
    • Disable logger
    • Non-instrumented
  • SysLink HLOS build (QNX side)
    • Optimized build (SYSLINK_BUILD_OPTIMIZE = 1)
    • Release mode (SYSLINK_BUILD_DEBUG = 0)
    • Disable all the traces (SYSLINK_TRACE_ENABLE = 0)
  • QNX kernel
    • Default configuration

Resource Usage

Notify

  • Total available events = 32
  • Usage by different modules is as follows:
Module                     Event IDs used
FrameQBufMgr               0
FrameQ                     1
MessageQ (TransportShm)    2
RingIO                     3
NameServerRemoteNotify     4
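The event usage above can be captured in a small lookup to catch collisions when choosing an event ID. This is only a sketch: the table lists which IDs the modules use, and treating the remaining IDs as available to applications is an assumption, not something this page states. The names `MODULE_EVENTS`, `unlisted_event_ids` and `check_event_id` are hypothetical.

```python
TOTAL_EVENTS = 32  # total available Notify events, from the list above

# Event IDs used by SysLink modules, copied from the table above.
MODULE_EVENTS = {
    0: "FrameQBufMgr",
    1: "FrameQ",
    2: "MessageQ (TransportShm)",
    3: "RingIO",
    4: "NameServerRemoteNotify",
}

def unlisted_event_ids():
    """Return the event IDs that the table does not assign to any module."""
    return [e for e in range(TOTAL_EVENTS) if e not in MODULE_EVENTS]

def check_event_id(event_id):
    """Raise ValueError if event_id is out of range or listed for a module."""
    if not 0 <= event_id < TOTAL_EVENTS:
        raise ValueError(f"event {event_id} is outside 0..{TOTAL_EVENTS - 1}")
    if event_id in MODULE_EVENTS:
        raise ValueError(f"event {event_id} is used by {MODULE_EVENTS[event_id]}")
    return event_id

print(len(unlisted_event_ids()))  # 27
```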

Gate Hardware Spinlocks

  • Total number of Gate hardware spinlocks = 64
  • Usage by different modules is as follows:
Module                                 Number of spinlocks used
Shared Region 0                        1
Frame Queue instance                   2
Frame Queue Buffer Manager instance    2
RingIO instance                        2

Note: The Frame Queue, Frame Queue Buffer Manager and RingIO instances use the Gate hardware spinlocks listed above only if the specified gate type is GateMP_RemoteProtect_SYSTEM.
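As a rough planning aid, the spinlock figures above can be turned into a budget check. This sketch assumes that the per-instance counts simply add up and that nothing else in the system consumes hardware spinlocks, neither of which is stated on this page. Only instances created with GateMP_RemoteProtect_SYSTEM should be counted. The function name `spinlocks_needed` is hypothetical.

```python
TOTAL_SPINLOCKS = 64      # total Gate hardware spinlocks, from the list above
SHARED_REGION_0_LOCKS = 1

# Spinlocks used per instance, copied from the table above.
LOCKS_PER_INSTANCE = {
    "FrameQ": 2,
    "FrameQBufMgr": 2,
    "RingIO": 2,
}

def spinlocks_needed(instances):
    """instances maps module name -> number of instances that use
    GateMP_RemoteProtect_SYSTEM. Returns the total spinlocks consumed."""
    used = SHARED_REGION_0_LOCKS
    for module, count in instances.items():
        if count < 0:
            raise ValueError("instance counts must be non-negative")
        used += LOCKS_PER_INSTANCE[module] * count
    return used

# Hypothetical configuration, for illustration only.
plan = {"FrameQ": 4, "FrameQBufMgr": 4, "RingIO": 2}
used = spinlocks_needed(plan)
print(used, TOTAL_SPINLOCKS - used)  # 21 43
```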

Performance data

Notify

ARM to DSP round trip time
The round trip time taken for a notification to travel from the ARM to the DSP and back to the ARM is measured. The procedure followed to get the round trip time is:

  • On the ARM side, send a notification from the ARM to the DSP (capture the time stamp "T1" before calling the Notify send API)
  • On the DSP side, in the Notify callback function, send a notification to the ARM
  • On the ARM side, receive the notification from the DSP (capture the time stamp "T2" after the get() API on the ARM side)
  • Measure the elapsed time "T2-T1"
Round trip time: 1854338 cycles
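The T1/T2 procedure above is a ping-pong measurement. The sketch below shows only the shape of that measurement, using two Python threads and in-process queues as stand-ins for the two processors and the Notify transport. It does not use the SysLink API, and the times it produces are unrelated to the figure reported here. The function name `measure_round_trip` is hypothetical.

```python
import queue
import threading
import time

def measure_round_trip(iterations=100):
    """Return the average round trip time in nanoseconds over `iterations`."""
    to_remote = queue.Queue()  # stand-in for the ARM -> remote core direction
    to_host = queue.Queue()    # stand-in for the remote core -> ARM direction

    def remote():
        # Stand-in for the remote callback: echo each event straight back.
        while True:
            item = to_remote.get()
            if item is None:   # sentinel: stop the echo thread
                break
            to_host.put(item)

    worker = threading.Thread(target=remote)
    worker.start()

    samples = []
    for i in range(iterations):
        t1 = time.perf_counter_ns()  # T1: just before the send
        to_remote.put(i)
        reply = to_host.get()        # blocks until the echo arrives
        t2 = time.perf_counter_ns()  # T2: just after the receive
        if reply != i:
            raise RuntimeError("echo out of order")
        samples.append(t2 - t1)

    to_remote.put(None)
    worker.join()
    return sum(samples) / len(samples)

print(measure_round_trip(10))
```

Averaging over many iterations, as done here, smooths out scheduling jitter in any single sample.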


ARM to Video-M3 round trip time
The round trip time taken for a notification to travel from the ARM to the Video-M3 and back to the ARM is measured. The procedure followed to get the round trip time is:

  • On the ARM side, send a notification from the ARM to the Video-M3 (capture the time stamp "T1" before calling the Notify send API)
  • On the Video-M3 side, in the Notify callback function, send a notification to the ARM
  • On the ARM side, receive the notification from the Video-M3 (capture the time stamp "T2" after the get() API on the ARM side)
  • Measure the elapsed time "T2-T1"
Round trip time: 122470 cycles

Message Queue

ARM to DSP round trip time
The time (round trip) taken for a message to travel from ARM to DSP and back to ARM is measured. Here is the procedure followed to get the round trip time:

  • Transfer the message from ARM to DSP (Capture the time stamp "T1" before calling put() API on ARM side)
  • Receive the message on the DSP and send it back to the ARM on another MessageQ
  • Receive the message on the ARM (Capture the time stamp "T2" after get() API on ARM side)
  • Measure the time elapsed "T2-T1"
Message size    Average round trip (cycles)
64 bytes        615641
128 bytes       559130
1 KB            563136
10 KB           608465
100 KB          561761

ARM to Video-M3 round trip time
The time (round trip) taken for a message to travel from ARM to Video-M3 and back to ARM is measured. Here is the procedure followed to get the round trip time:

  • Transfer the message from ARM to Video-M3 (Capture the time stamp "T1" before calling put() API on ARM side)
  • Receive the message on the Video-M3 and send it back to the ARM on another MessageQ
  • Receive the message on the ARM (Capture the time stamp "T2" after get() API on ARM side)
  • Measure the time elapsed "T2-T1"
Message size    Average round trip (cycles)
64 bytes        562299
128 bytes       561163
1 KB            550040
10 KB           558113
100 KB          548425
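In both Message Queue tables the round trip figures do not grow with message size. A short sketch for quantifying that is given below: it computes the mean and the relative spread, (max - min) / mean, for each table. The data is copied from the tables above; the helper name `spread_percent` is hypothetical, and no explanation for the flat profile is implied.

```python
from statistics import mean

# Average round trip cycles by message size, copied from the tables above.
ARM_DSP = {
    "64 bytes": 615641, "128 bytes": 559130, "1 KB": 563136,
    "10 KB": 608465, "100 KB": 561761,
}
ARM_VIDEO_M3 = {
    "64 bytes": 562299, "128 bytes": 561163, "1 KB": 550040,
    "10 KB": 558113, "100 KB": 548425,
}

def spread_percent(table):
    """Relative spread of the values: (max - min) / mean, in percent."""
    values = list(table.values())
    return 100.0 * (max(values) - min(values)) / mean(values)

for name, table in (("ARM-DSP", ARM_DSP), ("ARM-Video-M3", ARM_VIDEO_M3)):
    print(name, mean(table.values()), round(spread_percent(table), 2))
```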

Frame Queue

All Frame Queue tests use frame buffers of size 345600 bytes with 2 frame buffers in each frame.

API profiling (DSP)
The frames are allocated and transferred (put) through the frame queue one after the other; the frames are then received (get) and freed one after the other in the same thread. Each API is profiled while the frames are transferred within the same processor.

  • Frame transfer using SFQ within the DSP with Notify disabled
API Average (cycles)
FrameQ_alloc 1537
FrameQ_put 2929
FrameQ_get 2049
FrameQ_free 3348
Total time 9862
  • Frame transfer using MFQ within the DSP with Notify disabled (16 frame pools and internal queues)
API Average (cycles)
FrameQ_allocv 11241
FrameQ_putv 25124
FrameQ_getv 21206
FrameQ_freev 38195
Total time 95767
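When reading the two DSP tables above, it can help to compare the reported "Total time" with the sum of the per-API averages and to see each API's share of the cost. The sketch below does that with the figures copied from the tables. In both tables the two values differ by a single cycle. This page does not say how the totals were computed, so the sketch only reports the gap rather than explaining it. The helper names are hypothetical.

```python
# Per-API averages (alloc, put, get, free) and reported totals in cycles,
# copied from the two DSP tables above.
SFQ = {"apis": [1537, 2929, 2049, 3348], "total": 9862}
MFQ = {"apis": [11241, 25124, 21206, 38195], "total": 95767}

def total_gap(table):
    """Sum of the per-API averages minus the reported total."""
    return sum(table["apis"]) - table["total"]

def share_percent(table):
    """Each API's share of the summed per-API averages, in percent."""
    summed = sum(table["apis"])
    return [100.0 * value / summed for value in table["apis"]]

print(total_gap(SFQ), total_gap(MFQ))  # 1 -1
print([round(p, 1) for p in share_percent(SFQ)])
```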


API profiling (Video-M3)
The frames are allocated and transferred (put) through the frame queue one after the other; the frames are then received (get) and freed one after the other in the same thread. Each API is profiled while the frames are transferred within the same processor.

  • Frame transfer using SFQ within the Video-M3 with Notify disabled
API Average (cycles)
FrameQ_alloc 3666
FrameQ_put 6302
FrameQ_get 4948
FrameQ_free 6595
Total time 21509
  • Frame transfer using MFQ within the Video-M3 with Notify disabled (16 frame pools and internal queues)
API Average (cycles)
FrameQ_allocv 39071
FrameQ_putv 73565
FrameQ_getv 64293
FrameQ_freev 86299
Total time 263228

API profiling (ARM with QNX to DSP)
The frames are allocated and transferred (put) from the ARM to the DSP; on the DSP side they are received (get) and freed one after the other. The same procedure is repeated from the DSP to the ARM. The APIs are profiled during these transfers.

  • ARM side
API Average (cycles)
FrameQ_alloc 61594
FrameQ_put 80790
FrameQ_get 72238
FrameQ_free 67454
Total time 282548
  • DSP side
API Average (cycles)
FrameQ_alloc 3743
FrameQ_put 13844
FrameQ_get 9647
FrameQ_free 8787
Total time 36022


API profiling (ARM with QNX to Video-M3)
The frames are allocated and transferred (put) from the ARM to the Video-M3; on the Video-M3 side they are received (get) and freed one after the other. The same procedure is repeated from the Video-M3 to the ARM. The APIs are profiled during these transfers.

  • ARM side
API Average (cycles)
FrameQ_alloc 63866
FrameQ_put 79713
FrameQ_get 71222
FrameQ_free 67036
Total time 282309
  • Video-M3 side
API Average (cycles)
FrameQ_alloc 11144
FrameQ_put 22232
FrameQ_get 17475
FrameQ_free 19561
Total time 70412

API profiling (Inter Ducati)

  • Frame transfer using SFQ between the Video-M3 and the VPSS-M3 with Notify enabled
API Average (cycles)
FrameQ_alloc 6960
FrameQ_put 12129
FrameQ_get 8422
FrameQ_free 9297
Total time 36808

RingIO

Data transfer from ARM(QNX) to DSP
The numbers are captured while transferring 1 KB of data from the ARM to the DSP.

  • ARM
APIs Cycles
Create() 2125292
Open() 683912
Writer Acquire() 12757
Writer Release() 95480
Reader Acquire() 13554
Reader Release() 93288
SetAttributes() 16146
Close() 440327
Delete() 822050
  • DSP
APIs Cycles
Create() 123574
Open() 106421
Writer Acquire() 3823
Writer Release() 13012
Reader Acquire() 6222
Reader Release() 18284
SetAttributes() 7025
Close() 17022
Delete() 33844
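One way to read the two tables above is to separate one-time costs from costs paid on every transfer. The sketch below groups Create, Open, Close and Delete as one-time costs and treats an Acquire plus Release pair as the per-transfer cost for the writer and for the reader. That grouping is an interpretation made for illustration and is not stated on this page. SetAttributes is left out of both groups. The figures are copied from the tables; the function name `split_costs` is hypothetical.

```python
# RingIO API cycles for the ARM (QNX) to DSP transfer, copied from the tables above.
ARM = {
    "Create": 2125292, "Open": 683912,
    "Writer Acquire": 12757, "Writer Release": 95480,
    "Reader Acquire": 13554, "Reader Release": 93288,
    "SetAttributes": 16146, "Close": 440327, "Delete": 822050,
}
DSP = {
    "Create": 123574, "Open": 106421,
    "Writer Acquire": 3823, "Writer Release": 13012,
    "Reader Acquire": 6222, "Reader Release": 18284,
    "SetAttributes": 7025, "Close": 17022, "Delete": 33844,
}

def split_costs(table):
    """Return (one_time, writer_per_transfer, reader_per_transfer) in cycles."""
    one_time = sum(table[k] for k in ("Create", "Open", "Close", "Delete"))
    writer = table["Writer Acquire"] + table["Writer Release"]
    reader = table["Reader Acquire"] + table["Reader Release"]
    return one_time, writer, reader

print("ARM:", split_costs(ARM))
print("DSP:", split_costs(DSP))
```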

Data transfer from ARM(QNX) to Video-M3
The numbers are captured while transferring 1 KB of data from the ARM to the Video-M3.

  • ARM
APIs Cycles
Create() 1290883
Open() 752882
Writer Acquire() 11162
Writer Release() 90696
Reader Acquire() 14352
Reader Release() 92291
SetAttributes() 15946
Close() 446108
Delete() 829426


  • Video-M3
APIs Cycles
Create() 209626
Open() 158388
Writer Acquire() 7579
Writer Release() 25481
Reader Acquire() 10213
Reader Release() 28678
SetAttributes() 13882
Close() 35678
Delete() 69684

Proc Manager

DSP
The time taken to load and start the DSP image from the ARM (QNX) is captured. The size of the DSP image is 9.45 MB, and the file is loaded from an SD card (class 4).

APIs Cycles
Proc load 95168112
Proc start 80132


Video-M3
The time taken to load and start the Video-M3 image from the ARM (QNX) is captured. The size of the Video-M3 image is 6.8 MB, and the file is loaded from an SD card (class 4).

APIs Cycles
Proc load 61843366
Proc start 77142
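Because the two images differ in size, the load figures are easier to compare once they are normalised. The sketch below divides the Proc load cycles by the image size given in the text to get cycles per MB, and also shows how small Proc start is relative to Proc load. It stays in cycles on purpose, since this page does not state which clock the counts refer to. The names `LOADS` and `cycles_per_mb` are hypothetical.

```python
# Image size (MB) and Proc Manager cycles, copied from the DSP and Video-M3
# sections above.
LOADS = {
    "DSP": {"size_mb": 9.45, "load": 95168112, "start": 80132},
    "Video-M3": {"size_mb": 6.8, "load": 61843366, "start": 77142},
}

def cycles_per_mb(entry):
    """Proc load cycles divided by the image size in MB."""
    return entry["load"] / entry["size_mb"]

for name, entry in LOADS.items():
    ratio = entry["load"] / entry["start"]
    print(f"{name}: {cycles_per_mb(entry):.0f} cycles/MB, "
          f"load is {ratio:.0f}x start")
```

Since the files are read from an SD card, the load figure presumably includes storage access time as well as the work done by Proc Manager itself; that split is not measured here.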