PRU Read Latencies

From Texas Instruments Wiki

Introduction

The PRU is a scalar processor that executes instructions sequentially. With the exception of memory read instructions, all PRU instructions execute in a single cycle. The execution time of a read instruction, however, varies with memory access latency. Because subsequent instructions do not execute until the read completes, this stall can affect time-sensitive operations and applications.

This article discusses the hardware latencies associated with PRU-initiated memory reads.


PRU Instruction Execution Time

PRU Write Instruction

The PRU write instruction is a fire-and-forget command that executes in ~1 cycle.

PRU Read Instruction

The PRU read instruction executes in ~2 cycles, plus additional latency from traversing interconnect layers and from variable system processing loads.

As discussed in the ARM + PRU SoC Architecture section below, memory-mapped registers (MMRs) that are "closer" to the PRU (i.e., within the PRU subsystem) have lower access latencies.
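The read cost described above can be turned into a quick back-of-the-envelope estimate. This is a minimal sketch, assuming the interconnect latency for the target address is already known in PRU cycles; the function names and the `interconnect_cycles` parameter are hypothetical, not part of any TI API. The 200 MHz clock used in the example comment is an assumption (typical of AM335x-class PRU-ICSS); substitute your device's PRU clock.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed cost of a PRU read instruction (~2 cycles, per the text above). */
#define PRU_READ_BASE_CYCLES 2u

/* Total read cost in cycles: the fixed instruction cost plus the latency
 * added by the interconnect path (0 gives an idealized lower bound). */
static uint32_t pru_read_cycles(uint32_t interconnect_cycles)
{
    return PRU_READ_BASE_CYCLES + interconnect_cycles;
}

/* Convert a cycle count to nanoseconds for a PRU clock given in MHz.
 * ASSUMPTION for illustration: at 200 MHz, one cycle is 5 ns. */
static double pru_cycles_to_ns(uint32_t cycles, double clk_mhz)
{
    assert(clk_mhz > 0.0);
    return (double)cycles * 1000.0 / clk_mhz;
}
```

Keeping the conversion separate from the cycle count lets the same estimate be reused across devices whose PRU clocks differ.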

Other PRU Instructions

All other PRU instructions execute in a single cycle.
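Taken together, these rules give a simple timing model for straight-line PRU code: writes and all other instructions cost about one cycle each, while each read costs ~2 cycles plus its interconnect latency, during which nothing else executes. The sketch below assumes a branch-free instruction sequence and a known per-read interconnect latency; the `insn_kind`/`insn` types and `estimate_cycles` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction classes for a cycle-count estimate. */
typedef enum { INSN_OTHER, INSN_WRITE, INSN_READ } insn_kind;

typedef struct {
    insn_kind kind;
    uint32_t interconnect_cycles; /* only used for INSN_READ */
} insn;

/* Estimate cycles for a straight-line (branch-free) sequence. Because the
 * PRU is scalar and stalls until each read completes, the total is the sum
 * of the per-instruction costs. */
static uint32_t estimate_cycles(const insn *seq, size_t n)
{
    uint32_t total = 0;
    assert(seq != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (seq[i].kind) {
        case INSN_READ:
            /* ~2 cycles plus interconnect latency; later instructions wait */
            total += 2u + seq[i].interconnect_cycles;
            break;
        case INSN_WRITE: /* fire-and-forget: ~1 cycle */
        case INSN_OTHER: /* all other instructions: 1 cycle */
        default:
            total += 1u;
            break;
        }
    }
    return total;
}
```

For example, three single-cycle instructions plus one read with 10 cycles of interconnect latency cost 3 + (2 + 10) = 15 cycles, so a single external read can easily dominate a short time-critical loop.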


Device-specific Read Latency Values

The read latency values at the following links are "best-case" values, accounting for the 2-cycle read instruction and the latency introduced by the interconnect.


ARM + PRU SoC Architecture

The PRU subsystem is a master initiator with access to its local subsystem resources as well as all SoC resources. Figure 1 (below) shows a generic view of the SoC architecture and how the PRU subsystem fits into it.

Local PRU Subsystem Resources

The PRU accesses local subsystem resources through its Local Memory Map, which routes these accesses only over the local 32-bit Interconnect Bus inside the subsystem. Because the resources sit close to the PRU core and the access path never leaves the PRU subsystem, access latencies are minimized.
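Whether an access stays on the local bus depends only on the address it targets, so a firmware author can check this up front. This is a minimal sketch, assuming an AM335x-style PRU-ICSS in which local-map addresses below 0x0008_0000 stay inside the subsystem and higher addresses leave through the system interconnect; the `PRU_LOCAL_MAP_LIMIT` value and both function names are assumptions to verify against your device's Technical Reference Manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: upper bound of the PRU local memory map (AM335x-style
 * PRU-ICSS). Addresses at or above this go out to L3/L4 interconnects.
 * Check the value for your device before relying on it. */
#define PRU_LOCAL_MAP_LIMIT 0x00080000u

/* True if a read of `addr` is served over the subsystem's local bus. */
static bool pru_addr_is_local(uint32_t addr)
{
    return addr < PRU_LOCAL_MAP_LIMIT;
}

/* True if an entire `len`-byte block starting at `addr` is local, e.g. for
 * a multi-byte burst read. Written so that addr + len cannot overflow. */
static bool pru_range_is_local(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    return addr < PRU_LOCAL_MAP_LIMIT && len <= PRU_LOCAL_MAP_LIMIT - addr;
}
```

Checking whole ranges rather than single addresses catches buffers that straddle the boundary, where part of the transfer would take the longer external path.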

SoC Resources

To access SoC resources outside the PRU subsystem, PRU accesses must pass through one or more external interconnect layers: first the local 32-bit Interconnect Bus, then varying levels of L3/L4 interconnects outside the subsystem, before reaching the resource. This path is much longer than the PRU's path to local subsystem resources, so access latencies are longer. In addition, the access latency for external resources is nondeterministic, because it varies with system processing load.
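Because external read latency varies with system load, a single measurement says little; summarizing repeated measurements shows both the best case and how much the latency moves. This is a minimal sketch, assuming the caller has already collected read-latency samples in PRU cycles (for example, by timestamping reads with the PRU cycle counter); `latency_summary` and `summarize_latency` are hypothetical names. The observed maximum is only what happened during the test run, not a guaranteed worst-case bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min_cycles;    /* best case observed */
    uint32_t max_cycles;    /* worst case observed (not a guaranteed bound) */
    uint32_t spread_cycles; /* max - min: how much the latency varied */
} latency_summary;

/* Summarize read-latency samples (in PRU cycles) collected by the caller. */
static latency_summary summarize_latency(const uint32_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    latency_summary s = { samples[0], samples[0], 0 };
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < s.min_cycles) s.min_cycles = samples[i];
        if (samples[i] > s.max_cycles) s.max_cycles = samples[i];
    }
    s.spread_cycles = s.max_cycles - s.min_cycles;
    return s;
}
```

Running the same summary once for a local address and once for an external one, under a realistic system load, makes the difference between the two access paths visible in the spread as well as in the minimum.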

Figure 1. Generic ARM + PRU SoC block diagram