PRU Read Latencies
The PRU is a scalar processor that executes instructions sequentially. With the exception of memory read instructions, all PRU instructions execute in a single cycle. The execution time of a read instruction, however, varies with memory access latency. Because subsequent instructions do not execute until the read completes, this variable latency can affect time-sensitive operations and applications.
This article discusses the hardware latencies associated with PRU-initiated memory reads.
PRU Instruction Execution Time
PRU Write Instruction
The PRU write instruction is a fire-and-forget command that executes in ~1 cycle.
PRU Read Instruction
The PRU read instruction executes in ~2 cycles, plus additional latency from traversing interconnect layers and from variable system processing loads.
As discussed in the ARM + PRU SoC Architecture section below, memory-mapped registers (MMRs) that are "closer" to the PRU (i.e., within the PRU subsystem) have lower access latencies.
Other PRU Instructions
All other PRU instructions execute in a single cycle.
Device-specific Read Latency Values
The read latency values at the following links are "best-case" values: they account for the 2-cycle read instruction plus the latency introduced by the interconnect.
ARM + PRU SoC Architecture
The PRU subsystem is a master initiator that can access its local subsystem resources as well as all SoC resources. Figure 1 (below) shows a generic view of the SoC architecture and how the PRU subsystem fits into it.
Local PRU Subsystem Resources
The PRU accesses local subsystem resources through a Local Memory Map, which routes accesses only over the local 32-bit Interconnect Bus inside the subsystem. Because these resources sit close to the PRU core, and the Local Memory Map confines the access path to the PRU subsystem, access latencies are minimized.
External SoC Resources
To access SoC resources outside the PRU subsystem, PRU accesses must go through one or more external interconnect layers. In other words, an access first crosses the local 32-bit Interconnect Bus and then varying levels of L3/L4 interconnects outside the subsystem before reaching the resource. This path is much longer than the PRU's path to local subsystem resources, so access latencies are longer. In addition, the access latency for external resources is nondeterministic, because it varies with system processing loads.