PRU Linux-based Example Code

From Texas Instruments Wiki

Overview

The PRU_SW project contains example code demonstrating basic tasks executed by the PRU subsystem. This page describes each example and illustrates the interaction between the ARM, PRU, peripherals, and memory. Note that these examples use the ARM processor and the PRU Application Linux Loader to initialize the PRU, so they may differ from the examples on the DSP.

Building and installing examples

The following describes how to build and run the PRU Linux-based examples.

1. Build the UIO kernel driver.
2. Compile the PRUSSDRV user space library.
3. Compile the eDMA driver.
$host cd pru_sw/trunk/peripheral_lib/eDMA_driver
$host vi Rules.make
Edit Rules.make file and make sure variables are set properly. KERNEL_DIR should point to the kernel source directory.
Example path to this directory: /home/<user>/OMAP_L138_arm_x_xx_xx_xx/DaVinci-PSP-SDK-03.20.00.12/src/kernel/linux-03.20.00.12
$host cd module
$host make
$host cp edma.ko <filesystem>
4. Compile the eDMA user space library.
$host cd pru_sw/trunk/peripheral_lib/eDMA_driver/interface
$host make
5. Compile example applications
$host cd pru_sw/trunk/example_apps
$host make (pass in variables)
$host cp bin/* <filesystem>
6. Run example applications on target platform.

Example descriptions

PRU_access_const_table

The PRU uses its internal constant table to write data from external DDR memory into HPI registers. This example demonstrates accessing both fixed and programmable entries in the constant table. As an initialization step, the ARM writes the value 0x00000001 to the first location of external DDR memory, and the PRU configures the PSC (Power and Sleep Controller) to enable the HPI clock. Note that the PRU_pscConfig instruction set could be combined with PRU_access_const_table, since memory space on the PRU is not an issue.

The PRU then reads this value from DDR using entry 31 of the constant table. Entry 31, which corresponds to DDR, is a programmable entry; the example shows how it is programmed through the constant table programmable pointer register 1 (CTPPR_1). The value 0x00000001 fetched from DDR is placed in the PWREMU_MGMT register in the HPI region, which sets the FREE bit of PWREMU_MGMT. The value of HPIC is then read from address 0x01E10030 by using an offset (0x30) with entry 15 of the constant table (the base address for UHPI configuration), and is stored in the second location of DDR memory.
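The address arithmetic behind a constant table access can be sketched in C as a base address plus an offset. This is only a model, not PRU code: the function and macro names are hypothetical, and the base value for entry 15 is inferred from the HPIC address (0x01E10030) and offset (0x30) quoted above rather than taken from a register map.

```c
#include <stdint.h>

/* Assumed base held in constant table entry 15 (UHPI configuration),
   inferred from HPIC = 0x01E10030 at offset 0x30. */
#define CT_ENTRY15_UHPI_BASE 0x01E10000u
#define HPIC_OFFSET          0x30u

/* Hypothetical helper: effective address of a constant table access. */
uint32_t const_table_address(uint32_t entry_base, uint32_t offset)
{
    return entry_base + offset;
}
```

With these assumptions, const_table_address(CT_ENTRY15_UHPI_BASE, HPIC_OFFSET) yields the HPIC address 0x01E10030. A programmable entry such as entry 31 works the same way, except that its base comes from CTPPR_1 instead of being fixed.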

The diagram below illustrates the basic interaction between the ARM, PRU, and peripherals.

"test title"

PRU_ARMtoPRU_Interrupt

The host ARM interrupts the PRU by generating a system event that the PRU is polling. The PRU polls for interrupts by writing into its SRSR2 registers. Once the PRU completes the event handling, it informs the ARM that the interrupt was serviced: it loads a score value that the ARM originally stored in external DDR memory, decrements it by 1, and writes the new score value to shared L3 memory. The ARM polls for this value in L3 memory to verify that the example executed successfully.

The diagram below illustrates the basic interaction between the ARM, PRU, and memory.

PRU ARMtoPRU Interrupt.png

PRU_edmaConfig

This example code demonstrates EDMA configuration by the PRU. In the initialization, the PRU configures the PSC (Power and Sleep Controller) to enable the EDMA clocks. Note this instruction set (PRU_pscConfig) could be combined with PRU_edmaConfig, as memory space is not an issue. The ARM allocates a DMA channel and stores the source address, destination address, and channel number into PRU DRAM. The PRU0 configures and initializes the EDMA. The PRU1 polls for the EDMA interrupt notifying the completion of the EDMA transfer and sets a flag in memory. The ARM verifies the values stored at the destination address and the interrupt flag.

The diagram below illustrates the basic interaction between the ARM, PRU, eDMA, and memory.

PRU edmaConfig.png

PRU_gpioToggle

The PRU controls the GPIO output by writing to R30. The PRU configures the pin mux registers for PRU_R30[30, 31] outputs as an initialization step. The example toggles bits 30-31 of R30 by inverting each bit. The ARM compares the original and final values of R30 to verify that bits 30 and 31 were toggled. The PRU then resets the pin mux registers to their original values.
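The toggle and the ARM-side check can be modeled in C as an XOR with a two-bit mask. This is a sketch of the logic only; the function names are hypothetical, and on the PRU the toggle is applied directly to register R30.

```c
#include <stdbool.h>
#include <stdint.h>

#define R30_TOGGLE_MASK 0xC0000000u   /* bits 30 and 31 */

/* Invert bits 30 and 31, leaving the other bits unchanged. */
uint32_t toggle_r30_bits(uint32_t r30)
{
    return r30 ^ R30_TOGGLE_MASK;
}

/* ARM-side check: true if both bit 30 and bit 31 differ between
   the original and final values (other bits are not examined). */
bool bits_were_toggled(uint32_t before, uint32_t after)
{
    return ((before ^ after) & R30_TOGGLE_MASK) == R30_TOGGLE_MASK;
}
```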

PRU_mem1DTransfer

The PRU executes a simple 1-D byte array system memory transfer. As an initialization step, the ARM writes an array of random byte-sized values into a source address in L3 memory. The ARM then stores the source address, destination address, and number of bytes to transfer into PRU's DRAM. The PRU then transfers the bytes from the source address to the destination address. The ARM compares the values at the source and destination address to verify the transfer was successful.
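The hand-off between the ARM and the PRU can be sketched as a small parameter block followed by a byte loop. The struct layout and function names below are hypothetical; the source does not specify how the three parameters are laid out in PRU DRAM.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block the ARM would store for the PRU. */
typedef struct {
    const uint8_t *src;   /* source address */
    uint8_t       *dst;   /* destination address */
    size_t         count; /* number of bytes to transfer */
} transfer_params_t;

/* PRU side (modeled): copy one byte at a time. */
void pru_transfer_1d(const transfer_params_t *p)
{
    for (size_t i = 0; i < p->count; i++) {
        p->dst[i] = p->src[i];
    }
}

/* ARM side (modeled): nonzero if source and destination match. */
int transfer_matches(const transfer_params_t *p)
{
    return memcmp(p->src, p->dst, p->count) == 0;
}
```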

The diagram below illustrates the basic interaction between the ARM, PRU, and memory.

PRU mem1DTransfer.png



PRU_memAccessL3andDDR

The PRU reads three values from external DDR memory and stores these values in shared L3 using programmable constant table entries. The example initially loads 3 values into the external DDR RAM. The PRU configures its Constant Table Programmable Pointer Register 1 (CTPPR_1) to point to appropriate locations in the DDR memory and the L3 memory. The values are then read from the DDR memory and stored into the L3 memory using the values in the 30th and 31st entries of the constant table.

The diagram below illustrates the basic interaction between the ARM, PRU, and memory.

PRU memAccessL3andDDR.png

PRU_memAccessPRUDataRam

The PRU reads and stores values in the PRU Data RAM. PRU Data RAM has an address in both the local data memory map and the global memory map. The example accesses the local Data RAM in two ways: through its local address, using a base address held in a register, and through its global address, using entries in the constant table.
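The relationship between the two views of the same memory can be sketched as a fixed offset. The global base below is an assumption, inferred from the PRU0 DRAM addresses (0x01c300xx) that appear in the PRU_memCopy throughput data on this page; the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed global base address of PRU0 Data RAM. */
#define PRU0_DRAM_GLOBAL_BASE 0x01C30000u

/* Local (PRU-view) address to global (system-view) address. */
uint32_t pru0_local_to_global(uint32_t local_addr)
{
    return PRU0_DRAM_GLOBAL_BASE + local_addr;
}

/* Global (system-view) address back to local (PRU-view) address. */
uint32_t pru0_global_to_local(uint32_t global_addr)
{
    return global_addr - PRU0_DRAM_GLOBAL_BASE;
}
```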

The diagram below illustrates the basic interaction between the ARM, PRU, and memory.

PRU memAccessPRUDataRam.png

PRU_memCopy

This example executes optimized memory-to-memory transfers on the PRU. The following three cases are tested:

Case 1: Source and destination addresses are aligned.
Case 2: Source and destination addresses are not aligned, but their offsets from the next aligned address are the same.
Case 3: Source and destination addresses are not aligned, and their offsets from the next aligned address are not the same.

Note that for a 32-bit processor, aligned addresses are multiples of 4. For example, 0x0000, 0x0004, and 0x0008 are aligned addresses, whereas 0x0001 and 0x0002 are not; they have some offset from an aligned address.

Below are example addresses for each case:

Case 1: Source address – 0x0004 and destination address – 0x0010 (both are aligned)
Case 2: Source address – 0x0001 and destination address – 0x0011 (not aligned, but the offset from the aligned address is the same, i.e. 1)
Case 3: Source address – 0x0001 and destination address – 0x0012 (not aligned, and the offset from the aligned address is not the same, i.e. 1 for the source address and 2 for the destination address)
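The three cases can be told apart from the low two bits of each address. The sketch below is a C model with hypothetical names, not the PRU assembly; it compares offsets from the previous aligned address, which are equal exactly when the offsets to the next aligned address are equal.

```c
#include <stdint.h>

typedef enum {
    COPY_CASE_ALIGNED     = 1, /* both addresses are multiples of 4 */
    COPY_CASE_SAME_OFFSET = 2, /* unaligned, equal offsets */
    COPY_CASE_DIFF_OFFSET = 3  /* offsets differ */
} copy_case_t;

copy_case_t classify_copy(uint32_t src_addr, uint32_t dst_addr)
{
    uint32_t src_off = src_addr & 3u;  /* offset within a 4-byte word */
    uint32_t dst_off = dst_addr & 3u;

    if (src_off == 0u && dst_off == 0u) {
        return COPY_CASE_ALIGNED;
    }
    if (src_off == dst_off) {
        return COPY_CASE_SAME_OFFSET;
    }
    return COPY_CASE_DIFF_OFFSET;
}
```

Applied to the example addresses above, the pairs (0x0004, 0x0010), (0x0001, 0x0011), and (0x0001, 0x0012) classify as cases 1, 2, and 3 respectively.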

The algorithm executed on the PRU is as follows:

   1. Load base address in r10.
   2. Load next 4 bytes (size) in r1.
   3. Load next 4 bytes (source address) in r2.
   4. Load next 4 bytes (destination address) in r3.
   5. Jump to label next if source and destination addresses have the same offset from the aligned address 
   6. Jump to case 1 if source and destination addresses are aligned.
   7. Calculate the number of single-byte transfers. (If the size of the memory block to transfer is not a multiple of 4, the remainder bytes are transferred one at a time.)
   8. Loop 5 transfers 4 bytes at a time. Loop 6 transfers 1 byte at a time. If the total block is transferred then halt.
   9. If the source and destination are not aligned and their offsets are the same (case 2), then find the total number of single-byte transfers.
  10. First transfer the offset bytes one at a time (loop 2). Then transfer 4 bytes at a time (loop 3). Finally, transfer the remainder bytes one at a time (loop 4).
         If the total block is transferred then halt.
  11. If the source and destination addresses are not aligned and their offsets are not the same (case 3), then transfer all bytes one at a time (loop 1).
         If the total block is transferred then halt.
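The overall strategy of the steps above can be sketched in C. This is a model under stated assumptions rather than a translation of the PRU assembly: the function names are hypothetical, the loop labels from the list are not reproduced, and a 4-byte memcpy stands in for a word-sized load and store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes one at a time. */
static void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n--) {
        *dst++ = *src++;
    }
}

void pru_style_memcopy(uint8_t *dst, const uint8_t *src, size_t size)
{
    size_t src_off = (size_t)((uintptr_t)src & 3u);
    size_t dst_off = (size_t)((uintptr_t)dst & 3u);

    /* Case 3: offsets differ, so every byte is copied singly. */
    if (src_off != dst_off) {
        copy_bytes(dst, src, size);
        return;
    }

    /* Case 2: copy single bytes up to the next aligned address.
       For case 1 the offset is 0 and this copies nothing. */
    size_t head = (4u - src_off) & 3u;
    if (head > size) {
        head = size;
    }
    copy_bytes(dst, src, head);
    dst += head;
    src += head;
    size -= head;

    /* Aligned middle: 4 bytes at a time. */
    for (size_t words = size / 4u; words > 0; words--) {
        uint32_t word;
        memcpy(&word, src, 4);
        memcpy(dst, &word, 4);
        src += 4;
        dst += 4;
    }

    /* Remainder bytes, one at a time. */
    copy_bytes(dst, src, size % 4u);
}
```

The design point is that word-sized transfers are only possible when both pointers reach an aligned address at the same time, which is why case 3 falls back to byte transfers for the whole block.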


Throughput data:

*src_addr & dst_addr are in PRU0 DRAM

        Block size (bytes)  src_addr    dst_addr    PRU total cycles  PRU stall cycles  PRU total cycles/byte
Case 1  16                  0x01c30020  0x01c30040  68                30                ~5
Case 2  16                  0x01c30021  0x01c30041  113               54                ~7
Case 3  16                  0x01c30021  0x01c30042  208               102               13

 

*src_addr & dst_addr are in PRU0 DRAM

        Block size (bytes)  src_addr    dst_addr    PRU total cycles  PRU stall cycles  PRU total cycles/byte
Case 1  16                  0x80000000  0x80000020  96                58                6
Case 2  16                  0x80000001  0x80000021  162               103               ~10
Case 3  16                  0x80000001  0x80000022  320               214               20

 

NOTE: PRU is running at 150MHz
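The cycles/byte figures and the time implied by the 150 MHz clock follow from simple arithmetic, sketched below in C. The function names are hypothetical, and the throughput helper is an added convenience rather than a figure reported in the tables.

```c
/* Average PRU cycles spent per byte transferred. */
double cycles_per_byte(unsigned cycles, unsigned bytes)
{
    return (double)cycles / (double)bytes;
}

/* Elapsed time in microseconds: cycles divided by the clock in MHz. */
double transfer_time_us(unsigned cycles, double clock_mhz)
{
    return (double)cycles / clock_mhz;
}

/* Throughput in bytes per microsecond (10^6 bytes per second). */
double throughput_mbytes_per_s(unsigned bytes, unsigned cycles,
                               double clock_mhz)
{
    return (double)bytes * clock_mhz / (double)cycles;
}
```

For the last row of the second table (16 bytes in 320 cycles), cycles_per_byte(320, 16) gives 20, transfer_time_us(320, 150.0) is about 2.13 microseconds, and throughput_mbytes_per_s(16, 320, 150.0) gives 7.5.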

PRU_miscOperations

This example demonstrates miscellaneous operations on the PRU, including masking to extract bits, bubble sorting, and thresholding. The ARM initializes the example by storing a 32-bit value at a source address in L3 memory, and verifies the results stored in memory after the PRU has completed the instruction set. The three operations are described below.

1. Masking to extract bits:
Four-bit values are extracted from a 32-bit number using masks and bit-shift operations. In the example, a 32-bit value is placed in a source location in L3 memory. The PRU reads the source data, extracts the eight 4-bit fields of the 32-bit number, and stores each extracted field (rounded up to a byte) in PRU1 Data memory.
2. Bubble sorting:
The array of extracted bits is used as the initial array to demonstrate implementation of a bubble sort algorithm. The result of the algorithm is stored in the DDR memory.
3. Thresholding:
The sorted array is then thresholded by applying a cut-off value and converted to an array of zeros and ones. The subsequent thresholded result is stored in PRU0 Data memory.
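The three operations can be sketched in C as follows. Several details are assumptions because the source does not state them: the 4-bit fields are extracted least-significant first, the sort is ascending, and a value maps to 1 only when it is strictly greater than the cut-off. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 1. Masking: split a 32-bit value into eight 4-bit fields,
      one per byte, least-significant field first (assumed order). */
void extract_nibbles(uint32_t value, uint8_t out[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)((value >> (4 * i)) & 0xFu);
    }
}

/* 2. Bubble sort, ascending (assumed direction). */
void bubble_sort(uint8_t *a, size_t n)
{
    for (size_t pass = 0; pass + 1 < n; pass++) {
        for (size_t i = 0; i + 1 < n - pass; i++) {
            if (a[i] > a[i + 1]) {
                uint8_t tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}

/* 3. Thresholding: 1 if the value exceeds the cut-off, else 0
      (strict comparison is an assumption). */
void threshold(const uint8_t *in, uint8_t *out, size_t n, uint8_t cutoff)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (in[i] > cutoff) ? 1u : 0u;
    }
}
```

For the input 0x3A71C05E, the extracted fields are E, 5, 0, C, 1, 7, A, 3; sorting gives 0, 1, 3, 5, 7, A, C, E; and a cut-off of 6 turns that into 0, 0, 0, 0, 1, 1, 1, 1.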


PRU_multiply

This example uses a common multiply macro to multiply two 16-bit numbers on the PRU. The resulting product is a 32-bit number. The ARM writes two 16-bit numbers into PRU DRAM. The PRU uses a multiply macro to multiply the two 16-bit numbers and store their product into DRAM. The ARM then verifies the result stored in DRAM.
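The source does not show the multiply macro itself. As an illustration only, the C sketch below uses shift-and-add, one common way to multiply on a core without a hardware multiplier; whether the actual macro works this way is an assumption, and the function name is hypothetical.

```c
#include <stdint.h>

/* Multiply two 16-bit numbers into a 32-bit product using
   shift-and-add (illustrative; not taken from the PRU macro). */
uint32_t multiply_u16(uint16_t a, uint16_t b)
{
    uint32_t product = 0;
    uint32_t addend = a;      /* a shifted left by the current bit index */
    uint32_t multiplier = b;

    while (multiplier != 0u) {
        if (multiplier & 1u) {
            product += addend;
        }
        addend <<= 1;
        multiplier >>= 1;
    }
    return product;
}
```

The largest case, 0xFFFF times 0xFFFF, produces 0xFFFE0001, which confirms that a 32-bit result is wide enough for any pair of 16-bit operands.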

PRU_PRUtoARM_Interrupt

The PRU generates an interrupt to the host ARM. The PRU configures the PRU INTC registers and connects system event 34 to channel 2, which in turn is hooked to host port 2. Host port 2 generates the SYS_EVT0 for the ARM. The PRU follows the event out mapping procedure by writing to its own status register 31 to generate an internal system event. After generating the event, the PRU stores a value into external DDR memory. The ARM clears a flag upon receiving the SYS_EVT0 interrupt and verifies the value in DDR memory.

The diagram below illustrates the basic interaction between the ARM, PRU, and memory.

PRU PRUtoARM Interrupt.png

PRU_PRUtoPRU_Interrupt

This example illustrates how the two PRUs can communicate by interrupting each other during a process. PRU0 configures the PRU INTC registers and connects system event 32 to channel 0, which in turn is hooked to host port 0. PRU0 then generates system event 32 by writing into its R31 register, which sends an interrupt to PRU1, which is polling for it. On receiving the interrupt, PRU1 performs its task and sets an external flag in DDR memory. Once the task is done, PRU1 interrupts PRU0 using system event 33: it first maps this system event to channel 1 and channel 1 to host 1, and then writes into its R31 register. PRU0 polls for the interrupt and acknowledges the completion of the task by setting another flag in DDR memory. The ARM checks the flag values in DDR memory to verify the example was successful.
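The two-level routing used here (system event to channel, channel to host) can be modeled in C with two lookup tables. This is an abstract model, not the INTC register interface: the type and function names are hypothetical, and the table sizes (64 events, 10 channels and hosts) are assumptions.

```c
#include <stdint.h>

#define INTC_NUM_EVENTS   64     /* assumed number of system events */
#define INTC_NUM_CHANNELS 10     /* assumed number of channels and hosts */
#define INTC_UNMAPPED     0xFFu

typedef struct {
    uint8_t event_to_channel[INTC_NUM_EVENTS];
    uint8_t channel_to_host[INTC_NUM_CHANNELS];
} intc_model_t;

void intc_init(intc_model_t *m)
{
    for (int i = 0; i < INTC_NUM_EVENTS; i++) {
        m->event_to_channel[i] = INTC_UNMAPPED;
    }
    for (int i = 0; i < INTC_NUM_CHANNELS; i++) {
        m->channel_to_host[i] = INTC_UNMAPPED;
    }
}

/* Returns 0 on success, -1 if an index is out of range. */
int intc_map_event(intc_model_t *m, unsigned event, unsigned channel)
{
    if (event >= INTC_NUM_EVENTS || channel >= INTC_NUM_CHANNELS) {
        return -1;
    }
    m->event_to_channel[event] = (uint8_t)channel;
    return 0;
}

/* Returns 0 on success, -1 if an index is out of range. */
int intc_map_channel(intc_model_t *m, unsigned channel, unsigned host)
{
    if (channel >= INTC_NUM_CHANNELS || host >= INTC_NUM_CHANNELS) {
        return -1;
    }
    m->channel_to_host[channel] = (uint8_t)host;
    return 0;
}

/* Host that a system event reaches, or -1 if it is not fully mapped. */
int intc_route(const intc_model_t *m, unsigned event)
{
    if (event >= INTC_NUM_EVENTS) {
        return -1;
    }
    uint8_t channel = m->event_to_channel[event];
    if (channel == INTC_UNMAPPED) {
        return -1;
    }
    uint8_t host = m->channel_to_host[channel];
    return (host == INTC_UNMAPPED) ? -1 : (int)host;
}
```

In this model, the mappings from this example are event 32 to channel 0 to host 0 and event 33 to channel 1 to host 1; PRU_PRUtoARM_Interrupt corresponds to event 34 to channel 2 to host 2.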

The diagram below illustrates the basic interaction between the ARM, PRU, and memory.

PRU PRUtoPRU Interrupt.png

PRU_semaphore

This example demonstrates the PRU and ARM sharing memory access via a semaphore. The L3 shared memory is accessed by the PRU and ARM in succession using flags and memory address pointers also stored in L3 memory. The PRU accesses the L3 memory location which stores the number of memory reads and the address of the source and destination locations. When the flag is set to 1, the PRU then stores the values from L3 memory (source) into external DDR memory (destination). When finished accessing the memory, the PRU resets the flag to 0 and stores the new source address and the number of reads for the ARM to follow. This process continues with the read size (amount of L3 memory read from the source location) increasing by 0x4 each iteration until the terminating condition (read size = 0x70) is satisfied.
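The flag-based hand-off can be sketched as a single-threaded C simulation. Much of the detail here is assumed rather than taken from the source: the control block layout, the initial read size chosen by the caller, the ARM side simply handing the buffer back, and the use of buffer offsets in place of real L3 and DDR addresses. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define READ_SIZE_STEP  0x4u    /* growth per iteration */
#define READ_SIZE_LIMIT 0x70u   /* terminating read size */

/* Hypothetical control block kept in shared L3 memory. */
typedef struct {
    uint32_t flag;        /* 1: PRU may access, 0: ARM may access */
    uint32_t read_size;   /* bytes to read this iteration */
    uint32_t src_offset;  /* offset into the L3 source buffer */
    uint32_t dst_offset;  /* offset into the DDR destination buffer */
} shared_ctrl_t;

/* PRU turn: if the flag is 1, copy, publish the next source address
   and read size, then reset the flag. Returns 1 if work was done. */
int pru_turn(shared_ctrl_t *c, const uint8_t *l3, uint8_t *ddr)
{
    if (c->flag != 1u) {
        return 0;
    }
    memcpy(ddr + c->dst_offset, l3 + c->src_offset, c->read_size);
    c->src_offset += c->read_size;
    c->dst_offset += c->read_size;
    c->read_size += READ_SIZE_STEP;
    c->flag = 0u;
    return 1;
}

/* ARM turn (assumed behaviour): hand the buffer back to the PRU
   unless the terminating read size has been reached. */
int arm_turn(shared_ctrl_t *c)
{
    if (c->flag != 0u || c->read_size >= READ_SIZE_LIMIT) {
        return 0;
    }
    c->flag = 1u;
    return 1;
}

/* Alternate turns until the terminating condition; returns rounds. */
uint32_t run_handshake(shared_ctrl_t *c, const uint8_t *l3, uint8_t *ddr)
{
    uint32_t rounds = 0;
    while (arm_turn(c)) {
        pru_turn(c, l3, ddr);
        rounds++;
    }
    return rounds;
}
```

Starting from a read size of 4, this model runs 27 rounds (read sizes 4 through 0x6C) before the read size reaches 0x70, so the buffers passed in must hold at least 1512 bytes.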

PRU_timer2Interrupt

This example shows the PRU detecting a system event interrupt from a system timer. The Timer2 interrupt within the PRU INTC requires the control signal PRUSSEVTSEL to be enabled. Refer to the PRUSS System Events [0:31] Assignments Table for more details. The ARM sets this control signal mux by writing to the system configuration module's KICK0R, KICK1R, and CFGCHIP3 registers. The ARM also configures and enables Timer2. The PRU then maps the event for Timer2 to channel 0 and channel 0 to host 0, and polls for the Timer2 interrupt. Upon detecting the interrupt, the PRU stores a value in the PRU DRAM, which the ARM checks to verify that the PRU successfully received and processed the interrupt.

The diagram below illustrates the basic interaction between the ARM, PRU, timer, and memory.

PRU timer2Interrupt.png