Cortex-A8 Neon Architecture
From Texas Instruments Embedded Processors Wiki
NEON Block Diagram
NEON Hardware Features
- 16-entry instruction queue
- Dual-view register file
  - 32 x 64-bit
  - 16 x 128-bit
- 6-stage execution pipeline
  - Integer
  - Single-precision floating point
  - Load/store/permute
- Non-pipelined IEEE vector floating-point support
- 12-entry load data queue
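The dual-view register file can be pictured with a small host-side model. This is a minimal sketch, not hardware-accurate code: the type `neon_regfile` and the helpers `write_d` and `read_q` are hypothetical names introduced here for illustration. It relies only on the architectural aliasing rule that quadword register Qn overlays doubleword registers D(2n) (low half) and D(2n+1) (high half).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one block of storage, viewed either as 32 x 64-bit D registers
 * or as 16 x 128-bit Q registers. Names are illustrative, not from ARM. */
typedef struct {
    uint64_t d[32];   /* D0..D31 */
} neon_regfile;

/* Write one 64-bit D register. */
static void write_d(neon_regfile *rf, unsigned n, uint64_t value)
{
    assert(n < 32);
    rf->d[n] = value;
}

/* Read Qn as two 64-bit halves: low half is D(2n), high half is D(2n+1). */
static void read_q(const neon_regfile *rf, unsigned n,
                   uint64_t *lo, uint64_t *hi)
{
    assert(n < 16);
    *lo = rf->d[2 * n];
    *hi = rf->d[2 * n + 1];
}
```

For example, after writing D2 and D3 with `write_d`, calling `read_q` on Q1 returns those same two values as its low and high halves, since both views share storage.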
NEON Interface Diagram
NEON is skewed late in the pipeline, past the retire point:
- reduces interface complexity
- exception handling not required
- decoupling queues from integer machine
- removes load-use penalty
- negative impact on NEON -> ARM transfers
- nonblocking ARM register file helps hide latency
Streaming to and from L2 memory system
- up to 8 outstanding transactions
- can receive 128 bits/cycle
- can receive data from L1 or L2 memory system
- independent NEON store buffer
NEON Media Engine Unit
- Instruction issue
  - static scheduling with fire-and-forget issue
  - 1 load/store (LS) instruction plus 1 integer (NINT) or floating-point (NFP) instruction can issue each cycle
- Execution pipelines
  - All pipelines are 64-bit SIMD
  - Floating-point MAC is executed using both the FADD and FMUL pipelines
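The issue rule above (one LS instruction paired with one NINT or NFP instruction per cycle) can be written as a simple predicate. This is a deliberately simplified sketch: the enum `neon_class` and the function `can_dual_issue` are hypothetical names, and the model ignores data dependencies, multi-cycle instructions, and any other pairing restrictions the real scheduler applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Coarse instruction classes taken from the list above (illustrative only). */
typedef enum { NEON_LS, NEON_NINT, NEON_NFP } neon_class;

/* Returns true when two instructions fit the "1 LS + 1 NINT/NFP" rule:
 * exactly one of them must use the load/store/permute slot. */
static bool can_dual_issue(neon_class a, neon_class b)
{
    assert(a <= NEON_NFP && b <= NEON_NFP);
    bool a_is_ls = (a == NEON_LS);
    bool b_is_ls = (b == NEON_LS);
    return a_is_ls != b_is_ls;
}
```

Under this model a load paired with an integer operation can issue together, while two loads, or an integer operation paired with a floating-point operation, cannot.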
Data Movement: NEON and Integer Unit
- Treated like loads and stores, and therefore handled by the load/store/permute pipeline
- Uses VMOV instructions
- Separate 32-bit buses between the Load Store Unit and NEON
- Load data is placed in the NEON load data queue
- NEON stalls in M2 if load data queue is not valid
- NEON sends store data along with the integer logical register address
NEON System Registers
FPEXC Register
- Accessed through the VMRS/VMSR instructions (FMRX/FMXR in the older assembler syntax)
- Setting the EN bit to 1 activates the NEON and VFP coprocessor. Reset clears EN to 0.
- Accessible in privileged modes only
The cp10 and cp11 fields in the CP15 c1 ‘Coprocessor Access Control Register’ control access to the NEON and VFP coprocessor
- Reset clears the cp10 and cp11 fields and disables the NEON and VFP clocks
- Accessible in privileged modes only
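The two enable steps above can be sketched as bit manipulations on the register values. This is a host-runnable sketch under stated assumptions: the helper names `cpacr_enable_neon`, `cpacr_field`, and `fpexc_enable` are hypothetical, and the code only computes the values to be written. The field positions used (cp10 at bits [21:20], cp11 at bits [23:22], and FPEXC.EN at bit 30) follow the ARMv7-A register layout; the actual register accesses, shown only in comments, must run in a privileged mode on the target.

```c
#include <assert.h>
#include <stdint.h>

#define CPACR_FULL_ACCESS 0x3u        /* 0b11: privileged and user access */
#define FPEXC_EN          (1u << 30)  /* FPEXC enable bit */

/* Extract the 2-bit access field for coprocessor 10 or 11.
 * The field for coprocessor n sits at bits [2n+1:2n]. */
static uint32_t cpacr_field(uint32_t cpacr, unsigned cp)
{
    assert(cp == 10 || cp == 11);
    return (cpacr >> (2 * cp)) & 0x3u;
}

/* Step 1: grant full access to cp10 and cp11.
 * On target: MRC p15, 0, r0, c1, c0, 2   (read the register)
 *            ORR r0, r0, #0x00F00000
 *            MCR p15, 0, r0, c1, c0, 2   (write it back)
 *            ISB                          */
static uint32_t cpacr_enable_neon(uint32_t cpacr)
{
    cpacr |= CPACR_FULL_ACCESS << 20;  /* cp10 */
    cpacr |= CPACR_FULL_ACCESS << 22;  /* cp11 */
    return cpacr;
}

/* Step 2: set FPEXC.EN.
 * On target: MOV r0, #0x40000000
 *            VMSR FPEXC, r0               */
static uint32_t fpexc_enable(uint32_t fpexc)
{
    return fpexc | FPEXC_EN;
}
```

The order matters in this sketch: access to cp10 and cp11 is granted first, because FPEXC itself is reached through the coprocessor interface and cannot be written while access is denied.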