Cortex-A8 Features
From Texas Instruments Embedded Processors Wiki
Cortex-A8 Highlights
- Dual-issue, in-order, superscalar architecture delivering high performance
- First implementation of the ARMv7 instruction-set architecture, including the Advanced SIMD media instructions (NEON™)
- Advanced dynamic branch prediction
- Integrated, 256 KB unified L2 cache
- Dedicated, low-latency, high-bandwidth interface to L1 cache
- NEON™ : 64/128-bit Hybrid SIMD Engine for Multimedia
- Supports both Integer and Floating Point SIMD
- Enhanced VFPv3: doubles the number of double-precision registers and adds new instructions to convert between fixed and floating point
- Efficient Run Time Compilation Target
- Jazelle-RCT: Target for Java. Memory footprint reduced up to 3x
- Can also target languages such as Microsoft .NET MSIL, Perl, Python
Superscalar Cortex-A8 Core
- In-order dual instruction issue
- less complex than out-of-order
- fewer structures means lower power
- less need for custom design
- can maintain high IPC with
- fully symmetric ALU pipelines
- all critical forwarding paths supported
- dual-issue of dependent instruction pairs
- Static scheduling with instruction replay on memory stall
- low-power consumption due to early availability of gate enables
- fire-and-forget instruction issue removes critical paths from the design
- Net result
- high-frequency design with performance comparable to out-of-order designs, at in-order clock frequency and power consumption
- Average CPI of 0.9 across 150+ ARM and industry benchmarks
Cortex-A8 Technologies
| Cortex-A8 Technologies | Description |
| --- | --- |
| TrustZone Security | Device Integrity / Secure Transactions |
| Jazelle RCT Acceleration / Thumb-2EE Instruction Set | Fast & Responsive Java Applications |
| Thumb-2 Instruction Set | Greater Performance With Less Code Size |
| NEON™ Advanced SIMD (+ VFPv3) | Enhanced Multimedia Experience |
| Superscalar ARMv7 Core | Highest-performance mobile processor |
NEON: Advanced SIMD
- 64/128-bit Hybrid SIMD architecture
- A single instruction performs the same operation on multiple elements that are packed within registers
- Independent Register file with 2 aliased views:
- 32 x 64-bit registers (D0-D31)
- 16 x 128-bit registers (Q0-Q15)
- Integer and SP Floating-point processing
- 8, 16, 32, 64-bit Integers
- Single-precision Floating-point
- Encoded in ARM and Thumb-2
- Accelerates audio, video, and 3D-graphics
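The two aliased register views can be illustrated with a small model. This is a sketch in Python, not NEON code: it assumes a little-endian layout in which Q<n> overlays D<2n> (low half) and D<2n+1> (high half), and the helper names `write_d`, `read_d` and `read_q` are hypothetical.

```python
import struct

# Toy model of the NEON register file: 256 bytes that can be viewed
# either as 32 x 64-bit D registers or as 16 x 128-bit Q registers.
regfile = bytearray(32 * 8)

def write_d(n, value):
    """Store a 64-bit unsigned integer into D<n>."""
    struct.pack_into("<Q", regfile, n * 8, value)

def read_d(n):
    """Read D<n> as a 64-bit unsigned integer."""
    return struct.unpack_from("<Q", regfile, n * 8)[0]

def read_q(n):
    """Read Q<n> as a 128-bit unsigned integer (same bytes as D<2n>, D<2n+1>)."""
    return int.from_bytes(regfile[n * 16:(n + 1) * 16], "little")

# Writing D2 and D3 changes what Q1 reads, because they share storage.
write_d(2, 0x1111111111111111)
write_d(3, 0x2222222222222222)
print(hex(read_q(1)))
```

In this model there is only one block of storage; the D and Q names are just two ways of slicing it.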
NEON: SIMD Instructions
- NEON™ Instructions are based on “Packed SIMD” processing
- Registers are considered as vectors of elements of the same data type
- Data types can be: signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit, single-precision float
- Instructions perform the same operation in all lanes
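As a rough illustration of packed SIMD, the sketch below models a 64-bit register as a Python integer and applies a wrap-around add independently in every lane. It is a behavioural model only; the function names are hypothetical and the add is meant in the spirit of a lane-wise integer add, not as a description of any specific instruction encoding.

```python
def split_lanes(reg, lane_bits, reg_bits=64):
    """Split a register value into a list of lanes, lowest lane first."""
    mask = (1 << lane_bits) - 1
    return [(reg >> (i * lane_bits)) & mask for i in range(reg_bits // lane_bits)]

def join_lanes(lanes, lane_bits):
    """Pack a list of lanes (lowest first) back into one register value."""
    mask = (1 << lane_bits) - 1
    reg = 0
    for i, lane in enumerate(lanes):
        reg |= (lane & mask) << (i * lane_bits)
    return reg

def vadd(a, b, lane_bits, reg_bits=64):
    """Add corresponding lanes modulo 2**lane_bits; no carry crosses a lane."""
    mask = (1 << lane_bits) - 1
    lanes_a = split_lanes(a, lane_bits, reg_bits)
    lanes_b = split_lanes(b, lane_bits, reg_bits)
    return join_lanes([(x + y) & mask for x, y in zip(lanes_a, lanes_b)], lane_bits)

a, b = 0x01FF, 0x0101
print(hex(vadd(a, b, 8)))   # 8-bit lanes: 0xFF + 0x01 wraps to 0x00 -> 0x200
print(hex(vadd(a, b, 16)))  # 16-bit lanes: 0x01FF + 0x0101 -> 0x300
```

The same register bits give different results depending on the element type, which is why the data type is part of each SIMD instruction.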
SIMD Load/Store Structure
- Native support for structures
- e.g. complex numbers, pixels, coordinates
- Memory treated as an array of structures (AoS)
- Eliminates ‘shuffling’ overhead
- Optimised memory access as single transfer
- Data arranged for efficient SIMD processing
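The effect of a structure load can be sketched as a de-interleave: an array of structures in memory ends up as one register per field. The Python below is a model of that data movement only, assuming 3-byte RGB pixels as the structure; `load_structures` and `store_structures` are hypothetical helper names, not NEON instructions.

```python
def load_structures(memory, n_fields):
    """De-interleave an array of n_fields-element structures.

    Returns one list per field, so field k of every structure
    lands in the same "register".
    """
    return [list(memory[k::n_fields]) for k in range(n_fields)]

def store_structures(registers):
    """Re-interleave per-field lists back into array-of-structures order."""
    out = bytearray()
    for element in zip(*registers):
        out.extend(element)
    return bytes(out)

# Four RGB pixels stored as R,G,B,R,G,B,... in memory.
pixels = bytes([10, 20, 30, 11, 21, 31, 12, 22, 32, 13, 23, 33])
r, g, b = load_structures(pixels, 3)
print(r, g, b)
```

Once the fields are separated, an operation such as "scale every red value" is a single lane-wise operation on `r`, with no shuffling step beforehand.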
Key NEON Capabilities
- Two Integer 64-bit ALUs operating in parallel
- Can perform 128-bit length equivalent ALU operation in 1 cycle
- 64-bit datapath with data types up to 128 bits
- Supports 128-bit data streaming from both L1D$ and L2$
- Byte permute function allows for on-the-fly data shuffling
- Two Integer Multipliers of 32x16
- Each can perform one 32x16, two 16x16 or four 8x8 operations in a single pass
- Support 32x32 operation in two passes
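The two-pass 32x32 multiply follows from splitting one operand into 16-bit halves. The sketch below shows only the arithmetic identity for unsigned values; how the hardware actually sequences the passes is not described in this page, and the function names are hypothetical.

```python
def mul32x16(a, b16):
    """One pass of a 32x16 unsigned multiplier (operands range-checked)."""
    if not (0 <= a < 2**32 and 0 <= b16 < 2**16):
        raise ValueError("operand out of range for a 32x16 multiply")
    return a * b16

def mul32x32_two_pass(a, b):
    """Build a 32x32 unsigned product from two 32x16 passes.

    a * b = a * b_lo + (a * b_hi) * 2**16, where b = b_hi * 2**16 + b_lo.
    """
    low_pass = mul32x16(a, b & 0xFFFF)
    high_pass = mul32x16(a, b >> 16)
    return low_pass + (high_pass << 16)

a, b = 0xDEADBEEF, 0x12345678
print(mul32x32_two_pass(a, b) == a * b)
```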
Thumb-2 Instruction Set
- Combined 32-bit and 16-bit instruction set:
- 16-bit instructions include the original Thumb instruction set
- Some new 16-bit instructions for key code size wins
- Virtually all instructions in the ARM ISA are available in Thumb-2
- Can, in principle, stand alone as a complete ISA
- Unified assembly language for ARM and Thumb-2; the same source can target either ISA
- Conditional execution made available via the IT (If-Then) instruction
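To make the IT instruction more concrete, the sketch below builds the 16-bit encoding of an IT block header from its Then/Else pattern. The encoding rule used here (0xBF00, a 4-bit first condition, and a 4-bit mask ending in a terminating 1) is our understanding of the ARMv7 encoding and is not taken from this page, so check it against the ARM Architecture Reference Manual before relying on it. The function name `encode_it` is hypothetical.

```python
# Condition-code field values used by ARM conditional execution.
COND = {"EQ": 0x0, "NE": 0x1, "CS": 0x2, "CC": 0x3, "MI": 0x4, "PL": 0x5,
        "VS": 0x6, "VC": 0x7, "HI": 0x8, "LS": 0x9, "GE": 0xA, "LT": 0xB,
        "GT": 0xC, "LE": 0xD}

def encode_it(pattern, cond):
    """Encode IT<pattern> <cond>, e.g. encode_it("TE", "EQ") for ITTE EQ.

    pattern holds the T/E letters after "IT" (0 to 3 of them); the first
    instruction in the block is always a "Then".
    """
    if len(pattern) > 3 or any(c not in "TE" for c in pattern):
        raise ValueError("pattern must be up to three letters from 'T'/'E'")
    first = COND[cond]
    low = first & 1
    # Each extra instruction contributes cond[0] for Then, its inverse for Else.
    bits = [low if c == "T" else low ^ 1 for c in pattern]
    bits.append(1)                    # terminating 1 marks the block length
    bits += [0] * (4 - len(bits))     # pad the mask to four bits
    mask = 0
    for bit in bits:
        mask = (mask << 1) | bit
    return 0xBF00 | (first << 4) | mask

print(hex(encode_it("", "EQ")))   # IT EQ
print(hex(encode_it("E", "NE")))  # ITE NE
```

An IT block covers up to four following instructions, which is how 16-bit Thumb instructions regain the conditional execution that ARM instructions carry in every encoding.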
Jazelle RCT Acceleration
- Beneficial to Java and a wide range of emerging languages
- Microsoft .NET MSIL, Perl, Python etc
- Enables high performance in smallest memory footprint
- Optimal balance between speed and code density with run-time compilers
- Low cost and low power
- Less than 8K gates and small memory footprint result in lower power
Thumb-2EE: Basis of Jazelle RCT
- Thumb-2EE (Thumb-2 Execution Environment) is a variant of Thumb-2 with instructions to support JIT and AOT runtime compilers
- Targets any OO bytecode language such as Java and MS .NET IL
- 16-bit instructions for common AOT/JIT compilation routines
- Smaller code
- Smaller code size means recompiled methods can be kept in memory
- Less recompilation means faster performance and no start up delays
Memory System on Cortex-A8
- Harvard Level 1 caches: 16 KB each, 4-way set associative
- single-cycle load-use penalty
- Virtually indexed, physically tagged (VIPT)
- Level 1 Data cache is blocking
- Non-NEON read misses in the cache cause replay of subsequent instructions
- Reduces complexity in later pipeline stages
- Good for power and clock frequency
- NEON data is not allocated to L1 (but is read or updated in L1 if necessary)
- Integrated 256 KB unified Level 2 cache, 8-way set associative
- Dedicated low-latency, high-bandwidth interface to the Level 1 cache
- Line length of 64 bytes
- Physically indexed, physically tagged (PIPT)
- Minimum latency of 8 cycles
- Streams to the NEON processing unit; up to 16 GByte/s bandwidth
- 128-bit data streaming from both L1D$ and L2$
- 64-bit AMBA AXI interconnect to external memory
- Supports multiple outstanding memory transactions to minimize memory latencies
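The cache figures above can be turned into set counts and address-bit splits with a little arithmetic. The sketch below is illustrative only: the 64-byte line length is stated here for L2 and is assumed for L1, and the bandwidth calculation assumes one 128-bit transfer per cycle with 1 GByte taken as 10^9 bytes.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Derive set count and address-bit split for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2 for powers of two
        "index_bits": sets.bit_length() - 1,
        "way_bytes": size_bytes // ways,
    }

# L1: 16 KB, 4-way (from the list above); 64-byte L1 line is an assumption.
l1 = cache_geometry(16 * 1024, 4, 64)
# L2: 256 KB, 8-way, 64-byte lines (all from the list above).
l2 = cache_geometry(256 * 1024, 8, 64)

# 128 bits = 16 bytes per transfer; the clock rate at which one transfer
# per cycle would reach the quoted 16 GByte/s peak:
implied_clock_hz = 16 * 10**9 // (128 // 8)

print(l1)
print(l2)
print(implied_clock_hz)
```

Under these assumptions each L1 way is 4 KB, so the offset and index bits together span 12 bits. With 4 KB pages (also an assumption) those bits are identical in the virtual and physical address, which is what makes a virtually indexed, physically tagged L1 straightforward.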
TrustZone Security
- TrustZone adds a parallel world to run secure OS and applications
- Normal and Secure worlds have different memory views, enforced by hardware
- Memory tagged as secure and non-secure by the system
- Only the CPU running in the Secure world can access secure memory and peripherals
- Secure Monitor is a software “gatekeeper” between the two worlds
- Device integrity, Digital Rights Management, Electronic payment, etc
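The access rule described above can be modelled in a few lines. This is a toy sketch of the policy, not of the hardware mechanism: the memory map, addresses, and the names `Region`, `AccessFault` and `check_access` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int
    secure: bool  # tag assigned by the system, not by the running software

class AccessFault(Exception):
    """Raised when an access violates the world separation or hits no region."""

# Hypothetical memory map with one secure and one non-secure region.
MEMORY_MAP = [
    Region("secure_ram", 0x40000000, 0x10000, secure=True),
    Region("normal_ram", 0x80000000, 0x100000, secure=False),
]

def check_access(regions, addr, cpu_secure):
    """Return the region hit by addr, or raise if the access is not allowed."""
    for region in regions:
        if region.base <= addr < region.base + region.size:
            if region.secure and not cpu_secure:
                raise AccessFault(f"normal world denied access to {region.name}")
            return region
    raise AccessFault(f"no region at {addr:#x}")

# The Secure world sees both regions; the Normal world sees only normal_ram.
print(check_access(MEMORY_MAP, 0x40000000, cpu_secure=True).name)
print(check_access(MEMORY_MAP, 0x80000000, cpu_secure=False).name)
```

In a real system the Normal world would reach secure services only by going through the Secure Monitor, which this model leaves out.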
OMAP ARM Cores Performance Dhrystone V2.1