The recently launched 64-bit ARM architecture offers several technical advances over its predecessor. Here are some of the benefits of ARMv8-A compared with ARMv7-A.
Every A64 instruction has a fixed length of 32 bits, with operands and immediate values held in contiguous bit fields. This regular layout simplifies the decode tables in hardware and also makes it easier for JIT compilers to generate code quickly, which matters for high-performance applications. Because each instruction can be decoded independently, more advanced branch prediction techniques become possible as well.
The number of general-purpose registers has also increased. The virtual register renaming pool introduced in the Cortex-A9 automatically unrolled small loops in hardware, but it did not give the compiler the additional registers it needs for better scheduling. As a result, implementing commonly used complex algorithms efficiently in software remained difficult. The A64 ISA therefore provides thirty-one 64-bit general-purpose registers.
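As a rough illustration of what a fixed 32-bit encoding with contiguous bit fields looks like to software, the Python sketch below splits one A64 instruction word into its fields. It assumes the published field layout of the ADD (immediate) instruction; the function name decode_add_imm is hypothetical, and the sample word 0x91001020 is the encoding of "add x0, x1, #4".

```python
def decode_add_imm(word):
    """Split a 32-bit A64 ADD (immediate) instruction word into its fields.

    Hypothetical helper for illustration only; it handles just this one
    instruction form and rejects everything else.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("A64 instructions are exactly 32 bits wide")
    # Bits 30:23 identify ADD (immediate) without flag setting.
    if (word >> 23) & 0xFF != 0b00100010:
        raise ValueError("not an ADD (immediate) encoding")

    sf = (word >> 31) & 0x1      # 1 = 64-bit registers, 0 = 32-bit
    sh = (word >> 22) & 0x1      # 1 = immediate is shifted left by 12
    imm12 = (word >> 10) & 0xFFF # 12-bit unsigned immediate
    rn = (word >> 5) & 0x1F      # 5-bit source register number
    rd = word & 0x1F             # 5-bit destination register number

    return {
        "width": 64 if sf else 32,
        "rd": rd,
        "rn": rn,
        "imm": imm12 << 12 if sh else imm12,
    }


if __name__ == "__main__":
    print(decode_add_imm(0x91001020))
```

Each register field is five bits wide, so it can name 32 values: the 31 general-purpose registers plus register number 31, which is treated as the zero register or the stack pointer depending on the instruction.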
The ISA is also simpler than before. More in line with the original RISC goals of the ARM ISA, the new version removes the LDM/STM (load/store multiple) instructions, which drove cost and complexity in implementing an efficient memory system for the processor. Similarly, there are fewer conditional instructions, because their implementation complexity did not deliver a proportional benefit. Also, the floating-point unit is here to stay, at least for the near future. Software therefore no longer needs to check for its existence and can rely on consistent underlying hardware.
The instruction set of the SIMD data engine has been revised for the new 64-bit world. It adds double-precision floating-point processing to the existing SIMD capability, takes a simplified approach to targeted algorithms, and is aligned with the IEEE 754-2008 standard.
Advanced SIMD is a media and signal processing architecture whose instructions are targeted primarily at multimedia workloads such as audio, video, 3-D graphics, image, and speech processing. The floating-point extension performs single-precision and double-precision FP operations.
Advanced SIMD and its associated implementations, along with the support software, are collectively known as NEON.
The fundamentals of the MMU remain the same in AArch64, where a 64KB page size is supported along with the legacy 4KB page size.
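To make the effect of the two page sizes concrete, here is a minimal Python sketch that splits a virtual address into a page number and an offset within the page. The helper name split_address and the sample address are assumptions made for illustration; real translation walks several table levels, which this sketch does not model.

```python
def split_address(vaddr, page_size):
    """Return (page_number, offset) for a virtual address.

    Hypothetical helper: page_size must be a power of two, for example
    4 * 1024 or 64 * 1024 bytes.
    """
    offset_bits = page_size.bit_length() - 1
    if page_size <= 0 or page_size != 1 << offset_bits:
        raise ValueError("page size must be a power of two")
    # The low bits select a byte inside the page, the rest select the page.
    return vaddr >> offset_bits, vaddr & (page_size - 1)


if __name__ == "__main__":
    addr = 0x12345678
    for size in (4 * 1024, 64 * 1024):
        page, offset = split_address(addr, size)
        print(f"{size // 1024}KB pages: page {page:#x}, offset {offset:#x}")
```

With 4KB pages the offset takes the low 12 bits of the address, while with 64KB pages it takes the low 16 bits, so the same address space is covered by fewer, larger pages.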
A 32-bit application has a 4GB address space. Virtual address spaces from 2^32 to 2^48 bytes in size are supported at both the top and the bottom of the 64-bit address space.
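The arithmetic behind these sizes can be sketched in a few lines of Python. The function name va_ranges is hypothetical, and the sketch assumes for simplicity that the bottom range starts at address 0, the top range ends at the last 64-bit address, and both ranges have the same size; on real hardware the two ranges can be sized separately.

```python
def va_ranges(va_bits):
    """Return ((low_start, low_end), (high_start, high_end)) for a
    virtual address space of 2**va_bits bytes at each end of the
    64-bit address space. Hypothetical helper for illustration."""
    if not 32 <= va_bits <= 48:
        raise ValueError("sizes from 2^32 to 2^48 bytes are described here")
    size = 1 << va_bits
    top = 1 << 64
    return (0, size - 1), (top - size, top - 1)


if __name__ == "__main__":
    for bits in (32, 48):
        (lo_s, lo_e), (hi_s, hi_e) = va_ranges(bits)
        print(f"{bits}-bit: bottom {lo_s:#018x}-{lo_e:#018x}, "
              f"top {hi_s:#018x}-{hi_e:#018x}")
```

For 48 bits, the bottom range runs from 0x0000000000000000 to 0x0000FFFFFFFFFFFF and the top range from 0xFFFF000000000000 to 0xFFFFFFFFFFFFFFFF, each 256 TiB in size.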
ARM hardware debug support can be divided into two basic categories:
Halting Debug view is not backwards compatible with ARMv7
The ARM Cortex-A57 is an implementation of the 64-bit ARMv8 architecture that supports one to four cores per cluster, across multiple clusters. With Level 1 caches of 32KiB for data and 48KiB for instructions, these processors also feature a low-latency, configurable L2 cache (up to 2MB). The DSP and NEON SIMD extensions are mandatory for each core, driving 20-50% better performance in floating-point calculations.
Similarly, the Cortex-A53 CPU uses a simple pipeline in a smaller configuration that targets efficient operating points while still delivering high performance. It is more power-efficient than its predecessors. The A53 has several important features, including virtualization and a large memory reach of up to 256 TB. It is also highly scalable, which means that it can even be set up in combination with the A57.