APX (Advanced Performance eXtensions): Talk in a decade or so

What is APX?

It is a future update to the x64 (EM64T/AMD64) instruction set architecture (ISA) proposed by Intel, called “Advanced Performance eXtensions”, that adds:

  • 32 vs. 16 64-bit general-purpose registers (GPRs) [aka 2x the current count]
  • 3-operand instruction format
  • conditional (aka predicated) instructions
  • optimised register save/restore instructions

As it is a somewhat fundamental change to the existing x64 ISA, it requires complete re-compilation of existing code and, naturally, compiler and tools updates. As a result, we don’t feel it will be useful in the short term.

Intel itself sees a more immediate improvement through run-time engines (e.g. JVM for Java, CLR for .Net) that run non-native byte-code programs (apps) and can internally be modified relatively easily to generate different x64 code using APX.

Is it a good idea?

While the x86/x64 architecture has a huge legacy overhead and could probably do with a redesign – one major strength is its compatibility with even very old code. All updates, from 16-bit to 32-bit and now to 64-bit, have been extensions that maintained hardware compatibility, so that older programs and entire operating systems could run on new hardware (drivers permitting) with no changes.

While the instruction set may be considered CISC (complex) – internally x86/x64 CPUs have long been RISC-like, with instructions decoded into the CPU’s internal micro-operations; in effect the x86/x64 instruction set could be considered a “high-level ISA” with the internal micro-operations the “low-level ISA”. Thus, while somewhat unwieldy, it is nothing that cannot be solved internally, within the CPU.

Are more registers a good idea?

x86/x64 has always had relatively few general-purpose registers (GPRs) compared to other (RISC) architectures; when AMD extended x86 to x64, it doubled their number (16 from the original 8) and naturally widened them to 64-bit (from 32-bit), thus effectively making the register file four times bigger. At the time, it would not have made (economic) sense to add 32, 64 or even 256, even though other architectures do have that many registers.

As mentioned, modern x86/x64 CPUs can and do have more (physical) internal registers (2x, 4x, 8x) that are mapped to the “logical” (architectural) GPRs using a technique called “register renaming” (aka shadowing). While not perfect, this technique has long allowed performance optimisations despite the relatively low number of registers in the x86/x64 ISA.

APX would allow updated compilers to generate improved code using the additional registers (32), but any (low-level) assembler code would need to be redesigned and re-written to take advantage of them. As we are talking about general-purpose code, entire programs or operating systems would need to be re-compiled for the updated ISA and would not run on older hardware. While developers would likely maintain dedicated code-paths for performance-critical code – these days that effort is reserved for vectorised compute code (e.g. AVX, AVX2, AVX512, etc.) and not general-purpose code.

Conclusion: unlikely to be useful for at least a decade after introduction, when all processors have supported it for many years and Windows XX mandates support for it. Perhaps we’ll also see optimised Linux distributions specifically compiled for APX processors.

But most likely nobody will use it and it will be abandoned…

Are 3-operand instructions a good idea?

x86/x64 has historically had 2-operand instructions, i.e. the result is stored into one of the source registers (e.g. A+B -> A), most likely to compensate for its instruction-encoding complexity and low number of registers. While flexibility is always appreciated, x86/x64 assembler programmers have learned to deal with it, and in many cases registers do need to be re-used anyway, which turns a 3-op instruction into a 2-op one (store to the same register).

Register renaming also helps here: “logical” registers that are destructively (over-)written are re-mapped internally to other physical registers, in effect transforming 2-op instructions into 3-op internal ones. Thus it is something modern CPUs already optimise – or shall we say mitigate – internally while maintaining compatibility.

Again, this requires full re-compilation to use, and any current (low-level) assembler code would need a complete redesign and re-write. High-performance compute code (likely vectorised) may thus be re-compiled to use APX, but that is unlikely for old code.

Conclusion: again unlikely to be useful in the short term, but good for future flexibility once support is widespread.

How about Conditional instructions?

Predication (aka conditional operations) is always appreciated, as jumps/branches will always be performance killers in any ISA. High-performance code (especially vectorised) must make extensive use of predication, and while compilers have always tried to use it (e.g. on IA64), high-level code support is also needed for best performance.

Thus, these instructions are welcome though again they apply to general-purpose code and thus would not have a big effect on heavily vectorised code.

What about Sandra’s Java and .Net benchmarks?

With updates to their run-time engines (i.e. JVM for Java, CLR for .Net), these benchmarks are likely the first to see benefits from APX, as they do not require any re-compilation: being architecture-agnostic byte-code, they are compiled to native x64 code when run.

Operating systems like Android, ChromeOS, etc. that use byte-code compiled apps will also see a performance improvement from APX once their run-time engines are updated.

Will Sandra be using APX?

As mentioned above, we do not have any plans to include specialised code paths for existing benchmarks to use APX in the short term.

The bulk of the benchmarks are heavily vectorised (AVX/AVX2/AVX512) and would not significantly benefit from APX.

However, legacy code benchmarks (e.g. Dhrystone / Whetstone) are likely to benefit, and it may indeed make for an interesting performance test that we will look into once APX is broadly supported.

Let’s meet again in 10 years (2033) and talk! 😉
