What Intel needs is SVE-like variable width SIMD-AVX to solve hybrid problem

What are AVX, AVX512?

AVX or “Advanced Vector eXtensions” is a family of SIMD instruction sets on x86/x64 that extends the older SSE/SSE2 (Streaming SIMD Extensions), which in turn extended the original x86 SIMD instruction set, MMX (usually expanded as “Multi-Media eXtensions”). AVX widened floating-point vectors to 256-bit, AVX2 extended 256-bit operation to integers, and AVX512 increases the width to 512-bit. [Note: this is a simplification, there are many sub-extensions/sets in AVX512 that extend the original foundation (AVX512-F) instruction set]

What are SVE, SVE2?

SVE or “Scalable Vector Extension” is a family of SIMD instructions on ARM that follows on from the original ARM SIMD instruction set, NEON (which remains available alongside it). Unlike most SIMD instruction sets, SVE/2 are variable width: the vector width depends on the actual implementation, and the same code is designed to run at different widths depending on that implementation (assuming no bugs in the code 😉).

For current mobile/tablet/phone SoCs, SVE/2 width is the same as NEON, 128-bit, but future implementations could go as high as 2048-bit! Thus, unlike on x86, there will be no need for SVE-256, SVE-512 or higher for a core to provide a wider SIMD implementation.
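To make the "same code, different widths" idea concrete, here is a minimal sketch in plain Python that simulates a vector-length-agnostic loop. It is not real SVE code: the function name `vla_add`, the `vector_bits` parameter and the 32-bit element size are assumptions made purely for illustration.

```python
def vla_add(a, b, vector_bits, element_bits=32):
    """Add two equal-length lists the way a vector-length-agnostic loop would.

    `vector_bits` stands in for the width the hardware reports at run time;
    the loop body never hard-codes it.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    lanes = vector_bits // element_bits  # lanes per simulated vector register
    if lanes < 1:
        raise ValueError("vector must hold at least one element")
    out = [0] * len(a)
    iterations = 0
    i = 0
    while i < len(a):
        # Predicate: only lanes that still fall inside the array are active,
        # so the final partial vector needs no separate scalar tail loop.
        active = min(lanes, len(a) - i)
        for lane in range(active):
            out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
        iterations += 1
    return out, iterations


a = list(range(70))
b = [10] * 70
for width in (128, 512, 2048):
    result, steps = vla_add(a, b, width)
    print(width, steps, result == [x + 10 for x in a])
```

The result is identical at every width; only the number of loop iterations changes, which is where a wider implementation would gain its performance.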

What is Intel’s hybrid problem?

Intel’s current hybrid designs “AlderLake” (ADL, 12th gen), the soon to be released “RaptorLake” (RPL, 13th gen), etc. contain big/P “Core” cores that support AVX512, while the LITTLE/E “Atom” cores do not. To keep instruction set support the same across all cores (“homogeneity”), AVX512 is disabled on the Core cores, thus missing out on up to 40% performance on compute-intensive SIMD algorithms. This can be observed by disabling the LITTLE/E Atom cores: certain BIOSes (ASUS?) then allow AVX512 to be enabled on the big/P Core cores.
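The homogeneity constraint can be pictured as a set intersection: software only gets to use what every enabled core supports. The sketch below is a toy model, not how firmware or an operating system actually enumerates features; the feature lists are abbreviated and the names `P_CORE`, `E_CORE` and `exposed_features` are made up for illustration.

```python
# Abbreviated, illustrative feature lists (not a complete CPUID dump).
P_CORE = {"SSE2", "AVX", "AVX2", "AVX512F"}
E_CORE = {"SSE2", "AVX", "AVX2"}


def exposed_features(enabled_cores):
    """Return the features that every enabled core supports."""
    if not enabled_cores:
        return set()
    return set.intersection(*enabled_cores)


hybrid = exposed_features([P_CORE, P_CORE, E_CORE, E_CORE])
p_only = exposed_features([P_CORE, P_CORE])

print("AVX512F" in hybrid)  # E cores enabled
print("AVX512F" in p_only)  # E cores disabled
```

With the E cores in the mix, AVX512F drops out of the common set; with only P cores enabled, it comes back, which matches the BIOS behaviour described above.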

Until Intel is able to deploy Atom cores that support AVX512 somehow – native, emulation/microcode, etc. – AVX512 will be missing even from flagship Intel hybrid designs.

Now that the competition (AMD) is releasing AVX512-supporting CPUs (Zen4, Series 7000) – this potentially leaves Intel with a serious issue. We will have to wait to see how they perform, but in compute-intensive SIMD algorithms Zen4 is unlikely to lose to hybrid Intel designs.

How could variable-width AVX (AVX-V?) help Intel hybrid designs?

Introducing a variable-width SIMD instruction set like SVE/2 – shall we call it AVX-V for “variable” – would allow Intel to solve the issue of mismatched instruction sets without forcing parity between cores. This way, the big/P “Core” cores could implement 512-bit width AVX while the LITTLE/E “Atom” cores could implement 256-bit or even 128-bit width SIMD.
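As a rough illustration of what that could look like from the software side, the sketch below simulates one loop running on a 512-bit P core, on a 128-bit E core, and on a thread that bounces between the two. Everything here is hypothetical: “AVX-V” does not exist, the function `avx_v_sum` and its `core_widths` parameter are invented for this example, and it assumes the width can be re-read on every iteration, which real hardware may or may not be able to offer.

```python
from itertools import cycle


def avx_v_sum(data, core_widths, element_bits=32):
    """Sum `data`, re-reading the (hypothetical) vector width every iteration.

    `core_widths` lists widths in bits; it is cycled through to mimic the
    thread landing on a different core from one iteration to the next.
    """
    if not core_widths or min(core_widths) < element_bits:
        raise ValueError("every core must fit at least one element per vector")
    widths = cycle(core_widths)
    total = 0
    steps = 0
    i = 0
    while i < len(data):
        lanes = next(widths) // element_bits  # width of the current core
        chunk = data[i:i + lanes]  # slicing clips at the end, like a predicate
        total += sum(chunk)
        i += lanes
        steps += 1
    return total, steps


data = list(range(100))
print(avx_v_sum(data, [512]))       # P core only
print(avx_v_sum(data, [128]))       # E core only
print(avx_v_sum(data, [512, 128]))  # bouncing between P and E cores
```

All three runs return the same sum; the step count is the only thing that differs, so the narrow core is slower but never incompatible.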

There would be no need for “Atom” cores to support AVX512 and carry 512-bit units (or even 256-bit units), so the LITTLE/E cores could really be made very small and very efficient (even though not particularly powerful).

Meanwhile, the (few) “Core” cores could implement even wider-width SIMD, e.g. 1024/2048/4096-bit especially in workstation/server versions for even higher SIMD performance, e.g. “Gold” 2048-bit, “Silver” 1024-bit and “Bronze”/normal 512-bit.

How could variable-width AVX (AVX-V?) help Intel workstation/server designs?

Currently, workstation and server designs contain only big/P “Core” cores in order to support AVX512. Including “Atom” LITTLE/E cores would not make sense as compute performance would be seriously impacted. At the same time, especially considering the current energy crisis, including LITTLE/E cores to handle low-compute, I/O and low-utilisation tasks (be they processes, VMs, etc.) might save significant amounts of energy and thus operating cost.

Thus, variable-width AVX would not only help current hybrid designs but also allow hybrid use in new markets, even workstation/server that traditionally might have shunned such designs.

Intel must now “copy” SVE as they copied big.LITTLE

Pablo Picasso is often quoted as saying: “Good artists copy, great artists steal”. Intel has done well bringing to the x86 world the hybrid designs that have been around on ARM for many, many years (as “big.LITTLE” or “DynamIQ”). Now, the solution to the many SIMD instruction sets of different widths (MMX, SSE/2, AVX/2, AVX512) is a single, variable-width SIMD instruction set (e.g. AVX-V) that is implementation-dependent, allowing different cores to run the same SIMD code at their native width for best performance (and efficiency).

This would allow very small/efficient “Atom” LITTLE/E(fficient) cores to implement narrow-width (e.g. 128-bit) SIMD while the big/P(erformance) “Core” cores implement much wider (e.g. 512-bit, 1024-bit or even wider in workstation/server) SIMD with the corresponding performance uplift.

It is time for x86/x64 to unify all the SIMD instruction sets into a single, variable-width (implementation-dependent) one. Let’s hope we get one. If not, ARM designs are now ready to take on both Intel and AMD.

It is time for x86/x64 to change or die.
