Mythical Intel 12th Gen Core AlderLake 40C/40T E-Cores (i11-12911E) – AVX2 Extrapolated Performance ;)

What is this “AlderLake-E”?

After considering whether the recently released AlderLake (ADL) would have been better with 10 big/P(erformance) Cores and AVX512 rather than hybrid – someone asked why not the other way round? Why not just use LITTLE/E(fficient) Atom cores and no big/P cores?

Note that Intel had exactly such products in the “Phi” line of (GP-GPU) accelerators, i.e. many Atom cores with the initial version of AVX512 and 4 threads per core. Later products (e.g. the 7000 series) increased the core count to 61C / 244T. Despite being used in many supercomputers, the line was discontinued by Intel in 2018.

Thus while many-core (MIC), wide (aka SIMD/VLIW) designs do have their uses, they don’t always make sense for general purpose computing. But since ADL’s Atom cores are far more advanced than those of the old “Phi”, perhaps such a design has a chance to perform well.

  • Intel says that 4x LITTLE/E Atom cluster = 1x big/P Core die size wise. We replace the 8 big/P Cores with 32 LITTLE/E Atom cores, which join the 8 Atom cores already on the die.
  • We thus have a 40 Cores / 40 Threads ADL-E(fficient) (no SMT, no AVX512)
  • Phi had up to 61C, thus 40C is not a crazy amount. AMD ThreadRipper even has 32 big cores / 64 threads.
  • Intel has sold server CPUs with many Atom cores (16C) for quite some time now
  • Intel claims the “Gracemont” Atom core is compute-equivalent to the old “Skylake” (SKL) big Core
  • We’ll call this i11: since it must be better than 9, it must go to 11 (yes, it’s a “This is Spinal Tap” reference 😉)
  • Still Gen 12, but we need to increase the product number from 900 to… 911?
  • Replacing the boring K(F) suffix with E (efficiency), we end up with the mythical “Intel Core i11-12911E”
  • It may not sound as cool as the 12999X, but it is imagined for efficiency
Mythical ADL-E with 40 LITTLE/E Atom cores / 40 Threads and NO big Cores
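
The core-count arithmetic above can be written out as a small sketch. It assumes Intel's stated 4:1 die-area ratio holds exactly and that the 8 E-cores already present are kept; the function and parameter names are illustrative only and not part of any tool.

```python
def all_e_core_count(p_cores: int, e_cores: int, e_per_p_area: int = 4) -> int:
    """E-core count after swapping every P-core for E-cores of equal die area."""
    # Assumption: one big/P core occupies the area of `e_per_p_area` Atom cores.
    return e_cores + p_cores * e_per_p_area


# Stock i9-12900K layout: 8 big/P cores + 8 LITTLE/E cores.
cores = all_e_core_count(p_cores=8, e_cores=8)
threads = cores  # the Atom cores have no SMT, so one thread per core
print(f"{cores}C / {threads}T")
```

With the stock 8C+8c layout this gives the 40C / 40T configuration used throughout the article.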


What would be the advantages of this mythical “AlderLake-E”?

With many Atom cores rather than few big “Core” cores we have somewhat different advantages:

  • No hybrid architecture required, thus no software changes, current software would just work.
  • No need for Windows 11! With some luck even Windows 7 might work just fine.
  • Many core designs lend themselves to server work (serving many clients) or converged virtualisation (hosting many server VMs each with a few dedicated cores). Not gaming, but still plenty of uses both home/office.
  • AVX2/FMA is still fine even for vectorised heavy compute code (after all AMD Zen3 only has AVX2/FMA3).
  • With VNNI/256, SHA and VAES/256 it would be a formidable crypto / AI (Artificial Intelligence) / ML (Machine Learning) accelerator.
  • 40 Atom cores, each with the power of a Skylake (SKL) Core, should deliver considerable raw compute, far beyond old Atom cores.

Hardware Specifications

We are comparing the mythical top-of-the-range Gen 12 Intel with competing architectures as well as competitors (AMD) with a view to upgrading to a top-of-the-range, high performance design.

| Specifications | Mythical Intel Core i11-12911E 40C/40T (ADL-E) Projected | Intel Core i9-12900K 8C+8c/24T (ADL) | Intel Core i9-11900K 8C/16T (RKL) AVX512 | AMD Ryzen 9 5950X 16C/32T (Zen3) | Comments |
|---|---|---|---|---|---|
| Arch(itecture) | Gracemont / AlderLake-E | Golden Cove + Gracemont / AlderLake | Cypress Cove / RocketLake | Zen3 / Vermeer | Still very latest arch |
| Cores (CU) / Threads (SP) | 40C / 40T | 8C+8c / 24T | 8C / 16T | 2M / 16C / 32T | More cores and threads than anybody |
| Rated Speed (GHz) | 2.4 | 3.2 big / 2.4 LITTLE | 3.5 | 3.4 | Base as E-cores |
| All/Single Turbo Speed (GHz) | 3.9 | 5.0 – 5.2 big / 3.7 – 3.9 LITTLE | 4.8 – 5.3 | 4.9 | Turbo as E-cores |
| Rated/Turbo Power (W) | 150 – 300? | 125 – 250 | 125 – 228 | 105 – 135 | We increase TDP a little |
| L1D / L1I Caches | 40x 32kB / 40x 64kB | 8x 48kB/32kB + 8x 64kB/32kB | 8x 48kB 12-way / 8x 32kB 8-way | 16x 32kB 8-way / 16x 32kB 8-way | Many L1D caches |
| L2 Caches | 10x 2MB (20MB) | 8x 1.25MB + 2x 2MB (14MB) | 8x 512kB 16-way (4MB) | 16x 512kB 16-way (8MB) | Huge L2 cache |
| L3 Cache(s) | 30MB 16-way | 30MB 16-way | 16MB 16-way | 2x 16MB 16-way (32MB) | Keep L3 the same |
| Microcode (Firmware) | 090672-0F [updated] | 090672-0F [updated] | 06A701-40 | 8F7100-1009 | Same microcode |
| Special Instruction Sets | VNNI/256, SHA, VAES/256 | VNNI/256, SHA, VAES/256 | AVX512, VNNI/512, SHA, IFMA52, VAES/512 | AVX2/FMA, SHA | Still modern ISA support |
| SIMD Width / Units | 256-bit | 256-bit | 512-bit (1x FMA) | 256-bit | Same width |
| Price / RRP (USD) | $799? | $599 | $539 | $799 | Let’s match AMD 5950X |

Disclaimer

This is an independent article that has not been endorsed nor sponsored by any entity (e.g. Intel). All trademarks acknowledged and used for identification only under fair use.

This article contains speculation with extrapolated results; there is NO such product! It also makes some assumptions and over-simplifications.

And please, don’t forget small ISVs like ourselves in these very challenging times. Please buy a copy of Sandra if you find our software useful. Your custom means everything to us!

Assumptions

  • Scaling from 8c/8t to 40c/40t (5x) would be linear in the compute heavy SIMD algorithms tested
  • The product would not be power limited though its power use may be higher with more cores
  • The product would maintain similar base/turbo clocks despite having more cores
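
A minimal sketch of the extrapolation these assumptions imply is shown below. The `scaling_efficiency` parameter is a hypothetical knob added for illustration (the article assumes perfectly linear scaling, i.e. 1.0), and the input score is a placeholder rather than a measured result.

```python
def extrapolate(measured_score: float,
                measured_cores: int = 8,
                target_cores: int = 40,
                scaling_efficiency: float = 1.0) -> float:
    """Project a multi-threaded score from `measured_cores` to `target_cores`.

    Assumes same clocks and no power limit, as in the assumptions above.
    scaling_efficiency = 1.0 means perfectly linear scaling (best case).
    """
    ratio = target_cores / measured_cores  # 40 / 8 = 5x
    return measured_score * ratio * scaling_efficiency


# Placeholder input, NOT a measured result: 100 units on 8 E-cores.
best_case = extrapolate(100.0)
pessimistic = extrapolate(100.0, scaling_efficiency=0.8)  # hypothetical 80% scaling
print(best_case, pessimistic)
```

Anything below perfect scaling, for example from shared caches or memory bandwidth, would only lower the projected scores, so the figures presented should be read as a best case.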

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets. The mythical “AlderLake-E” (ADL-E) does not support AVX512, just like normal ADL.

Note we only present those benchmarks that are compute heavy, vectorised and using AVX2/FMA3. Benchmarks that are memory latency or bandwidth sensitive are harder to extrapolate as the scaling is not linear, thus we would not expect all (benchmark) scores to just increase by the same ratio.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 11 x64, latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations.

| Native Benchmarks | Mythical Intel Core i11-12911E 40C/40T (ADL-E) Projected | Intel Core i9-12900K 8C+8c/24T big+LITTLE (ADL) | Intel Core i9-11900K 8C/16T (RKL) AVX512 | AMD Ryzen 9 5950X 16C/32T (Zen3) | Comments |
|---|---|---|---|---|---|
| BenchCpuMM Native Integer (Int32) Multi-Media (Mpix/s) | 679** [-60%] | 1,699 | 1,688* | 2,394 | Atom cores no match for big Cores |
| BenchCpuMM Native Long (Int64) Multi-Media (Mpix/s) | 258** [-63%] | 695 | 569* | 998 | With 64-bit integers, ADL-E is still 63% slower. |
| BenchCpuMM Native Quad-Int (Int128) Multi-Media (Mpix/s) | 51** [-61%] | 131 | 236*/*** | 198 | Using 64-bit int to emulate Int128 still slow |
| BenchCpuMM Native Float/FP32 Multi-Media (Mpix/s) | 633** [-69%] | 1,981 | 1,774* | 2,515 | In this floating-point vectorised test ADL-E is 69% slower than ADL. |
| BenchCpuMM Native Double/FP64 Multi-Media (Mpix/s) | 357** [-69%] | 1,126 | 998* | 1,470 | Switching to FP64 nothing much changes. |
| BenchCpuMM Native Quad-Float/FP128 Multi-Media (Mpix/s) | 18** [-66%] | 53.2 | 43.68 | 65.79 | Using FP64 to mantissa extend FP128 we’re 66% slower. |

With heavily vectorised SIMD workloads, even with AVX2/FMA3, the Atom cores are no match for big Cores from either Intel or AMD. They may be Skylake equivalent with legacy/non-SIMD code but even 40 of them (40C / 40T) won’t cut it.

It seems probable that AVX2/FMA3 are executed by 128-bit SIMD units similar to old AMD Zen cores (thus 1/2 rate). With full rate AVX512 and Hyper-threading (the “Phi” accelerator Atom cores supported 4-threads each) the performance would be much higher – we saw Zen3 double SIMD performance – but at this time it is low.
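
The “1/2 rate” reasoning can be illustrated with a back-of-envelope peak-throughput sketch. The number of SIMD units per core is a hypothetical parameter here, not a confirmed figure for any core; only the ratio between unit widths matters for the argument.

```python
def fma_flops_per_cycle(unit_width_bits: int, units: int, element_bits: int = 32) -> int:
    """Peak floating-point operations per cycle per core for FMA-capable SIMD units."""
    lanes = unit_width_bits // element_bits  # elements processed per unit per cycle
    return units * lanes * 2                 # one FMA = one multiply + one add


UNITS = 2  # hypothetical unit count, identical in both cases so it cancels out

narrow = fma_flops_per_cycle(128, UNITS)  # 256-bit ops split over 128-bit units
wide = fma_flops_per_cycle(256, UNITS)    # native 256-bit units
print(narrow, wide, narrow / wide)
```

At equal clocks and unit counts, 128-bit units deliver half the peak rate of 256-bit units, which is the sense in which 256-bit instructions would run at 1/2 rate.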

Some may be disappointed, but let’s remember they are still efficient, LITTLE, Atom cores not meant for top-end SIMD performance.

Note:* using AVX512 instead of AVX2/FMA.

Note:** extrapolated results based on ADL with 8 LITTLE/E Atom cores 8T (but big/P Cores disabled).

Note:*** using AVX512-IFMA52 to emulate 128-bit integer (int128) operations.
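
As a sanity check, the relative differences quoted for ADL-E can be recomputed from the rounded scores in the table. This is only a sketch using the published, rounded figures, so small deviations (up to about one percentage point) from the quoted deltas are to be expected.

```python
# (ADL-E projected, i9-12900K measured) in Mpix/s, copied from the table above.
SCORES = {
    "Int32": (679, 1699),
    "Int64": (258, 695),
    "Int128": (51, 131),
    "FP32": (633, 1981),
    "FP64": (357, 1126),
    "FP128": (18, 53.2),
}
# Deltas as quoted in the table, in percent.
QUOTED = {"Int32": -60, "Int64": -63, "Int128": -61,
          "FP32": -69, "FP64": -69, "FP128": -66}


def delta_percent(projected: float, baseline: float) -> float:
    """Relative difference of `projected` versus `baseline`, in percent."""
    return (projected / baseline - 1.0) * 100.0


for name, (adl_e, adl) in SCORES.items():
    computed = delta_percent(adl_e, adl)
    print(f"{name:7s} computed {computed:+.1f}%  quoted {QUOTED[name]:+d}%")
```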



Final Thoughts / Conclusions

Summary: Would be useful for some workloads

ADL has been designed for efficiency – not performance at any cost regardless of power usage (like RKL); however, changes to support the hybrid architecture and the loss of AVX512 are non-trivial. All in the name of efficiency.

With Intel claiming that the 4-LITTLE/E Atom cluster takes the same die space as 1 big/P Core, perhaps an efficient version (ADL-E) with 40 LITTLE/E Atom cores / 40T could be made. Just like the now defunct “Phi” accelerators, many-core/threads wide SIMD CPUs are not for everybody. But while the “Phi” Atom cores were very weak and optimised for SIMD/AVX512 – the far more powerful ADL “Gracemont” Atom cores are similar in power to old Skylake (SKL) big Core and support AVX2/FMA3, VNNI/256, SHA, VAES/256.

Extrapolating from 8c/8t to 40c/40t, thus 5x higher performance best case, is still not sufficient for heavy vectorised compute algorithms. While the Atom cores have improved considerably, even 40 of them cannot match 8 big Cores. They may match SKL in legacy/non-vectorised code but heavy compute is just not for them. Again, AVX512 and Hyperthreading would likely help here but, alas, they are not available.

With just 2 memory channels (even high speed DDR5), 40 cores would be starved for data. Algorithms that could fit their datasets in the 10x L2 caches (20MB total) or unified L3 cache (30MB) would perform well but anything latency or bandwidth sensitive would likely incur penalties.
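
To put the data-starvation concern in numbers, here is a rough per-core budget. The DDR5-4800 transfer rate is an assumption (the article does not state a memory speed), peak bandwidth is theoretical, and the cache sizes are the projected figures from the specification table.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit = 8 bytes per channel)."""
    return channels * mega_transfers * bus_bytes / 1000.0  # MT/s * bytes = MB/s


DDR5_MT = 4800  # ASSUMPTION: DDR5-4800, not stated in the article
total = peak_bandwidth_gbs(channels=2, mega_transfers=DDR5_MT)

per_core_adl_e = total / 40  # 40 Atom cores sharing 2 channels
per_core_adl = total / 16    # stock 8C+8c part, 16 physical cores

l2_per_core_mb = 20 / 40     # 10x 2MB L2 shared by 40 cores
l3_per_core_mb = 30 / 40     # 30MB unified L3 shared by 40 cores

print(f"total {total:.1f} GB/s, ADL-E {per_core_adl_e:.2f} GB/s/core, "
      f"ADL {per_core_adl:.2f} GB/s/core")
print(f"L2 {l2_per_core_mb:.2f} MB/core, L3 {l3_per_core_mb:.2f} MB/core")
```

Under these assumptions each of the 40 cores gets well under half the bandwidth share of a core in the stock 16-core layout, which is why only cache-resident workloads would be expected to scale.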

As we mentioned, this is not your usual or “gaming” CPU – but an efficient, many core/thread version – without the cost of comparable server CPUs. It could still have a place in your home/office but handling different tasks to your usual PC. If the price were right, we’d take one.


