Intel 13th Gen Core RaptorLake (i9 13900K(F)) Review & Benchmarks – Hybrid Top-End Performance

What is “RaptorLake”?

It is the “next-generation” (13th gen) Core architecture, replacing the current “AlderLake” (12th gen) and thus the 3rd generation “hybrid” (aka “big.LITTLE”) arch that Intel has released. As before it combines big/P(erformant) “Core” cores with LITTLE/E(fficient) “Atom” cores in a single package and covers everything from desktops, laptops, tablets and even (low-end) servers.

  • Desktop (S) (65-125W rated, up to 253W turbo)
    • 8C (aka big/P) + 16c (aka LITTLE/E) / 32T total (13th Gen Core i9-13900K(F))
    • 2x as many LITTLE/E cores as ADL
  • High-Performance Mobile (H/HX) (45-55W rated, up to 157W turbo)
    • 8C + 16c / 32T total
  • Mobile (P) (20-28W rated, up to 64W turbo)
    • 4/6C + 8c / 28T total
  • Ultra-Mobile/ULV (U) (9-15W rated, 29W turbo)
    • 2C + 8c / 12T total

For best performance and efficiency, this does require operating system scheduler changes – in order for threads to be assigned on the appropriate physical core/thread. For compute-heavy/low-latency this means a “big/P” core; for low compute/power-limited this means a “LITTLE/E” core.

In the Windows world, this means “Windows 11” for clients and “Windows Server vNext” (note not the recently released Server 2022 based on 21H2 Windows 10 kernel) for servers. The Windows power plans (e.g. “Balanced“, “High Performance“, etc.) contain additional settings (hidden), e.g. prefer (or require) scheduling on big/P or LITTLE/E cores and so on. But in general, the scheduler is supposed to automatically handle it all based on telemetry from the CPU.

Windows 11 also gets updated QoS (Quality of Service) API (aka functions) allowing app(lications) like Sandra to indicate which threads should use big/P cores and which LITTLE/E cores. Naturally, these means updated applications will be needed for best power efficiency.

Intel Core i9 13900K(F) (RaptorLake) 8C + 16c

Intel Core i9 13900K(F) (RaptorLake) 8C + 16c

General SoC Details

  • 10nm+++ (Intel 7+) improved process
  • Unified 36MB L3 cache (vs. 30MB on ADL thus 20% larger)
  • PCIe 5.0 (up to 64GB/s with x16 lanes) – up to x16 lanes PCIe5 + x4 lanes PCIe4
    • NVMe SSDs may thus be limited to PCIe4 or bifurcate main x16 lanes with GPU to PCIe5 x8 + x8
  • PCH up to x12 lanes PCIe4 + x16 lanes PCIe3
    • CPU to PCH DMI 4 x8 link (aka PCIe4 x8)
  • DDR5/LP-DDR5 memory controller support (e.g. 4x 32-bit channels) – up to 5600Mt/s (official, vs. 4800Mt/s ADL)
    • XMP 3.0 (eXtreme Memory Profile(s)) specification for overclocking with 3 profiles and 2 user-writable profiles (!)
  • Thunderbolt 4 (and thus USB 4)

big/P(erformance) “Core” core

  • Up to 8C/16T “Raptor Cove” (!) cores – improved from “Golden Cove” in ADL 😉
  • Disabled AVX512! in order to match Atom cores (on consumer)
    • (Server versions ADL-EX support AVX512 and new extensions like AMX and FP16 data-format)
    • Single FMA-512 unit (though disabled)
  • SMT support still included, 2x threads/core – thus 16 total
  • L1I remains at 32kB
  • L1D remains at 48kB
  • L2 increased to 2MB per core (almost 2x ADL) like server parts (ADL-EX)

LITTLE/E(fficient) “Atom” core

  • Up to 16c/16T “Gracemont” cores – thus 2x more than ADL but same core
  • No SMT support, only 1 thread/core – thus 16 total (in 4x modules of 4x threads)
  • AVX/AVX2 support – first for Atom core, but no AVX512!
    • (Recall that “Phi” GP-GPU accelerator w/AVX512 was based on Atom core)
  • L1I still at 64kB
  • L1D still at 32kB
  • L2 4MB shared by 4 cores (2x larger than ADL)

As with ADL, RPL’s big “Raptor Cove” cores have AVX512 disabled which may prove to be a (big) problem considering AMD’s upcoming Zen4 (Ryzen 7000?) will support it. Even Centaur’s “little” CNS CPU supported AVX512. Centaur has now has been bought by Intel, possibly in order to provide a little AVX512-supporting core. We may see Intel big Core + ex-Centaur LITTLE core designs.

While some are not keen on AVX512 due to relatively large power required to use (and thus lower clocks) as well as the large number of extensions (F, BW, CD, DQ, ER, IFMA, PF, VL, BF16, FP16, VAES, VNNI, etc.) – the performance gain cannot be underestimated (2x if not higher). Most processors no longer need to “clock-down” (AVX512 negative offset) and can run at full speed – power/thermal limits notwithstanding. Now that AMD and ex-Centaur support AVX512, it is no longer an Intel-only server-only instruction set (ISA).

RPL using the very same, “Gracemont” Atom cores as ADL – with no changes except 2x larger cluster L2 (4MB vs. 2MB) which is welcome especially in light of the big cores also getting a L2 cache size upgrade. While AVX2 support for Atom cores was a huge upgrade, tests have shown them not to be as power efficient as Intel would like to make us believe – which is why RPL will have more of them but lower clocked where the efficiency is greater.

As we hypothesized in our article (Mythical Intel 12th Gen Core AlderLake 10C/20T big/P Cores (i11-12999X) – AVX512 Extrapolated Performance) – ADL would have been great if Intel could have provided a version with only 10 big cores (replacing the 2x little cores cluster) that could have been an AVX512 SIMD-performance monster, trading blows with 16-core Zen3 (Ryzen 5950X). With RPL having space for 2 extra clusters – Intel could have had 10C + 8c or even 12C (big) AVX512-supporting cores that could go against Zen4…

Alas, what we are getting across SKUs is the same number of big cores (be they 8, 6, 4 or 2) and 2x clusters of Little cores (and thus 2x more little cores) but presumably at lower clocks in order to improve power efficiency. One issue with ADL across SKUs is that while TDP (on paper) is reasonable – turbo power has blown way past even AVX512-supporting “RocketLake” (!) despite the new efficiency claims. Thus, while disappointing, it is clear Intel is trying to bring power under control.

Changes in Sandra to support Hybrid

Like Windows (and other operating systems), we have had to make extensive changes to both detection, thread scheduling and benchmarks to support hybrid/big-LITTLE. Thankfully, this means we are not dependent on Windows support – you can confidently test AlderLake on older operating systems (e.g. Windows 10 or earlier – or Server 2022/2019/2016 or earlier) – although it is probably best to run the very latest operating systems for best overall (outside benchmarking) computing experience.

  • Detection Changes
    • Detect big/P and LITTLE/E cores
    • Detect correct number of cores (and type), modules and threads per core -> topology
    • Detect correct cache sizes (L1D, L1I, L2) depending on core
    • Detect multipliers depending on core
  • Scheduling Changes

    • All Threads (MT/MC)” (thus all cores + all threads – e.g. 32T
      • All Cores (MC aka big+LITTLE) Only” (both core types, no threads) – thus 24T
    • “All Threads big/P Cores Only” (only “Core” cores + their threads) – thus 16T
      • big/P Cores Only” (only “Core” cores) – thus 8T
      • LITTLE/E Cores Only” (only “Atom” cores) – thus 16T
    • Single Thread big/P Core Only” (thus single “Core” core) – thus 1T
    • Single Thread LITTLE/E Core Only” (thus single “Atom” core) – thus 1T
  • Benchmarking Changes
    • Dynamic/Asymmetric workload allocator – based on each thread’s compute power
      • Note some tests/algorithms are not well-suited for this (here P threads will finish and wait for E threads – thus effectively having only E threads). Different ways to test algorithm(s) will be needed.
    • Dynamic/Asymmetric buffer sizes – based on each thread’s L1D caches
      • Memory/Cache buffer testing using different block/buffer sizes for P/E threads
      • Algorithms (e.g. GEMM) using different block sizes for P/E threads
    • Best performance core/thread default selection – based on test type
      • Some tests/algorithms run best just using cores only (SMT threads would just add overhead)
      • Some tests/algorithms (streaming) run best just using big/P cores only (E cores just too slow and waste memory bandwidth)
      • Some tests/algorithms sharing data run best on same type of cores only (either big/P or LITTLE/E) (sharing between different types of cores incurs higher latencies and lower bandwidth)
    • Reporting the Performance Contribution & Ratio of each thread
      • Thus the big/P and LITTLE/E cores contribution for each algorithm can be presented. In effect, this allows better optimisation of algorithms tested, e.g. detecting when either big/P or LITTLE/E cores are not efficiently used (e.g. overloaded)

As per above you can be forgiven that some developers may just restrict their software to use big/Performance threads only and just ignore the LITTLE/Efficient threads at all – at least when using compute heavy algorithms.

For this reason we recommend using the very latest version of Sandra and keep up with updated versions that likely fix bugs, improve performance and stability.

But is it RaptorLake or AlderLake-Refresh?

Unfortunately, it seems that not all CPUs labelled “13th Gen” will be “RaptorLake” (RPL); some middle-range i5 and low-range i3 models will instead come with “AlderLake” (Refresh) ADL-R cores that is likely to confuse ordinary people into buying these older-gen CPUs.

What is more confusing is that the ID (aka CPUID) of these 13th Gen ADL-R/RPL models is the same (e.g. 0B067x) and does not match the old ADL (e.g. 09067x). However, the L2 cache sizes are the same as old ADL (1.25MB for big/Core and 2MB for LITTLE/Atom cluster) not the larger RPL (2MB for big/Core and 4MB for LITTLE/Atom cluster).

Note: There is still a possibility these are actually RPL cores but with L2 cache(s) reduced (part disabled/fused off) in order not to outperform higher models.

CPU (Core) Performance Benchmarking

In this article we test CPU core performance; please see our other articles on:

Hardware Specifications

We are comparing the Intel with competing desktop architectures as well as competitors (AMD) with a view to upgrading to a top-of-the-range, high performance design.

Specifications Intel Core i9 13900K(F) 8C+16c/32T (RPL) Intel Core i9 12900K(F) 8C+8c/24T (ADL) AMD Ryzen 9 7900X 2M/12C/24T (Zen4) AMD Ryzen 9 5900X 2M/12C/24T (Zen3) Comments
Arch(itecture) Raptor Cove + Gracemont / RaptorLake Golden Cove + Gracemont / AlderLake Zen4 / Raphael Zen3 / Vermeer The very latest arch
Modules (CCX) / Cores (CU) / Threads (SP) 8C+16c / 32T 8C+8c / 24T 2M / 12C / 24T 2M / 12C / 24T 8 more (2x) LITTLE cores!
Rated/Turbo Speed (GHz) 3.0 – 5.8GHz [+12%] / 2.2 – 4.3GHz [+10%]
3.2 – 5.2GHz / 2.4 – 3.9GHz 4.7 – 5.6GHz 3.7 – 4.8GHz 12% big Core, 10% Atom Turbo
Rated/Turbo Power (W)
125 – 253W [PL2][+5%] 125 – 241W [PL2] 170 – 230W [PTT] 105 – 144W [PTT]
5% higher Turbo power
L1D / L1I Caches 8x 48/32kB + 16x 32/64kB
8x 48/32kB + 8x 32/64kB 12x 32kB 8-way / 12x 32kB 8-way 12x 32kB 8-way / 12x 32kB 8-way Same L1D/L1I caches
L2 Caches 8x 2MB + 4x 4MB (32MB) [+2.3x]
8x 1.25MB + 2x 2MB (14MB) 12x 1MB 16-way (12MB) 12x 512kB 16-way (6MB) L2 is over 2x larger!
L3 Cache(s) 36MB 16-way [+20%] 30MB 16-way 2x 32MB 16-way (64MB) 2x 32MB 16-way (64MB) L3 is 20% larger
Microcode (Firmware) 0B0671-10B [B0 stepping] 090672-1E [C0 stepping] A20F10-1003 8F7100-1009 Revisions just keep on coming.
Special Instruction Sets VNNI/256, SHA, VAES/256 VNNI/256, SHA, VAES/256 AVX512, VNNI/512, SHA, VAES/512 AVX2/FMA, SHA AVX512 still MIA
SIMD Width / Units
256-bit 256-bit 512-bit (as 2x 256-bit) 256-bit Same SIMD units
Price / RRP (USD)
$599 $589 $549 $549 Price is a little higher?

Disclaimer

This is an independent review (critical appraisal) that has not been endorsed nor sponsored by any entity (e.g. Intel, etc.). All trademarks acknowledged and used for identification only under fair use.

The review contains only public information and not provided under NDA nor embargoed. At publication time, the products have not been directly tested by SiSoftware but submitted to the public Benchmark Ranker; thus the accuracy of the benchmark scores cannot be verified, however, they appear consistent and pass current validation checks.

And please, don’t forget small ISVs like ourselves in these very challenging times. Please buy a copy of Sandra if you find our software useful. Your custom means everything to us!

SiSoftware Official Ranker Scores

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets. “RaptorLake” (RPL) does not support AVX512 – but it does support 256-bit versions of some original AVX512 extensions.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 11 x64 (22H2), latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations.

Native Benchmarks Intel Core i9 13900K(F) 8C+16c/32T (RPL) Intel Core i9 12900K(F) 8C+8c/24T (ADL) AMD Ryzen 9 7900X 2M/12C/24T (Zen4) AMD Ryzen 9 5900X 2M/12C/24T (Zen3) Comments
CPU Arithmetic Benchmark Native Dhrystone Integer (GIPS) 1,060 [+53%] 694 906 631 RPL is 53% faster than ADL!
CPU Arithmetic Benchmark Native Dhrystone Long (GIPS) 1,041 [+48%] 703 944 623 A 64-bit integer workload RPL is 48% faster.
CPU Arithmetic Benchmark Native FP32 (Float) Whetstone (GFLOPS) 727 [+47%] 496 510 399 With floating-point, RPL is 47% faster
CPU Arithmetic Benchmark Native FP64 (Double) Whetstone (GFLOPS) 398 [+3%] 385 437 334 With FP64 RPL is only 3% faster.
With non-SIMD code, we see huge performance uplift in both integer (old’ Dhrystone) and floating-point (old’ Whetstone) of 38% over ADL that help push even past AMD’s new Zen4. The faster big Cores + 8 more Little Atom cores greatly help here.

Thus for normal, non-SIMD code – RPL will perform much better and provide a great upgrade over ADL and cement Intel’s domination in some workloads (Cinebench?)…

BenchCpuMM Native Integer (Int32) Multi-Media (Mpix/s) 2,930 [+36%] 2,158 3,943* 2,823 RPL is 36% faster than ADL here.
BenchCpuMM Native Long (Int64) Multi-Media (Mpix/s) 1,082 [+35%] 801 1,203* 914 With a 64-bit, RPL is 35% faster.
BenchCpuMM Native Quad-Int (Int128) Multi-Media (Mpix/s) 201 [+34%] 150 335** 173 Using 64-bit int to emulate Int128 RPL is 34% faster.
BenchCpuMM Native Float/FP32 Multi-Media (Mpix/s) 3,228 [+43%] 2,258 3,458* 2,555 In this floating-point vectorised test RPL is 43% faster
BenchCpuMM Native Double/FP64 Multi-Media (Mpix/s) 1,650 [+36%] 1,213 1,873* 1,313 Switching to FP64 RPL is 36% faster
BenchCpuMM Native Quad-Float/FP128 Multi-Media (Mpix/s) 79 [+39%] 56.66 74.16* 54.64 Using FP64 to mantissa extend FP128 RPL is 39% faster
With heavily vectorised SIMD workloads – RPL sees similar improvement, it is around 37% faster than ADL across all tests with minor variations. For older software just using AVX2/FMA3, RPL just flies past ADL as well as older CPU (Zen3, Zen2, etc.)

Unfortunately, AMD’s Zen4 supports AVX512 – which allows it to beat RPL in 5 tests and lose just 1. This shows just how much software can gain from AVX512 even when not executed full width (as Zen4 splits it into 2x 256-bit). Intel will need to find a solution for future arch as more and more software will start supporting AVX512.

Note:* using AVX512 instead of AVX2/FMA.

Note:** using AVX512-IFMA52 to emulate 128-bit integer operations (int128).

BenchCrypt Crypto AES-256 (GB/s) 34 [+4%] 31.84 28.9*** 20.32 RPL is just 4% faster.
BenchCrypt Crypto AES-128 (GB/s) 31.9 29*** 20.4 What we saw with AES-256 just repeats with AES-128.
BenchCrypt Crypto SHA2-256 (GB/s) 54 [+64%] 33.08* 43** 32.03* With SHA, RPL is 64% faster than ADL.
BenchCrypt Crypto SHA1 (GB/s) 38.2* The less compute-intensive SHA1 does not change things due to acceleration.
As streaming tests (crypto/hashing) are memory bound, RPL won’t beat ADL with the same memory speed – it would need much faster DDR5 to feed all the 24 cores!

But with SHA, RPL does manage to beat ADL by a huge 64% and thus even AVX512-enabled AMD Zen4 which is a pretty impressive. The extra Little Atom cores can help here with SIMD integer workloads.

Note***: using VAES 256-bit (AVX2) or 512-bit (AVX512)

Note**: using SHA HWA not SIMD (e.g. AVX512, AVX2, AVX, etc.)

Note*: using AVX512 not AVX2.

BenchFinance Black-Scholes float/FP32 (MOPT/s) 532 The standard financial algorithm.
BenchFinance Black-Scholes double/FP64 (MOPT/s) 735 [+58%] 464 581 453 Switching to FP64 code, RPL is 58% faster
BenchFinance Binomial float/FP32 (kOPT/s) 219 Binomial uses thread shared data thus stresses the cache & memory system;
BenchFinance Binomial double/FP64 (kOPT/s) 228 [+47%] 155 172 127 With FP64 code RPL is 47% faster.
BenchFinance Monte-Carlo float/FP32 (kOPT/s) 415 Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches
BenchFinance Monte-Carlo double/FP64 (kOPT/s) 308 [+50%] 205 245 176 Here RPL is 50% faster.
AMD’s Zen always did well on non-SIMD floating-point algorithms – but here RPL shows the times are changing; with 52% improvement over ADL, it has no problem dispatching even the latest Zen4 and all of its improvements.
BenchScience SGEMM (GFLOPS) float/FP32 679 In this tough vectorised algorithm that is widely used (e.g. AI/ML).
BenchScience DGEMM (GFLOPS) double/FP64 437 [-2%] 446 523* 235 We need to do some optimisation here.
BenchScience SFFT (GFLOPS) float/FP32 26.74 FFT is also heavily vectorised but stresses the memory sub-system more.
BenchScience DFFT (GFLOPS) double/FP64 22.47 [-20%] 28.72 19.02* 12.75 With FP64 code, RPL is memory latency bound
BenchScience SNBODY (GFLOPS) float/FP32 791 N-Body simulation is vectorised but fewer memory accesses.
BenchScience DNBODY (GFLOPS) double/FP64 300 [+32%] 227 462* 344 With FP64 RPL is 32% faster.
It seems we need to do some optimizations here as the large number of Atom cores seem to cause some performance degradation for some reason. At least in N-Body we’re back to the 32% improvement over ADL which we’re expecting.

Here, faster DDR5 memory will make a big difference, we’ll need to see what speed is the “sweet-spot” for RPL, likely DDR5-6400 for that many cores.

Note*: using AVX512 not AVX2/FMA3.

CPU Image Processing Blur (3×3) Filter (MPix/s) 8,359 [+44%] 5,823 9,413* 4,961 In this vectorised integer RPL is 44% faster.
CPU Image Processing Sharpen (5×5) Filter (MPix/s) 3,211 [+41%] 2,275 3,969* 1,893 Same algorithm but more shared data 41% faster.
CPU Image Processing Motion-Blur (7×7) Filter (MPix/s) 1,615 [+45%] 1,117 2,060* 946 Again same algorithm but even more data shared – 45% faster
CPU Image Processing Edge Detection (2*5×5) Sobel Filter (MPix/s) 2,598 [+35%] 1,926 3,169* 1,554 Different algorithm RPL is 35% faster.
CPU Image Processing Noise Removal (5×5) Median Filter (MPix/s) 241 [+54%] 157 862* 176 Still vectorised code RPL is 54% faster.
CPU Image Processing Oil Painting Quantise Filter (MPix/s) 109 [+37%] 79.78 64.45 54.6 This test has always been tough RPL is 37% faster.
CPU Image Processing Diffusion Randomise (XorShift) Filter (MPix/s) 6,704 [+10%] 6,082 4,050* 2,208 With integer workload, RPL is 10% faster.
CPU Image Processing Marbling Perlin Noise 2D Filter (MPix/s) 1,455 [+43%] 1,016 1,027* 656 In this final test we see RPL is 43% faster
These tests love SIMD vectorised compute, thus here RPL is again 38% faster than ADL – and this even allows it to beat the AVX512-enabled Zen4 in 3 out of 8 tests.

The test also showed how much Zen4 benefits from AVX512 and in effect how much RPL misses by not having AVX512 enabled. With AMD on board, AVX512 adoption is likely to increase, thus Intel had better bring support to Atom somehow, soon…

Note*: using AVX512 not AVX2/FMA3.

Intel RaptorLake 13900K(F) (8C + 16c) Inter-Thread/Core HeatMap Latency (ns)

Intel RaptorLake 13900K(F) (8C + 16c) Inter-Thread/Core HeatMap Latency (ns)

The inter-thread/core/module latencies “heat-map” shows how the latencies vary when transferring data off-thread (same L1D), off-core (same L3 for big Cores but same L2 for Little Atom cores) and different-core-type (big Core to Little Atom).

Still, judicious thread-pair scheduling is needed to keep latencies low (and conversely bandwidth high when large data is transferred.

CPU Multi-Core Benchmark Total Inter-Thread Bandwidth – Best Pairing (GB/s) 144 [+29%] 111.9 166* 144 29% more bandwidth than ADL.
With double L2 (either big Cores or Little Atom Cluster) and much bigger L3 cache, RPL has 29% more inter-core bandwidth than ADL.

However, the multi-CCX Zen competition has almost double L3 (64MB vs. 36MB) and thus Zen4 with just 12C has higher overall bandwidth – not to mention the 3D-VCache variants.

Note:* using AVX512 512-bit wide transfers.

CPU Multi-Core Benchmark Average Inter-Thread Latency (ns) 39.7 [+3%] 38.5 44.4 46.2 Overall latencies are 3% higher.
CPU Multi-Core Benchmark Inter-Thread Latency (Same Core) Latency (ns) 10.4 [-5%] 11 8.2 9.7 Inter-Thread (big Core) latency is 5% lower.
CPU Multi-Core Benchmark Inter-Core Latency (big Core, same Module) Latency (ns) 33.5 [+3%] 32.4 17.2 20.8 We see Inter-big-Core latency 3% higher.
CPU Multi-Core Benchmark Inter-Core (Little Core, same Module) Latency (ns) 45.3 [+6%] 42.9 We see Inter-Little-Atom latency 6% higher.
CPU Multi-Core Benchmark Inter-Module/CCX Latency (ns) 69.9 70.5 n/a
Due to the increased number of Little Atom cores (16 vs. 8 on ADL), the overall latency of RPL is naturally going to be higher than ADL.

We see the Inter-Thread of big Cores RPL latency (same L1D) 5% lower on RPL. However the Inter-big-Core or Inter-Little-Atom core latencies 3-6% higher, possibly due to L3 cache settings that need to be checked.

Aggregate Score (Points) 22,920 [+35%] 17,000 23,010* 13,800 Across all benchmarks, RPL is 35% faster than ADL!
Across all the benchmarks – RPL ends up a good 35% faster over ADL (!) which allows it to catch up with Zen4 – it is just 4% slower! As Sandra’s benchmark scores get significant uplift from AVX512, this means RPL is at a significant disadvantage versus Zen4.

Despite its split (2x) 256-bit AVX512 implementation, we’ve seen Zen4 get significant uplift from AVX512 – and here Intel needs to implement a solution for future arch(itecture)s (“MeterorLake” MTL?) as more and more software will add AVX512 support now that AMD is on board.

Note*: using AVX512 instead of AVX2/FMA3.

Price/RRP (USD) $599 $589 $549 $549 Price is almost the same.
Price Efficiency (Perf. vs. Cost) (Points/USD) 38.91 [+35%] 28.86 41.91 25.14 Overall 35% more performance for the price
With the price almost the same, the bang-for-buck has also increased by the same amount +35%, that makes RPL much better value than ADL. It also allows it to almost catch Zen4 which remains about 8% better value, not a lot in it.
Power/TDP – Turbo (W) 125 – 253W [PL2] [+5%] 125 – 241W [PL2] 170 – 230W [PTT] 105 – 144W [PTT]
Turbo is just 5% higher than ADL
Power Efficiency (Perf. vs. Power) (Points/W) 90.59 [+28%] 70.54 100 95.83 RPL is 28% more efficient than ADL.
With turbo power at least 5% higher than ADL, RPL is thus +28% more power efficient than ADL. This is still short of Zen4 and even the much older Zen3 – Intel still has some work to do here.

Final Thoughts / Conclusions

Summary: Much faster and battling for the top (Intel i9 13900K(F)): 8.5/10

Like every revision (ex-“tock”) Intel arch(itecture), 13th Gen(eration) “RaptorLake” (RPL) is an improved 12th Gen “AlderLake” (ADL), that ends up much faster and thus more efficient (both price-wise and even power-wise), thus returns Intel to competitiveness against AMD’s latest Zen4 – battling for the top spot.

Intel has managed to increase the clocks of the big Cores and almost double L2 cache (2MB/core vs. 1.25MB/core); it has also managed to double (16c vs. 8c) the number of Little Atom cores – as well as double their cluster L2 cache (4MB vs. 2MB). This means the 13900K(F) has twice (24C/c) the number of real cores than AMD’s 7900X (12C) and the same number of threads as AMD’s 7950X (32T).

Across all benchmarks – we see RPL (13900K(F)) 35% faster than ADL (12900K(F)) – not bad from an evolution architecture update! RPL also benefits from more mature software (e.g. Windows 11 22H2) and microcode/BIOS. However, power usage has gone up as well – though this seems to be happening to the competition too (AMD Zen4).

It is disappointing that Intel was not able to enable AVX512 on RPL – but that was very unlikely as the Atom cores are unchanged from ADL. AMD has shown with Zen4 (and even VIA/Centaur) that you don’t need a full-width 512-bit implementation to benefit from AVX512 – and Intel should consider it for future Atom cores. Sandra’s benchmark results gain significant uplift from AVX512 and here RPL is at a distinct disadvantage versus Zen4.

Talking power efficiency, to aggressively save power – you could use the large number of Little Atom cores (16!) to handle just about any workload – and keep the big Cores (8) parked. Effectively, unless gaming/benchmarking/etc. – you’d just be using an ultra efficient many threaded Atom system  – but still be able to crank it up when needed.

If you already upgraded to Socket 1700 for the new technologies (DDR5, PCIe 5.0, Thunderbolt/USB 4.o, etc.) and want more, then RPL is a nice upgrade. As the next Intel core arch “MeteorLake” MTL will use a different socket, RPL does not have a great upgrading potential and may be an idea to wait for discounts from Intel or AMD before making a choice…

Summary: Much faster and battling for the top (Intel i9 13900K(F)): 8.5/10

Further Articles

Please see our other articles on:

Disclaimer

This is an independent review (critical appraisal) that has not been endorsed nor sponsored by any entity (e.g. Intel, etc.). All trademarks acknowledged and used for identification only under fair use.

The review contains only public information and not provided under NDA nor embargoed. At publication time, the products have not been directly tested by SiSoftware but submitted to the public Benchmark Ranker; thus the accuracy of the benchmark scores cannot be verified, however, they appear consistent and pass current validation checks.

And please, don’t forget small ISVs like ourselves in these very challenging times. Please buy a copy of Sandra if you find our software useful. Your custom means everything to us!

Tagged , , , , , , , , , . Bookmark the permalink.

Comments are closed.