Intel 13th Gen Core RaptorLake (i9 13900K(F)) Review & Benchmarks – Hybrid Top-End Performance

What is “RaptorLake”?

It is the “next-generation” (13th gen) Core architecture, replacing the current “AlderLake” (12th gen) and thus the 3rd generation “hybrid” (aka “big.LITTLE”) arch that Intel has released. As before it combines big/P(erformant) “Core” cores with LITTLE/E(fficient) “Atom” cores in a single package and covers everything from desktops, laptops, tablets and even (low-end) servers.

Desktop (S) (65-125W rated, up to 253W turbo)
- 8C (aka big/P) + 16c (aka LITTLE/E) / 32T total (13th Gen Core i9-13900K(F))
- 2x as many LITTLE/E cores as ADL
High-Performance Mobile (H/HX) (45-55W rated, up to 157W turbo)
- 8C + 16c / 32T total
Mobile (P) (20-28W rated, up to 64W turbo)
- 4/6C + 8c / 28T total
Ultra-Mobile/ULV (U) (9-15W rated, 29W turbo)
- 2C + 8c / 12T total

For best performance and efficiency, this does require operating system scheduler changes – in order for threads to be assigned on the appropriate physical core/thread. For compute-heavy/low-latency this means a “big/P” core; for low compute/power-limited this means a “LITTLE/E” core.

In the Windows world, this means “Windows 11” for clients and “Windows Server vNext” (note not the recently released Server 2022 based on 21H2 Windows 10 kernel) for servers. The Windows power plans (e.g. “Balanced“, “High Performance“, etc.) contain additional settings (hidden), e.g. prefer (or require) scheduling on big/P or LITTLE/E cores and so on. But in general, the scheduler is supposed to automatically handle it all based on telemetry from the CPU.

Windows 11 also gets updated QoS (Quality of Service) API (aka functions) allowing app(lications) like Sandra to indicate which threads should use big/P cores and which LITTLE/E cores. Naturally, these means updated applications will be needed for best power efficiency.

Intel Core i9 13900K(F) (RaptorLake) 8C + 16c

General SoC Details

10nm+++ (Intel 7+) improved process
Unified 36MB L3 cache (vs. 30MB on ADL thus 20% larger)
PCIe 5.0 (up to 64GB/s with x16 lanes) – up to x16 lanes PCIe5 + x4 lanes PCIe4
- NVMe SSDs may thus be limited to PCIe4 or bifurcate main x16 lanes with GPU to PCIe5 x8 + x8
PCH up to x12 lanes PCIe4 + x16 lanes PCIe3
- CPU to PCH DMI 4 x8 link (aka PCIe4 x8)
DDR5/LP-DDR5 memory controller support (e.g. 4x 32-bit channels) – up to 5600Mt/s (official, vs. 4800Mt/s ADL)
- XMP 3.0 (eXtreme Memory Profile(s)) specification for overclocking with 3 profiles and 2 user-writable profiles (!)
Thunderbolt 4 (and thus USB 4)

big/P(erformance) “Core” core

Up to 8C/16T “Raptor Cove” (!) cores – improved from “Golden Cove” in ADL 😉
Disabled AVX512! in order to match Atom cores (on consumer)
- (Server versions ADL-EX support AVX512 and new extensions like AMX and FP16 data-format)
- Single FMA-512 unit (though disabled)
SMT support still included, 2x threads/core – thus 16 total
L1I remains at 32kB
L1D remains at 48kB
L2 increased to 2MB per core (almost 2x ADL) like server parts (ADL-EX)

LITTLE/E(fficient) “Atom” core

Up to 16c/16T “Gracemont” cores – thus 2x more than ADL but same core
No SMT support, only 1 thread/core – thus 16 total (in 4x modules of 4x threads)
AVX/AVX2 support – first for Atom core, but no AVX512!
- (Recall that “Phi” GP-GPU accelerator w/AVX512 was based on Atom core)
L1I still at 64kB
L1D still at 32kB
L2 4MB shared by 4 cores (2x larger than ADL)

As with ADL, RPL’s big “Raptor Cove” cores have AVX512 disabled which may prove to be a (big) problem considering AMD’s upcoming Zen4 (Ryzen 7000?) will support it. Even Centaur’s “little” CNS CPU supported AVX512. Centaur has now has been bought by Intel, possibly in order to provide a little AVX512-supporting core. We may see Intel big Core + ex-Centaur LITTLE core designs.

While some are not keen on AVX512 due to relatively large power required to use (and thus lower clocks) as well as the large number of extensions (F, BW, CD, DQ, ER, IFMA, PF, VL, BF16, FP16, VAES, VNNI, etc.) – the performance gain cannot be underestimated (2x if not higher). Most processors no longer need to “clock-down” (AVX512 negative offset) and can run at full speed – power/thermal limits notwithstanding. Now that AMD and ex-Centaur support AVX512, it is no longer an Intel-only server-only instruction set (ISA).

RPL using the very same, “Gracemont” Atom cores as ADL – with no changes except 2x larger cluster L2 (4MB vs. 2MB) which is welcome especially in light of the big cores also getting a L2 cache size upgrade. While AVX2 support for Atom cores was a huge upgrade, tests have shown them not to be as power efficient as Intel would like to make us believe – which is why RPL will have more of them but lower clocked where the efficiency is greater.

As we hypothesized in our article (Mythical Intel 12th Gen Core AlderLake 10C/20T big/P Cores (i11-12999X) – AVX512 Extrapolated Performance) – ADL would have been great if Intel could have provided a version with only 10 big cores (replacing the 2x little cores cluster) that could have been an AVX512 SIMD-performance monster, trading blows with 16-core Zen3 (Ryzen 5950X). With RPL having space for 2 extra clusters – Intel could have had 10C + 8c or even 12C (big) AVX512-supporting cores that could go against Zen4…

Alas, what we are getting across SKUs is the same number of big cores (be they 8, 6, 4 or 2) and 2x clusters of Little cores (and thus 2x more little cores) but presumably at lower clocks in order to improve power efficiency. One issue with ADL across SKUs is that while TDP (on paper) is reasonable – turbo power has blown way past even AVX512-supporting “RocketLake” (!) despite the new efficiency claims. Thus, while disappointing, it is clear Intel is trying to bring power under control.

Changes in Sandra to support Hybrid

Like Windows (and other operating systems), we have had to make extensive changes to both detection, thread scheduling and benchmarks to support hybrid/big-LITTLE. Thankfully, this means we are not dependent on Windows support – you can confidently test AlderLake on older operating systems (e.g. Windows 10 or earlier – or Server 2022/2019/2016 or earlier) – although it is probably best to run the very latest operating systems for best overall (outside benchmarking) computing experience.

Detection Changes
- Detect big/P and LITTLE/E cores
- Detect correct number of cores (and type), modules and threads per core -> topology
- Detect correct cache sizes (L1D, L1I, L2) depending on core
- Detect multipliers depending on core

Scheduling Changes
- “All Threads (MT/MC)” (thus all cores + all threads – e.g. 32T
  - “All Cores (MC aka big+LITTLE) Only” (both core types, no threads) – thus 24T
- “All Threads big/P Cores Only” (only “Core” cores + their threads) – thus 16T
  - “big/P Cores Only” (only “Core” cores) – thus 8T
  - “LITTLE/E Cores Only” (only “Atom” cores) – thus 16T
- “Single Thread big/P Core Only” (thus single “Core” core) – thus 1T
- “Single Thread LITTLE/E Core Only” (thus single “Atom” core) – thus 1T

Benchmarking Changes
- Dynamic/Asymmetric workload allocator – based on each thread’s compute power
  - Note some tests/algorithms are not well-suited for this (here P threads will finish and wait for E threads – thus effectively having only E threads). Different ways to test algorithm(s) will be needed.
- Dynamic/Asymmetric buffer sizes – based on each thread’s L1D caches
  - Memory/Cache buffer testing using different block/buffer sizes for P/E threads
  - Algorithms (e.g. GEMM) using different block sizes for P/E threads
- Best performance core/thread default selection – based on test type
  - Some tests/algorithms run best just using cores only (SMT threads would just add overhead)
  - Some tests/algorithms (streaming) run best just using big/P cores only (E cores just too slow and waste memory bandwidth)
  - Some tests/algorithms sharing data run best on same type of cores only (either big/P or LITTLE/E) (sharing between different types of cores incurs higher latencies and lower bandwidth)
- Reporting the Performance Contribution & Ratio of each thread
  - Thus the big/P and LITTLE/E cores contribution for each algorithm can be presented. In effect, this allows better optimisation of algorithms tested, e.g. detecting when either big/P or LITTLE/E cores are not efficiently used (e.g. overloaded)

As per above you can be forgiven that some developers may just restrict their software to use big/Performance threads only and just ignore the LITTLE/Efficient threads at all – at least when using compute heavy algorithms.

For this reason we recommend using the very latest version of Sandra and keep up with updated versions that likely fix bugs, improve performance and stability.

But is it RaptorLake or AlderLake-Refresh?

Unfortunately, it seems that not all CPUs labelled “13th Gen” will be “RaptorLake” (RPL); some middle-range i5 and low-range i3 models will instead come with “AlderLake” (Refresh) ADL-R cores that is likely to confuse ordinary people into buying these older-gen CPUs.

What is more confusing is that the ID (aka CPUID) of these 13th Gen ADL-R/RPL models is the same (e.g. 0B067x) and does not match the old ADL (e.g. 09067x). However, the L2 cache sizes are the same as old ADL (1.25MB for big/Core and 2MB for LITTLE/Atom cluster) not the larger RPL (2MB for big/Core and 4MB for LITTLE/Atom cluster).

Note: There is still a possibility these are actually RPL cores but with L2 cache(s) reduced (part disabled/fused off) in order not to outperform higher models.

CPU (Core) Performance Benchmarking

In this article we test CPU core performance; please see our other articles on:

Hardware Specifications

We are comparing the Intel with competing desktop architectures as well as competitors (AMD) with a view to upgrading to a top-of-the-range, high performance design.

Specifications	Intel Core i9 13900K(F) 8C+16c/32T (RPL)	Intel Core i9 12900K(F) 8C+8c/24T (ADL)	AMD Ryzen 9 7900X 2M/12C/24T (Zen4)	AMD Ryzen 9 5900X 2M/12C/24T (Zen3)	Comments
Arch(itecture)	Raptor Cove + Gracemont / RaptorLake	Golden Cove + Gracemont / AlderLake	Zen4 / Raphael	Zen3 / Vermeer	The very latest arch
Modules (CCX) / Cores (CU) / Threads (SP)	8C+16c / 32T	8C+8c / 24T	2M / 12C / 24T	2M / 12C / 24T	8 more (2x) LITTLE cores!
Rated/Turbo Speed (GHz)	3.0 – 5.8GHz [+12%] / 2.2 – 4.3GHz [+10%]	3.2 – 5.2GHz / 2.4 – 3.9GHz	4.7 – 5.6GHz	3.7 – 4.8GHz	12% big Core, 10% Atom Turbo
Rated/Turbo Power (W)	125 – 253W [PL2][+5%]	125 – 241W [PL2]	170 – 230W [PTT]	105 – 144W [PTT]	5% higher Turbo power
L1D / L1I Caches	8x 48/32kB + 16x 32/64kB	8x 48/32kB + 8x 32/64kB	12x 32kB 8-way / 12x 32kB 8-way	12x 32kB 8-way / 12x 32kB 8-way	Same L1D/L1I caches
L2 Caches	8x 2MB + 4x 4MB (32MB) [+2.3x]	8x 1.25MB + 2x 2MB (14MB)	12x 1MB 16-way (12MB)	12x 512kB 16-way (6MB)	L2 is over 2x larger!
L3 Cache(s)	36MB 16-way [+20%]	30MB 16-way	2x 32MB 16-way (64MB)	2x 32MB 16-way (64MB)	L3 is 20% larger
Microcode (Firmware)	0B0671-10B [B0 stepping]	090672-1E [C0 stepping]	A20F10-1003	8F7100-1009	Revisions just keep on coming.
Special Instruction Sets	VNNI/256, SHA, VAES/256	VNNI/256, SHA, VAES/256	AVX512, VNNI/512, SHA, VAES/512	AVX2/FMA, SHA	AVX512 still MIA
SIMD Width / Units	256-bit	256-bit	512-bit (as 2x 256-bit)	256-bit	Same SIMD units
Price / RRP (USD)	$599	$589	$549	$549	Price is a little higher?

Disclaimer

This is an independent review (critical appraisal) that has not been endorsed nor sponsored by any entity (e.g. Intel, etc.). All trademarks acknowledged and used for identification only under fair use.

The review contains only public information and not provided under NDA nor embargoed. At publication time, the products have not been directly tested by SiSoftware but submitted to the public Benchmark Ranker; thus the accuracy of the benchmark scores cannot be verified, however, they appear consistent and pass current validation checks.

And please, don’t forget small ISVs like ourselves in these very challenging times. Please buy a copy of Sandra if you find our software useful. Your custom means everything to us!

SiSoftware Official Ranker Scores

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets. “RaptorLake” (RPL) does not support AVX512 – but it does support 256-bit versions of some original AVX512 extensions.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 11 x64 (22H2), latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations.

Native Benchmarks		Intel Core i9 13900K(F) 8C+16c/32T (RPL)	Intel Core i9 12900K(F) 8C+8c/24T (ADL)	AMD Ryzen 9 7900X 2M/12C/24T (Zen4)	AMD Ryzen 9 5900X 2M/12C/24T (Zen3)	Comments

	Native Dhrystone Integer (GIPS)	1,060 [+53%]	694	906	631	RPL is 53% faster than ADL!
	Native Dhrystone Long (GIPS)	1,041 [+48%]	703	944	623	A 64-bit integer workload RPL is 48% faster.
	Native FP32 (Float) Whetstone (GFLOPS)	727 [+47%]	496	510	399	With floating-point, RPL is 47% faster
	Native FP64 (Double) Whetstone (GFLOPS)	398 [+3%]	385	437	334	With FP64 RPL is only 3% faster.
With non-SIMD code, we see huge performance uplift in both integer (old’ Dhrystone) and floating-point (old’ Whetstone) of 38% over ADL that help push even past AMD’s new Zen4. The faster big Cores + 8 more Little Atom cores greatly help here. Thus for normal, non-SIMD code – RPL will perform much better and provide a great upgrade over ADL and cement Intel’s domination in some workloads (Cinebench?)…

	Native Integer (Int32) Multi-Media (Mpix/s)	2,930 [+36%]	2,158	3,943*	2,823	RPL is 36% faster than ADL here.
	Native Long (Int64) Multi-Media (Mpix/s)	1,082 [+35%]	801	1,203*	914	With a 64-bit, RPL is 35% faster.
	Native Quad-Int (Int128) Multi-Media (Mpix/s)	201 [+34%]	150	335**	173	Using 64-bit int to emulate Int128 RPL is 34% faster.
	Native Float/FP32 Multi-Media (Mpix/s)	3,228 [+43%]	2,258	3,458*	2,555	In this floating-point vectorised test RPL is 43% faster
	Native Double/FP64 Multi-Media (Mpix/s)	1,650 [+36%]	1,213	1,873*	1,313	Switching to FP64 RPL is 36% faster
	Native Quad-Float/FP128 Multi-Media (Mpix/s)	79 [+39%]	56.66	74.16*	54.64	Using FP64 to mantissa extend FP128 RPL is 39% faster
With heavily vectorised SIMD workloads – RPL sees similar improvement, it is around 37% faster than ADL across all tests with minor variations. For older software just using AVX2/FMA3, RPL just flies past ADL as well as older CPU (Zen3, Zen2, etc.) Unfortunately, AMD’s Zen4 supports AVX512 – which allows it to beat RPL in 5 tests and lose just 1. This shows just how much software can gain from AVX512 even when not executed full width (as Zen4 splits it into 2x 256-bit). Intel will need to find a solution for future arch as more and more software will start supporting AVX512. Note:* using AVX512 instead of AVX2/FMA. Note:** using AVX512-IFMA52 to emulate 128-bit integer operations (int128).

	Crypto AES-256 (GB/s)	34 [+4%]	31.84	28.9***	20.32	RPL is just 4% faster.
	Crypto AES-128 (GB/s)	–	31.9	29***	20.4	What we saw with AES-256 just repeats with AES-128.
	Crypto SHA2-256 (GB/s)	54 [+64%]	33.08*	43**	32.03*	With SHA, RPL is 64% faster than ADL.
	Crypto SHA1 (GB/s)	–	–	–	38.2*	The less compute-intensive SHA1 does not change things due to acceleration.
As streaming tests (crypto/hashing) are memory bound, RPL won’t beat ADL with the same memory speed – it would need much faster DDR5 to feed all the 24 cores! But with SHA, RPL does manage to beat ADL by a huge 64% and thus even AVX512-enabled AMD Zen4 which is a pretty impressive. The extra Little Atom cores can help here with SIMD integer workloads. Note*: using VAES 256-bit (AVX2) or 512-bit (AVX512) Note: using SHA HWA not SIMD (e.g. AVX512, AVX2, AVX, etc.) Note*: using AVX512 not AVX2.

	Black-Scholes float/FP32 (MOPT/s)	–	–	–	532	The standard financial algorithm.
	Black-Scholes double/FP64 (MOPT/s)	735 [+58%]	464	581	453	Switching to FP64 code, RPL is 58% faster
	Binomial float/FP32 (kOPT/s)	–	–	–	219	Binomial uses thread shared data thus stresses the cache & memory system;
	Binomial double/FP64 (kOPT/s)	228 [+47%]	155	172	127	With FP64 code RPL is 47% faster.
	Monte-Carlo float/FP32 (kOPT/s)	–	–	–	415	Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches
	Monte-Carlo double/FP64 (kOPT/s)	308 [+50%]	205	245	176	Here RPL is 50% faster.
AMD’s Zen always did well on non-SIMD floating-point algorithms – but here RPL shows the times are changing; with 52% improvement over ADL, it has no problem dispatching even the latest Zen4 and all of its improvements.

	SGEMM (GFLOPS) float/FP32	–	–	–	679	In this tough vectorised algorithm that is widely used (e.g. AI/ML).
	DGEMM (GFLOPS) double/FP64	437 [-2%]	446	523*	235	We need to do some optimisation here.
	SFFT (GFLOPS) float/FP32	–	–	–	26.74	FFT is also heavily vectorised but stresses the memory sub-system more.
	DFFT (GFLOPS) double/FP64	22.47 [-20%]	28.72	19.02*	12.75	With FP64 code, RPL is memory latency bound
	SNBODY (GFLOPS) float/FP32	–	–	–	791	N-Body simulation is vectorised but fewer memory accesses.
	DNBODY (GFLOPS) double/FP64	300 [+32%]	227	462*	344	With FP64 RPL is 32% faster.
It seems we need to do some optimizations here as the large number of Atom cores seem to cause some performance degradation for some reason. At least in N-Body we’re back to the 32% improvement over ADL which we’re expecting. Here, faster DDR5 memory will make a big difference, we’ll need to see what speed is the “sweet-spot” for RPL, likely DDR5-6400 for that many cores. Note*: using AVX512 not AVX2/FMA3.

	Blur (3×3) Filter (MPix/s)	8,359 [+44%]	5,823	9,413*	4,961	In this vectorised integer RPL is 44% faster.
	Sharpen (5×5) Filter (MPix/s)	3,211 [+41%]	2,275	3,969*	1,893	Same algorithm but more shared data 41% faster.
	Motion-Blur (7×7) Filter (MPix/s)	1,615 [+45%]	1,117	2,060*	946	Again same algorithm but even more data shared – 45% faster
	Edge Detection (2*5×5) Sobel Filter (MPix/s)	2,598 [+35%]	1,926	3,169*	1,554	Different algorithm RPL is 35% faster.
	Noise Removal (5×5) Median Filter (MPix/s)	241 [+54%]	157	862*	176	Still vectorised code RPL is 54% faster.
	Oil Painting Quantise Filter (MPix/s)	109 [+37%]	79.78	64.45	54.6	This test has always been tough RPL is 37% faster.
	Diffusion Randomise (XorShift) Filter (MPix/s)	6,704 [+10%]	6,082	4,050*	2,208	With integer workload, RPL is 10% faster.
	Marbling Perlin Noise 2D Filter (MPix/s)	1,455 [+43%]	1,016	1,027*	656	In this final test we see RPL is 43% faster
These tests love SIMD vectorised compute, thus here RPL is again 38% faster than ADL – and this even allows it to beat the AVX512-enabled Zen4 in 3 out of 8 tests. The test also showed how much Zen4 benefits from AVX512 and in effect how much RPL misses by not having AVX512 enabled. With AMD on board, AVX512 adoption is likely to increase, thus Intel had better bring support to Atom somehow, soon… Note*: using AVX512 not AVX2/FMA3.
Intel RaptorLake 13900K(F) (8C + 16c) Inter-Thread/Core HeatMap Latency (ns)
The inter-thread/core/module latencies “heat-map” shows how the latencies vary when transferring data off-thread (same L1D), off-core (same L3 for big Cores but same L2 for Little Atom cores) and different-core-type (big Core to Little Atom). Still, judicious thread-pair scheduling is needed to keep latencies low (and conversely bandwidth high when large data is transferred.

	Total Inter-Thread Bandwidth – Best Pairing (GB/s)	144 [+29%]	111.9	166*	144	29% more bandwidth than ADL.
With double L2 (either big Cores or Little Atom Cluster) and much bigger L3 cache, RPL has 29% more inter-core bandwidth than ADL. However, the multi-CCX Zen competition has almost double L3 (64MB vs. 36MB) and thus Zen4 with just 12C has higher overall bandwidth – not to mention the 3D-VCache variants. Note:* using AVX512 512-bit wide transfers.

	Average Inter-Thread Latency (ns)	39.7 [+3%]	38.5	44.4	46.2	Overall latencies are 3% higher.
	Inter-Thread Latency (Same Core) Latency (ns)	10.4 [-5%]	11	8.2	9.7	Inter-Thread (big Core) latency is 5% lower.
	Inter-Core Latency (big Core, same Module) Latency (ns)	33.5 [+3%]	32.4	17.2	20.8	We see Inter-big-Core latency 3% higher.
	Inter-Core (Little Core, same Module) Latency (ns)	45.3 [+6%]	42.9	–	–	We see Inter-Little-Atom latency 6% higher.
	Inter-Module/CCX Latency (ns)	–	–	69.9	70.5	n/a
Due to the increased number of Little Atom cores (16 vs. 8 on ADL), the overall latency of RPL is naturally going to be higher than ADL. We see the Inter-Thread of big Cores RPL latency (same L1D) 5% lower on RPL. However the Inter-big-Core or Inter-Little-Atom core latencies 3-6% higher, possibly due to L3 cache settings that need to be checked.

	Aggregate Score (Points)	22,920 [+35%]	17,000	23,010*	13,800	Across all benchmarks, RPL is 35% faster than ADL!
Across all the benchmarks – RPL ends up a good 35% faster over ADL (!) which allows it to catch up with Zen4 – it is just 4% slower! As Sandra’s benchmark scores get significant uplift from AVX512, this means RPL is at a significant disadvantage versus Zen4. Despite its split (2x) 256-bit AVX512 implementation, we’ve seen Zen4 get significant uplift from AVX512 – and here Intel needs to implement a solution for future arch(itecture)s (“MeterorLake” MTL?) as more and more software will add AVX512 support now that AMD is on board. Note*: using AVX512 instead of AVX2/FMA3.

	Price/RRP (USD)	$599	$589	$549	$549	Price is almost the same.

	Price Efficiency (Perf. vs. Cost) (Points/USD)	38.91 [+35%]	28.86	41.91	25.14	Overall 35% more performance for the price
With the price almost the same, the bang-for-buck has also increased by the same amount +35%, that makes RPL much better value than ADL. It also allows it to almost catch Zen4 which remains about 8% better value, not a lot in it.

	Power/TDP – Turbo (W)	125 – 253W [PL2] [+5%]	125 – 241W [PL2]	170 – 230W [PTT]	105 – 144W [PTT]	Turbo is just 5% higher than ADL

	Power Efficiency (Perf. vs. Power) (Points/W)	90.59 [+28%]	70.54	100	95.83	RPL is 28% more efficient than ADL.
With turbo power at least 5% higher than ADL, RPL is thus +28% more power efficient than ADL. This is still short of Zen4 and even the much older Zen3 – Intel still has some work to do here.

Final Thoughts / Conclusions

Summary: Much faster and battling for the top (Intel i9 13900K(F)): 8.5/10

Like every revision (ex-“tock”) Intel arch(itecture), 13th Gen(eration) “RaptorLake” (RPL) is an improved 12th Gen “AlderLake” (ADL), that ends up much faster and thus more efficient (both price-wise and even power-wise), thus returns Intel to competitiveness against AMD’s latest Zen4 – battling for the top spot.

Intel has managed to increase the clocks of the big Cores and almost double L2 cache (2MB/core vs. 1.25MB/core); it has also managed to double (16c vs. 8c) the number of Little Atom cores – as well as double their cluster L2 cache (4MB vs. 2MB). This means the 13900K(F) has twice (24C/c) the number of real cores than AMD’s 7900X (12C) and the same number of threads as AMD’s 7950X (32T).

Across all benchmarks – we see RPL (13900K(F)) 35% faster than ADL (12900K(F)) – not bad from an evolution architecture update! RPL also benefits from more mature software (e.g. Windows 11 22H2) and microcode/BIOS. However, power usage has gone up as well – though this seems to be happening to the competition too (AMD Zen4).

It is disappointing that Intel was not able to enable AVX512 on RPL – but that was very unlikely as the Atom cores are unchanged from ADL. AMD has shown with Zen4 (and even VIA/Centaur) that you don’t need a full-width 512-bit implementation to benefit from AVX512 – and Intel should consider it for future Atom cores. Sandra’s benchmark results gain significant uplift from AVX512 and here RPL is at a distinct disadvantage versus Zen4.

Talking power efficiency, to aggressively save power – you could use the large number of Little Atom cores (16!) to handle just about any workload – and keep the big Cores (8) parked. Effectively, unless gaming/benchmarking/etc. – you’d just be using an ultra efficient many threaded Atom system – but still be able to crank it up when needed.

If you already upgraded to Socket 1700 for the new technologies (DDR5, PCIe 5.0, Thunderbolt/USB 4.o, etc.) and want more, then RPL is a nice upgrade. As the next Intel core arch “MeteorLake” MTL will use a different socket, RPL does not have a great upgrading potential and may be an idea to wait for discounts from Intel or AMD before making a choice…

Summary: Much faster and battling for the top (Intel i9 13900K(F)): 8.5/10

Further Articles

Please see our other articles on:

Disclaimer

And please, don’t forget small ISVs like ourselves in these very challenging times. Please buy a copy of Sandra if you find our software useful. Your custom means everything to us!