AMD Ryzen 7 5800X-3D (Zen3 V-Cache) Review & Benchmarks – CPU & Cache Performance

What is “Zen3” (Ryzen 5000)?

AMD’s Zen3 (“Vermeer”) is the 3rd generation ZEN core – aka the new 5000-series of CPUs from AMD, that introduces further refinements of the ZEN(2) core and layout. An APU version (with integrated graphics) is also available. The CPU/APUs remain socket AM4 compatible on desktop – thus allowing in-place upgrade (subject to BIOS upgrade as always) – but series 500-chipsets are recommended to enable all features (e.g. PCIe4, etc.). [Note this is the last CPU that will fit AM4 socket; future CPUs supporting DDR5 need a new socket]

Unlike ZEN2, the main changes are to the core/cache layout, but they could still prove significant considering the cache/memory latency issues that have impacted ZEN designs:

(AMD) Claims +19% IPC (instructions per clock) overall improvement vs. ZEN2
- Higher base and turbo clocks +7% [for 5800X vs. 3700X]
Still built around “chiplets” CCX (“core complexes”) but now of 8C/16T and larger L3 cache (still 7nm)
- Same central I/O hub with memory controller(s) and PCIe 4.0 bridges connected through IF (“Infinity Fabric”) (12nm)
- Still up to 2 chiplets on desktop platform thus up to 2x 8C (16C/32T 5950X)
L3 is still the same 32MB but now unified (not 2x 16MB) still up to 64MB on 5950X
- 3D V-Cache L3 is 96MB unified, thus 3x (!) larger than original Zen3
20 PCIe 4.0 lanes
2x DDR4 memory controllers up to 3200MT/s official (4266MT/s max) [future AM5 socket for DDR5 support]

What is the new Zen3-3D V-Cache (Ryzen 5000-3D)?

It is a version of the Zen3 chiplet with vertically stacked (thus the 3D(imensions) moniker) L3 cache that is 3x larger (thus 96MB). The latency is expected to be slightly higher (+4 clocks) and bandwidth also slightly lower (~10% less).

However, the sheer size of the L3 cache allows many (desktop) workloads’ data sets to be fulfilled directly from the L3 cache thus avoiding main memory access (with higher latencies and lower bandwidth). Inter-core/thread transfers of relatively large data sets (12MB/core) can also be fulfilled directly by the L3 cache.

Until recently, top-end 8-core Intel CPUs (e.g. 11900K, 10700K, etc.) had only 16MB L3 cache (1/2x normal Ryzen, 1/6x 3D Ryzen) – with only recent Intel “AlderLake” (ADL) 16-core (8C+8c) having a comparable 30MB L3 cache.
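
The cache-size comparisons above (12MB/core, 1/2x, 1/6x) are simple ratios. A minimal sketch that reproduces them from the L3 sizes quoted in this article; the dictionary and function names are illustrative only:

```python
# L3 sizes in MB, as quoted in this article.
L3_MB = {
    "Ryzen 7 5800X-3D": 96,
    "Ryzen 7 5800X": 32,
    "Core i7 11700K": 16,
}
CORES = 8  # all three parts listed above are 8-core designs


def l3_per_core(l3_mb: float, cores: int = CORES) -> float:
    """Average L3 capacity per core in MB (the L3 itself is shared)."""
    return l3_mb / cores


def l3_ratio(a: str, b: str) -> float:
    """How many times larger the L3 of CPU `a` is than that of CPU `b`."""
    return L3_MB[a] / L3_MB[b]


if __name__ == "__main__":
    for name, size in L3_MB.items():
        print(f"{name}: {size} MB L3, {l3_per_core(size):.0f} MB/core")
    print("3D vs. normal Zen3:", l3_ratio("Ryzen 7 5800X-3D", "Ryzen 7 5800X"))
```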

To upgrade from standard Zen3 or not?

Apart from the new 3D V-Cache L3, there are no other major changes:

Minor stepping update (S2 vs. S0) with no major fixes
Requires AGESA V2 1.2.0.6+ for support – update BIOS before installing
Base and Turbo clocks are lower than normal Zen3 (5800X), thus raw compute power is lower

It all depends on the data set(s) of the workload(s) you are running:

Data sets that either entirely fit or can be significantly served in the 96MB L3 cache – will see significant uplift
Inter-core/thread data transfers that can entirely fit in the 3D L3 cache – will see significant uplift
Streaming workloads, or those with very large data sets, may show no uplift and may even be slower due to lower base/turbo clocks
Compute heavy algorithms with small data sets will be slower due to lower base/turbo clocks
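
The rules of thumb above can be sketched as a rough classifier by working-set size. The hard 32MB and 96MB thresholds are a deliberate simplification (a data set only partly served from L3 can still benefit), and the function name and return labels are hypothetical:

```python
MB = 1024 * 1024
L3_STANDARD = 32 * MB  # original Zen3 (5800X)
L3_VCACHE = 96 * MB    # 3D V-Cache Zen3 (5800X-3D)


def expected_effect(working_set_bytes: int) -> str:
    """Very rough guess at how the 5800X-3D compares with the 5800X.

    Assumption: the working set is the only thing competing for L3.
    """
    if working_set_bytes <= L3_STANDARD:
        # Already fits in the standard L3, so the lower clocks dominate.
        return "slightly slower"
    if working_set_bytes <= L3_VCACHE:
        # Spills out of 32MB but still fits in 96MB.
        return "significant uplift"
    # Too large for either cache: main memory dominates.
    return "little or no uplift"


if __name__ == "__main__":
    for size_mb in (8, 64, 4096):
        print(f"{size_mb} MB working set: {expected_effect(size_mb * MB)}")
```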

Review

In this article we test CPU core performance.

Hardware Specifications

We are comparing the top-range Ryzen 7 5000-series (Zen3 8-core) with previous generation Ryzen 7 3000-series (Zen2 8-core) and competing architectures with a view to upgrading to a top-range, high performance design.

CPU Specifications	AMD Ryzen 7 5800X-3D 8C/16T (Vermeer-3D)	AMD Ryzen 7 5800X 8C/16T (Vermeer)	Intel Core i7 11700K 8C/16T (RocketLake)	Intel Core i7 12700 8C+4c / 20T (AlderLake)	Comments
Cores (CU) / Threads (SP)	8C / 16T	8C / 16T	8C / 16T	8C + 4c / 20T	Core counts remain the same.
Topology	1 chiplet, 1 CCX, each 8 core (8C) + I/O hub	1 chiplet, 1 CCX, each 8 core (8C) + I/O hub	Monolithic die	Monolithic die	Same topology
Speed (Min / Max / Turbo) (GHz)	3.4 / 4.5GHz	3.8 / 4.7GHz	3.6 / 5GHz	2.1+1.6 / 4.8+3.6	Both base and turbo are down
Power (TDP / Turbo) (W)	105 / 135W (PL2)	105 / 135W (PL2)	125 / 175W (PL2)	65 / 180W (PL2)	Same TDP
L1D / L1I Caches (kB)	8x 32kB 8-way / 8x 32kB 8-way	8x 32kB 8-way / 8x 32kB 8-way	8x 32kB 8-way / 8x 32kB 8-way	8x 32k+4x 48kB / 8x 48kB + 4x 32kB	No changes to L1
L2 Caches (MB)	8x 512kB (4MB) 8-way inclusive	8x 512kB (4MB) 8-way inclusive	8x 512kB (4MB)	8x 1.25MB + 2MB	No changes to L2
L3 Caches (MB)	96MB 16-way exclusive [+3x]	32MB 16-way exclusive	16MB 16-way	25MB 11-way	3x larger L3
Mitigations for Vulnerabilities	BTI/”Spectre”, SSB/”Spectre v4″ hardware	BTI/”Spectre”, SSB/”Spectre v4″ hardware	BTI/”Spectre”, SSB/”Spectre v4″ software/firmware	BTI/”Spectre”, SSB/”Spectre v4″ software/firmware	No new fixes required… yet!
Microcode (MU)	A20F12-05	A20F10-16	0A0671-50	090672-15	The latest microcodes have been loaded.
SIMD Units	256-bit AVX/FMA3/AVX2	256-bit AVX/FMA3/AVX2	512-bit AVX512	256-bit AVX/FMA3/AVX2	Same SIMD widths
Price/RRP (USD)	$449	$449	$399	$349	Same price as normal version

Disclaimer

This is an independent review (critical appraisal) that has not been endorsed nor sponsored by any entity (e.g. AMD, etc.). All trademarks acknowledged and used for identification only under fair use.

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets (AVX2, FMA3, AVX, etc.). Zen3 supports all modern instruction sets including AVX2, FMA3 and even more like SHA HWA but not AVX-512.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations. All mitigations for vulnerabilities (Meltdown, Spectre, L1TF, MDS, etc.) were enabled as per Windows default where applicable.

Native Benchmarks		AMD Ryzen 7 5800X-3D 8C/16T (Vermeer-3D)	AMD Ryzen 7 5800X 8C/16T (Vermeer)	Intel Core i7 11700K 8C/16T (RocketLake)	Intel Core i7 12700 8C+4c / 20T (AlderLake)	Comments

	Total Inter-Thread Bandwidth – Best Pairing (GB/s)	99.2 [+8%]	91.55	77.89*	94.69	3D Zen3 has 8% more overall bandwidth
As the 3D L3 is the “star of the show”, we start with the inter-thread benchmark, where we see a +8% overall bandwidth improvement over the original Zen3: even large data-block transfers between threads can be fulfilled by the 3D L3 cache and no longer need to go through much slower system memory. This should benefit all algorithms that process data blocks too large to fit even in the generous 32MB L3 of the original Zen3. Note that all but the most recent CPUs had at most 16MB of L3, often much less, so even the original Zen3 has the largest L3 in the business. Note*: using AVX512 512-bit wide transfers.

	Average Inter-Thread Latency (ns)	22.6 [+13%]	20	29.3	42.6	3D Zen3 is 13% slower.
	Inter-Thread (Same Core) Latency (ns)	10.5 [+9%]	9.6	13.4	14.6	Same-core latency is also 9% higher.
	Inter-Core Latency (big Core, same Module) Latency (ns)	23.5 [+13%]	20.8	30.4	38.9	Similar 13% slower than Zen3.
	Inter-Core (Little Core, same Module) Latency (ns)	–	–	–	51.2	n/a
	Inter-Big-Little Latency (Same Module) Latency (ns)	–	–	–	56.4	n/a
Surprisingly, 3D Zen3’s inter-core latencies are somewhat higher than we’d expect from the clock difference alone (+9% to +13% measured vs. about +5% expected); most likely this is a configuration issue. In any case, they are still much lower than the competition (Intel) and this has not changed.

	Native Dhrystone Integer (GIPS)	325 [-4%]	339	224	378	3D Zen3 is 4% slower than the normal version.
	Native Dhrystone Long (GIPS)	338 [-1%]	343	207	377	With a 64-bit integer workload, it’s 1% slower
	Native FP32 (Float) Whetstone (GFLOPS)	276 [-5%]	290	165	280	Floating-point performance is 5% slower
	Native FP64 (Double) Whetstone (GFLOPS)	227 [-7%]	243	139	207	With FP64 we’re down 7%, still beating Intel
3D Zen3 is about 4-5% slower than normal Zen3 – that is exactly what we’d expect from the lower clocks (-5%) in these legacy integer/floating-point benchmarks – that fit entirely in the L1/L2 and won’t take any advantage of the immense new L3 cache. Against the competition, the situation does not change much, with Zen3 still competitive against Intel’s ADL.
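
The bracketed percentages in the tables are plain relative differences against the normal Zen3, and they can be compared with the clock deficit directly. A minimal sketch using the figures from this review (the helper name is illustrative):

```python
def rel_change_pct(new: float, old: float) -> float:
    """Relative change of `new` versus `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Clocks in GHz from the specification table (5800X-3D vs. 5800X).
turbo_delta = rel_change_pct(4.5, 4.7)
base_delta = rel_change_pct(3.4, 3.8)

# Benchmark scores (5800X-3D, 5800X) from the table above.
scores = {
    "Dhrystone Integer": (325, 339),
    "Dhrystone Long": (338, 343),
    "Whetstone FP32": (276, 290),
    "Whetstone FP64": (227, 243),
}

if __name__ == "__main__":
    print(f"turbo clock: {turbo_delta:+.1f}%  base clock: {base_delta:+.1f}%")
    for name, (v3d, vstd) in scores.items():
        print(f"{name}: {rel_change_pct(v3d, vstd):+.1f}%")
```

The turbo clock deficit works out to roughly -4%, which is in line with the measured drops when the workload fits in L1/L2.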

	Native Integer (Int32) Multi-Media (Mpix/s)	1,894 [-5%]	1,997	1,428*	1,361	3D Zen3 is again 5% slower than Zen3 as expected
	Native Long (Int64) Multi-Media (Mpix/s)	650 [-6%]	691	363*	540	With a 64-bit integer workload nothing changes.
	Native Quad-Int (Int128) Multi-Media (Mpix/s)	115 [-6%]	122	78.8*	98.2	This is a tough test using Long integers to emulate Int128; nothing changes.
	Native Float/FP32 Multi-Media (Mpix/s)	1,712 [-7%]	1,847	890*	1,413	In this floating-point test, we’re 7% slower.
	Native Double/FP64 Multi-Media (Mpix/s)	876 [-7%]	946	446*	787	Switching to FP64 code, nothing changes.
	Native Quad-Float/FP128 Multi-Media (Mpix/s)	36.23 [-5%]	38.3	22.5*	40.5	In this heavy algorithm using FP64 to mantissa extend FP128, nothing changes.
Even in heavy compute SIMD vectorised algorithms we see the same results: about 5-7% slower than normal Zen3, as expected. This is due to the relatively small data set (Mandelbrot fractal bitmap) that already fits in “normal size” L3 caches. If we were to use a much larger data set (e.g. 64MB) that would overwhelm the smaller caches but still fit in the new 3D V-Cache, we would see a benefit. We are looking to provide benchmark configurations with larger datasets in order to show this benefit if such caches become mainstream. Note*: using AVX512 instead of AVX2/FMA.

	Crypto AES-256 (GB/s)	20.43 [+5%]	19.39	23.53***	18.5***	3D Zen3 sees a 5% improvement over normal Zen3.
	Crypto AES-128 (GB/s)	20.43 [+5%]	19.39	23.71***	18.5***	What we saw with AES-256 just repeats with AES-128.
	Crypto SHA2-256 (GB/s)	25.01** [-5%]	26.25**	14.45*	22.41	With SHA/HWA we return to 5% slower.
	Crypto SHA1 (GB/s)		28.56**	39.94*		The less compute-intensive SHA1 does not change things due to acceleration.
While streaming tests (crypto/hashing) are memory bound, 3D Zen3 does see a small improvement (+5%) in AES but a similar drop (-5%) in SHA, and is thus overall pretty much tied with the original Zen3. Again, if our dataset could fit entirely in the L3 cache, or be significantly serviced by it, we would see a big improvement over the original Zen3. But with a large dataset (up to 16GB total on 32GB systems) the size of the L3 cache is of little benefit. Again, perhaps allowing configurable-size data sets is an idea should these large L3 caches become mainstream. Note***: using VAES 256-bit (AVX2) or 512-bit (AVX512). Note**: using SHA HWA not SIMD (e.g. AVX512, AVX2, AVX, etc.). Note*: using AVX512 not AVX2.
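
To put the streaming case in numbers, here is a small sketch of the upper bound on how much of a buffer can be resident in L3 at once. It ignores everything else competing for cache and any prefetching, so treat it as a best case; the function name is hypothetical:

```python
MB = 1024 * 1024
GB = 1024 * MB


def l3_coverage(dataset_bytes: int, l3_bytes: int) -> float:
    """Upper bound on the fraction of the data set that fits in L3."""
    return min(1.0, l3_bytes / dataset_bytes)


if __name__ == "__main__":
    dataset = 16 * GB  # the largest data set mentioned for 32GB systems
    for label, l3 in (("32MB L3", 32 * MB), ("96MB L3", 96 * MB)):
        print(f"{label}: {l3_coverage(dataset, l3):.2%} of a 16GB buffer")
```

With well under 1% of a 16GB buffer resident in either cache, the tripled L3 cannot change the outcome of a pure streaming test.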

	Black-Scholes float/FP32 (MOPT/s)	–	371	–	–	The standard financial algorithm.
	Black-Scholes double/FP64 (MOPT/s)	312 [-6%]	332	150	347	Switching to FP64 code, 3D Zen3 is 6% slower.
	Binomial float/FP32 (kOPT/s)	–	162	–	–	Binomial uses thread shared data thus stresses the cache & memory system;
	Binomial double/FP64 (kOPT/s)	91.5 [-7%]	98.7	41.5	105	With FP64 code 3D Zen3 is 7% slower.
	Monte-Carlo float/FP32 (kOPT/s)	–	292	–	–	Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches;
	Monte-Carlo double/FP64 (kOPT/s)	124 [-6%]	132	54.4	138	No improvement here either.
Ryzen always did well on non-SIMD floating-point algorithms and here 3D Zen3 performs as expected: it is about 6% slower than normal Zen3. Again, we need updated algorithms that can buffer into the L3 cache, now that it is so big, in order to see improvements.

	SGEMM (GFLOPS) float/FP32	–	410	553*	–	In this tough vectorised algorithm that is widely used (e.g. AI/ML).
	DGEMM (GFLOPS) double/FP64	332 [+74%]	191	211*	178	With FP64, 3D Zen3 finally sees a big uplift.
	SFFT (GFLOPS) float/FP32	–	23.7	30.6*	–	FFT is also heavily vectorised but stresses the memory sub-system more.
	DFFT (GFLOPS) double/FP64	12.37 [=]	12.43	14.57*	11.24	With FP64 code, scores are tied.
	SNBODY (GFLOPS) float/FP32	–	518	606*	–	N-Body simulation is vectorised but fewer memory accesses.
	DNBODY (GFLOPS) double/FP64	217 [-6%]	231	108*	165	With FP64 precision 3D Zen3 is 6% slower.
The main news here is that with a dataset that fits in the 3D L3 cache in GEMM – we see a 74% improvement over normal Zen3. GEMM is already using the L1D caches to buffer the tiles for higher performance – but here we see the huge improvement the L3 cache makes if the whole dataset fits the L3 cache. Note*: using AVX512 not AVX2/FMA3.
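
To illustrate why GEMM is so sensitive to L3 size, the sketch below estimates the footprint of a square FP64 matrix multiply (C = A x B, all three matrices resident) and the largest dimension that fits in each cache. The matrix sizes actually used by the benchmark are not stated here, so the square-matrix layout is purely an assumption:

```python
import math

MB = 1024 * 1024


def gemm_footprint_bytes(n: int, elem_bytes: int = 8) -> int:
    """Bytes needed to hold three n x n matrices (A, B and C)."""
    return 3 * n * n * elem_bytes


def max_square_n(cache_bytes: int, elem_bytes: int = 8) -> int:
    """Largest n such that three n x n matrices fit in `cache_bytes`."""
    return math.isqrt(cache_bytes // (3 * elem_bytes))


if __name__ == "__main__":
    for label, size in (("32MB L3", 32 * MB), ("96MB L3", 96 * MB)):
        n = max_square_n(size)
        print(f"{label}: n <= {n} ({gemm_footprint_bytes(n) / MB:.1f} MB)")
```

Under these assumptions an FP64 problem of up to 2048 x 2048 fits in the 96MB cache, while the 32MB cache tops out at around 1182 x 1182; anything in between would be served from L3 only on the 3D part.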

	Blur (3×3) Filter (MPix/s)	3,469 [-5%]	3,642	3,803*	3,430	In this vectorised integer workload 3D Zen3 is 5% slower.
	Sharpen (5×5) Filter (MPix/s)	1,299 [-5%]	1,372	1,907*	1,353	Same algorithm but more shared data – no change.
	Motion-Blur (7×7) Filter (MPix/s)	667 [-5%]	703	981*	679	Again same algorithm but even more data shared – no change.
	Edge Detection (2*5×5) Sobel Filter (MPix/s)	1,094 [-6%]	1,162	1,523*	1,146	Different algorithm but still vectorised – no change.
	Noise Removal (5×5) Median Filter (MPix/s)	115 [-7%]	123	236*	103	Still vectorised code but no change
	Oil Painting Quantise Filter (MPix/s)	35.81 [-5%]	37.89	72.7*	54.54	This test has always been tough but still no change.
	Diffusion Randomise (XorShift) Filter (MPix/s)	3,917 [+23%]	3,190	3,739*	3,755	With integer workload, we see an unexpected 23% improvement
	Marbling Perlin Noise 2D Filter (MPix/s)	490 [-3%]	507	951*	764	In this final test we see little change.
Again, if the dataset is too small and thus can fit in the normal L3 caches (e.g. 32MB) – you’re not going to see benefit from the much larger 3D V-Cache. In all other respects, 3D Zen3 performs as expected. Note*: using AVX512 not AVX2/FMA3.

	Aggregate Score (Points)	12,050 [+3%]	11,740	9,610*	10,790	Across all benchmarks, 3D Zen3 is 3% faster!
Despite being 5% slower in most compute benchmarks, the cache-sensitive benchmarks (Inter-Core Transfer, Crypto AES) do manage to bring 3D Zen3 to 3% faster than the normal Zen3 – which is a great result. Note*: using AVX512 not AVX2/FMA3.

	Price/RRP (USD)	$449 [=]	$449	$399	$349	Price stays the same.

	Price Efficiency (Perf. vs. Cost) (Points/USD)	26.84 [+3%]	26.15	24.09	30.92	Small 3% efficiency gain, in line with performance.
As AMD has kept the cost the same – 3D Zen3 sees the same improvement as overall performance: +3%. This means it is still below Intel’s latest ADL competition – that is much cheaper and thus more “bang-per-buck” despite lower overall performance. How the tables have turned!

	Power/TDP (W)	105-135 [=]	105-135	125-175	65-180	TDP has remained the same.

	Power Efficiency (Perf. vs. Power) (Points/W)	114.7 [+3%]	111.8	76.88	166	As TDP is the same, we see same improvement.
As AMD has kept the TDP the same (and lowered clocks to make sure that actual power consumed is kept in check), power efficiency improves by the same +3% as overall performance. Perhaps AMD could have just kept the clocks the same to ensure an outright victory over the normal Zen3.
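
Both efficiency figures are simple quotients of the aggregate score, using the RRP and the base TDP respectively. A short sketch that recomputes them from the values listed in this review (the dictionary layout and function names are illustrative):

```python
# (aggregate score in points, RRP in USD, base TDP in W), as listed above.
RESULTS = {
    "Ryzen 7 5800X-3D": (12050, 449, 105),
    "Ryzen 7 5800X": (11740, 449, 105),
    "Core i7 11700K": (9610, 399, 125),
    "Core i7 12700": (10790, 349, 65),
}


def points_per_usd(name: str) -> float:
    """Price efficiency: aggregate score divided by RRP."""
    score, price, _tdp = RESULTS[name]
    return score / price


def points_per_watt(name: str) -> float:
    """Power efficiency: aggregate score divided by base TDP."""
    score, _price, tdp = RESULTS[name]
    return score / tdp


if __name__ == "__main__":
    for name in RESULTS:
        print(f"{name}: {points_per_usd(name):.2f} points/USD, "
              f"{points_per_watt(name):.1f} points/W")
```

Note that the power figure uses the rated TDP rather than measured consumption, so it is only as accurate as the TDP rating itself.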

Final Thoughts / Conclusions

Summary: Recommended if moving to Zen3 from older versions of Ryzen (8/10).

Perhaps the biggest issue with the 3D V-Cache Zen3 is that the original (standard) Zen3 is so good that the enormous 3D cache does not make more of a difference. The original Zen3’s L3 cache (32MB) is already large (compared to all but the most recent CPUs, especially the Intel competition), provides good bandwidth, has reasonably low latencies – and is already unified!

As 3D Zen3 has lower base/turbo clocks, it is already at a bit of disadvantage over original Zen3 – and in raw compute workloads it is naturally ~5% slower. In workloads with small data sets that already fit in the original L3 cache (32MB) – the higher latency and slightly lower bandwidth of the 3D L3 cache – makes it slightly slower than original Zen3 yet again.

We do see good gains in inter-thread transfer bandwidth (when larger blocks are transferred between threads) of about +8% overall, and cache & memory bandwidth is about +20% higher overall (when larger blocks are read/written), which can improve some algorithms (e.g. GEMM) by over 70%. But it all depends on the dataset size.

If you work with datasets comparable to the new 3D L3 cache size – you will see a big uplift in performance. Otherwise, you may well see a small decrease in performance. Thus it is a very niche product – but at the same price point & TDP – it is one we’d choose over the original if moving to Zen3 from older versions. In effect, it is the “top-end” 8-core AM4 socket Ryzen!

But for “top-end” AM4 socket performance – there are higher core-count Zen3 versions, all the way up to the 16-core 5950X – which may also be “upgraded” to 3D V-Cache at some point – that also have larger (2x 32MB aka 64MB) total L3 cache. With more cores/threads, the 3D Zen3 cannot be expected to match/beat them just with an L3 cache upgrade.