AMD Ryzen 7 7800X-3D (Zen4 V-Cache) Review & Benchmarks – VCache for the Win!

What is “Zen4” (Ryzen 7000)?

AMD’s Zen4 (“Raphael”) is the 4rd generation ZEN core – aka the new 7000-series of CPUs from AMD – that brings brand new features like AVX512 ISA (instruction set support), DDR5 and PCIe5. These do require a brand new platform (AM5) almost a decade since the current AM4 platform was launched before even the 1st generation Ryzen. With any luck, it will remain for the next 4 or even more CPU generations, unlike the 2 generation support on competitor (Intel) platform.

Zen4 contains only big/P(erformance) cores and it is not a hybrid design. It remains to be seen if AMD will launch such hybrid (big/LITTLE) products that, in our opinion, are too problematic on desktop platforms for the benefits they bring. Even on mobile platforms where efficiency is a top priority – workloads do not easily lend to a hybrid design despite huge work done on the Windows scheduler for Windows 11. In this regard, a non-hybrid design like Zen4 is very much preferred.

AVX512 is a huge boost for compute performance as we’ve seen on Intel since SKL-X (Skylake-X). There is a reason it exists + all the extensions (IFMA, VNNI, VAES, etc.) and it is not unexpected that even basic usage can bring up to 100% (2x) performance improvement and even higher with specific instructions. While originally CPUs would reduce clocks due to the power generated – this has pretty much been mitigated in modern designs. Even Centaur (before Intel bought them) had AVX512-enabled (LITTLE) cores.

While here AMD has implemented it as 2x 256-bit ops (similar to previous AVX2/FMA3 in Zen1/1+/2 implemented as 2x 128-bit) – we still benefit from 2x more registers + 2x wider registers (4x overall), arguably better instruction specification, optimised extensions (IFMA, VNNI, VAES, etc.) that overall can still build up to a big improvement over old AVX2/FMA3.

5nm process (TSMC) for CCX (vs. 7nm on Zen3) for better efficiency and clocks
6nm process (TSMC) for I/O hub (vs. 12nm for Zen3) for better memory speeds
- claimed 13% IPC increase vs. Zen3 + clock increase uplift => ~29% total uplift vs. Zen 3
AVX512 instruction support, with potential 100%+ improvement in optimised workloads
- Executed as 2x 256-bit (not true 512-bit like Intel) but still many benefits over AVX2/FMA3
- Specific AVX512 extensions (IFMA, VNNI, VAES, etc.) can bring well over 100% improvement
DDR5 support up to 5200Mt/s (official) for much higher memory bandwidth vs. DDR4 Zen3
- Unofficial support for at least 6400Mt/s with XMP3/EXPO profiles
- AMD says 6000Mt/s is the “sweet-spot” for performance/value
1MB L2 per core (2x vs. 512kB on Zen3)
Standard L3 is the same 32MB, V-Cache the same 96MB
PCIe5 support, up to 24 lanes (2x bandwidth vs. PCIe4)
Still up to 2 chiplets (at launch) thus up to 2x 8C big/P cores (16C/32T on 7950X)
Much higher both base and turbo speeds in most variants, e.g. 7950X
- Higher base 4.5GHz of standard CCX (vs. 3.4GHz on 5950X +32% clock uplift)
- Higher base 4.2GHz of V-Cache CCX (vs. 3.4GHz on 5950X +24% clock uplift)
- Higher turbo 5.7GHz (vs. 4.9GHz on 5950X +17% clock uplift)
TDP has increased to 120W (vs. 105W on 5950X) thus 14% higher
- Turbo (PPT aka PL2) around 160W (vs. 142W on 5950X) thus 14% higher
- Note that other models (e.g. 7700X) have kept the same TDP/Turbo
Built-in Radeon Graphics (RDNA2) core
- 2CU / 128SP 400-2.2GHz cores for very basic graphics

AMD Zen4-3D (Ryzen 7800X-3D), V-Cache CCX + I/O

What is the new Zen4-3D V-Cache (Ryzen 7000-3D)?

It is a version of Zen4+ chiplet/CCX with vertically stacked (thus the 3D(imensions) moniker) L3 cache that is 3x larger (thus 96MB). The latency is expected to be slightly higher (+4 clock) and bandwidth also slightly lower (~10% less).

Originally, AMD launched the asymmetric/hybrid (VCache CCX + Standard CCX) dual CCX processors (7950X-3D, 7900X-3D) – likely to benefit from early adopters. Now we finally have the cheaper, single-VCache CCX version (7800X-3D).

Similar to Zen3-3D – the clocks (Base) of the cores on the V-Cache CCX (5.25GHz) are lower than the standard CCX (5.7GHz).

To upgrade from standard Zen4 or not?

Except the new L3 3D/V-Cache cache, there are no other major changes:

Minor stepping update (S2 vs. S0) with no major fixes
- Base and Turbo clocks of standard CCX are the same as original Zen4 (e.g. 7950X)
- Base clocks of V-Cache CCX are lower than original Zen4, thus raw compute power is lower
AMD provided Windows driver to migrate threads to the “proper” CCX while parking other CCX
- Games scheduled on V-Cache/slow CCX
- Normal workloads scheduled on standard/fast CCX
- This assumes the workload uses 16-threads or less

It all depends on the data set(s) of the workload(s) you are running:

Data sets that either entirely fit or can be significantly served in the 96MB L3 cache – will see significant uplift
Inter-core/thread data transfers that can entirely fit in the 3D L3 cache – will see significant uplift
Streaming workloads or with very large data sets may not show uplift but be slower due to lower base/turbo clocks
Compute heavy algorithms with small data sets will be slower due to lower base/turbo clocks

Review

In this article we test CPU core performance; please see our other articles on:

Hardware Specifications

We are comparing the top-range Ryzen 9 7000-series (Zen4 3D) with standard Ryzen 9 and competing architectures with a view to upgrading to a top-range, high performance design.

CPU Specifications	AMD Ryzen 7 7800X-3D 8C/16T (Raphael-3D)	AMD Ryzen 7 5800X-3D 8C/16T (Vermeer-3D)	AMD Ryzen 7 7800X 8C/16T (Raphael)	Intel Core i7 13700K 8C+8c/24T (Raptor Lake)	Comments
Cores (CU) / Threads (SP)	8C / 16T	8C / 16T	8C / 16T	8C+8c / 24T	Core counts remain the same.
Topology	3D/CCX 8C + I/O hub	3D/CCX 8C + I/O hub	CCX 8C + I/O hub	Monolithic die	Same topology but asymmetric
Speed (Min / Max / Turbo) (GHz)	4.2 [+23%] – 5.0GHz [+11%]	3.4 – 4.5GHz	4.5 – 5.7GHz	3.4 – 5.4GHz / 2.5 – 4.2GHz	Base up 23%, turbo 11%
Power (TDP / Turbo) (W)	120 – 253W [+14%]	105 – 135W	105 – 142W	125 – 253W	TDP is 14% higher
L1D / L1I Caches (kB)	8x 32kB 8-way / 8x 32kB 8-way	8x 32kB 8-way / 8x 32kB 8-way	8x 32kB 8-way / 8x 32kB 8-way	8x 64kB + 8x 32kB / 8x 32kB + 8x 48kB	No changes to L1
L2 Caches (MB)	8x 1MB (8MB) 8-way	8x 512kB (4MB) 8-way	8x 1MB (8MB) 8-way	8x 2MB + 2x 4MB [24MB]	L2 is 2x larger
L3 Caches (MB)	96MB 16-way exclusive	96MB 16-way exclusive	32MB 16-way exclusive	20MB 16-way	L3 is the same
Mitigations for Vulnerabilities	BTI/”Spectre”, SSB/”Spectre v4″ hardware	BTI/”Spectre”, SSB/”Spectre v4″ hardware	BTI/”Spectre”, SSB/”Spectre v4″ hardware	BTI/”Spectre”, SSB/”Spectre v4″ hardware	No new fixes required… yet!
Microcode (MU)	A60F12-1203	A20F12-1205	A60F12-03	0B0671-10E	The latest microcodes have been loaded.
SIMD Units	2x 256-bit (512-bit total) AVX512+	256-bit AVX/FMA3/AVX2	2x 256-bit (512-bit total) AVX512+	256-bit AVX/FMA3/AVX2	Same SIMD widths
Price/RRP (USD)	$449	$449	$399	$419	Same price as old 3D

Disclaimer

This is an independent review (critical appraisal) that has not been endorsed nor sponsored by any entity (e.g. AMD, etc.). All trademarks acknowledged and used for identification only under fair use.

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets. Zen4 supports all modern instruction sets including AVX2/FMA3 and crypto SHA HWA but also AVX-512 and extensions (IFMA, VNNI, VAES, etc.)

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 11 x64 (21H2), latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations. All mitigations for vulnerabilities (Meltdown, Spectre, L1TF, MDS, etc.) were enabled as per Windows default where applicable.

Native Benchmarks		AMD Ryzen 7 7800X-3D 8C/16T (Raphael-3D)	AMD Ryzen 7 5800X-3D 8C/16T (Vermeer-3D)	AMD Ryzen 7 7800X 8C/16T (Raphael)	Intel Core i7 13700K 8C+8c/24T (Raptor Lake)	Comments

	Total Inter-Thread Bandwidth – Best Pairing (GB/s)	116* [+17%]	99.2	110*	128	3D Zen4 has 17% more bandwidth
As the 3D/V-Cache L3 is the “star of the show” – we start with the inter-thread benchmark – where we see a +17% overall bandwidth improvement over the Zen3-3D and also higher than the Zen4-standard. Even large data blocks transfers between threads can be fulfilled by the 3D L3 cache and do not need to go through much slower system memory anymore. RPL still has higher bandwidth – but that is largely thanks to the extra 8 little cores with their own L1D and shared L2 caches. Let’s note that most but the very recent CPUs only had up to 16MB L3 if not much less and 128MB total L3 is huge for desktop processors. Note:* using AVX512 512-bit wide transfers.

	Average Inter-Thread Latency (ns)	19.1 [-15%]	22.6	16.8	36.3	15% less latency than Zen3-3D
	Inter-Thread Latency (Same Core) Latency (ns)	8.9 [-15%]	10.5	7.9	11.4	15% lower latency than Zen3-3D
	Inter-Core Latency (big Core, same Module) Latency (ns)	19.8 [-16%]	23.5	17.4	33	16% lower latency than Zen3-3D
	Inter-Core (Little Core, same Module) Latency (ns)	–	–	–	42.5	n/a
	Inter-Module Latency (ns)	–	–	–	–	Single CCX
Overall, Zen4-3D has 15% lower latencies than Zen3-3D which is a big result, but naturally higher latencies than the standard Zen4 that runs at higher clocks.

	Native Dhrystone Integer (GIPS)	538 [+65%]	326	612	211	Z4-3D is 65% faster than Z3-3D.
	Native Dhrystone Long (GIPS)	551 [+63%]	338	637	180	With a 64-bit integer workload, nothing changes
	Native FP32 (Float) Whetstone (GFLOPS)	319 [+16%]	276	350	175	Floating-point performance is 16% faster
	Native FP64 (Double) Whetstone (GFLOPS)	263 [+16%]	227	295	147	With FP64 data nothing changes
Zen4-3D is about 40% faster than the old Zen3-3D – but naturally cannot beat the standard Zen4 with its much higher clocks. In these legacy integer/floating-point benchmarks the V-Cache does not help and clocks rule. In any case, Zen4 (3D or not) soundly beats Intel’s latest RPL (13700K) that is equivalent to ADL’s top-of-the-range ADL (12900K) of yesteryear.

	Native Integer (Int32) Multi-Media (Mpix/s)	2,367* [+25%]	1,894	2,648*	2,207	Z4-3D is 25% faster than old Z3-3D
	Native Long (Int64) Multi-Media (Mpix/s)	710* [+9%]	650	816*	727	With a 64-bit integer workload it’s 9% faster
	Native Quad-Int (Int128) Multi-Media (Mpix/s)	200* [+74%]	115	227*	135	In this 128-bit int emulation, Z4-3D is 75% faster!
	Native Float/FP32 Multi-Media (Mpix/s)	2,118* [+24%]	1,712	2,341*	2,199	In this floating-point test, it’s 24% faster.
	Native Double/FP64 Multi-Media (Mpix/s)	1,159* [+32%]	876	1,271*	1,123	Switching to FP64 code, it’s 32% faster.
	Native Quad-Float/FP128 Multi-Media (Mpix/s)	46* [+26%]	36.23	50.49*	52	Emulating 128-bit floats, Z4-3D is 26% faster.
Even in heavy compute SIMD vectorised algorithms we see similar results, Zen4-3D is 32% faster than the old Zen3-3D but cannot beat the standard Zen4 due to the relatively small data set (Mandelbrot fractal bitmap) that already fits in the standard size L3 caches. If we were to use a much larger data set (e.g. 96-128MB) that would have overwhelmed the smaller caches – but fit in the new 3D V-Cache, we will see a benefit. *Note:** using AVX512 instead of AVX2/FMA.

	Crypto AES-256 (GB/s)	25*** [+22%]	20.43	26.49***	28	Z4-3D sees a 22% improvement Z3-3D
	Crypto AES-128 (GB/s)	25*** [+22%]	20.43		27	What we saw with AES-256 just repeats with AES-128.
	Crypto SHA2-256 (GB/s)	26* [2%]	25.01**	27.92*	31**	With SHA/HWA it is 2% faster
	Crypto SHA1 (GB/s)	–	–	—		The less compute-intensive SHA1 does not change things due to acceleration.
While streaming tests (crypto/hashing) are memory bound, Zen4-3D does not really see an uplift. Again, should our dataset be able to fit entirely in L3 cache(s) or significantly serviced by it – we would see a big improvement over the standard Zen4. But with large dataset (up to 16GB total on 32GB systems) the size of the L3 cache is of little benefit. Again, perhaps allowing configurable size data sets is an idea should these large L3 caches become mainstream. Note*: using VAES 256-bit (AVX2) or 512-bit (AVX512) Note: using SHA HWA not SIMD (e.g. AVX512, AVX2, AVX, etc.) Note*: using AVX512 not AVX2.

	Black-Scholes float/FP32 (MOPT/s)					The standard financial algorithm.
	Black-Scholes double/FP64 (MOPT/s)	344 [+10%]	312	391	461	Switching to FP64 code, Z4-3D is 10% faster.
	Binomial float/FP32 (kOPT/s)					Binomial uses thread shared data thus stresses the cache & memory system.
	Binomial double/FP64 (kOPT/s)	94.9 [+4%]	91.47	117	140	With FP64 code Z4-3D is 4% faster
	Monte-Carlo float/FP32 (kOPT/s)					Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches;
	Monte-Carlo double/FP64 (kOPT/s)	150 [+21%]	124	165	182	Again, we see around 20% improvement.
Ryzen always did well on non-SIMD floating-point algorithms and here 3D Zen4 performs as expected; naturally it cannot beat the faster standard Zen4 and thus we see no uplift from the bigger L3 cache. Again, we need updated algorithms that can buffer into the L3 cache now that it is so big in order to see improvements.

	SGEMM (GFLOPS) float/FP32					In this tough vectorised algorithm that is widely used (e.g. AI/ML).
	DGEMM (GFLOPS) double/FP64	*472 [+42%]**	332	340*	325	With FP64 Z4-3D 42% faster!
	SFFT (GFLOPS) float/FP32					FFT is also heavily vectorised but stresses the memory sub-system more.
	DFFT (GFLOPS) double/FP64	15.52* [+25%]	12.37	14.92*	19.8	With FP64 code, Z4-3D is 25% faster
	SNBODY (GFLOPS) float/FP32					N-Body simulation is vectorised but fewer memory accesses.
	DNBODY (GFLOPS) double/FP64	282 [+30%]	217	318	213	With FP64 precision Z4-3D is 30% faster
The main news here is that with a dataset that fits in the 3D L3 cache in GEMM – we see a 42% improvement over standard Zen4. GEMM is already using the L1D caches to buffer the tiles for higher performance – but here we see the huge improvement the L3 cache makes if the whole dataset fits the L3 cache. Note*: using AVX512 not AVX2/FMA3.

	Blur (3×3) Filter (MPix/s)	6,088* [+75%]	3,469	6,237*	5,130	In this vectorised integer workload Z4-3D is 75% faster!
	Sharpen (5×5) Filter (MPix/s)	2,355* [+81%]	1,299	2,673*	1,943	Same algorithm but more shared data 81% faster
	Motion-Blur (7×7) Filter (MPix/s)	1,215* [+82%]	667	1,385*	966	Again same algorithm but even more data shared – 82% faster
	Edge Detection (2*5×5) Sobel Filter (MPix/s)	1,873* [+71%]	1,094	2,109*	1,624	Different algorithm but still vectorised no change.
	Noise Removal (5×5) Median Filter (MPix/s)	253* [+2.2x]	115	284*	141	Still vectorised code but over 2x faster
	Oil Painting Quantise Filter (MPix/s)	39* [+9%]	35.81	43.37*	72	This test has always been tough – 9% faster
	Diffusion Randomise (XorShift) Filter (MPix/s)	*5,608 [+43%]**	3,917	4,841*	5,539	With integer workload, we see an unexpected 43% faster
	Marbling Perlin Noise 2D Filter (MPix/s)	633* [+29%]	490	692*	986	In this final test we see little change.
Again, if the dataset is too small and thus can fit in the normal L3 caches (e.g. 64MB) – you’re not going to see benefit from the much larger 3D V-Cache. Interestingly, we see 2 tests where the improvement is a huge 40%! This is not a fluke, as we’ve seen similar improvement in Zen3-3D vs. standard Zen3. Note*: using AVX512 not AVX2/FMA3.

	Aggregate Score (Points)	15,430* [+28%]	12,050	16,390*	16,020	Zen4-3D is 28% faster than Zen3-3D
As with standard Zen4, Zen4-3D is a huge 28% faster across all benchmarks than the older Zen3-3D; still without using the VCache-optimised datasets the normal Zen4 performs better. Note*: using AVX512 note AVX2/FMA3.

	Price/RRP (USD)	$449 [=]	$449	$399	$419	Same price

	Price Efficiency (Perf. vs. Cost) (Points/USD)	34.37 [+28%]	26.84	41.08	38.23	Zen4-3D is 28% better than the old one
AMD has kept the same price, thus the new Zen4-3D provides 28% better performance/price. However, due to lower price, the standard Zen4 is more efficient and even Intel’s RPL seems more efficient.

	Power/TDP (W)	128 – 253W [+18%]	105 – 135W	105 – 142W	125 – 253W	TDP is 18% higher

	Power Efficiency (Perf. vs. Power) (Points/W)	128 [+12%]	115	156	128	As the TDP is a bit higher, Zen4-3D is 12% better
With the slightly higher TDP, Zen4-3D comes up just 12% more power efficient than the old Zen3-3D but that is bound to increase going forward as apps are updated to support it. Still, the standard Zen4 reigns supreme due to both lower power and higher performance – at least on our benchmarks.

SiSoftware Official Ranker Scores

Final Thoughts / Conclusions

Summary: The 8-Core King Returns: 9/10

Even with the original 3D V-Cache Zen3 (5800X-3D) – the biggest issue was that the standard Zen3 was too good/performant and the huge L3 cache only made a difference in some workloads (notably games!). The standard 32MB L3 CCX cache is already large enough and fast enough especially considering the competition (Intel). Still, the 3D model had 3x (three times) larger L3 that can be a big asset.

Unlike the multi-CCX designs with asymmetric/hybrid L3 cache of different sizes – the 7800X-3D brings back “normality” with a single, unified 3D-VCache. No need for special drivers for games and other applications to schedule threads on the “right” CCX for best performance and turn off other cores/CCX for best power efficiency.

Due to much higher bandwidth on AM5/DDR5 platform (e.g. standard DDR5-6500 memory) vs. old AM4/DDR4 (e.g. common DDR4-3200 memory) – Zen4-3D takes less of a hit going to main memory than Zen3-3D though L3 bandwidth is still 10x higher than DDR5.

In the end – it all depends on your workloads: if you game regularly and thus want a 3D/V-Cache Zen4 but also regularly need more cores/threads for other tasks than a (future) 7800X-3D can provide, then these 7950X-3D/7900X-3D could work for you.

Otherwise you’re better off with the standard Zen4 (7700X), or if you are still on AM4 platform, the Zen3-3D (5800X-3D) has come down in price and is still very much competitive.

Please see the other reviews on other Ryzen variants: