AMD Ryzen 7 3700X (Zen2) Review & Benchmarks – CPU 8-core/16-thread Performance

What is “ZEN2”?

AMD’s Zen2 (“Matisse”) is the “true” 2nd generation ZEN core on 7nm process shrink while the previous ZEN+ (“Pinnacle Ridge”) core was just an optimisation of the original ZEN (“Summit Ridge”) core that while socket compatible it introduces many design improvements over both previous cores. An APU version (with integrated “Navi” graphics) is scheduled to be launched later.

While new chipsets (500 series) will also be introduced and required to support some new features (PCIe 4.0), with an BIOS/firmware update older boards may support them thus allowing upgrades to existing systems adding more cores and thus performance. [Note: older boards will not be enabled for PCIe 4.0 after all]

The list of changes vs. previous ZEN/ZEN+ is extensive thus performance delta is likely to be very different also:

Built around “chiplets” of up to 2 CCX (“core complexes”) each of 4C/8T and 8MB L3 cache (7nm)
Central I/O hub with memory controller(s) and PCIe 4.0 bridges connected through IF (“Infinity Fabric”) (12nm)
Up to 2 chiplets on desktop platform thus up to 2x2x4C (16C/32T 3950X) (same amount as old ThreadRipper 1950X/2950X)
2x larger L3 cache per CCX thus up to 2x2x16MB (64MB) L3 cache (3900X+)
20 PCIe 4.0 lanes (2x higher transfer rate over PCIe 3.0)
2x DDR4 memory controllers up to 3200Mt/s official (4266Mt/s max)

To upgrade from Ryzen+/Ryzen1 or not?

Micro-architecturally there are more changes that should improve performance:

256-bit (single-op) SIMD units 2x Fmacs (fixing a major deficiency in ZEN/ZEN+ cores)
TLB (2nd level) increased (should help out-of-page access latencies that are somewhat high on ZEN/ZEN+)
Memory latencies claim to be reduced through higher-speed memory (note all requests go through IF to Central I/O hub with memory controllers)
Load/Store 32bytes/cycle (2x ZEN/ZEN+) to keep up with the 256-bit SIMD units (L1D bandwidth should be 2x)
L3 cache is 2x ZEN/ZEN+ but higher latency (cache is exclusive)
Infinity Fabric is 512-bit (2x ZEN/ZEN+) and can run 1x or 1/2x vs. DRAM clock (when higher than 3733Mt/s)
AMD processors have thankfully not been affected by most of the vulnerabilities bar two (BTI/”Spectre”, SSB/”Spectre v4″) that have now been addressed in hardware.
HWM-P (hardware performance state management) transitions latencies reduced (ACPI/CPPCv2)

In this article we test CPU core performance; please see our other articles on:

Hardware Specifications

We are comparing the middle-of-the-range Ryzen2 (3700X) with previous generation Ryzen+ (2700X) and competing architectures with a view to upgrading to a mid-range high performance design.

CPU Specifications	AMD Ryzen 9 3900X (Matisse)	AMD Ryzen 7 3700X (Matisse)	AMD Ryzen 7 2700X (Pinnacle Ridge)	Intel i9 9900K (Coffeelake-R)	Intel i9 7900X (Skylake-X)	Comments
Cores (CU) / Threads (SP)	12C / 24T	8C / 16T	8C / 16T	8C / 16T	10C / 20T	Core counts remain the same.
Topology	2 chiplets, each 2 CCX, each 3 cores (1 disabled) (12C)	1 chiplet, 2 CCX, each 4 cores (8C)	2 CCX, each 4 cores (8C)	Monolithic die	Monolithic die	1 chiplet+1 sio rather than 1 die
Speed (Min / Max / Turbo)	3.8 / 4.6GHz	3.6 / 4.4GHz	3.7 / 4.2GHz	3.6 / 5GHz	3.3 / 4.3GHz	3700x base clock is lower than 2700x but turbo is higher
Power (TDP / Turbo)	105 / 135W	65 / 90W	105 / 135W	95 / 135W	140 / 308W	TDP has been greatly reduced vs. ZEN+
L1D / L1I Caches	12x 32kB 8-way / 12x 32kB 8-way	8x 32kB 8-way / 8x 32kB 8-way	8x 32kB 8-way / 8x 64kB 4-way	8x 32kB 8-way / 8x 32kB 8-way	10x 32kB 8-way / 10x 32kB 8-way	L1I has been halved but better no. ways
L2 Caches	12x 512kB (6MB) 8-way	8x 512kB (4MB) 8-way	8x 512kB (4MB) 8-way	8x 256kB (2MB) 16-way	10x 1MB (10MB) 16-way	No changes to L2
L3 Caches	2x2x 16MB (64MB) 16-way	2x 16MB (32MB) 16-way	2x 8MB (16MB) 16-way	16MB 16-way	13.75MB 11-way	L3 is 2x ZEN+
Mitigations for Vulnerabilities	BTI/”Spectre”, SSB/”Spectre v4″ hardware	BTI/”Spectre”, SSB/”Spectre v4″ hardware	BTI/”Spectre”, SSB/”Spectre v4″ software/firmware	RDCL/”Meltdown”, L1TF hardware, BTI/”Spectre”, MDS/”Zombieload”, software/firmware	RDCL/”Meltdown” , L1TF, BTI/”Spectre”, MDS/”Zombieload”, all software/firmware	Ryzen2 addresses the remaining 2 vulnerabilities while Intel was forced to add MDS to its long list…
Microcode	MU-8F7100-11	MU-8F7100-11	MU-8F0802-04	MU-069E0C-9E	MU-065504-49	The latest microcodes included in the respective BIOS/Windows have been loaded.
SIMD Units	256-bit AVX/FMA3/AVX2	256-bit AVX/FMA3/AVX2	128bit AVX/FMA3/AVX2	256-bit AVX/FMA3/AVX2	512-bit AVX512	ZEN2 SIMD units are 2x wider than ZEN+

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets (AVX2, FMA3, AVX, etc.). Ryzen2 supports all modern instruction sets including AVX2, FMA3 and even more like SHA HWA but not AVX-512.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations. All mitigations for vulnerabilities (Meltdown, Spectre, L1TF, MDS, etc.) were enabled as per Windows default where applicable.

Native Benchmarks		AMD Ryzen 7 3700X (Matisse)	AMD Ryzen 7 2700X (Pinnacle Ridge)	Intel i9 9900K (Coffeelake-R)	Intel i9 7900X (Skylake-X)	Comments

	Native Dhrystone Integer (GIPS)	336 [=]	334	400	485	We start with no improvement over ZEN+
	Native Dhrystone Long (GIPS)	339 [=]	335	393	485	With a 64-bit integer workload nothing much changes.
	Native FP32 (Float) Whetstone (GFLOPS)	202 [+2%]	198	236	262	Floating-point performance does not change delta either – only 2% faster
	Native FP64 (Double) Whetstone (GFLOPS)	170 [=]	169	196	223	With FP64 nothing much changes again.
In the legacy integer/floating-point benchmarks ZEN2 is not any faster than ZEN+ despite the change in clocks. Perhaps future microcode updates will help?

	Native Integer (Int32) Multi-Media (Mpix/s)	1023 [+78%]	574	985	1590	ZEN2 is ~80% faster than ZEN+ despite what we’ve seen before.
	Native Long (Int64) Multi-Media (Mpix/s)	374 [+2x]	187	414	581	With a 64-bit AVX2 integer vectorised workload, ZEN2 is now 2x faster.
	Native Quad-Int (Int128) Multi-Media (Mpix/s)	6.56 [+13%]	5.8	6.75	7.56	This is a tough test using Long integers to emulate Int128 without SIMD; here ZEN2 is still 13% faster.
	Native Float/FP32 Multi-Media (Mpix/s)	100 [+68%]	596	914	1760	In this floating-point AVX/FMA vectorised test, ZEN2 is ~70% faster.
	Native Double/FP64 Multi-Media (Mpix/s)	618 [+84%]	335	535	533	Switching to FP64 SIMD code, ZEN2 is now ~90% faster than ZEN+
	Native Quad-Float/FP128 Multi-Media (Mpix/s)	24.22 [+55%]	15.6	23	40.3	In this heavy algorithm using FP64 to mantissa extend FP128, ZEN2 is still 55% faster
With its brand-new 256-bit SIMD units, ZEN2 is anywhere from 55% to 100% faster than ZEN+/ZEN1 a huge upgrade from one generation to the next. For SIMD loads upgrading to ZEN2 gives a huge performance uplift.

	Crypto AES-256 (GB/s)	18 [+12%]	16.1	17.63	23	With AES/HWA support all CPUs are memory bandwidth bound but ZEN2 manages a 12% improvement.
	Crypto AES-128 (GB/s)	18.76 [+17%]	16.1	17.61	23	What we saw with AES-256 just repeats with AES-128; ZEN2 is now 17% faster.
	Crypto SHA2-256 (GB/s)	20.21 [+9%]	18.6	12	26	With SHA/HWA ZEN2 similarly powers through hashing tests leaving Intel in the dust – and is still ~10% faster than ZEN+
	Crypto SHA1 (GB/s)	20.41 [+6%]	19.3	22.9	38	The less compute-intensive SHA1 does not change things due to acceleration.
	Crypto SHA2-512 (GB/s)		3.77	9	21	–
ZEN2 with AES/SHA HWA is memory bound like all other CPUs, but it still manages 6-17% better performance than ZEN+ using the same memory. But as ZEN2 is rated for faster memory – using such memory would greatly improve the results.

	Black-Scholes float/FP32 (MOPT/s)	–	257	276	309	–
	Black-Scholes double/FP64 (MOPT/s)	229 [+5%]	219	238	277	Switching to FP64 code, ZEN2 is just 5% faster.
	Binomial float/FP32 (kOPT/s)	–	107	59.9	70.5	Binomial uses thread shared data thus stresses the cache & memory system;
	Binomial double/FP64 (kOPT/s)	57.98 [-4%]	60.6	61.6	68	With FP64 code ZEN2 is 4% slower.
	Monte-Carlo float/FP32 (kOPT/s)	–	54.2	56.5	63	Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches;
	Monte-Carlo double/FP64 (kOPT/s)	46.34 [+13%]	41	44.5	50.5	Switching to FP64 nothing much changes, ZEN2 is 13% faster.
Ryzen always did well on non-SIMD floating-point algorithms and here it does not disappoint: ZEN2 does not improve much and is pretty much tied with ZEN+ – thus for non SIMD workloads you might as well stick with the older versions.

	SGEMM (GFLOPS) float/FP32	263 [-12%]	300	375	413	In this tough vectorised algorithm ZEN2 is strangely slower.
	DGEMM (GFLOPS) double/FP64	193 [+63%]	119	209	212	With FP64 vectorised code, ZEN2 comes back to be over 60% faster.
	SFFT (GFLOPS) float/FP32	22.78 [+2.5x]	9	22.33	28.6	FFT is also heavily vectorised but stresses the memory sub-system more; ZEN2 is 2.5x (times) faster.
	DFFT (GFLOPS) double/FP64	11.16 [+41%]	7.92	11.21	14.6	With FP64 code, ZEN2 is ~40% faster.
	SNBODY (GFLOPS) float/FP32	612 [+2.2x]	280	557	638	N-Body simulation is vectorised but fewer memory accesses; ZEN2 is over 2x faster.
	DNBODY (GFLOPS) double/FP64	220 [+2x]	113	171	195	With FP64 precision ZEN2 is almost 2x faster.
With highly vectorised SIMD code ZEN2 improves greatly over ZEN2 sometimes managing to be over 2x faster using the same memory.

	Blur (3×3) Filter (MPix/s)	2049 [+42%]	1440	2560	4880	In this vectorised integer workload ZEN2 starts over 40% faster than ZEN+.
	Sharpen (5×5) Filter (MPix/s)	950 [+52%]	627	1000	1920	Same algorithm but more shared data makes ZEN2 over 50% faster.
	Motion-Blur (7×7) Filter (MPix/s)	495 [+52%]	325	519	1000	Again same algorithm but even more data shared still 50% faster
	Edge Detection (2*5×5) Sobel Filter (MPix/s)	826 [+67%]	495	827	1500	Different algorithm but still vectorised workload ZEN2 is almost 70% faster.
	Noise Removal (5×5) Median Filter (MPix/s)	89.68 [+24%]	72.1	78	221	Still vectorised code now ZEN2 drops to just 25% faster.
	Oil Painting Quantise Filter (MPix/s)	25.05 [+5%]	23.9	42.2	66.7	This test has always been tough for Ryzen so ZEN2 does not improve much.
	Diffusion Randomise (XorShift) Filter (MPix/s)	1763 [+76%]	1000	4000	4070	With integer workload, Intel CPUs seem to do much better but ZEN2 is still almost 80% faster.
	Marbling Perlin Noise 2D Filter (MPix/s)	321 [+32%]	243	596	777	In this final test again with integer workload ZEN2 is 32% faster
As we’ve seen before, the new SIMD units are anywhere from 5% (worst-case) to 2x faster than ZEN+/1, a huge performance improvement.

	Aggregate Score (Points)	8,200 [+40%]	5,850	7,930	11,810	Across all benchmarks, ZEN2 is ~40% faster than ZEN+.
Aggregating all the various scores, the result was never in doubt: ZEN2 (3700X) is 40% faster than the old ZEN+ (2700X) that itself improved over the original 1700X.

ZEN2’s 256-bit wide SIMD units are a big upgrade and show their power in every SIMD workload; otherwise there is only minor improvement.

SiSoftware Official Ranker Scores

Final Thoughts / Conclusions

Executive Summary: For SIMD workloads you really have to upgrade to Ryzen2; otherwise stick with Ryzen+ unless lower power is preferred. 9/10 overall.

The big change in Ryzen2 are the 256-bit wide SIMD units and all vectorised workloads (Multi-Media, Scientific, Image processing, AI/Machine Learning, etc.) using AVX/FMA will greatly benefit – anything between 50-100% which is a significant increase from just one generation to the next.

But for all other workloads (e.g. Financial, legacy, etc.) there is not much improvement over Ryzen+/1 which were already doing very well against competition.

Naturally it all comes at lower TDP (65W vs 95) which may help with overclocking and also lower noise (from the cooling system) and power consumption (if electricity is expensive or you are running it continuously) thus the performance/W(att) is still greatly improved.

Overall the 3700X does represent a decent improvement over the old 2700X (which is no slouch and was a nice upgrade over 1700X due to better Turbo speeds) and should still be usable in older AM4 300/400-series mainboards with just a BIOS upgrade (without PCIe 4.0).

However, while 2700X (and 1700X/1800X) were top-of-the-line, 3700X is just middle-ground, with the new top CPUs being the 3900X and even the 3950X with twice (2x) more cores and thus potentially huge performance rivaling HEDT Threadripper. The goad-posts have thus moved and thus far higher performance can be yours with just upgrading the CPU. The future is bright…