AMD Ryzen 7 3700X (Zen2) Review & Benchmarks – CPU 8-core/16-thread Performance

What is “ZEN2”?

AMD’s Zen2 (“Matisse”) is the “true” 2nd-generation ZEN core, built on a 7nm process shrink; the previous ZEN+ (“Pinnacle Ridge”) core was just an optimisation of the original ZEN (“Summit Ridge”) core. While socket compatible, Zen2 introduces many design improvements over both previous cores. An APU version (with integrated “Navi” graphics) is scheduled to be launched later.

While new chipsets (500 series) will also be introduced and are required for some new features (PCIe 4.0), older boards may support the new CPUs with a BIOS/firmware update, allowing existing systems to be upgraded with more cores and thus more performance. [Note: older boards will not be enabled for PCIe 4.0 after all]

The list of changes vs. previous ZEN/ZEN+ is extensive, so performance is likely to differ considerably as well:

  • Built around “chiplets” of up to 2 CCX (“core complexes”) each of 4C/8T and 16MB L3 cache (7nm)
  • Central I/O hub with memory controller(s) and PCIe 4.0 bridges connected through IF (“Infinity Fabric”) (12nm)
  • Up to 2 chiplets on desktop platform thus up to 2x2x4C (16C/32T 3950X) (same amount as old ThreadRipper 1950X/2950X)
  • 2x larger L3 cache per CCX thus up to 2x2x16MB (64MB) L3 cache (3900X+)
  • 20 PCIe 4.0 lanes (2x higher transfer rate over PCIe 3.0)
  • 2x DDR4 memory controllers up to 3200MT/s official (4266MT/s max)
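The core, thread and cache totals quoted in the list follow from simple multiplication, as do the theoretical bandwidth figures behind the DDR4 and PCIe bullets. The sketch below is illustrative only: the function names are made up for this example, the PCIe figures assume the standard per-lane rates of 8 GT/s (3.0) and 16 GT/s (4.0) with 128b/130b encoding, and the DDR4 figure assumes two 64-bit channels (one per memory controller).

```python
# Illustrative arithmetic only; per-unit figures come from the list above.
CORES_PER_CCX = 4        # 4C/8T per CCX
THREADS_PER_CORE = 2     # SMT
CCX_PER_CHIPLET = 2
L3_PER_CCX_MB = 16       # 2x the 8MB per CCX of ZEN/ZEN+


def topology(chiplets: int, active_cores_per_ccx: int = CORES_PER_CCX) -> dict:
    """Return core, thread and L3 totals for a given number of chiplets."""
    ccx = chiplets * CCX_PER_CHIPLET
    cores = ccx * active_cores_per_ccx
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "l3_mb": ccx * L3_PER_CCX_MB,
    }


def ddr4_peak_gbs(mega_transfers: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (assumes 64-bit channels)."""
    return mega_transfers * bus_bytes * channels / 1000


def pcie_lane_gbs(giga_transfers: float) -> float:
    """Theoretical per-lane, per-direction bandwidth in GB/s (128b/130b encoding)."""
    return giga_transfers * (128 / 130) / 8


print(topology(1))        # 3700X-style part: 1 chiplet
print(topology(2))        # 3950X-style part: 2 chiplets
print(topology(2, 3))     # 3900X-style part: 3 active cores per CCX
print(ddr4_peak_gbs(3200))
print(pcie_lane_gbs(16) / pcie_lane_gbs(8))  # PCIe 4.0 vs. 3.0
```

These are ceilings derived from the specifications, not measurements; real-world throughput will be lower.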

To upgrade from Ryzen+/Ryzen1 or not?

Micro-architecturally there are more changes that should improve performance:

  • 256-bit (single-op) SIMD units with 2x FMACs (fixing a major deficiency in ZEN/ZEN+ cores)
  • TLB (2nd level) increased (should help out-of-page access latencies that are somewhat high on ZEN/ZEN+)
  • Memory latencies are claimed to be reduced through higher-speed memory (note that all requests go through IF to the central I/O hub with the memory controllers)
  • Load/Store 32bytes/cycle (2x ZEN/ZEN+) to keep up with the 256-bit SIMD units (L1D bandwidth should be 2x)
  • L3 cache is 2x ZEN/ZEN+ but higher latency (cache is exclusive)
  • Infinity Fabric is 512-bit (2x ZEN/ZEN+) and can run at 1x the DRAM clock, or at 1/2x when memory is faster than 3733MT/s
  • AMD processors have thankfully not been affected by most of the vulnerabilities bar two (BTI/“Spectre”, SSB/“Spectre v4”), which have now been addressed in hardware.
  • HWM-P (hardware performance state management) transition latencies reduced (ACPI/CPPCv2)
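To put the SIMD, load/store and Infinity Fabric bullets into numbers, here is a minimal sketch of the underlying arithmetic. It is not a measurement: the function names are invented for this example, an FMA is counted as two floating-point operations, two FMA pipes per core are assumed for both generations (the list above states this only for ZEN2), and the 3733MT/s switch-over point is taken directly from the list.

```python
def flops_per_cycle(simd_bits: int, elem_bits: int = 32, fma_units: int = 2) -> int:
    """Peak FLOPs per core per cycle; each FMA counts as 2 operations."""
    lanes = simd_bits // elem_bits
    return lanes * 2 * fma_units


def peak_gflops(cores: int, ghz: float, simd_bits: int, elem_bits: int = 32) -> float:
    """Theoretical peak GFLOPS if every core sustained `ghz` (an upper bound)."""
    return cores * ghz * flops_per_cycle(simd_bits, elem_bits)


def l1d_load_gbs(ghz: float, bytes_per_cycle: int = 32) -> float:
    """Theoretical per-core L1D load bandwidth in GB/s."""
    return ghz * bytes_per_cycle


def fabric_clock_mhz(ddr_mts: float, threshold_mts: float = 3733) -> float:
    """Infinity Fabric clock: 1x the DRAM clock up to the threshold, 1/2x above it."""
    memclk = ddr_mts / 2  # DDR performs two transfers per clock
    return memclk if ddr_mts <= threshold_mts else memclk / 2


print(flops_per_cycle(256))    # ZEN2, FP32
print(flops_per_cycle(128))    # ZEN/ZEN+, FP32
print(fabric_clock_mhz(3200))  # 1:1 mode
print(fabric_clock_mhz(4266))  # 1:2 mode
print(round(peak_gflops(8, 4.4, 256), 1))
```

Per clock the wider units double the theoretical FP32 throughput (32 vs. 16 FLOPs per core); the benchmarks further down show how much of that survives in practice.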

In this article we test CPU core performance; other aspects are covered in our other articles.

Hardware Specifications

We are comparing the middle-of-the-range Ryzen2 (3700X) with previous generation Ryzen+ (2700X) and competing architectures with a view to upgrading to a mid-range high performance design.

CPU Specifications | AMD Ryzen 9 3900X (Matisse) | AMD Ryzen 7 3700X (Matisse) | AMD Ryzen 7 2700X (Pinnacle Ridge) | Intel i9 9900K (Coffeelake-R) | Intel i9 7900X (Skylake-X) | Comments
Cores (CU) / Threads (SP) | 12C / 24T | 8C / 16T | 8C / 16T | 8C / 16T | 10C / 20T | Core counts remain the same.
Topology | 2 chiplets, each 2 CCX, each 3 cores (1 disabled) (12C) | 1 chiplet, 2 CCX, each 4 cores (8C) | 2 CCX, each 4 cores (8C) | Monolithic die | Monolithic die | 1 chiplet + 1 sIO rather than 1 die
Speed (Min / Max / Turbo) | 3.8 / 4.6GHz | 3.6 / 4.4GHz | 3.7 / 4.2GHz | 3.6 / 5GHz | 3.3 / 4.3GHz | 3700X base clock is lower than 2700X but turbo is higher
Power (TDP / Turbo) | 105 / 135W | 65 / 90W | 105 / 135W | 95 / 135W | 140 / 308W | TDP has been greatly reduced vs. ZEN+
L1D / L1I Caches | 12x 32kB 8-way / 12x 32kB 8-way | 8x 32kB 8-way / 8x 32kB 8-way | 8x 32kB 8-way / 8x 64kB 4-way | 8x 32kB 8-way / 8x 32kB 8-way | 10x 32kB 8-way / 10x 32kB 8-way | L1I has been halved but better no. ways
L2 Caches | 12x 512kB (6MB) 8-way | 8x 512kB (4MB) 8-way | 8x 512kB (4MB) 8-way | 8x 256kB (2MB) 16-way | 10x 1MB (10MB) 16-way | No changes to L2
L3 Caches | 2x2x 16MB (64MB) 16-way | 2x 16MB (32MB) 16-way | 2x 8MB (16MB) 16-way | 16MB 16-way | 13.75MB 11-way | L3 is 2x ZEN+
Mitigations for Vulnerabilities | BTI/“Spectre”, SSB/“Spectre v4” hardware | BTI/“Spectre”, SSB/“Spectre v4” hardware | BTI/“Spectre”, SSB/“Spectre v4” software/firmware | RDCL/“Meltdown”, L1TF hardware; BTI/“Spectre”, MDS/“Zombieload” software/firmware | RDCL/“Meltdown”, L1TF, BTI/“Spectre”, MDS/“Zombieload”, all software/firmware | Ryzen2 addresses the remaining 2 vulnerabilities while Intel was forced to add MDS to its long list…
Microcode | MU-8F7100-11 | MU-8F7100-11 | MU-8F0802-04 | MU-069E0C-9E | MU-065504-49 | The latest microcodes included in the respective BIOS/Windows have been loaded.
SIMD Units | 256-bit AVX/FMA3/AVX2 | 256-bit AVX/FMA3/AVX2 | 128-bit AVX/FMA3/AVX2 | 256-bit AVX/FMA3/AVX2 | 512-bit AVX512 | ZEN2 SIMD units are 2x wider than ZEN+

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets (AVX2, FMA3, AVX, etc.). Ryzen2 supports all modern instruction sets including AVX2 and FMA3, plus extras such as SHA HWA, but not AVX-512.
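For readers who want to check which of these instruction sets their own CPU reports, a minimal sketch follows. It assumes a Linux system, where the kernel lists feature flags in /proc/cpuinfo (the tests in this article were run on Windows, where a different tool would be needed). The helper names and the shortened sample line are hypothetical and only for illustration.

```python
# Maps Linux /proc/cpuinfo flag names to the instruction sets named above.
WANTED = {
    "avx": "AVX",
    "avx2": "AVX2",
    "fma": "FMA3",
    "sha_ni": "SHA HWA",
    "avx512f": "AVX-512",
}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags: set) -> dict:
    """Map each instruction set of interest to True/False."""
    return {label: flag in flags for flag, label in WANTED.items()}


# Hypothetical, heavily shortened sample line (not captured from a real system).
SAMPLE = "flags\t\t: fpu sse2 avx avx2 fma sha_ni"
print(report(parse_flags(SAMPLE)))

# On a real Linux machine one would read the file instead:
#   with open("/proc/cpuinfo") as f:
#       print(report(parse_flags(f.read())))
```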

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations. All mitigations for vulnerabilities (Meltdown, Spectre, L1TF, MDS, etc.) were enabled as per Windows default where applicable.

Native Benchmarks | AMD Ryzen 7 3700X (Matisse) | AMD Ryzen 7 2700X (Pinnacle Ridge) | Intel i9 9900K (Coffeelake-R) | Intel i9 7900X (Skylake-X) | Comments
CPU Arithmetic Benchmark Native Dhrystone Integer (GIPS) | 336 [=] | 334 | 400 | 485 | We start with no improvement over ZEN+.
CPU Arithmetic Benchmark Native Dhrystone Long (GIPS) | 339 [=] | 335 | 393 | 485 | With a 64-bit integer workload nothing much changes.
CPU Arithmetic Benchmark Native FP32 (Float) Whetstone (GFLOPS) | 202 [+2%] | 198 | 236 | 262 | Floating-point performance barely changes either: only 2% faster.
CPU Arithmetic Benchmark Native FP64 (Double) Whetstone (GFLOPS) | 170 [=] | 169 | 196 | 223 | With FP64 nothing much changes again.

In the legacy integer/floating-point benchmarks ZEN2 is not any faster than ZEN+ despite the change in clocks. Perhaps future microcode updates will help?

Native Benchmarks | AMD Ryzen 7 3700X (Matisse) | AMD Ryzen 7 2700X (Pinnacle Ridge) | Intel i9 9900K (Coffeelake-R) | Intel i9 7900X (Skylake-X) | Comments
BenchCpuMM Native Integer (Int32) Multi-Media (Mpix/s) | 1023 [+78%] | 574 | 985 | 1590 | ZEN2 is ~80% faster than ZEN+ despite what we’ve seen before.
BenchCpuMM Native Long (Int64) Multi-Media (Mpix/s) | 374 [+2x] | 187 | 414 | 581 | With a 64-bit AVX2 integer vectorised workload, ZEN2 is now 2x faster.
BenchCpuMM Native Quad-Int (Int128) Multi-Media (Mpix/s) | 6.56 [+13%] | 5.8 | 6.75 | 7.56 | This is a tough test using Long integers to emulate Int128 without SIMD; here ZEN2 is still 13% faster.
BenchCpuMM Native Float/FP32 Multi-Media (Mpix/s) | ~1000 [+68%] | 596 | 914 | 1760 | In this floating-point AVX/FMA vectorised test, ZEN2 is ~70% faster.
BenchCpuMM Native Double/FP64 Multi-Media (Mpix/s) | 618 [+84%] | 335 | 535 | 533 | Switching to FP64 SIMD code, ZEN2 is now ~85% faster than ZEN+.
BenchCpuMM Native Quad-Float/FP128 Multi-Media (Mpix/s) | 24.22 [+55%] | 15.6 | 23 | 40.3 | In this heavy algorithm using FP64 to mantissa extend FP128, ZEN2 is still 55% faster.

With its brand-new 256-bit SIMD units, ZEN2 is anywhere from 55% to 100% faster than ZEN+/ZEN1, a huge upgrade from one generation to the next. For SIMD loads, upgrading to ZEN2 is well worth it.
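The bracketed deltas in the results compare the 3700X score against the 2700X score. A minimal sketch of that arithmetic is shown below; the rounding and the thresholds for printing “=” or a multiplier are assumptions chosen for illustration, not the benchmark tool’s documented rules.

```python
def relative_delta(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old` (positive means faster)."""
    return (new / old - 1.0) * 100.0


def delta_label(new: float, old: float) -> str:
    """Format a delta in the bracketed style used in the results."""
    pct = round(relative_delta(new, old))
    if abs(pct) <= 1:          # assumed threshold: within ~1% counts as a tie
        return "[=]"
    if pct >= 100:             # assumed threshold: show a multiplier from 2x up
        return f"[+{new / old:.2g}x]"
    return f"[{pct:+d}%]"


# 3700X vs. 2700X scores taken from the results in this article.
print(delta_label(1023, 574))   # Int32 Multi-Media
print(delta_label(374, 187))    # Int64 Multi-Media
print(delta_label(6.56, 5.8))   # Int128 Multi-Media
print(delta_label(618, 335))    # FP64 Multi-Media
print(delta_label(336, 334))    # Dhrystone Integer
```

Since higher scores are better throughout, a positive delta always means ZEN2 is ahead of ZEN+.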

Native Benchmarks | AMD Ryzen 7 3700X (Matisse) | AMD Ryzen 7 2700X (Pinnacle Ridge) | Intel i9 9900K (Coffeelake-R) | Intel i9 7900X (Skylake-X) | Comments
BenchCrypt Crypto AES-256 (GB/s) | 18 [+12%] | 16.1 | 17.63 | 23 | With AES/HWA support all CPUs are memory bandwidth bound, but ZEN2 manages a 12% improvement.
BenchCrypt Crypto AES-128 (GB/s) | 18.76 [+17%] | 16.1 | 17.61 | 23 | What we saw with AES-256 just repeats with AES-128; ZEN2 is now 17% faster.
BenchCrypt Crypto SHA2-256 (GB/s) | 20.21 [+9%] | 18.6 | 12 | 26 | With SHA/HWA ZEN2 similarly powers through hashing tests, leaving the 9900K in the dust, and is still ~10% faster than ZEN+.
BenchCrypt Crypto SHA1 (GB/s) | 20.41 [+6%] | 19.3 | 22.9 | 38 | The less compute-intensive SHA1 does not change things due to acceleration.
BenchCrypt Crypto SHA2-512 (GB/s) | | 3.77 | 9 | 21 |

ZEN2 with AES/SHA HWA is memory bound like all other CPUs, but it still manages 6-17% better performance than ZEN+ using the same memory. As ZEN2 is rated for faster memory, using such memory would greatly improve the results.

Native Benchmarks | AMD Ryzen 7 3700X (Matisse) | AMD Ryzen 7 2700X (Pinnacle Ridge) | Intel i9 9900K (Coffeelake-R) | Intel i9 7900X (Skylake-X) | Comments
BenchFinance Black-Scholes float/FP32 (MOPT/s) | | 257 | 276 | 309 |
BenchFinance Black-Scholes double/FP64 (MOPT/s) | 229 [+5%] | 219 | 238 | 277 | Switching to FP64 code, ZEN2 is just 5% faster.
BenchFinance Binomial float/FP32 (kOPT/s) | | 107 | 59.9 | 70.5 | Binomial uses thread shared data thus stresses the cache & memory system.
BenchFinance Binomial double/FP64 (kOPT/s) | 57.98 [-4%] | 60.6 | 61.6 | 68 | With FP64 code ZEN2 is 4% slower.
BenchFinance Monte-Carlo float/FP32 (kOPT/s) | | 54.2 | 56.5 | 63 | Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches.
BenchFinance Monte-Carlo double/FP64 (kOPT/s) | 46.34 [+13%] | 41 | 44.5 | 50.5 | Switching to FP64 nothing much changes, ZEN2 is 13% faster.

Ryzen always did well on non-SIMD floating-point algorithms and here it does not disappoint: ZEN2 does not improve much and is pretty much tied with ZEN+, thus for non-SIMD workloads you might as well stick with the older versions.

Native Benchmarks | AMD Ryzen 7 3700X (Matisse) | AMD Ryzen 7 2700X (Pinnacle Ridge) | Intel i9 9900K (Coffeelake-R) | Intel i9 7900X (Skylake-X) | Comments
BenchScience SGEMM (GFLOPS) float/FP32 | 263 [-12%] | 300 | 375 | 413 | In this tough vectorised algorithm ZEN2 is strangely slower.
BenchScience DGEMM (GFLOPS) double/FP64 | 193 [+63%] | 119 | 209 | 212 | With FP64 vectorised code, ZEN2 comes back to be over 60% faster.
BenchScience SFFT (GFLOPS) float/FP32 | 22.78 [+2.5x] | 9 | 22.33 | 28.6 | FFT is also heavily vectorised but stresses the memory sub-system more; ZEN2 is 2.5x (times) faster.
BenchScience DFFT (GFLOPS) double/FP64 | 11.16 [+41%] | 7.92 | 11.21 | 14.6 | With FP64 code, ZEN2 is ~40% faster.
BenchScience SNBODY (GFLOPS) float/FP32 | 612 [+2.2x] | 280 | 557 | 638 | N-Body simulation is vectorised but has fewer memory accesses; ZEN2 is over 2x faster.
BenchScience DNBODY (GFLOPS) double/FP64 | 220 [+2x] | 113 | 171 | 195 | With FP64 precision ZEN2 is almost 2x faster.

With highly vectorised SIMD code ZEN2 improves greatly over ZEN+, sometimes managing to be over 2x faster using the same memory.

Native Benchmarks | AMD Ryzen 7 3700X (Matisse) | AMD Ryzen 7 2700X (Pinnacle Ridge) | Intel i9 9900K (Coffeelake-R) | Intel i9 7900X (Skylake-X) | Comments
CPU Image Processing Blur (3×3) Filter (MPix/s) | 2049 [+42%] | 1440 | 2560 | 4880 | In this vectorised integer workload ZEN2 starts over 40% faster than ZEN+.
CPU Image Processing Sharpen (5×5) Filter (MPix/s) | 950 [+52%] | 627 | 1000 | 1920 | Same algorithm but more shared data makes ZEN2 over 50% faster.
CPU Image Processing Motion-Blur (7×7) Filter (MPix/s) | 495 [+52%] | 325 | 519 | 1000 | Again same algorithm but even more data shared; still 50% faster.
CPU Image Processing Edge Detection (2*5×5) Sobel Filter (MPix/s) | 826 [+67%] | 495 | 827 | 1500 | Different algorithm but still a vectorised workload; ZEN2 is almost 70% faster.
CPU Image Processing Noise Removal (5×5) Median Filter (MPix/s) | 89.68 [+24%] | 72.1 | 78 | 221 | Still vectorised code; now ZEN2 drops to just 25% faster.
CPU Image Processing Oil Painting Quantise Filter (MPix/s) | 25.05 [+5%] | 23.9 | 42.2 | 66.7 | This test has always been tough for Ryzen so ZEN2 does not improve much.
CPU Image Processing Diffusion Randomise (XorShift) Filter (MPix/s) | 1763 [+76%] | 1000 | 4000 | 4070 | With integer workload, Intel CPUs seem to do much better but ZEN2 is still almost 80% faster.
CPU Image Processing Marbling Perlin Noise 2D Filter (MPix/s) | 321 [+32%] | 243 | 596 | 777 | In this final test, again with integer workload, ZEN2 is 32% faster.

As we’ve seen before, the new SIMD units are anywhere from 5% (worst-case) to 2x faster than ZEN+/1, a huge performance improvement.
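
Native Benchmarks | AMD Ryzen 7 3700X (Matisse) | AMD Ryzen 7 2700X (Pinnacle Ridge) | Intel i9 9900K (Coffeelake-R) | Intel i9 7900X (Skylake-X) | Comments
Aggregate Score (Points) | 8,200 [+40%] | 5,850 | 7,930 | 11,810 | Across all benchmarks, ZEN2 is ~40% faster than ZEN+.

Aggregating all the various scores, the result was never in doubt: ZEN2 (3700X) is 40% faster than the old ZEN+ (2700X) that itself improved over the original 1700X.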

ZEN2’s 256-bit wide SIMD units are a big upgrade and show their power in every SIMD workload; otherwise there is only minor improvement.

Final Thoughts / Conclusions

Executive Summary: For SIMD workloads you really have to upgrade to Ryzen2; otherwise stick with Ryzen+ unless lower power is preferred. 9/10 overall.

The big change in Ryzen2 is the 256-bit wide SIMD units: all vectorised workloads (Multi-Media, Scientific, Image processing, AI/Machine Learning, etc.) using AVX/FMA will greatly benefit, by anything between 50% and 100%, which is a significant increase from just one generation to the next.

But for all other workloads (e.g. Financial, legacy, etc.) there is not much improvement over Ryzen+/1 which were already doing very well against competition.

Naturally it all comes at a lower TDP (65W vs. 105W), which may help with overclocking and also lowers noise (from the cooling system) and power consumption (useful if electricity is expensive or you are running it continuously); thus the performance/W(att) is greatly improved.
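As a rough illustration of the performance/W claim, the sketch below divides the aggregate scores from the results by the TDP figures listed in the specification table. TDP is a rated figure rather than measured power draw, so treat the output as an indicative ratio only; the names used here are invented for this example.

```python
# (aggregate score in points, rated TDP in watts), taken from the tables in this article.
RESULTS = {
    "Ryzen 7 3700X": (8200, 65),
    "Ryzen 7 2700X": (5850, 105),
    "Core i9 9900K": (7930, 95),
    "Core i9 7900X": (11810, 140),
}


def points_per_watt(score: float, tdp_watts: float) -> float:
    """Aggregate score per rated watt of TDP (not measured power)."""
    return score / tdp_watts


efficiency = {name: points_per_watt(*values) for name, values in RESULTS.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.1f} points/W")

# Generation-on-generation ratio, ZEN2 vs. ZEN+.
ratio = efficiency["Ryzen 7 3700X"] / efficiency["Ryzen 7 2700X"]
print(f"3700X vs. 2700X: {ratio:.2f}x")
```

On these rated figures the 3700X delivers a little over twice the points per watt of the 2700X; actual consumption under load (see the Turbo power figures in the table) would narrow or widen that gap.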

Overall the 3700X does represent a decent improvement over the old 2700X (which is no slouch and was a nice upgrade over 1700X due to better Turbo speeds) and should still be usable in older AM4 300/400-series mainboards with just a BIOS upgrade (without PCIe 4.0).

However, while 2700X (and 1700X/1800X) were top-of-the-line, 3700X is just middle-ground, with the new top CPUs being the 3900X and even the 3950X, the latter with twice (2x) as many cores and thus potentially huge performance rivaling HEDT Threadripper. The goal-posts have thus moved, and far higher performance can be yours just by upgrading the CPU. The future is bright…
