AMD Ryzen 2 Mobile (2500U) Vega 8 GP(GPU) Performance

What is “Ryzen2” ZEN+ Mobile?

It is the long-awaited mobile APU version (“Raven Ridge”) of the desktop Ryzen 2, with integrated Vega graphics (the latest GPU architecture from AMD) for mobile devices. While on desktop we had the original Ryzen1/ThreadRipper, there was no (at least released) APU or mobile version – leaving only much older designs that were never competitive against Intel’s ULV and H parts.

After the very successful launch of the original “Ryzen1”, AMD has been hard at work optimising and improving the design in order to hit the 15-35W TDP range required for mobile devices. It has also added the brand-new Vega graphics cores to the APU, which have been incredibly performant in the desktop space. Note that mobile versions have a single CCX (core complex) and thus do not require operating system kernel patches for best thread scheduling/power optimisation.

Here’s what AMD says it has done for Ryzen2 mobile:

  • Process technology optimisations (12nm vs 14nm) – lower power but higher frequencies
  • Radeon RX Vega graphics core (DirectX 12.1)
  • Optimised boost (aka Turbo) algorithm – sharing between CPU & GPU cores

In this article we test GP(GPU) integrated graphics performance; please see our other articles on:

Hardware Specifications

We are comparing the graphics unit of Ryzen2 mobile with competing APUs with integrated graphics to determine whether they are good enough for modest use, especially for compute (GPGPU) workloads supporting the CPU.

GPGPU Specifications AMD Radeon RX Vega 8 (2500U) Intel UHD 630 (7200U) Intel HD Iris 520 (6500U) Intel HD Iris 540 (6550U) Comments
Arch Chipset GCN1.5 GT2 / EV9.5 GT2 / EV9 GT3 / EV9 All graphics cores are minor revisions of previous cores with extra functionality.
Cores (CU) / Threads (SP) 8 / 512 24 / 192 24 / 192 48 / 384 Vega has the most SPs, though organised into fewer but more powerful CUs.
ROPs / TMUs 8 / 32 8 / 16 8 / 16 16 / 24 Vega has fewer ROPs than GT3 but more TMUs.
Speed (Min-Turbo, MHz) 300-1100 300-1000 300-1000 300-950 Turbo boost puts Vega in top position, power permitting.
Power (TDP) 25-35W 15-25W 15-25W 15-25W TDP is about the same for all though both Ryzen2 and CFL-U have somewhat higher TDP (25W).
Constant Memory 2.7GB 1.6GB 1.6GB 3.2GB There is no dedicated constant memory, thus a large chunk of global memory (GBs) is available to use – unlike a dedicated video card where constant memory is very fast but small (kBs).
Shared (Local) Memory 32kB 64kB 64kB 64kB Intel has 2x larger shared/local memory, but it is slower (likely not dedicated) unlike Vega’s (see the kernel sketch below the table).
Global Memory 2.7 / 3GB 1.6 / 3.2GB 1.6 / 3.2GB 3.2 / 6.4GB About 50% of main memory can be used as global memory – thus pretty large workloads can be run.
Memory System 128-bit DDR4 2400MT/s 128-bit DDR3L 1866MT/s 128-bit DDR3L 1866MT/s 128-bit DDR4 2133MT/s Ryzen2’s memory controller is rated for faster data rates and thus should be able to use faster (laptop) memory.
Memory Bandwidth (GB/s) 36 30 30 33 The high data rate of DDR4 can result in higher bandwidth useful for the GPU cores.
L2 Cache ? 512kB 512kB 1MB L2 is comparable to Intel units.
FP64/double ratio Yes, 1/16x Yes, 1/8x Yes, 1/8x Yes, 1/8x FP64 is supported but at a lower ratio (1/16x) than Intel’s (1/8x).
FP16/half ratio Yes, 2x Yes, 2x Yes, 2x Yes, 2x FP16 is also now supported at twice the rate – again unlike gimped dedicated cards.
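
The constant, shared (local) and global pools in the table map directly onto OpenCL address spaces. Below is a minimal, purely hypothetical kernel (not one of the benchmark kernels) showing where each pool is used; the name scale_and_sum and the 16-entry coefficient table are illustrative assumptions only.

```c
// Hypothetical OpenCL C kernel illustrating the three address spaces from the table above.
// __global   : main pool carved out of system RAM (GBs on an APU)
// __constant : read-only data; on these iGPUs it is simply a region of global memory
// __local    : per-workgroup shared memory (32kB/CU on Vega, 64kB on Intel EV9.x)
__kernel void scale_and_sum(__global const float* in,
                            __global float* out,
                            __constant float* coeffs,   // small read-only table (>=16 floats)
                            __local float* tile)        // sized by the host at launch time
{
    const size_t gid = get_global_id(0);
    const size_t lid = get_local_id(0);

    tile[lid] = in[gid] * coeffs[lid % 16];   // stage data in fast local memory
    barrier(CLK_LOCAL_MEM_FENCE);             // make the tile visible to the whole workgroup

    out[gid] = tile[lid] + tile[(lid + 1) % get_local_size(0)];
}
```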

Processing Performance

We are testing OpenCL performance using the latest SDKs / libraries / drivers from both AMD and Intel.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and Intel drivers, OpenCL 2.x. Turbo / Boost was enabled on all configurations.

Processing Benchmarks Intel UHD 630 (7200U) Intel HD Iris 520 (6500U) Intel HD Iris 540 (6550U) AMD Radeon RX Vega 8 (2500U) Comments
GPGPU Arithmetic Benchmark Mandel FP16/Half (Mpix/s) 831 927 1630 2000 [+23%] Thanks to FP16 support we see double the performance over FP32 but Vega is only 23% faster than GT3.
GPGPU Arithmetic Benchmark Mandel FP32/Single (Mpix/s) 476 478 865 1350 [+56%] Vega rules FP32 and is over 50% faster than GT3.
GPGPU Arithmetic Benchmark Mandel FP64/Double (Mpix/s) 113 122 209 111 [-47%] FP64 lower rate makes Vega 1/2 the speed of GT3 and only matching GT2 units.
GPGPU Arithmetic Benchmark Mandel FP128/Quad (Mpix/s) 5.71 6.29 10.78 7.11 [-34%] Emulated FP128 precision depends entirely on FP64 performance thus not a lot changes.
Vega is over 50% faster than Intel’s top-end Iris/GT3 graphics, but only in FP32 precision – while it gains from FP16, Intel scales better, reducing the lead to about 25%. In FP64 precision, though, its relatively low 1/16x ratio means it only ties with the low-end GT2 models while GT3 is 2x (twice) as fast. Pity.
GPGPU Crypto Benchmark Crypto AES-256 (GB/s) 0.858 0.87 1.23 2.58 [+2.1x] No wonder AMD is crypto-king: Vega is over 2x faster than even GT3.
GPGPU Crypto Benchmark Crypto AES-128 (GB/s) 1 1.08 1.52 3.3 [+2.17x] Nothing changes here, Vega is over 2.2x faster.
GPGPU Crypto Benchmark Crypto SHA2-256 (GB/s) 2.72 3 4.7 14.29 [+3x] In this heavy integer workload, Vega is now 3x faster no wonder it’s used for crypto mining.
GPGPU Crypto Benchmark Crypto SHA1 (GB/s) 6 6.64 11.59 18.77 [+62%] SHA1 is less compute intensive allowing Intel to catch up but Vega is still over 60% faster.
GPGPU Crypto Benchmark Crypto SHA2-512 (GB/s) 1.019 1.08 1.86 3.36 [+81%] With 64-bit integer workload, Vega does better and is 80% (almost 2x) faster than GT3.
Nobody will be using integrated graphics for crypto-mining any time soon, but if you needed to (perhaps using encrypted containers, VMs, etc.) then Vega is your choice – even GT3 is left in the dust despite big improvement over low-end GT2. Intel would need at least 2x more cores to be competitive here.
GPGPU Finance Benchmark Black-Scholes half/FP16 (MOPT/s) 1000 1140 1470 1720 [+17%] If 16-bit precision is sufficient for financial work, Vega is 20% faster than GT3.
GPGPU Finance Benchmark Black-Scholes float/FP32 (MOPT/s) 694 697 794 829 [+4%] In this relatively simple FP32 financial workload Vega is just 4% faster than GT3.
GPGPU Finance Benchmark Black-Scholes double/FP64 (MOPT/s) 142 154 281 185 [-33%] Switching to FP64 precision, Vega is 33% slower than GT3.
GPGPU Finance Benchmark Binomial half/FP16 (kOPT/s) 86 95 155 270 [+74%] Switching to 16-bit precision allows Vega to gain over GT3 and is almost 2x faster.
GPGPU Finance Benchmark Binomial float/FP32 (kOPT/s) 92 93 153 254 [+66%] Binomial uses thread shared data thus stresses the internal memory sub-system, and here Vega shows its power – it is 66% faster than GT3.
GPGPU Finance Benchmark Binomial double/FP64 (kOPT/s) 18 18.86 32 15.67 [-51%] With FP64 precision Vega loses again vs. GT3 at 1/2 the speed and just matches GT2 units.
GPGPU Finance Benchmark Monte-Carlo half/FP16 (kOPT/s) 211 236 395 584 [+48%] With 16-bit precision, Vega dominates again and is almost 50% faster than GT3.
GPGPU Finance Benchmark Monte-Carlo float/FP32 (kOPT/s) 223 236 412 362 [-12%] Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure – but Vega somehow loses against GT3.
GPGPU Finance Benchmark Monte-Carlo double/FP64 (kOPT/s) 29.5 33.36 58.7 47.13 [-20%] Switching to FP64 precision as expected Vega is slower.
Financial algorithms perform well on Vega – at least in FP16 & FP32 precision but FP64 is too “gimped” (1/16x FP32 rate) and thus loses against GT3 despite more powerful cores.
GPGPU Science Benchmark HGEMM (GFLOPS) half/FP16 127 140 236 884 [+3.75x] With 16-bit precision Vega runs away with GEMM and is almost 4x faster than GT3.
GPGPU Science Benchmark SGEMM (GFLOPS) float/FP32 105 107 175 214 [+79%] GEMM makes heavy use of shared/local memory which is likely why Vega is 80% faster than GT3.
GPGPU Science Benchmark DGEMM (GFLOPS) double/FP64 38.8 41.69 70 62.6 [-11%] As expected, due to gimped FP64 rate Vega falls behind GT3 but only by just 11%.
GPGPU Science Benchmark HFFT (GFLOPS) half/FP16 34.2 34.7 45.85 61.34 [+34%] 16-bit precision helps reduce memory bandwidth pressure thus Vega is 34% faster.
GPGPU Science Benchmark SFFT (GFLOPS) float/FP32 20.9 21.45 29.69 31.48 [+6%] FFT is memory access bound but Vega does well to beat GT3.
GPGPU Science Benchmark DFFT (GFLOPS) double/FP64 4.3 5.4 6.07 14.19 [+2.34x] Despite the FP64 rate, Vega manages its memory accesses better beating GT3 by over 2x (two times).
GPGPU Science Benchmark HNBODY (GFLOPS) half/FP16 270 284 449 623 [+39%] 16-bit precision still benefits N-Body and here Vega is 40% faster than GT3.
GPGPU Science Benchmark SNBODY (GFLOPS) float/FP32 162 181 291 537 [+85%] Back to FP32 and Vega has a pretty large 85% lead – almost 2x GT3.
GPGPU Science Benchmark DNBODY (GFLOPS) double/FP64 22.73 26.1 43.34 44 [+2%] With FP64 precision, Vega and GT3 are pretty much tied.
Vega performs well on compute heavy scientific algorithms (making heavy use of shared/local memory) and also benefits from half/FP16 to reduce memory bandwidth pressure, but FP64 rate comes back to haunt it where it loses against Intel’s GT3. Pity.
GPGPU Image Processing Blur (3×3) Filter half/FP16 (MPix/s) 888 937 1390 2273 [+64%] With 16-bit precision Vega doubles its lead over GT3 to 64%, even though GT3 also gains from FP16.
GPGPU Image Processing Blur (3×3) Filter single/FP32 (MPix/s) 461 491 613 781 [+27%] In this 3×3 convolution algorithm, Vega does well but only 30% faster than GT3.
GPGPU Image Processing Sharpen (5×5) Filter half/FP16 (MPix/s) 279 302 409 582 [+42%] Again a huge gain by using FP16, over 40% faster than GT3.
GPGPU Image Processing Sharpen (5×5) Filter single/FP32 (MPix/s) 100 107 144 157 [+9%] Same algorithm but more shared data reduces the gap to 9%.
GPGPU Image Processing Motion Blur (7×7) Filter half/FP16 (MPix/s) 254 272 396 619 [+56%] Large gain again by switching to FP16 with 3x performance over FP32.
GPGPU Image Processing Motion Blur (7×7) Filter single/FP32 (MPix/s) 103 111 156 161 [+3%] With even more shared data the gap falls to just 3%.
GPGPU Image Processing Edge Detection (2*5×5) Sobel Filter half/FP16 (MPix/s) 259 281 363 595 [+64%] Another huge gain and over 3x improvement over FP32.
GPGPU Image Processing Edge Detection (2*5×5) Sobel Filter single/FP32 (MPix/s) 99 106 145 155 [+7%] Still convolution but with 2 filters – the gap is similar to 5×5 – Vega is 7% faster.
GPGPU Image Processing Noise Removal (5×5) Median Filter half/FP16 (MPix/s) 7.39 9.4 8.56 7.688 [-18%] Big gain but not enough to beat GT3 here.
GPGPU Image Processing Noise Removal (5×5) Median Filter single/FP32 (MPix/s) 7 7.57 7.08 4 [-47%] Vega does not like this algorithm (lots of branching causing divergence) and is 1/2 GT3 speed.
GPGPU Image Processing Oil Painting Quantise Filter half/FP16 (MPix/s) 8.55 9.32 9.22 <BSOD> This test caused a BSOD (system crash) on Vega; we are investigating.
GPGPU Image Processing Oil Painting Quantise Filter single/FP32 (MPix/s) 8 8.65 6.77 2.59 [-70%] Vega does not like this algorithm either (complex branching) and neither does GT3.
GPGPU Image Processing Diffusion Randomise (XorShift) Filter half/FP16 (MPix/s) 941 967 1580 2091 [+32%] In order to prevent artifacts most of this test runs in FP32 thus not much gain here.
GPGPU Image Processing Diffusion Randomise (XorShift) Filter single/FP32 (MPix/s) 878 952 1550 2100 [+35%] This algorithm is 64-bit integer heavy allowing Vega 35% better performance over GT3.
GPGPU Image Processing Marbling Perlin Noise 2D Filter half/FP16 (MPix/s) 341 390 343 1046 [+2.5x] Switching to FP16 makes a huge difference to Vega which is over 2x faster.
GPGPU Image Processing Marbling Perlin Noise 2D Filter single/FP32 (MPix/s) 384 425 652 608 [-7%] One of the most complex and largest filters, Vega is a bit slower than GT3 by 7%.
For image processing Vega generally performs well in FP32, beating GT3 hands down; but there are a few algorithms that don’t perform as well as expected and may need to be optimised for it. Switching to FP16 doubles/triples scores – thus Vega may be starved of memory bandwidth in FP32.
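
The FP16 gains above come from packing two half values per 32-bit lane, which requires the cl_khr_fp16 extension and ideally vectorised half types. A minimal, hypothetical sketch of such an inner loop (not the benchmark code; the kernel name axpy_half2 is an assumption):

```c
// Hypothetical OpenCL C sketch of an FP16 (half) kernel; requires the cl_khr_fp16 extension.
// Using half2 lets the compiler pack two 16-bit operations per 32-bit lane, which is where the
// ~2x FP16-over-FP32 throughput seen above comes from (when the algorithm tolerates FP16).
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void axpy_half2(__global const half2* x,
                         __global half2* y,
                         const float alpha)
{
    const size_t i = get_global_id(0);
    const half2 a = (half2)((half)alpha, (half)alpha);  // broadcast the scalar to both lanes
    y[i] = a * x[i] + y[i];                             // two FP16 multiply-adds per element
}
```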

Memory Performance

We are testing OpenCL performance using the latest SDKs / libraries / drivers from both AMD and Intel.

Results Interpretation: Higher values (MB/s, etc.) mean better performance. Lower time values (ns, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and Intel drivers, OpenCL 2.x. Turbo / Boost was enabled on all configurations.

Memory Benchmarks Intel UHD 630 (7200U) Intel HD Iris 520 (6500U) Intel HD Iris 540 (6550U) AMD Radeon RX Vega 8 (2500U) Comments
GPGPU Memory Bandwidth Internal Memory Bandwidth (GB/s) 12.17 21.2 24 27.32 [+14%] With higher speed DDR4 memory, Vega has 14% more bandwidth.
GPGPU Memory Bandwidth Upload Bandwidth (GB/s) 6 10.4 11.7 4.74 [-60%] The CPU > GPU upload link seems slow here, at less than half the bandwidth of Intel.
GPGPU Memory Bandwidth Download Bandwidth (GB/s) 6 10.5 11.75 5 [-57%] Download bandwidth shows a similar issue, about half the expected bandwidth.
All designs have to rely on the shared memory controller, and Vega performs as expected with good internal bandwidth due to the higher-speed DDR4 memory. But transfer up/down speeds are disappointing, possibly due to the driver, as “zero-copy” mode should be engaged and working on such transfers (APU mode) – see the zero-copy sketch after the latency table below.
GPGPU Memory Latency Global (In-Page Random Access) Latency (ns) 246 244 288 412 [+49%] As with the CPU data latencies, global “in-page/random” (aka “TLB hit”) latencies are a bit high, though not by a huge amount.
GPGPU Memory Latency Global (Full Range Random Access) Latency (ns) 365 372 436 519 [+19%] Due to faster memory clock but increased timings “full/random” latencies appear a bit higher.
GPGPU Memory Latency Global (Sequential Access) Latency (ns) 156 158 213 201 [-6%] Sequential access latencies are less than competition by 6%.
GPGPU Memory Latency Constant Memory (In-Page Random Access) Latency (ns) 245 243 252 411 [+63%] None have dedicated constant memory thus we see a similar picture to global memory: somewhat high latencies.
GPGPU Memory Latency Shared Memory (In-Page Random Access) Latency (ns) 82 84 100 22.5 [1/5x] Vega has dedicated shared/local memory and it shows – it’s about 5x faster than Intel’s designs.
GPGPU Memory Latency Texture (In-Page Random Access) Latency (ns) 1152 1157 1500 278 [1/5x] Texture access is also very fast on Vega, with latencies 5x lower (aka 1/5) than Intel’s designs.
GPGPU Memory Latency Texture (Full Range Random Access) Latency (ns) 1178 1162 1533 418 [1/3x] Even full/random accesses are fast, 3x (three times) faster than Intel’s.
GPGPU Memory Latency Texture (Sequential Access) Latency (ns) 1077 1081 1324 122 [1/10x] With sequential access we see a crazy 10x lower latency as if AMD uses prefetchers and Intel does not.
As we’ve seen in Ryzen 2’s data latency tests – “in-page/random” latencies are higher than the competition but the rest are comparable, with sequential (prefetched) latencies especially small. But dedicated shared/local memory is far faster (5x) and texture accesses are also very fast (3-5x), which should greatly help algorithms making use of them.
Plotting the global (or constant) memory latencies together we see that the “in-page/random” access latencies should perhaps peak somewhat lower but still nothing close to what we’ve seen in the (CPU) data memory latencies article. It is not very clear (unlike the texture latencies graph) where the caches are located.
The texture latencies graph is far clearer where we can see each level’s caches; unlike the global (or constant) latencies we see “in-page/random” latency peak and hold at a somewhat lower level (4MB).
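
Regarding the zero-copy remark above: on an APU, upload/download should ideally avoid an explicit copy altogether by allocating host-visible buffers and mapping them. Below is a hedged host-side sketch of that pattern (error handling trimmed; fill_input, ctx, queue, N and src are illustrative assumptions, and whether the driver truly avoids the copy is implementation-dependent):

```c
// Hypothetical zero-copy buffer pattern for an APU (shared CPU/GPU memory).
// CL_MEM_ALLOC_HOST_PTR asks the runtime for host-visible memory; clEnqueueMapBuffer then
// returns a pointer the CPU can fill directly, so no separate upload copy should be needed.
#include <CL/cl.h>
#include <string.h>

void fill_input(cl_context ctx, cl_command_queue queue, size_t N, const float* src)
{
    cl_int err = CL_SUCCESS;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR,
                                N * sizeof(float), NULL, &err);

    // Map for writing: on an APU this should hand back the underlying allocation.
    float* ptr = (float*)clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                            0, N * sizeof(float), 0, NULL, NULL, &err);
    memcpy(ptr, src, N * sizeof(float));          // CPU writes straight into GPU-visible memory
    clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);

    /* ... set as kernel argument, launch, and eventually release ... */
    clReleaseMemObject(buf);
}
```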

SiSoftware Official Ranker Scores

Final Thoughts / Conclusions

Vega mobile, like its desktop siblings, is undoubtedly powerful and a good upgrade from the older integrated GPU cores; it also supports modern features like half/FP16 compute (which needs vectorisation to what the driver reports as the “optimised width”) and relishes complex algorithms that make use of its efficient shared/local memory. However Intel’s GT3 EV9.x can get close to it in some workloads and, due to its better FP64 ratio (1/8x vs 1/16x), even beat it in most FP64 precision tests – which is somewhat disappointing.
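
The “optimised width” mentioned above is simply the vector width the OpenCL driver advertises for a type; a hedged sketch of how one might query it for FP16 (the function name print_half_widths is an assumption, and a device handle is assumed to exist):

```c
// Hypothetical query of the FP16 vector width an OpenCL device prefers; kernels generally need
// to be vectorised to (at least) this width to reach the FP16 rates shown in the tables above.
#include <CL/cl.h>
#include <stdio.h>

void print_half_widths(cl_device_id dev)
{
    cl_uint preferred = 0, native = 0;
    clGetDeviceInfo(dev, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF,
                    sizeof(preferred), &preferred, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF,
                    sizeof(native), &native, NULL);
    // A value of 0 means cl_khr_fp16 is not supported on this device at all.
    printf("FP16 vector width: preferred=%u native=%u\n", preferred, native);
}
```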

Luckily for AMD, the GT3 variant is very rare and thus Vega has an easy job defeating GT2 in just about all tests; but it shows that should Intel “get serious” and continue to improve integrated graphics (and CPUs) like it used to before Skylake (SKL/KBL) – AMD might have more serious competition on its hands.

Note that until recently (2019) Ryzen2 mobile APUs were not supported by AMD’s main drivers (“Adrenalin”) and had to rely on pretty old OEM (HP, etc.) drivers that were somewhat problematic especially with Windows 10 changing every 6 months while the drivers were almost 1 year old. Thankfully this has now changed and users (and us) can benefit from updated, stable and performant drivers.

In any case, if you want a laptop/ultraportable with just an APU and no dedicated graphics, then Vega is pretty much your only choice – which means a Ryzen2 system. That alone makes it worthy of a recommendation.

In a word: Highly Recommended

In this article we test GP(GPU) integrated graphics performance; please see our other articles on:

AMD Ryzen 2 Mobile 2500U Review & Benchmarks – Cache & Memory Performance

What is “Ryzen2” ZEN+ Mobile?

It is the long-awaited mobile APU version (“Raven Ridge”) of the desktop Ryzen 2, with integrated Vega graphics (the latest GPU architecture from AMD) for mobile devices. While on desktop we had the original Ryzen1/ThreadRipper, there was no (at least released) APU or mobile version – leaving only much older designs that were never competitive against Intel’s ULV and H parts.

After the very successful launch of the original “Ryzen1”, AMD has been hard at work optimising and improving the design in order to hit the 15-35W TDP range required for mobile devices. It has also added the brand-new Vega graphics cores to the APU, which have been incredibly performant in the desktop space. Note that mobile versions have a single CCX (core complex) and thus do not require operating system kernel patches for best thread scheduling/power optimisation.

Here’s what AMD says it has done for Ryzen2:

  • Process technology optimisations (12nm vs 14nm) – lower power but higher frequencies
  • Improvements for cache & memory speed & latencies (we shall test that ourselves!)
  • Multi-core optimised boost (aka Turbo) algorithm – XFR2 – higher speeds

Why review it now?

With Ryzen3 soon to be released later this year (2019) – with a corresponding Ryzen3 APU mobile – it is good to re-test the platform especially in light of the many BIOS/firmware updates, many video/GPU driver updates and not forgetting the many operating system (Windows) vulnerabilities (“Spectre”) mitigations that have greatly affected performance – sometimes for the good (firmware, drivers, optimisations) sometimes for the bad (mitigations).

In this article we test CPU Cache and Memory performance; please see our other articles on:

Hardware Specifications

We are comparing the top-of-the-range Ryzen2 mobile (2500U) with competing architectures (Intel gen 6, 7, 8) with a view to upgrading to a mid-range but high performance design.

 

CPU Specifications AMD Ryzen2 2500U Raven Ridge Intel i7 6500U (Skylake ULV) Intel i7 7500U (Kabylake ULV) Intel i5 8250U (Coffeelake ULV) Comments
L1D / L1I Caches 4x 32kB 8-way / 4x 64kB 4-way 2x 32kB 8-way / 2x 32kB 8-way 2x 32kB 8-way / 2x 32kB 8-way 4x 32kB 8-way / 4x 32kB 8-way Ryzen2 icache is 2x of Intel with matching dcache.
L2 Caches 4x 512kB 8-way 2x 256kB 16-way 2x 256kB 16-way 4x 256kB 16-way Ryzen2 L2 cache is 2x bigger than Intel and thus 4x larger than older SKL/KBL-U.
L3 Caches 4MB 16-way 4MB 16-way 4MB 16-way 6MB 16-way Here CFL-U brings 50% bigger L3 cache (6 vs 4MB) which may help some workloads.
TLB 4kB pages 64 full-way / 1536 8-way 64 8-way / 1536 6-way 64 8-way / 1536 6-way 64 8-way / 1536 6-way No TLB changes.
TLB 2MB pages 64 full-way / 1536 2-way 8 full-way / 1536 6-way 8 full-way / 1536 6-way 8 full-way / 1536 6-way No TLB changes, same as 4kB pages.
Memory Controller Speed (MHz) 600 2600 (400-3100) 2700 (400-3500) 1600 (400-3400) Ryzen2’s memory controller runs at memory clock (MCLK) base rate thus depends on memory installed. Intel’s UNC (uncore) runs between min and max CPU clock thus perhaps faster.
Memory Speed (MHz) Max 1200-2400 (2667) 1033-1866 (2133) 1067-2133 (2400) 1200-2400 (2533) Ryzen2 now supports up to 2667MHz (officially) which should improve its performance quite a bit – unfortunately fast DDR4 is very expensive right now.
Memory Channels / Width 2 / 128-bit 2 / 128-bit 2 / 128-bit 2 / 128-bit All have 128-bit total channel width.
Memory Timing (clocks) 17-17-17-39 8-56-18-9 1T 14-17-17-40 10-57-16-11 2T 15-15-15-36 4-51-17-8 2T 19-19-19-43 5-63-21-9 2T Timings naturally depend on memory which for laptops is somewhat limited and quite expensive.
Memory Controller Firmware 2.1.0 3.6.0 3.6.4 Firmware is the same as on desktop devices.

Core Topology and Testing

As discussed in the previous articles (Ryzen1 and Ryzen2 reviews), cores on Ryzen are grouped in blocks (CCX, or core complexes), each with its own L3 cache – but connected via a 256-bit bus running at memory controller clock. However – unlike desktop/workstations – so far all Ryzen2 mobile designs have a single (1) CCX, thus all the issues that “plagued” the desktop/workstation Ryzen designs do not apply here.

However, AMD could have released higher-core mobile designs to go against Intel’s H-line (beefed up to 6 cores / 12 threads with CFL-H) that would likely have required 2 CCX blocks. At this time (start of 2019), considering that Ryzen3 (mobile) will launch soon, that seems unlikely to happen…
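
A quick way to confirm the single-CCX layout from software is to count how many distinct L3 caches the operating system reports – on Ryzen there is one L3 slice per CCX. A hedged Windows sketch of that check (illustrative only, not part of the benchmark suite):

```c
// Hypothetical Windows sketch: count the distinct L3 caches the OS reports. On Ryzen there is
// one L3 slice per CCX, so a single-CCX mobile part (2500U) should report exactly one.
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    DWORD len = 0;
    GetLogicalProcessorInformation(NULL, &len);              /* first call just sizes the buffer */
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION* info = malloc(len);
    if (!info || !GetLogicalProcessorInformation(info, &len)) return 1;

    int l3 = 0;
    for (DWORD i = 0; i < len / sizeof(*info); i++)
        if (info[i].Relationship == RelationCache && info[i].Cache.Level == 3)
            l3++;                                            /* one entry per shared L3 */

    printf("L3 caches reported: %d (expect 1 per CCX)\n", l3);
    free(info);
    return 0;
}
```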

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets (AVX2, AVX, etc.). Ryzen2 mobile supports all modern instruction sets including AVX2, FMA3 and even more.

Results Interpretation: Higher rate values (GOPS, MB/s, etc.) mean better performance. Lower latencies (ns, ms, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations.
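
The 2MB “large pages” mentioned above are allocated on Windows roughly as follows – a hedged sketch only: the process needs the SeLockMemoryPrivilege (“Lock pages in memory”) right for the allocation to succeed, and the buffer size here is an arbitrary example.

```c
// Hypothetical large-page allocation on Windows; 2MB pages cut TLB pressure in the memory
// benchmarks below. Requires the SeLockMemoryPrivilege ("Lock pages in memory") right.
#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T large = GetLargePageMinimum();          /* typically 2MB on x86-64 */
    if (large == 0) { printf("large pages not supported\n"); return 1; }

    SIZE_T size = 64 * large;                      /* must be a multiple of the large-page size */
    void* buf = VirtualAlloc(NULL, size,
                             MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                             PAGE_READWRITE);
    if (!buf) { printf("VirtualAlloc failed: %lu\n", GetLastError()); return 1; }

    printf("allocated %zu bytes backed by %zu-byte pages\n", (size_t)size, (size_t)large);
    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}
```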

Native Benchmarks AMD Ryzen2 2500U Raven Ridge Intel i7 6500U (Skylake ULV) Intel i7 7500U (Kabylake ULV) Intel i5 8250U (Coffeelake ULV) Comments
CPU Multi-Core Benchmark Total Inter-Core Bandwidth – Best (GB/s) 18.65 [-21%] 16.81 18.93 23.65 Ryzen2 L1D is not as wide as Intel’s designs (512-bit) thus inter-core transfers in L1D are 20% slower.
CPU Multi-Core Benchmark Total Inter-Core Bandwidth – Worst (GB/s) 9.29 [=] 6.62 7.4 9.3 Using the unified L3 caches – both Ryzen2 and CFL-U manage the same bandwidths.
CPU Multi-Core Benchmark Inter-Unit Latency – Same Core (ns) 16 [-24%] 21 18 19 Within the same core (share L1D) Ryzen2 has lower latencies by 24% than all Intel CPUs.
CPU Multi-Core Benchmark Inter-Unit Latency – Same Compute Unit (ns) 46 [-23%] 61 54 56 Within the same compute unit (sharing L3) Ryzen2 again yields 23% lower latencies.
CPU Multi-Core Benchmark Inter-Unit Latency – Different Compute Unit (ns) n/a n/a n/a n/a With a single CCX we have no latency issues.
While the L1D cache on Ryzen2 is not as wide as on Intel SKL/KBL/CFL-U to yield the same bandwidth (20% lower), both it and L3 manage lower latencies by a relatively large ~25%. With a single CCX design we have none of the issues seen on the desktop/workstation CPUs.
Aggregated L1D Bandwidth (GB/s) 267 [-67%] 315 302 628 Ryzen2’s L1D is just not wide enough – even 2-core SKL/KBL-U have more bandwidth and CFL-U has almost 3x more.
Aggregated L2 Bandwidth (GB/s) 225 [-29%] 119 148 318 The 2x larger L2 caches (512 vs 256kB) perform better but still CFL-U manages 30% more bandwidth.
Aggregated L3 Bandwidth (GB/s) 130 [-31%] 90 95 188 CFL-U not only has 50% bigger L3 (6 vs 4MB) but also somehow manages 30% more bandwidth too while SKL/KBL-U are left in the dust.
Aggregated Memory (GB/s) 24 [=] 21 21 24 With the same memory clock, Ryzen2 ties with CFL-U which means good bandwidth for the cores.
While we saw big improvements on Ryzen2 (desktop) for all caches (L1D/L2/L3), more work needs to be done: in particular the L1D caches are not wide enough compared to Intel’s CPUs, and even L2/L3 need to be wider. Most likely Ryzen3, with native 256-bit-wide SIMD (unlike the 128-bit units of Ryzen1/2), will have twice-as-wide L1D/L2 which should be sufficient to match Intel.

The memory controller performs well matching CFL-U and is officially rated for higher DDR4 memory – though on laptops the choices are more limited and more expensive.

Data In-Page Random Latency (ns) 91.8 [4-13-32] [+2.75x] 34.6 [3-10-17] 27.6 [4-12-22] 24.5 As on desktop Ryzen1/2, in-page random latencies are large compared to the competition; L1D/L2 latencies are OK but L3 is also somewhat high.
Data Full Random Latency (ns) 117 [4-13-32] [-16%] 108 [3-10-27] 84.7 [4-12-33] 139 Out-of-page latencies are not much different, which makes Ryzen2 a lot more competitive here, though still somewhat high.
Data Sequential Latency (ns) 4.1 [4-6-7] [-31%] 5.6 [3-10-11] 6.5 [4-12-13] 5.9 Ryzen’s prefetchers are working well, with sequential access latencies lower than Intel’s.
Ryzen1/2 desktop issues were high memory latencies (in-page/full random) and nothing much changes here. “In-Page/Random pattern” (TLB hit) latencies are almost 3x higher – actually not much lower than the “Full/Random pattern” (TLB miss) latencies – which are comparable to Intel’s SKL/KBL/CFL. On the other hand the “Sequential pattern” yields lower latencies (30% less) than Intel, so simple access patterns work better than complex/random ones. (A simplified pointer-chasing sketch of these patterns follows after the table.)
Looking at the data access latencies’ graph for Ryzen2 mobile – we see the “in-page/random” following the “full/random” latencies all the way to 8MB block where they plateau; we would have expected them to plateau at a lower value. See the “code access latencies” graph below.
Code In-Page Random Latency (ns) 17.6 [5-9-25] [+14%] 13.3 [2-9-18] 14.9 [2-11-21] 15.5 Code latencies were not a problem on Ryzen1/2 and they are OK here, 14% higher.
Code Full Random Latency (ns) 108 [5-15-48] [+19%] 91.8 [2-10-38] 90.4 [2-11-45] 91 Out-of-page latency is also competitive and just 20% higher.
Code Sequential Latency (ns) 8.2 [5-13-20] [+37%] 5.9 [2-4-8] 7.8 [2-4-9] 6 Ryzen’s prefetchers are working well with sequential access pattern latency but not as fast as Intel.
Unlike data, code latencies (any pattern) are competitive with Intel though CFL-U does have lower latencies (between 15-20%) but in exchange you get a 2x bigger L1I (64 vs 32kB) which should help complex software.
This graph for code access latencies is what we expected to see for data: “in-page/random” latencies plateau much earlier than “full/random” thus “TLB hit” latencies being much lower than “TLB miss” latencies.
Memory Update Transactional (MTPS) 7.17 [-7%] 6.5 7.72 7.2 As none of Intel’s CPUs have HLE enabled, Ryzen2 performs really well with just 7% fewer transactions/second.
Memory Update Record Only (MTPS) 5.66 [+5%] 4.66 5.25 5.4 With only record updates it manages to be 5% faster.
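
The “in-page random”, “full random” and “sequential” patterns above are classic pointer-chasing measurements. A simplified, hypothetical sketch of the idea (not Sandra’s benchmark code; buffer size and step count are arbitrary assumptions):

```c
// Hypothetical pointer-chasing sketch of the access patterns above (not Sandra's benchmark).
// Each element stores the index of the next element to visit, so every load depends on the
// previous one; the time per step then approximates raw access latency for that pattern.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64u * 1024 * 1024 / sizeof(size_t))   /* 64MB working set, well past the caches */

int main(void)
{
    size_t* next = malloc(N * sizeof(size_t));
    if (!next) return 1;
    for (size_t i = 0; i < N; i++) next[i] = i;

    /* "Full random": Sattolo's algorithm builds one big random cycle, defeating caches and
       TLBs alike. An "in-page random" variant would only shuffle within each 4kB page
       ("TLB hit"), and a "sequential" variant is simply next[i] = (i + 1) % N. */
    srand(42);
    for (size_t i = N - 1; i > 0; i--) {
        size_t r = ((size_t)rand() << 30) ^ ((size_t)rand() << 15) ^ (size_t)rand();
        size_t j = r % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    size_t steps = 20u * 1000 * 1000, p = 0;
    clock_t t0 = clock();
    for (size_t s = 0; s < steps; s++) p = next[p];          /* dependent load chain */
    double ns = (double)(clock() - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
    printf("~%.1f ns per access (checksum %zu)\n", ns, p);

    free(next);
    return 0;
}
```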

SiSoftware Official Ranker Scores

Final Thoughts / Conclusions

We saw good improvement on Ryzen2 (desktop/workstation) but still not enough to beat Intel; a lot more work is needed both on L1/L2 cache bandwidth/widening and on memory latency (“in-page” aka “TLB hit” random access pattern), which cannot be improved with firmware/BIOS updates (AGESA firmware). Ryzen2 mobile does have the potential to use faster DDR4 memory (officially rated 2667MHz) and thus could overtake Intel with faster memory – but laptop DDR4 SODIMM choice is limited.

Regardless of these differences – the CPU results we’ve seen are solid and sufficient to recommend Ryzen2 mobile, especially at a much lower cost than competing designs. Even if you do choose Intel – you will be picking up a better design thanks to competition from Ryzen2 mobile – just compare the SKL/KBL-U and CFL/WHL-U results.

We are looking forward to seeing what improvements Ryzen3 brings to the mobile platform.

In a word: Recommended – with reservations

In this article we tested CPU Cache and Memory performance; please see our other articles on:

AMD Ryzen 2 Mobile 2500U Review & Benchmarks – CPU Performance

What is “Ryzen2” ZEN+ Mobile?

It is the long-awaited mobile APU version (“Raven Ridge”) of the desktop Ryzen 2, with integrated Vega graphics (the latest GPU architecture from AMD) for mobile devices. While on desktop we had the original Ryzen1/ThreadRipper, there was no (at least released) APU or mobile version – leaving only much older designs that were never competitive against Intel’s ULV and H parts.

After the very successful launch of the original “Ryzen1”, AMD has been hard at work optimising and improving the design in order to hit the 15-35W TDP range required for mobile devices. It has also added the brand-new Vega graphics cores to the APU, which have been incredibly performant in the desktop space. Note that mobile versions have a single CCX (core complex) and thus do not require operating system kernel patches for best thread scheduling/power optimisation.

Here’s what AMD says it has done for Ryzen2:

  • Process technology optimisations (12nm vs 14nm) – lower power but higher frequencies
  • Improvements for cache & memory speed & latencies (we shall test that ourselves!)
  • Multi-core optimised boost (aka Turbo) algorithm – XFR2 – higher speeds

Why review it now?

With Ryzen3 soon to be released later this year (2019) – with a corresponding Ryzen3 APU mobile – it is good to re-test the platform especially in light of the many BIOS/firmware updates, many video/GPU driver updates and not forgetting the many operating system (Windows) vulnerabilities (“Spectre”) mitigations that have greatly affected performance – sometimes for the good (firmware, drivers, optimisations) sometimes for the bad (mitigations).

In this article we test CPU core performance; please see our other articles on:

Hardware Specifications

We are comparing the top-of-the-range Ryzen2 mobile (2500U) with competing architectures (Intel gen 6, 7, 8) with a view to upgrading to a mid-range but high performance design.

 

CPU Specifications AMD Ryzen2 2500U Raven Ridge Intel i7 6500U (Skylake ULV) Intel i7 7500U (Kabylake ULV) Intel i5 8250U (Coffeelake ULV) Comments
Cores (CU) / Threads (SP) 4C / 8T 2C / 4T 2C / 4T 4C / 8T Ryzen has double the cores of ULV Skylake/Kabylake and only recently Intel has caught up by also doubling cores.
Speed (Min / Max / Turbo) 1.6-2.0-3.6GHz (16x-20x-36x) 0.4-2.6-3.1GHz (4x-26x-31x) 0.4-2.7-3.5GHz (4x-27x-35x) 0.4-1.6-3.4GHz (4x-16x-34x) Ryzen2 has higher base and turbo than CFL-U and higher turbo than all Intel competition.
Power (TDP) 25-35W 15-25W 15-25W 25-35W Both Ryzen2 and CFL-U have higher TDP at 25W and turbo up to 35W depending on configuration while older devices were mostly 15W with turbo 20-25W.
L1D / L1I Caches 4x 32kB 8-way / 4x 64kB 4-way 2x 32kB 8-way / 2x 32kB 8-way 2x 32kB 8-way / 2x 32kB 8-way 4x 32kB 8-way / 4x 32kB 8-way Ryzen2 icache is 2x of Intel with matching dcache.
L2 Caches 4x 512kB 8-way 2x 256kB 16-way 2x 256kB 16-way 4x 256kB 16-way Ryzen2 L2 cache is 2x bigger than Intel and thus 4x larger than older SKL/KBL-U.
L3 Caches 4MB 16-way 4MB 16-way 4MB 16-way 6MB 16-way Here CFL-U brings 50% bigger L3 cache (6 vs 4MB) which may help some workloads.
Microcode (Firmware) MU8F1100-0B MU064E03-C6 MU068E09-8E MU068E09-96 On Intel you can see just how many updates the platforms have had – we’re now at CX versions but even Ryzen2 has had a few.

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets (AVX2, AVX, etc.). Ryzen supports all modern instruction sets including AVX2, FMA3 and even more like SHA HWA (supported by Intel’s Atom only) but has dropped all AMD’s variations like FMA4 and XOP likely due to low usage.
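
Which of these instruction sets a given CPU exposes can be checked at run time via CPUID; a hedged sketch using the GCC/Clang builtins (on MSVC the equivalent is __cpuidex from <intrin.h>):

```c
// Hypothetical feature check for the instruction sets used below (AVX2, FMA3, SHA extensions).
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned eax, ebx, ecx, edx;

    __get_cpuid(1, &eax, &ebx, &ecx, &edx);
    int fma3 = (ecx >> 12) & 1;                    /* leaf 1, ECX bit 12 */

    __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
    int avx2 = (ebx >> 5) & 1;                     /* leaf 7, EBX bit 5 */
    int sha  = (ebx >> 29) & 1;                    /* leaf 7, EBX bit 29 (SHA extensions) */

    printf("FMA3=%d AVX2=%d SHA=%d\n", fma3, avx2, sha);
    return 0;
}
```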

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations.

Native Benchmarks AMD Ryzen2 2500U Raven Ridge Intel i7 6500U (Skylake ULV) Intel i7 7500U (Kabylake ULV) Intel i5 8250U (Coffeelake ULV) Comments
CPU Arithmetic Benchmark Native Dhrystone Integer (GIPS) 103 [-6%] 52 73 109 Right off Ryzen2 does not beat CFL-U but is very close, soundly beating the older Intel designs.
CPU Arithmetic Benchmark Native Dhrystone Long (GIPS) 102 [-4%] 51 74 106 With a 64-bit integer workload – the difference drops to 4%.
CPU Arithmetic Benchmark Native FP32 (Float) Whetstone (GFLOPS) 79 [+18%] 39 45 67 Somewhat surprisingly, Ryzen2 is almost 20% faster than CFL-U here.
CPU Arithmetic Benchmark Native FP64 (Double) Whetstone (GFLOPS) 67 [+22%] 33 37 55 With FP64 nothing much changes, with Ryzen2 over 20% faster.
You can see why Intel needed to double the cores for ULV: otherwise even top-of-the-line i7 SKL/KBL-U are pounded into dust by Ryzen2. CFL-U does trade blows with it and manages to pull ahead in Dhrystone but Ryzen2 is 20% faster in floating-point. Whatever you choose you can thank AMD for forcing Intel’s hand.
BenchCpuMM Native Integer (Int32) Multi-Media (Mpix/s) 239 [-32%] 183 193 350 In this vectorised AVX2 integer test Ryzen2 starts 30% slower than CFL-U but does beat the older designs.
BenchCpuMM Native Long (Int64) Multi-Media (Mpix/s) 53.4 [-58%] 68.2 75 127 With a 64-bit AVX2 integer vectorised workload, Ryzen2 is even slower.
BenchCpuMM Native Quad-Int (Int128) Multi-Media (Mpix/s) 2.41 [+12%] 1.15 1.12 2.15 This is a tough test using Long integers to emulate Int128 without SIMD; here Ryzen2 has its 1st win by 12% over CFL-U.
BenchCpuMM Native Float/FP32 Multi-Media (Mpix/s) 222 [-20%] 149 159 277 In this floating-point AVX/FMA vectorised test, Ryzen2 is still slower but only by 20%.
BenchCpuMM Native Double/FP64 Multi-Media (Mpix/s) 126 [-22%] 88.3 94.8 163 Switching to FP64 SIMD code, nothing much changes still 20% slower.
BenchCpuMM Native Quad-Float/FP128 Multi-Media (Mpix/s) 6.23 [-16%] 3.79 4.04 7.4 In this heavy algorithm using FP64 to mantissa extend FP128 with AVX2 – Ryzen2 is less than 20% slower.
Just as on desktop, we did not expect AMD’s Ryzen2 mobile to beat 4-core CFL-U (with Intel’s wide SIMD units) and it doesn’t: but it remains very competitive and is just 20% slower. In any case, it soundly beats all older but ex-top-of-the-line i7 SKL/KBL-U thus making them all obsolete at a stroke.
BenchCrypt Crypto AES-256 (GB/s) 10.9 [+1%] 6.29 7.28 10.8 With AES/HWA support all CPUs are memory bandwidth bound – here Ryzen2 ties with CFL-U and soundly beats older versions.
BenchCrypt Crypto AES-128 (GB/s) 10.9 [+1%] 8.84 9.07 10.8 What we saw with AES-256 just repeats with AES-128; Ryzen2 is marginally faster but the improvement is there.
BenchCrypt Crypto SHA2-256 (GB/s) 6.78 [+60%] 2 2.55 4.24 With SHA/HWA Ryzen2 similarly powers through hashing tests leaving Intel in the dust; SHA is still memory bound but Ryzen2 is 60% faster than CFL-U.
BenchCrypt Crypto SHA1 (GB/s) 7.13 [+2%] 3.88 4.07 7.02 Ryzen also accelerates the soon-to-be-defunct SHA1 but CFL-U with AVX2 has caught up.
BenchCrypt Crypto SHA2-512 (GB/s) 1.48 [-44%] 1.47 1.54 2.66 SHA2-512 is not accelerated by SHA/HWA thus Ryzen2 falls behind here.
Ryzen2 mobile (like its desktop brother) gets a boost from SHA/HWA but otherwise ties with CFL-U which is helped by its SIMD units. As before older 2-core i7 SKL/KBL-U are left with no hope and cannot even saturate the memory bandwidth.
BenchFinance Black-Scholes float/FP32 (MOPT/s) 93.3 [-4%] 44.7 49.3 97 In this non-vectorised test we see Ryzen2 matches CFL-U.
BenchFinance Black-Scholes double/FP64 (MOPT/s) 77.8 [-8%] 39 43.3 84.7 Switching to FP64 code, nothing much changes, Ryzen2 is 8% slower.
BenchFinance Binomial float/FP32 (kOPT/s) 35.5 [+61%] 10.4 12.3 22 Binomial uses thread shared data thus stresses the cache & memory system; here the arch(itecture) improvements do show, Ryzen2 is 60% faster than CFL-U.
BenchFinance Binomial double/FP64 (kOPT/s) 19.5 [-7%] 10.1 11.4 21 With FP64 code Ryzen2 drops back from its previous win.
BenchFinance Monte-Carlo float/FP32 (kOPT/s) 20.1 [+1%] 9.24 9.87 19.8 Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches; Ryzen2 cannot match its previous gain.
BenchFinance Monte-Carlo double/FP64 (kOPT/s) 15.3 [-3%] 7.38 7.88 15.8 Switching to FP64 nothing much changes, Ryzen2 matches CFL-U.
Unlike desktop where Ryzen2 is unstoppable, here we see a more mixed result – with CFL-U able to trade blows with it except in one test where Ryzen2 is 60% faster. Otherwise CFL-U does manage to be just a bit faster in the other tests, but nothing significant.
BenchScience SGEMM (GFLOPS) float/FP32 107 [+16%] 92 76 85 In this tough vectorised AVX2/FMA algorithm Ryzen2 manages to be almost 20% faster than CFL-U.
BenchScience DGEMM (GFLOPS) double/FP64 47.2 [-6%] 44.2 31.7 50.5 With FP64 vectorised code, Ryzen2 drops down to 6% slower.
BenchScience SFFT (GFLOPS) float/FP32 3.75 [-53%] 7.17 7.21 8 FFT is also heavily vectorised (x4 AVX2/FMA) but stresses the memory sub-system more; Ryzen2 does not like it much.
BenchScience DFFT (GFLOPS) double/FP64 4 [-7%] 3.23 3.95 4.3 With FP64 code, Ryzen2 does better and is just 7% slower.
BenchScience SNBODY (GFLOPS) float/FP32 112 [-27%] 96.6 104.9 154 N-Body simulation is vectorised but involves many memory accesses and is not a Ryzen2 favourite.
BenchScience DNBODY (GFLOPS) double/FP64 45.3 [-30%] 29.6 30.64 64.8 With FP64 code nothing much changes.
With highly vectorised SIMD code Ryzen2 remains competitive but finds some algorithms tougher than others. Just as with desktop Ryzen1/2 it may require SIMD code changes for best performance due to its 128-bit units; Ryzen3 with 256-bit units should fix that.
CPU Image Processing Blur (3×3) Filter (MPix/s) 532 [-39%] 418 474 872 In this vectorised integer AVX2 workload Ryzen2 is quite a bit slower than CFL-U.
CPU Image Processing Sharpen (5×5) Filter (MPix/s) 146 [-58%] 168 191 350 Same algorithm but more shared data makes Ryzen2 even slower, 1/2 CFL-U.
CPU Image Processing Motion-Blur (7×7) Filter (MPix/s) 123 [-32%] 87.6 98 181 Again same algorithm but even more data shared reduces the delta to 1/3.
CPU Image Processing Edge Detection (2*5×5) Sobel Filter (MPix/s) 185 [-37%] 136 164 295 Different algorithm but still AVX2 vectorised workload still Ryzen2 is ~35% slower.
CPU Image Processing Noise Removal (5×5) Median Filter (MPix/s) 26.5 [-1%] 13.3 14.4 26.7 Still AVX2 vectorised code but here Ryzen2 ties with CFL-U.
CPU Image Processing Oil Painting Quantise Filter (MPix/s) 9.38 [-38%] 7.21 7.63 15.09 Again we see Ryzen2 fall behind CFL-U.
CPU Image Processing Diffusion Randomise (XorShift) Filter (MPix/s) 660 [-53%] 730 764 1394 With integer AVX2 workload, Ryzen2 falls behind even SKL/KBL-U.
CPU Image Processing Marbling Perlin Noise 2D Filter (MPix/s) 94.1 [-55%] 99.6 105 209 In this final test again with integer AVX2 workload Ryzen2 is 1/2 speed of CFL-U.

With all the modern instruction sets supported (AVX2, FMA, AES and SHA/HWA) Ryzen2 does extremely well in all workloads – and makes all older i7 SKL/KBL-U designs obsolete and unable to compete. As we said – Intel pretty much had to double the number of cores in CFL-U to stay competitive – and it does – but it is all thanks to AMD.

Even then, Ryzen2 beats CFL-U in non-SIMD tests; the latter is helped tremendously by its wide (256-bit) SIMD units and thus benefits greatly in AVX2/FMA workloads. Ryzen3, with double-width SIMD units, should be much faster and should comfortably beat Intel’s designs.
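
The SIMD-width point can be made concrete: Ryzen1/2 executes a 256-bit FMA as two 128-bit micro-ops internally, so the two hedged loops below (illustrative only; axpy_sse/axpy_avx2 are hypothetical names, compile with AVX2/FMA enabled) issue the same arithmetic, but the 256-bit form only pays off on hardware with full-width units (SKL/KBL/CFL-U, and Ryzen3 onwards).

```c
// Hypothetical illustration of SIMD width: y[i] += a * x[i] with 128-bit SSE vs 256-bit AVX2/FMA.
#include <immintrin.h>
#include <stddef.h>

void axpy_sse(float* y, const float* x, float a, size_t n)      /* 4 floats per iteration */
{
    __m128 va = _mm_set1_ps(a);
    for (size_t i = 0; i + 4 <= n; i += 4)
        _mm_storeu_ps(y + i, _mm_fmadd_ps(va, _mm_loadu_ps(x + i), _mm_loadu_ps(y + i)));
}

void axpy_avx2(float* y, const float* x, float a, size_t n)     /* 8 floats per iteration */
{
    __m256 va = _mm256_set1_ps(a);
    for (size_t i = 0; i + 8 <= n; i += 8)
        _mm256_storeu_ps(y + i, _mm256_fmadd_ps(va, _mm256_loadu_ps(x + i), _mm256_loadu_ps(y + i)));
}
```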

Software VM (.Net/Java) Performance

We are testing arithmetic and vectorised performance of software virtual machines (SVM), i.e. Java and .Net. With operating systems – like Windows 10 – favouring SVM applications over “legacy” native, the performance of .Net CLR (and Java JVM) has become far more important.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest drivers. .Net 4.7.x (RyuJit), Java 1.9.x. Turbo / Boost was enabled on all configurations.

VM Benchmarks AMD Ryzen2 2500U Raven Ridge Intel i7 6500U (Skylake ULV) Intel i7 7500U (Kabylake ULV) Intel i5 8250U (Coffeelake ULV) Comments
BenchDotNetAA .Net Dhrystone Integer (GIPS) 22.7 [+39%] 9.58 12.1 16.36 .Net CLR integer performance starts off great – Ryzen2 is 40% faster than CFL-U.
BenchDotNetAA .Net Dhrystone Long (GIPS) 22 [+34%] 9.24 12.1 16.4 64-bit integer workloads also favour Ryzen2, still 35% faster.
BenchDotNetAA .Net Whetstone float/FP32 (GFLOPS) 40.5 [+9%] 18.7 22.5 37.1 Floating-Point CLR performance is also good but just about 10% faster than CFL-U.
BenchDotNetAA .Net Whetstone double/FP64 (GFLOPS) 49.6 [+6%] 23.7 28.8 46.8 FP64 performance is also great (CLR seems to promote FP32 to FP64 anyway) with Ryzen2 faster by 6%.
.Net CLR performance was always incredible on Ryzen1 and 2 (desktop/workstation) and here is no exception – all Intel designs are left in the dust, with even CFL-U soundly beaten by anything between 10-40%.
BenchDotNetMM .Net Integer Vectorised/Multi-Media (MPix/s) 43.23 [+20%] 21.32 25 35 Just as we saw with Dhrystone, this integer workload sees a big 20% improvement for Ryzen2.
BenchDotNetMM .Net Long Vectorised/Multi-Media (MPix/s) 44.71 [+21%] 21.27 26 37 With 64-bit integer workload we see a similar story – 21% better.
BenchDotNetMM .Net Float/FP32 Vectorised/Multi-Media (MPix/s) 137 [+46%] 78.17 94 56 Here we make use of RyuJit’s support for SIMD vectors thus running AVX2/FMA code – Ryzen2 does even better here 50% faster than CFL-U.
BenchDotNetMM .Net Double/FP64 Vectorised/Multi-Media (MPix/s) 75.2 [+45%] 43.59 52 35 Switching to FP64 SIMD vector code – still running AVX2/FMA – we see a similar gain.
As before Ryzen2 dominates .Net CLR performance – even when using RyuJit’s SIMD instructions we see big gains of 20-45% over CFL-U.
Java Arithmetic Java Dhrystone Integer (GIPS) 222 [+13%] 119 150 196 We start JVM integer performance with a 13% lead over CFL-U.
Java Arithmetic Java Dhrystone Long (GIPS) 208 [+12%] 101 131 185 Nothing much changes with 64-bit integer workload – Ryzen2 still faster.
Java Arithmetic Java Whetstone float/FP32 (GFLOPS) 50.9 [+9%] 23.13 27.8 46.6 With a floating-point workload Ryzen2 performance improvement drops a bit.
Java Arithmetic Java Whetstone double/FP64 (GFLOPS) 54 [+13%] 23.74 28.7 47.7 With FP64 workload Ryzen2 gets back to 13% faster.
The Java JVM performance delta is not as high as .Net’s but is still a decent 10%+ over CFL-U, similar to what we’ve seen on desktop.
Java Multi-Media Java Integer Vectorised/Multi-Media (MPix/s) 48.74 [+15%] 20.5 24 42.5 Oracle’s JVM does not yet support native vector to SIMD translation like .Net’s CLR but Ryzen2 is still 15% faster.
Java Multi-Media Java Long Vectorised/Multi-Media (MPix/s) 46.75 [+4%] 20.3 24.8 44.8 With 64-bit vectorised workload Ryzen2’s lead drops to 4%.
Java Multi-Media Java Float/FP32 Vectorised/Multi-Media (MPix/s) 38.2 [+9%] 14.59 17.6 35 Switching to floating-point we return to a somewhat expected 9% improvement.
Java Multi-Media Java Double/FP64 Vectorised/Multi-Media (MPix/s) 35.7 [+2%] 14.59 17.4 35 With FP64 workload Ryzen2’s lead somewhat inexplicably drops to 2%.
Java’s lack of vectorised primitives that would allow the JVM to use SIMD instruction sets lets Ryzen2 do well and overtake CFL-U by 2-15%.

Ryzen2 on desktop dominated the .Net and Java benchmarks – and Ryzen2 mobile does not disappoint – it is consistently faster than CFL-U which does not bode well for Intel. If you mainly run .Net and Java apps on your laptop then Ryzen2 is the one to get.

SiSoftware Official Ranker Scores

Final Thoughts / Conclusions

Ryzen2 was a worthy update on the desktop and Ryzen2 mobile does not disappoint; it instantly obsoleted all older Intel designs (SKL/KBL-U) with only the very latest 4-core ULV (CFL/WHL-U) being able to match it. You can see from the results how AMD forced Intel’s hand to double cores in order to stay competitive.

Even then Ryzen2 manages to beat CFL-U in non-SIMD workloads and remains competitive in SIMD AVX2/FMA workloads (only 20% or so slower) while soundly beating SKL/KBL-U with their 2 cores and wide SIMD units. With the soon-to-be-released Ryzen3 featuring wide SIMD units (256-bit, like CFL/WHL-U), Intel will need AVX512 to stay competitive – however that has its own issues which may be problematic in the mobile/ULV space.

Both Ryzen2 mobile and CFL/WHL-U have increased TDP (~25W) in order to manage the increased number of cores (instead of 15W with older 2-core designs), with short-term turbo power as high as 35W. This means that while larger 14/15″ designs with good cooling are able to extract top performance, smaller 12/13″ designs are forced to use a lower cTDP of 15W (20-25W turbo) and thus deliver lower multi-threaded performance.

Also consider that Ryzen2 is not affected by most “Spectre” vulnerabilities nor by “Meltdown”, thus it does not need KVA shadowing (kernel page-table isolation) which greatly impacts I/O workloads. Only the very latest Whiskey Lake ULV (WHL-U, gen 8 refresh) has hardware “Meltdown” fixes – thus there is little point buying CFL-U (gen 8 original) and even less point buying older SKL/KBL-U.

In light of the above – Ryzen2 mobile is a compelling choice especially as it comes at a (much) lower price-point: its competition is really only the very latest WHL-U i5/i7 which do not come cheap – with most vendors still selling CFL-U and even KBL-U inventory. The only issue is the small choice of laptops available with it – hopefully the vendors (Dell, HP, etc.) will continue to release more versions especially with Ryzen 3 mobile.

In a word: Highly Recommended!

Please see our other articles on: