Intel Core Gen10 CometLake ULV (i7-10510U) Review & Benchmarks – CPU Performance

What is “CometLake”?

It is one of the 10th generation Core architectures (CML) from Intel – the latest revision of the venerable (6th gen!) “Skylake” (SKL) arch; it succeeds the current “WhiskeyLake”/“CoffeeLake” 8/9th-gen architectures for mobile (ULV U/Y) devices. The “real” 10th generation Core arch is “IceLake” (ICL), which does bring many changes but has not made its mainstream debut yet.

As a result there are no major updates vs. previous Skylake designs, save for an increase in core count on top-end versions and hardware vulnerability mitigations, which can still make a big difference:

  • Up to 6C/12T (from 4C/8T WhiskeyLake/CoffeeLake or 2C/4T Skylake/KabyLake)
  • Increased Turbo ratios
  • 2-channel LP-DDR4 support and DDR4-2667 (up from 2400)
  • WiFi6 (802.11ax) AX201 integrated (from WiFi5 (802.11ac) 9560)
  • Thunderbolt 3 integrated
  • Hardware fixes/mitigations for vulnerabilities (“Meltdown”, “MDS”, various “Spectre” types)

The 3x (three times) increase in core count (6C/12T vs. Skylake/KabyLake 2C/4T) in the same 15-28W power envelope is pretty significant considering that Core ULV designs since the 1st gen have always had 2C/4T; unfortunately it is limited to the top end, thus even the i7-10510U tested here still has 4C/8T.

LP-DDR4 support is important as many thin & light laptops (e.g. Dell XPS, Lenovo X1 Carbon, etc.) have been “stuck” with slow LP-DDR3 memory instead of high-bandwidth DDR4 memory in order to save power. Note that the Y-variants (4.5-6W) will not support this.

WiFi is now integrated in the PCH and has been updated to WiFi6/AX (2×2 streams, up to 2400Mbps with a 160MHz-wide channel) from WiFi5/AC (1733Mbps); this also means no simple WiFi-card upgrades in the future as with older laptops (except those with “whitelists” like HP, Lenovo, etc.).
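As a sanity check, the quoted 2400Mbps figure follows directly from standard 802.11ax PHY parameters (the subcarrier count, modulation and guard-interval values below are 11ax defaults, not taken from this article):

```python
# Back-of-the-envelope 802.11ax (WiFi6) PHY rate estimate.
def phy_rate_mbps(subcarriers, bits_per_symbol, coding_rate, streams,
                  symbol_us=12.8, guard_us=0.8):
    """Approximate PHY data rate in Mbit/s (bits per OFDM symbol over symbol time)."""
    bits_per_ofdm_symbol = subcarriers * bits_per_symbol * coding_rate * streams
    return bits_per_ofdm_symbol / (symbol_us + guard_us)  # bits/us == Mbit/s

# 160MHz channel: 1960 data subcarriers, 1024-QAM (10 bits/symbol),
# 5/6 coding rate, 2 spatial streams as quoted for the AX201:
rate = phy_rate_mbps(subcarriers=1960, bits_per_symbol=10,
                     coding_rate=5 / 6, streams=2)
print(round(rate))  # ≈2402 – marketed as "2400Mbps"
```

The same formula with one stream and 80MHz-class parameters recovers the older WiFi5-generation figures.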

Why review it now?

Until “IceLake” makes its public debut, “CometLake” is the latest ULV APU line from Intel you can buy today; despite being just a revision of “Skylake”, due to increased core counts/Turbo ratios it may still prove a worthy competitor not just in cost but also performance.

As they contain hardware fixes/mitigations for vulnerabilities discovered since the original “Skylake” launched, the operating system & applications do not need to deploy slower software mitigations that can affect performance (especially I/O) on the older designs.

In this article we test CPU core performance; please see our other articles on:

Hardware Specifications

We are comparing the top-of-the-range Intel ULV with preceding architectures (gen 8, 7, 6) as well as competitors (AMD) with a view to upgrading to a mid-range but high-performance design.

CPU Specifications AMD Ryzen2 2500U (Raven Ridge)
Intel i7 7500U (Kabylake ULV)
Intel i7 8550U (Coffeelake ULV)
Intel Core i7 10510U (CometLake ULV)
Comments
Cores (CU) / Threads (SP) 4C / 8T 2C / 4T 4C / 8T 4C / 8T No change in core count on i3/i5/i7.
Speed (Min / Max / Turbo) 1.6-2.0-3.6GHz 0.4-2.7-3.5GHz 0.4-1.8-4.0GHz
(1.8 @ 15W, 2GHz @ 25W)
0.4-1.8-4.9GHz
(1.8GHz @ 15W, 2.3GHz @ 25W)
CML has +22% faster turbo.
Power (TDP) 15-35W 15-25W 15-35W 15-35W Same power envelope.
L1D / L1I Caches 4x 32kB 8-way / 4x 64kB 4-way 2x 32kB 8-way / 2x 32kB 8-way 4x 32kB 8-way / 4x 32kB 8-way 4x 32kB 8-way / 4x 32kB 8-way No L1 changes
L2 Caches 4x 512kB 8-way 2x 256kB 16-way 4x 256kB 16-way 4x 256kB 16-way No L2 changes
L3 Caches 4MB 16-way 4MB 16-way 6MB 16-way 6MB 16-way And no L3 changes
Microcode (Firmware) MU8F1100-0B MU068E09-8E MU068E09-AE MU068E0C-BE Revisions just keep on coming.
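The bracketed [+x%] figures used in the tables here are simple relative deltas of the new result over the previous-generation result; a trivial sketch (for illustration only, not benchmark code):

```python
def uplift_pct(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0

# CML-U vs CFL-U maximum turbo ratio (4.9GHz vs 4.0GHz):
print(f"+{uplift_pct(4.9, 4.0):.1f}%")   # +22.5% -> quoted as "+22% faster turbo"

# Dhrystone Long (GIPS), 135 vs 115:
print(f"+{uplift_pct(135, 115):.1f}%")   # +17.4% -> quoted as "[+17%]"
```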

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets (AVX2, AVX, etc.). “CometLake” (CML) supports all modern instruction sets including AVX2, FMA3 but not AVX512 (like “IceLake”) or SHA HWA (like Atom, Ryzen).
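On Linux, one quick way to verify which of these instruction sets a CPU exposes is to parse the kernel's feature flags from /proc/cpuinfo (a minimal sketch; the sample string below is made up, and flag names follow the kernel's conventions, e.g. "sha_ni" for SHA HWA):

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

# Hypothetical sample; on a real system read open("/proc/cpuinfo").read().
sample = "model name : demo\nflags : fpu sse2 avx avx2 fma aes\n"
flags = parse_flags(sample)
for isa in ("avx2", "fma", "avx512f", "sha_ni"):
    print(isa, isa in flags)
# A CometLake part would show avx2/fma present but avx512f/sha_ni absent.
```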

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations.

Native Benchmarks AMD Ryzen2 2500U Bristol Ridge
Intel i7 7500U (Kabylake ULV)
Intel i7 8550U (Coffeelake ULV)
Intel Core i7 10510U (CometLake ULV)
Comments
CPU Arithmetic Benchmark Native Dhrystone Integer (GIPS) 103 73.15 125 134 [+8%] CML starts off 8% faster than CFL – a good start.
CPU Arithmetic Benchmark Native Dhrystone Long (GIPS) 102 74.74 115 135 [+17%] With a 64-bit integer workload – increases to 17%.
CPU Arithmetic Benchmark Native FP32 (Float) Whetstone (GFLOPS) 79 45 67.29 84.95 [+26%] With floating-point workload CML is 26% faster!
CPU Arithmetic Benchmark Native FP64 (Double) Whetstone (GFLOPS) 67 37 57 70.63 [+24%] With FP64 we see a similar 24% improvement.
With integer (legacy) workloads, CML-U brings a modest improvement of about 10% over CFL-U, cementing its top position. But with floating-point (also legacy) workloads we see a larger 25% increase, which allows it to beat the competition (Ryzen Mobile) that was beating older designs (CFL-U, WHL-U, KBL-U, etc.).
BenchCpuMM Native Integer (Int32) Multi-Media (Mpix/s) 239 193 306 409 [+34%] In this vectorised AVX2 integer test  CML-U is 34% faster than CFL-U.
BenchCpuMM Native Long (Int64) Multi-Media (Mpix/s) 53.4 75 117 149 [+27%] With a 64-bit AVX2 integer workload the difference drops to 27%.
BenchCpuMM Native Quad-Int (Int128) Multi-Media (Mpix/s) 2.41 1.12 2.21 2.54 [+15%] This is a tough test using Long integers to emulate Int128 without SIMD; here CML-U is still 15% faster.
BenchCpuMM Native Float/FP32 Multi-Media (Mpix/s) 222 160 266 328 [+23%] In this floating-point AVX/FMA vectorised test, CML-U is 23% faster.
BenchCpuMM Native Double/FP64 Multi-Media (Mpix/s) 127 94.8 155.9 194.4 [+25%] Switching to FP64 SIMD code nothing much changes – still 25% faster.
BenchCpuMM Native Quad-Float/FP128 Multi-Media (Mpix/s) 6.23 4.04 6.51 8.22 [+26%] In this heavy algorithm using FP64 to mantissa extend FP128 with AVX2 – we see 26% improvement.
With heavily vectorised SIMD workloads CML-U is 25% faster than the previous CFL-U, which may be sufficient to fend off future competition from Gen3 Ryzen Mobile with improved (256-bit) SIMD units – something that CFL/WHL-U may not manage. IceLake (ICL) with AVX512 should improve over this despite lower clocks.
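The Quad-Float/FP128 test mentioned above emulates quad precision by pairing two FP64 values (“double-double” arithmetic). Its core building block is an error-free addition such as the classic branch-free two-sum, sketched here as a minimal illustration (not the benchmark's actual code):

```python
def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a+b) and a+b = s+e exactly."""
    s = a + b
    a_virtual = s - b                 # what the sum "thinks" a was
    b_virtual = s - a_virtual         # what the sum "thinks" b was
    err = (a - a_virtual) + (b - b_virtual)  # the bits rounding threw away
    return s, err

# The low word captures bits the FP64 sum had to round away, extending
# the effective mantissa – together (hi, lo) represent 1 + 1e-30 exactly:
hi, lo = two_sum(1.0, 1e-30)
print(hi, lo)  # 1.0 1e-30
```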
BenchCrypt Crypto AES-256 (GB/s) 10.9 7.28 13.11 12.11 [-8%] With AES/HWA support all CPUs are memory bandwidth bound.
BenchCrypt Crypto AES-128 (GB/s) 10.9 9.07 13.11 12.11 [-8%] No change with AES128.
BenchCrypt Crypto SHA2-256 (GB/s) 6.78 2.55 3.97 4.28 [+8%] Without SHA/HWA Ryzen Mobile beats even CML-U.
BenchCrypt Crypto SHA1 (GB/s) 7.13 4.07 7.19 Less compute-intensive SHA1 allows CML-U to catch up.
BenchCrypt Crypto SHA2-512 (GB/s) 1.48 1.54 SHA2-512 is not accelerated by SHA/HWA, thus CML-U does better.
The memory sub-system is crucial here, and CML-U can improve over older designs when using faster memory (which we were not able to use here). Without the SHA/HWA that Ryzen Mobile supports, it cannot beat it and improves only marginally over the older CFL-U.
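For context, software SHA2-256 throughput is measured the same way these figures are produced – bytes hashed per unit time. A rough sketch using Python's hashlib (far slower than the benchmark's optimised code, and note hashlib's backend may itself use hardware acceleration where present):

```python
import hashlib
import time

def sha256_throughput_gbs(buf_mb=16, rounds=4):
    """Hash a zero-filled buffer repeatedly and return rough GB/s."""
    buf = b"\0" * (buf_mb * 1024 * 1024)
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.sha256(buf).digest()
    elapsed = time.perf_counter() - start
    return (buf_mb * rounds) / 1024 / elapsed  # MB hashed -> GB/s

print(f"{sha256_throughput_gbs():.2f} GB/s")
```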
BenchFinance Black-Scholes float/FP32 (MOPT/s) 93.34 49.34 73.02 With non-vectorised code CML-U needs to catch up.
BenchFinance Black-Scholes double/FP64 (MOPT/s) 77.86 43.33 75.24 87.17 [+16%] Using FP64 CML-U is 16% faster finally beating Ryzen Mobile.
BenchFinance Binomial float/FP32 (kOPT/s) 35.49 12.3 16.2 Binomial uses thread shared data thus stresses the cache & memory system.
BenchFinance Binomial double/FP64 (kOPT/s) 19.46 11.4 19.31 20.99 [+9%] With FP64 code CML-U is 9% faster than CFL-U.
BenchFinance Monte-Carlo float/FP32 (kOPT/s) 20.11 9.87 14.61 Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches.
BenchFinance Monte-Carlo double/FP64 (kOPT/s) 15.32 7.88 14.54 16.54 [+14%] Switching to FP64 nothing much changes, CML-U is 14% faster.
With non-SIMD financial workloads, CML-U modestly improves (10-15%) over the older CFL-U, but this is enough to allow it to beat the competition (Ryzen Mobile) which dominated older CFL-U designs. This may just be enough to match future Gen3 Ryzen Mobile and thus be competitive all-round.
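The Black-Scholes test above prices European options from the well-known closed-form solution; a scalar FP64 version of the formula (the benchmark runs many such evaluations across all threads, non-vectorised):

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot, strike, rate, vol, years):
    """Black-Scholes price of a European call option."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt(years))
    d2 = d1 - vol * sqrt(years)
    return spot * norm_cdf(d1) - strike * exp(-rate * years) * norm_cdf(d2)

# Textbook check: S=K=100, r=5%, vol=20%, 1 year -> ~10.45
print(round(black_scholes_call(100, 100, 0.05, 0.2, 1.0), 2))
```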
BenchScience SGEMM (GFLOPS) float/FP32 107 76.14 141 158 [+12%] In this tough vectorised AVX2/FMA algorithm CML-U is 12% faster.
BenchScience DGEMM (GFLOPS) double/FP64 47.2 31.71 55 69.2 [+26%] With FP64 vectorised code, CML-U is 26% faster than CFL-U.
BenchScience SFFT (GFLOPS) float/FP32 3.75 7.21 13.23 13.93 [+5%] FFT is also heavily vectorised (x4 AVX2/FMA) but stresses the memory sub-system more.
BenchScience DFFT (GFLOPS) double/FP64 4 3.95 6.53 7.35 [+13%] With FP64 code, CML-U is 13% faster.
BenchScience SNBODY (GFLOPS) float/FP32 112.6 105 160 169 [+6%] N-Body simulation is vectorised but with more memory accesses.
BenchScience DNBODY (GFLOPS) double/FP64 45.3 30.64 57.9 64.16 [+11%] With FP64 code nothing much changes.
With highly vectorised SIMD code (scientific workloads) CML-U is again 15-25% faster than CFL-U which should be enough to match future Gen3 Ryzen Mobile with 256-bit SIMD units. Again we need ICL with AVX512 to bring dominance to these workloads or more cores.
CPU Image Processing Blur (3×3) Filter (MPix/s) 532 474 720 891 [+24%] In this vectorised integer AVX2 workload CML-U is 24% faster.
CPU Image Processing Sharpen (5×5) Filter (MPix/s) 146 191 290 359 [+24%] Same algorithm but more shared data still 24%.
CPU Image Processing Motion-Blur (7×7) Filter (MPix/s) 123 98.3 157 186 [+18%] Again same algorithm but even more data shared reduces improvement to 18%.
CPU Image Processing Edge Detection (2*5×5) Sobel Filter (MPix/s) 185 164 251 302 [+20%] Different algorithm but still AVX2 vectorised workload still 20% faster.
CPU Image Processing Noise Removal (5×5) Median Filter (MPix/s) 26.49 14.38 25.38 27.73 [+9%] Still AVX2 vectorised code but here just 9% faster.
CPU Image Processing Oil Painting Quantise Filter (MPix/s) 9.38 7.63 14.29 15.74 [+10%] Similar improvement here of about 10%.
CPU Image Processing Diffusion Randomise (XorShift) Filter (MPix/s) 660 764 1525 1580 [+4%] With integer AVX2 workload, only 4% improvement.
CPU Image Processing Marbling Perlin Noise 2D Filter (MPix/s) 94.16 105.1 188.8 214 [+13%] In this final test, again with an integer AVX2 workload, CML-U is 13% faster.
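The Blur (3×3) filter at the top of this group is a plain convolution; a scalar reference sketch of the idea (the benchmark's version is AVX2-vectorised and far faster):

```python
def box_blur_3x3(img):
    """3x3 box blur with clamped edges; img is a list of rows of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at the borders
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out

# A uniform image is unchanged by the blur:
flat = box_blur_3x3([[5.0] * 4 for _ in range(4)])
print(flat[0][0])  # 5.0
```

The neighbouring Sharpen, Motion-Blur and Sobel tests follow the same pattern with different kernels, which is why their relative results track each other closely.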

Without support for any new instruction sets (AVX512, SHA/HWA, etc.), CML-U was never going to be a revolution in performance and has to rely on clock increases and very minor improvements/fixes (especially for vulnerabilities) only. Versions with more cores (6C/12T) would certainly help if they can stay within the power limits (TDP/turbo).

Intel themselves did not claim a big performance improvement – still, CML-U is 10-25% faster than CFL-U across workloads, at the same TDP. At the same cost/power it is a welcome improvement, and it does allow it to beat the current competition (Ryzen Mobile) which was nipping at its heels; it may also be enough to match future Gen3 Ryzen Mobile designs.

SiSoftware Official Ranker Scores

Final Thoughts / Conclusions

For some it may be disappointing that we do not have the brand-new, improved “IceLake” (ICL-U) now rather than a third revision of “Skylake” – but “CometLake” (CML-U) does seem to improve even over the previous revisions (8/9th-gen “WhiskeyLake”/“CoffeeLake” WHL/CFL-U), while due to 2x the core count completely outperforming the original (6/7th-gen “Skylake”/“KabyLake”) in the same power envelope. Perhaps it also shows how much Intel has had to improve at short notice due to Ryzen Mobile APUs (e.g. 2500U) that finally brought competition to the mobile space.

While owners of 8/9th-gen won’t be upgrading – it is very rare to recommend changing from one generation to the next anyway – owners of older hardware can look forward to over 2x the performance in most workloads for the same power draw, not to mention the additional features (integrated WiFi6, Thunderbolt 3, etc.).

On the other hand, the competition (AMD Ryzen Mobile) also offers good performance and the older 8/9th-gen parts remain competitive – thus it will all depend on price. With Gen3 Ryzen Mobile on the horizon (with 256-bit SIMD units), “CometLake” may just manage to match it on performance. It may also be worth waiting for “IceLake” to make its debut to see what performance improvements it brings and at what cost – which may also push “CometLake” prices down.

All in all, Intel has managed to “squeeze” all it can from the old Skylake arch, which while not revolutionary still has enough to be competitive with current designs – and with the future 50% increase in core count (6C/12T from 4C/8T) might even beat them not just in cost but also in performance.

In a word: Qualified Recommendation!

Please see our other articles on:

SiSoftware Sandra Titanium (2018) SP4/a/c Update: Retpoline and hardware support

Note: Updated 2019/June with information regarding MDS as well as change of recent CFL-R microcode vulnerability reporting.

We are pleased to release SP4/a/c (version 28.69) update for Sandra Titanium (2018) with the following updates:

Sandra Titanium (2018) Press Release

  • Reporting of Operating System (Windows) speculation control settings for the recently discovered vulnerabilities:
    • Kernel Retpoline mitigation status (for BTI – Spectre v2) in recent Windows 10 / Server 2019 updates
    • Kernel Address Table Import Optimisation (“KATI”) status (as above)
    • L1TF (L1 Data Terminal Fault) mitigation status
    • MDS (Microarchitectural Data Sampling / “ZombieLoad”) mitigation status
  • Hardware Support:
    • AMD Ryzen2 (Matisse), Stoney Ridge support
    • Intel CometLake (CML), CannonLake (CNL), IceLake (ICL) support (based on public information)
  • CPU Benchmarks:
    • Image Processing: SIMD code improvement (SSE2/SSE4/AVX/AVX2-FMA/AVX512)
    • Multi-Media: fixed a lock-up on NUMA systems (e.g. AMD ThreadRipper) – thanks to Rob @ TechGage.
  • Memory/Cache Benchmarks
    • Return memory controller firmware version to Ranker
  • GPGPU Benchmarks:
    • CUDA SDK 10.1
    • OpenCL: Processing (Fractals/Mandelbrot) variable vector width based on reported FP16/32/64 optimal SIMD width.
  • Ranker, Price & Information Engines
    • HTTPS (encryption) support for all engines as well as the main website

What is Retpoline?

It is a mitigation against the ‘Spectre’ variant 2 (BTI – Branch Target Injection) that affects just about all CPUs (not just Intel but AMD, ARM, etc.). While ‘Spectre’ does not have the same overall performance degradation as ‘Meltdown’ (RDCL – Rogue Data Cache Load), it can have a sizeable impact on some processors and workloads. At this time no CPUs contain hardware mitigation for Spectre without performance impact.

Retpoline (Return Trampoline) is a faster way to mitigate against it without restricting branch speculation in kernel mode (using IBRS/IBPB); it has recently been added to Linux and now to Windows version 1809 builds with KB4482887. Note that it still needs to be enabled in the registry via the Mitigation Features Override flags, as by default it is not enabled.

What CPUs can Retpoline be used on?

Unfortunately Retpoline is only safe to use on some CPUs: AMD CPUs (though it does not engage on Ryzen, see below) and Intel Broadwell or older (v5 and earlier) – thus not Skylake (v6 or later).

Windows speculation control settings reporting:

Intel Haswell (Core v4), Broadwell (v5) – Retpoline enabled, KATI enabled
Kernel Retpoline Speculation Control – Enabled

Kernel Address Table Import Optimisation – Enabled

(Note RDCL mitigations KVA, L1TF are also enabled as required)

Intel Skylake (Core v6), Kabylake (v7), Skylake/Kabylake-X (v6x) – no Retpoline, KATI can be enabled
Kernel Retpoline Speculation Control – no

Kernel Address Table Import Optimisation – no/yes (can be enabled)

(Note RDCL mitigations KVA, L1TF are enabled as required)

Intel Coffeelake-R (Core v8r), Whiskeylake/AmberLake (Core v8r), CometLake* – no Retpoline, KATI not enabled
Kernel Retpoline Speculation Control – no

Kernel Address Table Import Optimisation – Enabled

Note 2019/June: Latest microcode (AEh) with MDS vulnerability support causes Windows to report KVA/L1TF mitigations as required despite the CPU claiming not to be vulnerable to RDCL.

Intel Atom Braswell (Atom v5), GeminiLake/ApolloLake (Atom v6) – no Retpoline but KATI enabled
Kernel Retpoline Speculation Control – no

Kernel Address Table Import Optimisation – Enabled

(Note RDCL mitigations KVA, L1TF are enabled as required)

AMD Ryzen (Threadripper) 1, 2 – no Retpoline, no KATI
Kernel Retpoline Speculation Control – no (should be usable?)

Kernel Address Table Import Optimisation – no (should be usable)

(Note CPU does not require RDCL mitigation thus no KVA, L1TF required)

From our somewhat limited testing above it seems that:

  • Intel Haswell/Broadwell (Core v4/v5) and perhaps earlier (Ivy Bridge/Sandy Bridge Core v3/v2) users are in luck, Retpoline is enabled and should improve performance; unfortunately RDCL (“Meltdown” mitigation) remains.
  • Intel Coffeelake-R (Core v8r refresh), Whiskeylake ULV (v8r) users do benefit a bit more from their investment – while Retpoline is not enabled, KATI is enabled and should help. Not requiring KVA is the biggest gain of CFL-R. 2019/June: the latest microcode (AEh) causes Windows to require KVA/L1TF, thus negating any benefit CFL-R had over the original CFL/KBL/SKL.
  • Intel Skylake (Core v6), Kabylake (v7) and Coffeelake (v8) are not able to benefit from Retpoline but KATI can work on some systems (driver dependent). However, on our Skylake ULV, Skylake-X test systems KATI could not be enabled. We are investigating further.
  • Intel Atom (v4/v5+) users should be able to use Retpoline but it seems it cannot be enabled currently. KATI is enabled.
  • AMD Ryzen (Threadripper) 1/2 users should also be able to use Retpoline but it seems it cannot be enabled currently. While RDCL is not required, mitigations for Spectre v2 are required and should be enabled. We are investigating further.

Reviews using Sandra 2018 SP4:

Update & Download

Commercial version customers can download the free updates from their software distributor; Lite users please download from your favourite download site.

Download Sandra Lite