Intel Core Gen10 CometLake (i9-10900K) Review & Benchmarks – CPU Performance

Intel Core i9 10th Gen

What is “CometLake”?

It is one of Intel’s 10th generation Core architectures (CML) – the latest revision of the venerable (6th gen!) “Skylake” (SKL) arch; it succeeds the current “CoffeeLake” 8/9-gen architectures for desktop devices. The “real” 10th generation Core arch is “IceLake” (ICL), which does bring many changes – but has only been released on mobile (ULV) devices so far. It is likely Intel will skip it altogether on the desktop.

As a result there are no major updates vs. previous “Skylake” (SKL) designs, save an increase in core count on top-end versions and hardware vulnerability mitigations, which can still make a big difference:

  • Up to 10C/20T (from 8C/16T “CoffeeLake” or 4C/8T “Skylake”/“KabyLake”)
  • Increased Turbo ratios, base clocks
  • Hyper-Threading (SMT) enabled on all Core SKUs (i9, i7, i5, i3)
  • 2-channel DDR4-2933 (up from 2667)
  • Thunderbolt 3 integrated
  • Hardware fixes/mitigations for vulnerabilities (“Meltdown”, “MDS”, various “Spectre” types)
  • New platform based on LGA1200 socket – thus new motherboards

Unlike CML ULV, we have a modest increase in core count (10C/20T vs. CFL 8C/16T), albeit in a higher 125W TDP power envelope (CFL: 95W) – but it is still a big increase vs. older designs that have always had 4C/8T. Hyper-Threading is no longer disabled on i7 and i5 models; should you wish to keep it enabled, it can still provide good performance gains in many applications.

Official DDR4 speed support has gone up to 2933MT/s (about 46GB/s bandwidth over two channels) from 2667MT/s, which should help feed all those extra cores.
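
The bandwidth figure follows directly from the transfer rate. As a rough sketch (assuming the standard 64-bit, i.e. 8-byte, DDR4 channel width and the 2-channel configuration listed above; the function name is purely illustrative), the theoretical peak can be worked out as:

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (decimal units) for a DDR memory setup."""
    # MT/s * bytes per transfer * channels gives MB/s; divide by 1000 for GB/s
    return transfers_mt_s * bus_bytes * channels / 1000.0


new = peak_bandwidth_gb_s(2933)  # DDR4-2933, dual channel
old = peak_bandwidth_gb_s(2667)  # DDR4-2667, dual channel

print(f"DDR4-2933: {new:.1f} GB/s")
print(f"DDR4-2667: {old:.1f} GB/s")
print(f"Increase:  {100 * (new / old - 1):.1f}%")
```

This is a theoretical ceiling only; measured bandwidth will be lower and depends on timings and the memory controller.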

While CFL does mitigate “Meltdown” (CVE-2017-5754 “rogue data cache load”) and reports “not vulnerable” (this can be checked with Sandra or a similar utility), due to MDS (to which CFL is vulnerable) recent versions of Windows do consider KVA (“kernel VA shadowing”) required and enable it by default. Thus the relatively large overhead of “Meltdown” mitigation is back. CML does report “not vulnerable” to both “Meltdown” and MDS and thus KVA is neither required nor enabled. Hopefully there will be no further vulnerabilities discovered to undo these fixes.

Why review it now?

As “IceLake” (ICL) does not seem set to make its public debut on desktop/workstation, “CometLake” (CML) is the latest CPU from Intel you can buy today; despite being just a revision of “Skylake”, thanks to increased core counts/Turbo ratios it may still prove a worthy competitor not just in cost but also performance.

As per above, thanks to the additional hardware fixes/mitigations for vulnerabilities discovered since “CoffeeLake” launched – especially “Meltdown” but also “Spectre” variants – the operating system & applications do not need to deploy slower software mitigations that can affect performance (especially I/O). For some workloads, this alone may be worth an upgrade!


Hardware Specifications

We are comparing the top-of-the-range Intel desktop CPU with competing architectures (gen 8, 7, 6) as well as competitors (AMD) with a view to upgrading to a mid-range but high-performance design.

CPU Specifications | Intel i9 10900K (CML) | Intel i9 9900K (CFL) | AMD Ryzen 9 3900X | AMD Ryzen 7 3700X | Comments
Cores (CU) / Threads (SP) | 10C / 20T | 8C / 16T | 12C / 24T | 8C / 16T | 25% increase in core count
Speed (Min / Max / Turbo) | 1.6-3.7-5.3GHz | 1.6-3.6-5GHz | 3.8-4.6GHz | 3.6-4.4GHz | CML has modest Turbo increase.
Power (TDP) | 125W | 95W | 105W | 65W | 32% increase in TDP
L1D / L1I Caches | 10x 32kB 8-way / 10x 32kB 8-way | 8x 32kB 8-way / 8x 32kB 8-way | 12x 32kB 8-way / 12x 32kB 8-way | 8x 32kB 8-way / 8x 32kB 8-way | No L1 changes
L2 Caches | 10x 256kB 8-way | 8x 256kB 16-way | 12x 512kB 16-way | 8x 512kB 16-way | No L2 changes
L3 Caches | 20MB 16-way | 16MB 16-way | 4x 16MB 16-way (64MB) | 2x 16MB 16-way (32MB) | 25% larger L3
Microcode (Firmware) | MU06A505-C8 | MU069E0C-9E | MU8F71000-21 | MU8F7100-13 | Revisions just keep on coming.
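
The percentage deltas between the 10900K and the 9900K can be re-derived from the table values. A minimal sketch (the dictionary layout and the function name are illustrative, and only the CML and CFL columns are used):

```python
# (10900K, 9900K) values copied from the specification table above
specs = {
    "Cores": (10, 8),
    "TDP (W)": (125, 95),
    "L3 (MB)": (20, 16),
    "Turbo (GHz)": (5.3, 5.0),
}


def pct_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


for name, (cml, cfl) in specs.items():
    print(f"{name:12s} {pct_increase(cml, cfl):+.0f}%")
```

Note that the TDP grows proportionally more than the core count or the L3 cache.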

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets (AVX2, AVX, etc.). “CometLake” (CML) supports all modern instruction sets including AVX2, FMA3 but not AVX512 (like “IceLake”, “Skylake-X”) or SHA HWA (like Atom, Ryzen).

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and Intel drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations.

Native Benchmarks Intel i9 10900K (CML) Intel i9 9900K (CFL) AMD Ryzen 9 3900X AMD Ryzen 7 3700X Comments
CPU Arithmetic Benchmark Native Dhrystone Integer (GIPS) 553 [+38%] 400 572 336 CML starts off 38% faster than CFL with 25% more cores.
CPU Arithmetic Benchmark Native Dhrystone Long (GIPS) 537 [+37%] 393 559 339 With a 64-bit integer workload still 37% faster.
CPU Arithmetic Benchmark Native FP32 (Float) Whetstone (GFLOPS) 350 [+48%] 236 338 202 With floating-point workload CML is 48% faster!
CPU Arithmetic Benchmark Native FP64 (Double) Whetstone (GFLOPS) 288 [+47%] 196 292 170 With FP64 we see a similar 47% improvement.
With integer (legacy) workloads, CML is almost 40% faster than CFL – much more than the core count increase alone (+25%). With floating-point we see an even greater, almost 50% improvement! This allows it to get within a whisker of AMD’s 3900X with its 12C.
BenchCpuMM Native Integer (Int32) Multi-Media (Mpix/s) 1,561 [+58%] 985 1,467 1,023 In this vectorised AVX2 integer test CML is ~60% faster than CFL!
BenchCpuMM Native Long (Int64) Multi-Media (Mpix/s) 645 [+56%] 414 552 374 With a 64-bit AVX2 integer workload the difference is similar 56%.
BenchCpuMM Native Quad-Int (Int128) Multi-Media (Mpix/s) 13.3 [+97%] 6.75 15.26 6.54 This is a tough test using Long integers to emulate Int128 without SIMD but CML is almost 2x faster!
BenchCpuMM Native Float/FP32 Multi-Media (Mpix/s) 1,551 [+70%] 914 1,510 1,000 In this floating-point AVX/FMA vectorised test, CML is 70% faster.
BenchCpuMM Native Double/FP64 Multi-Media (Mpix/s) 895 [+67%] 535 931 618 Switching to FP64 SIMD code, nothing much changes still 67% faster.
BenchCpuMM Native Quad-Float/FP128 Multi-Media (Mpix/s) 34.9 [+52%] 23 35.2 24.2 In this heavy algorithm using FP64 to mantissa extend FP128 with AVX2 – we see 52% improvement.
With heavily vectorised SIMD workloads CML improves even more over CFL, once even 2x faster, which again allows it to trade blows with 3900X and its 12C. All the mitigations must weigh heavy on CFL as the large improvement is hard to justify otherwise.
BenchCrypt Crypto AES-256 (GB/s) 20.4 [+16%] 17.6 23.9 18.04 With AES/HWA support all CPUs are memory bandwidth bound.
BenchCrypt Crypto AES-128 (GB/s) 20.4 [+16%] 17.6 24.4 18.76 No change with AES128, CML is 16% faster.
BenchCrypt Crypto SHA2-256 (GB/s) 19.46 [+62%] 12 33.6 24.2 Lacking SHA/HWA, CML is beaten by Ryzen.
BenchCrypt Crypto SHA1 (GB/s) 22.9 34 23 Less compute intensive SHA1 allows CML to catch up.
BenchCrypt Crypto SHA2-512 (GB/s) 9 SHA2-512 is not accelerated by SHA/HWA CML does better.
The memory sub-system is crucial here, and CML improves over CFL with faster memory – the extra cores don’t help. But Ryzen is still faster and with SHA/HWA much faster in hashing than even Intel’s AVX2 SIMD units can muster.
BenchFinance Black-Scholes float/FP32 (MOPT/s) 276 With non-vectorised code CML needs to catch up.
BenchFinance Black-Scholes double/FP64 (MOPT/s) 377 [+58%] 238 424 257 Using FP64 CML is 58% faster but cannot beat Ryzen
BenchFinance Binomial float/FP32 (kOPT/s) 59.9 Binomial uses thread shared data thus stresses the cache & memory system.
BenchFinance Binomial double/FP64 (kOPT/s) 90.1 [+46%] 61.6 113 64 With FP64 code CML is 46% faster than CFL.
BenchFinance Monte-Carlo float/FP32 (kOPT/s) 56.5 Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches.
BenchFinance Monte-Carlo double/FP64 (kOPT/s) 125 [+2.8x] 44.5 184 110 Switching to FP64 nothing much changes, CML is 2.8x faster.
With non-SIMD financial workloads, CML still improves ~50% over CFL, which is a big change; unfortunately AMD’s 3900X is still faster but at least CML remains competitive while CFL was outclassed. ZEN3 will prove a big challenge though.
BenchScience SGEMM (GFLOPS) float/FP32 375 446 263 In this tough vectorised AVX2/FMA algorithm.
BenchScience DGEMM (GFLOPS) double/FP64 216 [+3%] 209 201 193 With FP64 vectorised code, CML is just 3% faster.
BenchScience SFFT (GFLOPS) float/FP32 22.33 25.13 22.78 FFT is also heavily vectorised (x4 AVX2/FMA) but stresses the memory sub-system more.
BenchScience DFFT (GFLOPS) double/FP64 9.11 [-19%] 11.21 18.62 11.16 With FP64 code, Ryzen is king.
BenchScience SNBODY (GFLOPS) float/FP32 557 689 612 N-Body simulation is vectorised but with more memory accesses.
BenchScience DNBODY (GFLOPS) double/FP64 252 [+47%] 171 300 220 With FP64 code CML is ~50% faster
With highly vectorised SIMD code (scientific workloads) the CML improvement is variable but it is there; it is likely that subtle improvements must be made in software for some workloads due to core contention on many-threaded CPUs. However, the Ryzen 3900X is faster in most of these tests (DGEMM being the exception).
CPU Image Processing Blur (3×3) Filter (MPix/s) 3,823 [+49%] 2,560 3,380 2,564 In this vectorised integer AVX2 workload CML is 50% faster.
CPU Image Processing Sharpen (5×5) Filter (MPix/s) 1,530 [+53%] 1,000 1,612 955 Same algorithm but more shared data, 53% faster.
CPU Image Processing Motion-Blur (7×7) Filter (MPix/s) 781 [+50%] 519 819 492 Again same algorithm but even more data shared still 50% faster.
CPU Image Processing Edge Detection (2*5×5) Sobel Filter (MPix/s) 1,335 [+61%] 827 1,395 832 Different algorithm but still AVX2 vectorised workload now 60% faster.
CPU Image Processing Noise Removal (5×5) Median Filter (MPix/s) 123 [+58%] 78 147 90.45 Still AVX2 vectorised code and here 58% faster.
CPU Image Processing Oil Painting Quantise Filter (MPix/s) 63 [+49%] 42.2 40.4 25.3 Similar improvement here of about 49%.
CPU Image Processing Diffusion Randomise (XorShift) Filter (MPix/s) 4,145 [+4%] 4,000 1,718 1,763 With integer AVX2 workload, only 4% improvement.
CPU Image Processing Marbling Perlin Noise 2D Filter (MPix/s) 891 [+49%] 596 519 323 In this final test again with integer AVX2 workload CML is 50% faster.
Without any new instruction sets (AVX512, SHA/HWA, etc.) support, CML was never going to be a revolution in performance but again we see it beat CFL by ~50% similar to what we’ve seen in other benchmarks.
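
To put a single number on the “~50%” impression, one common option is the geometric mean of the per-test speed-ups. The sketch below does this for the image processing rows only; the scores are copied from the table above, while the choice of geometric mean and all names are our own assumptions rather than how the benchmark itself aggregates results.

```python
from statistics import geometric_mean

# (CML, CFL) scores in MPix/s, copied from the image processing rows above
image_scores = {
    "Blur 3x3": (3823, 2560),
    "Sharpen 5x5": (1530, 1000),
    "Motion-Blur": (781, 519),
    "Edge Detect": (1335, 827),
    "Noise Removal": (123, 78),
    "Oil Painting": (63, 42.2),
    "Diffusion": (4145, 4000),
    "Marbling": (891, 596),
}

# Speed-up of CML over CFL for each test (1.0 means no change)
ratios = {name: cml / cfl for name, (cml, cfl) in image_scores.items()}

for name, ratio in ratios.items():
    print(f"{name:14s} {100 * (ratio - 1):+.0f}%")

overall = geometric_mean(list(ratios.values()))
print(f"Geometric mean: {100 * (overall - 1):+.0f}%")
```

The single outlier (Diffusion, +4%) pulls the mean somewhat below the typical per-test figure, which is why it is worth looking at the individual rows as well.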

Intel themselves did not claim a big performance improvement – possibly as it makes CFL pretty much obsolete, but with slightly more cores and higher clocks/TDP CML can reach Ryzen 3900X levels of performance which is no mean feat. With ZEN3 looking to launch soon, this is not before time.
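
To separate the contribution of the two extra cores from everything else (clocks, TDP headroom, mitigations), the measured speed-up can be divided by the core-count ratio. This is a rough sketch using the Dhrystone and Whetstone rows above; it assumes perfect linear scaling with cores, which real workloads rarely achieve, and the names are illustrative.

```python
CORE_RATIO = 10 / 8  # 10900K cores vs. 9900K cores

# (CML, CFL) scores copied from the CPU Arithmetic rows above
scores = {
    "Dhrystone Integer": (553, 400),
    "Dhrystone Long": (537, 393),
    "Whetstone FP32": (350, 236),
    "Whetstone FP64": (288, 196),
}


def residual_gain(cml: float, cfl: float, core_ratio: float = CORE_RATIO) -> float:
    """Speed-up remaining once the core-count ratio is divided out."""
    return (cml / cfl) / core_ratio


for name, (cml, cfl) in scores.items():
    total = cml / cfl
    print(f"{name:18s} total {total:.2f}x, beyond core count {residual_gain(cml, cfl):.2f}x")
```

Whatever remains above 1.00x has to come from something other than the core count, such as the higher clocks, the larger power budget or the hardware mitigations discussed earlier.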


Final Thoughts / Conclusions

For some it may be disappointing we do not have the brand-new improved “IceLake” (ICL) now rather than a 3rd revision of “Skylake”, but “CometLake” (CML) does seem to improve even over the previous revisions (8/9th gen “CoffeeLake”, CFL) due to a modest increase in cores and base/Turbo clocks, but perhaps also due to hardware-based vulnerability mitigations which no longer require costly software versions.

Thus, somewhat surprisingly, CML is able to trade blows with the 3900X and its 12C/24T, which shows the heavily-revised “Skylake” core can still pack a punch against AMD’s latest and greatest. Naturally we would have preferred 12 cores not 10 but that would likely “eat” even further into Intel’s HEDT platform.

While owners of 8/9th gen won’t be upgrading – it is very rare to recommend changing from one generation to another anyway – owners of older hardware can look forward to a performance increase of over 2x in most workloads for the same power draw, not to mention the additional features.

On the other hand, the competition (AMD Ryzen 3000 series) offers more cores (12C and more) at great cost and performance – and is still compatible with old AM4 socket mainboards (with a BIOS update)! With CML needing a new motherboard (LGA1200) and future “IceLake”-based CPUs possibly needing new motherboards again, CML is very much a stop-gap solution.

All in all Intel has managed to squeeze all it can from the old “Skylake” arch which, while not revolutionary, still has enough to be competitive with current designs; while it goes out on a high, it is likely the end of the road for this core.

In a word: Qualified Recommendation
