What is “CoffeeLake-R” (CFL-R)?
It is the “refresh” (updated) version of the 8th generation Intel Core architecture (CFL), itself a minor stepping of the previous 7th generation “KabyLake” (KBL), which was in turn a minor update of the 6th generation “SkyLake” (SKL). Ordinarily this would not be much of an event, but this time there are more significant changes:
- Vulnerabilities patched in hardware: this can help recover the I/O workload performance lost to OS mitigations:
  - “Meltdown” (RDCL), mitigated in software by Kernel Page Table Isolation (KPTI): patched in hardware
  - L1TF/Foreshadow: patched in hardware
  - “Spectre 2” (BTI), mitigated by IBPB/IBRS: OS mitigation still needed
  - “Spectre 4” Speculative Store Bypass (SSB), mitigated by Speculative Store Bypass Disable (SSBD): OS mitigation still needed
- Increased core counts yet again: CFL-R top-end now has 8 cores, not 6.
Intel CPUs bore the brunt of the vulnerabilities disclosed at the start of 2018, with the “Meltdown” operating system mitigation (KVA, kernel virtual address shadowing) likely having the biggest performance impact in I/O workloads, as every system call and interrupt has to switch page tables. While modern features such as PCID (process-context identifier) acceleration can reduce the impact somewhat on recent architectures (4th gen and newer), it can still be significant. The CFL-R hardware fixes, which remove the need for KVA, may thus prove very important.
On the desktop, core counts have increased again, now up to 8 (thus 16 threads with HyperThreading): double what KBL and SKL offered, and matching AMD.
Clocks have also increased, mainly Turbo, which allows 1 or 2 cores to boost higher than CFL could and thus helps workloads that are not massively threaded. This can also improve responsiveness, as single tasks can run at top speed when few threads are in use.
While the rated TDP has not changed, in practice “real” power consumption is likely to increase, especially due to the extra cores and higher clocks, with Turbo pushing it even higher, close to SKL/KBL-X levels.
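The extra power can be explained with the standard first-order model of CMOS dynamic power (a textbook approximation, not an Intel figure). With activity factor α, switched capacitance C (which grows with core count), supply voltage V and clock f, higher Turbo clocks usually also need higher voltage, so power rises faster than clock speed alone:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
```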
In this article we test CPU cache and memory performance; please see our other articles on:
- Intel Core i9 9900K CoffeeLake-R Review & Benchmarks – 8-core/16-thread CPU Performance
- Intel UHD 630 (Core i7 8700K, i9 9900K) – GPGPU Performance
- Intel Core i7 8700K CoffeeLake Review & Benchmarks – CPU 6-core/12-thread Performance
We are comparing the top-of-the-range Core i9 (9900K) with the previous generation (8700K) and competing architectures (Ryzen2 2700X, SKL-X 7900X) with a view to upgrading to a mid-range high-performance design.
| CPU Specifications | Intel i9-9900K CoffeeLake-R | Intel i7-8700K CoffeeLake | AMD Ryzen2 2700X Pinnacle Ridge | Intel i9-7900X SkyLake-X | Comments |
|---|---|---|---|---|---|
| L1D / L1I Caches | 8x 32kB 8-way / 8x 32kB 8-way | 6x 32kB 8-way / 6x 32kB 8-way | 8x 32kB 8-way / 8x 64kB 4-way | 10x 32kB 8-way / 10x 32kB 8-way | No L1D/L1I changes; Ryzen’s L1I is twice as big. |
| L2 Caches | 8x 256kB 4-way | 6x 256kB 4-way | 8x 512kB 8-way | 10x 1MB 16-way | No L2 changes; Ryzen’s L2 is twice as big again. |
| L3 Caches | 16MB 16-way | 12MB 16-way | 2x 8MB 16-way | 13.75MB 11-way | L3 has increased with the number of cores and now matches Ryzen’s total. |
| TLB 4kB pages | 64 4-way / 64 8-way / 1536 6-way | 64 4-way / 64 8-way / 1536 6-way | 64 full-way / 1536 8-way | 64 4-way / 64 8-way / 1536 6-way | No TLB changes. |
| TLB 2MB pages | 8 full-way / 1536 6-way | 8 full-way / 1536 6-way | 64 full-way / 1536 2-way | 8 full-way / 1536 6-way | No TLB changes. |
| Memory Controller Speed (MHz) | 1200-5000 | 1200-4400 | 1333-2667 | 1200-2700 | The uncore (memory controller) runs at a higher clock due to the higher rated clock, but there is not a lot in it. |
| Memory Data Speed (MT/s) | 3200 | 3200 | 2667 | 3200 | CFL/CFL-R can easily run at 3200 MT/s, while KBL/SKL were not as reliable. We could not get Ryzen past 2667 MT/s, even though it supports 2933. |
| Memory Channels / Width | 2 / 128-bit | 2 / 128-bit | 2 / 128-bit | 4 / 256-bit | The desktop parts have 128-bit total channel width; SKL-X has 4 channels (256-bit). |
| Memory Bandwidth (GB/s) | 50 | 50 | 42 | 100 | Bandwidth has naturally increased with memory clock speed, but latencies are higher; SKL-X doubles it with 4 channels. |
| Uncore / Memory Controller Firmware | 2.6.2 | 2.6.2 | | | We’re on firmware 2.6.x on both CFL-R and CFL. |
| Memory Timing (clocks) | 16-16-16-36 6-52-25-12 2T | 16-16-16-36 6-52-25-12 2T | 16-17-17-35 7-60-20-10 2T | | Timings are very much BIOS dependent and vary a lot. |
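As a rough cross-check of the memory rows above, the theoretical peak of DDR memory is channels × bytes per transfer × transfer rate (MT/s), and a latency in DRAM clocks converts to nanoseconds using the DDR clock, which runs at half the transfer rate. The Python sketch below is only an illustration, not SiSoftware's benchmark method; the function names are hypothetical, it assumes 64-bit channels (the table lists the total width) and it takes the first timing as the CAS latency.

```python
# Rough cross-check of the "Memory Bandwidth" row (illustrative sketch only).
# Assumes 64-bit (8-byte) DDR4 channels; the table lists total width, e.g. 2 / 128-bit.

def peak_bandwidth_gbs(channels: int, bits_per_channel: int, mts: int) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) at `mts` mega-transfers/s."""
    bytes_per_transfer = channels * bits_per_channel // 8
    return bytes_per_transfer * mts * 1e6 / 1e9

def cas_latency_ns(cl: int, mts: int) -> float:
    """First-word CAS latency in ns; the DDR clock runs at half the transfer rate."""
    clock_mhz = mts / 2
    return cl / clock_mhz * 1000

# (channels, bits per channel, MT/s, CAS latency) as listed in the table above.
configs = {
    "i9-9900K": (2, 64, 3200, 16),
    "i7-8700K": (2, 64, 3200, 16),
    "Ryzen2 2700X": (2, 64, 2667, 16),
    "i9-7900X": (4, 64, 3200, None),  # no timings listed for SKL-X
}

for name, (channels, bits, mts, cl) in configs.items():
    line = f"{name}: {peak_bandwidth_gbs(channels, bits, mts):.1f} GB/s peak"
    if cl is not None:
        line += f", CL{cl} = {cas_latency_ns(cl, mts):.1f} ns"
    print(line)
```

This gives about 51.2, 42.7 and 102.4 GB/s peak, in line with the ~50/42/100 GB/s in the table, and shows that CL16 at 3200 MT/s is about 10 ns vs. about 12 ns at 2667 MT/s.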
We are testing native arithmetic, SIMD and cryptography performance using the highest-performing instruction sets (AVX2, AVX, etc.). CFL-R supports most modern instruction sets (AVX2, FMA3) but not the AVX512 of SKL-X, nor a few others such as SHA hardware acceleration (found on Atom and Ryzen).
Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance; for latencies, lower values (ns, clocks) mean better performance.
Environment: Windows 10 x64 (1807), latest drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations.
Spectre / Meltdown Windows Mitigations: all were enabled as per default (BTI enabled, RDCL/KVA enabled, PCID enabled).
CFL-R does not really perform any differently cache/memory-wise vs. the old CFL, as the caches and memory controller are unchanged.
Final Thoughts / Conclusions
CFL-R just adds more cores and thus enjoys higher aggregate L1D/L2 bandwidths vs. CFL, but the L3 is still disappointing, especially as it now has to feed 33% more cores/threads (8/16 vs. 6/12). Latencies (in clocks) do not change either, but as CFL-R can clock higher they do decrease in real terms (ns).
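To illustrate the point about latencies: a latency that stays the same in clocks takes less time at a higher clock, since one cycle lasts 1/f ns at f GHz. The Python sketch below assumes a hypothetical 42-cycle L3 latency (not a measured figure) and uses the rated maximum single-core Turbo clocks of the 8700K (4.7 GHz) and 9900K (5.0 GHz); it also shows that L3 capacity per core is unchanged at 2MB.

```python
# Convert a latency measured in clock cycles into nanoseconds (illustrative sketch).
# The 42-cycle L3 latency below is a HYPOTHETICAL value, not a measured result.

def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """At f GHz one cycle lasts 1/f ns, so latency_ns = cycles / f."""
    return cycles / clock_ghz

L3_CYCLES = 42  # hypothetical; assumed identical in clocks on CFL and CFL-R
for name, turbo_ghz in [("i7-8700K", 4.7), ("i9-9900K", 5.0)]:
    print(f"{name} @ {turbo_ghz} GHz: {cycles_to_ns(L3_CYCLES, turbo_ghz):.2f} ns")

# L3 capacity per core stays the same: 12MB / 6 = 16MB / 8 = 2MB.
for name, l3_mb, cores in [("i7-8700K", 12, 6), ("i9-9900K", 16, 8)]:
    print(f"{name}: {l3_mb / cores:.1f} MB L3 per core")
```

Under these assumptions the same 42 cycles drop from about 8.94 ns to 8.40 ns.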
The memory controller is the very same (even running the same firmware) and thus performs the same, though it now has to feed 33% more cores/threads (8/16 vs. 6/12); when all cores/threads are used, the aggregate bandwidth falls due to the extra contention. In fairness, Ryzen2 has the same issue (too many cores/threads for too little bandwidth), so SKL-X, with its 4 memory channels, is where you should be looking for more bandwidth.
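A naive even split of the Memory Bandwidth row across cores shows why more cores sharing the same dual-channel controller hurts. This Python sketch assumes bandwidth divides evenly per core, which is optimistic, since contention makes the real all-thread figure fall further; core counts are taken from the cache rows of the table.

```python
# Naive per-core share of the "Memory Bandwidth" row (illustrative sketch only).
# Assumption: bandwidth splits evenly across cores; real contention is worse.

def per_core_gbs(bandwidth_gbs: float, cores: int) -> float:
    """Even share of the total bandwidth available to each core."""
    return bandwidth_gbs / cores

# (bandwidth in GB/s, core count) taken from the table above.
table = {
    "i9-9900K": (50, 8),
    "i7-8700K": (50, 6),
    "Ryzen2 2700X": (42, 8),
    "i9-7900X": (100, 10),
}

for name, (bandwidth, cores) in table.items():
    print(f"{name}: {per_core_gbs(bandwidth, cores):.2f} GB/s per core")

extra_cores = (8 - 6) / 6  # CFL-R vs. CFL
print(f"CFL-R has {extra_cores:.0%} more cores to feed from the same controller")
```

This gives 6.25 GB/s per core for the 9900K vs. 8.33 GB/s for the 8700K, while the 4-channel 7900X keeps 10 GB/s per core.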