What is “CoffeeLake” (CFL)?
The 8th generation Intel Core architecture is code-named “CoffeeLake” (CFL): unlike previous architectures, it is a minor stepping of the previous 7th generation “KabyLake” (KBL), itself a minor update of the 6th generation “SkyLake” (SKL). The server/workstation (SKL-X/KBL-X) CPU core gained new instruction set support (AVX512) as well as other improvements; these have not made the transition to the consumer parts yet.
Possibly due to limited competition (before the AMD Ryzen launch), process issues (still at 14nm) and the disclosure of a whole host of hardware vulnerabilities (Spectre, Meltdown, etc.) that required microcode (firmware) updates, performance improvements have not been forthcoming. This is pretty much unprecedented: while some Core updates were only evolutionary, we have not had complete stagnation before. In addition, the built-in GPU core has also remained pretty much stagnant; we will investigate this in a subsequent article.
However, CFL does bring one major change: increased core counts on both desktop and mobile. On desktop we go from 4 to 6 cores (+50%), while on mobile (ULV) we go from 2 to 4 (+100%) within the same TDP envelope!
In this article we test CPU Cache and Memory performance; please see our other articles on:
- Intel Core i7 8700K CoffeeLake Review & Benchmarks – CPU 6-core/12-thread Performance
- Intel Core i7 8700K CoffeeLake Review & Benchmarks – GPGPU (UHD 630) Performance
We are comparing the top-of-the-range Gen 8 Core i7 (8700K) with the previous generation (6700K) and competing architectures, with a view to upgrading to a mid-range, high-performance design.
|CPU Specifications||Intel i7-8700K CoffeeLake||AMD Ryzen2 2700X Pinnacle Ridge||Intel i9-7900X SkyLake-X||Intel i7-6700K SkyLake||Comments|
|L1D / L1I Caches||6x 32kB 8-way / 6x 32kB 8-way||8x 32kB 8-way / 8x 64kB 8-way||10x 32kB 8-way / 10x 32kB 8-way||4x 32kB 8-way / 4x 32kB 8-way||No L1D/I changes, Ryzen’s L1I is twice as big.|
|L2 Caches||6x 256kB 4-way||8x 512kB 8-way||10x 1MB 16-way||4x 256kB 4-way||No L2 changes, Ryzen’s L2 is twice as big again.|
|L3 Caches||12MB 16-way||2x 8MB 16-way||2x 8MB 16-way||8MB 16-way||L3 has also increased with the number of cores, but is still behind Ryzen’s dual 8MB L3 caches.|
|TLB 4kB pages||64 4-way / 64 8-way / 1536 6-way||64 full-way / 1536 8-way||64 4-way / 64 8-way / 1536 6-way||64 4-way / 64 8-way / 1536 6-way||No TLB changes.|
|TLB 2MB pages||8 full-way / 1536 6-way||64 full-way / 1536 2-way||8 full-way / 1536 6-way||8 full-way / 1536 6-way||No TLB changes.|
|Memory Controller Speed (MHz)||1200-4400||1333-2667||1200-2700||1200-4000||The uncore (memory controller) runs at a faster clock due to its higher rated clock, but there is not a lot in it.|
|Memory Data Speed (MT/s)||3200||2667||3200||2533||CFL can easily run at 3200MT/s while KBL/SKL were not as reliable. We could not get Ryzen past 2667, although it does support 2933.|
|Memory Channels / Width||2 / 128-bit||2 / 128-bit||2 / 128-bit||2 / 128-bit||All have 128-bit total channel width.|
|Memory Bandwidth (GB/s)||50||42||100||40||Bandwidth has naturally increased with memory clock speed but latencies are higher.|
|Uncore / Memory Controller Firmware||2.6.2||126.96.36.199||We’re on firmware 2.6.x vs. 2.0.x on old SKL/KBL.|
|Memory Timing (clocks)||16-16-16-36 6-52-25-12 2T||16-17-17-35 7-60-20-10 2T||16-18-18-36 5-54-21-10 2T||Timings are very much BIOS dependent and vary a lot.|
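The bandwidth figures in the table can be sanity-checked against the listed data speed and bus width. The sketch below assumes the usual theoretical-peak formula (transfers per second times bytes per transfer) and decimal gigabytes; the function name and the `configs` dictionary are illustrative, not part of any benchmark tool.

```python
def peak_bandwidth_gbs(data_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    data_rate_mts: memory data speed in mega-transfers per second.
    bus_width_bits: total width of all memory channels in bits.
    """
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mts * 1e6 * bytes_per_transfer / 1e9


# Data speed and total width taken from the specification table.
configs = {
    "i7-8700K": (3200, 128),
    "Ryzen 2700X": (2667, 128),
    "i7-6700K": (2533, 128),
}

for name, (rate, width) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
```

Under these assumptions the three dual-channel desktop parts come out close to the rounded 50, 42 and 40 GB/s in the table. The 100 GB/s listed for the i9-7900X does not follow from 3200MT/s at 128 bits with this formula; it would correspond to a 256-bit total width.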
We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets (AVX2, AVX, etc.). CFL supports most modern instruction sets (AVX2, FMA3) but neither the AVX512 of SKL-X/KBL-X nor a few others such as SHA hardware acceleration (found on Atom and Ryzen).
Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.
Environment: Windows 10 x64 (1807), latest drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations.
Spectre / Meltdown Windows Mitigations: all were enabled as per default (BTI enabled, RDCL/KVA enabled, PCID enabled).
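To illustrate why enabling large pages matters for memory tests, the sketch below computes the "TLB reach" (the amount of memory that can be mapped without a TLB miss) as entries times page size. It uses the entry counts from the specification table for the Intel parts; the helper name is hypothetical and the calculation ignores any sharing of entries between page sizes.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Memory covered by a TLB when every entry maps a distinct page."""
    return entries * page_size_bytes


# Entry counts from the table (Intel SKL/KBL/CFL columns).
l1_dtlb_4k = tlb_reach_bytes(64, 4 * KB)      # first-level data TLB, 4kB pages
l1_dtlb_2m = tlb_reach_bytes(8, 2 * MB)       # first-level TLB, 2MB pages
l2_tlb_4k = tlb_reach_bytes(1536, 4 * KB)     # second-level TLB, 4kB pages
l2_tlb_2m = tlb_reach_bytes(1536, 2 * MB)     # second-level TLB, 2MB pages

print(f"L1 TLB reach, 4kB pages: {l1_dtlb_4k // KB} kB")
print(f"L1 TLB reach, 2MB pages: {l1_dtlb_2m // MB} MB")
print(f"L2 TLB reach, 4kB pages: {l2_tlb_4k // MB} MB")
print(f"L2 TLB reach, 2MB pages: {l2_tlb_2m // GB} GB")
```

With 4kB pages the second-level TLB covers only about half of CFL's 12MB L3 cache, so a latency test over a larger block would also be measuring page-walk overhead; with 2MB pages the reach is far beyond any cache size tested.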
Ryzen2 brings nice updates: good bandwidth increases to all caches (L1D/L2/L3) and a much-needed latency reduction for data (and code) accesses. Yes, there is still work to be done to bring the latencies down further, but it may be just enough to push Intel into 2nd place for a good while.
At the high end, ThreadRipper2 will likely benefit most, as it is going up against the many-core, AVX512-enabled SKL-X competitor, which is a lot “tougher” than the normal SKL/KBL/CFL consumer versions.
Final Thoughts / Conclusions
CFL’s cache and memory (uncore) sub-systems are unchanged from SKL/KBL and thus provide no surprises: performance is rock-solid at 3200MT/s, with huge bandwidth (needed, after all, to feed 12 threads). Ryzen2, however, has improved a lot over older AMD CPU designs.
With the continuous increase in cores/threads (8/16 in CFL-R, as with Ryzen1/2) but only modest DDR4 speed increases (not to mention very high cost), desktop platforms are likely to see diminishing returns from core/thread data starvation, as the extra cores simply cannot be fed by the memory sub-system. The L2 and L3 caches will need to be improved (wider and larger, as with SKL-X), and the now-defunct L4/eDRAM cache should re-emerge to mitigate these issues.
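The starvation argument can be made concrete by dividing the memory bandwidth from the specification table by the number of hardware threads. This is only a rough sketch: the thread counts for the 6700K (8) and 2700X (16) are assumed from their core counts with SMT enabled, and an even split of bandwidth between threads is a simplification.

```python
def bandwidth_per_thread(bandwidth_gbs: float, threads: int) -> float:
    """Memory bandwidth available to each thread if shared evenly (GB/s)."""
    return bandwidth_gbs / threads


# (bandwidth in GB/s from the table, hardware threads)
platforms = {
    "i7-6700K (4C/8T)": (40, 8),
    "i7-8700K (6C/12T)": (50, 12),
    "Ryzen 2700X (8C/16T)": (42, 16),
}

for name, (bandwidth, threads) in platforms.items():
    share = bandwidth_per_thread(bandwidth, threads)
    print(f"{name}: {share:.2f} GB/s per thread")
```

Even though the 8700K has more total bandwidth than the 6700K, each thread's share is lower, and it drops further as the thread count grows on the same dual-channel DDR4 interface.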