What is “CoffeeLake” (CFL)?
The 8th generation Intel Core architecture is code-named “CoffeeLake” (CFL): unlike previous architectures, it is a minor stepping of the previous 7th generation “KabyLake” (KBL), itself a minor update of the 6th generation “SkyLake” (SKL). The server/workstation (SKL-X/KBL-X) CPU core saw new instruction set support (AVX512) as well as other improvements; these have not made the transition to CFL yet.
Possibly due to limited competition (before the AMD Ryzen launch), process issues (still at 14nm) and the disclosure of a whole host of hardware vulnerabilities (Spectre, Meltdown, etc.) which required microcode (firmware) updates, performance improvements have not been forthcoming. This is pretty much unprecedented: while some Core updates were only evolutionary, we have not had complete stagnation before. In addition, the built-in GPU core has also remained pretty much stagnant; we will investigate this in a subsequent article.
However, CFL does bring one major change: increased core counts on both desktop and mobile. On desktop we go from 4 to 6 cores (+50%), while on mobile (ULV) we go from 2 to 4 (+100%) within the same TDP envelope!
In this article we test CPU Cache and Memory performance; please see our other articles on:
- Intel Core i7 8700K CofeeLake Review & Benchmarks – CPU 6-core/12-thread Performance
- Intel UHD 630 (Core i7 8700K, i9 9900K) – GPGPU Performance
- Intel Core i7 9900K CofeeLake-R Review & Benchmarks – 8-core/16-thread CPU Performance
Hardware Specifications
We are comparing the top-of-the-range Gen 8 Core i7 (8700K) with the previous generation (6700K) and with competing architectures, with a view to upgrading to a mid-range, high-performance design.
CPU Specifications | Intel i7-8700K CoffeeLake | AMD Ryzen2 2700X Pinnacle Ridge | Intel i9-7900X SkyLake-X | Intel i7-6700K SkyLake | Comments
L1D / L1I Caches | 6x 32kB 8-way / 6x 32kB 8-way | 8x 32kB 8-way / 8x 64kB 8-way | 10x 32kB 8-way / 10x 32kB 8-way | 4x 32kB 8-way / 4x 32kB 8-way | No L1D/I changes, Ryzen’s L1I is twice as big.
L2 Caches | 6x 256kB 4-way | 8x 512kB 8-way | 10x 1MB 16-way | 4x 256kB 4-way | No L2 changes, Ryzen’s L2 is twice as big again.
L3 Caches | 12MB 16-way | 2x 8MB 16-way | 2x 8MB 16-way | 8MB 16-way | L3 has also increased with the number of cores, still behind Ryzen’s dual 8MB L3 caches.
TLB 4kB pages | 64 4-way / 64 8-way / 1536 6-way | 64 full-way / 1536 8-way | 64 4-way / 64 8-way / 1536 6-way | 64 4-way / 64 8-way / 1536 6-way | No TLB changes.
TLB 2MB pages | 8 full-way / 1536 6-way | 64 full-way / 1536 2-way | 8 full-way / 1536 6-way | 8 full-way / 1536 6-way | No TLB changes.
Memory Controller Speed (MHz) | 1200-4400 | 1333-2667 | 1200-2700 | 1200-4000 | The uncore (memory controller) runs at a faster clock due to the higher rated clock, but there is not a lot in it.
Memory Data Speed (MT/s) | 3200 | 2667 | 3200 | 2533 | CFL can easily run at 3200MT/s while KBL/SKL were not as reliable. We could not get Ryzen past 2667 although it does support 2933.
Memory Channels / Width | 2 / 128-bit | 2 / 128-bit | 2 / 128-bit | 2 / 128-bit | All have 128-bit total channel width.
Memory Bandwidth (GB/s) | 50 | 42 | 100 | 40 | Bandwidth has naturally increased with memory clock speed but latencies are higher.
Uncore / Memory Controller Firmware | 2.6.2 | 2.0.0.6 | We’re on firmware 2.6.x vs. 2.0.x on old SKL/KBL.
Memory Timing (clocks) | 16-16-16-36 6-52-25-12 2T | 16-17-17-35 7-60-20-10 2T | 16-18-18-36 5-54-21-10 2T | Timings are very much BIOS dependent and vary a lot.
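As a rough cross-check of the bandwidth row in the table above, theoretical peak DRAM bandwidth can be estimated as the data rate multiplied by the bus width in bytes. The sketch below is only illustrative: the function name and dictionary layout are our own assumptions, and the inputs are the data rates and 128-bit widths listed in the table.

```python
# Theoretical peak DRAM bandwidth = transfers per second x bytes per transfer.
# Data rates and bus widths come from the specification table; the function
# name and the dictionary layout are illustrative assumptions.

def peak_bandwidth_gbs(data_rate_mts: float, bus_width_bits: int) -> float:
    """Return the theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mts * 1e6 * bytes_per_transfer / 1e9

# (data rate in MT/s, total bus width in bits) as listed in the table
platforms = {
    "i7-8700K": (3200, 128),
    "Ryzen 2700X": (2667, 128),
    "i7-6700K": (2533, 128),
}

for name, (rate, width) in platforms.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
```

These estimates land close to the rounded 50, 42 and 40 GB/s entries. By the same arithmetic, the 100 GB/s listed for the i9-7900X would correspond to a 256-bit total width at 3200MT/s rather than 128-bit, so that column should be read with some care.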
Native Performance
We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets (AVX2, AVX, etc.). CFL supports most modern instruction sets (AVX2, FMA3) but not the latest AVX512 of SKL-X/KBL-X, nor a few others such as SHA hardware acceleration (HWA), which is available on Atom and Ryzen.
Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.
Environment: Windows 10 x64 (1807), latest drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations.
Spectre / Meltdown Windows Mitigations: all were enabled as per default (BTI enabled, RDCL/KVA enabled, PCID enabled).
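To illustrate why the 2MB large pages noted above matter for cache and memory tests, the sketch below estimates "TLB reach", i.e. how much memory the second-level TLB can map without a miss. The 1536-entry figure is taken from the specification table; the helper name is an assumption made for this example, and real reach also depends on how the TLB is shared between page sizes.

```python
# TLB reach = number of TLB entries x page size.
# The 1536-entry second-level TLB size comes from the specification table;
# the helper name is a hypothetical one chosen for this sketch.

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Return the amount of memory covered by a fully populated TLB."""
    return entries * page_size_bytes

L2_TLB_ENTRIES = 1536

reach_4k = tlb_reach_bytes(L2_TLB_ENTRIES, 4 * KB)
reach_2m = tlb_reach_bytes(L2_TLB_ENTRIES, 2 * MB)

print(f"4kB pages: {reach_4k / MB:.0f} MB")
print(f"2MB pages: {reach_2m / GB:.0f} GB")
```

With 4kB pages the reach (6MB) is smaller than CFL's 12MB L3, so a test sweeping an L3-sized block would also incur TLB misses; with 2MB pages the reach (3GB) comfortably covers typical test block sizes.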
CFL does not bring anything new vs. old KBL/SKL: both the caches and the memory controller are unchanged. The latter can now (officially) use higher clocked memory, so bandwidth and latencies do improve, and the uncore can also clock a bit higher, but that is it.
Final Thoughts / Conclusions
CFL’s cache and memory (uncore) sub-systems are unchanged from SKL/KBL and thus provide no surprises: performance is rock-solid at 3200MT/s, with huge bandwidth (needed, after all, to feed 12 threads). Ryzen2, however, has improved a lot over old AMD CPU designs.
With the continuous increase in cores/threads (8/16 in CFL-R, as with Ryzen1/2) but only modest DDR4 speed increases (not to mention very high cost), desktop platforms are likely to see diminishing returns from core/thread data starvation, as the extra cores simply cannot be fed by the memory sub-system. The L2 and L3 caches will need to be improved (wider and larger, as with SKL-X), and the now defunct L4/eDRAM cache should re-emerge to mitigate these issues…
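The data-starvation argument can be made concrete by dividing the memory bandwidth from the specification table by the core count. This is only a back-of-the-envelope sketch: the core counts are those implied by the per-core cache entries in the table, and the 8-core entry is a hypothetical that assumes CFL-R keeps the 8700K's 50 GB/s, which is our assumption rather than a measured figure.

```python
# Memory bandwidth available per core, using the bandwidth (GB/s) and core
# counts from the specification table. The "8-core (assumed)" entry is a
# hypothetical: it assumes the same 50 GB/s as the i7-8700K.

platforms = {
    "i7-6700K": (40, 4),
    "i7-8700K": (50, 6),
    "Ryzen 2700X": (42, 8),
    "8-core (assumed)": (50, 8),
}

per_core = {
    name: bandwidth / cores for name, (bandwidth, cores) in platforms.items()
}

for name, value in per_core.items():
    print(f"{name}: {value:.2f} GB/s per core")
```

Under these assumptions the per-core share drops from 10 GB/s on the 4-core 6700K to about 8.3 GB/s on the 8700K and about 6.3 GB/s on an 8-core part at the same memory speed, which is the trend the paragraph above describes.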