What is “XE” / “TigerLake”?
It is the 3rd update of the “next generation” Core (gen 11) architecture (TGL/TigerLake) from Intel, the one that replaced the ageing “Skylake (SKL)” arch and its many derivatives that are still with us (“CometLake (CML)”, “RocketLake (RKL)”, etc.). It is an optimisation of the “IceLake (ICL)” arch, built on the updated 10nm++ process, and is again launched for mobile ULV (U/Y) devices and perhaps for other platforms too.
While not a “revolution” like ICL was, it still contains big changes across the SoC (CPU, GPU, memory controller):
- 10nm++ process (lower voltage, higher performance benefits)
- Gen12 (XE-LP) graphics (up to 96 EU, similar to discrete DG1 graphics)
- DDR5 / LPDDR5 memory controller support (2 controllers, 2 channels each, 5400Mt/s)
- No eDRAM cache unfortunately (as found in CrystalWell and co)
- New Image Processing Unit (IPU6) up to 4K90 resolution
- New 2x Media Encoders HEVC 4K60-10b 4:4:4 & 8K30-10b 4:2:0
- PCIe 4.0 (up to 32GB/s with x16 lanes)
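The "up to 32GB/s" figure in the list above can be checked with a little arithmetic. The sketch below assumes the standard PCIe 4.0 signalling rate of 16 GT/s per lane with 128b/130b encoding, and reports one-direction bandwidth only. The function name `pcie_bandwidth_gbs` is illustrative, not part of any library.

```python
def pcie_bandwidth_gbs(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Peak one-direction bandwidth in GB/s (10^9 bytes per second)."""
    # Each transfer carries one bit per lane; the encoding factor removes line-code overhead.
    bits_per_s = gt_per_s * 1e9 * encoding * lanes
    return bits_per_s / 8 / 1e9


pcie4_x16 = pcie_bandwidth_gbs(16.0, 16)  # PCIe 4.0: 16 GT/s per lane
pcie3_x16 = pcie_bandwidth_gbs(8.0, 16)   # PCIe 3.0: 8 GT/s per lane

print(f"PCIe 4.0 x16: {pcie4_x16:.1f} GB/s")
print(f"PCIe 3.0 x16: {pcie3_x16:.1f} GB/s")
```

Under these assumptions the x16 result comes out just under 32GB/s, so the quoted figure is the usual rounded value.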
While ICL had already greatly upgraded the GP-GPU to gen 11 cores (and more than doubled the count to 64EU for G7), TGL upgrades them yet again to “XE”-LP gen 12 cores, now all the way up to 96EUs. Again, most features seem to be geared towards gaming and media (with new image processing and media encoders), but there should be a few new instructions for AI, hopefully exposed through an OpenCL extension.
Again there is no FP64 support (!) while FP16 is naturally supported at 2x rate as before. BF16 should also be supported by a future driver. Int32, Int16 performance has reportedly doubled with Int8 now supported and DP4A accelerated.
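For readers unfamiliar with DP4A, the operation is a 4-way dot product of packed 8-bit integers accumulated into a 32-bit integer. The sketch below emulates the signed variant in plain Python to show the semantics only. The helper names `pack_int8x4` and `dp4a` are hypothetical, and the emulation ignores 32-bit overflow wrap-around.

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]


def dp4a(a_word, b_word, acc):
    """Emulate signed DP4A: acc + sum(a[i] * b[i]) over four int8 lanes."""
    a = struct.unpack("<4b", struct.pack("<I", a_word))
    b = struct.unpack("<4b", struct.pack("<I", b_word))
    return acc + sum(x * y for x, y in zip(a, b))


a = pack_int8x4([1, -2, 3, -4])
b = pack_int8x4([5, 6, -7, 8])
result = dp4a(a, b, 10)  # (5 - 12 - 21 - 32) + 10
print(result)
```

Hardware that accelerates this does the four multiplies and the accumulation as one instruction, which is why it matters for Int8 inference workloads.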
The new memory controller supports DDR5 / LPDDR5 (5400Mt/s) which should, once such memory becomes readily available, provide more bandwidth for the EU cores; until then, LPDDR4X can be clocked faster than on ICL (4267Mt/s vs. 3733Mt/s). There is no mention of eDRAM (L4) cache at all.
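To put the transfer rates above in perspective, theoretical peak bandwidth is the transfer rate multiplied by the bus width. The sketch below assumes a 128-bit total memory bus, which is an assumption made for illustration and not a figure from this article. Real-world and quoted figures may be derived differently.

```python
def peak_bandwidth_gbs(mega_transfers: float, bus_width_bits: int = 128) -> float:
    """Theoretical peak bandwidth in GB/s for a given transfer rate (MT/s)."""
    # bus_width_bits=128 is an assumed total bus width, not a figure from the article.
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers * 1e6 * bytes_per_transfer / 1e9


rates = {
    "LPDDR4X-3733": 3733,
    "LPDDR4X-4267": 4267,
    "LPDDR5-5400": 5400,
}
for name, rate in rates.items():
    print(f"{name}: {peak_bandwidth_gbs(rate):.1f} GB/s")
```

Under that assumption, moving from LPDDR4X at 4267Mt/s to LPDDR5 at 5400Mt/s raises the theoretical peak by roughly a quarter.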
We do hope to see more GPGPU-friendly features in upcoming versions now that Intel is taking graphics seriously, perhaps starting with the forthcoming DG1 discrete graphics.
GPGPU (Xe-LP G7) Performance Benchmarking
In this article we test GPGPU core performance; please see our other articles on:
- CPU
- GPGPU
Hardware Specifications
We are comparing the mid-range Intel integrated GP-GPUs with the previous generation, as well as with competing architectures, with a view to upgrading to a brand-new, high-performance design.
| GPGPU Specifications | Intel Iris XE-LP G7 | Intel XE-LP G1 | Intel Iris Plus (IceLake) G7 | AMD Vega 8 (Ryzen5) | Comments |
| --- | --- | --- | --- | --- | --- |
| Arch Chipset | EV12 / G7 | EV12 / G1 | EV11 / G7 | GCN1.5 | The first G12 from Intel. |
| Cores (CU) / Threads (SP) | 96 / 768 | 32 / 256 | 64 / 512 | 8 / 512 | 50% more cores vs. G11 |
| SIMD per CU / Width | 8 | 8 | 8 | 64 | Same SIMD width |
| Wave/Warp Size | 32 | 32 | 16/32 | 64 | Wave size matches nVidia |
| Speed (Min-Turbo) | 1.2GHz | 1.15GHz | 1.1GHz | 1.1GHz | Turbo speed has slightly increased. |
| Power (TDP) | 15-35W | 15-35W | 15-35W | 15-35W | Similar power envelope. |
| ROP / TMU | 24 / 48 | 8 / 16 | 16 / 32 | 8 / 32 | ROPs and TMUs have also increased 50%. |
| Shared Memory | 64kB | 64kB | 64kB | 32kB | Same shared memory but 2x Vega. |
| Constant Memory | 3.2GB | 3.2GB | 2.7GB | 3.2GB | No dedicated constant memory but large. |
| Global Memory | 2x LP-DDR4X 4267Mt/s (LPDDR5 5400Mt/s) | 2x LP-DDR4X 4267Mt/s | 2x LP-DDR4X 3733Mt/s | 2x DDR4-2400 | Can support faster (LP)DDR5 in the future. |
| Memory Bandwidth | 42GB/s | 42GB/s | 58GB/s | 42GB/s | Highest (possible) bandwidth ever |
| L1 Caches | 64kB x 6 | 64kB x 2 | 16kB x 8 | 8x 16kB | L1 is much larger. |
| L3 Cache | 3.8MB | ? | 3MB | ? | L3 has modestly increased. |
| Maximum Work-group Size | 256×256 | 256×256 | 256×256 | 1024×1024 | Vega supports 4x bigger workgroups. |
| FP64/double ratio | No! | No! | No! | Yes, 1/16x | No FP64 support in current drivers! |
| FP16/half ratio | 2x | 2x | 2x | 2x | Same 2x ratio |
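As a rough guide to what the specifications imply, theoretical peak throughput can be estimated from the thread (SP) counts and clock speeds listed in the table. The sketch below assumes each lane retires one fused multiply-add (2 operations) per clock, which is a common convention and not something stated in this article. It applies the 2x FP16 ratio from the table. These are paper figures, not benchmark results.

```python
def peak_gflops(lanes: int, clock_ghz: float, ops_per_lane_per_clock: int = 2) -> float:
    """Theoretical peak GFLOPS; ops_per_lane_per_clock=2 assumes one FMA per clock."""
    return lanes * ops_per_lane_per_clock * clock_ghz


# (threads/SP, clock in GHz) as listed in the specification table
devices = {
    "Xe-LP G7": (768, 1.2),
    "Xe-LP G1": (256, 1.15),
    "Iris Plus G7 (ICL)": (512, 1.1),
    "Vega 8": (512, 1.1),
}

for name, (lanes, clock) in devices.items():
    fp32 = peak_gflops(lanes, clock)
    fp16 = 2 * fp32  # FP16 runs at 2x rate on all four devices per the table
    print(f"{name}: FP32 ~{fp32:.0f} GFLOPS, FP16 ~{fp16:.0f} GFLOPS")
```

On these assumptions the G7 part's paper advantage over the ICL G7 comes from both the 50% higher lane count and the slightly higher clock.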
Disclaimer
This is an independent article that has not been endorsed or sponsored by any entity (e.g. Intel). All trademarks acknowledged and used for identification only under fair use. Errors and omissions excepted (E&OE).
The article contains only public information available elsewhere on the Internet and not provided under NDA or embargoed. At publication time, the products have not been directly tested by SiSoftware and thus the accuracy of the benchmark scores cannot be verified; however, they appear consistent and do not appear to be false/fake.
Processing Performance
We are testing OpenCL performance using the latest SDK / libraries / drivers from both Intel and the competition.
Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.
Environment: Windows 10 x64, latest Intel and AMD drivers. Turbo / Boost was enabled on all configurations.
Memory Performance
We are testing OpenCL memory performance using the latest SDK / libraries / drivers from both Intel and the competition.
Results Interpretation: For bandwidth tests (MB/s, etc.) high values mean better performance, for latency tests (ns, etc.) low values mean better performance.
Environment: Windows 10 x64, latest Intel and AMD drivers. Turbo / Boost was enabled on all configurations.
SiSoftware Official Ranker Scores
Final Thoughts / Conclusions
Once again Intel seems to be taking graphics seriously: for the 2nd time in a row we have a major graphics upgrade, with Xe bringing big upgrades in EV core count, performance and bandwidth. Overall it seems to be 50% faster than EV11, with lower-end devices benefiting most from the upgrade. While the competition previously looked unassailable, Intel has managed to close the gap and overtake.
However, this is still a core aimed at gamers and it does not provide much for GP-GPU: the improved integer performance (3 times better!) is very much welcome, but there are only a few, specific instructions for AI. Lack of FP64 makes it unsuitable for high-precision financial and scientific workloads, something that the old EV7-9 cores could do reasonably well (all things considered).
For integrated graphics, this is not a problem, as not many people would expect a ULV GPU core to run compute-heavy workloads; the dedicated DG1 card, however, would really be out-spec’d by the competition, with even old, low-end devices providing more features. That said, dedicated DG1 is likely to include (some) FP64 units and/or additional units, unlike the low-power (LP ULV) integrated versions.
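Since FP64 availability differs between devices and drivers, compute code usually checks for it before choosing a kernel. In OpenCL, one way is to look for the `cl_khr_fp64` (and `cl_khr_fp16`) extension names in the device's space-separated extension string. The sketch below only parses such a string. The sample strings are hypothetical and were not captured from real drivers.

```python
def precision_support(extensions: str) -> dict:
    """Report FP16/FP64 support from an OpenCL-style, space-separated extension string."""
    names = set(extensions.split())
    return {
        "fp16": "cl_khr_fp16" in names,
        "fp64": "cl_khr_fp64" in names,
    }


# Hypothetical extension strings, for illustration only.
no_fp64_device = "cl_khr_fp16 cl_khr_byte_addressable_store"
fp64_device = "cl_khr_fp64 cl_khr_fp16 cl_khr_byte_addressable_store"

print(precision_support(no_fp64_device))
print(precision_support(fp64_device))
```

An application would typically fall back to FP32 (or emulated double precision) when the `fp64` flag comes back false.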
Getting back to ULV, Xe-LP’s performance completely obsoletes devices (e.g. SKL/KBL/WHL/CML-ULV) using the older EV9x cores, unless you only plan on using them for “business 2D graphics” or displaying the desktop.
If you have not upgraded to ICL yet, TGL is a far better, more compelling proposition that should be your (current) top choice for long-term use. For ICL owners, there is still a lot to gain from upgrading, though the jump is not as massive as previous generational upgrades.
In a word: Highly Recommended – 8/10!
Please see our other articles on:
- Intel Iris Plus G7 Gen11 IceLake ULV (i7-1065G7) Review & Benchmarks – GPGPU Performance
- Intel Core Gen10 IceLake ULV (i7-1065G7) Review & Benchmarks – CPU AVX512 Performance