What is “DG1” / “Iris Xe”?
It is the gen(eration) 12 graphics introduced with the “TigerLake” (TGL) mobile APUs, which will also feature on “RocketLake” (RKL) desktop APUs and will additionally be launched as a discrete GPU for laptops and even for desktops as an add-in card. In effect, Intel is re-entering the discrete graphics market: if we discount the “Phi” / “Larrabee” GP-GPUs, its last discrete part would be the ancient 740 graphics accelerator of 1998!
While integrated Intel graphics had stagnated for a long time (e.g. EV9 / “Skylake”), since “IceLake” Intel has made great strides, first with Gen11 and then with Gen12 / Xe, which have (finally) brought big changes:
- 10nm++ process (lower voltage, higher performance benefits)
- Gen12 (Xe-LP) graphics with 80 EU (96 EU on the Xe Max part tested here)
- LP-DDR4X now generally used, at up to 4266MT/s, 128-bit (~66GB/s bandwidth)
- New Image Processing Unit (IPU6) up to 4K90 resolution
- New 2x Media Encoders HEVC 4K60-10b 4:4:4 & 8K30-10b 4:2:0
- PCIe 4.0 (up to 8GB/s with x4 lanes)
While the GPU in the discrete desktop DG1 (and the laptop Iris Xe Max Gen12) is the same as the integrated part, being discrete it does behave a bit differently:
- It can be clocked higher (1.65GHz) as it has its own 15W (laptop)/25W (desktop) power budget
- Dedicated LP-DDR4X memory, and thus dedicated bandwidth, but no zero-copy data transfers
- Data upload/download through PCIe4 x4 (PCIe3 with current systems)
- Possibly fanless due to low power (25-30W TDP), single-slot
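As a rough check on the PCIe figures quoted above, the theoretical one-way link bandwidth can be worked out from the per-lane signalling rate. The sketch below assumes the standard 8GT/s (PCIe3) and 16GT/s (PCIe4) rates with 128b/130b encoding; the function and constant names are illustrative only, and real transfers will be slower because of protocol overhead.

```python
# Theoretical PCIe link bandwidth per direction, in GB/s (1 GB = 1e9 bytes).
# Assumption: standard per-lane rates and 128b/130b encoding for PCIe 3.0/4.0.
GIGATRANSFERS_PER_LANE = {3: 8.0, 4: 16.0}  # GT/s per lane
ENCODING_EFFICIENCY = 128 / 130  # payload bits per transmitted bit


def pcie_bandwidth_gbs(generation: int, lanes: int) -> float:
    """Return the peak one-way bandwidth of a PCIe link in GB/s."""
    gigabits = GIGATRANSFERS_PER_LANE[generation] * lanes * ENCODING_EFFICIENCY
    return gigabits / 8  # bits -> bytes


if __name__ == "__main__":
    for gen in (3, 4):
        print(f"PCIe{gen} x4: {pcie_bandwidth_gbs(gen, 4):.2f} GB/s")
```

Under these assumptions a PCIe4 x4 link comes out just under 8GB/s, and a PCIe3 x4 link (what current systems provide) at half of that.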
In terms of feature support, everything is the same as on the integrated part; sadly there is no FP64, which competing low-end discrete graphics do support. The FP16 rate is 2x FP32, though, and FP16 is more likely to be used. Int32 and Int16 performance has reportedly doubled, with Int8 now supported and DP4A accelerated.
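For readers unfamiliar with DP4A, it is a dot product of four packed 8-bit integers accumulated into a 32-bit integer. The following is a minimal software emulation of the signed variant, not Intel's implementation: the helper names are hypothetical, and the 32-bit wrap-around on overflow is an assumption (hardware variants may saturate instead).

```python
def pack_int8(values):
    """Pack four signed 8-bit integers into one 32-bit word (lane 0 lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_int8(word):
    """Split a 32-bit word into four signed 8-bit lanes."""
    lanes = []
    for shift in (0, 8, 16, 24):
        byte = (word >> shift) & 0xFF
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def dp4a(a, b, acc):
    """Emulate signed DP4A: acc + sum of the four lane-wise int8 products.

    Assumption: the result wraps to a signed 32-bit value on overflow.
    """
    total = acc + sum(x * y for x, y in zip(unpack_int8(a), unpack_int8(b)))
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


if __name__ == "__main__":
    a = pack_int8([1, 2, 3, 4])
    b = pack_int8([5, 6, 7, 8])
    print(dp4a(a, b, 10))  # 1*5 + 2*6 + 3*7 + 4*8 + 10
```

A GPU that accelerates DP4A performs these four multiplies and the accumulation as one instruction per lane, which is why it matters for Int8 inference workloads.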
GP-GPU DG1 (Iris Xe Max Gen12) Performance Benchmarking
In this article we test GPGPU core performance; please see our other articles on:
- CPU
- Intel Core Gen 11 RocketLake (i7-11700K) Review & Benchmarks – AVX512 Performance
- Intel Core Gen11 TigerLake ULV (i7-1165G7) Review & Benchmarks – CPU AVX512 Performance
- Benchmarks of JCC Erratum Mitigation – Intel CPUs
- Intel Core Gen10 IceLake ULV (i7-1065G7) Review & Benchmarks – CPU AVX512 Performance
- AVX512 Improvement for Icelake Mobile (i7-1065G7 ULV)
- GPGPU
Hardware Specifications
We are comparing the middle-range Intel integrated GP-GPUs with the previous generation, as well as with competing architectures, with a view to upgrading to a brand-new, high-performance design.
| GP-GPU | Intel DG1 (Discrete, Iris Xe Max, 96C) | Intel Iris Xe-LP (Internal, 1165G7, 96C) | Intel Iris Plus LP (Internal, 1065G7, 64C) | nVidia GeForce GT 1030 (Discrete, GP108, 3C) | Comments |
| --- | --- | --- | --- | --- | --- |
| Arch / Chipset | EV12 / DG1 | EV12 / G7 (built-in “TigerLake”) | EV11 / G7 (built-in “IceLake”) | GP108-300 “Pascal” | Same GPU as TGL |
| Cores (CU) / Threads (SP) | 96 / 768 | 96 / 768 | 64 / 512 | 3 / 384 | Same no. of CU / SP |
| Tensors (TSX) / Matrix Units (MMA) | none | none | none | none on Pascal | Sadly no TSX/MMA units |
| SIMD per CU / Width | 8 | 8 | 8 | 128 | Same SIMD width |
| Wave/Warp Size | 32 | 32 | 32 | 32 | Wave size matches nVidia |
| Speed (Base/Turbo) (GHz) | 1.65GHz [+38%] | 1.2GHz | 1.1GHz | 1.227-1.468GHz | Turbo is 38% faster. |
| Power (TDP) | 25-30W (Dedicated) | 15-25W (Shared) | 15-25W (Shared) | 30W (Dedicated) | Similar power envelope. |
| ROP / TMU | 24 / 48 | 24 / 48 | 16 / 32 | 8 / 24 | Same no. ROP / TMUs |
| Shared Memory (kB) | 64kB | 64kB | 64kB | 32kB | Same shared memory but 2x nVidia. |
| Constant Memory (GB) | 1.6GB | 3.2GB | 2.7GB | 64kB | No dedicated constant memory but large. |
| Global Memory Size (GB) | 4GB (Dedicated) | (Shared) ~50% system memory | (Shared) ~50% system memory | 2GB (Dedicated) | Somewhat small memory. |
| Global Memory Type | LP-DDR4X 128-bit 4267MT/s [+14%] | (Shared) LP-DDR4X 128-bit 4267MT/s | (Shared) LP-DDR4X 128-bit 3733MT/s | GDDR5 64-bit 6000MT/s | Memory rate is 14% faster. |
| Memory Bandwidth (GB/s) | 66GB/s [=] | (Shared) 66GB/s | (Shared) 58GB/s | 48GB/s | Same bandwidth but dedicated. |
| L1 Caches | 64kB x 6 | 64kB x 6 | 16kB x 8 | 16kB x 3 | Same L1 |
| L3 Cache | 3.8MB | 3.8MB | 3MB | 512kB | Same L3 |
| Maximum Work-group Size | 256×256 | 256×256 | 256×256 | 1024×1024 | nVidia supports bigger workgroups |
| FP64/double ratio | No! | No! | No! | Yes, 1/32x | No FP64 support |
| FP16/half ratio | 2x | 2x | 2x | 1/64x | Same 2x ratio on Intel; crippled on the 1030 |
| OpenCL Support | 3.0 | 3.0 | 2.1 | 1.2 | Intel is up to 3.0 while nVidia still on 1.2! |
| Price/RRP (USD) | Unknown, possibly $70 | $426 (whole APU) | $426 (whole APU) | $80 (at launch, more now due to pandemic) | We will need to see OEM card price |
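To put the core counts and clocks above into perspective, a theoretical peak FP32 rate can be estimated as stream processors × 2 × clock. This is a back-of-the-envelope sketch, assuming each SP retires one fused multiply-add (2 FLOPs) per cycle at the turbo clock listed in the table; the names are illustrative, and measured benchmark results will be lower.

```python
# (stream processors, turbo clock in GHz) as listed in the specification table.
PARTS = {
    "Intel DG1 (Iris Xe Max)": (768, 1.65),
    "Intel Iris Xe-LP (1165G7)": (768, 1.2),
    "Intel Iris Plus (1065G7)": (512, 1.1),
    "nVidia GT 1030": (384, 1.468),
}


def peak_fp32_gflops(stream_processors: int, clock_ghz: float) -> float:
    """Theoretical peak, assuming one FMA (2 FLOPs) per SP per clock."""
    return stream_processors * 2 * clock_ghz


if __name__ == "__main__":
    for name, (sps, ghz) in PARTS.items():
        print(f"{name}: {peak_fp32_gflops(sps, ghz):.0f} GFLOPS (theoretical)")
```

On these assumptions the higher clock alone gives DG1 the same 38% theoretical advantage over the integrated Xe-LP that the table notes for turbo speed.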
Disclaimer
This is an independent article that has not been endorsed or sponsored by any entity (e.g. Intel). All trademarks acknowledged and used for identification only under fair use. Errors and omissions excepted (E&OE).
The article contains only public information (available elsewhere on the Internet) and not provided under NDA nor embargoed. At publication time, the products have not been directly tested by SiSoftware and thus the accuracy of the benchmark scores cannot be verified; however, they appear consistent and do not appear to be false/fake.
Processing Performance
We are testing OpenCL performance using the latest SDK / libraries / drivers from both Intel and the competition.
Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.
Environment: Windows 10 x64, latest Intel and nVidia drivers. Turbo / Boost was enabled on all configurations.
SiSoftware Official Ranker Scores
Final Thoughts / Conclusions
Executive Summary: Intel DG1 Xe is a great low-end dedicated GPU. Good Performance, 8/10!
Intel seems to have started to take graphics (GP-GPUs today) seriously: DG1 packs some serious power at the low end, just as we saw when testing “TigerLake” (TGL). Overall, it is 8% faster than the integrated Xe-LP of TGL, and as much as 50% faster in some tests; it is likely to get faster still with future drivers.
As with all low-end dedicated GPUs (even the 1030), the PCIe3 x4 connection makes data uploads/downloads slow, so judicious overlapping of compute and transfers is needed to prevent bottlenecks. But DG1 supports PCIe4, so future Intel systems (“RocketLake”) that bring PCIe4 support will double the transfer bandwidth, which should alleviate the problem.
The lack of native FP64 support is disappointing, but FP64 is unlikely to be useful on low-end GP-GPUs like these; at a 1/32x rate on the 1030 it is very slow. FP16 at the 2x rate is almost 2x faster than FP32 and light-years ahead of the crippled 1/64x rate on the 1030. We do hope that future (higher-end) versions will have native FP64 support, which would come in handy for financial workloads and precision modelling.
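To illustrate why overlapping transfers with compute helps, here is a simplified timing model rather than real OpenCL code. It assumes the work is split into equal chunks and that upload, compute and download can each run concurrently on different chunks; all function names and timings are hypothetical.

```python
def serial_time(chunks: int, upload: float, compute: float, download: float) -> float:
    """Total time when every chunk is uploaded, computed and downloaded in turn."""
    return chunks * (upload + compute + download)


def overlapped_time(chunks: int, upload: float, compute: float, download: float) -> float:
    """Idealised three-stage pipeline: the first chunk pays the full latency,
    every later chunk only costs the slowest stage."""
    bottleneck = max(upload, compute, download)
    return upload + compute + download + (chunks - 1) * bottleneck


if __name__ == "__main__":
    # Hypothetical per-chunk timings in milliseconds.
    for label, transfer_ms in (("PCIe3", 4.0), ("PCIe4", 2.0)):
        s = serial_time(8, transfer_ms, 3.0, transfer_ms)
        o = overlapped_time(8, transfer_ms, 3.0, transfer_ms)
        print(f"{label}: serial {s:.0f} ms, overlapped {o:.0f} ms")
```

In this model the overlapped run is bounded by the slowest stage, so once transfers are halved (as with PCIe4) the bottleneck shifts from the link to the compute itself.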
For HTPC and media/NAS servers (including virtualised ones), DG1 will make a good choice, e.g. instead of the GT 1030 or Quadro NVS, as the Intel drivers are just as reliable as nVidia's across operating systems. We’re not talking AAA games here; we’re talking normal desktop/server graphics and media transcoding.
The decoding/encoding/transcoding (QuickSync) supports all the modern features (HEVC/H265, AV1, etc.) and is widely supported, making it an ideal upgrade for older systems (e.g. SKL/KBL/WHL/CML) using EV9x or even older graphics. Due to its low power (25W, TDP-down to 15W), DG1 would also make an ideal fanless/quiet card, hopefully low-profile and single-slot, for those ITX HTPC media clients or servers.
The rumoured (OEM) price is also extremely competitive, especially considering that the competition's prices have, if anything, gone up. This makes DG1 over 2.5x more “price efficient” than, say, the 1030 we compared it against: a bargain.
Competition, even at the low end, is always welcome; should you find DG1 is not for you, it will at least force competitors (e.g. AMD, nVidia) to release updated dedicated low-end GPUs for those that need them. At the moment the choices (like the 1030) are relatively expensive for what you are getting, and DG1 will certainly improve them.
Please see our other articles on:
- Intel Iris Plus G7 Gen11 IceLake ULV (i7-1065G7) Review & Benchmarks – GPGPU Performance
- Intel Core Gen10 IceLake ULV (i7-1065G7) Review & Benchmarks – CPU AVX512 Performance