What is “Vega2”?
It is the code-name of the updated “Vega” GPU arch(itecture), the last of the GCN (Graphics Core Next) arch (version 5.1), shrunk to 7nm before being replaced by the forthcoming “Navi”. Originally intended for the professional/workstation high-end market, “Vega2”/“big Vega”, designed for compute (scientific, machine learning, etc.) workloads, was pressed into service to battle the latest 2000-series “Turing”/RTX competition.
As a result it contains many high-end features not normally found on consumer cards:
- 1/4 FP64 rate (instead of 1/16 or worse)
- 16GB HBM2 memory (instead of 8-12GB)
- 4096-bit HBM2 memory with 1TB/s bandwidth (instead of 400-500GB/s)
- Int8/Int4 support for AI/ML workloads
- PCIe 4.0 capable but not enabled at this time
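As a rough check on the bandwidth figure in the list above, peak memory bandwidth can be estimated from the bus width and the per-pin data rate. The sketch below assumes the 2Gbps HBM2 pin rate quoted for the Radeon VII in the specification table; the function name is illustrative, and the result is a theoretical peak rather than a measured value.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: memory interface width in bits
    pin_rate_gbps:  per-pin data rate in Gbit/s
    """
    # Bits per second across the whole bus, divided by 8 to get bytes.
    return bus_width_bits * pin_rate_gbps / 8.0


# Radeon VII: 4096-bit HBM2 at an assumed 2 Gbit/s per pin.
radeon_vii_gb_s = peak_bandwidth_gb_s(4096, 2.0)
print(f"Radeon VII peak bandwidth: {radeon_vii_gb_s:.0f} GB/s")  # 1024 GB/s, i.e. ~1TB/s
```

The same formula applies to GDDR-based cards; only the bus width and pin rate change.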
See these other articles on GPGPU performance:
- AMD Radeon 5700XT: Navi GPGPU performance in OpenCL
- nVidia Titan V : Volta GPGPU performance in CUDA & OpenCL
- nVidia Titan X: Pascal GPGPU performance in CUDA & OpenCL
Hardware Specifications
We are comparing the middle-range Radeon with previous generation cards and competing architectures with a view to upgrading to a mid-range high performance design.
GPGPU Specifications | AMD Radeon VII (Vega2) | nVidia Titan V (Volta) | nVidia Titan X (Pascal) | AMD Vega 56 (Vega1) | Comments | |
Arch Chipset | Vega 20 / GCN 5.1 | GV100 / 7.0 | GP102 / 6.1 | Vega 10 / GCN 5.0 | A minor revision of Vega1. | |
Cores (CU) / Threads (SP) | 60 / 3840 | 80 / 5120 | 28 / 3584 | 56 / 3584 | More CUs than normal Vega but not 64. | |
SIMD per CU / Width | 4 / 16 | n/a | n/a | 4 / 16 | Naturally same SIMD count and width | |
Wave/Warp Size | 64 | 32 | 32 | 64 | Wave size has always been 2x nVidia. | |
Speed (Min-Turbo) | 1.4 – 1.750 [+21%] | (135-1455) | 1.531 (139-1910) | 1.156 – 1.471 | Base and turbo clocks are both ~20% higher. | |
Power (TDP) | 300W [+42%] | 300W | 250W | 210W | TDP has gone up by 40%. | |
ROP / TMU | 64 / 256 | 96 / 320 | 96 / 224 | 64 / | ROPs and TMUs unchanged | |
Shared Memory | 32kB | 48 / 96kB | 48 / 96kB | 32kB | No shared memory change. | |
Constant Memory | 8GB | 64kB | 64kB | 4GB | No dedicated constant memory but large. | |
Global Memory | 16GB HBM2 2Gbps 4096-bit | 12GB HBM2 2x850Mbps 3072-bit | 12GB GDDR5X 10Gbps 384-bit | 8GB HBM2 1.89Gbps 2048-bit | 2x as big and 2x as wide HBM a huge improvement. | |
Memory Bandwidth (GB/s) | 1000 [+2.4x] | 652 | 512 | 410 | Bandwidth is about 2.4x that of Vega1. | |
L1 Caches | 16kB x 60 | 96kB x 80 | 48kB x 28 | 16kB x 56 | L1 has not changed. | |
L2 Cache | 4MB | 4.5MB | 3MB | 4MB | L2 has not changed. | |
Maximum Work-group Size | 256 / 1024 | 1024 / 2048 | 1024 / 2048 | 256 / 1024 | Same work-group sizes. | |
FP64/double ratio | 1/4x | 1/2x | 1/32x | 1/16x | Ratio is 4x better than Vega1. | |
FP16/half ratio | 2x | 2x | 1/64x | 2x | Ratio is the same throughout. | |
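To put the FP64 and FP16 ratios in the table into absolute terms, theoretical peak throughput can be estimated from the stream processor count and clock. The following is a minimal sketch, assuming each SP retires one fused multiply-add (2 FLOPs) per clock at the quoted turbo clock; the `Gpu` class and `peak_gflops` helper are illustrative names, and the figures are theoretical peaks, not benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    stream_processors: int  # "Threads (SP)" row of the table
    turbo_ghz: float        # turbo clock from the "Speed" row, in GHz
    fp64_ratio: float       # FP64 rate relative to FP32
    fp16_ratio: float       # FP16 rate relative to FP32


def peak_gflops(gpu: Gpu, ratio: float = 1.0) -> float:
    """Theoretical peak GFLOPS, scaled by a precision ratio (1.0 = FP32)."""
    # Assumption: one FMA (2 FLOPs) per SP per clock at turbo.
    return gpu.stream_processors * 2 * gpu.turbo_ghz * ratio


RADEON_VII = Gpu("Radeon VII (Vega2)", 3840, 1.750, 1 / 4, 2.0)
VEGA_56 = Gpu("Vega 56 (Vega1)", 3584, 1.471, 1 / 16, 2.0)

for gpu in (RADEON_VII, VEGA_56):
    fp32 = peak_gflops(gpu)
    fp64 = peak_gflops(gpu, gpu.fp64_ratio)
    fp16 = peak_gflops(gpu, gpu.fp16_ratio)
    print(f"{gpu.name}: FP16 {fp16:.0f} / FP32 {fp32:.0f} / FP64 {fp64:.0f} GFLOPS")
```

Real workloads rarely reach these peaks, since sustained clocks, memory bandwidth and occupancy all limit achieved throughput.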
Disclaimer
This is an independent article that has not been endorsed or sponsored by any entity (e.g. AMD). All trademarks are acknowledged and used for identification only under fair use.
The article contains only public information available elsewhere on the Internet and not provided under NDA or embargoed. At publication time, the products have not been directly tested by SiSoftware and thus the accuracy of the benchmark scores cannot be verified; however, they appear consistent and do not appear to be false/fake.
Processing Performance
We are testing OpenCL performance using the latest SDK / libraries / drivers from both AMD and the competition.
Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.
Environment: Windows 10 x64, latest AMD and nVidia drivers. Turbo / Boost was enabled on all configurations.
Memory Performance
We are testing OpenCL performance using the latest SDK / libraries / drivers from AMD and the competition.
Results Interpretation: For bandwidth tests (MB/s, etc.) high values mean better performance, for latency tests (ns, etc.) low values mean better performance.
Environment: Windows 10 x64, latest AMD and nVidia drivers. Turbo / Boost was enabled on all configurations.
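Because bandwidth results (higher is better) and latency results (lower is better) point in opposite directions, it can help to normalise both into a single "times better" figure when comparing cards. The sketch below uses a hypothetical helper name; the bandwidth inputs are the theoretical figures from the specification table, not measured results.

```python
def times_better(new: float, old: float, higher_is_better: bool = True) -> float:
    """Return how many times better `new` is than `old` (> 1.0 means better)."""
    if new <= 0 or old <= 0:
        raise ValueError("scores must be positive")
    # For lower-is-better metrics (e.g. latency in ns) the ratio is inverted.
    return new / old if higher_is_better else old / new


# Bandwidth (GB/s), higher is better: spec-table figures for Vega2 vs Vega1.
print(f"{times_better(1000, 410):.2f}x")  # 2.44x
```

For a latency test, pass `higher_is_better=False` so that a lower value for the newer card still yields a figure above 1.0.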
SiSoftware Official Ranker Scores
Final Thoughts / Conclusions
Vega2 (“BigVega”) is a big improvement over normal Vega1 and its workstation-class pedigree shows. For FP16/FP32 workloads, though, the 30-40% performance improvement may not be worth it considering the much higher price. Naturally, FP64 performance is almost 4x higher due to the 1/4 FP64 rate, though not as good as professional cards with a 1/2 rate or the Titan competition with a similar 1/2 rate.
While the GCN core (rev 5.1) has seen internal updates, there is nothing new that can be supported/optimised for in the compute land thus any code working well on Vega1 should work just as well on Vega2.
The 16GB HBM2 wide memory also helps big workloads with 2x higher bandwidth and also lower latency due to higher clock. For some workloads this alone makes it a definite buy when competition stops at 12GB.
Unfortunately the card has had a limited release at a relatively high price thus value/price ratio depends entirely on your workload – if FP64 with large datasets then it is very much worth it; if FP32/FP16 with datasets that fit in standard 8GB memory then the older Vega1 is much better value and you can even get 2 for the price of the Vega2.
For revolutionary change we need to wait for Navi and its brand new RDNA (Radeon DNA) arch(itecture)…