What is “Navi”?
It is the code-name of AMD's new GPU, the first of the brand-new RDNA (Radeon DNA) GPU architecture, replacing “Vega”, the last of the GCN (Graphics Core Next) architecture. It is a mid-range GPU optimised for gaming and thus not expected to set records, but GPUs today are also used for many other tasks (mining, encoding, algorithm/compute acceleration, etc.).
The RDNA architecture brings big changes from the various GCN revisions we have seen previously, but this first iteration does not bring any major new features, at least in the compute domain. Hopefully the next versions will bring tensor units (matrix multipliers) and other accelerated instruction sets.
We are comparing the mid-range Radeon with previous-generation cards and competing architectures, with a view to upgrading to a mid-range, high-performance design.
|GPGPU Specifications||AMD Radeon 5700XT (Navi)||AMD Radeon VII (Vega2)||nVidia Titan X (Pascal)||AMD Radeon 56 (Vega1)||Comments|
|Arch Chipset||RDNA / Navi 10||GCN5.1 / Vega 20||Pascal / GP102||GCN5.0 / Vega 10||The first of the Navi chips.|
|Cores (CU) / Threads (SP)||40 / 2560||60 / 3840||28 / 3584||56 / 3584||Fewer CUs than Vega1 and the same (64x) SP per CU.|
|SIMD per CU / Width||2 / 32 [2x]||4 / 16||–||4 / 16||Navi increases the SIMD width but decreases counts.|
|Wave/Warp Size||32 [1/2x]||64||32||64||Wave size is reduced to match nVidia.|
|Speed (Min-Turbo, GHz)||1.6 / 1.755||1.4 / 1.75||1.531 / 1.91||1.156 / 1.471||About 40% higher base and 20% higher turbo clocks than Vega1.|
|Power (TDP)||225W||295W||250W||210W||Slightly higher TDP than Vega1 but nothing significant.|
|ROP / TMU||64 / 160||64 / 240||96 / 224||64 / 224||ROPs are the same but we see ~30% fewer TMUs.|
|Shared (Local) Memory||64kB [2x]||32kB||48kB / 96kB per SM||32kB||We have 2x more shared memory, allowing bigger kernels.|
|Constant Memory||4GB||8GB||64kB dedicated||4GB||No dedicated constant memory, but the limit is large.|
|Global Memory||8GB GDDR6 14Gt/s 256-bit||16GB HBM2 1Gt/s 4096-bit||12GB GDDR5X 10Gt/s 384-bit||8GB HBM2 900Gt/s 4096-bit||Sadly no HBM this time; GDDR6 is faster but the bus is not very wide.|
|Memory Bandwidth (GB/s)||448GB/s [+9%]||1024GB/s||512GB/s||410GB/s||Bandwidth is still 9% higher than Vega1.|
|L1 Caches||? x40||16kB x60||48kB x28||16kB x56||L1 does not appear to have changed, but this is unclear.|
|L2 Cache||4MB||4MB||3MB||4MB||L2 has not changed.|
|Maximum Work-group Size||1024 / 1024||256 / 1024||1024 / 2048 per SM||256 / 1024||AMD has unlocked 4x larger work-group sizes.|
|FP64 (Double) Ratio||1/16x||1/4x||1/32x||1/16x||Ratio is the same as consumer Vega1 rather than pro Vega2.|
|FP16 (Half) Ratio||2x||2x||1/64x||2x||Ratio is the same across the AMD cards.|
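Several of the headline figures in the table can be derived from the raw specifications. The sketch below is only an illustration under stated assumptions: clocks are taken to be in GHz, memory data rates in GT/s, and each SP is assumed to retire one fused multiply-add (2 FLOPs) per clock. The function names are ours and are not part of any benchmark.

```python
# Sketch: derive headline figures from the specification table.
# Assumptions (not stated in the article): clocks are in GHz, memory data
# rates are in GT/s, and each SP retires one FMA (2 FLOPs) per clock.

def bandwidth_gbs(data_rate_gtps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: transfers per second times bytes per transfer."""
    return data_rate_gtps * bus_width_bits / 8

def peak_fp32_tflops(stream_processors: int, clock_ghz: float) -> float:
    """Theoretical FP32 peak in TFLOPS, counting an FMA as two operations."""
    return stream_processors * 2 * clock_ghz / 1000

def uplift_percent(new: float, old: float) -> float:
    """Relative gain of new over old, in percent."""
    return (new / old - 1) * 100

navi_bw = bandwidth_gbs(14, 256)  # Radeon 5700XT: GDDR6 14Gt/s, 256-bit
print(f"Navi bandwidth: {navi_bw:.0f} GB/s")
print(f"vs Vega1 (410 GB/s): +{uplift_percent(navi_bw, 410):.0f}%")
print(f"Base clock vs Vega1: +{uplift_percent(1.6, 1.156):.0f}%")
print(f"Turbo clock vs Vega1: +{uplift_percent(1.755, 1.471):.0f}%")
print(f"Navi peak FP32 at turbo: {peak_fp32_tflops(2560, 1.755):.2f} TFLOPS")
```

For the 5700XT this gives 448 GB/s (about 9% above Vega1) and clock gains of roughly 38% base and 19% turbo, in line with the rounded figures in the comments column. The FP32 peak is a theoretical ceiling only, not a measured result.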
We are testing OpenCL performance using the latest SDK / libraries / drivers from both AMD and the competition.
Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.
Environment: Windows 10 x64, latest AMD and nVidia drivers. Turbo / Boost was enabled on all configurations.
Results Interpretation: For bandwidth tests (MB/s, etc.) high values mean better performance, for latency tests (ns, etc.) low values mean better performance.
Final Thoughts / Conclusions
“Navi” is an interesting chip to be sure and perhaps more was expected of it; as always the drivers are the weak link and it is hard to determine which issues will be fixed driver-side and which will need to be optimised in compute kernels.
Performance-wise it swings widely relative to Vega1 depending on the algorithm, dropping to around half (1/2x) in the worst cases, with compute-heavy algorithms (especially crypto-currencies) doing best and shared/local memory heavy algorithms doing worst. The 2x bigger shared memory (64kB vs 32kB) in conjunction with the larger work-group sizes (1024 vs 256 by default) does present future optimisation opportunities. AMD has also reduced the wave/warp size to 32 to match nVidia, a historic change.
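To make the interaction between wave size, work-group size and shared memory concrete, here is a rough sketch using the figures quoted above. It assumes a work-group is simply split into ceil(work-items / wave size) waves and that a single work-group claims the whole shared-memory allocation; the helper names are hypothetical.

```python
# Sketch: how wave size and work-group size interact, using the article's figures.
# Assumptions: a work-group is split into ceil(work_items / wave_size) waves,
# and one work-group uses the entire shared-memory allocation.
import math

def waves_per_group(work_group_size: int, wave_size: int) -> int:
    """Number of waves needed to cover one work-group."""
    return math.ceil(work_group_size / wave_size)

def shared_bytes_per_item(shared_kb: int, work_group_size: int) -> float:
    """Shared-memory budget per work-item, in bytes."""
    return shared_kb * 1024 / work_group_size

# (wave size, default maximum work-group size, shared memory in kB)
configs = {
    "Navi": (32, 1024, 64),
    "Vega1": (64, 256, 32),
}
for name, (wave, group, shared_kb) in configs.items():
    print(f"{name}: {waves_per_group(group, wave)} waves per work-group, "
          f"{shared_bytes_per_item(shared_kb, group):.0f} bytes of shared memory per work-item")
```

With these numbers a full-size Navi work-group spans 32 waves with 64 bytes of shared memory per work-item, against 4 waves and 128 bytes on Vega1. Kernels tuned for Vega1's 256-item groups may therefore need their tiling revisited before they can benefit from the larger limits.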
Memory-wise, the cost-cutting change from HBM2 to GDDR6, even high-speed GDDR6, does bring more bandwidth but naturally higher latencies. However, PCIe 4.0 doubles upload/download bandwidth, which will become much more important on higher-capacity (16GB+) cards in the future.
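As a back-of-the-envelope illustration of the PCIe point, the sketch below estimates best-case upload times over PCIe 3.0 and 4.0. The x16 link width, the 128b/130b encoding and the absence of protocol overhead are our assumptions, not measurements from this review.

```python
# Sketch: theoretical PCIe link bandwidth per direction.
# Assumptions (not from the article): an x16 link, 128b/130b line encoding,
# and no packet/protocol overhead, so real transfers will be slower.

def pcie_gbs(gtps_per_lane: float, lanes: int = 16) -> float:
    """Usable GB/s per direction for a PCIe 3.0/4.0 style link."""
    encoding_efficiency = 128 / 130  # 128 payload bits per 130 line bits
    return gtps_per_lane * lanes * encoding_efficiency / 8

def transfer_seconds(size_gb: float, link_gbs: float) -> float:
    """Best-case time to move size_gb across the link."""
    return size_gb / link_gbs

gen3 = pcie_gbs(8.0)    # PCIe 3.0: 8 GT/s per lane
gen4 = pcie_gbs(16.0)   # PCIe 4.0: 16 GT/s per lane
for size_gb in (8, 16):
    print(f"{size_gb}GB upload: {transfer_seconds(size_gb, gen3):.2f}s on PCIe 3.0, "
          f"{transfer_seconds(size_gb, gen4):.2f}s on PCIe 4.0")
```

Under these assumptions the link moves roughly 15.8 GB/s on PCIe 3.0 and 31.5 GB/s on PCIe 4.0 per direction, so filling a 16GB card takes about as long on PCIe 4.0 as filling an 8GB card does on PCIe 3.0.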
Overall it is hard to recommend it for compute workloads unless the particular algorithm (crypto, financial) does well on Navi; otherwise the much older Vega1 56/64 offers a better performance/cost ratio, especially today. However, as drivers mature and implementations are optimised for it, Navi is likely to perform better.
We are looking forward to the next iterations of Navi, especially the rumoured “big Navi” version optimised for compute…