AMD Radeon RX 6800 (RDNA2, Navi2L) Review & Benchmarks – GPGPU Performance

What is “Navi2”?

It is the code-name of the new AMD GPU, the 2nd generation RDNA (Radeon DNA) GPU arch(itecture) that itself replaced “Vega” / last of the GCN arch(itecture). Unlike the original Navi that was a mid-range GPU – this is the very much expected “big Navi” top-end GPU designed to battle nVidia’s finest 3000-series GPUs.

Navi/RDNA arch brought big changes from Vega/GCN and Navi2 has been enhanced and optimised from Navi1 adding more features:

  • Ray-Tracing (RT) Cores – similar to nVidia’s Turing/Ampere cards
  • Infinity Cache – 128MB
  • Smart-Access Memory – PCIe BAR re-sizeing
  • 6800 has Navi21 XL mid-range chip with 2/3 CUs enabled (60 out of 80)

Unlike Vega1/2 GPUs that perhaps were very much compute focused – Navi1/2 seem to be more gaming focused with the few compute features already introduced: reduced workgroup size matching nVidia (32), increased work-group sizes (1024). It is likely that AMD will launch HBM2 professional cards and hopefully Navi versions with tensor units (TSX) or matrix multiplicators (tuMMA).

See these other articles on GP-GPU performance:

Hardware Specifications

We are comparing the mid-range Radeon with previous generation cards and competing architectures with a view to upgrading to a mid-range high performance design. We have included the top-end previous generation cards that may be cheaper to obtain today.

GP-GPU Specifications AMD Radeon RX 6800 (Navi2L) AMD Radeon 5700XT (Navi1) nVidia 3070 (Ampere) nVidia 2080TI (Turing) Comments
Arch / Chipset RDNA2 / Navi 21 XL RDNA1 / Navi 10 Ampere / GA104 / SM8.6 Turing / GT102 / SM7.5 The 2nd of the Navi cores
Cores (CU) / Threads (SP) 60 / 3840 [+50%]
40 / 2,560 46 / 5,888 68 / 4,352 The XL version has 50% more cores.
Wave/Warp Size 32 32 32 32 Wave size now matches nVidia.
Speed (Min-Turbo) (GHz)
1.7 (1.815)
1.6 (1.755) 1.5 (.725) 1.35 (1.635) 40% faster base and 20% turbo than Vega1.
Power (TDP) 250W [+11%] 225W 220W 260W Power has only increased by 33%
ROP / TMU 96 / 240 [+50%]
64 / 160 96 / 184 88 / 272 ROPs are the same but we see ~30% less TMUs.
Ray-Tracing (RT)
60 none 82 68 Navi2 brings 60 RT cores like nVidia.
Shared Memory (kB)
64kB 64kB 48kB / 96kB per SM 48kB / 96kB per SM No change in shared memory.
Constant Memory (GB)
8GB [+2x] 4GB 64kB dedicated 64kB dedicated No dedicated constant memory but large.
Global Memory (GB)
16GB GDDR6 16Gbps 256-bit 8GB GDDR6 14Gbps 256-bit 8GB GDDR6 14Gbps 256-bit 11GB GDDR6 14Gbps 320-bit No HBM at this level.
Memory Bandwidth (GB/s)
512GB/s [+14%] 448GB/s 448GB/s 616GB/s Still bandwidth is 9% higher.
L1 Caches (kB)
32kB / WG + 128kB/Array 64kB/Array 46x 128kB/SM 68x 96kB/SM L1 has been doubled (2x)
L2 Cache (MB)
4MB 4MB 4MB 5.5MB L2 has not changed.
Maximum Work-group Size
1024 / 1024 1024 / 1024 1024 / 2048 per SM 1024 / 2048 per SM AMD has unlocked work-group sizes to 4x.
FP64/double ratio
1/16x 1/16x 1/32x 1/32x Ratio is 2x nVidia.
FP16/half ratio
2x 2x 2x 2x Ratios are the same throughput.
Price/RRP (USD)
579 [+29%] 450 499 1,000 Price is 30% higher than Navi1.

AMD Radeon 6900 XT

Disclaimer

This is an independent article that has not been endorsed nor sponsored by any entity (e.g. AMD). All trademarks acknowledged and used for identification only under fair use.

The article contains only public information (available elsewhere on the Internet) and not provided under NDA nor embargoed. At publication time, the products have not been directly tested by SiSoftware and thus the accuracy of the benchmark scores cannot be verified; however, they appear consistent and do not appear to be false/fake.

Processing Performance

We are testing both OpenCL performance using the latest SDK / libraries / drivers from both AMD and competition.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and nVidia drivers. Turbo / Boost was enabled on all configurations.

Processing Benchmarks AMD Radeon RX 6800 (Navi2L) AMD Radeon 5700XT (Navi1) nVidia 3070 (Ampere) nVidia 2080TI (Turing) Comments
GPGPU Arithmetic Benchmark Mandel FP16/Half (Mpix/s) 32,865 [+60%] 20,503 37,403 41,080 Navi2L is still 60% faster than top Navi1.
GPGPU Arithmetic Benchmark Mandel FP32/Single (Mpix/s) 22,073 [+61%] 13,684 25,994 25,000 Standard FP32 is again 60% faster.
GPGPU Arithmetic Benchmark Mandel FP64/Double (Mpix/s) 2003 [+84%]
1,086 608 812 FP64 is 84% than Navi1.
GPGPU Arithmetic Benchmark Mandel FP128/Quad (Mpix/s) 80.73 [+85%]
43.74 22.57 30.4 Emulated FP128 is hard on FP64 units but we see similar improvement.
Navi2L with 50% more CU/SP and faster clock is 60-85% faster than old Navi1 – making it pretty much obsolete. Unlike its big brother (6900XT) it can match its nVidia competition in FP16/FP32 but it is almost 4x faster in FP64: if your workload needs high precision then Navi2 is the way to go.
GPGPU Crypto Benchmark Crypto AES-256 (GB/s) 150 [+67%] 90 61 48 Navi2L is 67% faster than Navi1
GPGPU Crypto Benchmark Crypto AES-128 (GB/s) 111 64
GPGPU Crypto Benchmark Crypto SHA2-256 (GB/s) 257 [+74%] 148 232 192 Navi2L is 74% faster than Navi1.
GPGPU Crypto Benchmark Crypto SHA1 (GB/s) 157 170
GPGPU Crypto Benchmark Crypto SHA2-512 (GB/s) 161
Despite a modest bandwidth increase, Navi2L is between 67-74% faster than Navi1 – making it obsolete. We can see why 6900XT is memory constrained having the same bandwidth as this 6800.

It also has no problem keeping up with its nVidia competition (300) which it beats by a decent amount.

GPGPU Finance Benchmark Black-Scholes float/FP32 (MOPT/s) 14,210 [+18%] 12,062 12,300 17,230 In this FP32 financial workload Navi2L is 18% faster.
GPGPU Finance Benchmark Black-Scholes double/FP64 (MOPT/s) 2,822 992 1,530
GPGPU Finance Benchmark Binomial float/FP32 (kOPT/s) 4,275 [+2.1x] 2,035 4,716 4,280 Binomial uses thread shared data – Navi2L over 2x faster.
GPGPU Finance Benchmark Binomial double/FP64 (kOPT/s) 129 117 164
GPGPU Finance Benchmark Monte-Carlo float/FP32 (kOPT/s) 9,048 [+4%] 8,731 13,109 11,440 Monte-Carlo uses read-only thread shared data reducing modify pressure: Navi2L is 4% faster.
GPGPU Finance Benchmark Monte-Carlo double/FP64 (kOPT/s) 464 248 327
For financial FP32 workloads, Navi2L is between 18%-2x faster than Navi1, a more modest improvement. Still it can keep up with its nVidia competition which is something its bigger brother is not quite able to do.
GPGPU Science Benchmark SGEMM (GFLOPS) float/FP32 7,992 [+52%] 5,271 8,889 7,400 GEMM brings 52% improvement over Navi1.
GPGPU Science Benchmark DGEMM (GFLOPS) double/FP64 646 350 502
GPGPU Science Benchmark SFFT (GFLOPS) float/FP32 523 [+36%] 384 533 513 FFT also improved by a decent 36% on Navi2.
GPGPU Science Benchmark DFFT (GFLOPS) double/FP64 184 176 302
GPGPU Science Benchmark SNBODY (GFLOPS) float/FP32 6,730 [+44%] 4,676 9,514 9,330 Navi2 improves by 44% over Navi1 in N-Body.
GPGPU Science Benchmark DNBODY (GFLOPS) double/FP64 462 248 222
The scientific FP16/32 algorithms scale better, with Navi2L 36-52% faster than Navi1; again this is sufficient to keep up with nVidia competition. Thankfully due to the high FP32/64 ratio on nVidia consumer cards – Navi2L is a great choice for high-precision (FP64) workloads.
GPGPU Image Processing Blur (3×3) Filter single/FP32 (MPix/s) 34,013 [+3.85x] 8,824 24,043 23,090 In this 3×3 convolution algorithm, Navi2L is almost 4x faster!
GPGPU Image Processing Sharpen (5×5) Filter single/FP32 (MPix/s) 12,177 [+6.86x] 1,776 8,295 6,000 Same algorithm but more shared data makes Navi2L almost 7x faster!!
GPGPU Image Processing Motion Blur (7×7) Filter single/FP32 (MPix/s) 11,492 [+6.23x] 1,844 8,371 6,180 With even more data the gap remains at almost 6x.
GPGPU Image Processing Edge Detection (2*5×5) Sobel Filter single/FP32 (MPix/s) 11,980 [+6.9x] 1,733 8,049 6,220 Still convolution but with 2 filters – similarly almost 7x faster!!
GPGPU Image Processing Noise Removal (5×5) Median Filter single/FP32 (MPix/s) 151 [+3.83x] 39,47 167 53.8 Different algorithm makes Navi2 “only” 4x faster.
GPGPU Image Processing Oil Painting Quantise Filter single/FP32 (MPix/s) 97.44 [+2x] 50.64 97.3 20.38 Without major processing, this filter performs well on Navi2L.
GPGPU Image Processing Diffusion Randomise (XorShift) Filter single/FP32 (MPix/s) 48,606 [62%] 30,003 46,976 53,300 This algorithm is 64-bit integer heavy and Navi2 is 62% faster.
GPGPU Image Processing Marbling Perlin Noise 2D Filter single/FP32 (MPix/s) 20,186 [+2.12x] 9,523 708 4,130 One of the most complex and largest filters, Navi2 is over 2x faster
For image processing using FP32 precision, the improvement is just incredible: Navi2L is 2-7x faster than Navi1! Be it hardware, driver – what we saw in the 6900XL review was no fluke: Navi2 GPUs simply fly dominating even top-end nVidia competition.
GPGPU Image Processing Overall GP-GPU Performance (Points) 29,650 [+56%] 18,960 22,330 19,800 Overall (including memory) Navi2L is 56% faster than Navi1.
Across all benchmarks – including memory bandwidth/latency – Navi2L is almost 60% faster than Navi1 – a huge improvement for a mid-range chip. Considering we saw 6900XT just 80% faster makes 6800 a bargain.
Power/TDP (W) 250 [+11%] 225 220 260 Power has increased by just 11%.
Performance/Power (Power Efficiency) (Points/W) 119 [+41%] 84 101 76 Navi2L power efficiency is 41% better than Navi1.
With only 11% in TDP/Power, the ~60% performance increase over Navi1 makes Navi2L 41% more efficient, beating everything else. Here competition cannot match it.
Price/RRP (USD) $579 [+29%] $450 $499 $1,000 Price has doubled vs. Navi1.
Performance/Power (Power Efficiency) (Points/W) 51.2 [+22%] 42.13 44.7 19.8 Navi2 price efficiency is 22% lower than Navi1.
While Navi2L is about 60% faster than Navi1 across all algorithms, price increase of 30% is half that – thus Navi2L gets you much more for your money. And let’s not forget a better buy than its competition.

Memory Performance

We are testing both OpenCL performance using the latest SDK / libraries / drivers from AMD and competition.

Results Interpretation: For bandwidth tests (MB/s, etc.) high values mean better performance, for latency tests (ns, etc.) low values mean better performance.

Environment: Windows 10 x64, latest AMD and nVidia. drivers. Turbo / Boost was enabled on all configurations.

Memory Benchmarks AMD Radeon RX 6800 (Navi2L) AMD Radeon 5700XT (Navi1) nVidia 3070 (Ampere) nVidia 2080TI (Turing) Comments
GPGPU Memory Bandwidth Internal Memory Bandwidth (GB/s) 439 [+11%] 396 375 494 Navi2L gets a 11% bandwidth bump.
GPGPU Memory Bandwidth Upload Bandwidth (GB/s) 26.02 [+6%] 23.73 24.82 11.3 All modern cards have PCIe4 bus now.
GPGPU Memory Bandwidth Download Bandwidth (GB/s) 26.27 [+4%] 24.84 24.57 11.93 Again just 4% change.
Navi2’s 14% on-paper bandwidth upgrade translates to 11% in real life which at this level is OK; but certainly not for the higher end 6900XT as we saw in the other article.

Navi1 already has a PCIe4 interface (on 500-series AMD and soon 500-series Intel) thus minor improvement in upload/download bandwidth that only shines compared to much older PCIe3 cards (e.g. nVidia’s Turing).

GPGPU Memory Latency Global (In-Page Random Access) Latency (ns) 155 [-36%] 241 141 135 Due to the higher speed latency is 36% lower.
GPGPU Memory Latency Global (Full Range Random Access) Latency (ns) 847 243
GPGPU Memory Latency Global (Sequential Access) Latency (ns) 40
GPGPU Memory Latency Constant Memory (In-Page Random Access) Latency (ns) 77
GPGPU Memory Latency Shared Memory (In-Page Random Access) Latency (ns) 10.6
GPGPU Memory Latency Texture (In-Page Random Access) Latency (ns) 157
GPGPU Memory Latency Texture (Full Range Random Access) Latency (ns) 268
GPGPU Memory Latency Texture (Sequential Access) Latency (ns) 67
Not unexpected, GDDR6′ latencies are higher than HBM2 although not by as much as we were fearing.

SiSoftware Official Ranker Scores

Final Thoughts / Conclusions

Summary: Much, much faster than old Navi1 and stellar FP64 performance: 9.5/10

Ever since the release of Navi1 (“little-Navi”) and its revolutionary new architecture AMD fans everywhere have eagerly awaited the release of “big-Navi” that can bring the fight to nVidia. A year or so later – we have Navi2: pretty much twice Navi1 + ray-tracing – tensors/tiles. It is everything that was expected but in some ways perhaps we expected more?

With 50% more CU/SP and faster speed, the little brother of “big-Navi” Navi2L chip in 6800 does not disappoint: in compute FP16/32 tasks it is at least 60% faster and in many cases much, much faster. Unlike its bigger brother – it has no problem keeping up with nVidia’s latest competition (Ampere 3070).

Compute FP64 performance is much better thanks to lower FP32/64 ratio (1/16x vs. nVidia 1/32x) that allows it to really put the boot into nVidia – and if that’s the kind of algorithms you run – Navi2 is your choice.

Memory-wise, it seems the meager bandwidth improvement over Navi1is OK for Navi2L which seems to have sufficient resources; it does not seem OK for Navi2XL as we saw in the 6900XT review. Having 2x more memory (16GB) allows much bigger kernels to run – and again shows that 6900XT should have had more.

The price (USD 579) is a bit higher than the old 5700XT but perhaps a sign of the times. Power/TDP is just 11% higher (250W) which shows how much Navi2 has improved efficiency over Navi1.

In summary – Navi2L in mid-range performs much better (compute wise) for the money and also against its competition (3070). We hear that it does even better in games! If you have Navi1, it is time to upgrade. AMD has done good!

To see how its “big-brother” Navi2X performs, please see our AMD Radeon RX 6900XT (RDNA2, Navi2X) Review & Benchmarks – GPGPU Performance article.

AMD Radeon 6900 XT

Disclaimer

This is an independent article that has not been endorsed nor sponsored by any entity (e.g. AMD). All trademarks acknowledged and used for identification only under fair use.

The article contains only public information (available elsewhere on the Internet) and not provided under NDA nor embargoed. At publication time, the products have not been directly tested by SiSoftware and thus the accuracy of the benchmark scores cannot be verified; however, they appear consistent and do not appear to be false/fake.

Tagged , , , , , , . Bookmark the permalink.

Comments are closed.