We are pleased to release the R25 (version 31.133) update for Sandra 20/21, with brand-new support for future instruction sets (ISA) and future hardware.

We have also published two articles with our thoughts on the two major announcements from Intel:

Benchmarks, Hardware Support updates and fixes

CPU Multi-Media Benchmarks

  • AVX-iFMA(52): New 256-bit code path, based on the 512-bit AVX512-iFMA(52) path, for future architectures (e.g. “Meteor Lake” MTL, “Arrow Lake” ARL). We saw a +66% improvement, as detailed in our AVX512-iFMA(52) Improvement for IceLake and TigerLake article.
  • AVX512-FP16: New code path for Xeon processors that support AVX512-FP16. We expect a +90% improvement over FP32 if the precision loss is acceptable (e.g. zoomed-out fractals).

Note: Future FP16 code paths will also be added to the other CPU benchmarks; however, some parts may remain FP32, as the loss of precision would make the results useless. We have explored this in our GP-GPU article dealing with FP16 support: FP16 GP-GPU Image Processing Performance & Quality.
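
To illustrate why FP16 can be acceptable for zoomed-out fractals but not for zoomed-in ones, here is a minimal sketch that rounds a row of pixel coordinates to half precision using Python's standard `struct` module. It is not the benchmark's code: the helper names `to_fp16` and `distinct_fp16_coords`, and the coordinates chosen, are assumptions made for this example only.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def distinct_fp16_coords(start: float, step: float, count: int) -> int:
    """Count how many of `count` pixel coordinates stay distinct in FP16."""
    return len({to_fp16(start + i * step) for i in range(count)})

# Zoomed out: neighbouring pixels are 0.01 apart and stay distinguishable.
zoomed_out = distinct_fp16_coords(1.0, 0.01, 8)

# Zoomed in: neighbouring pixels are 0.0001 apart, which is below the FP16
# spacing near 1.0 (2**-10, about 0.00098), so several pixels collapse
# onto the same coordinate.
zoomed_in = distinct_fp16_coords(1.0, 0.0001, 8)

print(zoomed_out, zoomed_in)
```

When neighbouring pixels map to the same coordinate, they produce identical iteration counts, which is the kind of result that would make an FP16 score meaningless for that part of a benchmark.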

CPU Cryptography Benchmarks

  • SHA2-512 HWA: Hardware-accelerated SHA512 hashing code path, based on the current SHA2-256 HWA code path. We expect ~3x better performance, judging by the existing SHA2 non-accelerated vs. HWA results.
  • Future SM3-256 (China) HWA: Hardware-accelerated SM3 hashing code path (China’s version of the SHA hashing functions). We expect a similar performance improvement to SHA HWA.
  • Future SM4-128/256 (China) HWA: Hardware-accelerated SM4 block crypto code path (China’s version of the AES block crypto functions). We expect a similar performance improvement to AES HWA.

Note: We will default to the “SM4 + SM3” benchmarks, rather than “AES + SHA”, for both the CPU & GP-GPU Cryptography benchmarks when a “China” locale is detected, as these algorithms are more likely to be used there.
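
As a rough sketch of the locale-based default described above, the selection could look like the following. How Sandra actually detects the locale is not covered here; the function name `default_crypto_pair` and the accepted tag formats are hypothetical and chosen purely for illustration.

```python
def default_crypto_pair(locale_tag: str) -> tuple:
    """Pick the default (block cipher, hash) benchmark pair for a locale tag.

    Hypothetical helper: accepts tags such as "zh_CN", "zh-CN" or
    "en_US.UTF-8" and looks only at the region part.
    """
    # Drop any encoding or modifier suffix (".UTF-8", "@euro").
    base = locale_tag.split(".")[0].split("@")[0]
    # Normalise the separator, then take the last part as the region.
    parts = base.replace("-", "_").split("_")
    region = parts[-1].upper() if len(parts) > 1 else ""
    if region == "CN":
        return ("SM4", "SM3")
    return ("AES", "SHA")

print(default_crypto_pair("zh_CN"))
print(default_crypto_pair("en_US.UTF-8"))
```

Any tag without a recognisable region (for example the plain "C" locale) falls back to “AES + SHA” in this sketch.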

Note 2: ARM already includes SHA2-512, SM4, SM3 HWA (hardware acceleration extensions) in their high-end cores.

Note 3: While AES & SHA are not going anywhere, there has been a shift to other crypto-algorithms (especially those that can encrypt/decrypt and hash/authenticate like “ChaCha20Poly1305” as famously used in WireGuard VPN) that are fast enough even without hardware acceleration!
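
One reason ChaCha20 performs well without dedicated instructions is that its core uses only 32-bit additions, rotations and XORs. The sketch below shows the ChaCha quarter round as specified in RFC 8439 and checks it against the RFC's quarter-round test vector (section 2.1.1). It is an illustration only, not benchmark code; the names `rotl32` and `quarter_round` are ours.

```python
MASK32 = 0xFFFFFFFF

def rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left by `shift` bits."""
    return ((value << shift) & MASK32) | (value >> (32 - shift))

def quarter_round(a: int, b: int, c: int, d: int) -> tuple:
    """ChaCha quarter round: only add, xor and rotate on 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d

# Input words from the RFC 8439 quarter-round test vector.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(word) for word in result])
# → ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

All of these operations map directly onto ordinary integer (and SIMD integer) instructions, so no special hardware extension is needed to run them quickly.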

All CPU Benchmarks – AVX10 Support

  • AVX10.2+ 256-bit future code paths (FP32, FP64 and FP16) for Hybrid architectures (e.g. “Meteor Lake” MTL, “Arrow Lake” ARL). Note that both Core (P) and Atom (E) cores will run the same 256-bit binary rather than code of different widths.
  • AVX10.1+ 512-bit & AVX512 shared code path (FP32, FP64 and FP16) for Xeon architectures
  • Possible AVX10.2+ 128-bit future code path for Atom/Other discrete architectures if required

Note: In future Hybrid architectures, Core (P) cores are likely to support 128/256-bit widths only. We don’t know (and we could not tell you) whether disabling the Atom (E) cores will cause the Core (P) cores to report 512-bit widths.
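
The code-path choices listed above can be sketched as a simple dispatch on reported capabilities. This is only an illustration of the decision order, not Sandra's detection logic: the `CpuCaps` structure, its fields and the returned path names are all hypothetical, and a real tool would populate them from CPUID.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CpuCaps:
    """Hypothetical capability summary (would come from CPUID in practice)."""
    avx10_version: int = 0     # 0 = no AVX10, 1 = AVX10.1, 2 = AVX10.2, ...
    max_vector_bits: int = 0   # widest AVX10 vector width reported
    has_avx512: bool = False   # legacy AVX512 support

def select_code_path(caps: CpuCaps) -> str:
    """Return an illustrative code-path name for the given capabilities."""
    # Xeon-style parts: AVX10.1+ at 512-bit shares one path with AVX512.
    if caps.has_avx512 or (caps.avx10_version >= 1 and caps.max_vector_bits >= 512):
        return "avx10-512/avx512"
    # Hybrid parts: AVX10.2+ at 256-bit, the same binary on P and E cores.
    if caps.avx10_version >= 2 and caps.max_vector_bits >= 256:
        return "avx10-256"
    # Possible narrower path for Atom/other architectures, if ever required.
    if caps.avx10_version >= 2 and caps.max_vector_bits >= 128:
        return "avx10-128"
    # Assumption: anything else keeps using the existing pre-AVX10 paths.
    return "legacy"

print(select_code_path(CpuCaps(avx10_version=2, max_vector_bits=256)))
print(select_code_path(CpuCaps(avx10_version=1, max_vector_bits=512)))
```

Checking the widest path first means a processor that reports 512-bit support never falls through to a narrower binary, while a hybrid part that reports only 256-bit gets the single shared 256-bit path.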

Hardware Support

  • Intel 14th gen Hybrid “Raptor Lake Refresh” RPL-S support
  • Intel future gen Hybrid “Meteor Lake” (MTL-S/M/P/N), “Arrow Lake” (ARL-S/U), “Lunar Lake” (LNL-M) detection
  • Intel future gen Xeon “Granite Rapids” SP/D detection

