Benchmarks : Intel Sandy Bridge CPU/GPU: AVX Performance

What is it?

Sandy Bridge is the new CPU (and GPU) architecture from Intel; it introduces many advances that software can use for better performance, power efficiency and 'bang-per-buck'. All platforms, including server, desktop and mobile, benefit from the architectural improvements, though we compare only desktop performance here. We have added support for all the new features in Sandra 2011.

Why do we measure it?

Future processors from competitors will also support these technologies and thus these improvements will benefit all modern processors. The new technologies we employed in our tests are as follows:

  • AVX (Advanced Vector eXtensions) – a new instruction set with 256-bit width (double that of SSE2/3/4) that greatly enhances the performance of SIMD code. All the benchmarks that could benefit have been updated to support AVX: Multi-Media, Multi-Core Efficiency, Cryptography, Memory Bandwidth and Memory & Cache Bandwidth.
  • AES instructions, SHLD (shift left double - used for SHA hashing) and ADC (add with carry) greatly improve cryptographic performance in the most popular algorithms today: AES for encryption/decryption (AES256/AES128) and SHA for signing (SHA256/SHA1). The Cryptography benchmark now supports both AES and AVX.
  • Multi-Format Codec (MFX) hardware-accelerated transcoding for MPEG2, VC1 and AVC (Intel Quick Sync Video Technology). The Media Transcode benchmark supports both hardware and software media transcoding.
  • GPGPU: OpenCL and DirectX ComputeShader support, fully supported by the GPGPU Processing, GPGPU Memory and GPGPU Cryptography benchmarks.
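To make the 256-bit versus 128-bit difference concrete, here is a small Python sketch (ours, for illustration only; it is not part of Sandra) that counts how many values fit in one register of each width. The helper name `lanes` is hypothetical.

```python
# Illustrative only: how many values fit in one SIMD register.
SSE_BITS = 128  # register width used by SSE2/3/4
AVX_BITS = 256  # register width introduced by AVX


def lanes(register_bits: int, element_bits: int) -> int:
    """Return how many elements of the given size fit in one register."""
    return register_bits // element_bits


for name, bits in (("float (32-bit)", 32), ("double (64-bit)", 64)):
    print(f"{name}: SSE holds {lanes(SSE_BITS, bits)}, "
          f"AVX holds {lanes(AVX_BITS, bits)}")
```

Twice the lanes per instruction is an upper bound on the gain; real code is also limited by memory bandwidth and by how much of the work can be vectorised.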

A look at the new CPU

Let's take a look at the new Sandy Bridge CPU and compare it with a Core i7 (Nehalem, Socket 1366) and Core i5 (Westmere-A, Socket 1156).

CPU Results

In order to highlight the improvements in the new technologies we are comparing a similarly clocked Nehalem CPU with the brand-new Sandy Bridge CPU.

Note: Turbo / Dynamic Overclocking was disabled; See here for Turbo Performance. This was done to keep the results consistent: Turbo may engage at different times at different speeds and thus results do vary from run to run.

Note 2: AVX/FMA requires Windows 7 SP1 / Windows 2008 R2 Server SP1.

Benchmark/CPU | Nehalem (3066MHz) | Sandy Bridge (3000MHz) | Comments
Speed | 3066MHz (23x 133MHz), No Turbo | 3000MHz (30x 100MHz), No Turbo |
Cores/Threads | 4C / 8T | 4C / 8T |
Caches L1/L2/L3 | 4x32kB / 4x256kB / 8MB | 4x32kB / 4x256kB / 6MB | Smaller but faster L3 cache.
CPU Arithmetic | 75.62GOPS (90.22GIPS / 63.38GFLOPS) (SSE4.2 / SSE3) | 82.22GOPS (94.88GIPS / 71.26GFLOPS) (SSE4.2 / SSE3) | SB outperforms the old Nehalem by 9% even with a slight clock disadvantage!
CPU Multi-Media | 143.43MPix/s (165.76 / 124.12 / 67.2) (SSE4.1/SSE2) | 150.2MPix/s (172.54 / 130.74 / 71.47) (SSE4.1/SSE2) | SB outperforms the old Nehalem by 5% even with a slight clock disadvantage!
CPU Multi-Media AVX | - | 210.5MPix/s (180.3 / 245.78 / 139.65) (AVX) | AVX makes SB a clear winner: 40% faster overall, with two of the three sub-scores almost doubling!
Multi-Core Efficiency | 19.4GB/s / 44.5ns | 18.14GB/s / 43.5ns | Although it has slightly lower bandwidth, the ring bus offers better inter-core latency.
Cryptography | 812MB/s (789 / 835) (ALU / SSE4) | 772MB/s (795 / 750) (ALU / SSE4) | Similar performance using the same instruction set.
Cryptography AES/AVX | - | 2240MB/s (5600 / 913) (AES / AVX) | Hardware-accelerated AES and AVX yield triple the performance, a fantastic result.
Power Efficiency | 10.79GIPS / 1.58 | 10.81GIPS / 1.57 | Better power design and frequency scaling make SB a winner in this test by a whisker.
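As a rough check on the CPU Arithmetic row, the aggregate scores appear to be consistent with the geometric mean of the integer and floating-point sub-scores. This is our observation from the published numbers, not a documented Sandra formula, and the function names below are hypothetical. The sketch also derives the percentage gain quoted in the comments.

```python
import math


def aggregate(int_score: float, fp_score: float) -> float:
    """Geometric mean of two sub-scores (assumed, not a documented formula)."""
    return math.sqrt(int_score * fp_score)


def gain_percent(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Sub-scores taken from the CPU Arithmetic row above.
nehalem = aggregate(90.22, 63.38)
sandy_bridge = aggregate(94.88, 71.26)

print(f"Nehalem aggregate:      {nehalem:.2f}")
print(f"Sandy Bridge aggregate: {sandy_bridge:.2f}")
print(f"Gain: {gain_percent(nehalem, sandy_bridge):.1f}%")
```

The computed aggregates agree with the table to within rounding, and the gain works out to just under 9%.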

Virtual Machine performance is very important today, with WPF (Windows Presentation Foundation) applications replacing traditional MFC apps, and Internet Java applets and objects providing rich multi-platform interfaces.

Benchmark/CPU | Nehalem (3066MHz) | Sandy Bridge (3000MHz) | Comments
.NET Arithmetic | 26.77GOPS (16.51GIPS / 43.4GFLOPS) | 25.51GOPS (14.9GIPS / 43.7GFLOPS) | The .NET CLR has not been updated for SB, and SB offers no improvement in VM performance.
.NET Multi-Media | 20MPix/s (30.36 / 13.17 / 24.73) | 19.57MPix/s (32.55 / 18.83 / 23.3) | Strangely, SB offers no overall improvement for .NET in any of the tests. Some work for Microsoft?
Java Arithmetic | 73GOPS (121.17 / 44) | 70.34GOPS (134.64 / 36.75) | SB wins the Java Dhrystone test but loses overall.
Java Multi-Media | 24.75MPix/s (25.52 / 24 / 23.8) | 24.46MPix/s (25.78 / 23.21 / 23.2) | Close result, but better results were expected; Sun/Oracle also needs to burn the midnight oil.

GPU (Integrated Graphics) Results

As all Sandy Bridge CPUs include an updated built-in GPU, here we test video, transcoding and GPGPU performance. DirectX ComputeShader is supported by the current video drivers for the first time. OpenCL CPU support is currently alpha, with GPU support to come at some later date.

Note: Turbo / Dynamic Overclocking was disabled; See here for Turbo Performance. This was done to keep the results consistent: Turbo may engage at different times at different speeds and thus results do vary from run to run.

Note 2: We are comparing software OpenCL CPU performance with video ComputeShader GPU performance; we will update the results once OpenCL GPU support is released.

Benchmark/GPU | Nehalem | Sandy Bridge GPU, HD2000 | Comments
GPU Shaders/Speed | - | 6EU / 850MHz, Turbo Disabled |
Media Transcoding CPU (WMV > H.264, H.264 > H.264) | 741kB/s (712 / 771) | 779kB/s (743 / 817) | Better CPU transcoding performance due to SB architecture improvements.
Media Transcoding GPU (WMV > H.264, H.264 > H.264) | - | 4.7MB/s (4.7 / 4.7) | Huge speed increase (6.5x) over software transcoders with similar quality. A fantastic result!
Video Shading | - | 11.78MPix/s (31.23 / 4.44) (DX10.1) |
Video Memory (Internal / Transfer) | - | 4.35GB/s (9 / 2) (DX10.1) |
GP Processing | 55MPix/s (78.63 / 38.37) (CPU) | 19.44MPix/s (74.6 / 5) (GPU) | While the software OpenCL CPU run-time natively supports doubles (64-bit floating point), the ComputeShader GPU does not, hence the difference in results. Otherwise, the float result is comparable.
GP Cryptography | 431MB/s (343 / 542) (CPU) | 403MB/s (156 / 1024) (GPU) | Similar overall result, with roughly 55% lower AES but roughly 89% higher SHA due to the different run-times.
GP Memory (Internal / Transfer) | 11.6GB/s (11.5 / 11.8) (CPU) | 4.4GB/s (9.18 / 2.11) (GPU) | Similar internal memory performance (as expected) but much lower transfer performance; the GPGPU SB drivers most likely still need further optimisation.
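The transcoding speed-up quoted in the table mixes kB/s and MB/s, so the ratio depends on the unit conversion and on which software result is used as the baseline. The sketch below (ours; the function name and the 1024 kB per MB convention are assumptions) shows both candidate baselines.

```python
def transcode_speedup(gpu_mb_s: float, cpu_kb_s: float,
                      kb_per_mb: float = 1024.0) -> float:
    """Ratio of hardware to software transcoding throughput.

    kb_per_mb is an assumption; pass 1000.0 for decimal units.
    """
    return gpu_mb_s * kb_per_mb / cpu_kb_s


# Figures from the Media Transcoding rows above.
vs_nehalem_sw = transcode_speedup(4.7, 741)
vs_sandy_bridge_sw = transcode_speedup(4.7, 779)

print(f"GPU vs Nehalem software:      {vs_nehalem_sw:.2f}x")
print(f"GPU vs Sandy Bridge software: {vs_sandy_bridge_sw:.2f}x")
```

Under these assumptions the ratio against the Nehalem software result comes out close to the 6.5x quoted, while the ratio against Sandy Bridge's own software result is a little lower.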

With a significant increase in the number of shaders, the Sandy Bridge’s GPU now threatens the low-end market of integrated graphics, opening a new chapter in the competition with AMD and nVIDIA.

Memory Results

Here we test the memory controller as well as the internal CPU caches. We use only 2 DIMMs, i.e. 2 channels, on i7 in order to have an objective comparison; we tested with the Internal GPU enabled and disabled (thus using external graphics).

Note: Turbo / Dynamic Overclocking was disabled; See here for Turbo Performance. This was done to keep the results consistent: Turbo may engage at different times at different speeds and thus results do vary from run to run.

Benchmark/Memory | 2x DDR3 PC3-10700 (Nehalem) | 2x DDR3 PC3-10700 (Sandy Bridge) | Comments
Speed/Timing | 1333MHz 9-9-9-24 4-33-11-6 | 1333MHz 9-9-9-24 4-34-10-5 |
Memory Controller Speed | 2666MHz (20x 133MHz) | 3000MHz (30x 100MHz) | Uncore frequency is faster on SB by default.
Memory Bandwidth | 16.18GB/s (16.16 / 16.2) | 17.57GB/s (17.56 / 17.58) (SSE2) | Clear advantage to the SB memory controller using the very same memory and the same instruction set.
Memory Bandwidth AVX | - | 17.58GB/s (17.6 / 17.56) (AVX) | AVX offers a small advantage but better performance is expected of the next generation.
Memory Bandwidth AVX, Internal GPU enabled | - | 17.28GB/s (17.26 / 17.29) (AVX) | Minor hit in bandwidth when enabling the internal GPU, still higher than NH/LF; good news for users of internal graphics.
Cache & Memory (L1 / L2 / L3) | 74.5GB/s / 33.6 (SSE2); 343GB/s / 202.2GB/s / 47GB/s | 87GB/s / 30.3 (SSE2); 354GB/s / 251GB/s / 110.74GB/s | Similar advantage in bandwidth for SB, with L3 cache 2.3x faster!
Cache & Memory AVX (L1 / L2 / L3) | - | 91GB/s / 34 (AVX); 404.56GB/s / 251.7GB/s / 108.54GB/s | AVX offers better cache bandwidth, especially for the L1 caches, with better improvements expected of the next generation.
Cache & Memory AVX, Internal GPU enabled (L1 / L2 / L3) | - | 86.81GB/s / 32.30 (AVX); 360GB/s / 251GB/s / 111GB/s | Enabling the GPU does affect performance but still matches a NH/LF with external graphics.
Latency Random (L1 / L2 / L3) | 74.6ns / 56.20; 4 / 11 / 52clocks | 75.3ns / 56.20; 4 / 12 / 45clocks | Similar latencies except SB has a lower latency L3 cache.
Latency Linear (L1 / L2 / L3) | 6.0ns / 4.50; 4 / 11 / 16clocks | 7.7ns / 5.70; 4 / 13 / 13clocks | Somewhat higher L2 latency for SB, but still lower L3 latency.
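Cache latencies are listed in clocks, which hides the small clock-speed difference between the two CPUs. A minimal sketch of the conversion to nanoseconds follows, assuming the counts are core clocks at the nominal non-Turbo speeds given earlier (3066MHz and 3000MHz); that assumption and the function name are ours.

```python
def clocks_to_ns(clocks: float, core_mhz: float) -> float:
    """Convert a latency in core clocks to nanoseconds.

    One clock at f MHz lasts 1000 / f nanoseconds.
    """
    return clocks * 1000.0 / core_mhz


# Random-access L3 latency from the table above.
nehalem_l3_ns = clocks_to_ns(52, 3066)
sandy_bridge_l3_ns = clocks_to_ns(45, 3000)

print(f"Nehalem L3:      {nehalem_l3_ns:.2f} ns")
print(f"Sandy Bridge L3: {sandy_bridge_l3_ns:.2f} ns")
```

Expressed in time rather than clocks, the L3 advantage of SB shrinks slightly because its clock is a little slower, but it remains clearly ahead.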


Raw scores alone do not show a product's true value, so the next measurements take various efficiency aspects into consideration:

Efficiency/CPU | Core i7 965 (Nehalem) | Sandy Bridge | Comments
Performance vs. Cost (i.e. how much performance you get for your money) | 163.68 MOPS/$ | 446.85 MOPS/$ | SB offers far more for your money, or costs far less for the same (or better) performance, making it a no-brainer.
Performance vs. Power (this measures the efficiency of power design, or TDP) | 582 MOPS/W | 865 MOPS/W | A better performance index at the same frequency and an improved power design give SB a roughly 1.5x advantage in power efficiency.
Performance vs. Speed (how performance scales with speed and how they perform at the same speed) | 24.66 MOPS/MHz | 27.41 MOPS/MHz | Here SB does win, but not by a wide margin, proving that it is evolutionary, not revolutionary.
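The Performance vs. Speed figures can be reproduced from the CPU Arithmetic aggregate scores and the clock speeds given earlier, and the advantage factors follow from simple ratios. The sketch below is ours; the function names are hypothetical, and deriving the per-MHz figure from the arithmetic aggregate is an inference from the numbers rather than a documented method.

```python
def mops_per_mhz(score_gops: float, clock_mhz: float) -> float:
    """Aggregate score (GOPS) converted to MOPS and divided by the clock."""
    return score_gops * 1000.0 / clock_mhz


def advantage(old: float, new: float) -> float:
    """How many times better `new` is than `old`."""
    return new / old


nehalem_per_mhz = mops_per_mhz(75.62, 3066)
sandy_bridge_per_mhz = mops_per_mhz(82.22, 3000)

print(f"Nehalem:      {nehalem_per_mhz:.2f} MOPS/MHz")
print(f"Sandy Bridge: {sandy_bridge_per_mhz:.2f} MOPS/MHz")
print(f"Cost advantage:  {advantage(163.68, 446.85):.2f}x")
print(f"Power advantage: {advantage(582, 865):.2f}x")
print(f"Speed advantage: {advantage(24.66, 27.41):.2f}x")
```

The per-MHz values match the table to two decimals, and the ratios put SB at roughly 2.7x on cost, 1.5x on power and 1.1x on per-clock performance.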

Live Results @ SiSoftware Live Ranker

Final Thoughts / Conclusions

So is it worth the upgrade? Is it worth a new investment in a new platform, upgrading from the current 1366/1156 platforms?

  • AVX Technology allows it to win all computational benchmarks - though little software currently supports it. FMA will do the same for floating-point code once released. Ensure you are using Windows 7/Server 2008 R2 SP1 or later; if you're using Vista or XP it's not good news.

  • AES Technology greatly improves cryptography tasks, with AVX/SHLD/ADC improving hashing performance. While Westmere does have AES also, it does not have AVX.

  • Thermal power has decreased markedly, with roughly 50% better power efficiency. This is not to be sniffed at.

  • You get far more for your money; there is no point in paying for the expensive Nehalem platform. Lynnfield/Westmere are cheaper but are now outclassed.

  • Better memory and L3 cache performance, though only dual-channel; if you are memory constrained, either use 3-channel Nehalem or wait for its replacement. Minor hit when enabling the integrated GPU - no worries for mobile users.

  • Integrated GPU with DirectX 10.1 support and GPGPU DirectCompute support, with performance that matches low-end discrete/external graphics solutions. While desktop users won't be impressed, HTPC and mobile users should be overjoyed. DirectX 11 support would have been nice, but all major features (ComputeShader, multi-threading) are supported.

  • New Platform: unfortunately, you will need a new mainboard to upgrade to this brand-new CPU, and that cost needs to be factored in.

  • Trivia: The nickname for "Sandra" is "Sandy" thus they both share the same name. Coincidence or are there dark forces at work?

