Benchmarks : Intel Sandy Bridge CPU: Turbo (Dynamic Overclocking) Performance


What is it?

Sandy Bridge is the new CPU architecture from Intel. It introduces many hardware advances that software can take advantage of for better performance, power efficiency and 'bang-per-buck'. All platforms, including server, desktop and mobile, benefit from the architectural improvements. We have worked to add support for all the new advancements in Sandra 2011.

Why do we measure it?

Future processors from competitors will also support these technologies and thus these improvements will benefit all modern processors. The new technologies we employed in our tests are as follows:

  • AVX (Advanced Vector eXtensions): a new instruction set with 256-bit width (double that of SSE2/3/4) that greatly enhances the performance of SIMD code. All the benchmarks that could benefit have been updated to support AVX: Multi-Media, Multi-Core Efficiency, Cryptography, Memory Bandwidth, Memory & Cache Bandwidth benchmarks.
  • AES instructions, SHLD (shift left double, used for SHA hashing) and ADC (add with carry) greatly improve cryptographic performance in the most popular algorithms today: AES for encryption/decryption (AES256/AES128) and SHA for signing (SHA256/SHA1). The Cryptography benchmark now supports both AES and AVX.
  • Multi-Format Codec (MFX): hardware-accelerated transcoding for MPEG2, VC1 and AVC (Intel Quick Sync Video Technology). The Media Transcode benchmark supports both hardware and software media transcoding.
  • GPGPU: OpenCL and DirectX ComputeShader support, fully supported by the GPGPU Processing, GPGPU Memory and GPGPU Cryptography benchmarks.
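
To make the "256-bit width" point concrete, here is a minimal sketch that counts how many values fit in one vector register. The function and table names are ours and purely illustrative; we only look at floating-point element sizes because, in this first generation, AVX widens floating-point operations.

```python
# Lanes per vector register = register width / element width.
# All names here are illustrative, not part of Sandra.
REGISTER_BITS = {"SSE2/3/4": 128, "AVX": 256}
ELEMENT_BITS = {"float32": 32, "float64": 64}


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements processed by one SIMD instruction."""
    return register_bits // element_bits


def lane_table() -> dict:
    """Lanes for every instruction set / element type combination."""
    return {
        isa: {name: lanes(width, bits) for name, bits in ELEMENT_BITS.items()}
        for isa, width in REGISTER_BITS.items()
    }


if __name__ == "__main__":
    for isa, row in lane_table().items():
        print(isa, row)
```

Doubling the lanes doubles the theoretical work per instruction; how much of that shows up in a benchmark depends on memory bandwidth and on how well the code vectorises.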

A look at the new CPU

Let's take a look at the new Sandy Bridge CPU and compare it with a Core i7 (Nehalem, Socket 1366) and Core i5 (Westmere-A, Socket 1156).

CPU Results

To highlight the improvements from the new technologies as well as from Turbo (aka Dynamic Overclocking), we compare a somewhat faster-clocked Nehalem CPU with the brand-new Sandy Bridge CPU.

Note: Turbo / Dynamic Overclocking is enabled. It becomes effective when the operating system requests the P0 state, the highest-performance one. How long it stays active depends on the workload and the operating environment. See here for No-Turbo Performance.

Note 2: AVX/FMA requires Windows 7 SP1 / Windows 2008 R2 Server SP1.

| Benchmark/CPU | Nehalem i7 965 (3.2GHz) | Sandy Bridge (3GHz) | Comments |
|---|---|---|---|
| Speed | 3200MHz (24x 133MHz), Turbo on +2 / +1 / +1 / +1 | 3000MHz (30x 100MHz), Turbo on +6 / +4 / +2 / +1 | Lower base clock speed but far more aggressive Turbo. |
| Cores/Threads | 4C / 8T | 4C / 8T | |
| Caches L1/L2/L3 | 4x32kB / 4x256kB / 8MB | 4x32kB / 4x256kB / 6MB | Smaller but faster L3 cache. |
| CPU Arithmetic | 81.66GOPS (98MIPS / 68.33MFLOPS) (SSE4.2 / SSE3) | 85.15GOPS (98.12MIPS / 73.0MFLOPS) (SSE4.2 / SSE3) | Turbo makes it easy for SB to outperform the i7 965, even though Sandra uses all the cores/threads. |
| CPU Multi-Media | 156MPix/s (180.42 / 135 / 73) (SSE4.1/SSE2) | 155MPix/s (178.37 / 134.75 / 73.77) (SSE4.1/SSE2) | Similar result even though SB is running ~7% slower. |
| CPU Multi-Media AVX | - | 217.5MPix/s (186.34 / 253.9 / 144.36) (AVX) | AVX makes a clear winner of SB, with a 40% performance increase! |
| Multi-Core Efficiency | 19.43GB/s / 44.7ns | 18.8GB/s / 39.8ns | Although it has slightly lower bandwidth, the ring bus offers lower (better) inter-core latency. |
| Cryptography | 843MB/s (821 / 865) (ALU / SSE4) | 797MB/s (821 / 773) (ALU / SSE4) | The two processors are not that far apart when standard instruction sets are used. |
| Cryptography AES/AVX | - | 2270MB/s (5610 / 943) (AES / AVX) | The same extraordinary result given the hardware-accelerated AES and AVX, but not much higher than in the test with Turbo disabled. |
| Power Efficiency | 11GIPS / 1.63 | 10.88GIPS / 1.58 | Close result with SB losing by a whisker. |
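
The Turbo figures in the "Speed" row can be turned into peak clock speeds with a little arithmetic. The sketch below is only an illustration under two assumptions of ours: the four bins apply to 1, 2, 3 and 4 active cores in that order, and the Nehalem base clock quoted as 133MHz is the nominal 133.33MHz. The function name is hypothetical.

```python
def turbo_mhz(base_multiplier, bclk_mhz, bins):
    """Peak clock (MHz) for 1..N active cores.

    bins[i] is the number of extra multiplier steps allowed with
    i + 1 active cores (assumed ordering, see the text above).
    """
    return [(base_multiplier + extra) * bclk_mhz for extra in bins]


# Values taken from the "Speed" row; 400 / 3 is the nominal 133.33MHz (assumption).
NEHALEM = turbo_mhz(24, 400 / 3, [2, 1, 1, 1])
SANDY_BRIDGE = turbo_mhz(30, 100, [6, 4, 2, 1])

if __name__ == "__main__":
    for cores, (nh, sb) in enumerate(zip(NEHALEM, SANDY_BRIDGE), start=1):
        print(f"{cores} active core(s): Nehalem {nh:.0f}MHz, Sandy Bridge {sb:.0f}MHz")
```

Under these assumptions Sandy Bridge peaks higher than Nehalem with one or two cores active, but still clocks lower when all four cores are busy, which is the situation in most of the multi-threaded tests above.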

Virtual Machine performance is very important today, with WPF (Windows Presentation Foundation) applications replacing traditional MFC apps, and Internet Java applets and objects providing rich multi-platform interfaces.

| Benchmark/CPU | Nehalem i7 965 (3.2GHz) | Sandy Bridge (3GHz) | Comments |
|---|---|---|---|
| .NET Arithmetic | 28.59GOPS (18MIPS / 45.41MFLOPS) | 26.55GOPS (15.35MIPS / 46MFLOPS) | Clearly the .NET CLR benchmark reveals a weak spot on SB, because there is no improvement in VM performance. |
| .NET Multi-Media | 21.85MPix/s (33.42 / 14.29 / 27.12) | 20.15MPix/s (33.27 / 12.2 / 24) | Here too, SB offers no improvements for .NET. Some optimisation work for Microsoft? |
| Java Arithmetic | 83.14GOPS (127.72 / 54.13) | 72.7GOPS (137.88 / 38.32) | Again, Turbo offers no advantage: SB wins the Java Dhrystone test but loses overall. |
| Java Multi-Media | 27MPix/s (28.15 / 26 / 25.77) | 25.32MPix/s (26.61 / 24 / 24) | It's a close score, but better results were expected from SB; Oracle also needs to burn the midnight oil. |
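
To put numbers on "no improvement", the aggregate scores in the table can be expressed as a relative change against Nehalem. This is a small sketch with names of our own choosing; the scores are copied from the table, and the base clock gap is included only as a point of comparison.

```python
def percent_change(reference, value):
    """Relative change of value against reference, in percent."""
    return (value - reference) / reference * 100.0


# (Nehalem, Sandy Bridge) aggregate scores from the table above.
VM_SCORES = {
    ".NET Arithmetic (GOPS)": (28.59, 26.55),
    ".NET Multi-Media (MPix/s)": (21.85, 20.15),
    "Java Arithmetic (GOPS)": (83.14, 72.7),
    "Java Multi-Media (MPix/s)": (27.0, 25.32),
}

if __name__ == "__main__":
    for name, (nehalem, sandy) in VM_SCORES.items():
        print(f"{name}: {percent_change(nehalem, sandy):+.1f}%")
    # Base clock gap for comparison: 3000MHz against 3200MHz.
    print(f"Base clock: {percent_change(3200, 3000):+.1f}%")
```

Three of the four deficits sit close to the base clock gap, which suggests the VM workloads are getting little help from either Turbo or the new instruction sets; Java Arithmetic falls behind by roughly twice that.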

GPU (Integrated Graphics) Results

As all the Sandy Bridge CPUs include an updated built-in GPU, here we test video, transcoding and GPGPU performance. DirectX ComputeShader is supported by the current video drivers for the first time. OpenCL CPU support is currently in alpha, with GPU support to come at a later date.

Note: Turbo / Dynamic Overclocking is enabled. It becomes effective when the operating system requests the P0 state, the highest-performance one. How long it stays active depends on the workload and the operating environment. See here for No-Turbo Performance.

Note 2: We are comparing software OpenCL CPU performance with video ComputeShader GPU performance; we will update the results once OpenCL GPU support is released.

| Benchmark/GPU | Nehalem i7 965 | Sandy Bridge GPU, HD2000 | Comments |
|---|---|---|---|
| GPU Shaders/Speed | - | 6EU / 850MHz, Turbo Enabled up to 1100MHz | |
| Media Transcoding CPU (WMV > H.264, H.264 > H.264) | 862kB/s (812 / 915) | 837kB/s (775 / 903) | Even with its architecture improvements, Turbo fails to give SB an advantage. |
| Media Transcoding GPU (WMV > H.264, H.264 > H.264) | - | 4.8MB/s (4.73 / 4.8) | A similar speed increase over software transcoders with similar quality. Turbo offers almost no improvement; this could be down to early graphics driver issues. |
| Video Shading | - | 11.81MPix/s (31.23 / 4.47) (DX10.1) | Here too, there is only a very small increase in the MPix count over the Turbo-disabled test. |
| Video Memory (Internal / Transfer) | - | 4.3GB/s (9 / 2) (DX10.1) | The memory controller speed and the speed of the memory itself are the same, and so are the results. |
| GP Processing | 60MPix/s (87.13 / 41.4) (CPU) | 20.13MPix/s (77.26 / 5.24) (GPU) | Because Turbo dynamically adjusts both CPU and GPU frequency, the results shown here are relevant for both competitors. |
| GP Cryptography | 431MB/s (343 / 542) (CPU) | 417MB/s (161 / 1024) (GPU) | Similar result with 50% lower AES but 50% higher SHA due to different run-times. |
| GP Memory (Internal / Transfer) | 11.69GB/s (11.54 / 11.84) (CPU) | 4.35GB/s (9 / 2) (GPU) | Similar internal memory performance (as expected) but much lower transfer performance; most likely more optimisations are to be made to the GPGPU SB drivers. |

With a significant increase in the number of shaders, Sandy Bridge’s GPU now threatens the low-end graphics market, opening a new chapter in the competition with AMD and nVIDIA.

Memory Results

Here we test the memory controller as well as the internal CPU caches. We use only 2 DIMMs, i.e. 2 channels, on i7 in order to have an objective comparison; we tested with the Internal GPU enabled and disabled (thus using external graphics).

Note: Turbo / Dynamic Overclocking is enabled. It becomes effective when the operating system requests the P0 state, the highest-performance one. How long it stays active depends on the workload and the operating environment. See here for No-Turbo Performance.

| Benchmark/Memory | 2x DDR3 PC3-10700 (Nehalem) | 2x DDR3 PC3-10700 (Sandy Bridge) | Comments |
|---|---|---|---|
| Speed/Timing | 1333MHz 9-9-9-24 4-33-11-6 | 1333MHz 9-9-9-25 4-34-10-5 | |
| Memory Controller Speed | 2666MHz (20x 133MHz), Turbo on | 3000MHz (30x 100MHz), Turbo on | Uncore frequency is faster on SB by default. |
| Memory Bandwidth | 16.16GB/s (16.13 / 16.2) | 17.58GB/s (17.55 / 17.58) (SSE2) | SB keeps the advantage thanks to its memory controller, but Turbo here diminishes the gains. |
| Memory Bandwidth AVX | - | 17.56GB/s (17.55 / 17.58) (AVX) | No performance change using the AVX instruction set. |
| Memory Bandwidth AVX, Internal GPU enabled | - | 17.3GB/s (17.32 / 17.28) (AVX) | Minor hit in bandwidth when enabling the internal GPU, still higher than Nehalem; good news for users of internal graphics. |
| Cache & Memory (L1 / L2 / L3) | 74.4GB/s / 34.9 (SSE2); 357.3GB/s / 207.4GB/s / 47GB/s | 88.7GB/s / 31.8 (SSE2); 366.86GB/s / 258.86GB/s / 114GB/s | Turbo Boost keeps the advantage that SB has in bandwidth. |
| Cache & Memory AVX (L1 / L2 / L3) | - | 93.16GB/s / 35.2 (AVX); 418.6GB/s / 263.24GB/s / 113.33GB/s | Better cache performance with AVX allows SB to increase its lead over NH. |
| Cache & Memory AVX, Internal GPU enabled (L1 / L2 / L3) | - | 88.63GB/s / 32.7 (AVX); 373.3GB/s / 260GB/s / 113GB/s | Enabling the GPU does affect performance, but SB still matches NH with external graphics, thanks to AVX. |
| Latency Random (L1 / L2 / L3) | 74ns / 59.4; 4 / 11 / 50 clocks | 73.5ns / 63.3; 4 / 10 / 42 clocks | A closer match in this set-up, but this time it is SB that has the lower latency. |
| Latency Linear (L1 / L2 / L3) | 5.7ns / 4.8; 4 / 10 / 12 clocks | 7.1ns / 6.2; 4 / 11 / 14 clocks | The L3 cache latency on SB is higher for the first time in these tests, but the overall gap is smaller. |
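
For context, the measured memory bandwidth can be compared with the theoretical peak of the modules used. The sketch below assumes standard DDR3 figures that are not stated in the table: a 64-bit (8-byte) bus per channel, 1333 million transfers per second, and GB meaning 10^9 bytes. The function names are ours.

```python
def peak_bandwidth_gbs(transfers_mts, channels, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return transfers_mts * bus_bytes * channels / 1000.0


# Two channels of DDR3-1333 (PC3-10700), as used on both systems.
PEAK_GBS = peak_bandwidth_gbs(1333, channels=2)


def efficiency(measured_gbs, peak_gbs=PEAK_GBS):
    """Fraction of the theoretical peak actually achieved."""
    return measured_gbs / peak_gbs


if __name__ == "__main__":
    print(f"Theoretical peak: {PEAK_GBS:.2f}GB/s")
    for name, measured in (("Nehalem", 16.16), ("Sandy Bridge", 17.58)):
        print(f"{name}: {measured}GB/s = {efficiency(measured):.0%} of peak")
```

If Sandra reports binary gigabytes rather than decimal ones, the percentages shift slightly, but the gap between the two memory controllers stays the same.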

Efficiencies

Because not all things in life are evaluated to their true value, the next measurements will take into consideration various efficiency aspects:

| Efficiency/CPU | Core i7 965 (Nehalem) | Sandy Bridge | Comments |
|---|---|---|---|
| Performance vs. Cost (i.e. how much performance you get for your money) | 176 MOPS/$ | 463 MOPS/$ | SB offers the very best 'bang-for-buck' by far, especially when Turbo is enabled. |
| Performance vs. Power (this measures the efficiency of power design, or TDP) | 628 MOPS/W | 896 MOPS/W | With Turbo enabled, even at a lower base clock, SB is way ahead of the older i7 generation. |
| Performance vs. Speed (how performance scales with speed and how they perform at the same speed) | 25.52 MOPS/MHz | 28.38 MOPS/MHz | This test proves that Turbo Boost 2 is more effective, as the gap between the two is larger than with Turbo off. |
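
The efficiency figures appear to follow directly from the CPU Arithmetic scores earlier in the article. The sketch below reproduces two of them; the base clocks come from the results table, while the TDP values of 130W and 95W are our assumption (they are not stated in the article, but they reproduce the published numbers). Function names are illustrative.

```python
def mops_per_mhz(gops, base_mhz):
    """Performance vs. Speed: MOPS per MHz of base clock."""
    return gops * 1000.0 / base_mhz


def mops_per_watt(gops, tdp_watts):
    """Performance vs. Power: MOPS per watt of TDP."""
    return gops * 1000.0 / tdp_watts


# CPU Arithmetic aggregate (GOPS) and base clock (MHz) from the results table;
# TDP (W) is an assumption, see the text above.
CPUS = {
    "Core i7 965 (Nehalem)": {"gops": 81.66, "mhz": 3200, "tdp": 130},
    "Sandy Bridge": {"gops": 85.15, "mhz": 3000, "tdp": 95},
}

if __name__ == "__main__":
    for name, cpu in CPUS.items():
        speed = mops_per_mhz(cpu["gops"], cpu["mhz"])
        power = mops_per_watt(cpu["gops"], cpu["tdp"])
        print(f"{name}: {speed:.2f} MOPS/MHz, {power:.0f} MOPS/W")
```

Because the divisor is the base clock rather than the actual Turbo clock, a more aggressive Turbo shows up as a higher MOPS/MHz figure, which is why this metric widens in SB's favour when Turbo is on.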

Final Thoughts / Conclusions

The reworked version of Intel's Turbo Boost shows some performance improvements here and there. Not all the tests revealed something extraordinary, because there is more than one trigger for the frequency adjustment: the number of active cores, the estimated current consumption and the processor temperature. To raise Turbo on one core to a higher level, the other cores must be idle, which does not happen very often in "real life" situations. So it is up to you whether to leave the overclocking dynamic or to buy a "K" series part and raise it manually, because the premium is worth it.

  • AVX Technology allows it to win all computational benchmarks - though little software currently supports it.

  • AES Technology greatly improves cryptography tasks, with AVX/SHLD/ADC improving hashing performance.

  • Thermal power decreased markedly, with 50% better power efficiency. This is not to be sniffed at.

  • Better memory and L3 cache performance, but with only 2 channels, while a 3-channel Nehalem is still faster; if you need the bandwidth, use a 3-channel Nehalem or its future replacement.

  • Integrated GPU with DirectX 10.1 support and GPGPU DirectCompute support which matches low-end discrete/external graphics solutions. While desktop users might not be impressed, HTPC and mobile users should be overjoyed.

  • Turbo 2 technology is far more aggressive (+6 for 1 core compared to +2) and yields good results even when using highly threaded workloads like Sandra.

  • Trivia: The nickname for "Sandra" is "Sandy" thus they both share the same name. Coincidence or are there dark forces at work?
