What is Exynos?
“Exynos” is the family of mobile SoCs from Samsung; the CPU cores in the modern versions are standard ARM Cortex designs (A57+A53 in the 5433) while the (GP)GPU core is ARM’s “Mali” – unlike Qualcomm’s competing Snapdragon SoCs, which use Qualcomm’s own “Krait” CPU and “Adreno” GPU cores.
The Snapdragon SoCs we compare against come in various series, with series 800 (Prime) representing the top of the range and lower-numbered series (e.g. 600, 400, 200, etc.) representing lower performance. Within the same series, higher numbers (e.g. 805, 801, 800, etc.) represent a newer generation and generally better performance and more features.
The Snapdragon CPU cores are called “Krait” and are Qualcomm’s own design under ARM licence – they are not standard ARM Cortex cores. The latest Krait 400 series shares many features with the Cortex A15 – though some features are similar to the older Cortex A8/A9.
In this article we test CPU core (Cortex A57+A53) performance; please see our other articles on the GPGPU performance and the cache and memory performance of the Exynos 5433.
Hardware Specifications
We are comparing the internal CPU cores of various modern SoCs in the latest phones and tablets.
SoC Specifications | Samsung Exynos 5433 / Samsung Galaxy Note 4C | Qualcomm Snapdragon 805 / Samsung Galaxy Note 4F | Qualcomm Snapdragon 801 / Sony Xperia Z3 | Qualcomm Snapdragon 600 / Samsung Galaxy S4 LTE | Samsung Exynos 5420 / Samsung Note 10 – 2014 Edition | Comments | |
CPU Arch / ARM Arch | Cortex A57+A53 ARMv8-A | Krait 450 (APQ8084) ARMv7-A | Krait 400 (MSM8974-AC) ARMv7-A | Krait 300 (MSM8960) ARMv7-A | Cortex A15+A7 ARMv7-A | While the Cortex A5x series are 64-bit, the OS of Note 4 runs in 32-bit mode, thus ARMv7 normal code. It is unclear whether there will ever be a 64-bit version for this phone. | |
Cores (CU) / Threads (SP) | 4C + 4c / 8 threads simultaneously | 4C / 4 threads | 4C / 4 threads | 4C / 4 threads | 4C + 4c / 4 threads | Except Exynos which is big.LITTLE and has 4 big and 4 little cores, all other CPUs are quad-core. However, the Exynos 5433 can actually run 8 threads at the same time vs. 4 threads for the other CPUs including the older 5420. | |
Speed (Min / Max / Turbo) (MHz) | 400-1900 (400-1300 / 700-1900) | 300-2650 | 300-2466 | 384-1890 | 250-1900 (500-1300 / 600-1900) | We see the Krait 400 series pushing towards 3GHz while Cortex designs hover around 2GHz, thus relying on doing more work per clock rather than on clock speed. | |
L0D / L0I Caches (kB) | n/a | 4x 4kB | 4x 4kB | 4x 4kB | n/a | All Kraits have very small L0 caches while Cortex is a more traditional design. | |
L1D / L1I Caches (kB) | 2x 4x 32kB | 4x 16kB | 4x 16kB | 4x 16kB | 2x 4x 32kB | Cortex has 2x larger L1 caches than Krait but supposedly a bit slower. | |
L2 Caches (MB) | 2MB | 2MB | 2MB | 2MB | 2MB | All designs have the same size L2 cache. |
Native Performance
We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets (Neon2, Neon, etc.).
Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.
Environment: Android 5.x.x, latest updates (May 2015).
Native Benchmarks | Samsung Exynos 5433 / Cortex A57+A53 | Qualcomm Snapdragon 805 / Krait 450 | Qualcomm Snapdragon 801 / Krait 400 | Qualcomm Snapdragon 600 / Krait 300 | Samsung Exynos 5420 / Cortex A15+A7 | Comments | |
Native Dhrystone (GIPS) | 17.07 | 17.24 [+1%] | 14.7 | 10.3 | 14.5 | Here both 5433 and 805 are neck-and-neck within 1% difference. Despite its much higher clock (+40%) the Krait 450 just keeps up with the latest Cortex A57. | |
Native FP64 (Double) Whetstone (GFLOPS) | 90 [+7%] | 84 | 74 | 62 | 92 | 5433 flexes its FP muscles here, being 7% faster than the 805 despite the latter’s much higher clock. While double-precision floating-point workloads are uncommon on mobiles/tablets, their use is increasing as more complex apps are ported. | |
Native FP32 (Float) Whetstone (GFLOPS) | 162 [+3%] | 157 | 136 | 108 | 73 | With FP32 VFP code, the 5433 is only 3% faster. | |
Despite the 805’s much higher clock (+40%), the two CPUs are within 3-7% of each other in the floating-point tests. Naturally the 5433 also supports ARMv8 64-bit but is forced to run in legacy ARMv7 mode. | |||||||
Native Integer (Int32) Multi-Media (Mpix/s) | 22.48 Neon [+45%] | 15.5 Neon | 12.4 Neon | 10.7 Neon | 15.1 Neon | Krait never seemed to do very well with SIMD (Neon) code and here we see 5433 being 45% faster than 805, the largest we’ve seen so far. ARM has really improved SIMD performance in modern Cortex cores with A15 already handily beating Krait designs. Qualcomm needs to overhaul the SIMD units to remain competitive. | |
Native Long (Int64) Multi-Media (Mpix/s) | 4.3 Neon [+67%] | 2.57 Neon | 2.19 Neon | 1.86 Neon | 2.47 Neon | With 64-bit Neon workload we see 5433 pull ahead, 67% faster than the 805! For integer SIMD multi-media code, Cortex is the core to beat! | |
Native Quad-Int (Int128) Multi-Media (kpix/s) | 932 [+27%] | 730 | 681 | 520 | 577 | With normal int64 code, the 5433 still leads but that lead falls to 27%. It would naturally do better in 64-bit mode if it were running a 64-bit OS. | |
Native Float/FP32 Multi-Media (Mpix/s) | 20.2 Neon [+25%] | 16.2 Neon | 14.13 Neon | 10.45 Neon | 13.57 Neon | Switching to floating-point Neon SIMD code, the 5433 is still 25% faster than the 805. | |
Native Double/FP64 Multi-Media (Mpix/s) | 7.59 [+31%] | 5.78 | 4.6 | 3.89 | 4.16 | Switching to FP64 VFP code (Neon only supports FP64 in ARMv8), the 5433 is still 31% faster than the 805. | |
Native Quad-Float/FP128 Multi-Media (kpix/s) | 301 [=] | 295 | 257 | 184 | 190 | In this heavy algorithm using FP64 to mantissa extend FP128, we finally have the 5433 slowing down and just matching the 805. Cortex’s power is realised with SIMD code. | |
With highly-optimised Neon SIMD code, the Cortex A57 that powers 5433 makes mince-meat out of the 805’s Krait being between 25-67% faster despite the much lower clock speed. Qualcomm really needs to improve those SIMD units or risk being badly left behind. Naturally if the 5433 were running in ARMv8 64-bit mode the difference would be much higher. | |||||||
Crypto SHA2-512 (MB/s) | 118 Neon [+2.26x] | 52 Neon | 45 Neon | 32 Neon | 67 Neon | Starting with this tough 64-bit Neon SIMD accelerated hashing algorithm, 5433 again flexes its SIMD muscles, beating the 805 by over 2x (2.26x faster). It shows just how much better modern Cortex cores are at executing SIMD code. | |
Crypto AES-256 (MB/s) | 136 | 147 [+8%] | 130 | 90 | 146 | In this non-SIMD workload, the 805 manages to be 8% faster – a surprising result. While the 5433 does support AES HWA that is only in ARMv8 mode. | |
Crypto SHA2-256 (MB/s) | 332 Neon [+46%] | 227 Neon | 225 Neon | 148 Neon | 186 Neon | Switching to a 32-bit Neon SIMD, 5433 is on top again beating the 805 by 46%. Again, Cortex A5x does support SHA HWA but only in ARMv8 mode. | |
Crypto AES-128 (MB/s) | 216 [+30%] | 166 | 145 | 109 | 165 | Fewer rounds do seem to make a bit of a difference with 5433 now winning by 30% over the 805. | |
Crypto SHA1 (MB/s) | 362 Neon [+14%] | 315 Neon | 250 Neon | 213 Neon | 297 Neon | SHA1 is the “lightest” compute workload and here 5433 is only 14% faster. | |
Again in SIMD Neon code the 5433 shows its power, beating the 805 by between 14% and 126%, similar to what we saw in the Mandelbrot tests. Naturally the 5433 also supports both AES and SHA HWA but only in ARMv8 mode, which needs a 64-bit OS. Here x86 does better as all instruction sets are available in both x86 and x64, unlike ARM which conveniently seems to forget about the 32-bit world. | |||||||
Black-Scholes float/FP32 (MOPT/s) | 11.79 [+42%] | 8.29 | 5.36 | 5.4 | 6.12 | Although this algorithm does not use SIMD, the 5433 still manages to handily beat the 805 by 42%. | |
Black-Scholes double/FP64 (MOPT/s) | 6.11 [+47%] | 4.14 | 2.83 | 3.24 | 3.28 | Switching over to FP64 code, the 5433 still manages to be 47% faster – the 805 just cannot get a break! | |
Binomial float/FP32 (kOPT/s) | 1.26 | 2.03 [+61%] | 1.76 | 1.29 | 1.98 | Binomial uses thread shared data thus stresses the cache & memory system; here finally we see the 805 pull ahead by 61%, a big win considering past results. | |
Binomial double/FP64 (kOPT/s) | 1.26 | 1.85 [+46%] | 1.71 | 1.53 | 2.41 | Switching to FP64 code the 805 still wins but by just 46%. It seems this is the kind of algorithm it prefers. | |
Monte-Carlo float/FP32 (kOPT/s) | 2.51 [+2x] | 1.26 | 1.42 | 1.04 | 1.14 | Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches; the fortunes are reversed again as 5433 is now 2x (twice) as fast as the 805. | |
Monte-Carlo double/FP64 (kOPT/s) | 1.87 [+2.08x] | 0.897 | 0.711 | 0.883 | 1.23 | And finally, FP64 code does not make any difference: again the 5433 is 2x as fast. | |
The financial tests generally favour the 5433 which is between 40-100% faster than the 805, except in the “tough” binomial test where the 805 is between 40-60% faster. Even in VFP code the Cortex A5X is the core to beat! | |||||||
SGEMM (MFLOPS) float/FP32 | 3906 Neon [+9%] | 3579 Neon | 3644 Neon | 2626 Neon | 4889 Neon | In this complex Neon SIMD workload we would expect the 5433 to lead, and it does but only by 9%. It seems again that memory accesses slow it down and some of the 8 threads may be starving. | |
DGEMM (MFLOPS) double/FP64 | 1454 [+2.05x] | 707 | 797 | 547 | 531 | Neon does not support FP64 in ARMv7 mode, thus all CPUs use VFP code; here 5433 shows its power, being over twice as fast as the 805. | |
SFFT (GFLOPS) float/FP32 | 720 Neon | 989 Neon [+37%] | 919 Neon | 620 Neon | 708 Neon | FFT also uses SIMD and thus Neon but stresses the memory sub-system more: as we saw in Binomial, the 805 is in the lead, here by 37%. | |
DFFT (GFLOPS) double/FP64 | 457 | 586 [+28%] | 550 | 401 | 399 | With FP64 VFP code, the 805 still leads by 28%. It seems the memory sub-system of the 5433 lets it down. | |
SNBODY (GFLOPS) float/FP32 | 758 Neon [+80%] | 420 Neon | 331 Neon | 342 Neon | 465 Neon | N-Body simulation is SIMD heavy and has many memory accesses to shared data, but these are read-only, which allows the 5433 to win again by 80%. It seems read-only data is not a problem, but read/modify/write is. | |
DNBODY (GFLOPS) double/FP64 | 339 [+46%] | 232 | 183 | 145 | 199 | With FP64 VFP code we see the 5433 still winning but by just 46%. | |
The results mirror what we saw in the Financial tests: whenever thread-shared memory is used that is read/modified/written – the 5433 slows down, no doubt the extra 4 threads don’t help matters and likely slow it down. | |||||||
Inter-Core Bandwidth (MB/s) | 3994 [+10%] (but ~500/core) | 3599 (but ~899/core) | 2950 (but ~737/core) | 2133 (but ~533/core) | 1349 (but ~337/core) | One thing that Qualcomm does very well is memory performance, both CPU and GPU-wise. But here 5433 has 4 more cores which helps it muscle out its rival with 10% more bandwidth. But while it technically wins, the bandwidth per core is just ~500MB/s while 805 has ~899MB/s, almost 2x more bandwidth. We see how all these caches perform in the Snapdragon 805 Cache and Memory performance article. | |
Inter-Core Latency (ns) | 287 | 121 [-57%] | 118 | 162 | 128 | The 5433’s latency, however, is much higher – or in other words the 805’s latency is 57% lower. It will be interesting to see whether this is due to transfers between different core types (e.g. big-2-LITTLE) or also occurs between cores of the same type (big-2-big / LITTLE-2-LITTLE). |
The 5433 with its modern Cortex A5X design as well as 8 threads walks all over the 805 despite being clocked much lower – especially in SIMD (Neon) tests, where it is up to 2x (twice) as fast. Only in algorithms that make extensive use of thread-shared data that is read, modified and written back does the 805 catch a break and come out faster.
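To put the “clocked much lower” point into numbers, the sketch below normalises the Dhrystone scores from the table by each SoC's maximum clock. It is a rough illustration only: the function names are ours, and dividing a whole-chip score by the big-core clock ignores that the 5433 runs 8 threads, four of them on LITTLE cores at up to 1300MHz.

```python
# Figures copied from the specification and Native Benchmarks tables above.
MAX_CLOCK_MHZ = {"Exynos 5433": 1900, "Snapdragon 805": 2650}
DHRYSTONE_GIPS = {"Exynos 5433": 17.07, "Snapdragon 805": 17.24}


def relative_gain(a, b):
    """Return how much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


def gips_per_ghz(soc):
    """Crude per-clock figure: whole-chip score divided by the maximum clock in GHz."""
    return DHRYSTONE_GIPS[soc] / (MAX_CLOCK_MHZ[soc] / 1000.0)


if __name__ == "__main__":
    clock_gain = relative_gain(MAX_CLOCK_MHZ["Snapdragon 805"],
                               MAX_CLOCK_MHZ["Exynos 5433"])
    print(f"805 clock advantage: {clock_gain:.1f}%")
    for soc in MAX_CLOCK_MHZ:
        print(f"{soc}: {gips_per_ghz(soc):.2f} GIPS per GHz")
```

The clock advantage works out at just under 40%, in line with the “+40%” quoted in the table comments, while the per-GHz figure favours the 5433 by a similar margin.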
It will be interesting to see whether the extra 4 threads (aka LITTLE cores) just get in the way in these tests and put too much strain on the memory system; effectively it may be better to use just 4 threads (aka big cores). We will investigate this in a future article.
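The per-core bandwidth figures quoted in the Inter-Core Bandwidth row follow from dividing the total by the number of threads in use. A minimal sketch of that arithmetic, using the totals and thread counts from the tables (the dictionary layout and function name are ours):

```python
# (total inter-core bandwidth in MB/s, threads used), taken from the tables above.
INTER_CORE = {
    "Exynos 5433": (3994, 8),
    "Snapdragon 805": (3599, 4),
    "Snapdragon 801": (2950, 4),
    "Snapdragon 600": (2133, 4),
    "Exynos 5420": (1349, 4),
}


def per_core_bandwidth(soc):
    """Total inter-core bandwidth divided evenly over the threads in use."""
    total_mbps, threads = INTER_CORE[soc]
    return total_mbps / threads


if __name__ == "__main__":
    for soc in INTER_CORE:
        print(f"{soc}: ~{per_core_bandwidth(soc):.0f} MB/s per core")
    ratio = per_core_bandwidth("Snapdragon 805") / per_core_bandwidth("Exynos 5433")
    print(f"805 vs 5433 per core: {ratio:.2f}x")
```

Seen this way, the 5433's 10% lead in total bandwidth comes entirely from having twice as many threads, each with roughly half the bandwidth of an 805 core.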
Software VM (.Net/Java) Performance
We are testing arithmetic and vectorised performance of software virtual machines (SVM), i.e. Java which is what Android and its apps are running. While key compute code will naturally be native, the rest of the code will naturally run on the JVM.
Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.
Environment: Android 5.x.x, latest updates (May 2015).
While native code showed some surprises, here the 5433 is the undisputed champion – beating the 805 in all tests by a wide margin of 50% to over 100% (2x as fast). For pure Java apps the 5433 should feel a lot faster.
Final Thoughts / Conclusions
It is not really a surprise that the latest ARMv8 64-bit 8-core Cortex A57+A53 (albeit running in 32-bit ARMv7 mode) in Exynos 5433 would dominate the ageing Krait 400-series core in Snapdragon 805 – but the latter’s 40% higher clock could have thrown a few “wobblies”.
Unlike earlier big.LITTLE designs, all 8 cores can be used simultaneously – but this may actually present a problem when using static work allocators, as the “big” cores may wait for the “LITTLE” cores to finish – in effect having 8 LITTLE cores. We will be exploring the differences in performance when using just the “big” cores, just the “LITTLE” cores or all of them in a future article.
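The static allocator problem can be illustrated with a toy model. The sketch below is not how any benchmark here schedules its work; it simply compares an even up-front split with a greedy work queue on 4 fast and 4 slow cores. The relative speeds (1.0 for “big”, 0.5 for “LITTLE”) and the 800 equal work items are hypothetical values chosen for illustration, not measurements.

```python
import heapq


def static_makespan(work_items, speeds):
    """Split items evenly up front; the run ends when the slowest core finishes."""
    n = len(speeds)
    chunks = [work_items[i::n] for i in range(n)]
    return max(sum(chunk) / speed for chunk, speed in zip(chunks, speeds))


def dynamic_makespan(work_items, speeds):
    """Greedy work queue: each item goes to whichever core becomes free first."""
    free_at = [(0.0, core) for core in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for item in work_items:
        start, core = heapq.heappop(free_at)
        end = start + item / speeds[core]
        finish = max(finish, end)
        heapq.heappush(free_at, (end, core))
    return finish


# Hypothetical relative speeds: 4 "big" cores and 4 "LITTLE" cores at half speed.
SPEEDS = [1.0] * 4 + [0.5] * 4
WORK = [1.0] * 800  # 800 equal-sized work items

if __name__ == "__main__":
    print("static :", static_makespan(WORK, SPEEDS))
    print("dynamic:", dynamic_makespan(WORK, SPEEDS))
    print("8 LITTLE cores, static:", static_makespan(WORK, [0.5] * 8))
```

Under these assumptions the static split takes exactly as long as 8 LITTLE cores would, because the big cores sit idle once their share is done, whereas the work queue keeps them busy and finishes in about two thirds of the time.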
It is naturally a pity that the 5433 does not use a 64-bit Android version and thus benefit from all the ARMv8 improvements, not to mention new instruction sets like AES HWA, SHA HWA, FP64 Neon and so on. It seems that Samsung (like other vendors we may add) may never actually release a 64-bit OS/ROM for it – and thus the 5433, like other Cortex A5x SoCs, is destined to run 32-bit for its whole life… Without 64-bit binary drivers there may not be a way for third-party developers (modders?) to make a 64-bit OS either…
However, even under these circumstances the Exynos-powered Note 4 is the most powerful Note (CPU-wise) – though the roles seem to be reversed when comparing the GPUs as we saw in the previous article Exynos 5433 (Mali) GPGPU performance. Thus the decision as to which Note 4 to choose is more difficult – do you want CPU or GPU power? As lots of compute tasks are moving to GPGPU (even on tablets/phones) – we would lean towards GPU prowess… Don’t forget to consider memory performance which we’re investigating in the next article Exynos 5433 Cache and Memory performance.