Raspberry Pi 4B Review: Windows Arm64 on Broadcom BCM2711

What is “Raspberry Pi 4B”?

It is the 4th generation of the Raspberry Pi series of single-board computers (SBC) that can be considered to have single-handedly restarted the enthusiast computer revolution of the 1980s. Selling millions of units per year, it is the best-selling UK computer and is used far and wide for many projects.

Unlike the x86 world, where main-boards need discrete CPUs, graphics boards, memory modules, etc. to function – in the ARM world this is not common, with a single board containing not just the SoC (CPU, graphics) but also soldered memory, Ethernet and WiFi/Bluetooth that are not upgradeable.

The Pi has always used custom-made Broadcom SoCs based on the ARM architecture (BCM series) – with the Pi4 now using AArch64 64-bit Cortex-A cores. While previous Pi versions were limited to 1GB memory, the Pi 4 with 2-8GB can now run a wider range of operating systems and applications – including Windows Arm64.

What is “Windows Arm64”?

It is the 64-bit version of “desktop” Windows 10/11 for AArch64 ARM devices – analogous to the current x64 Windows 10/11 for Intel & AMD CPUs. While “desktop” Windows 8.x has been available for ARM (AArch32) as Windows RT – it did not allow running of non-Microsoft native Win32 applications (like Sandra) and also did not support emulation for current x86/x64 applications.

We should also recall that Windows Phone (though now dead) has always run on ARM devices and was first to unify the Windows desktop and Windows CE kernels – thus this is not a brand-new port. Windows was first ported to 64-bit with the long-dead Alpha64 (Windows NT 4) and then Itanium IA64 (Windows XP 64) which showed the versatility of the NT micro-kernel.

By contrast, Windows 10/11 Arm64 is able to run both native AArch64 applications (compiled for Arm64 like Sandra) as well as emulating x86/x64 and also ARM (AArch32) native applications through WOW (Windows on Windows emulation). While it does come with native versions of in-box drivers for many peripherals/devices – it may not have a driver for a very new peripheral/device, and the manufacturers are unlikely to provide support for Arm64. For some devices, standard in-box “class” drivers (e.g. NVMe controller/SSD, AHCI controller/SSD, USB controller, keyboard, mouse, etc.) do work but otherwise a driver is required.

How do I get Windows Arm64 on Raspberry Pi 4B?

Thanks to a team of extremely talented developers (far more talented than us) and lots of work – firmware (BIOS/UEFI), drivers and even an installer that automatically creates a bootable micro-SD or USB drive are available:

What hardware do I need to run Windows Arm64 on Raspberry Pi 4B?

In addition to the Pi itself, we recommend additional hardware as Windows is quite a demanding OS:

While the 2GB Pi 4 version has just about enough memory to run Windows 10/11, more memory is good for disk caching with the 4GB version likely best value. While original firmware/drivers required limiting memory to 3GB – current firmware/drivers can use up to the full 8GB.

The most important thing is the storage as most micro-SD cards do not have the I/O small-block performance necessary; A2-rated micro-SD cards require updated SD controllers that the Pi does not have. Your best bet is using the USB 3.0 controller with either a SATA or NVMe SSD or a high-performance USB stick.

Despite firmware updates, the Pi 4, especially overclocked, does require decent cooling – thus use either an all-metal case (if you prefer no noise) or a fan/heatsink combo should you want to overclock higher. Do note that metal cases do attenuate the WiFi/Bluetooth signal from the internal antenna – however you are best using the Ethernet connectivity for best network performance.

Raspberry Pi 4 SoC Details

  • Broadcom BCM2711 SoC custom-made for the Pi Foundation
  • ARMv8-A (Arm64) 64-bit core
  • 4x ARM Cortex A72 “big-cores” (vs. Cortex A53 “little-cores” in Pi 3B)
  • 1.5GHz base clock, generally over-clockable to 2GHz
  • 16nm++ process
  • Unified 1MB L2 cache (vs. 512kB on Pi 3B)
  • USB 3.0 controller (vs. USB 2.0 on Pi 3B) enabling much faster I/O
  • USB-C power port – only 5v up to 3A

The (arguably) most important upgrade vs. Pi 3B(+) is the larger memory versions (2, 4, 8GB) that allow the running of far heavier operating systems (OS) like Windows that are not really usable with just 1GB memory. In addition, the new USB 3.0 controller allows the use of much faster storage – far beyond the standard SD controller that all Pis generally use.

ARM Advanced Instructions Support

SIMD: As in the x86 world, ARM supports SIMD instructions called “NEON” operating on 128-bit width registers – equivalent to SSE2. However, there are 32 of them – while x64 SSE/AVX only provide 16 – until AVX512 which also provides 32. In most algorithms we can use them in batches of 4 – effectively making them 512-bit!
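The register arithmetic above can be made concrete with a minimal sketch. It only restates the figures from the text (128-bit registers, batches of 4); the function names are illustrative and are not part of Sandra or of any NEON API.

```python
NEON_REGISTER_BITS = 128  # width of one NEON register, as stated in the text
BATCH_SIZE = 4            # registers processed together, as stated in the text


def lanes_per_register(element_bits: int, register_bits: int = NEON_REGISTER_BITS) -> int:
    """Return how many elements of the given width fit in one register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must be positive and divide the register width")
    return register_bits // element_bits


def effective_width_bits(batch: int = BATCH_SIZE, register_bits: int = NEON_REGISTER_BITS) -> int:
    """Return the combined width of a batch of registers handled together."""
    return batch * register_bits


if __name__ == "__main__":
    for name, bits in (("Int32/FP32", 32), ("Int64/FP64", 64)):
        per_register = lanes_per_register(bits)
        per_batch = per_register * BATCH_SIZE
        print(f"{name}: {per_register} lanes per register, {per_batch} per batch of {BATCH_SIZE}")
    print(f"Effective batch width: {effective_width_bits()} bits")
```

Note that a batch of 4 registers still issues 4 separate 128-bit operations; the sketch only counts data width and says nothing about throughput.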

Unlike newer cores, the older A7X cores do not support SVE (“Scalable Vector Extension” – the successor of NEON). Current SVE designs are still just 128-bit wide, but they do provide more flexibility, especially when implementing complex algorithms.

Crypto: Similar to x86, ARM does provide hardware-accelerated (HWA) encryption/decryption (AES, SM4) as well as hashing (SHA1, SHA2, SHA3, SM3) – but the Broadcom SoC cores have these features disabled! Thus the Pi does not perform well as a crypto device.

Virtualisation: The ARM cores do have hardware virtualisation – but currently the UEFI firmware does not provide the required functionality to enable Hyper-V (which is not publicly available anyway). You will need VMware ESXi ARM edition or KVM (e.g. Proxmox 7 PVE) and then perhaps try to run Windows 10/11 Arm64 on it!

Security Extensions: ARM cores since ARMv6 (!) have included TrustZone secure virtualisation. AMD’s own recent CPUs all contain an ARM core (Cortex A5?) supporting TrustZone handling the security functionality (e.g. PSP / firmware-emulated TPM). See our article
Crypto-processor (TPM) Benchmarking: Discrete vs. internal AMD, Intel, Microsoft HV.

Changes in Sandra to support ARM

As a relatively old piece of software (launched all the way back in 1997 (!)), Sandra contains a good amount of legacy but optimised code, with the benchmarks generally written in assembler (MASM, NASM and previously YASM) for x86/x64 using various SIMD instruction sets: SSE2, SSSE3, SSE4.1/4.2, AVX/FMA3, AVX2 and finally AVX512. All this had to be translated into more generic C/C++ code using a templated intrinsics implementation for both x86/x64 and ARM/Arm64.

As a result, some of the historic benchmarks in Sandra have substantially changed – with the scores changing to some extent. This cannot be helped as it ensures benchmarking fairness between x86/x64 and ARM/Arm64 going forward.

For this reason we recommend using the very latest version of Sandra and keeping up with updated versions that likely fix bugs and improve performance and stability.

CPU Performance Benchmarking

In this article we test CPU core performance; please see our other articles on:

Hardware Specifications

We are comparing the Raspberry Pi with Atom x64 processors of similar vintage – all running current Windows 10, latest drivers.

| Specifications | Raspberry Pi 4B (BCM2711) | Raspberry Pi 3B+ (BCM2837) | Intel Celeron J3455 (Apollo Lake) | Intel Pentium Silver N5030 (Gemini Lake) | Comments |
| --- | --- | --- | --- | --- | --- |
| Arch(itecture) | Cortex A72 (Arm64) 16nm | Cortex A53 (Arm64) 28nm | Atom “Apollo Lake” (APL) (x64) 14nm | Atom “Gemini Lake” (GML) (x64) 14nm | Big Arm-core vs. Little x86 |
| Launch Date | 2019 | 2016 | 2016 Q3 | 2019 Q4 | Similar age |
| Cores (CU) / Threads (SP) | 4C / 4T | 4C / 4T | 4C / 4T | 4C / 4T | Same number of threads |
| Rated Speed (GHz) | 1.5 | 1.2 | 1.5 | 1.1 | Similar base clock |
| All/Single Turbo Speed (GHz) | 2.0-2.2 | 1.4-1.6 | 2.3 | 3.1 | Much lower turbo |
| Rated/Turbo Power (W) | 4-6 | 3-5 | 10 / 14 | 6 / 14 | Far less power for whole SBC |
| L1D / L1I Caches | 4x 32kB 2-way / 4x 48kB 3-way | 4x 32kB 4-way / 4x 32kB 2-way | 4x 32kB 4-way / 4x 32kB 4-way | 4x 32kB 4-way / 4x 32kB 4-way | Similar L1 caches |
| L2 Caches | 1MB 16-way | 512kB 16-way | 2x 1MB | 4MB | Atom has 2-4x bigger L2 |
| L3 Cache(s) | n/a | n/a | n/a | n/a | None have L3 |
| Microcode (Firmware) | 1.37 | 1.37 | 0506CA-1C | 0706A8-16 | Updates keep on coming |
| Special Instruction Sets | v8-A, VFP4, TZ, Neon | v8-A, VFP4, TZ, Neon | AES, VT-x, VT-d, SSE4.2 | AES, VT-x, VT-d, SSE4.2 | No crypto on BCM – big loss |
| SIMD Width / Units | 128-bit | 128-bit | 128-bit | 128-bit | Same width |
| Price / RRP (USD) | $55 (whole SBC) | $35 (whole SBC) | $31 (CPU only) | ~$41 (CPU only) | Pi price is for whole SBC! (including memory) |

Disclaimer

This is an independent article that has not been endorsed nor sponsored by any entity (e.g. Raspberry, Broadcom, Intel, etc.). All trademarks acknowledged and used for identification only under fair use.

The article contains only public information (available elsewhere on the Internet) and not provided under NDA nor embargoed.

Note: We (SiSoftware) claim copyright over the scores (benchmark results) posted to the Ranker. Please see:
Privacy: Who owns the data (scores) posted to the Ranker?

And please, don’t forget small ISVs like ourselves in these very challenging times. Please buy a copy of Sandra if you find our software useful. Your custom means everything to us!

Native Performance

We are testing native arithmetic, SIMD and cryptography performance using the highest performing instruction sets, on both x64 and Arm64 platforms.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64/Arm64, latest drivers. 2MB “large pages” were enabled and in use. Turbo / Boost was enabled on all configurations where supported.

| Native Benchmarks | Raspberry Pi 4B (BCM2711) 1.5GHz 4W | Raspberry Pi 4B (BCM2711) 2GHz 6W | Intel Celeron J3455 (Apollo Lake) 10W | Intel Pentium Silver N5030 (Gemini Lake) 6W | Comments |
| --- | --- | --- | --- | --- | --- |
| CPU Arithmetic Benchmark Native Dhrystone Integer (GIPS) | 25 | 33.2 [-17%] | 40 | 42 | Pi4 starts just under 20% slower. |
| CPU Arithmetic Benchmark Native Dhrystone Long (GIPS) | 25 | 32.76 [-18%] | 40 | 42 | A 64-bit integer workload is the same. |
| CPU Arithmetic Benchmark Native FP32 (Float) Whetstone (GFLOPS) | 18.6 | 26.24 [-7%] | 26.89* | 32.48* | With floating-point, Pi4 is only 7% slower. |
| CPU Arithmetic Benchmark Native FP64 (Double) Whetstone (GFLOPS) | 18 | 23.31 [+25%] | 18.66* | 27* | With FP64 it beats APL by 25%! |

In these standard C legacy tests, the Pi4 performs quite well against Atom, considering the latter can turbo much higher. When overclocked (OC’d) to 2GHz, it almost matches the old “ApolloLake” (APL) Atom running 20% faster.

The newer “GeminiLake” (GML) Atom can turbo 2x higher than Pi4 (3.1Ghz vs. Pi 4 base of 1.5) and thus performs much better. The performance/clock is similar while the Pi4 total power is less than what only the Atom SoC consumes.
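The performance/clock remark can be checked with a rough sketch using the Dhrystone Integer scores and the clock figures from the specification table. It assumes each Atom sustains its listed turbo clock for the whole run, which the article does not confirm, so treat the per-GHz figures as indicative only. The function names are illustrative.

```python
# (score in GIPS, clock in GHz). Scores are the Dhrystone Integer results;
# clocks come from the specification table. Assumption: the Atoms hold their
# listed turbo clock (2.3GHz and 3.1GHz) throughout the benchmark.
RESULTS = {
    "Pi 4B @ 1.5GHz": (25.0, 1.5),
    "Pi 4B @ 2GHz": (33.2, 2.0),
    "Celeron J3455 (APL)": (40.0, 2.3),
    "Pentium Silver N5030 (GML)": (42.0, 3.1),
}


def gips_per_ghz(score: float, clock_ghz: float) -> float:
    """Return the score normalised by clock speed."""
    if clock_ghz <= 0:
        raise ValueError("clock must be positive")
    return score / clock_ghz


def relative_delta_percent(value: float, baseline: float) -> float:
    """Return how far value sits above (+) or below (-) baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


if __name__ == "__main__":
    for name, (score, clock) in RESULTS.items():
        print(f"{name}: {gips_per_ghz(score, clock):.1f} GIPS/GHz")
    oc_pi = RESULTS["Pi 4B @ 2GHz"][0]
    apl = RESULTS["Celeron J3455 (APL)"][0]
    print(f"Pi 4B @ 2GHz vs. APL: {relative_delta_percent(oc_pi, apl):+.0f}%")
```

The same `relative_delta_percent` helper reproduces the bracketed Dhrystone Integer figure (Pi4 at 2GHz against the APL Atom).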

Note*: using SSE2-3 SIMD processing.

| Native Benchmarks | Raspberry Pi 4B (BCM2711) 1.5GHz 4W | Raspberry Pi 4B (BCM2711) 2GHz 6W | Intel Celeron J3455 (Apollo Lake) 10W | Intel Pentium Silver N5030 (Gemini Lake) 6W | Comments |
| --- | --- | --- | --- | --- | --- |
| BenchCpuMM Native Integer (Int32) Multi-Media (Mpix/s) | 25 | 35.42 [1/3x] | 92* | 105* | Atom is 3x faster than even OC Pi4. |
| BenchCpuMM Native Long (Int64) Multi-Media (Mpix/s) | 17.5** | 24.61** [-28%] | 34* | 39* | With a 64-bit integer workload, Pi4 is just 28% slower. |
| BenchCpuMM Native Quad-Int (Int128) Multi-Media (Mpix/s) | 2.3** | 3.13** [-33%] | 4.63* | 5.73* | Using 64-bit int to emulate Int128, Pi4 is 33% slower. |
| BenchCpuMM Native Float/FP32 Multi-Media (Mpix/s) | 34.72** | 53** [-20%] | 66* | 70.6* | In this floating-point vectorised test Pi4 is 20% slower. |
| BenchCpuMM Native Double/FP64 Multi-Media (Mpix/s) | 20.36** | 28.25** [-24%] | 37* | 37.8* | Switching to FP64, Pi4 is 24% slower. |
| BenchCpuMM Native Quad-Float/FP128 Multi-Media (Mpix/s) | 0.87** | 1.3** [-30%] | 1.85* | 2.51* | Using FP64 to mantissa extend FP128, Pi4 is 30% slower. |

With heavily vectorised SIMD workloads – Pi4 even at 2GHz is 20-30% slower than the old Atom and pretty much outclassed by the new Atom. It seems Windows as a platform is not as optimised for Arm64 as for x64, with more work needed to generate performant code.

Still ARM/Neon is holding its own against SSE2-4/x64 on these older Atoms. However, with the latest Atom cores (as deployed in hybrid AlderLake/RaptorLake) supporting AVX2/FMA3 for the first time – ARM will have SVE-enabled cores to counter.

The Raspberry Pi has never used the very latest ARM cores in its designs, with the A72 already 2 years old by the time the Pi 4 launched – it is meant to be a cheap SBC and not a desktop/laptop class design. That falls to companies like Qualcomm, whose SoCs are used in the Samsung Galaxy Book range of devices.

Note*: using SSE2/4 128-bit (or higher) SIMD processing.

Note**: using NEON 128-bit (or higher) Advanced SIMD processing.

BenchCrypt Crypto AES-256 (GB/s) 0.24 0.33 [1/10x] 3.96* 5.87* No hardware acceleration, Pi is 1/10 of Atom.
BenchCrypt Crypto AES-128 (GB/s) 0.34 0.45 No change with AES128.
BenchCrypt Crypto SHA2-256 (GB/s) 0.4 0.53 [1/3x] 1.61** 1.85** No SIMD, Pi is only 1/3 of Atom.
BenchCrypt Crypto SHA1 (GB/s) 0.56 0.75 Less compute intensive SHA1.
BenchCrypt Crypto SHA2-512 (GB/s) 0.625 SHA2-512 is not accelerated by SHA HWA.
Allegedly for licensing/cost reasons, the BCM range as used in the Pi enables neither the AES nor the SHA crypto hardware-acceleration (HWA) instructions – even though pretty much all other ARMv8 cores include them! This means we are forced to use software emulation, which is about 10x (ten times) slower – a big shame.

In the meantime, we are converting the multi-buffer SSE4 hashing code to NEON which should greatly improve performance by hashing 4x buffers simultaneously. This should match the Atom, although the relatively low memory bandwidth (LP-DDR4) may hinder performance to some extent.

Note*: using AES HWA (hardware acceleration).

Note**: using multi-buffer SSE4 (4x) hashing.

Note***: using SHA HWA (hardware acceleration).

Note****: using multi-buffer Neon (4x) hashing.

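| Native Benchmarks | Raspberry Pi 4B (BCM2711) 1.5GHz 4W | Raspberry Pi 4B (BCM2711) 2GHz 6W | Intel Celeron J3455 (Apollo Lake) 10W | Intel Pentium Silver N5030 (Gemini Lake) 6W | Comments |
| --- | --- | --- | --- | --- | --- |
| CPU Multi-Core Benchmark Inter-Module (CCX) Latency (Same Package) (ns) | 85 | 76 | 74.1 [+2%] | 87.4 | Similar latency to Atom. |

Without SMT and with a single cluster, there is no difference between thread-pairings (no “best”/“worst” case) and we have a unified L2.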

Even at base clock the latency is competitive with Atom and greatly reduces at 2GHz. But without SMT, L1D caches cannot be used for fast inter-thread transfers and the L2 latency seems to be relatively high (inter-lock read/check/modify). We shall see later how the unified L2 cache performs.

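| Native Benchmarks | Raspberry Pi 4B (BCM2711) 1.5GHz 4W | Raspberry Pi 4B (BCM2711) 2GHz 6W | Intel Celeron J3455 (Apollo Lake) 10W | Intel Pentium Silver N5030 (Gemini Lake) 6W | Comments |
| --- | --- | --- | --- | --- | --- |
| CPU Multi-Core Benchmark Total Inter-Thread Bandwidth – Best Pairing (GB/s) | 2.5** | 3.12** [+27%] | 2.44* | 3.92* | Pi4 manages to beat the old Atom and almost match the new one. |

Without SMT and with a single cluster, there is no difference between thread-pairings (no “best”/“worst” case) and we have a unified L2.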

Clocked at 2GHz, the Pi4 with its new(er) inter-core bus does really well here, with great inter-core bandwidth that beats the old APL Atom by ~30% and is just below the new Atom. However, without SMT the L1D caches cannot be used for inter-core transfers and the L2 cache does not seem to be that fast. We shall see later how the unified L2 cache performs.

Note*: using SSE 128-bit wide transfers.

Note**: using NEON 128-bit wide transfers.

| Native Benchmarks | Raspberry Pi 4B (BCM2711) 1.5GHz 4W | Raspberry Pi 4B (BCM2711) 2GHz 6W | Intel Celeron J3455 (Apollo Lake) 10W | Intel Pentium Silver N5030 (Gemini Lake) 6W | Comments |
| --- | --- | --- | --- | --- | --- |
| BenchFinance Black-Scholes float/FP32 (MOPT/s) | 18.85 | 24.66 | | | Black-scholes is un-vectorised and compute heavy. |
| BenchFinance Black-Scholes double/FP64 (MOPT/s) | 15.36 | 20.43 [-4%] | 21.1 | 26.51 | Using FP64 Pi4 matches Atom. |
| BenchFinance Binomial float/FP32 (kOPT/s) | 9.55 | 12.75 | | | Binomial uses thread shared data thus stresses the cache & memory system. |
| BenchFinance Binomial double/FP64 (kOPT/s) | 4.18 | 5.53 [-33%] | 8.32 | 9.23 | With FP64 Pi4 is 33% slower. |
| BenchFinance Monte-Carlo float/FP32 (kOPT/s) | 6.39 | 8.54 | | | Monte-Carlo also uses thread shared data but read-only thus reducing modify pressure on the caches. |
| BenchFinance Monte-Carlo double/FP64 (kOPT/s) | 2.63 | 3.5 [-70%] | 11.33 | 14.03 | Switching to FP64 Pi4 is a lot slower here. |

With non-SIMD financial workloads, similar to what we’ve seen in legacy floating-point code (Whetstone), Pi4 does pretty well in some algorithms – though those involving thread-sharing data (Binomial, Monte-Carlo) seem to take a big hit.

In any case, such code is these days best offloaded to the GPU via one of the GP-GPU interfaces (OpenCL, Vulkan, DirectX Compute, etc.) – but as the Pi4 does not have a “proper” video driver with compute acceleration, we cannot really test its prowess.

BenchScience SGEMM (GFLOPS) float/FP32 17.28** 18.17** In this tough vectorised algorithm Pi4 does well.
BenchScience DGEMM (GFLOPS) double/FP64 6.15** 6.55** [-36%] 10.1* 11.01* With FP64 vectorised code, Pi4 is 36% slower.
BenchScience SFFT (GFLOPS) float/FP32 0.652** FFT is also heavily vectorised but memory dependent.
BenchScience DFFT (GFLOPS) double/FP64 0.506** 0.48** [1/5x] 2.07* 2.5* With FP64 code, Pi4 is 1/5 the Atom performance.
BenchScience SN-BODY (GFLOPS) float/FP32 16.5** N-Body simulation is vectorised but with more memory accesses.
BenchScience DN-BODY (GFLOPS) double/FP64 4.38** 5.81** [+82%] 3.18* 7.21* With FP64 Pi4 is finally 82% faster than the old Atom.
With highly vectorised SIMD code (scientific workloads), it is clear that a lot of work is needed to optimise code for ARM to get it to match x86/x64 in performance. In some algorithms (GEMM, N-BODY) it is doing well but in memory latency bound algorithms that also stress unified caches (L2 here) (FFT) – performance does suffer.

Note*: using SSE2-4 128-bit SIMD (or wider).

Note**: using NEON 128-bit SIMD (or wider).

| Native Benchmarks | Raspberry Pi 4B (BCM2711) 1.5GHz 4W | Raspberry Pi 4B (BCM2711) 2GHz 6W | Intel Celeron J3455 (Apollo Lake) 10W | Intel Pentium Silver N5030 (Gemini Lake) 6W | Comments |
| --- | --- | --- | --- | --- | --- |
| CPU Image Processing Blur (3×3) Filter (MPix/s) | 123 | 165 [+3%] | 159* | 161* | In this vectorised integer workload Pi4 matches Atom. |
| CPU Image Processing Sharpen (5×5) Filter (MPix/s) | 41.83** | 57.13** [-4%] | 59* | 64* | Same algorithm but more shared data, Pi is 4% slower. |
| CPU Image Processing Motion-Blur (7×7) Filter (MPix/s) | 21.6** | 29** [-13%] | 33* | 36* | Again same algorithm but even more data shared, 13% slower. |
| CPU Image Processing Edge Detection (2*5×5) Sobel Filter (MPix/s) | 30.11** | 40.1** [-9%] | 44* | 50* | Different algorithm but still vectorised workload, Pi4 is 9% slower. |
| CPU Image Processing Noise Removal (5×5) Median Filter (MPix/s) | 2.26** | 3** [-35%] | 4.58* | 4.88* | Still vectorised code, Pi4 is 35% slower. |
| CPU Image Processing Oil Painting Quantise Filter (MPix/s) | 2.46** | 3.37** [-16%] | 4* | 4.05* | In this tough filter, Pi4 is just 16% slower. |
| CPU Image Processing Diffusion Randomise (XorShift) Filter (MPix/s) | 132** | 162.9** [-34%] | 238* | 320* | With 64-bit integer workload, Pi4 is 1/3 slower than Atom. |
| CPU Image Processing Marbling Perlin Noise 2D Filter (MPix/s) | 24.72** | 33** [-25%] | 44* | 49* | In this final test (scatter/gather) Pi4 is 1/4 slower than Atom. |

We know these benchmarks *love* SIMD, with AVX2/AVX512 always performing strongly – and while Pi4 does not have the versatility of SSE2/SSE4, it does very well with NEON – and is just 10-16% slower in most tests. In tests involving scatter/gather and thus memory latency bound – the Pi4 does less well and is 25-35% slower than the old Atom.

Again, these days such image-processing algorithms are offloaded to the GPU and unlikely to be run on the CPU – except for very complex non-linear filters – thus performance is acceptable.

Note*: using SSE2-4 128-bit SIMD (or wider).

Note**: using NEON 128-bit SIMD (or wider).

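| Native Benchmarks | Raspberry Pi 4B (BCM2711) 1.5GHz 4W | Raspberry Pi 4B (BCM2711) 2GHz 6W | Intel Celeron J3455 (Apollo Lake) 10W | Intel Pentium Silver N5030 (Gemini Lake) 6W | Comments |
| --- | --- | --- | --- | --- | --- |
| Aggregate Score (Points) | 260 | 340 [1/2x] | 680 | 790 | Across all benchmarks, Pi4 is 50% slower than Atom. |

Perhaps surprisingly, despite good performance in individual tests, the Pi4 over-clocked to 2GHz still scores only 50% of the old Atom APL, although these are early days; with optimisations we are confident the score will only improve – while the Atom code is pretty much fully optimised and unlikely to yield more performance. Additional SIMD/Neon code paths (e.g. hashing) to compensate for the lack of hardware acceleration will also increase the score significantly.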
| Native Benchmarks | Raspberry Pi 4B (BCM2711) 1.5GHz 4W | Raspberry Pi 4B (BCM2711) 2GHz 6W | Intel Celeron J3455 (Apollo Lake) 10W | Intel Pentium Silver N5030 (Gemini Lake) 6W | Comments |
| --- | --- | --- | --- | --- | --- |
| Price/RRP (USD) | $55 (whole SBC) | $55 (whole SBC) | $31 (tray only), ~$80 board + $20 4GB DDR3 | $41 (tray only), ~$130 board + $20 4GB DDR3 | Pi4 cost is for the whole SBC (including memory!) while Atom is for SoC only. |
| Price Efficiency (Perf. vs. Cost) (Points/USD) | 4.73 | 6.18 [-10%] | 6.80 | 5.27 | Pi4 even OC ends up 10% less value. |

Despite the Atom requiring a mainboard (with the SoC + heatsink built-in) and 4GB DDR3 SO-DIMM stick, the low aggregate score (1/2) makes even the overclocked Pi4 less value-for-money, although we’re only talking 10%. The Pi4 board is far smaller, and likely the cheapest 4GB well-supported SBC you can buy and run Windows 10/11 on.
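| Native Benchmarks | Raspberry Pi 4B (BCM2711) 1.5GHz 4W | Raspberry Pi 4B (BCM2711) 2GHz 6W | Intel Celeron J3455 (Apollo Lake) 10W | Intel Pentium Silver N5030 (Gemini Lake) 6W | Comments |
| --- | --- | --- | --- | --- | --- |
| Power/TDP (W) | 4W (whole SBC) | 6W (whole SBC) | 10W SoC, ~25W whole | 12W SoC, ~25W whole | Pi4 is about 1/2 the power of the Atom SoC and 1/4 of whole board + SoC. |
| Power Efficiency (Perf. vs. Power) (Points/W) | 65 | 56.7 [+77%] | 27 | 31.6 | The Pi4 is impossible to beat at 1/4 the TDP. |

The Pi4 SBC power usage at stock is just around 4-5W, which represents tremendous power efficiency for a 64-bit 4GB computer that can run Windows 10/11. While over-clocking to 2GHz tremendously improves performance, the 6W+ power draw actually reduces power efficiency somewhat, though it is still 77% better than Atom.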
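The price and power efficiency rows appear to be the aggregate score divided by system cost and by system power respectively. The sketch below works under that reading. It assumes the Atom cost is the board (with SoC built in) plus the 4GB DDR3 stick, with the tray price not added on top, and that the Atom power is the ~25W whole-system figure; both assumptions are inferred from the quoted numbers rather than stated in the text. The names are illustrative.

```python
# Aggregate scores (points) as quoted in the results.
SCORE = {
    "Pi 4B 1.5GHz": 260,
    "Pi 4B 2GHz": 340,
    "Celeron J3455 (APL)": 680,
    "Pentium Silver N5030 (GML)": 790,
}

# Assumed cost basis (USD): whole SBC for the Pi; board plus 4GB DDR3 for Atom.
COST_USD = {
    "Pi 4B 1.5GHz": 55,
    "Pi 4B 2GHz": 55,
    "Celeron J3455 (APL)": 80 + 20,
    "Pentium Silver N5030 (GML)": 130 + 20,
}

# Assumed power basis (W): whole SBC for the Pi; ~25W whole system for Atom.
POWER_W = {
    "Pi 4B 1.5GHz": 4,
    "Pi 4B 2GHz": 6,
    "Celeron J3455 (APL)": 25,
    "Pentium Silver N5030 (GML)": 25,
}


def efficiency(points: float, resource: float) -> float:
    """Return points per unit of resource (USD or W)."""
    if resource <= 0:
        raise ValueError("resource must be positive")
    return points / resource


def efficiency_table() -> dict:
    """Return {system: (points per USD, points per W)} rounded for display."""
    return {
        name: (
            round(efficiency(SCORE[name], COST_USD[name]), 2),
            round(efficiency(SCORE[name], POWER_W[name]), 1),
        )
        for name in SCORE
    }


if __name__ == "__main__":
    for name, (per_usd, per_watt) in efficiency_table().items():
        print(f"{name}: {per_usd} points/USD, {per_watt} points/W")
```

Changing the cost or power basis (for example using the SoC-only TDP for the Atoms) shifts the ranking noticeably, so the basis matters as much as the scores.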

SiSoftware Official Ranker Scores

Is x86 Dead?

We think: not yet! But the signs are not good…

The Pi 4B with its relatively old Cortex A72 cores is not a competitor to x86, but might still replace the need for no-name/generic x86 little boxes – generally based on ancient Intel Atom Z8000, J1900, AMD G-Series or similar. While many of them are used for networking purposes (e.g. router/gateway/VPN) – a huge number are bought and used to run Windows as a cheap mini-computer! These are far more expensive, consume far more power, are less reliable and not much faster than the Pi 4B – not to mention even less supported (BIOS update anyone? ME/TXE firmware updates?) than the unofficial Pi running Windows!

They are nothing but e-waste – which is bad for all of us. For such simple desktop uses – browsing, word-processing, media-consumption, simple games (Crysis?) – the Pi 4B could fulfill the role so much better. Cheaper, less power hungry, less noisy (with passive case cooling), more secure, more reliable – and should you decide to repurpose it, there are many, many projects you could use it for. Or sell it / donate it – the Pi keeps its resale value similar to Apple devices.

Final Thoughts / Conclusions

Raspberry Pi4: Now running Windows (and Sandra) natively!

The Raspberry Pi range has been a huge success, there is no question about that. But due to its ARM platform – it had always run a flavour of Linux (it can also run FreeBSD, etc.). Windows, for all its issues, is still the desktop leader and this does not seem to be changing any time soon. While there have been ARM versions of Windows going back decades (Windows CE, Windows Mobile), the kernel/Win32 API were unified with Windows Phone (now dead), and we had actual Windows tablets with Windows RT (also dead) – we finally have desktop 64-bit ARM64 Windows with feature parity with standard 64-bit x64 Windows.

Through emulation it was possible to run x86 code, but with just 1GB of memory and micro-SD for storage, while Pi 3B(+) could just about run Windows Arm64 (and we test that in Raspberry Pi 3B+ 1GB Review: Windows Arm64 on Broadcom BCM2837 article) – it was a “proof of concept” rather than usable system. The Raspberry Pi 4B with 4-8GB of memory and USB3 storage has changed all that.

But how is the performance for native ARM64 applications? We expect emulation to be slow (for current x86/x64 applications) – but how about native?

At stock/base clock – even with its 4-cores – the Pi 4B is slow against similar vintage Intel Atom CPUs – but those turbo much higher. With a good cooling system, it seems 2GHz is achievable without voiding warranty – and then the Pi starts trading blows with the older Atoms. If you ever used a Windows tablet/laptop with an Atom CPU – then the Pi 4B performs similarly and is very much usable. If you have a Pi 4B gathering dust for whatever reason – bought for a project that never happened – you can always try Windows on it.

The power efficiency (the whole thing consumes just 4W!) and price efficiency (the whole thing including 4GB LP-DDR4 costs just approx. $55) are really second to none. But beyond that – unlike copies/competitors (whether ARM or x86 based) – the Pi is a de-facto standard – with huge software developer support (just about every project under the sun has a Pi version of its software), countless add-ons/hats and other devices – making it far more useful than just any other board.

The other news is that Windows 10/11 for ARM64 performs well and is ready for desktop-class ARM processors – perhaps not from Raspberry Pi/Broadcom – but Qualcomm, Samsung, MediaTek and all the other “usual suspects” that currently make SoCs for phones/tablets. Using modern X1 or X2 ARM cores – rather than the much older Cortex A72 here – we could have a worthy x86/x64 competitor – perhaps not desktop class but certainly tablet/mobile.

We have also seen ARM-based new Apple Mac (M1 SoC) computers able to run native Windows 10/11 Arm64 through Parallels virtualization software! Perhaps in the future we will see the ARM64 version of Bootcamp – and thus be able to natively boot/run Windows on Mac again. Apple’s M1’s cores are (arguably) more powerful than ARM’s own – and the most serious competitor to Intel and AMD on laptop/tablet platform.



