Crypto-processor (TPM) Benchmarking: Discrete vs. internal AMD, Intel

What is a Crypto-Processor (TPM)?

A crypto-processor or TPM (“Trusted Platform Module”) is a secure unit (or interface) designed to perform cryptographic functions in a secure manner. While in the past it was implemented by a hardened micro-controller (chip) that had to be installed onto the mainboard, it is now typically built into the main processor (e.g. AMD PSP) or platform hub (e.g. Intel ME in PCH).

What is the purpose of the Crypto-Processor?

Today the crypto-processor performs more and more functions securing the main system (processor, memory, disk) against various attack vectors – both external and internal. This includes:

  • True random number generator (that can generate unpredictable numbers)
  • Securely store digital keys and certificates (especially private keys)
  • Attestation of both hardware and software (ensure system integrity)
  • Authenticate devices and programs/apps (access, licensing, etc.)
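To illustrate the attestation item above: TPM measurements are accumulated in platform configuration registers (PCRs) through an "extend" operation, where the new register value is the hash of the old value concatenated with the digest of the new measurement. The following is a minimal software sketch of that idea using SHA2-256 from the Python standard library. It does not talk to a TPM, and the names `pcr_extend` and `measure_chain`, as well as the component names, are assumptions made for illustration only.

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA2-256 PCR


def pcr_extend(pcr_value: bytes, measurement: bytes) -> bytes:
    """Return the PCR value after extending it with one measurement.

    Software model only: new = SHA256(old || SHA256(measurement)).
    """
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr_value + digest).digest()


def measure_chain(components: list[bytes]) -> bytes:
    """Extend an all-zero PCR with each component, in order."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = pcr_extend(pcr, component)
    return pcr


if __name__ == "__main__":
    # Hypothetical boot components, used only to show the chaining.
    good = measure_chain([b"firmware", b"bootloader", b"kernel"])
    tampered = measure_chain([b"firmware", b"evil-bootloader", b"kernel"])
    print(good.hex())
    print("tampered chain gives same PCR:", good == tampered)
```

Because each value depends on everything extended before it, changing or reordering any component yields a different final PCR value, which is what lets a verifier detect a modified boot chain.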

A TPM enables modern computer functionality that has become common to ensure user privacy and security – but also the interests of providers of digital assets:

  • Secure boot: ensure viruses and rootkits don’t hijack the boot process.
  • Disk encryption: ensure “at rest” data on disk is encrypted and cannot be accessed in another system.
  • Memory isolation: ensure passwords and biometric data are not stolen or spoofed.
  • Software protection: ensure program/app data is not modified (e.g. cracking, mods, cheating).
  • Protected display/audio/video path: ensure sensitive data is not copied or dumped.
  • Digital asset licensing: ensure data cannot be accessed without a licence.

Why benchmark the Crypto-Processor (TPM)?

While the crypto-processor is not designed for high-speed operations (it is not a crypto-accelerator), its increasing use in today’s systems warrants testing, including performance evaluation. On modern systems (AMD, Intel) it is usually implemented through the chipset (platform controller) – this allows us to detect changes to the internal platform controller (PCH, PSP, etc.) that are usually undocumented and highly classified even at standard NDA levels (green, orange).

How does Intel implement its crypto-processor (fTPM)?

Intel provides TPM through the PTT (“Platform Trust Technology”) that provides firmware emulation running on the ME (“Management Engine”) / CSME (“Converged Security Management Engine”) which is an internal micro-processor unit inside the chipset/PCH (“Platform Controller Hub”). [note not on the processor]

On modern systems, e.g. Skylake (SKL) with ME 11.8+, it is implemented by an x86 Intel Quark processor (a miniature P5-“Pentium” derivative) running a version of the Minix operating system. On older systems, ME 9 and older, it was implemented by an ARC Core (Argonaut) running a version of ThreadX RTOS (a real-time operating system).

As with all software, ME firmware needs to be kept up-to-date. Updates are usually included with BIOS updates, but the firmware can also be updated separately using the public ME tools and firmware (update region).

How does AMD implement its crypto-processor (fTPM)?

AMD provides TPM through firmware emulation running on the PSP (“Platform Security Processor”) that is built into modern AMD processors. [note internal to processor]

On current systems, e.g. Ryzen (ZEN) with PSP 3.0, it is implemented by an ARM (Cortex A5?) core with TrustZone running an undisclosed operating system.

PSP firmware also needs to be kept up-to-date; as it is part of the AGESA (“AMD Generic Encapsulated Software Architecture”) it is included with BIOS updates, and it can be somewhat more difficult to build a “modded” (modified) BIOS with updated firmware.

How does Microsoft implement its crypto-processor (vTPM)?

In Windows Server 2016 and later (2019, 2022), the Hyper-V hypervisor provides a virtual crypto-processor (vTPM) to virtual machine (VM) guests. This is implemented using Isolated User Mode (IUM) that stores the TPM data inside Virtualisation Based Security (VBS) of the host. Note that this does *not* depend on the host having a TPM itself! [runs completely inside host]

Shielded Gen 2 VMs can be deployed using vTPM – enabling encrypted state and migration traffic – but also within-VM standard security functions like full disk encryption (e.g. BitLocker for Windows), memory/core isolation, etc. This feature (tries to) prevent data stealing through VM copying, migration traffic monitoring, etc. – when deploying VMs using someone else’s cloud infrastructure. Naturally, the Hyper-V host itself should be encrypted/secured – otherwise the VM TPM state can itself be stolen.

As this vTPM implementation runs on the host – its performance depends entirely on the speed of the host’s processor(s)/memory – and may even be able to take advantage of hardware-accelerated crypto instructions (AES, SHA, etc.) provided by the host’s processor(s).

Discrete implementations

Discrete TPM units are available from the main component vendors (ST Micro, Atmel, Infineon, Nuvoton, etc.) and are still found today on business/enterprise hardware, but have been largely phased out in favour of built-in firmware implementations. For consumer grade hardware (mainboards) – they have generally been optional – requiring a compatible module to be purchased and installed.

This meant TPMs were rarely found on consumer hardware – not even on (Windows) laptops/tablets, which were therefore unable to use disk/storage encryption even though other phones/tablets (Apple, Android, etc.) offered such functionality as standard.

Note that even discrete units contain firmware that may need to be updated: a case in point is the “Infineon 9665” tested here (present in 2 of our older systems) that contained a vulnerability (“ROCA” 2017), which meant it was effectively banned by operating systems (e.g. Windows) until its firmware was updated. While it was widely used as a discrete module by many OEMs (e.g. Asus) – they did not bother to provide any updates nor RMA for affected modules – thus rendering them worthless.

Crypto-Processor Benchmarking Considerations

As TPMs are implemented using relatively simple processors, only small amounts of data can be transferred per call – typically 1kB or less (as low as 32 bytes!). For our TPM API calls, the I/O or “function call overhead” (user-mode application <> operating system (Windows) <> kernel-mode driver <> TPM) can be significant – especially with all the vulnerability mitigations enabled on today’s operating systems (e.g. “Meltdown” on Intel, “Spectre”, etc.).

Running on newer processors that are not affected by the “old” vulnerabilities and thus do not need mitigations enabled – the TPM can appear very much faster simply due to much faster I/O to the device itself. While we can adjust for some call overhead, due to the complex nature of systems today it is not always possible to reliably determine it (e.g. due to power management transitions, pre-emption by other tasks, etc.). In that case it is left in, just as when measuring the performance of other devices (e.g. disks, GP-GPUs, graphics, etc.) where we do not access the hardware directly but through operating system API calls.
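The overhead adjustment described here can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the function names are assumptions, the workload is a host-side hash standing in for a TPM call, and the "empty call" used to estimate overhead is hypothetical. When the overhead estimate is not smaller than the measured time (i.e. it could not be determined reliably), the unadjusted figure is returned, mirroring the "left in" behaviour.

```python
import time
from typing import Callable


def time_calls(op: Callable[[], object], calls: int) -> float:
    """Return total wall-clock seconds taken by `calls` invocations of op."""
    start = time.perf_counter()
    for _ in range(calls):
        op()
    return time.perf_counter() - start


def throughput_kb_s(block_bytes: int, calls: int, elapsed_s: float,
                    overhead_per_call_s: float = 0.0) -> float:
    """Throughput in kB/s (1 kB = 1024 bytes), optionally overhead-adjusted.

    If the estimated overhead is not smaller than the measured time, the
    estimate is treated as unreliable and the overhead is left in.
    """
    adjusted = elapsed_s - calls * overhead_per_call_s
    if adjusted <= 0.0:
        adjusted = elapsed_s
    return (block_bytes * calls / 1024.0) / adjusted


if __name__ == "__main__":
    import hashlib

    block = bytes(1024)
    calls = 1000
    # Placeholder workload: a host-side hash, NOT a real TPM call.
    elapsed = time_calls(lambda: hashlib.sha256(block).digest(), calls)
    # Hypothetical "empty call" used to estimate the per-call overhead.
    overhead = time_calls(lambda: None, calls) / calls
    print("raw kB/s:     ", throughput_kb_s(len(block), calls, elapsed))
    print("adjusted kB/s:", throughput_kb_s(len(block), calls, elapsed, overhead))
```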

In this article we test TPM cryptographic performance.

Hardware Specifications

We are comparing the various discrete hardware TPM units with firmware emulated versions across various processors/chipsets:

| TPM Specifications | Intel fTPM 2.0 PTT | STMicro ST33TPHF20 TPM 2.0 | Infineon SLB 9665 TPM 2.0 | AMD fTPM 2.0 PSP | Comments |
| --- | --- | --- | --- | --- | --- |
| Type | Firmware (ME) | Discrete | Discrete | Firmware (PSP) | Both main vendors offer firmware. |
| Location | Platform Hub (PCH) | Mainboard | Mainboard | Processor (built-in) | PCH vs. CPU. |
| Spec. Version | 2.0 | 2.0 | 2.0 | 2.0 | All offer latest v2.0 spec. |
| Spec. Year / Revision | 2018 / 1.38 | 2018 / 1.38 | 2016 / 1.16 | 2018 / 1.38 | Infineon somewhat old. |
| PCR (Register Banks) | 24 | 24 | 24 | 24 | Minimum required. |
| Algorithms Implemented | 25 | 25 | 19 | 25 | Infineon somewhat lacking. |
| Encryption Algorithms | AES 128/256 | AES 128/256 | AES 128/256 | AES 128/256 | AES, RSA supported. |
| Operation Modes | CTR, OFB, CBC, CFB, ECB | CTR, OFB, CBC, CFB, ECB | CFB only | CTR, OFB, CBC, CFB, ECB | No CBC on Infineon, all missing GCM. |
| Hashing Implemented | SHA1, SHA2-256 | SHA1, SHA2-256 | SHA1, SHA2-256 | SHA1, SHA2-256 | SHA256 supported but not SHA512. |
| Counters | 128 | 16 | 8 | n/a | AMD does not provide counters. |
| Sessions | 64 | 64 | 16 | 64 | Standard support. |
| Firmware Version | 403.1.0.0* | 74.64.17568.6659 | 5.62.12.13824** | 3.51.0.5* | Latest updates installed. |
| Processor | Intel Quark x86 ~400MHz | ARM SC300 SecureCore | Undisclosed 16-bit | ARM Cortex A5 TrustZone | One 16-bit and rest 32-bit. |

Note*: firmware on AMD/Intel implementations depends on the main ME/PSP firmware and is updated through it.

Note**: as mentioned this is the fixed firmware – older versions are banned by modern operating systems. Please buy from OEMs that support their products and do provide firmware updates.

Hashing: while all support SHA2-256, none yet support SHA2-512 (the 64-bit variant) nor SHA3.

Block crypto: while all support AES/CFB – Infineon is missing AES/CBC, AES/OFB, AES/CTR and AES/ECB, and none support the more compute-intensive AES/GCM nor AES/XTS (though the latter is mainly used for disk encryption).
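The mode comparison can be checked mechanically with set operations. The sketch below transcribes the "Operation Modes" row of the specification table into Python sets; the dictionary keys and function names are chosen here for illustration and are not part of any TPM API.

```python
ALL_MODES = {"CTR", "OFB", "CBC", "CFB", "ECB"}

# AES operation modes per device, transcribed from the specification table.
AES_MODES = {
    "Intel fTPM (PTT)": set(ALL_MODES),
    "STMicro ST33TPHF20": set(ALL_MODES),
    "Infineon SLB 9665": {"CFB"},
    "AMD fTPM (PSP)": set(ALL_MODES),
}


def common_modes(devices: dict[str, set[str]]) -> set[str]:
    """Modes implemented by every device."""
    return set.intersection(*devices.values())


def missing_modes(devices: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per device, the modes that some other device offers but it lacks."""
    union = set.union(*devices.values())
    return {name: union - modes for name, modes in devices.items()}


if __name__ == "__main__":
    print("common to all:", sorted(common_modes(AES_MODES)))
    for name, missing in missing_modes(AES_MODES).items():
        print(f"{name} missing: {sorted(missing)}")
```

With the table data as transcribed, the only mode common to all four devices is CFB, and the only device with any missing modes is the Infineon unit.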

Disclaimer

This is an independent article that has not been endorsed or sponsored by any entity (e.g. AMD, Intel, Infineon, ST Micro). All trademarks acknowledged and used for identification only under fair use.

Cryptographic Performance

We are testing native cryptographic performance for common functions (random number generation, hashing, signing, encryption/decryption) through the standard operating system (Windows) TPM drivers.

Results Interpretation: Higher values (MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest drivers. 2MB “large pages” were enabled and in use.

| TPM Benchmarks | Intel fTPM 2.0 2012 (ME 9.5 Haswell) | Intel fTPM 2.0 2016 (ME 11.8 SkyLake) | Intel fTPM 2.0 2016 (ME 11.12 SkyLake-X) | Intel fTPM 2.0 2018 (ME 12.0 CoffeeLake) | STMicro ST33TPHF20 TPM 2.0 2018 (Discrete) | Infineon SLB 9665 TPM 2.0 2016 (Discrete) | AMD fTPM 2.0 2018 (PSP 3.0 / Ryzen 2000) | AMD fTPM 2.0 2018 (PSP 3.0 / Ryzen 5000) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Random Generator (kB/s) | 35 | 10.02 | 7.93 | 16.42 [+60%] | 9.25 | 8.96 | 7.06 | 14.27 [+2x] |

As the TPM returns random numbers in small blocks (32 bytes typically) the I/O overhead is huge – and it’s no surprise that faster processors dominate.
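To get a feel for how much the per-call cost matters, the throughput figures can be converted into an implied number of calls per second and time per call. This is a rough sketch under two stated assumptions that the article does not confirm exactly: every request returns 32 bytes, and "kB" means 1024 bytes. The implied time per call includes both the I/O overhead and the TPM's own work, so it is an upper bound on either.

```python
BLOCK_BYTES = 32      # assumed size of each random-number request
BYTES_PER_KB = 1024   # assumed meaning of "kB" in the results

# Random generator results (kB/s), transcribed from the benchmark table.
RANDOM_KB_S = {
    "Intel ME 9.5": 35.0,
    "Intel ME 11.8": 10.02,
    "Intel ME 11.12": 7.93,
    "Intel ME 12.0": 16.42,
    "STMicro ST33TPHF20": 9.25,
    "Infineon SLB 9665": 8.96,
    "AMD PSP 3.0 (Ryzen 2000)": 7.06,
    "AMD PSP 3.0 (Ryzen 5000)": 14.27,
}


def calls_per_second(kb_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Requests per second needed to reach kb_s with block_bytes per request."""
    return kb_s * BYTES_PER_KB / block_bytes


def ms_per_call(kb_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Implied wall-clock milliseconds per request (overhead plus TPM work)."""
    return 1000.0 / calls_per_second(kb_s, block_bytes)


if __name__ == "__main__":
    for name, kb_s in RANDOM_KB_S.items():
        print(f"{name:26s} {calls_per_second(kb_s):8.1f} calls/s "
              f"{ms_per_call(kb_s):6.2f} ms/call")
```

Under these assumptions even the fastest result corresponds to only around a thousand requests per second, i.e. roughly a millisecond per 32-byte request.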

It is, nevertheless, interesting to see that Intel’s SKL-X is significantly slower than desktop SKL/KBL – it seems the X299 PCH performs slower than the desktop Z170 PCH – despite both being on ME 11.x firmware. CFL does have an updated ME and TPM firmware but is also faster and not affected by “Meltdown”, which likely accounts for the performance delta (+60%). The old ME 9.5 firmware w/ARC processor is able to generate random numbers much faster.

It is interesting to see that AMD’s Ryzen 5000 PSP performs so much faster (2x) than the not-so-different Ryzen 2000 series, neither affected by “Meltdown” or many other vulnerabilities. As the PSP is built into the CPU, not the PCH, it is likely Ryzen 5000 has a different/faster ARM PSP processor.

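| TPM Benchmarks | Intel fTPM 2.0 2012 (ME 9.5 Haswell) | Intel fTPM 2.0 2016 (ME 11.8 SkyLake) | Intel fTPM 2.0 2016 (ME 11.12 SkyLake-X) | Intel fTPM 2.0 2018 (ME 12.0 CoffeeLake) | STMicro ST33TPHF20 TPM 2.0 2018 (Discrete) | Infineon SLB 9665 TPM 2.0 2016 (Discrete) | AMD fTPM 2.0 2018 (PSP 3.0 / Ryzen 2000) | AMD fTPM 2.0 2018 (PSP 3.0 / Ryzen 5000) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hashing SHA1 (kB/s) | 399 | 318 | 238 | 326 [+3%] | 59.52 | 128 | 426 | 837 [+2x] |
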
With hashing, we work on larger blocks (1kB typically), thus the I/O overhead is much lower. While SHA1 has been largely superseded, it still remains in non-critical use.

It’s interesting to see firmware implementations being much faster than discrete devices, likely due to the more powerful processors used by Intel/AMD that are needed to run other platform tasks (main processor power management, etc.). Infineon with its 16-bit processor is very much slower (1/4x) while even ST Micro is only about half as fast (1/2x).

On Intel, CFL’s PCH performs similarly to the old SKL PCH, thus there are no significant changes despite all the firmware changes. We see the old ME 9.5 w/ARC processor is somewhat faster at hashing than the Quark x86 processor in newer ME 11+ firmware.

On AMD, again we see Ryzen 5000 PSP almost twice as fast as the Ryzen 2000 PSP – thus what we have seen before was not a one-off. It does seem the PSP has been very much updated and is about twice as fast.

It’s not a surprise that Microsoft’s vTPM is the fastest – as it is running on the host processor itself – but surprisingly it is not as fast as you may expect, likely due to the overheads of VM <> hypervisor <> host calls.

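| TPM Benchmarks | Intel fTPM 2.0 2012 (ME 9.5 Haswell) | Intel fTPM 2.0 2016 (ME 11.8 SkyLake) | Intel fTPM 2.0 2016 (ME 11.12 SkyLake-X) | Intel fTPM 2.0 2018 (ME 12.0 CoffeeLake) | STMicro ST33TPHF20 TPM 2.0 2018 (Discrete) | Infineon SLB 9665 TPM 2.0 2016 (Discrete) | AMD fTPM 2.0 2018 (PSP 3.0 / Ryzen 2000) | AMD fTPM 2.0 2018 (PSP 3.0 / Ryzen 5000) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hashing SHA2-256 (kB/s) | 380 | 305 | 235 | 306 [=] | 61.118 | 128 | 425 | 852 [+2x] |
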
With SHA2, performance is only marginally slower than SHA1 which means replacing SHA1 does not incur a significant performance penalty. There are no significant performance changes thus the conclusions drawn remain the same.
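The "marginally slower" observation can be quantified per device by comparing the two hashing rows directly. The sketch below simply transcribes the SHA1 and SHA2-256 results from the tables, in column order, and computes the percentage change; the short device labels and function names are made up here for readability.

```python
# Column order follows the benchmark table headers.
DEVICES = [
    "Intel ME 9.5", "Intel ME 11.8", "Intel ME 11.12", "Intel ME 12.0",
    "STMicro ST33TPHF20", "Infineon SLB 9665",
    "AMD PSP 3.0 (Ryzen 2000)", "AMD PSP 3.0 (Ryzen 5000)",
]
SHA1_KB_S = [399.0, 318.0, 238.0, 326.0, 59.52, 128.0, 426.0, 837.0]
SHA256_KB_S = [380.0, 305.0, 235.0, 306.0, 61.118, 128.0, 425.0, 852.0]


def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent (negative = slower)."""
    return (new - old) / old * 100.0


def sha2_vs_sha1() -> dict[str, float]:
    """Percentage change of SHA2-256 throughput relative to SHA1, per device."""
    return {
        name: percent_change(sha2, sha1)
        for name, sha1, sha2 in zip(DEVICES, SHA1_KB_S, SHA256_KB_S)
    }


if __name__ == "__main__":
    for name, change in sha2_vs_sha1().items():
        print(f"{name:26s} {change:+6.1f}%")
```

For the figures as transcribed, every device stays within a few percent of its SHA1 result, and a couple are marginally faster with SHA2-256.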

It is interesting to see that the fastest firmware implementation (AMD PSP) is over 10 times (10x) faster than the discrete Infineon, a big delta.
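The bracketed deltas in the tables and the firmware-versus-discrete gap are plain ratios of the published figures, and can be recomputed as in this small sketch. The values are transcribed from the tables above; the helper name `speedup` and the pair labels are illustrative, and the two discrete SHA2-256 results are referred to by value rather than by vendor.

```python
def speedup(fast_kb_s: float, slow_kb_s: float) -> float:
    """How many times faster the first throughput is than the second."""
    return fast_kb_s / slow_kb_s


# (label, newer/faster result, older/slower result), all in kB/s.
PAIRS = [
    ("Random: Intel ME 12.0 vs ME 11.8", 16.42, 10.02),
    ("Random: Ryzen 5000 vs Ryzen 2000", 14.27, 7.06),
    ("SHA1: Ryzen 5000 vs Ryzen 2000", 837.0, 426.0),
    ("SHA2-256: Ryzen 5000 vs Ryzen 2000", 852.0, 425.0),
    ("SHA2-256: Ryzen 5000 vs discrete result of 128", 852.0, 128.0),
    ("SHA2-256: Ryzen 5000 vs discrete result of 61.118", 852.0, 61.118),
]

if __name__ == "__main__":
    for label, fast, slow in PAIRS:
        print(f"{label:52s} {speedup(fast, slow):5.2f}x")
```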

Final Thoughts / Conclusions

For better or worse, modern security today pretty much requires a crypto-processor (TPM). The good news (if you are happy to use it) is that firmware-based implementations are now common on both Intel and AMD – and these are not restricted to business/enterprise devices; no hard-to-find or impossible-to-update optional module required! As the crypto-processor firmware is part of the main engine firmware (Intel ME, AMD PSP, etc.) – it is kept up-to-date by updating the main system BIOS which is much more likely to be updated.

Not surprisingly, these firmware-based implementations are also faster – likely due to the much faster processors running the respective platform engines (Intel PCH, AMD internal) and they get faster with platform/CPU upgrades.

While the crypto-processor is not meant to run heavy-duty crypto workloads, and thus its performance is not system critical, it is used for more and more system security (e.g. key storage, attestation for system/device/user, digital asset protection, licensing) and so its performance is becoming more critical. Again we see firmware-based implementations performing very well – sometimes over 10 times faster than older discrete devices, which can make a big difference in system response times when using its functions.

Having just launched the benchmarks – we will be adding more benchmark tests in the future as well as watching for results from users through the Official SiSoftware Ranker. Please submit your benchmark results – they help enormously!
