Intel and AMD will need to keep an eye on Arm competition in their strongholds.
On Tuesday afternoon, Arm held a Vision Day event at which it teased details about its upcoming Arm v9 architecture.
The short version: expect a massively altered security landscape, along with improvements to vector math (which in turn means improvements in AI/ML and Digital Signal Processing, among other applications).
The key concept introduced in Arm v9’s new Confidential Compute Architecture is the realm. Realms are containerized, isolated execution environments, completely opaque to both operating system and hypervisor. The hypervisor itself will only be responsible for scheduling and resource allocation. Realms themselves are to be managed by the realm manager, a new concept that can apparently be implemented in 1/10th the code required for a hypervisor.
Applications inside a realm can attest to the realm manager that they’re trustworthy. While all this is still very vague, “attestation” sounds like it might be an Arm-flavored analogue of System Guard Secure Launch, one facet of Microsoft’s Secured Core PC Initiative.
We don’t have any technical detail yet of what actually enforces the separation of one realm from another (or from the host), but it seems likely that this will in turn be similar to AMD’s Secure Encrypted Virtualization, introduced with its Epyc server processors. In AMD’s SEV, a secure processor manages separate keys for each guest in a hypervisor, as well as for the host itself.
In theory, one might separate realms from one another by dint of simple enforcement from a security coprocessor, with no actual encryption, but that wouldn’t protect them from physics-based side-channel attacks. We’re very much still guessing here, but we don’t see any way for Arm to make good on its promises to keep each realm safe from other realms, the host, and the hardware without per-realm encryption.
Overall, Arm designs break into two major categories: Neoverse V1/N2, which serve different segments of the server market, and Cortex, the mobile-optimized design familiar to anyone with an Android phone or tablet.
Notably, Arm predicts a 30 percent uplift with Cortex-X—and is promising massive upgrades to its Mali GPU, adding features such as ray tracing and variable rate shading. The new graphical features seem aimed at bringing Mali up to compete more closely with desktop GPUs from Nvidia and AMD—and the virtualization support specifically promised will be critical for isolating games and apps fully in the new realm containers described above.
In addition to CPU and GPU performance, Arm promises updates to its vector math functions—critical to tasks including but not limited to AI, machine learning, and digital signal processing. If you’re vaguely familiar with desktop x86 architecture over the last twenty years, you’ll have noticed successive buzz over first MMX, then SSE, and finally AVX instruction sets that promised to make games (among other things) go faster. These are all vector math instruction sets.
The reason there were so many vector instruction sets (and why applications had to be specifically coded to support or not support each one) is that they were fixed in size to match hardware register size onboard the CPU. As vector register size increased, all the way up to Intel’s most recent AVX-512 with its 512-bit registers, new instructions had to be developed to access the larger sizes.
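To make the fixed-size problem concrete, here is a minimal sketch in plain C. It uses no real vector intrinsics; `lanes_per_register`, `FIXED_LANES`, and `add_fixed_width` are hypothetical names chosen for illustration. The sketch assumes 32-bit floats and a 128-bit register, so the lane count of four is baked into the code, much as it would be in software written against one specific fixed-width instruction set.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements ("lanes") that fit in one vector register. */
static size_t lanes_per_register(size_t register_bits, size_t element_bits)
{
    assert(element_bits > 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}

/* Hypothetical stand-in for code tied to a 128-bit instruction set:
 * 128 bits / 32-bit float = 4 lanes, fixed at compile time. */
enum { FIXED_LANES = 4 };

static void add_fixed_width(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

    /* Main loop: each pass models one vector instruction covering
     * exactly FIXED_LANES elements. */
    for (; i + FIXED_LANES <= n; i += FIXED_LANES) {
        for (size_t lane = 0; lane < FIXED_LANES; ++lane) {
            out[i + lane] = a[i + lane] + b[i + lane];
        }
    }

    /* Scalar tail: leftover elements that do not fill a whole register. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

A 512-bit register would hold 16 such floats, but this loop would still step four at a time. Taking advantage of the wider hardware means rewriting the loop against the newer instructions, which is the maintenance burden described above.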
Arm’s first vector math instruction set, NEON, also used fixed sizes. A replacement instruction set, SVE, offered dynamically sized replacements for some of NEON’s functionality. With SVE instructions, you could just say “I want to multiply these two 1,024-bit vectors” and let the processor itself figure out how many steps it needed to take in order to do so.
The problem with SVE is that it didn’t fully replace all of NEON’s functionality. SVE2 aims to fully replace NEON, where SVE could not. This in turn means a developer can write their code only once and have that same code run optimally on both a phone with 128-bit registers and a server with 512-bit registers, and also on an imaginary future server with hypothetical 2,048-bit registers.
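The write-once idea can be modeled in plain C. This is only a sketch of the concept, not real SVE or SVE2 code: `multiply_vla` is a hypothetical function, and its `hw_lanes` parameter stands in for the lane count that actual hardware would report at run time. The inner bound plays the role of a predicate that switches off lanes past the end of the data, so no separate tail loop is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a vector-length-agnostic loop.
 * hw_lanes: how many floats one hardware register holds (assumed to be
 *           discovered at run time on real hardware).
 * Returns the number of vector-sized steps the "processor" took. */
static size_t multiply_vla(const float *a, const float *b, float *out,
                           size_t n, size_t hw_lanes)
{
    assert(hw_lanes > 0);
    size_t steps = 0;

    for (size_t i = 0; i < n; i += hw_lanes) {
        /* Predicate: only lanes that still fall inside the array are active. */
        size_t remaining = n - i;
        size_t active = remaining < hw_lanes ? remaining : hw_lanes;

        for (size_t lane = 0; lane < active; ++lane) {
            out[i + lane] = a[i + lane] * b[i + lane];
        }
        ++steps;
    }
    return steps;
}
```

Multiplying thirty-two 32-bit floats (1,024 bits of data) with this one function takes eight steps when `hw_lanes` is 4 (a 128-bit register), two steps when it is 16 (512-bit), and a single step when it is 64 (2,048-bit). The source code never changes; only the step count does.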