How does unified memory in Apple processors work?

Anyone buying one of the latest Apple computers (those with M1, M2, M2 Pro or M2 Max chips) comes across technical specifications that mention “unified memory” and promise high-level performance.


Apple began talking about this type of memory with the introduction of the M1 chip in 2020; it is a technology that allows memory resources to be shared among all of the computer's processing components.

The chips, or rather SoCs, of the Mx series are optimized for machines and devices in which small dimensions and energy efficiency are crucial. These systems on a chip (SoCs) enclose numerous technologies in a single package and have a memory architecture that lets them claim performance and efficiency significantly higher than other manufacturers' architectures.

Apple explains that unified memory “offers high bandwidth and low latency in a single pool within a custom package. This allows technologies in the SoC to access the same data without moving it between pools of memory, improving performance and efficiency over previous architectures.”

What is unified memory for?

Unified memory isn’t new; it is, in a sense, a throwback to when RAM was expensive and manufacturers preferred to share memory between the CPU (the computing unit) and the GPU (the graphics unit). Later, video card manufacturers began to exploit dedicated memories, faster than the RAM directly accessible from the CPU but with obvious physical limits related to the distance the signal must travel to and from the GPU and CPU.

What Apple refers to today is an evolution of unified memory, with SoC components that can access the same memory areas, reading and writing data without necessarily having to copy or move it. RAM and, when needed, storage memory are combined into a single shared memory pool (a block of memory assigned to a program, application, or file).

This approach, which Apple pushes forward, is said to improve processing efficiency and energy consumption: the CPU and GPU share memory locations (portions of cells into which a memory can be logically divided), rather than accessing separate memories for the allocation of RAM, VRAM and other subsystems.

Memory bandwidth

The bandwidth of Apple SoCs varies by chip: in MacBook Airs with the M2 chip, Apple talks about 100 GBps of “memory bandwidth” (the theoretical amount of data that can be moved per second); in the 14″ and 16″ MacBook Pros with M2 Pro and M2 Max chips, Apple claims 200 GBps and 400 GBps of memory bandwidth respectively; in the base Mac Studio with the M1 Max chip the memory bandwidth is 400 GBps, and in the Mac Studio with the M1 Ultra it reaches 800 GBps. Simply put, the higher the bandwidth, the faster data can move between memory and the SoC's components, and the less time the CPU and GPU spend waiting for data.
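To give a feel for what those figures mean in practice, here is a rough back-of-the-envelope calculation of how long it would take each chip to stream one hypothetical 8 GB workload at its quoted peak bandwidth (the example workload size is an assumption; these are theoretical maxima, and real workloads achieve less):

```python
GB = 1000**3  # vendors quote decimal gigabytes

def seconds_to_move(num_bytes, bandwidth_gbps):
    """Time to stream `num_bytes` once at a peak bandwidth given in GB/s."""
    return num_bytes / (bandwidth_gbps * GB)

workload = 8 * GB  # hypothetical 8 GB data set, e.g. a large frame stack
for chip, bw in [("M2", 100), ("M2 Pro", 200),
                 ("M2 Max / M1 Max", 400), ("M1 Ultra", 800)]:
    print(f"{chip}: {seconds_to_move(workload, bw) * 1000:.0f} ms")
# M2: 80 ms ... M1 Ultra: 10 ms
```

Doubling the bandwidth halves the time the SoC spends just moving data, which is why Apple quotes these numbers so prominently.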

With the arrival of the M2 chip, Apple moved to second-generation 5-nanometer technology, giving these chips a further leap forward in performance per watt compared to the M1, not only in terms of CPU and GPU but also with “50% more memory bandwidth than M1”, which allows these SoCs to handle even larger and more complex workloads. With M2 Pro, unified memory reaches up to 32GB of low-latency memory, and with M2 Max up to 96GB, double the maximum of the M2 chip. The M1 Ultra can be configured with up to 128GB of high-bandwidth, low-latency unified memory.

UltraFusion

From a technical point of view, Apple explains that the “UltraFusion” packaging architecture, used to interconnect the dies of two M1 Max chips, uses a silicon interposer (an electronic interface) that connects the chips across more than 10,000 signals, offering an impressive 2.5TB/s of low-latency inter-processor bandwidth, more than four times the bandwidth of other multi-chip interconnect technologies. This allows the M1 Ultra to behave, and be identified by software, as a single chip, with obvious advantages in particularly heavy and complex workflows.

The GPU is also able to take advantage of unified memory. Compared with the most powerful PC graphics cards on the market, which offer at most tens of GB of dedicated memory, the M1 Ultra chip can exploit far larger amounts of graphics memory, with obvious benefits for tasks that involve extreme 3D geometry and the rendering of massive scenes.

A similar approach to Apple’s unified memory has been exploited by Nvidia with CUDA 6, a parallel processing and programming platform that does not require copy operations between different memories but offers a unified memory system on top of the existing structure (programmers can operate on data without first explicitly copying it, using a shared pointer).
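The difference the article describes can be sketched in plain Python. This is only a toy model, not real CUDA code: in actual CUDA the explicit copies would be `cudaMemcpy` calls and the shared allocation would come from `cudaMallocManaged`; here ordinary lists stand in for host, device and managed buffers, and a Python function stands in for a GPU kernel.

```python
def gpu_double(buffer):
    """Stand-in for a GPU kernel: doubles every element in place."""
    for i in range(len(buffer)):
        buffer[i] *= 2

# Discrete memories: data must be copied to the device and back.
host = [1, 2, 3]
device = list(host)      # host -> device copy (cudaMemcpy in real CUDA)
gpu_double(device)       # kernel runs on the device copy
host = list(device)      # device -> host copy to see the result
print(host)              # [2, 4, 6]

# Unified memory: a single allocation, no copies; CPU and GPU code
# operate on the same buffer through the same "pointer".
shared = [1, 2, 3]       # one managed allocation (cudaMallocManaged)
gpu_double(shared)       # kernel mutates the shared buffer directly
print(shared)            # [2, 4, 6]
```

Both paths compute the same result; the unified model simply removes the two copy steps, which is where the performance and energy savings come from.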

We will hear more and more about this approach: from the programmer's point of view it is a boon that speeds up work. Some SoC manufacturers have long offered hybrid architectures, with ARM cores and integrated graphics cores, a choice that relieves the processor of various tasks, performing operations faster while consuming less.

Does Apple also take advantage of the storage memory?

When necessary, Apple uses the memory destined for storage in its systems (Mac and iPad) as virtual memory; the high speed of these storage units can be exploited for data management in cases where minimal latency is not required (typically for temporary storage during the execution of certain applications).

In other words, the storage unit can be exploited by the operating system as part of the memory pool, which makes comparisons with older Intel-based machines, on which RAM and SSD work quite differently, rather pointless.

How much unified memory do I need?

Greater efficiency, direct communication and energy savings are enormous advantages, but the golden rules for choosing a Mac remain the same: the 8 GB models are very efficient for browsing, word processing and standard use; the 16 GB models work best with Photoshop, video editing or basic 3D; but if you work with huge audio libraries, 4K and 8K video with many layers, or complex artificial-intelligence systems, you will need to go up in RAM size.
