Earlier this month, AMD disclosed Zen 2, its next-generation microarchitecture for desktop and server chips, for the first time. Alongside Zen 2, AMD also unveiled initial details of its next-generation server chips, codenamed Rome.
Zen 2 follows the Zen/Zen+ architectures and targets TSMC's 7-nanometer process node. AMD evaluated both 10 nm and 7 nm before settling on the latter. According to AMD, 7 nm delivers double the density along with either half the power at the same performance or 1.25 times the performance at the same power. Zen 2-based chips are currently sampling and are expected to reach the market in 2019.
AMD has made numerous improvements in Zen 2. To feed the widened, higher-throughput execution units, the front end had to be adjusted accordingly. To that end, branch prediction has been reworked, the prefetcher has been improved, and various undisclosed optimizations were made to the instruction cache. The op cache has also been optimized, including changes to its cache tags, and has been enlarged to improve instruction delivery. The exact details of these Zen 2 changes were not disclosed at this time.
Most of the back-end changes relate to the floating-point units. The main change is the widening of the floating-point data path, which has doubled in width; this applies to both the load/store paths and the FPUs themselves. In Zen, AVX2 was fully supported by cracking each 256-bit instruction into two 128-bit micro-ops, and the load and store data paths were likewise 128 bits wide: each cycle, the FPU could receive two loads from the load/store unit, each up to 128 bits. In Zen 2, the data path is now 256 bits wide. The execution units are also 256 bits wide, which means 256-bit AVX operations no longer have to be split into two 128-bit micro-ops per instruction. With two 256-bit FMA units, Zen 2 can sustain 16 FLOPs per cycle, matching Intel's Skylake client core.
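The 16 FLOPs/cycle figure follows directly from the unit widths above. A minimal back-of-the-envelope sketch, assuming double-precision operands (the usual basis for such peak-rate comparisons):

```python
# Peak FP throughput arithmetic for Zen 2's back end, as described above.
# Assumes double-precision (64-bit) operands; this is an illustrative
# calculation, not an AMD-published formula.

VECTOR_WIDTH_BITS = 256   # Zen 2 FP data path / execution width
DOUBLE_BITS = 64          # one double-precision value
FMA_UNITS = 2             # two 256-bit FMA pipes
FLOPS_PER_FMA = 2         # a fused multiply-add counts as 1 mul + 1 add

lanes = VECTOR_WIDTH_BITS // DOUBLE_BITS        # 4 doubles per vector
flops_per_cycle = lanes * FLOPS_PER_FMA * FMA_UNITS
print(flops_per_cycle)  # 16, matching the figure quoted above
```

The same arithmetic applied to Zen's 128-bit units yields 8 FLOPs/cycle, which is why the doubling of the data path doubles peak floating-point throughput.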
AMD stated that Zen 2's IPC has been improved alongside these bandwidth increases. On the security front, Zen 2 introduces improved Spectre mitigations.
AMD's second-generation EPYC, codenamed Rome, is the successor to Naples. The two are socket- and platform-compatible, and Milan, Rome's successor, will use the same socket as well. Rome still uses a multi-chip approach to scale up the core count, but the design itself has changed dramatically from the previous generation. In Naples, AMD scaled an 8-core SoC design called Zeppelin up to 32 cores by stitching together four of those SoCs over its proprietary Infinity Fabric interconnect. This approach provided eight memory channels and 128 PCIe lanes spread across all the dies.
With Rome, AMD is taking the chiplet idea even further. Similar to the approach it first took with Threadripper 2, Rome has compute dies and an I/O die. This time, however, AMD removed the core execution blocks and moved them into new compute dies, taking advantage of the lower power and higher density of TSMC's 7 nm process. The compute dies are then linked to a centralized I/O die that handles I/O and memory. The much larger I/O die is fabricated on GlobalFoundries' 14 nm process.
In total, there are nine dies: one I/O die and eight compute dies, each with eight Zen 2 cores. Neither the details of the individual compute dies nor those of the I/O die have been disclosed. There are many challenges in this type of design, and it will be interesting to see how they were addressed. The I/O die creates deterministic, uniform latencies across the chip, but could hurt best-case, latency-sensitive scenarios. The package is organized into four pairs of compute dies.
With eight octa-core compute dies, Rome can offer up to 64 cores and 128 threads, doubling the core count of first-generation EPYC and quadrupling its floating-point (AVX2) throughput. Although Rome retains 128 PCIe lanes, it adds support for PCIe Gen 4, doubling the transfer rate from 8 GT/s to 16 GT/s. There are eight DDR4 memory channels supporting up to four terabytes of DRAM per socket. An interesting detail AMD revealed with its GPU announcement is that Infinity Fabric now supports 100 GB/s (bidirectional) per link. If we assume that Infinity Fabric 2 still uses 16 differential pairs, as with first-generation IF, that would mean IF 2 now operates at 25 GT/s, identical to NVLink 2.0's data rate. However, because AMD's IF is twice as wide, it provides twice the bandwidth per link of NVIDIA's NVLink.
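The 25 GT/s inference can be checked with simple arithmetic. This is a sketch under the article's own assumption of a 16-lane link; the lane count is not an AMD-confirmed figure:

```python
# Sanity check of the Infinity Fabric 2 link-rate inference above.
# The 16-lane width is the article's assumption, not a confirmed spec.

LANES = 16          # assumed differential pairs per IF link
RATE_GT_S = 25      # inferred per-lane transfer rate (GT/s)

gbit_per_direction = LANES * RATE_GT_S        # 400 Gbit/s one way
gbyte_per_direction = gbit_per_direction / 8  # 50 GB/s one way
bidirectional = gbyte_per_direction * 2       # both directions
print(bidirectional)  # 100.0 GB/s, matching AMD's quoted figure
```

Running the same numbers with NVLink 2.0's 8-lane links at 25 GT/s yields 50 GB/s bidirectional per link, which is where the "twice the bandwidth per link" comparison comes from.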
There is a lot of mystery around AMD's I/O capabilities and plans for the future. By moving all the "redundant" components, such as I/O and the southbridge, from the compute dies to the I/O die, AMD has opened its design up to some intriguing possibilities. Because all the controllers reside in the centralized I/O die, it becomes possible to swap compute dies for other types of logic, such as an FPGA (for example, from Xilinx) or a GPU. With Naples, this would have meant sacrificing part of the I/O or memory; with Rome, that is no longer the case. AMD has not announced any such plans, but the option is there.
The key takeaway from AMD's event is its roadmap. A predictable roadmap helps build customer confidence in the platform, and AMD wanted to show it can lay out a roadmap and execute on it. To that end, AMD plans to launch Zen 2 in 2019, Zen 3 is on track, and the design of Zen 4 is being finalized.