PAM4 signaling for higher rates, coming to NVIDIA's RTX 3090


It seems that this morning Micron accidentally spilled the beans on the future of graphics memory technologies – and outed one of NVIDIA's next-generation RTX video cards in the process. In a technical brief posted on its website, entitled "The Demand for Ultra-Bandwidth Solutions", Micron detailed its portfolio of high-bandwidth memory technologies and the market needs they serve. Included in that brief was information about the previously unannounced GDDR6X memory technology, as well as some details on what appears to be the first card to use it, NVIDIA's GeForce RTX 3090.

The document appears to have been posted a month (or more) early, given the mention of the NVIDIA card, which we do not expect NVIDIA to announce any sooner than its September event. Furthermore, the document links to other, as-yet-unpublished Micron technical briefs covering GDDR6X. Regardless, the document comes directly from Micron's public web servers, so what we have today is an unexpected sneak peek at Micron's upcoming GDDR memory plans.

In any case, because this is a market overview rather than a technical deep dive, the details on GDDR6X are slim. The document links to another, as-yet-unpublished document, "Doubling I/O Performance with PAM4: Micron Updates GDDR6X to Speed Up Graphics Memory", which presumably contains more details about GDDR6X. Nonetheless, even this high-level overview gives us a basic idea of what Micron has in store for later this year.

The key innovation for GDDR6X appears to be that Micron is moving from NRZ encoding on the memory bus – a binary (two-state) coding format – to four-state coding in the form of Pulse-Amplitude Modulation 4 (PAM4). In short, Micron is doubling the number of signal states on the GDDR6X memory bus, allowing it to transmit twice as much data per clock.

GPU Memory Math | GDDR6X (RTX 3090) | GDDR6 (Titan RTX) | GDDR5X (Titan Xp) | HBM2 (Titan V)
Total capacity | 12 GB | 12 GB | 12 GB | 12 GB
B/W per pin | 21 Gbps | 14 Gbps | 11.4 Gbps | 1.7 Gbps
Chip capacity | 1 GB (8 Gb) | 1 GB (8 Gb) | 1 GB (8 Gb) | 4 GB (32 Gb)
No. chips/KGSDs | 12 | 12 | 12 | 3
B/W per chip/stack | 84 GB/s | 56 GB/s | 45.6 GB/s | 217.6 GB/s
Bus width | 384-bit | 384-bit | 384-bit | 3072-bit
Total B/W | 1008 GB/s | 672 GB/s | 548 GB/s | 652.8 GB/s
DRAM voltage | ? | 1.35 V | 1.35 V | 1.2 V
Data rate | QDR | QDR | DDR | DDR
Signaling | PAM4 | Binary | Binary | Binary

PAM4 itself is not a new technology; it is already well used in other high-end applications such as network transceivers. More recently, the PCI-SIG announced that it will adopt PAM4 encoding for PCIe 6.0. So for a more detailed discussion of PAM4, here is an excerpt from our PCIe 6.0 primer:

At a very high level, what PAM4 does versus NRZ (binary coding) is to take a page from the MLC NAND playbook and double the number of electrical states that a single cell (or in this case, transmission) will hold. Instead of traditional 0/1 high/low signaling, PAM4 uses four signal levels, so that a signal can encode four possible two-bit patterns: 00/01/10/11. This allows PAM4 to carry twice as much data as NRZ without increasing the transmission bandwidth, which for PCIe 6.0 would otherwise have resulted in a frequency of around 30GHz(!).
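To make the difference concrete, here is a minimal Python sketch (purely illustrative, not tied to any real PHY or to Micron's implementation) that encodes the same byte string as NRZ bits and as PAM4 symbols; the PAM4 stream needs half as many symbols for the same data.

```python
# Illustrative NRZ vs. PAM4 encoding. Real PAM4 links typically Gray-code the
# four levels to limit multi-bit errors; a plain binary mapping is used here
# for simplicity.

PAM4_LEVELS = {0b00: 0, 0b01: 1, 0b10: 2, 0b11: 3}  # two-bit pattern -> level

def to_bits(data: bytes) -> list[int]:
    """Expand bytes into individual bits, MSB first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

def encode_nrz(data: bytes) -> list[int]:
    """NRZ: one two-level symbol per bit."""
    return to_bits(data)

def encode_pam4(data: bytes) -> list[int]:
    """PAM4: one four-level symbol per pair of bits."""
    bits = to_bits(data)
    return [PAM4_LEVELS[(bits[i] << 1) | bits[i + 1]]
            for i in range(0, len(bits), 2)]

payload = b"GDDR6X"
print(len(encode_nrz(payload)), "NRZ symbols")    # 48
print(len(encode_pam4(payload)), "PAM4 symbols")  # 24 -- same data, half the symbols
```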


NRZ vs. PAM4 (base chart courtesy Intel)

PAM4 itself is not a new technology, but until now it has been the domain of ultra-high-end networking standards such as 200G Ethernet, where the amount of space for additional physical channels is even more limited. As a result, the industry already has several years of experience working with the signaling standard, and with its own bandwidth needs continuing to grow, the PCI-SIG has decided to bring it inside the chassis by basing the next generation of PCIe upon it.

The downside to using PAM4 is, of course, cost. Even with its greater bandwidth per Hz, PAM4 currently costs more to implement at almost every level, from the PHY on up. That is why it has not taken the world by storm, and why NRZ remains in use elsewhere. The sheer scale of PCIe deployment will obviously help a lot here – economies of scale still count for a lot – but it will be interesting to see where things stand in a few years, once PCIe 6.0 is in the middle of rolling out.

Until now, PAM4 signaling has only been used for networking and expansion buses, so using it for a memory bus, although a logical extension, would represent a major leap in the technology. Micron now needs to develop memory that can do clean PAM4 modulation – which is no easy task – and NVIDIA needs a matching memory controller on the other end. It is doable, to be sure, but it is a big change from how memory buses have traditionally operated – even high-speed buses like those used for GDDR.

According to the Micron brief, the company expects to get GDDR6X up to 21Gbps/pin, at least to begin with. This is far from double GDDR6's existing 16Gbps/pin rate, but it is also a data rate that is presumably rooted in the limitations of PAM4 and DRAM. PAM4 signaling itself is easier to achieve than binary signaling at the same total data rate, but having to discriminate between four states instead of just two is otherwise a harder task. So a smaller jump is not too surprising.
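The symbol-rate arithmetic helps explain why: with two bits per symbol, 21Gbps of data only requires the bus to switch at 10.5 gigabaud, whereas 16Gbps GDDR6 using binary signaling needs 16 gigabaud. A quick back-of-the-envelope sketch (my own numbers, not Micron's):

```python
# Symbol rate = data rate / bits per symbol. PAM4 carries 2 bits per symbol,
# NRZ carries 1, so a higher data rate can ride on a slower-switching bus.

def symbol_rate_gbaud(data_rate_gbps: float, bits_per_symbol: int) -> float:
    return data_rate_gbps / bits_per_symbol

print(symbol_rate_gbaud(16, 1))  # GDDR6, NRZ   -> 16.0 Gbaud
print(symbol_rate_gbaud(21, 2))  # GDDR6X, PAM4 -> 10.5 Gbaud
```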

Meanwhile, this leaves the core DRAM frequency as an open question. As a quick refresher, DRAM cell performance more or less plateaued years ago – you can only drive a transistor-and-capacitor cell so fast – so newer memory technologies have instead relied on ever-greater parallelism. In the case of GDDR technologies, this means, for example, that 16Gbps GDDR6 has the same core clock rate as 8Gbps GDDR5. So to achieve a 21Gbps data rate, it is not clear whether Micron is pushing the core DRAM clock speed higher, or whether it is rearchitecting things and relying on still more parallelism (e.g. a larger prefetch size). Looking at what late-generation GDDR5 memory could do, my suspicion is that Micron is simply pushing the core clock speed harder for GDDR6X, but either way it will be interesting to see what they do.
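For what it's worth, the usual simplified model here is that the per-pin data rate is roughly the core array rate multiplied by the prefetch depth. The sketch below uses an assumed ~1GHz effective core rate and the standard 8n/16n prefetch depths of GDDR5/GDDR6 purely for illustration; Micron has not said how GDDR6X reaches 21Gbps.

```python
# Simplified model: per-pin data rate ~= core array rate x prefetch depth.
# The ~1 GHz core rate is an assumption for illustration, not a Micron figure.

CORE_RATE_GBPS = 1.0  # assumed effective core array rate per pin

def per_pin_rate_gbps(prefetch_bits: int, core_rate: float = CORE_RATE_GBPS) -> float:
    return prefetch_bits * core_rate

print(per_pin_rate_gbps(8))   # GDDR5-style 8n prefetch  -> ~8 Gbps
print(per_pin_rate_gbps(16))  # GDDR6-style 16n prefetch -> ~16 Gbps
# Reaching 21 Gbps then means either a faster core (21/16 ~= 1.3 GHz at 16n)
# or a deeper prefetch at roughly the same core rate -- the open question above.
```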

The other big wildcard for the moment is cost. As mentioned earlier, PAM4 has been around for a while; it is just expensive to use because of the engineering and silicon required. So how much will PAM4 add to the cost of a memory chip? Clearly this will be a premium memory technology, although at the same time it is a safe bet that Micron would not be going this route if it cost as much as HBM, which has already been priced out of consumer video cards.

Incidentally, there is one last interesting nugget of information about GDDR6X in Micron's brief, and that concerns power consumption. One of the indirect advantages of PAM4 is that by running the bus at a lower clock rate than would otherwise be necessary, its power requirements drop. To be sure, it is not a 2x difference, as the added complexity of PAM4 encoding consumes power in other ways, but it is more efficient nonetheless. And according to Micron, this holds for GDDR6X as well, with GDDR6X having a slightly lower energy cost per bit.

According to Micron's brief, we are looking at an average device power of about 7.25 picojoules per byte for GDDR6X, versus 7.5 for GDDR6. Going by this data, it is also relatively close to HBM2, although well behind HBM2E. That said, because this is efficiency per byte, actual power consumption is a function of bandwidth; and although GDDR6X is a bit more efficient, it is also meant to be much faster. Going by Micron's data, GDDR6X's total power consumption will be higher than GDDR6's, by about 25%.
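As a rough sanity check on that figure, total memory power is approximately bandwidth multiplied by energy per byte. The per-device sketch below assumes a standard 32-bit-wide GDDR device and uses 16Gbps as the GDDR6 comparison point, since the brief does not say which GDDR6 speed Micron compared against.

```python
# Rough per-device power estimate: power = bytes/sec x joules/byte.
# 32-bit device width is standard for GDDR; the 16 Gbps GDDR6 baseline is an
# assumption, as the brief does not state Micron's comparison point.

def device_power_w(pin_rate_gbps: float, pj_per_byte: float, width_bits: int = 32) -> float:
    bytes_per_sec = pin_rate_gbps * 1e9 * width_bits / 8
    return bytes_per_sec * pj_per_byte * 1e-12

gddr6x = device_power_w(21, 7.25)  # ~0.61 W per chip
gddr6 = device_power_w(16, 7.5)    # ~0.48 W per chip
print(f"GDDR6X draws ~{gddr6x / gddr6 - 1:.0%} more per device")  # ~27%
```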

Overall, Micron is presenting PAM4 as the natural evolution of GDDR memory technology. And while this is wrapped in the obvious technical marketing, there is a nugget of truth to it, in as much as official data rates for GDDR6 still top out at 16Gbps. Rambus, for its part, has demonstrated 18Gbps GDDR6 in the lab, but looking from the outside in, it is not clear whether that is commercially viable right now – no memory vendor currently has 18Gbps chips in its catalog.

But regardless of where vanilla GDDR6 ultimately tops out, the memory industry as a whole has long been staring down a reckoning with memory bus speeds. Successive standards have employed various techniques to improve data rates, such as GDDR6's QDR, but GDDR has always remained a single-ended I/O standard using binary coding. With transfer rates per pin now above 16GT/second, one of those two core principles will eventually have to change, as we have seen in other fields that use high-speed I/O.


AMD Technical Forum and Exhibition 2011: exploring memory options beyond GDDR5

PAM4, in turn, is likely the lesser of two evils. Ditching binary coding for PAM4 is, if nothing else, the more power-efficient option. The other route would have been to drop single-ended I/O in favor of differential signaling, which is something the memory industry would like to avoid. Differential signaling works, and it works well – GDDR6 even uses it for clocking (though not for data transfers) – but it consumes a lot of pins and even more power, which is part of the reason HBM came about in the first place. So in one respect, PAM4 can be thought of as another way to stave off differential signaling on GDDR for at least another generation.

Finally, while we are on the subject of memory standards, the striking omission from Micron's document is any mention of JEDEC. The trade organization and standards body is responsible for setting GDDR memory standards, including GDDR6 as well as Micron's previous attempt at launching a memory technology, GDDR5X. Given the premature nature of the brief's release, it is not clear whether GDDR6X is another JEDEC standard currently being developed privately ahead of a public launch, or whether Micron is truly going it alone and has developed its own memory standard.

NVIDIA GeForce RTX 3090: 12GB of GDDR6X with almost 1TB/sec of memory bandwidth?

Last but not least, let's talk about the second secret revealed in Micron's brief: NVIDIA's GeForce RTX 3090. The as-yet-unannounced high-end video card is apparently Micron's flagship use case for GDDR6X, and the company has helpfully listed its expected memory configuration.

In short, according to Micron, the 12GB video card will ship with GDDR6X on a 384-bit memory bus. That memory will in turn be clocked somewhere between 19Gbps and 21Gbps, which at the higher end of that range would give the card 1008GB/sec of memory bandwidth, just shy of a full 1TB/sec (1024GB/sec) of bandwidth.
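The bandwidth math is straightforward: total bandwidth is the per-pin data rate multiplied by the bus width, divided by eight to convert bits to bytes. A quick check of Micron's stated range:

```python
# Total bandwidth (GB/s) = per-pin rate (Gbps) x bus width (bits) / 8 bits per byte.

def total_bandwidth_gb_s(pin_rate_gbps: float, bus_width_bits: int) -> float:
    return pin_rate_gbps * bus_width_bits / 8

print(total_bandwidth_gb_s(19, 384))  # 912.0 GB/s
print(total_bandwidth_gb_s(21, 384))  # 1008.0 GB/s -- just shy of 1 TB/s (1024 GB/s)
```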

NVIDIA GeForce specification comparison | RTX 3090 | RTX 2080 Ti | RTX 2080 | GTX 1080 Ti
CUDA cores | ? | 4352 | 2944 | 3584
Memory clock | 19-21 Gbps GDDR6X | 14 Gbps GDDR6 | 14 Gbps GDDR6 | 11 Gbps GDDR5X
Memory bus width | 384-bit | 352-bit | 256-bit | 352-bit
Memory bandwidth | 912-1008 GB/sec | 616 GB/sec | 448 GB/sec | 484 GB/sec
VRAM | 12 GB | 11 GB | 8 GB | 11 GB
Architecture | Ampere | Turing | Turing | Pascal
Manufacturing process | ? | TSMC 12nm "FFN" | TSMC 12nm "FFN" | TSMC 16nm
Launch date | Fall 2020? | 09/27/2018 | 09/20/2018 | 03/10/2017

Compared to the current generation of NVIDIA cards, this would be a sizable increase in memory bandwidth. At a minimum we are looking at 36% more bandwidth than the Titan RTX, and at the high end of that estimate, the figure becomes a full 50% jump in bandwidth. This is still well below what NVIDIA's Ampere-based A100 accelerator can do (1.6TB/sec), but it would be phenomenal for a card using GDDR-type memory on a 384-bit bus. And it goes without saying that it would go a long way towards feeding the beast that is a next-generation video card.

In any case, this is clearly far from the final word on either GDDR6X or NVIDIA's RTX 3090, so we'll have plenty more to look forward to when NVIDIA's September event arrives.

Credit for main image: Micron, Bare DRAM (DDR5) Die