NVIDIA’s RTX 30 series launched with jaw-dropping performance claims and specs, but somewhere between all the hype and the third-party reviews, the promised doubling in performance seems to have vanished without a trace. Today we are going to investigate this very interesting phenomenon surrounding NVIDIA’s Ampere GPUs and why not everything is as it seems. Nothing here is presented as gospel truth; you are encouraged to apply your own judgment.
NVIDIA’s RTX 30 series has double the TFLOPs, so where is all that performance going?
The argument is simple: Jensen promised double the graphics horsepower in an Ampere GPU, so we should see roughly double the shading performance (without any bells and whistles like DLSS or RTX) in most titles. This, very curiously, is not happening. In fact, even though the RTX 3090 has more than double the shading cores of the RTX 2080 Ti, it is only 30% to 50% faster in shading performance in gaming titles. TFLOPs is, after all, simply a function of core count multiplied by clock speed. Somewhere, somehow, performance is being lost.
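As a quick sanity check, here is a minimal sketch of how those headline TFLOPs figures are derived. The clocks used are the reference boost clocks; real-world gaming clocks vary by card and workload.

```python
# Back-of-the-envelope FP32 throughput: cores * 2 FLOPs per clock (FMA) * clock speed.
# Clocks below are the reference boost clocks, purely for illustration.
def fp32_tflops(shader_cores: int, boost_clock_ghz: float) -> float:
    return shader_cores * 2 * boost_clock_ghz / 1000.0

print(f"RTX 2080 Ti: {fp32_tflops(4352, 1.545):.1f} TFLOPs")   # ~13.4 TFLOPs
print(f"RTX 3090:    {fp32_tflops(10496, 1.695):.1f} TFLOPs")  # ~35.6 TFLOPs
```

On paper, that is well over a 2.5x jump in raw shading throughput, which is exactly why a 30% to 50% gaming uplift raises eyebrows.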
One of three things is happening:
- Ampere’s individual shading cores are somehow weaker than Turing’s and the cards cannot actually deliver that FP32 TFLOPs figure (in other words, Jensen was wrong).
- Something went wrong with the card’s BIOS / microcode or low-level drivers.
- High-level drivers, game engines and the rest of the software stack cannot scale well enough to properly utilize the sheer number of shading cores present in Ampere cards.
Fortunately for us, this is a problem we can easily investigate using the scientific method. If Ampere’s shader cores are somehow inferior to Turing’s, then we should not be able to get double the FP32 performance in *any* application. Easy, right? It gets a little trickier if we *can* get the claimed throughput in some application: that rules out faulty hardware, but we then need to figure out whether the high-level software stack and drivers are at fault or whether it is a microcode issue. While you can settle the hardware-versus-software question with a very high level of certainty, you cannot do the same within the software side. You can, however, make a pretty good guess. Our logic flow is as follows:
Rendering applications are built to use a ton of graphics horsepower. In other words, they are coded to scale far better than games (there have been cases in the past of games refusing to run on CPUs with more than 16 cores). If not even *one* rendering application can show twice the performance, we can blame the hardware: the cores really are inferior. If *all* rendering applications can take full advantage of the cards, then even the low-level driver stack is not to blame. That would point the finger at APIs like DirectX, the Game Ready drivers and the actual code of the game engines. So without further ado, let’s take a look.
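To make the elimination process explicit, here is a minimal sketch of that decision flow; the function name and the inputs are purely illustrative, not part of any real benchmark suite.

```python
# Illustrative decision flow for locating the bottleneck (hypothetical helper).
def diagnose(rendering_speedups: list[float]) -> str:
    """Speedups are RTX 3090 vs RTX 2080 Ti ratios in rendering apps (2.0 = doubled)."""
    if not any(s >= 2.0 for s in rendering_speedups):
        # Not a single app can extract double the throughput: the hardware falls short.
        return "hardware (the shader cores cannot deliver the claimed FP32 TFLOPs)"
    if all(s >= 2.0 for s in rendering_speedups):
        # Every rendering app scales, so even the low-level driver stack is fine;
        # the blame shifts to APIs, Game Ready drivers and game engine code.
        return "high-level software (APIs, drivers, game engines)"
    # Some apps scale and some do not: low-level drivers / microcode remain suspect.
    return "low-level drivers / microcode"

print(diagnose([2.1, 1.9]))  # example: mixed rendering results
```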
V-Ray is one of the most shading-intensive benchmarks for GPUs; it is essentially a Cinebench for the GPU. It also helps that the program is optimized for the CUDA architecture, so it presents a best-case scenario for NVIDIA cards. If the Ampere series doesn’t deliver a doubling in performance here, it won’t do so anywhere else. And in V-Ray, the RTX 3090 achieves more than double the shading performance of the RTX 2080 Ti. Remember our logic flow?
We have a program that can actually demonstrate the doubled performance in a real-world workload, which clearly means that Jensen wasn’t lying and the RTX 30 series really is capable of its claimed performance figures, at least as far as the hardware goes. So now we know the performance is being lost somewhere on the software side. Interestingly, at least one other rendering application scales slightly worse than V-Ray, which hints that the low-level drivers may not be entirely blameless. In general, however, rendering apps scale far better than games.
We took a panel of 11 games. We wanted to test pure shading performance, so no DLSS and no RTX. There was no particular method to picking the titles; we simply benchmarked the games we had on hand. We found that the RTX 3090 is, on average, 33% faster than the RTX 2080 Ti. This means that, for the most part, the card performs like a 23.5 TFLOPs GPU. The near-perfect scaling we saw in rendering apps clearly falls apart once we move to games. The RTX 30 series is taking a hit here, and there is a huge gap between its performance targets and its actual output. As to why, we can only guess: with so much fluctuation between different games, game engine scaling is clearly a factor, and the drivers don’t seem able to take full advantage of the 10,000+ cores the RTX 3090 has.
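For the curious, here is roughly how that effective-TFLOPs figure falls out of the numbers. The real-world boost clock used for the RTX 2080 Ti is an assumption for illustration; cards typically boost well past their rated clocks in games.

```python
# Effective throughput implied by the average gaming uplift.
# The ~2.0 GHz real-world boost for the RTX 2080 Ti is an assumed, illustrative value.
cores_2080ti = 4352
real_world_boost_ghz = 2.0          # assumption: typical in-game boost, not the rated 1.545 GHz
gaming_uplift = 1.33                # RTX 3090 measured ~33% faster on average

tflops_2080ti = cores_2080ti * 2 * real_world_boost_ghz / 1000.0   # ~17.4 TFLOPs
effective_tflops_3090 = tflops_2080ti * gaming_uplift              # ~23 TFLOPs
print(f"RTX 3090 behaves like a ~{effective_tflops_3090:.1f} TFLOPs GPU in games")
```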
So what does this mean? The amazing world of software bottlenecks, fine wine and performance scaling down the lineup
Because the problem with the RTX 30 series is so clearly software-based (NVIDIA has essentially built a GPU so powerful that current software cannot take advantage of it), it is a very good problem to have. AMD GPUs have long been praised for being “fine wine”, and we suspect NVIDIA’s RTX 30 series is going to be the mother of all fine wines. The performance growth we expect these cards to gain through software over the next year should be phenomenal. As drivers, APIs and game engines improve their scaling and learn to deal with the metric butt-ton (pardon the language) of shading cores present in these cards, and as DLSS matures technically, the cards should move closer to that 2x performance level, and eventually go beyond it.
While it is unfortunate that not all of this performance is usable on day one, it may not even be NVIDIA’s fault (remember, we have only isolated the problem to software; we do not know for certain whether the drivers or the game engines and APIs are responsible for the loss), and one thing is certain: you should see chunks of this performance unlocked in the coming months as the software side matures. In other words, you are looking at the first NVIDIA fine wine. While previous generations usually delivered their full performance on day one, the RTX 30 series does not, and that is worth remembering when you make a purchase decision.
Aside from fine wine, there are other very interesting side effects. I expect performance scaling to improve as you move down the lineup. Since the RTX 30 series’ performance is essentially software-bottlenecked, and the bottleneck appears to revolve around the sheer number of cores, less powerful cards should experience significantly less bottlenecking (and therefore better scaling). In fact, I’m going to make a prediction: the RTX 3060 Ti, for example (with just 512 more cores than the RTX 2080 Ti), should scale better than its older brothers and still beat the RTX 2080 Ti! The lower the core count, the better the scaling.
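A quick look at the publicly listed CUDA core counts down the stack illustrates the shape of the argument: the further a card sits above the 4352-core RTX 2080 Ti that today’s game code already feeds comfortably, the more room there is for the software bottleneck to bite.

```python
# Publicly listed shader core counts; the ratio vs the RTX 2080 Ti shows how much
# extra parallelism today's game code would have to exploit for perfect scaling.
cores = {
    "RTX 3090": 10496,
    "RTX 3080": 8704,
    "RTX 3070": 5888,
    "RTX 3060 Ti": 4864,   # only 512 cores more than the RTX 2080 Ti
    "RTX 2080 Ti": 4352,
}
for card, n in cores.items():
    print(f"{card}: {n} cores ({n / cores['RTX 2080 Ti']:.2f}x the 2080 Ti)")
```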
While this situation represents uncharted territory for NVIDIA, we think it is a good problem to have. Just as AMD’s introduction of high-core-count CPUs forced game engines to support more than 16 cores, NVIDIA’s aggressive approach to core count should force the software side to catch up on scaling as well. So over the next year, I expect RTX 30 owners to receive software updates that drastically increase performance.