Intel’s run at the GPU market begins with Tiger Lake’s onboard graphics


Intel seeks to replace Nvidia as the “one-stop GPU shop,” with a comprehensive line of GPUs covering everything from laptops to gaming to the data center.

Intel

At Intel Architecture Day 2020, most of the focus and buzz centered on the upcoming Tiger Lake 10nm laptop CPUs – but Intel also announced advances in its Xe GPU technology, along with strategy and planning that could shake up the sector over the next few years.

Integrated Xe graphics are probably one of the most compelling features of the Tiger Lake laptop CPUs. Although we don’t yet have officially sanctioned test results, let alone third-party tests, leaked benchmarks suggest that Tiger Lake’s integrated graphics beat the Vega 11 chipset in Ryzen 4000 mobile by a wide margin of around 35 percent.

Assuming these leaked benchmarks pan out in the real world, they’ll be a much-needed shot in the arm for Intel’s flagging reputation in the laptop space. But there’s more to Xe than that.

A new challenger appears

Intel’s 7nm Xe architecture is intended to cover the full range of GPU applications, but Ponte Vecchio – the first Xe product – is specifically focused on high-end deep-learning training in data center and supercomputer environments.

Intel Corporation

It’s been a long time since any third party seriously challenged the duopoly in high-end graphics cards – for roughly 20 years, Nvidia or Radeon chipsets have been the only realistic choices for a high-performance GPU. We first got wind of Intel’s plans to change that in 2019 – but at the time, Intel was only really talking about its upcoming Xe GPU architecture in terms of Ponte Vecchio, a product aimed at HPC supercomputing and data center use.

The company wasn’t really ready to talk about it at the time, but we found a slide in Intel’s Supercomputing 2019 deck that mentioned plans to expand the Xe architecture into workstation, gaming, and laptop product lines. We still haven’t seen an Intel desktop gaming card – but Xe has now replaced both the old UHD line and its more capable Iris Plus successor, and Intel is far more willing to talk about its expanded plans now than it was last year.

When we asked Intel executives in 2019 about that “gaming” slide, they seemed pretty noncommittal about it. When we asked again at Architecture Day 2020, the coyness was gone. Intel still doesn’t have a date for a desktop gaming (Xe HPG) card, but its executives expressed confidence in delivering “market-leading performance” – including onboard hardware ray tracing – in that segment soon.

A closer look at Xe LP

If you followed our earlier coverage of the Tiger Lake architecture, the first graphic in the gallery should look very familiar. The Xe LP GPU enjoys the same benefits from Intel’s redesigned FinFET transistors and SuperMIM capacitors as the Tiger Lake CPU does. Concretely, that means stability across a wider range of voltages and higher frequencies across the board, compared with Gen11 (Ice Lake Iris Plus) GPUs.

With a greater dynamic voltage range, Xe LP can operate at significantly lower power than Iris Plus could – and it can also scale to higher frequencies. The improved frequency ramp means higher clock speeds at the same voltages Iris Plus could manage. It’s difficult to overstate the importance of this voltage/frequency curve, which affects efficiency and performance on not just some but all workloads.

The improvements don’t end with the voltage and frequency gains, however. The high-end Xe LP part has 96 execution units (versus Iris Plus G7’s 64), and each of those EUs features FP/INT Arithmetic Logic Units twice as wide as Iris Plus G7’s. Add a new L1 data cache in every 16-EU subslice, plus an increase in L3 cache from 3MiB to 16MiB, and you start to get an idea of just how big an improvement Xe LP really is.

The 96-EU version of Xe LP is rated for 50 percent more 32-bit Floating Point Operations (FLOPS) per clock cycle than Iris Plus G7 was – and it operates at higher frequencies, to boot. This tracks well with the leaked Time Spy GPU benchmarks we mentioned earlier, in which the i7-1165G7 posted a Time Spy GPU score of 1,482 to the i7-1065G7’s 806 (and the Ryzen 7 4700U’s 1,093).
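As a rough sanity check on that 50 percent figure – this is our own back-of-the-envelope arithmetic, not Intel’s, and it assumes each EU retires the equivalent of an 8-wide FP32 fused multiply-add per clock, counted as two operations per lane:

$$\underbrace{96 \times 8 \times 2}_{\text{Xe LP, 96 EUs}} = 1536\ \text{FLOPs/clock} \qquad\quad \underbrace{64 \times 8 \times 2}_{\text{Iris Plus G7, 64 EUs}} = 1024\ \text{FLOPs/clock}$$

Since 1536 / 1024 = 1.5, the 50 percent per-clock uplift falls straight out of the EU count on that assumption – before the higher clock speeds are factored in at all.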

Improving buy-in with OneAPI

One of the biggest business keys to success in the GPU market is lowering costs and increasing revenue by addressing multiple markets with the same underlying design. The first part of Intel’s strategy for broad reach and low design and production costs with Xe is scalability – rather than maintaining completely separate designs for laptop parts, desktop parts, and data center parts, Intel aims to scale Xe relatively simply by adding more subslices, with more EUs, as the SKUs move upmarket.

There’s another important differentiator Intel needs if it’s really going to break into the market in a big way. AMD’s Radeon line suffers from the fact that no matter how appealing the cards may be to gamers, they leave AI practitioners cold. This isn’t necessarily because Radeon GPUs can’t be used for AI computing – the problem is simpler: there’s an entire ecosystem of libraries and models designed specifically for Nvidia’s CUDA architecture, and for nothing else.

It doesn’t seem likely that a competing deep-learning GPU architecture that requires massive code rewrites could succeed unless it offers something far more attractive than slightly cheaper or somewhat more powerful hardware. Intel’s answer is instead to offer a “write once, run everywhere” environment – specifically, the OneAPI framework, which is expected to reach production release status later this year.

Many people expect all “serious” AI / deep-learning workloads to run on GPUs, which generally offer massively higher throughput than CPUs – even CPUs with Intel’s AVX-512 “Deep Learning Boost” instruction set – can possibly manage. In the data center, where it’s easy to order whatever configuration you want with few constraints on space, power, or cooling, that’s at least close to true.

But when it comes to inference workloads, GPU performance isn’t always the right answer. While a GPU’s massively parallel architecture offers potentially higher throughput than a CPU can manage, the latency involved in setting up and tearing down short workloads can often make the CPU an acceptable – or even superior – alternative.

An increasing amount of inference isn’t done in the data center at all – it’s done at the edge, where constraints on power, space, heat, and cost can often push GPUs out of the running entirely. The problem here is that you can’t easily port code written for Nvidia’s CUDA to an x86 CPU – so developers have to make hard choices about which architectures to target and support, and those choices affect both code maintainability and performance down the road.

Although Intel’s OneAPI framework is genuinely open – Intel invites hardware vendors to write their own libraries for non-Intel parts – Xe graphics are of course a first-class citizen there, as are Intel CPUs. The siren call of deep-learning libraries written once and maintained once, yet able to run on dedicated GPUs, integrated GPUs, and x86 CPUs alike, may be enough to attract serious AI developer interest in Xe graphics where simply competing on performance would not.
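To make that concrete, here is a minimal sketch of what OneAPI’s “write once, run everywhere” model looks like in Data Parallel C++ (SYCL). This is our own illustration rather than Intel’s sample code; the kernel is ordinary C++, and the only thing that changes between an Xe GPU, an integrated GPU, and an x86 CPU is which device the runtime binds the queue to.

```cpp
// Minimal DPC++/SYCL sketch: one kernel source, many possible devices.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    // The default selector prefers a GPU if one is present and
    // falls back to the CPU otherwise -- same binary either way.
    sycl::queue q;
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    constexpr size_t n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    {   // Buffers lend the host data to the runtime for this scope.
        sycl::buffer<float> buf_a{a.data(), sycl::range<1>{n}};
        sycl::buffer<float> buf_b{b.data(), sycl::range<1>{n}};
        sycl::buffer<float> buf_c{c.data(), sycl::range<1>{n}};

        q.submit([&](sycl::handler& h) {
            sycl::accessor A{buf_a, h, sycl::read_only};
            sycl::accessor B{buf_b, h, sycl::read_only};
            sycl::accessor C{buf_c, h, sycl::write_only, sycl::no_init};
            // The kernel itself: a simple element-wise add, written once.
            h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];
            });
        });
    }   // Leaving the scope copies results back into the host vectors.

    std::cout << "c[0] = " << c[0] << " (expected 3)\n";
}
```

Compiled with a oneAPI DPC++ compiler (icpx -fsycl, for example), the same source runs on whichever device the runtime selects; retargeting it requires no changes to the kernel at all, which is precisely the lock-in antidote Intel is selling.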

Conclusions

As always, it’s a good idea to maintain some healthy skepticism when vendors make claims about unreleased hardware. With that said, we’ve seen enough detail from Intel to make us sit up and pay attention on the GPU front, especially with the (strategically?) leaked Xe LP benchmarks backing those claims up so far.

We think the biggest thing to note here is Intel’s holistic strategy – Intel executives have been telling us for a few years now that the company is no longer a “CPU company,” and that it invests as heavily in software as it does in hardware. In a world where it’s easier to buy more hardware than it is to hire (and manage) more developers, this strikes us as a sound strategy.

High-quality drivers have long been a hallmark of Intel’s integrated graphics – and although gaming may not have been first-class on UHD graphics, the user experience has overwhelmingly been one of “it just works,” across all platforms. If Intel manages to extend that “it just works” expectation to deep-learning development with OneAPI, we think it has a real shot at breaking Nvidia’s current lock on the deep-learning GPU market.

In the meantime, we look forward to seeing Xe LP graphics debut in the real world when Tiger Lake launches in September.