The secret weapon of the “Radeon RX 6800” is the “Infinity Cache” derived from the CPU -PC Watch



[ad_1]

RDNA 2 architecture features

Information on the next generation “Radeon RX 6000” series of GPUs that adopted AMD’s RDNA 2 architecture has been completely removed. Check out the Hothot review posted at the same time for your performance analysis, but here I would like to explain the architectural aspects of the Radeon RX 6000 series.

RDNA 2 with 54% improved energy efficiency compared to traditional RDNA

As previously reported, RDNA 2 uses the same 7nm process as traditional RDNA. In other words, the characteristics of the transistor do not change significantly. Despite this, energy efficiency has improved by 54%.

Numerically, the Radeon RX 6900/6800 has 26.8 billion transistors, which is low considering the competitor’s GeForce RTX 3090/30 80 has 28.6 billion. Due to the difference in the generation of the manufacturing process, the die size is compact, 519.8 mm square versus 628 mm square.

Comparing just these numbers, it’s no wonder we can deliver near-competitive performance at a lower cost (RX 6900 for $ 999, RTX 3090 for $ 1,499). However, the 54% energy efficiency improvement of RDNA from the same 7nm process is striking.

According to the company, this 54% breakdown is an “operational clock improvement,” such as adopting a high-performance library, redesign of the speed-focused pipeline, high clock design adopted in the CPU, activation of the clock and redesign of the pipeline. It is said that it will be summarized in “performance improvement per clock” as “CAC and power optimization” that suppresses the location and movement of data, the adoption of Infinity Cache and the optimization of the geometry / tessellation part.

RDNA 2 architecture

RDNA 2 Architecture Overview

Half the power with the same clock, 1.3 times more clock with the same power

Breakdown of energy efficiency improvement

Minimal implementation for late races

GeForce RTX adds “RT core” that specializes in late races and “Tensor core” that speeds up AI processing and takes steps to reduce noise in late races, while Radeon has 4 per CU / clock It only contains a “throttle of beams “which performs beam-to-box or single beam-triangle intersection processing, and has not been as widely implemented as the GeForce RTX. Therefore, the real-time sprinting performance is inferior to that of GeForce RTX.

However, with this beam accelerator, RDNA 2 achieves more than 10 times the sprinting performance of conventional RDNA. The 128MB L3 “Infinity Cache” is helping to do that.

Limit volume (BVH) is one of the important methods for performing late runs. The object is surrounded by a large box that circulates and the intersection of the ray paths is judged, in BVH this box is handled in the form of a tree and the number of intersection tests can be reduced by making it hierarchical. Infinity Cache maintains more BVH working sets to reduce delays at intersections.

Lightning Accelerator Implementation

Infinity Cache is 10 times faster than software processing even with minimal implementation of beam accelerators

First of all, the late-loop implementation in DirectX itself does not need to use dedicated hardware and can be supported by a conventional general-purpose arithmetic unit, but on Radeon, it is faster than processing with software simply implementing a simple implementation called the accelerator. Ray. I made it possible. By the way, the noise reduction in the later races will be done with the company’s original “Fidelity FX”.

The Radeon development team finds it impractical to implement the performance and quality of real-time sprinting with today’s semiconductor technology. Furthermore, even with the existing rendering technology, various methods are being increased and it is possible to achieve a graphics level that is not inferior to that of late runs. Therefore, the cost was not spent much on the implementation of late runs.

Late races performed by RDNA 2. In any case, the meaning of strengthening the effect is strong.

However, in RDNA 2, the calculation unit used for general-purpose operations has been improved. First, it has become possible to operate a combination of integer / floating point operations with different precision. This is supposed to be for Tensor operations, but it should be understood that it can also behave like a Tensor kernel.

Also, the rendering has been improved so that it can handle eight 32-bit pixels per cycle. It also works with rasterization and supports variable rate shading. In addition, it also supports the mesh shader and sampler feedback specified in DirectX 12 Ultimate.

Structure of the computing unit, the smallest building block of RDNA 2

Now it is possible to perform calculations with different precision

Improved rendering with support for variable rate shading

Supports DirectX 12 Ultimate



[ad_2]