Date: 2021-05-25

Ray tracing has become the present of video games, thanks to the commitment of the two giants of the sector, NVIDIA and AMD, and to the growing support this technology is receiving from developers. It is also worth highlighting that support for ray tracing in the new generation of consoles, PS5 and Xbox Series X and Series S, is proving key to its adoption.

Those of you who read us daily already know what ray tracing is, how it works and what it means for the world of video games. In an earlier article we analyzed it in depth, saw why it is so important and explored its mechanics, that is, the lighting model, the hit-or-miss tests and the collisions, three aspects that are fundamental to explaining the enormous performance impact of this technology.

Over the last few months we have also seen various performance tests using NVIDIA RTX and AMD Radeon RX 6000 graphics cards in games that take advantage of ray tracing in one way or another, and the conclusion has been very clear: NVIDIA leads, and by a wide margin.

However, I know that many of our readers do not quite understand why there is such a big difference in ray tracing performance between the NVIDIA RTX 30 and the AMD RX 6000 graphics cards, which is why I have decided to write this article to answer that question. As always, if you still have questions after reading it, you can leave them in the comments.

Ray tracing: preliminary considerations

The ray tracing workload is the same regardless of the architecture we use. Calculating ray intersections and collisions is the basis of this technology, and it is a very heavy burden that, on graphics cards lacking specialized hardware, has to be computed by the shaders.

The added load that ray tracing places on a GPU without specialized hardware is so great that, except in very specific cases, it cannot deliver even minimally acceptable performance. The reason is easy to understand: the same elements of the graphics core that handle classic rasterization must also compute the intersections and collisions, which consumes a huge amount of resources and raises the number of milliseconds needed to generate each frame.

For example, generating a frame without specialized hardware, using only the shaders, could take 51 milliseconds, while combining shaders and RT cores would reduce that time to 20 milliseconds. If we add the tensor cores present in NVIDIA graphics cards, the time would drop to 12 milliseconds.

What do those numbers mean? It is very simple: to maintain 30 frames per second the GPU must render each frame in 33.33 milliseconds, and to maintain 60 frames per second that time drops to just 16.67 milliseconds. With a rendering time of 51 milliseconds (about 20 frames per second) performance would be terrible, with 20 milliseconds (50 frames per second) we could play smoothly, and with 12 milliseconds (about 83 frames per second) the experience would be optimal.
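
The arithmetic behind those figures can be checked with a short Python sketch. The function names and the dictionary labels are illustrative choices made for this example; the only inputs taken from the article are the three frame times and the 30 and 60 frames per second targets.

```python
def fps_from_frame_time(frame_time_ms: float) -> float:
    """Frames per second if every frame takes frame_time_ms milliseconds."""
    return 1000.0 / frame_time_ms


def frame_budget_ms(target_fps: float) -> float:
    """Maximum time per frame, in milliseconds, to sustain target_fps."""
    return 1000.0 / target_fps


# Frame times quoted in the text (milliseconds per frame).
frame_times_ms = {
    "shaders only": 51.0,
    "shaders + RT cores": 20.0,
    "shaders + RT cores + tensor cores": 12.0,
}

for label, ms in frame_times_ms.items():
    fps = fps_from_frame_time(ms)
    holds_30 = ms <= frame_budget_ms(30)
    holds_60 = ms <= frame_budget_ms(60)
    print(f"{label}: {ms} ms per frame, {fps:.1f} fps, "
          f"holds 30 fps: {holds_30}, holds 60 fps: {holds_60}")
```

Only the 12 millisecond case fits inside the 60 frames per second budget; the 20 millisecond case fits the 30 frames per second budget with room to spare, and the 51 millisecond case fits neither.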

The hardware dedicated to accelerating ray tracing takes over the tasks specific to that technology in order to reduce the time required to generate each frame, but NVIDIA and AMD have followed different approaches, and this has made their actual performance very different.

AMD and Ray Tracing: A Limited Approach

We start with AMD. When NVIDIA confirmed its bet on ray tracing back in 2018, AMD decided to wait until the technology began to standardize. As a result, the Radeon RX 5000 came to market without dedicated hardware to accelerate ray tracing, which placed it, from a technological perspective, in a clearly inferior position to the RTX 20.

The Radeon RX 6000 therefore became AMD's first generation of graphics cards with dedicated hardware to accelerate ray tracing. When Microsoft discussed the technology behind the Xbox Series X SoC, we were able to confirm how AMD had implemented dedicated ray tracing hardware in its RDNA 2 architecture. From that point on my expectations dropped significantly and my forecasts were not good at all. In the end, I was right about almost everything I said in this regard.

In the RDNA 2 architecture, the foundation of the Radeon RX 6000, there is one ray tracing acceleration unit per compute unit. A compute unit has 64 shaders and 4 texture units, but its ray tracing acceleration unit shares resources with the texture units, which means that they cannot work simultaneously.

On top of that, those ray tracing acceleration units have two other important limitations. The first, and most important, is that they handle the ray-triangle intersections and the ray-bounding-box intersections, which are the most intensive and resource-consuming operations, but BVH traversal, the step that precedes them, is carried out by the shaders.

The impact of BVH traversal can be reduced through game-specific optimizations that cut rendering time, but this is not always feasible, and when it is not done, or not done properly, the performance loss is notable, since it consumes very valuable resources that could have been dedicated to shading tasks. The second limitation is that they cannot work asynchronously.
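
To make the three operations named above more concrete, here is a minimal CPU sketch in Python of a ray-bounding-box test (the slab method), a ray-triangle test (the Möller-Trumbore algorithm) and a BVH traversal loop that ties them together. It is only an illustration of the general technique, not a description of how any GPU implements it: the `Node` class, the function names and the flat stack-based traversal are assumptions made for this example. Going by the description in the text, on RDNA 2 the two intersection tests would run on the acceleration units while a loop like `closest_hit` would run on the shaders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_box(origin: Vec3, direction: Vec3, box_min: Vec3, box_max: Vec3) -> bool:
    """Ray vs axis-aligned bounding box, using the slab method."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def ray_hits_triangle(origin: Vec3, direction: Vec3, tri: Triangle) -> Optional[float]:
    """Möller-Trumbore test: distance along the ray to the hit, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < 1e-12:
        return None  # ray is parallel to the triangle plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > 1e-9 else None


@dataclass
class Node:
    """Hypothetical BVH node: a bounding box plus child nodes and/or triangles."""
    box_min: Vec3
    box_max: Vec3
    children: List["Node"] = field(default_factory=list)
    triangles: List[Triangle] = field(default_factory=list)


def closest_hit(root: Node, origin: Vec3, direction: Vec3) -> Optional[float]:
    """BVH traversal: skip every subtree whose bounding box the ray misses."""
    best: Optional[float] = None
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.box_min, node.box_max):
            continue
        for tri in node.triangles:
            t = ray_hits_triangle(origin, direction, tri)
            if t is not None and (best is None or t < best):
                best = t
        stack.extend(node.children)
    return best


# Tiny scene: one triangle in front of the camera, one far off to the side.
near = Node((-1, -1, 5), (1, 1, 5), triangles=[((-1, -1, 5), (1, -1, 5), (0, 1, 5))])
far = Node((10, -1, 3), (12, 1, 3), triangles=[((10, -1, 3), (12, -1, 3), (11, 1, 3))])
scene = Node((-1, -1, 3), (12, 1, 5), children=[near, far])

print(closest_hit(scene, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

The point of the hierarchy is that the ray never tests the triangle inside `far`: its bounding box is rejected first, which is the saving that BVH traversal buys and the reason the traversal step itself matters so much for performance.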

Why has AMD used this design in the Radeon RX 6000? I think it is because it was the most effective in terms of cost and chip area. We must not forget that RDNA 2 was designed to be the central pillar of the new generation of consoles, which use APUs, a solution where chip area is not only very limited but also shared between the CPU and the GPU.

Spending a lot of chip area on specialized ray tracing hardware was not a viable option, especially after doubling the maximum number of shaders and deciding to use Infinity Cache to improve bandwidth without resorting to buses wider than 256 bits or memory faster than 16 Gbps. Infinity Cache takes up a lot of chip area, but its presence is justified, not only for the reasons above but also because, used well, it can help improve ray tracing performance, since certain workloads depend very little on capacity and enormously on bandwidth.
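
As a rough guide to what the bus width and memory speed mentioned above mean in practice, peak memory bandwidth can be estimated as the bus width in bytes multiplied by the per-pin data rate. The sketch below assumes the memory figure in the text refers to an effective data rate of 16 Gbps per pin; the function name is illustrative, and the 384-bit configuration is a hypothetical comparison point, not a product described in the article.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer times the per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


# Configuration described in the text: 256-bit bus, 16 Gbps memory (assumed per pin).
print(memory_bandwidth_gb_s(256, 16.0))

# Hypothetical wider bus at the same memory speed, for comparison only.
print(memory_bandwidth_gb_s(384, 16.0))
```

Staying at 256 bits caps the raw bandwidth of the external memory, and that is the gap a large on-chip cache is meant to compensate for.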

NVIDIA: Ampere Consolidated and Strengthened the Foundations of Turing

NVIDIA’s approach is totally different from AMD’s. The green giant integrated RT cores as dedicated hardware that fully offloads ray tracing tasks from the shaders. This means that each RT core computes BVH traversal, the ray-triangle intersections, the ray-bounding-box intersections and the collision system. The RT cores in Ampere (RTX 30) also calculate the interpolation of each triangle’s position over time.

Each SM has 64 shaders, 4 texture units and one RT core on Turing, and 128 shaders, 4 texture units and one RT core on Ampere. These cores do not share resources with the texture units and can work fully independently and asynchronously, so when the SM casts a ray, the RT cores take care of the whole hit-or-miss testing process as well as the collisions. Because this work is asynchronous, the scheduler can run the ray tracing work, the compute and graphics workloads and, where applicable, the tensor core work simultaneously.
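
A toy model helps show why running these workloads at the same time matters. The Python sketch below compares a frame where the stages run one after another with an idealized frame where they overlap completely. The stage durations are hypothetical numbers chosen only for illustration, they do not come from the article or from any measurement, and real workloads have dependencies that prevent perfect overlap, so the second figure is a best-case bound rather than a prediction.

```python
from typing import Dict


def serial_frame_time_ms(stage_ms: Dict[str, float]) -> float:
    """Frame time when every stage waits for the previous one to finish."""
    return sum(stage_ms.values())


def overlapped_frame_time_ms(stage_ms: Dict[str, float]) -> float:
    """Idealized frame time when all stages run fully in parallel."""
    return max(stage_ms.values())


# Hypothetical per-frame stage durations in milliseconds (illustrative only).
stages = {"shading": 6.0, "ray tracing": 5.0, "tensor work": 2.0}

print(serial_frame_time_ms(stages))      # stages executed back to back
print(overlapped_frame_time_ms(stages))  # best case with full overlap
```

In this model the overlapped frame is bounded by its slowest stage, whereas hardware that cannot overlap its ray tracing work pays for every stage in sequence.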

On the Ampere architecture, rendering a ray-traced frame in software using only the shaders takes 37 milliseconds. With the RT cores the time drops to 11 milliseconds, and if we also use the tensor cores it falls to 6.7 milliseconds. These are truly impressive figures that confirm that NVIDIA has managed to “tame” ray tracing with Ampere, although I believe that the most interesting part is yet to come, and that with the RTX 40 we will see a much bigger leap.
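
Those Ampere figures can be turned into speedup factors and set against the 51, 20 and 12 millisecond example given earlier in the article. The Python sketch below uses only those six numbers; the function name and the dictionary labels are illustrative.

```python
def speedup(baseline_ms: float, accelerated_ms: float) -> float:
    """How many times faster the accelerated path is than the baseline."""
    return baseline_ms / accelerated_ms


# Frame times quoted in the article, in milliseconds:
# (shaders only, shaders + RT cores, shaders + RT cores + tensor cores).
figures_ms = {
    "earlier example": (51.0, 20.0, 12.0),
    "Ampere": (37.0, 11.0, 6.7),
}

for label, (shaders, with_rt, with_rt_tensor) in figures_ms.items():
    print(f"{label}: RT cores x{speedup(shaders, with_rt):.2f}, "
          f"RT + tensor cores x{speedup(shaders, with_rt_tensor):.2f}, "
          f"{1000.0 / with_rt_tensor:.0f} fps with everything enabled")
```

By these figures the dedicated hardware makes the Ampere frame more than five times faster than the shader-only path, and the final 6.7 milliseconds sits well inside the 16.67 millisecond budget needed for 60 frames per second.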

Before finishing, I remind you that NVIDIA also uses the tensor cores to carry out a significant part of the ray tracing workload: noise reduction, one of the final and most important steps in completing the rendering of each frame. Without it, images would be full of noise and would look dirty and dull. We must not forget, in addition, that the tensor cores enable NVIDIA’s DLSS technology, an intelligent reconstruction technique that reduces the number of pixels that have to be rendered without a loss of image quality, thus lightening the burden of ray tracing.
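
To give a sense of how much work a reconstruction technique can save, the Python sketch below compares the number of pixels in a 4K output frame with two lower internal rendering resolutions. The internal resolutions are illustrative assumptions chosen for this example; the article does not state which resolutions DLSS actually renders at.

```python
from typing import Tuple

Resolution = Tuple[int, int]


def pixel_count(resolution: Resolution) -> int:
    """Total number of pixels in a frame of the given width and height."""
    width, height = resolution
    return width * height


def pixel_reduction(output: Resolution, internal: Resolution) -> float:
    """How many times fewer pixels are rendered internally than are displayed."""
    return pixel_count(output) / pixel_count(internal)


output_4k = (3840, 2160)
# Assumed internal rendering resolutions, for illustration only.
for internal in [(2560, 1440), (1920, 1080)]:
    print(f"{internal[0]}x{internal[1]} -> 4K: "
          f"{pixel_reduction(output_4k, internal):.2f}x fewer pixels rendered")
```

Since the ray tracing workload scales with the number of pixels that have to be rendered, cutting the internal pixel count by a factor of two to four lightens the load considerably.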

AMD is working on its own alternative, tentatively known as FidelityFX Super Resolution, although we still do not know what it will really be capable of and there is no confirmed release date, so we will have to wait. All in all, and seeing what NVIDIA achieved with the first generation of DLSS, it is likely that this AMD technology will need a revision before it fully matures.