New World of Graphics, DirectX 12 Performance Preview

Microsoft's Windows 10 includes an update that many people overlook but that is actually very important: support for DirectX 12. DX12 lets developers push graphics quality further on existing hardware, and the results below show just how strong it is.


About a year and a half ago, AMD began to shake up the graphics API landscape with its "Mantle" technology. Mantle can be seen as what remains once the abstract, inefficient layers of traditional high-level APIs such as DirectX 11 and OpenGL 4 are stripped away, tailored to AMD's own Radeon graphics cards. Mantle also gives developers relatively low-level access to the GPU's internals, much as on game consoles, which in some cases gives them far more flexibility than DirectX or OpenGL can.

AMD was the first in the industry to disclose its low-level API, but it will certainly not be the last. In 2014 Microsoft announced DirectX 12, followed by OpenGL and Apple's Metal. Naturally, all of them aim to improve graphics performance. After years of slow progress in graphics APIs, this feels like a renaissance, and it is undoubtedly great news for anyone who wants tighter control over performance.

On the PC, we have already seen Mantle's early performance. It can indeed improve performance to a degree, especially in how well the CPU and GPU work together. That sounds great, but Mantle is exclusive to AMD. What about NVIDIA and Intel? This time the answer comes from Microsoft's traditional cross-vendor API, DirectX: DX12 offers benefits similar to Mantle's, but through a common API that lets GPUs from many more manufacturers in the Windows ecosystem benefit.

DirectX 12 was first unveiled at GDC 2014, where Microsoft announced the new API and its goals. These were demonstrated only with very early code, and little was disclosed about how the new API actually works. Microsoft then devoted itself to integrating DX12 into Windows 10, and the latest Windows 10 preview (build 9926) finally gives us an early preview of DX12.


 

Today, with Microsoft's new API taking shape and stabilizing, and the first DX12 application written, Microsoft and its partners can finally show us what they have been holding back. For our performance test we naturally chose the first DX12 application available, Oxide Games' Star Swarm benchmark. We focus on the following questions:

Can DX12 deliver Mantle-like performance gains? Can DX12 remove the CPU bottlenecks that DX11 cannot? And how does a low-level API behave when it is exposed as a common API across mismatched hardware?

Technically, the DX12 API is only one part of a larger plan. Just as recent DX11 point releases arrived alongside new driver model versions, DX12 arrives together with the new Windows Display Driver Model 2.0 (WDDM 2.0). WDDM 2.0 is the largest update since WDDM launched with Windows Vista, which underscores that DX12 represents a leap for the GPU ecosystem on the Windows platform.


 

Microsoft has not yet released full details of WDDM 2.0, and more may come at GDC 2015. What we do know is that WDDM 2.0 is built around DX12, adding the kernel and display-driver functionality the new API requires. The DX12-related features of WDDM 2.0 include video memory management and dynamic resource allocation, neither of which can be enabled under WDDM 1.3. WDDM 2.0 also carries some of DX12's more fundamental CPU-side optimizations, such as how memory residency is managed and how DX12 acquires and controls resources.
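"Residency" means deciding which resources actually sit in video memory at a given moment. The Python sketch below illustrates the general idea with a simple least-recently-used policy under a fixed memory budget. It is a hypothetical illustration only: the class and method names are invented, and it is not how WDDM 2.0 decides residency. In the shipping D3D12 API, applications can request this explicitly through `ID3D12Device::MakeResident` and `ID3D12Device::Evict`.

```python
from collections import OrderedDict


class ResidencyManager:
    """Hypothetical sketch: keep resources 'resident' within a fixed
    video-memory budget, evicting the least recently used ones on overflow.
    Not WDDM's actual policy."""

    def __init__(self, budget_mb: int):
        self.budget_mb = budget_mb
        # Resource name -> size in MB, ordered from least to most recently used.
        self.resident: "OrderedDict[str, int]" = OrderedDict()

    def used_mb(self) -> int:
        return sum(self.resident.values())

    def use(self, name: str, size_mb: int) -> list:
        """Mark a resource as used this frame; return the names evicted to fit it."""
        if size_mb > self.budget_mb:
            raise ValueError(f"{name} ({size_mb} MB) exceeds the whole budget")
        if name in self.resident:
            # Already resident: just mark it as most recently used.
            self.resident.move_to_end(name)
            return []
        evicted = []
        while self.used_mb() + size_mb > self.budget_mb:
            victim, _ = self.resident.popitem(last=False)  # least recently used
            evicted.append(victim)
        self.resident[name] = size_mb
        return evicted
```

For example, with a 1024 MB budget, touching resources `a` (512 MB) and `b` (400 MB), then `a` again, and finally `c` (300 MB) evicts `b`, because it is the least recently used.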


 

Current test results for AMD and NVIDIA cards are shown in the figure above. In brief, the latest products from both AMD and NVIDIA fully support WDDM 2.0, but support for their older products is more limited. AMD's GCN 1.0 cards can run WDDM 2.0, but they show texture problems in Star Swarm that do not occur on GCN 1.1 and later products. On the NVIDIA side, only the newer architectures work, and Fermi is not supported (in all cases, the latest beta driver is required).


 

The operating-system situation is much simpler. Windows versions before Windows 10 do not support WDDM 2.0, because WDDM is a kernel-level component of Windows. To bring WDDM 2.0 to current or earlier versions, Microsoft would either have to cut WDDM 2.0 down functionally or rework the kernels of those older systems. When Microsoft brought the Direct3D 11.1 and WDDM 1.2 updates to Windows 7, it found that even such a small update caused compatibility problems, so it is understandable that Microsoft is keeping DX12 off older operating systems. By way of compensation, users of Windows 7, 8, and 8.1 can upgrade to Windows 10 for free: blunt but simple, provided their copy is genuine.


 

At present, the only software for testing the DX12 preview is a new version of Oxide Games' "Star Swarm" demo, first released in 2014 to showcase Oxide's Nitrous engine and Mantle. The demo depicts two fleets battling in a vast region of space, with thousands of warships and many other visual effects. Such a workload easily saturates existing high-level APIs and drags down rendering throughput, which makes it a good showcase for the new low-level APIs.


 

In short, Star Swarm is a benchmark, and the Nitrous engine will also appear in several games coming soon; its details will have to wait for GDC 2015. As a benchmark, Star Swarm computes its simulation in real time: two AI fleets fight each other, and the outcome of the battle differs from run to run. Its built-in RTS mode is nonetheless consistent enough that each run exercises the hardware as similarly as possible. In our testing, we also found the benchmark's results reliable and its overall performance stable.
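Because each Star Swarm run plays out a different battle, one common way to check whether a non-deterministic benchmark is still "stable enough" is to repeat it and look at the coefficient of variation (standard deviation divided by the mean). The Python sketch below does this. The function names and the 3% threshold are assumptions for illustration, not part of the benchmark or of our methodology.

```python
import statistics


def run_to_run_variation(fps_runs):
    """Return (mean FPS, coefficient of variation in percent) over repeated runs."""
    if len(fps_runs) < 2:
        raise ValueError("need at least two runs to measure variation")
    mean = statistics.mean(fps_runs)
    cv_percent = statistics.stdev(fps_runs) / mean * 100.0
    return mean, cv_percent


def is_stable(fps_runs, max_cv_percent=3.0):
    """Hypothetical acceptance rule: runs agree within max_cv_percent."""
    _, cv = run_to_run_variation(fps_runs)
    return cv <= max_cv_percent
```

A lower threshold makes the check stricter; the right value depends on how small the differences between the configurations being compared are.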


 

The NVIDIA cards in today's test are the GTX 980 (Maxwell 2), GTX 750 Ti (Maxwell 1), and GTX 680 (Kepler); the AMD cards are the R9 290X (GCN 1.1), R9 285 (GCN 1.2), and R9 260X (GCN 1.1). For the CPU, we used a single i7-4960X configured to roughly simulate an i7 (6 cores), an i5 (4 cores), and an i3 (2 cores). Note that we cannot change the 4960X's L3 cache size, so the simulated parts keep its full cache; we expect the impact on overall performance to be slight.

As for AMD's own processors, we did not test them, but given AMD's module-based CPU design, we expect their performance to fall between our simulated 2-core and 4-core configurations.


 

The test results are shown above, and they let us answer our first question: can DX12 deliver Mantle-like performance gains? Yes.


 

Let us continue exploring DX12. Its biggest improvement over DX11 is removing the CPU bottleneck. Under DX11, much of the work is done by a single thread, so benchmark results are limited by the CPU's single-threaded performance and cannot take full advantage of the CPU. This is exactly what DX12 sets out to fix: access to a lower-level API lets Oxide control the submission of work more directly and distribute tasks across CPU cores more sensibly.
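The DX12 model described above is usually summarized as "record on many threads, submit from one". In D3D12, each worker records its own `ID3D12GraphicsCommandList`, and the finished lists are handed to `ID3D12CommandQueue::ExecuteCommandLists`. The Python sketch below mirrors only the structure of that pattern. `record_command_list` is a hypothetical stand-in for real recording, and because of Python's GIL this code shows the shape of the work split, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(draws):
    """Hypothetical stand-in for recording one command list on a worker thread."""
    return [f"draw {d}" for d in draws]


def build_frame(draw_calls, workers=4):
    """Split a frame's draw calls across workers, record in parallel,
    then gather the lists in order for a single submission."""
    if not draw_calls:
        return []
    chunk = -(-len(draw_calls) // workers)  # ceiling division
    chunks = [draw_calls[i:i + chunk] for i in range(0, len(draw_calls), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so submission order is deterministic.
        command_lists = list(pool.map(record_command_list, chunks))
    # One submission point, analogous to ExecuteCommandLists on a single queue.
    return [cmd for cl in command_lists for cmd in cl]
```

Keeping the final submission on one thread preserves draw order, while the expensive recording step is spread across cores, which is the part DX11 largely left on a single thread.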


 

The results show that under DX12, Star Swarm's performance is not very sensitive to the number of CPU cores: in our testing, cores beyond four added essentially nothing, and the score depended only on the GPU. In other words, DX12 greatly reduces the need for processors with many cores, since even with four cores the results are excellent.

We therefore drop the six-core configuration from the following tests. DX12 is so efficient that it takes the GTX 980, the fastest card in our test, to reveal any difference between dual-core and quad-core processors. For the remaining AMD and NVIDIA cards, the difference between core counts is negligible, which shows how much more efficient DX12's batch submission is: Oxide needs only two CPU cores to handle both batch submission and the AI simulation.
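Why extra cores stop helping, and why only the fastest GPU exposes the dual-core versus quad-core gap, follows from a simple bottleneck model: a frame takes as long as the slower of the CPU side and the GPU side. The Python sketch below is a toy model under stated assumptions (an Amdahl-style serial plus evenly divisible parallel CPU workload); all function names and timings are hypothetical, not measurements from Star Swarm.

```python
def frame_time_ms(serial_cpu_ms, parallel_cpu_ms, cores, gpu_ms):
    """Toy model: the CPU runs a serial part plus a parallel part split evenly
    across `cores`; the frame takes as long as whichever side is slower."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    cpu_ms = serial_cpu_ms + parallel_cpu_ms / cores
    return max(cpu_ms, gpu_ms)


def fps(serial_cpu_ms, parallel_cpu_ms, cores, gpu_ms):
    return 1000.0 / frame_time_ms(serial_cpu_ms, parallel_cpu_ms, cores, gpu_ms)
```

With a hypothetical 2 ms serial plus 8 ms parallel CPU workload, a GPU needing 10 ms per frame yields 10 ms on both two and four cores, so core count looks irrelevant. A faster GPU needing only 3 ms exposes the CPU side: 6 ms on two cores versus 4 ms on four.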


 

Speaking of batch submission, let us look at Star Swarm's own statistics. The figures are striking, especially for the AMD cards: batch submission time drops from tens or even hundreds of milliseconds to just 3-5 milliseconds, an improvement of an order of magnitude or more, which all but eliminates the CPU time spent on batch submission. In just a few milliseconds, more than 120,000 draw calls can be submitted. This optimization directly drives much of DX12's performance gain, and it is very good news for upcoming games.
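To put these submission times in perspective, a simple calculation helps: if batch submission alone took `t` milliseconds per frame, the CPU could never sustain more than `1000 / t` frames per second, regardless of GPU speed. Dividing the submission time by the number of draw calls gives the average CPU cost per draw. The Python helpers below are illustrative; the 4 ms figure used in the note is an assumed midpoint of the article's 3-5 ms range.

```python
def max_fps_from_submission(submit_ms):
    """Upper bound on frame rate if batch submission alone took submit_ms
    per frame (ignores all other CPU and GPU work)."""
    return 1000.0 / submit_ms


def ns_per_draw(draw_calls, submit_ms):
    """Average CPU cost per draw call, in nanoseconds."""
    return submit_ms * 1_000_000 / draw_calls
```

At 100 ms of submission per frame, the ceiling is 10 fps; at 4 ms it rises to 250 fps. Submitting 120,000 draw calls in about 4 ms works out to roughly 33 ns of CPU time per draw.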


R9 290X @ DX11: CPU usage.

R9 290X @ DX12: CPU usage.

GTX 980 @ DX11: CPU usage.

GTX 980 @ DX12: CPU usage.

We can also look at CPU usage as recorded by the operating system itself. Under DX11, with both the GTX 980 and the R9 290X, CPU usage is very uneven: most of the time, one or two cores carry most of the load. Under DX12, the work is spread evenly across all four cores.
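One simple way to put a number on how uneven per-core usage is: divide the busiest core's load by the average load across all cores. A value of 1.0 means perfectly even; larger values mean the work is piling up on a few cores, as under DX11. The Python sketch below assumes per-core utilization percentages are already available (for example from Task Manager, or from `psutil.cpu_percent(interval=1, percpu=True)` if the third-party psutil package is installed); the metric itself is our illustrative choice, not something the benchmark reports.

```python
def load_imbalance(per_core_percent):
    """Ratio of the busiest core's load to the average load across cores.
    1.0 = perfectly even; higher = work concentrated on a few cores."""
    if not per_core_percent:
        raise ValueError("need at least one core")
    mean = sum(per_core_percent) / len(per_core_percent)
    if mean == 0:
        return 1.0  # an idle system counts as evenly loaded
    return max(per_core_percent) / mean
```

For hypothetical readings of 90%, 20%, 10%, and 0% on four cores, the ratio is 3.0; four cores at 50% each give 1.0.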

In other words, the single-threaded performance of many CPUs can no longer keep pace with GPUs. With this "CPU bottleneck" emerging, developers must tap the CPU's multithreaded performance. DX12 is a good solution, allowing developers to optimize their applications for multicore processors.


Next, we look at GPU performance with the CPU configuration held fixed. Because Star Swarm still lets the GPU run at full utilization, the results here ultimately come down to the GPU itself.


In Star Swarm, NVIDIA cards currently perform well across the board. Even allowing for the real performance gap between the cards, NVIDIA's results are unexpectedly strong, and its cards comfortably take the top spots among those we tested. The GTX 980 is more than 50% faster than the R9 290X, and the GTX 680 holds a 25% lead over the R9 285. These results are for reference only; everything may change before DX12's official release.

It is also worth pointing out that, because AMD cards perform poorly under DX11, AMD users stand to benefit most from DX12. The GTX 980's performance under DX12 rose by 150% over DX11, while the R9 290X's rose by a staggering 416%. Mantle is not discussed here, since NVIDIA cards do not support it.
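Percentage increases this large are easy to misread, so it helps to convert them into multiples. "Increased by X%" means the new result is (1 + X/100) times the old one. A 150% gain therefore means DX12 delivers 2.5 times the GTX 980's DX11 frame rate, and a 416% gain means the R9 290X runs at about 5.2 times its DX11 frame rate. The small Python helpers below just encode that arithmetic; the function names are our own.

```python
def percent_increase(old, new):
    """Percentage gain from old to new (e.g. 10 -> 25 is a 150% increase)."""
    return (new - old) / old * 100.0


def speedup_factor(percent):
    """Convert 'increased by percent%' into 'N times the original'."""
    return 1.0 + percent / 100.0
```

By the same arithmetic, the GTX 750 Ti's 26% gain corresponds to 1.26 times its DX11 result.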

In these comparisons, the GTX 750 Ti is the one that gives us pause. Its performance also improved, but a gain of only 26% looks small next to the multi-fold gains of the other cards.


The earlier results also make it clear that, apart from the GTX 980, every card needs only a dual-core CPU to run Star Swarm. In other words, when building a machine in the future, as long as it supports DX12, you can shift more of your budget to the GPU. Of course, if you plan to run liquid-nitrogen-cooled dual Titans, you had better buy a top-end CPU as well; after all, you clearly have the money.


Judging from these results, DX12's future looks bright. Its detailed features and developer support have not yet been announced, and more information will have to wait until GDC 2015. Strictly speaking, then, today's evaluation is only a "preview" of the performance DX12 can deliver.

We are very, very satisfied with the improvements DX12 brings, but open questions remain. Now that DX12 exposes these lower-level APIs, how will hardware vendors and developers use them, and with what results? We don't know yet. But judging from this preview, large gains in CPU efficiency and multithreaded performance are clearly achievable.

For developers, DX12's potential is beyond doubt, but its greater development difficulty also means the transition from DX11 to DX12 will be neither simple nor fast. How long will it take? We don't know.

 



RC
edit