About the methodology of CPU tests in the gaming sector...

What about testing the gaming performance of CPUs? Which approaches can be taken, and which are right or wrong? Should CPU tests really be done at realistic resolutions? What are the arguments for tests at low resolution, and what are the arguments against them?

A test needs valid conditions...

If a certain property of a product is to be measured, one would assume that exactly this property is analyzed and nothing else. If the noise level of a fan is to be measured, this should be done without background noise in a sound-insulated acoustic room. Likewise, a graphics card under test should not be slowed down by a wrong PCIe configuration. There are further examples. In the end, it is always a matter of creating the best possible conditions and eliminating disturbing influences so that the test object can be viewed in isolation. These are the prerequisites for a valid test. That sounds logical so far, doesn't it?

Rankings

In order to compare a set of products of the same type with each other, certain characteristics are mapped to numbers. An ordered list can then be created. In other words, there is a winner who leads the list. You can also look at it from a different perspective: different companies compete against each other, and the test is a kind of competition. If you as a tester do not create a fair environment, you basically deprive the better product of its (deserved) victory. So fair play is the order of the day.

The thing about the resolution

What does the resolution have to do with the CPU? Nothing, really. The resolution is a matter for the GPU, which is where the 3D scene is rasterized. You often hear the theory that the level of detail is reduced or increased depending on the resolution. To this day I have not come across a concrete example that would prove this. In practice, the details and therefore the draw calls are independent of the resolution. If someone knows a counter-example, please contact me.

Which settings are correct?

Sometimes it is not enough to simply set the resolution to 720p. Modern CPUs in combination with low-level APIs can load even high-end cards almost completely despite this low resolution. A reasonable approach is to maximize all settings first and choose 720p as the resolution. If a high GPU load is nevertheless observed, render scaling, anti-aliasing, anisotropic filtering and ambient occlusion should be minimized. Post-processing can generally be switched off. If there is uncertainty about the remaining settings, a tool like Nsight from Nvidia can be used to analyze the draw calls.
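
The check for "a high GPU load" can be made a little more systematic. The following is only a minimal sketch: the function name `looks_gpu_limited`, the 95% threshold and the sample values are assumptions for illustration, not something taken from a specific tool. It expects GPU load samples (in percent) that were logged during a benchmark run and flags the run if the median load is at or above the threshold.

```python
from statistics import median


def looks_gpu_limited(gpu_load_percent, threshold=95.0):
    """Return True if the median GPU load is at or above the threshold.

    The 95% default is an assumed rule of thumb, not a fixed standard.
    """
    if not gpu_load_percent:
        raise ValueError("need at least one GPU load sample")
    return median(gpu_load_percent) >= threshold


# Hypothetical GPU load samples (percent) from two benchmark runs.
run_720p_ultra = [97, 98, 99, 96, 98]      # all settings maxed at 720p
run_720p_reduced = [71, 68, 75, 70, 66]    # GPU-heavy settings minimized

print(looks_gpu_limited(run_720p_ultra))    # True
print(looks_gpu_limited(run_720p_reduced))  # False
```

If the first run is flagged, the GPU-heavy settings listed above are the candidates to reduce before the result is used for a CPU ranking.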

Test practice

Let us get to the heart of the discussion. If rankings are to be created in such a way that all disturbing influences are removed, and the resolution is solely a matter for the GPU, why do testers often choose a resolution that slows down faster CPUs? In many cases 1080p leads to a GPU limit. Stronger CPUs are clearly disadvantaged because the results at the top move closer together. Let's take a critical look at this: the fundamental principle that the test object must be considered in isolation within a fair test is often ignored entirely. How is that possible?
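
How a GPU limit pulls the top of a ranking together can be illustrated with a deliberately simplified model in which the measured frame rate is the minimum of what the CPU and the GPU can each deliver. The CPU names, frame rates and GPU caps below are made up for illustration and are not measurements.

```python
def measured_fps(cpu_fps, gpu_cap_fps):
    """Simplified model: the slower of the two components sets the frame rate."""
    return min(cpu_fps, gpu_cap_fps)


def lead_of_fastest(results):
    """Relative lead of the fastest entry over the slowest, in percent."""
    return (max(results.values()) / min(results.values()) - 1.0) * 100.0


# Hypothetical CPU-limited frame rates for three CPUs.
cpu_fps = {"CPU A": 180.0, "CPU B": 150.0, "CPU C": 110.0}

# Low-res test: the assumed GPU cap is far above every CPU.
low_res = {name: measured_fps(fps, gpu_cap_fps=1000.0) for name, fps in cpu_fps.items()}

# Higher resolution: the GPU is assumed to cap the frame rate at 120 fps.
gpu_bound = {name: measured_fps(fps, gpu_cap_fps=120.0) for name, fps in cpu_fps.items()}

print(round(lead_of_fastest(low_res), 1))    # 63.6
print(round(lead_of_fastest(gpu_bound), 1))  # 9.1
```

In this toy example CPU A and CPU B become indistinguishable once the cap applies, although they differ by 20% when viewed in isolation.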

"Nobody plays at 720p..."

A common argument heard in discussions on this topic is: "Nobody plays at 720p!" or "That's completely unrealistic!" You could easily counter this by saying that you cannot do real CPU testing under the influence of the GPU. This is impossible in principle, because in that case it is not a valid CPU test. Discussion finished? No, it's just getting started. I have been thinking for a while about how I could summarize my experience with such discussions in a compact way. How about a pro-con list? Good idea, let's go.

Contra: Nobody plays at 720p!
Pro: If the current Steam statistics are anything to go by, 1080p is indeed one of the most commonly used resolutions in practice. For notebooks and APUs, however, 720p is still relevant. In addition, upscaling technologies like DLSS render at an internal resolution below FHD.

Contra: That is completely unrealistic!
Pro: CPU tests do not have to be realistic at all. They are about a ranking that should reflect relative performance. To establish the link to practice, the results of the CPU test should be combined with those of the GPU test. GPU tests are usually performed in 1080p, 1440p and 2160p, so it is known what the graphics card can deliver at the respective resolution. The CPU tests at low resolution then show whether the CPU can deliver this as well. It is therefore important that a very strong, preferably even overclocked CPU with fast memory is used for the GPU tests. Just like the CPU, the GPU should not be slowed down. This goes without saying.

Contra: It must be enough if the fastest graphics card is combined with the lowest resolution used in practice. At the time of writing this is an RTX 2080 Ti and FHD resolution.
Pro: Firstly, even the fastest graphics card available on the market runs into GPU limits in 1080p. Secondly, such a test does not show how much headroom the CPU has for stronger graphics cards or, in particular, for other settings.

Contra: Game performance of future hardware and games cannot be predicted.
Pro: I am always amazed at this argument against low-res tests. You just have to look at the current games. These will still be played in the future, hopefully. If you upgrade your graphics card or just adjust the GPU-heavy settings, you know if the CPU can handle it or not. This is a simple, but very important principle of low-res tests. Furthermore, the gaming performance of current CPUs with the same core count and the same memory speed can very well be carried over to the future, if you don't overdo it. Development happens with a certain inertia. The workloads do not differ completely within a period of one year, for example. Exceptions prove the rule.

Contra: GPU limiting behaves in a non-linear way, which means that a faster CPU can stand out from a weaker CPU even within the GPU limit.
Pro: That is correct. This effect actually exists. But it is so small (usually about 3%) that it can be neglected. If you combine the CPU tests with the GPU tests, it is therefore advisable to include a certain tolerance. Ultimately, this is not an argument for doing without low-res tests altogether.

Contra: Nobody needs CPU tests, bottleneck tests are completely sufficient.
Pro: You lose information through GPU limiting when you do bottleneck testing. You can obtain the same information by matching CPU tests against GPU tests. If you don't have GPU tests to compare against, you can of course do bottleneck tests. But you have to be aware that these are not valid CPU tests.

Contra: 1080p tests with medium settings are as good as 720p tests with ultra settings.
Pro: The CPU has nothing to do with the resolution. However, there are settings that can affect the CPU, e.g. the level of detail, so this cannot be equivalent. Medium settings also put less load on the CPU in the end, so you can't estimate the performance well. In my opinion you should distinguish between a game test and a ranking. Within a game test it can and should be examined how the different settings affect each other. Sometimes an astonishing optimization potential is revealed without having to significantly forego visual effects. A ranking serves rather to stress the hardware through worst case scenes and settings, so that architectural advantages can be better distinguished if necessary.

Contra: Nobody cares about low-res tests.
Pro: This Twitter poll says otherwise. Also, real CPU tests reveal which CPU is really the strongest. Low-res tests can be used to determine architectural differences and show the actual effects of different settings and overclocking. No one should care about all this? Hard to imagine.

Contra: You do not need low-res tests. After all, there are application tests to measure CPU performance.
Pro: The workloads of normal applications are different from those of games. Applications tend to be compute- and bandwidth-intensive, whereas games tend to be latency-sensitive because they often process small data packets. This is not a general rule but a tendency, because there are exceptions.
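
The matching of CPU tests against GPU tests mentioned in the list above, including the suggested tolerance, can be sketched roughly as follows. This is an illustration under stated assumptions: the function names `limiting_component` and `cpu_headroom_percent` are hypothetical, the frame rates are invented, and the 3% default merely mirrors the "usually about 3%" figure quoted above.

```python
def limiting_component(cpu_fps, gpu_fps, tolerance=0.03):
    """Classify a CPU/GPU pairing from separately measured frame rates.

    cpu_fps:   result of the low-res CPU test in a given game
    gpu_fps:   result of the GPU test at the target resolution
    tolerance: relative band in which both results count as equal
    """
    if cpu_fps <= 0 or gpu_fps <= 0:
        raise ValueError("frame rates must be positive")
    if abs(cpu_fps - gpu_fps) <= tolerance * max(cpu_fps, gpu_fps):
        return "balanced"
    return "cpu" if cpu_fps < gpu_fps else "gpu"


def cpu_headroom_percent(cpu_fps, gpu_fps):
    """How much faster the CPU is than the GPU result, in percent."""
    return (cpu_fps / gpu_fps - 1.0) * 100.0


# Hypothetical results for one game; not measurements.
print(limiting_component(140.0, 95.0))    # gpu
print(limiting_component(90.0, 120.0))    # cpu
print(limiting_component(100.0, 102.0))   # balanced
print(round(cpu_headroom_percent(140.0, 95.0), 1))  # 47.4
```

A positive headroom value indicates how much faster a future graphics card, or less GPU-heavy settings, could become before this CPU turns into the limit. That information is exactly what a test inside the GPU limit cannot provide.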

Conclusion

Tests have to look at the test object in isolation, and disturbing influences have to be eliminated as well as possible. This is exactly what low-res tests offer. They are a basic prerequisite for a fair ranking. Sometimes additional GPU-heavy settings have to be adjusted to eliminate GPU limitations. In the end, the information content with regard to the CPU is always higher, and the link to practice can be made via the GPU tests. So what else is there to say against it?

I sometimes get the impression that testers, perhaps even against their own opinion, are adapting to the general demand for "realistic" CPU tests. I think this is not the right way. One should rather try to explain the low-res approach in order to clear up possible misunderstandings.
