2023-03-21 Render & Cycles Meeting

To add a bit to this conversation, an important matter to take into account is power consumption. In my experience, the RTX 3090 and RTX 4090 don’t reach 100% of their power consumption limits when they are rendering, even if GPU usage goes to 100% (by the way, the 3090 reaches 100% usage but the 4090 doesn’t). In games with ray tracing, for example, power consumption reaches much higher levels, so I assume there are parts of the GPU that are not used when rendering.

So in the end, don’t use real-time or synthetic benchmarks to evaluate a GPU; the Blender Benchmark is a way better system to evaluate it :slight_smile:

I am a little sad that nothing further is known about HIP RT and improved viewport rendering on Linux. I’m especially waiting for the viewport to stop crashing. From what I read, it seems there is no chance of improvement in version 3.5, which is a bit of a pity because I bought a Radeon, encouraged by assurances about the introduction of HIP RT and overall performance improvements.


And that’s the problem. BOD is basically feeding people misleading data, and people draw wrong conclusions based on that data. Garbage in, garbage out. You have to either update the definition of the score or change the score itself so that it represents what it should: the actual number of samples a given CPU/GPU can compute per minute in Blender. In my case that number should be around 219 instead of the 704 that the site shows.

The wording on the site can be tweaked. It’s meant to be an abstract score, with a bit of background information about what it is measuring for those who want to dig deeper. The benchmark page doesn’t say which resolution it’s rendering at; if you assume a smaller resolution, it’s accurate.

The Linux crash is a driver bug, so whenever that is fixed, 3.5 would most likely stop crashing too. It’s out of the Blender developers’ hands.

For HIP-RT or any other development, we don’t give any assurances; we just openly communicate what we are working on or planning. Nothing in these meeting notes should be taken as a promise. In general I would not recommend buying any hardware based on speculation about future improvements; all we can be sure about is benchmarks as they are now.


I very much agree. Even beyond normal users, power consumption per sample is a primary concern when considering which GPU to use for setting up a render farm. This is why workstation cards are generally high-VRAM and low-TDP, as I understand it: you can throw half a dozen of them on a giant motherboard and get away with only a 1.5 kW power supply and lower electricity costs per sample. Half a dozen proper 4090s, by contrast, would run over 2 kW at peak, and the 4090 is relatively inefficient per sample if you run it at full power, if these graphs are anything to go by.
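To make the "per sample" framing concrete, here is a minimal sketch of that kind of comparison; the card names and figures below are made-up placeholders, not measured values:

```python
# Rough render-farm efficiency comparison: samples per minute per watt.
# All numbers below are placeholders for illustration, not measurements.
cards = {
    # name: (benchmark samples per minute, watts drawn while rendering)
    "workstation card, low TDP": (3000, 230),
    "RTX 4090, full power limit": (5000, 450),
    "RTX 4090, power limited": (4500, 300),
}

for name, (spm, watts) in cards.items():
    print(f"{name}: {spm / watts:.1f} samples/min per watt")
```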

I am a proponent of allowing individual scene scores to be viewed, but I also agree with brecht here that it’s merely an issue of semantics and wording that could be made a little clearer. I personally don’t think it matters whether the values are added up or kept separate, so long as you get a good sample size and can compare hardware on a 1:1 scale, which is accurate because even the order in which sub-pixels are sampled is always the same, and the score calculation is always the same. The data is accurate and can be used, but the bottom line, as 3D artists understand, is that every scene and every artist’s work is different, and every system will react to every scene a little differently (there are even CPUs and GPUs out there which are better silicon with a DOA core disabled, sold as a lower-tier product, like my 1070 that should have been a 1080 but began life as a top-scoring 1070 before I ran it too hot for 4 years, oops). So samples per minute is inherently a very soft value, as shown by the heavy benchmark scene being slower than the simple ones. Therefore we add up the results of a number of scenes and take a median result for each piece of hardware across nearly every system it will fit in.

While I highly appreciate that the benchmark was updated, I still think it can be—and should be—so much more.
Hardware-software interaction is complex. A benchmark demystifies component selection for those who are not interested in hardware but still want to make informed decisions.
But it also schools the “know-it-alls” by showing that real life isn’t always 1:1 with what theory, or spec numbers, might suggest. Hardware quirks, software quirks, OS quirks, driver quirks… the list goes on.

A benchmark should remove all requirements of insight and knowledge (real or assumed) and present a single number (minimum requirement) of expected performance for a specific configuration. Detailed breakdowns are appreciated.

If I have a Mac Pro with two Duo modules, it can’t be that the benchmark arrives at a score and thinks to itself (in the background), “oh, I see you have 4 GPUs of the same type, so I’m just going to divide the performance score by 4 and hand you that”.

I’m not sure what the benchmark is doing now, but I find it highly curious that a Pro Vega II and a Pro Vega II Duo score “the same”. Surely, if we only get ONE number it has to be the total performance of a configuration? I don’t want my Mac Pro with two Pro Vega II and my MacBook Pro M1 Max to score the same, while my Mac Pro is twice as fast in Blender.

I would like Blender’s benchmark to reach Geekbench levels of popularity. My wish would be that it launches with a wonderful splash screen showcasing what Blender is about. Blender is for everyone and it’s free, so it should be one of the first benchmarks testers (mainstream tech and YouTube) look for when evaluating computer performance. It should be front, center, and brand awareness building. :heart:

Geekbench tests generic instruction sets, while Cycles has CUDA, HIP, oneAPI, and Metal backends; I think of it as a benchmark of how well generic compute is optimized for each.
The HIP vs CUDA performance ratio is actually not as strong as the 2.93 OpenCL vs CUDA ratio was: in the 2.93 classroom scene a 6900 XT surpassed a 3090 on CUDA, which HIP cannot do.

Nvidia GPUs typically run in a reduced power state (P2) when running CUDA applications. So the power draw of a 4090 while running Cycles in CUDA or OptiX is actually closer to 300 watts in my testing rather than the 450 watt normal power limit, or the 600 watt overclocked power limit.
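For anyone who wants to check this on their own machine, here is a minimal sketch that samples the P-state and power draw during a render, assuming the NVIDIA driver’s nvidia-smi tool is on the PATH:

```python
# Watch GPU P-state, power draw, and utilization while Cycles renders.
import subprocess
import time

for _ in range(30):  # sample once per second for 30 seconds
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,pstate,power.draw,utilization.gpu",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())
    time.sleep(1)
```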


I can confirm this; the 3090 is very efficient when rendering, and so is the 4090. Right now I have 5 computers rendering an 8K image at full power, and the total power consumption, including everything else that is connected, is around 1.6 kW. This includes:

1x2080Ti
2x3090
2x4090

So in the end I think this is considerably efficient :slight_smile:

(On the other hand, I saw a single-3090 system drawing around 700 W while running the game “Cyberpunk”.)

Seems like you’ve missed my point entirely. Once again: The website states that the benchmark score is “the amount of samples per minute that a CPU or GPU can compute”, but the actual value shown is NOT “samples per minute”. It is a combined ‘spm’ result of all three tests which the website neglected to mention. That is the misleading part.

No, we understand, and I agree with brecht that saying
“each benchmark will record your score in samples per minute, then the total of all scores will be your final score”
on the site, or changing the website’s score calculation in a way that we can with our current data (which does include spm), is a better solution than throwing out the current 5 years of benchmark data and changing the benchmark.

But on the other hand, SPM is not a very good measure of a system OR of a benchmark scene on its own; it’s a good measure of BOTH as a whole, but of NEITHER individually.

So my end answer is: “allow the user to view separate SPM results without needing to parse the raw data themselves.”
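To illustrate the distinction being argued about, here is a small sketch of how a summed score relates to the per-scene samples-per-minute figures; the scene names and numbers are placeholders, not real results:

```python
# Hypothetical per-scene results in samples per minute (made-up numbers).
scene_spm = {
    "monster":   260.0,
    "junkshop":  180.0,
    "classroom": 264.0,
}

total_score = sum(scene_spm.values())       # the single combined number shown on the site
average_spm = total_score / len(scene_spm)  # one possible per-scene "samples per minute"

print(f"combined score: {total_score:.0f}")
print(f"average spm:    {average_spm:.0f}")
for scene, spm in scene_spm.items():
    print(f"  {scene}: {spm:.0f} spm")
```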

I understand these arguments and I have no complaints about the work of the Blender developers. I also have a machine with an Nvidia graphics card, and a Radeon I bought for testing and just to check the progress. Besides, I have always cheered for those who are chasing the leader. It seemed to me that AMD is a serious company that treats its customers seriously, and when it announces something, it takes those announcements seriously. As you can see, Apple has recently realized that support for Blender is an important thing, but their hardware is dedicated primarily to creative work. I hope AMD will wake up and notice that the graphics card market is not just about games. If they opened up the source code of their drivers completely, I think progress would be decisive and ultimately AMD would gain a lot of money from it. I suspect Intel will be the first to finalize its support for ray tracing, and they have open source drivers.


I think AMD did not have to give up OpenCL so soon. Intel’s oneAPI implementation is quite efficient relative to the hardware’s floating-point performance, and the 2.93 OpenCL backend was a textbook-level optimization.


In 3.0, CUDA rendering speed increased by 2x-7x, or at least 2x-5x, but HIP is only 1.5x-3x faster than OpenCL was. If OpenCL had been kept, wouldn’t it also have come close to a 2x-5x increase?

They don’t have to change the benchmark or throw out the data. They can either switch to the actual median spm (many reviewers use it), or they can simply update the description, mention that the score is a combined result, and note that if you want an average spm you have to divide the score by 3.

A median score will further obfuscate the data in the more extreme cases where the same card handles one scene much better than another.

Adding individual scene scores alongside the overall score would solve all the issues, in my opinion.

I don’t see a problem here. Quote: «In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample. For a data set, it may be thought of as “the middle” value. The basic feature of the median in describing data compared to the mean (often simply described as the “average”) is that it is not skewed by a small proportion of extremely large or small values, and therefore provides a better representation of the center.»

It would be nice.

The problem is that a median of 3 data points is, in general, far less accurate than an average. Here, I’ll RNG(0-100) some numbers for us: two sets of 3 and two sets of 11, where 50 is the true answer.

12,64,89 (median = 64, average = 55)

8,78,88 (median = 78, average = 58)

88,13,7,67,66,47,0,60,8,79,45 (median = 47, average = 43.6)

64,58,77,30,99,41,29,12,20,8,73 (median = 41, average = 46.5)

As you can see, for small data sets the median performs poorly, and can easily fail below about 10 samples. It is very weak for small or “clumpy” datasets, but very good for diffuse, high-sample ones. Now say we have all the numbers above for a test expected to take 50 or so seconds, but one user accidentally had a higher-priority render running in the background so the test took an hour, or had a broken GPU that was severely undervolted.
64,58,77,30,99,41,29,12,20,8,73,88,13,7,67,66,47,0,60,8,79,45,8,78,88,12,64,89,3544 (median = 58, average = 168)
Notice how much that one bad data sample out of 29 did to the dataset. It would lead people to decide that the card is about 30% as fast as it really is.
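For anyone who wants to verify these figures, a quick sketch using Python’s standard statistics module reproduces the medians and means quoted above:

```python
from statistics import mean, median

# The four RNG(0-100) sets from above, where 50 is the "true" answer.
small_a = [12, 64, 89]
small_b = [8, 78, 88]
large_a = [88, 13, 7, 67, 66, 47, 0, 60, 8, 79, 45]
large_b = [64, 58, 77, 30, 99, 41, 29, 12, 20, 8, 73]

for data in (small_a, small_b, large_a, large_b):
    print(sorted(data), "median:", median(data), "mean:", round(mean(data), 1))

# All 28 samples plus the one bad run (the render that took ~an hour instead of ~50 s).
combined = small_a + small_b + large_a + large_b + [3544]
print("combined median:", median(combined), "combined mean:", round(mean(combined), 1))
# The median stays at 58 while the mean jumps to about 168.
```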

And that’s why we don’t use a median when calculating the cumulative score for only 3 tests. An average would be better, but it still would not be indicative of a card that, say, handles raster better, or volumes better.
So add them up.

@JohnDow so, TL;DR, we’ve been agreeing with you this whole time that we should make the individual scores more easily available (you can already see, browse, and parse them), but disagreeing that samples per minute means anything concrete at all. After all, my $400 Lenovo 2011 laptop outperforms my 2017 workstation in erosion simulation, and only erosion simulation.

The scores, despite being a concrete metric, don’t really mean anything beyond a value with which to compare the results of two sets of uncontrolled variables across a small range of common use cases. They would only mean something more if you controlled the variables and ran the benchmark twice on the same system after changing a single component…

And it seems you’ve seen those exact kinds of tests. Those are not the kinds of tests Open Data will give you; you need the hardware, or a coordinated effort, for that.

Try light linking now: Blender Builds - blender.org

I like the implementation so far. Looking forward to it getting into master.

Since light linking is getting developed, does that mean we’re also going to get light-specific controls like in Eevee?