2023-03-21 Render & Cycles Meeting

Attendees

  • Brecht Van Lommel (Blender)
  • Weizhen Huang (Blender)
  • Thomas Dinges (Blender)
  • Patrick Mours (NVIDIA)
  • William Leeson
  • Michael Jones (Apple)
  • Brian Savery (AMD)
  • Colin Marmond

Notes

  • The Blender 3.5 release is near. One last-minute issue was found in OpenImageIO, where multithreaded Cycles image texture loading would crash. This is still planned to be fixed.
  • Sergey and Brecht started working on light linking support in Cycles, because this is a widely requested feature and to test it on an upcoming open movie with the Blender Studio. There's a basic, inefficient version working. The main challenges to resolve are a good user interface on the Blender side, and making it work efficiently with the light tree.
  • Weizhen worked on speeding up the light tree building with multithreading, to help reduce wait times when starting a render or changing the scene geometry. On a test scene with heavy geometry, the main branch is now 11x faster on a 20-core machine.
  • Based on a request from the Blender Studio, Weizhen worked on improving some cases with bump and normal mapping where Eevee gave better results than Cycles. The main change is that diffuse and specular BSDFs now use different methods: darkening for diffuse BSDFs, and keeping the reflected vector above the surface for glossy BSDFs (see the sketch after this list).
  • William submitted a pull request for faster geometry updates and Brecht reviewed it. There was some discussion about the implementation details and how best to structure things. In particular, multi-device rendering would ideally be abstracted from the host side as much as possible. Details will be in the pull request review.
  • Brian reports that AMD is testing a fix for the compiler issue that is keeping the HIP kernels disabled for now. This has to go through some validation, but the expectation is that there will be something for Cycles developers to test in a few weeks.
  • The AMD HIP-RT patch was updated, and Brecht reviewed the latest state. The two questions were around some hardcoded values for per-thread memory (which may be OK for now while a better solution is investigated in the HIP-RT library), and some code that could move from the Cycles kernel to hipew, since it's about the HIP API (this is being looked at).
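
For the glossy case above, the rough idea is to bend the shading normal just enough that reflecting the view direction about it stays above the real surface. A minimal Python/NumPy sketch of that general technique (not the actual Cycles implementation; degenerate grazing cases are ignored):

    import numpy as np

    def reflect(v, n):
        # Mirror a unit direction v about a unit normal n.
        return 2.0 * np.dot(v, n) * n - v

    def ensure_valid_reflection(Ng, V, N, eps=1e-4):
        # Ng: geometric normal, V: unit direction toward the viewer,
        # N: shading normal after bump/normal mapping (all unit length).
        R = reflect(V, N)
        if np.dot(R, Ng) >= eps:
            return N  # reflected ray already leaves the surface
        # Lift R just above the tangent plane of the real surface.
        R = R - np.dot(R, Ng) * Ng
        R = R / np.linalg.norm(R) + eps * Ng
        R = R / np.linalg.norm(R)
        # The normal that reflects V into R is their half-vector.
        H = V + R
        return H / np.linalg.norm(H)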

Other News

Practical Info

This is a weekly video chat meeting for planning and discussion of Blender rendering development. Any contributor (developer, UI/UX designer, writer, …) working on rendering in Blender is welcome to join and add proposed items to the agenda.

For users and other interested parties, we ask that you read the meeting notes instead, so that the meeting can remain focused.

I don't believe my eyes. Is this true?!

Yes! blender/blender - blender - Blender Projects
The dream is coming true.

Would this mean converting worlds into real objects? Since the world is a light source, it should be "linkable" too. And I think this leads to being able to use more than one of them at the same time (e.g. one for characters and another for the set, or a general one and a second one slightly tweaked specifically for hair speculars, shadows, or whatever), which is not possible at the moment.

I haven't seen "Principled v2" mentioned in Render & Cycles meeting notes for quite a while. Is that just because changes are incrementally being merged into the main branch? Or has development on Principled v2 been paused?

Lukas Stockner, who's taking on that project, hasn't been active for about two weeks now. I believe that's the reason there's less information on its development.

Awesome, great news!

Nice news also comes from Weizhen's work.

Please ask whether the 7000 series GPUs running on HIP-RT can get a performance improvement ratio (compared to the 6000 series) close to what they show in ray-traced games.

Does this mean HIP-RT is on schedule for the 3.6 beta? I'm very eager to see how different GPU designs do on the Open Data listings. I don't think I know of any non-gaming, non-synthetic benchmark that targets the hardware ray tracing of these cards specifically. It's been far too long that we've had multiple cards with silicon specifically for calculating light paths but support for only one manufacturer (one that now seems to have abandoned high VRAM and isn't competing with its own workstation pricing). I'd love to see what comes of it; if nothing else, it'll give a way to directly compare driver and silicon implementations without the nonsense of a game engine acting as a black box that favors some architectures in unpredictable ways.

I don't know if this is off topic, but in 3DMark's Port Royal the 7900 XT's score with the latest driver has exceeded the 4080's; the 4080's score is 17600.

That's what I was referencing, though. That's a synthetic benchmark, AKA a perfectly repeatable, known test that spits out a roughly arbitrary "score" that's usually a mix of certain weighted result values. That's similar to running on a treadmill, dividing your heart rate by 300 and your top speed by 30, subtracting the first from the second, and saying "the closer to 1 you score, the more fit you are." While it can be a good rule of thumb, it's hardly an accurate gauge of how well you'll handle a steep forest trail, and because computer hardware engineers use these tests too, there can be some bias for synthetic benchmarks built into computers, at the silicon level!

That's why people value using Blender as a benchmark in the first place: it hits the GPU with a real-world load in a situation that produces a true production output. The Open Data score is "average samples per minute", and is also a synthetic test of sorts, but unlike Port Royal, these are purely path-tracing results. Even the most raytracing-heavy games rely firmly on rasterization to produce most of the image, and their ray tracing is sparse, requiring AI-trained algorithms, temporal noise reduction, and such to turn it from a noisy, 1-sample, 240p mess into a smooth reflection.

Blender can provide both that and a non-synthetic benchmark, which you can simply test as "how long does each X take to render these scenes on my system?", and because of that nature, you can even do more qualitative testing like "how much better is a $300 NVIDIA GPU vs. the $300 Intel CPU at the exact same workload and task, and which scenes is either totally incapable of running?"

TL;DR: 3DMark is a synthetic benchmark, mostly for games, while Blender's unbiased path tracing is far more hardcore ray tracing and Eevee is pretty direct raster; both better represent a production workflow with a focus on quality file output and data processing, rather than display and visual output, whether you use it as a synthetic benchmark or not.

On that note, I thought Cinebench was GPU-capable, but it looks like it's not; it's synthetic anyway, and it's the only other production benchmark I can think of.

So I'm not asking too much: the 7900 XTX should render 80% faster than the 6950 XT, not just 20%.
HIP and CUDA are not the same and can't be used as a reference against each other; on the same HIP benchmark, 61 TFLOPS versus 26 TFLOPS should be required to show that 80 percent.

Oh, yeah, I did some vague math a while back using Open Data results for both Radeon and Arc, and it came to, I believe, about a 60% boost if you were to base it off some deltas between ray-traced gaming performance and Blender samples per minute.

Of course, according to Open Data, no one out of nearly 2000 benchmarks has ever used CUDA on a 40-series card, so we cannot actually measure the effectiveness of ray tracing hardware over normal GPU compute.

@brecht Do you think it's possible we could get some CUDA data out of Open Data, too? It's really hard to know exactly what's going on with generational performance or how ray tracing cores affect rendering beyond the 30-series, and I think a third data point would be very useful. I know you're busy, and might no longer be super involved with Open Data, but I'd appreciate it if you could pass it along. I don't imagine it would be difficult to have the benchmark ask, or have an option, to run the test with a different compute type selected.

The new Open Data benchmark (released around the release of Blender 3.0) doesn't let you pick between the CUDA and OptiX backends from the user interface. That's why CUDA hasn't been tested on modern devices with modern versions of Blender in Open Data.

However, if you run the command-line version of the Open Data benchmark, CUDA can be used (although most people don't use the command-line version).
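
For anyone who wants to script that, here is a rough sketch in Python. The launcher flags are written from memory and the JSON field names are assumptions, so check the --help output and inspect the JSON before relying on it:

    import json
    import subprocess

    # Sketch only: flag spellings are from memory; run
    # ./benchmark-launcher-cli --help to confirm the real interface.
    proc = subprocess.run(
        ["./benchmark-launcher-cli", "benchmark",
         "--blender-version", "3.4.0",
         "--device-type", "CUDA",
         "--json",
         "monster", "junkshop", "classroom"],
        capture_output=True, text=True, check=True)

    # Field names here are assumptions; inspect the JSON first.
    for result in json.loads(proc.stdout):
        scene = result["scene"]["label"]
        spm = result["stats"]["samples_per_minute"]
        print(f"{scene}: {spm:.2f} samples per minute")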


Here's some data for an RTX 4090 (all scores in samples per minute):

Scene       CUDA      OptiX
Monster     4344.76   6606.05
Junkshop    1896.88   3074.19
Classroom   1908.89   3153.56
Total       8150.53   12833.80
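
For a quick read on those numbers, the OptiX-over-CUDA speedup works out to roughly 1.5-1.65x per scene (plain arithmetic on the figures above):

    # OptiX speedup over CUDA, RTX 4090 figures from this post.
    cuda = {"monster": 4344.76, "junkshop": 1896.88, "classroom": 1908.89}
    optix = {"monster": 6606.05, "junkshop": 3074.19, "classroom": 3153.56}
    for scene in cuda:
        print(f"{scene}: {optix[scene] / cuda[scene]:.2f}x")
    # -> monster 1.52x, junkshop 1.62x, classroom 1.65x
    print(f"total: {sum(optix.values()) / sum(cuda.values()):.2f}x")  # 1.57x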

Sorry to interrupt, but what is the point of that Open Data website? According to the "information" provided on that website, my GPU (GTX 1660) can render 704 samples per minute (OptiX / Windows / 3.4 / Median). I decided to put that information to the test, and here is what I got:

Default / OptiX / GTX 1660 / Blender 3.4.1

Classroom: 104 samples per minute
Junkshop: 219 samples per minute
Monster: 357 samples per minute

Median: 219 samples per minute

As you can see, this real-world value of 219 is nowhere near the 704 from the Open Data database.

PS: Looks like that 704 number is a combined score: my three results add up to 680, which is in the ballpark of the 704 median. The site says nothing about this, which is misleading.

Taking a median of 3 scores is not a very good measurement from a statistics standpoint, and averaging them would take away from the fact that the samples are not all of the same thing, so Open Data adds up the results. Because the unit is samples per minute, it doesn't really matter how long each scene takes to render. Basically, it renders a large photorealistic scene, a medium pseudo-realistic one, and a small cartoony one, all with a variety of materials and lighting, and adds up the results.
Now, I agree that it's not the most visible; you need to download the raw data to do it, but timings and samples for individual scenes are publicly available, as well as the specs of the system that generated the readings. It's in JSON format, so it's easy to parse if you make a script (see the sketch below).
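
Something like this minimal sketch works, assuming a JSON-lines dump; the key names are guesses at the schema, so print one record first and adjust:

    import json

    # Hypothetical file name and field names; adjust to the real dump.
    with open("opendata-benchmarks.jsonl") as f:
        for line in f:
            entry = json.loads(line)
            device = entry["device_name"]
            per_scene = {s["scene"]: s["samples_per_minute"]
                         for s in entry["scenes"]}
            total = sum(per_scene.values())  # the combined "score"
            print(device, per_scene, total)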

Also, please note that GPU brands often engage in the rather disreputable practice of naming their laptop cards the same as their desktop cards, suggesting to customers that they have similar performance. Note that on the Steam user survey, the '3060' is actually four cards, with different VRAM sizes, core counts, and clock timings.

Sweet. I noticed it was probably the benchmark launcher. It already asks you what device you want to use; maybe there should be a second menu for available compute types. Assuming you don't have binned or nerfed silicon, that'll be a handy little data point, though the general idea seems to be that
RTHW (ray tracing hardware as a generalized enhancement/chip) has diminishing returns with respect to CUDA. However, I'm going to graph it out now, and I'll need to take note of the actual number of RT cores. I'm not sure how to normalize this data. I tried, but kept coming up with diminishing returns for the 30 and 40 series, but exponential returns on the 20 series.
I'll need to learn about OptiX to really graph this out. I'm not sure how the CUDA cores are utilized differently, if at all; I can't imagine they wouldn't be.

I think it's likely that core clocks, driver versions, memory bandwidth, and PCIe lane utilization all factor heavily into this, which would suggest it's too complex a problem to get clean data out of, so we might just be better off accepting it as somewhere between a score-based synthetic benchmark and an "it's good for x but not for y" qualitative benchmark.

Here is the problem. A big one. Blender Open Data shows you a median Benchmark Score that the website describes as:

When an average Joe sees that score, he thinks that is the number of samples a given GPU/CPU can render in a minute, because nowhere in the text does it explicitly say that it is a combined 'score' of all tests. Without a proper description, that number is highly misleading.

Nothing on that page refers to a median of the three scene scores. The median referenced is a median of all the scores a device got in a certain category, not the 3 scores each individual test generates; and the test itself tells you explicitly what the samples per minute of each separate test are. While a median is a poor measure for a small data set, it quickly overtakes the average for accuracy as the data set grows, since it is more resistant to outliers. That's why it's used for the grouped data, which you can group as you please (toy example below).
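
A toy example of that robustness (hypothetical samples-per-minute results for one device category, with one misreported run):

    import statistics

    scores = [700, 705, 710, 715, 5000]  # the last entry is a bad outlier
    print(statistics.mean(scores))    # 1566.0 -- dragged up by the outlier
    print(statistics.median(scores))  # 710 -- barely affected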

I checked again, and you're right: it does add up the values for the tests, which would otherwise be fine. This means the benchmark's tests cannot be added to or changed and remain accurate.
I agree that the calculation on the list should reflect the number of samples made. They are normalized, though, so this means every score is simply 3 times what it should be for a samples-per-minute test.

However, this would put us in a bit of a bind, as the current benchmark does still test a good number of Blender's capabilities, and its main use (comparing many different hardware components using the same criteria) would be jeopardized if you suddenly changed the valuation of tests, how they're calculated, or the tests themselves. Best case, a script can be run to retroactively recalculate the scores based on the existing data.

I suggest you make a new thread and ping Francesco about it, as he's big on UI correctness and technical accuracy, and I'll pitch in for getting CUDA back on the menu; but wait a few days if possible. The team is working double-time to get the latest version out for full release as we speak, and will probably need a day or two of rest after!

Ping me if you make that thread. I'd like to see what comes of it; as noted above, changing a benchmark basically means throwing out all previous data and accepting that some older cards may not receive any tests, but I do think there's merit in a change. After all, the old Victor scene was more demanding with hair, and none of the current scenes have a large amount of prominent volumetrics at the moment.

The benchmark exists as a way to compare how fast GPUs render in Cycles, for users choosing a GPU. It's not meant as a general benchmark for GPU performance, and the absolute number has no meaning beyond comparing different GPUs on the same Blender version.

If someone wants to run benchmarks and draw broader conclusions, they can try, but it's difficult to account for all the factors, and it's not something we design the benchmark for.
