Thoughts on making Cycles into a spectral renderer

The issue is that as indirect bounces accumulate, instead of moving “out” toward the primaries, the mixtures move “out” toward the spectral sample point locus. This means that the mixtures become problematic immediately, and the compression has to cover the entire locus.

This is a very challenging and as-yet unsolved problem. Spectral path tracing makes gamut mapping a larger dilemma, because the footprint now amounts to the entire spectral locus. And gamut mapping the “up and down” volume is challenging enough!
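One way to see that numerically is to push a smooth reflectance spectrum through a few “bounces” and watch its chromaticity. The sketch below is illustrative only: the spectrum is made up and the CIE-like curves are crude single-Gaussian stand-ins, not the real colour matching functions.

```python
# Minimal sketch: repeated bounces ~ repeated multiplication of the same
# reflectance spectrum. The "CIE-like" curves below are crude illustrative
# Gaussians, not the real colour matching functions.
import numpy as np

wl = np.arange(380.0, 781.0, 5.0)  # wavelengths in nm

def gauss(mu, sigma):
    return np.exp(-0.5 * ((wl - mu) / sigma) ** 2)

# Crude stand-ins for the CIE 1931 colour matching functions.
xbar = 1.06 * gauss(595.0, 35.0) + 0.36 * gauss(445.0, 20.0)
ybar = 1.00 * gauss(557.0, 45.0)
zbar = 1.78 * gauss(449.0, 22.0)

def chromaticity(spd):
    step = wl[1] - wl[0]
    X, Y, Z = (np.sum(spd * c) * step for c in (xbar, ybar, zbar))
    return X / (X + Y + Z), Y / (X + Y + Z)

albedo = 0.2 + 0.7 * gauss(460.0, 40.0)  # a fairly ordinary "blue" surface

for bounces in (1, 2, 4, 8):
    # Each extra bounce multiplies by the albedo again, narrowing the spectrum.
    print(bounces, chromaticity(albedo ** bounces))
# The xy coordinates drift steadily outward, toward the chromaticity of the
# spectrum's peak wavelength on the locus, rather than toward an RGB primary.
```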


This post might seem a bit abrupt, but I just said this in another thread,

and I wanted to include a set of images. On second thought they are not really related to that thread, so I am posting them here.
This is what we currently get from the Spectral branch with a blue light on a blue cube:


And here is what TCAMv2 does:

Wow, it just looks amazing. It still looks a bit purple, but it is so subtle now. Sadly, it’s just not open source.


Just looked up this thread… I also had a bit of a discussion with @troy_s a little earlier and got a bit more confused about the whole color thing :sweat_smile: . It’s nice to see the progress and everything. I’ll have to test it myself. It’s not quite obvious what the current plan is; I’ll take it as “not working with all Cycles X features yet, but a lot is already working”.

Why does Filmic get involved as well? Isn’t it supposed to be a post-process color mapping? How is it related to spectral rendering?

Hi @ChengduLittleA, great to hear you’re interested!

Filmic is relevant here because it is very easy to create colours outside of sRGB in spectral rendering engines. We need a way to take those ‘wide-gamut’ colours and bring them into the destination colour space (usually sRGB) without impacting the look of the image too much. Standard Filmic was not designed to handle colours outside of sRGB, so putting such colours through it gives less-than-ideal results.

Already, the results of spectral cycles can be exported as EXR and hand-processed, but we need a better default for displaying the results.
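To make “outside of sRGB” concrete, here is a minimal sketch (not the branch’s code): project a single-wavelength stimulus through approximate CIE colour matching functions (the piecewise-Gaussian fit published by Wyman, Shirley and Petrik, so treat the exact numbers as approximate) into linear sRGB, and one channel comes out negative.

```python
# Sketch: project a single-wavelength stimulus into linear sRGB and watch a
# channel go negative, i.e. the colour lies outside the sRGB gamut.
import numpy as np

def piecewise_gauss(x, mu, s1, s2):
    s = s1 if x < mu else s2
    return np.exp(-0.5 * ((x - mu) / s) ** 2)

def cie_xyz(wavelength_nm):
    """Approximate CIE 1931 2-degree colour matching functions (Wyman et al. fit)."""
    l = wavelength_nm
    x = (1.056 * piecewise_gauss(l, 599.8, 37.9, 31.0)
         + 0.362 * piecewise_gauss(l, 442.0, 16.0, 26.7)
         - 0.065 * piecewise_gauss(l, 501.1, 20.4, 26.2))
    y = (0.821 * piecewise_gauss(l, 568.8, 46.9, 40.5)
         + 0.286 * piecewise_gauss(l, 530.9, 16.3, 31.1))
    z = (1.217 * piecewise_gauss(l, 437.0, 11.8, 36.0)
         + 0.681 * piecewise_gauss(l, 459.0, 26.0, 13.8))
    return np.array([x, y, z])

# XYZ (D65) -> linear Rec.709/sRGB
XYZ_TO_SRGB = np.array([[ 3.2406, -1.5372, -0.4986],
                        [-0.9689,  1.8758,  0.0415],
                        [ 0.0557, -0.2040,  1.0570]])

rgb = XYZ_TO_SRGB @ cie_xyz(450.0)  # a saturated monochromatic "blue"
print(rgb)  # the green channel comes out negative: no sRGB display can show it
```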


So there are two kinds of rendering involved. The first is the rendering done by Cycles or Eevee etc., which goes from 3D scene data to Open Domain/Scene Referred light data. In RGB renderers, that light data is tristimulus data, while in spectral rendering the light data is actual light data with wavelengths etc.

Note that there is not a single monitor that can directly display the result of the first kind of rendering. For RGB rendering in the Rec. 709/sRGB gamut, the problem is only about intensity. As I have said in another thread, you can have the Nishita sky texture use the real world’s sun strength, but your monitor can never emit the same power as the sun.

The second kind of rendering is the “View Transforms” like Filmic, which take Open Domain data and render them into an image your monitor can display.
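As a rough sketch of what any view transform has to do (this is not Filmic’s actual curve, just the general shape of the pipeline: exposure, some tone curve with a shoulder, then the display encoding):

```python
# Sketch of a view transform's overall shape, not Blender's Filmic itself:
# scene-linear tristimulus -> exposure -> tone curve -> display encoding.
import numpy as np

def simple_view_transform(rgb_scene_linear, exposure_stops=0.0):
    rgb = np.asarray(rgb_scene_linear) * (2.0 ** exposure_stops)  # exposure
    rgb = rgb / (1.0 + rgb)                    # a toy Reinhard-style shoulder
    rgb = np.clip(rgb, 0.0, 1.0)
    # sRGB encoding ("inverse EOTF"), so the display undoes it correctly.
    return np.where(rgb <= 0.0031308,
                    12.92 * rgb,
                    1.055 * rgb ** (1.0 / 2.4) - 0.055)

# Open Domain values can be far above 1.0; the transform decides how they land.
print(simple_view_transform([0.18, 0.18, 0.18]))   # mid grey
print(simple_view_transform([40.0, 35.0, 30.0]))   # a very bright highlight
```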

However, with spectral rendering there is a second problem besides intensity: gamut. A quote from Troy in this thread not long ago:

Because spectral rendering deals with wavelengths rather than being confined to the RGB Rec.709 gamut, the Open Domain result from the first kind of rendering can have the gamut of the entire visible spectrum.

The problem now, if I understand correctly, is that most gamut mapping approaches only deal with mapping from one particular colorspace to another, not from the entire visible range down to the Rec.709/sRGB gamut. As you can see in my previous post, although TCAMv2 deals with wide gamut much better, the out-of-display-gamut blue is still skewing toward purple. And that is currently the best option we have.
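The skew is easy to reproduce in numbers. A minimal sketch, with an illustrative out-of-gamut blue triplet standing in for what a spectral render can produce: clipping the negative channel, which is roughly what happens when nothing smarter is in place, visibly changes the chromaticity.

```python
# Sketch: what naive per-channel clipping does to the chromaticity of an
# out-of-gamut blue. The input triplet is illustrative (roughly what a deep
# monochromatic blue projects to in linear Rec.709).
import numpy as np

SRGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                        [0.2126, 0.7152, 0.0722],
                        [0.0193, 0.1192, 0.9505]])

def xy(rgb_linear):
    X, Y, Z = SRGB_TO_XYZ @ np.asarray(rgb_linear)
    return X / (X + Y + Z), Y / (X + Y + Z)

blue_wide = [0.18, -0.20, 1.89]            # outside the sRGB gamut (negative G)
blue_clipped = np.maximum(blue_wide, 0.0)  # the "easy" fix: clip negatives

print(xy(blue_wide))     # the chromaticity the renderer actually produced
print(xy(blue_clipped))  # a noticeably different chromaticity: the hue has shifted
```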

I believe this quote from Mark summarizes the whole thing:


It might seem like pedantry, but it’s worth noting that nothing is “real” here. It’s just another model that marches closer to some other ideas of how such a model should behave.

It might seem logical to suggest that if we render in the “same primaries” as the display, the problem is solely about the “intensities”, but it’s not quite that simple. That could be a sane starting point for viewing the problem; however, the idea of the “appearance” of the tristimulus values is very tricky here. For example, as we increase the “strength” of a tristimulus position, the “colourfulness” must increase in a correlated manner. This can be challenging to contain to strictly “intensity” in this example!

And if we get too hung up on appearance, we forget about the medium and how the medium expresses content. There is no solution without working backwards from the medium of expression in my mind.

They are both the same really! Consider transforms as being nothing more than some series of transformative processing that tries to convert tristimulus, or spectral, into an image.

More explicitly: Electromagnetic radiation, or any model loosely similar to it, is not an image. Further, an image is not merely a simplistic transformation of stimulus. That stuff in the EXR? It’s tristimulus data, not an image!

It’s arguably deeper than this too! TCAM v2 makes some really sane design choices in the pragmatic sense; given that appearance modelling is still subject to all sorts of complexities and unsolved facets, and given that the output can vary extensively across different mediums, TCAM v2 attempts to focus on “chromaticity linear” output. This is sane, and “better” given the contexts listed. However, being sane given some constraints means that there is still much work to be done on the front of image formation.

It is simply unsolved, to a greater or lesser degree. TCAM v2 is a sane design solution given the constraints.

From my vantage, specific facets of the human visual system must be prioritized when considering image formation. Specifically, the rabbit hole I’ve been chasing now for far too long is notions of “brightness”, which trace all the way back to black and white film. It’s a tricky as hell surface, hence why there’s no direct solution just yet. Hopefully sooner rather than later. The idea of “brightness” has dimensions that support ideas of “colourfulness” and as such, are critically important.

Layering on wider gamuts and other nonsense isn’t helping things at all, as we can easily see if we take an open domain tristimulus render using the exact primaries of the display. Even this simple example remains unsolved in any satisfactory manner. Anyone who professes to show a “solution” to this basic problem is rather easy to refute. So if we can’t solve BT.709 rendering in any reliable manner, what’s the betting we are knowledgeable enough to solve wider ranges of chromatic content? Close to zero.


I’m even more confused…

Maybe it’s just me needing a visually “not obviously weird” result. E.g. I just want light to behave a little bit closer to spectral mixing. I don’t even care what kind of mapping or primaries or spectrum sampling method is used; the only thing I want is that a yellow light on a green object won’t result in a greyish tint.

RGB mixing is fast, but it is only remotely physically meaningful if, I mean only [ if ], there are just those three specific wavelengths in the scene (and your display isn’t using laser LEDs either). So any method that carries multiple wavelengths in between is much closer to the real-life lighting experience. So from the look of this project, the algorithm is principally fine in this aspect.
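For what it’s worth, the difference is easy to sketch with made-up data. In the toy example below (illustrative spectra and crude Gaussian stand-ins for the CIE curves, nothing from the actual branch), multiplying the spectra per wavelength keeps a clear yellow-green result, while multiplying the RGB channels throws the red component away:

```python
# Sketch (illustrative spectra and crude Gaussian "CIE-like" curves, not the
# branch's data): multiplying spectra per wavelength vs multiplying RGB
# channels gives different answers for "yellow light on a green surface".
import numpy as np

wl = np.arange(380.0, 781.0, 5.0)  # nm

def gauss(mu, sigma):
    return np.exp(-0.5 * ((wl - mu) / sigma) ** 2)

# Made-up smooth spectra: a yellowish light and a greenish reflectance.
light = gauss(580.0, 40.0)
surface = 0.8 * gauss(530.0, 35.0)

# Crude stand-ins for the CIE colour matching functions.
xbar = 1.06 * gauss(595.0, 35.0) + 0.36 * gauss(445.0, 20.0)
ybar = 1.00 * gauss(557.0, 45.0)
zbar = 1.78 * gauss(449.0, 22.0)

XYZ_TO_SRGB = np.array([[ 3.2406, -1.5372, -0.4986],
                        [-0.9689,  1.8758,  0.0415],
                        [ 0.0557, -0.2040,  1.0570]])

def to_rgb(spd):
    step = wl[1] - wl[0]
    xyz = np.array([np.sum(spd * c) * step for c in (xbar, ybar, zbar)])
    # Clip the small negative residue these crude curves produce, for clarity.
    return np.maximum(XYZ_TO_SRGB @ xyz, 0.0)

spectral = to_rgb(light * surface)             # multiply spectra, then project
channelwise = to_rgb(light) * to_rgb(surface)  # project, then multiply channels

print(spectral / spectral.max())        # keeps a clear red component (yellow-green)
print(channelwise / channelwise.max())  # the red component is essentially gone
```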

The point about exposure(?)/sunlight/sky is valid as well, but since this is an artistic tool, you simply adjust everything till it looks nice enough; isn’t that the point of using subjective(?) stimulus as the way of thinking?

Mapping after rendering, like the Filmic stuff, isn’t going to solve any of the problem. In the case of this “greyish tint” issue, it would turn every instance of that same greyish color into the intended greenish color; it’s not hard to understand why that is a problem, no matter how wide the “gamut” is.

I’ve been following @troy_s for a while now. They have done quite some in-depth research on this aspect. I don’t think I need to dig that deep to get a visually satisfactory result. I’ll keep following the project :smiley:


The challenge can be seen clearly here, if we focus on the seemingly simple statement.

You are describing a colour. That means that you are describing the sensation of a colour. This is sometimes referred to as the “perceptual” facet of colour. This facet is nowhere in the tristimulus data.

That is the crux of the issue.

The actual problem here is that we are all bound by the current medium we are looking at.

If we have data that represents spectral stimulus, the meaning of that stimulus cannot be expressed at the display.

This leads to two further questions:

  1. What should we see in a formed image at the tail end of a medium?
  2. What does data that represents spectral stimulus mean to a medium?

That becomes a much larger issue when we consider that the idea of larger gamuts hopes to hold some consistency of image formation across different mediums. The image shouldn’t “appear different”^1 across different mediums. Note that tricky word “appear” again.

The dumb data is just dumb data; it doesn’t give us any hint as to how to form an image in a medium. This is the depth of engineering that was lost when electronic sensors and digital mediums entered the mix.

The larger point I would stress is that even if we had a massive Uber Theatre 2000 with 100 emitters per pixel, and a dynamic range that is infinite, the goal is not simple simulation of the stimulus data. It never has been!

This is vastly more challenging than it seems, and doubly so when we tackle the consistency across image mediums.

This is the crux of the problem in our era. Everyone on a MacBook Pro or an EDR display or out in print requires that the creative choices carry across to different mediums. Otherwise you’d need to author one image for every medium!

Further still, if it were all simply about “adjusting everything” then there’s no problem to begin with; render in sRGB primaries and simply tweak to what you see coming out of the display!

Sadly, it’s a tad broader of a problem than that.

——

  1. Subject to image formation versus image replication intention within the medium capabilities.

The two further questions you brought up make sense. My understanding is that even if I have this image that holds this so-called spectral data, it doesn’t map or connect to any real-world unit or to how pixels should be illuminated. (Right? If so, then could there be, let’s say, an arbitrary mapping that specifies “what the hell is a 1.0”? Or if that’s not what you meant, then I don’t think I understood correctly.)

One thing though: at least for the current state of image technology, it’s almost a must to have image making dictated by the final medium, which is dumb, I know, but I don’t really see a case where e.g. an emissive medium can be translated to a reflective medium and “give the same visual feeling”? (Maybe it can, but then you probably need to specify all the related physical properties of those mediums as well as the viewing environment?)

For the “sensation of a color” thing… I do agree that plain tristimulus data doesn’t carry the “sensation”, but there’s also the fact that, just like white balance in photos, you can shift the whole thing around and a red will still appear red because “the relative chroma (? I’m really bad with tech names) of those color patches stays”? (Or… should I assume that under current color tech this transformation doesn’t preserve that kind of relationship in any meaningful way, which then becomes the problem?)

(And then, do we have a definition of what this tool, which supposedly should enable perceptual color translation onto all mediums, does?)

I still don’t think I fully understand the problem, but I’m still quite curious about it… Hope you guys don’t find me irritating; my head doesn’t take technical terms that well. :sweat_smile:

100% this! And once we dig around this area for a bit, the implications are sort of fascinating!

That’s part of the complexity.

In an additive tristimulus encoding system such as RGB, there’s obviously three channels. 100% doesn’t actually mean much with respect to what the hell we are trying to do regarding forming the image. 100% with respect to what?

A super simple example would be to consider a simple default cube illuminated entirely with a BT.709 blue point light very near the corner of the cube, such that it illuminates three visible sides.

If that light is moderately “bright”, how is the image we engineer related to the stimulus?

The beauty of this ridiculously simple example is that it cuts through a boatload of fascinating questions:

  • Should the corner very nearest the light be pure BT.709 blue?
  • What if we have two shots where the subsequent one has a brighter value for the light, how would the two images formed look different?
  • How do the ratios of the tristimulus data relate to the ratios presented at the medium? Why?

Etc.

These are precisely the sort of lingering questions that flesh out the overall ideas. With that said, there are arguably many lower-hanging questions that have a tremendous impact on the resulting image before we get into refinements such as viewing conditions!

We can get a taste of those massive impacts by working through the ridiculously goofy example above. In each case we will likely find a heck of a lot of nuanced depth from the ridiculous example!
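To make the goofy example slightly more concrete, here is a deliberately naive sketch (hypothetical scene-linear values, with “just clamp it” standing in for the absence of any real image formation):

```python
# Sketch, not a proposed solution: naive per-channel clamping of an open
# domain render of the "blue light near a cube corner" thought experiment.
# The values are hypothetical scene-linear BT.709 triplets for three points
# on the cube, from nearest the light to farthest.
import numpy as np

points = np.array([
    [0.00, 0.00, 0.90],   # right next to the light
    [0.00, 0.00, 0.30],
    [0.00, 0.00, 0.05],   # far side, in falloff
])

def naive_display(rgb, exposure=1.0):
    # "Just show the stimulus": scale, then clamp each channel into [0, 1].
    return np.clip(rgb * exposure, 0.0, 1.0)

for exposure in (1.0, 4.0):
    out = naive_display(points, exposure)
    print(exposure, out[:, 2])
# At the higher exposure the two points nearest the light both land on exactly
# (0, 0, 1): the 3:1 ratio between them is gone, and nothing in the "image"
# distinguishes the brighter light from a dimmer one. Whatever image formation
# we choose has to answer the questions above deliberately, not by accident.
```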

Absolutely!

This loops back to the idea that we are not entirely obsessed with the human visual system. We are in fact creatively privileging specific facets. What those are, how they interact, and how we transform them between mediums is still an unsolved problem.

Sadly, this problem is arguably more unsolved since we lost Kodak and Fuji, who both had incredible teams of minds researching actual image formation engineering! That incredible era, and most of the relevant research, has been somewhat lost in terms of general familiarity. The good news is that there are signs that video games are picking up the breadcrumb trail!

The idea is, as best as I can tell, exactly the subject of “ratios” and having the proper domains to express those ratios. You are also 100% correct to infer that this is largely an unsolved problem.

Most of what is out there is purely peddled garbage, which amounts to some magician standing on the stage waving their hands.

What has always been inspiring about the Blender crowd is that a disproportionate audience ratio actually cares very deeply about these sorts of subjects. We need to keep kindling those flames!

Asking questions is something that takes a good deal of courage in this era, somewhat sadly. Hopefully we can all ask more questions, and perhaps work our way to better answers than what we currently have. Let me stress, it’s a low bar.

Thanks for the explanation!

For the “pure blue light on a corner of a cube” example, I have some assumptions but haven’t necessarily understood it in full; here are my thoughts:

  • if we were to specify numerically that, say, a certain blue light with a given spectrum, size, and luminous density of some unit value, sitting at a given position relative to a unit-sized cube with a perfectly matte and specified reflectivity, then a given lens f-stop and sensor ISO/DIN, a specified exposure time and viewing position, and finally a specific measured point on the film, is supposed to be rgb(0,0,1) in electrical signal, aaand let’s put “why this value” aside for a moment, does this idea/extent of specificity even give a starting point for representing the rest of the colors electrically?
    • if not, then can having numerous “specified points” give a starting point?
    • is this kind of specificity even close to perceptual experience?
      • if the answer is no, then considering the experience of film photography, how much of our perceptual experience is shaped by “how analog films behave”?
      • then also, is there any research on how the current “flawed” RGB manipulation model affects how people think about “what a brighter blue looks like”?
      • an additional problem is that it’s not even an easy thing to mentally describe/compare extreme chroma and brightness in our visual experience (or anything sensory); this makes the previous problem a lot harder to tackle, because we can’t really trust our gut saying “Oh yeah, this is definitely right-er”…

A thought on film: I believe the analog format is generally easier to specify because you can’t really control how the chemicals react; basically it’s like “alright, this formula gives a white result when I shoot this white target” (I mean, yeah, of course there’s much finer control and stuff, but you get the idea). The thing with digital is that there’s no such thing happening naturally; the fact that you can freely specify the values is what causes us trouble.

Also, my understanding is that film has a very wide dynamic range; analog things really don’t have a hard “clamp”. It can get very near fully overblown but still sort of wiggle there. Electrically, nah, you reach 100% voltage and boom, you’re stuck. (Or maybe a “very long-tailed logarithmic response” can do it?)

This also goes into the “film experience”. How much of our mental experience of “oh, this kind of very high brightness will lead to an orange-ish look” is actually us knowing that film behaves this way? And, you know, human eyes don’t tend to shift color like that, and hell, you don’t even look at a bright target for any meaningful time before it burns into your retina, so human experience on that part isn’t reliable.

Then the problem becomes, what do we use as a thing to compare to? (Then this thing will likely become our mental visual cognition model or something?)

edit:

Additionally, the few papers I grasped recently have shown that the “perceptual brightness across the spectrum” curve (or whatever it is called) is largely sampled from a lot of human tests. Do we even consider the fact that some people have tetrachromacy, where they have an additional red cone cell sensitive at shorter wavelengths than the normal red? (And all sorts of eye conditions where we don’t generally understand precisely how they affect luma/chroma sensitivity.) So this is another problem of sampling people…

(I don’t really read a lot of related papers, so if I missed anything please specify. Thanks!)

It would not be “1” in film, because film had density and was subtractive. As with paint. That means that “no density” means nothing on the film, just as with nothing on the canvas. And with the opposite? It would be thickest density or paint, which in the case of film means no light is getting through.

Beyond that, it avoids asking the question and details of the endless engineering that went into the film; it’s not just some photochemical reaction. Think about X-Ray film, UV, or IR, or you name it. Each completely engineered, right down to the rates of change and electromagnetic sensitivity.

The “Why” in each case is critical.

Again, this misses why. As explained above, the fact is that even if we could represent the stimulus 1:1, we wouldn’t. It’s flatly wrong to think about this as simply “get the stimulus out of the medium” which is why the “scene” versus “display” dichotomy is rather myopic, and grossly overlooks far more important questions.

Aesthetic vernacular is a real thing. However, it is sane to see a division broadly between subtractive mediums such as film and paint, and the ghastly trap of additive digital RGB. It is easy to overlook, and in fact, as above, many folks miss out on some critical difference.

Try it. It will quickly become a point worth considering. It’s a fun experiment!

Again though, no one sits down with watercolours and thinks about some form of mimesis; the medium is limited and because of those limitations, magic happens. Black and white film isn’t an interesting medium in spite of the lack of colour, but again, because of it.

Completely false, as per the above examples. Every single facet of film was engineered, from spectral sensitivity, to rate of change, to nature of design such as creative or technical / analytical, to the couplers and DIR / DIAR… it was an incredible series of engineering from the ground up, with a significant body of research specifically dedicated to image formation. Jones, Judd, MacAdam, and a litany of many other researchers explored this very subject.

It is an all-too-frequent misconception to believe that it’s all physics and science and stuff! Indeed it was photochemical, but the engineering was extreme.

But we aren’t “stuck” are we? We are stuck with crappy ideas and zero to nothing in terms of engineering. When the best that the Academy can barf up is ACES, you have a glimpse as to the dearth of thought on this subject, and the trove of knowledge forgotten.

Too much of this subject is buried in the human visual system, which of course plays a role. A larger role is the medium and the negotiation to an image.

And in terms of digital additive RGB, we are in the dark ages.

I think I have a little bit of an idea of what you are talking about now… it’s really not about how to make our displays show something as close to physical reality as possible; the problem is actually:

  1. we lack a definition/concept of how this specific medium (in this case the LCD display) should or should not look? And we don’t often think about why we are using a certain algorithm for anything? Just like we kinda know certain paints and films have a specific look, but we don’t think about that for computer images?
  2. Our display as a device is woefully lacking in engineering for the one task it has, which is to display (thinking about how CRTs even had their color phosphors extensively engineered), offloading everything onto software compensation, which is fundamentally wrong because you simply can’t adjust your way out of a crappy physical base?

If I’m understanding those above points correctly, then the task is much more intertwined… what would be the first thing we can bring changes to?

And thinking about the grey boundary thing recently, would “a color transformation function that respects and keeps all the relative chroma/grey relationships (forgive the terms)” be a first step given our existing tech?

Or does this also tie into the perceptual “L” coordinate with respect to high saturation and stuff?

I understand film may be a bad analogy for an emissive medium, but my point was, even arbitrarily, you kinda need to at least define a physical reference point for any numerical representation to make sense, right… so there would somehow be a more or less “standardized” measure, but this might also not be preferred because the displays are a complete mess?

  • User-1 asks a question.
    • troy answers with 2 questions.
  • User-1 asks 2 questions in response to those “questions as answers”.
    • troy answers with 4 questions (2 for each).
      .
      .
      .

A few years later…

  • 100 users ask 1 question.
    • troy answers with 200 questions.

As much as I am not a fan of the “perfect is the enemy of good” saying, it seems to fit this scenario “perfectly”.

Instead of working with what we have/know right now, making an initial “attempt”, and improving it little by little (as we acquire more knowledge), it “looks like” we are trying to answer all the “how? what if?” questions even though we don’t have the answers “yet”, and we end up deep down multiple rabbit holes, to the point where we forget why we entered them in the first place… where is the exit, btw? o_O

Sorry in advance if I offended somebody, but I had to let it out somewhere and this topic seemed like the “perfect” victim.

I was wondering how digital cameras work, especially how the light gets stored as values.
Here are some interesting papers I found:

Standard for Characterization of Image Sensors and Cameras

I think these papers contain many useful insights into how the photons of light get to a digital value.

I don’t really mind questions, as long as they could bring us on to a same page.

@pixelgrip I believe this is a more fundamental problem than how digital image sensors work… E.g. point any digital camera at some colored light and it’s immediately overblown, while eyes/film respond perfectly fine. It’s hard to convince anyone with this simple “electrical clamp”. I believe the problem @troy_s is referring to is related to this.

I’m very interested in this research, although I’m not as technical or whatever, so if someone is more experienced I’d very much like to hear what they think, and if I or other people come to understand it further, it helps to bring changes.

Are you sure? Modern digital cameras with enough bit depth can capture almost every picture you can imagine. It’s about how much light falls on the sensor and how much you expose the data.
Your eyes close or narrow their pupils in bright light, like a camera’s aperture, to control how much light comes in.
Classic film has wavelength-sensitive layers with silver nitrate from which you get a negative; e.g. if you overexpose your film, everything is black or dense from the nitrate.

The reason for posting the paper is to understand how digital cameras handle the problem of high dynamic range. It is interesting that color management is not a big topic for image capture and its raw data in the beginning, except maybe that different RGB filter systems, such as a Bayer mosaic or a Foveon layered RGB sensor, are often used.
The amount of dynamic range in a camera is typically limited by the bit depth.

With the camera papers, I had the idea of treating the render like a digital camera.
In Blender we have presets for cameras, e.g. the sensor sizes and image formats. If you were to use the primary RGB filters that the sensors use (maybe with CM), then with fitting f-stop and exposure settings you should get the same result as the raw data.

For brighter wavelengths I found two answers. One is that the wavelength and speed of light are constant for a given hue; if you increase the brightness, you only increase the number of photons at that same hue (the wave amplitude is increased at the same wavelength and speed).

The second is that for a blackbody, blue has a higher energy than red, like with fire flames.

About brightness, I found this interesting short article.
This part:

The term brightness should be used only for non-quantitative references, e.g. in the context of physiological sensations; that is recommended by the U.S. Federal Standard 1037C, for example. For actual quantitative references, one should usually use one of the following terms:

  • The radiance is defined as optical power (radiant flux) per unit area and solid angle; its units are W cm−2 sr−1. This quantity is used in radiometry, where the physical properties of light and not its visual perception are relevant.
  • The luminance is the luminous flux per unit area and solid angle, with units of candela per square meter (cd/m2). This is a quantity of photometry, where the spectral response of the human eye is taken into account.
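As a rough sketch of the photometric side of that quote (assuming the usual 683 lm/W scaling, and with a crude Gaussian stand-in for the photopic V(λ) curve, purely for illustration):

```python
# Sketch: luminance is spectral radiance weighted by the eye's luminosity
# function and scaled by 683 lm/W. The V(lambda) here is a crude Gaussian
# stand-in for the real photopic curve.
import numpy as np

wl = np.arange(380.0, 781.0, 1.0)                # nm
V = np.exp(-0.5 * ((wl - 555.0) / 42.0) ** 2)    # rough photopic V(lambda)

def luminance(spectral_radiance):
    """Spectral radiance in W / (m^2 sr nm) per wavelength -> cd/m^2."""
    step = wl[1] - wl[0]
    return 683.0 * np.sum(spectral_radiance * V) * step

# Equal radiometric power at 550 nm vs 450 nm: very different luminance,
# which is exactly why "brightness" cannot be read off the radiometry alone.
narrow_green = 0.01 * np.exp(-0.5 * ((wl - 550.0) / 5.0) ** 2)
narrow_blue  = 0.01 * np.exp(-0.5 * ((wl - 450.0) / 5.0) ** 2)
print(luminance(narrow_green), luminance(narrow_blue))
```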

What Troy has been talking about is that people don’t realize that the HDR radiometric data captured by the digital sensor is not an image; it is just “electromagnetic radiation” information, the raw data you form an image from. He is not denying the dynamic range potential of digital imaging; he is suggesting that most people mistake the “electromagnetic radiation” information for an image, and therefore they ignore the step of actual image formation (i.e. the second kind of rendering I mentioned). Then you get people viewing EXRs using the Standard sRGB Inverse EOTF, some see the color skews and blame the EXR format for skewing the color etc., without understanding that the EXR only contains the result of the first kind of rendering, the “electromagnetic radiation”, and that the image you see still needs to be formed separately. He has been talking about the lack of attention to this most essential part, which is why all the relevant research dates back to the film era.

Therefore, studying how digital sensors capture their dynamic range etc. would not help here. What we need to study is the image formation from that electromagnetic radiation data.

At least this is how I understand the whole thing.

I completely agree with the first block you wrote.

Why not? Troy was asking

If you use the same primary filters of a specific digital camera in your Blender camera, with the same primaries in the CM, then in theory it should render the blue object as if it was taken with that camera.

Sure. Think about it: the raw files from a camera are very similar to an EXR rendering, or at least they should be, depending on how accurately the render engine, with its materials and setup, was used.

Remember, if there were no filter, Foveon layer, or prism system to get the trichromatic separation in a digital camera, you would have a monochromatic picture.

Since we always have RGB channels in an EXR render, we have a sort of filtering or CM already.

The possibility of changing that CM to a camera “filter or CM” would be nice. Until now we cannot change or select anything of this.