Eevee internal limitation for extremely high resolution renders?

Why does Eevee require approximately the same amount of some internal resource regardless of whether you’re rendering the full camera view or a region? I’m hesitant to call this internal resource memory, because the peak memory usage displayed after the render is lower for a small region than for the whole camera view.

I’d like to get our company’s marketing department over to Blender and off Maya, because of Maya’s extremely slow import/open times. Everything is looking good, except for the requirement to render at high DPI (150 to 300 DPI) for print material. This is where Blender falls down a bit. Even on an 11GB video card, I cannot get Blender Eevee to render the default cube at 1920x1080 @ 1000%, or 19200x10800; it just never finishes (I’ve waited 30+ minutes while researching this). 950% works fine, though, on just about every computer (Windows & Linux) I’ve tried it on, with varying speed differences. My worst-case scenario, something we’ve produced in the past, is a 20 foot by 10 foot pop-up display at 150 DPI. That’s 36000x18000 pixels.
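For anyone checking the math, those pixel counts come straight from the print dimensions:

```python
# Worst case: a 20 ft x 10 ft pop-up display printed at 150 DPI.
dpi = 150
width_px = 20 * 12 * dpi    # feet -> inches -> pixels = 36000
height_px = 10 * 12 * dpi   # = 18000
print(width_px, "x", height_px)
```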

I thought maybe I could get around this by partitioning renders into render regions, so that a 19200x10800 render divided into 4 regions would be equivalent to 1920x1080 @ 500% for each region, which for a single-screen render is actually quite fast (11.85 seconds). However, a region appears to hit the same internal limitation as 1920x1080 @ 1000% no matter what size the region is, and doesn’t finish in a reasonable amount of time.

1920x1080 @ 975% is the highest resolution I’ve found that actually works.


Hi Ted,
Eevee is OpenGL tech, and as such it has all the problems that any realtime viewport (games, other DCC viewports) has: it uses extremely large render textures, which aren’t supported by OpenGL/DX/Vulkan (I lump them all together here because the limitation is the same). So I would advise using Cycles with CUDA or OptiX, which is not bound by this since it uses compute kernels. It is still bound by GPU memory, but if I recall correctly Cycles supports memory offloading, using normal RAM when a scene exceeds the GPU’s RAM.

Daniel

That wouldn’t explain why much smaller render regions don’t work either, unless there’s a mistake in an allocator somewhere that isn’t taking the region size into account.

It would also be much easier to sell 1-2 minute renders vs. multi-hour renders.


I don’t have an answer to your question, but as a workaround you can use the camera Shift options to move the plane being rendered: divide the image into a 3x3 grid, use the center region as the main “center”, and shift to the sides and top to render smaller images with the proper perspective, without needing to crop the image.

About the original question, it’s a good one: why doesn’t Crop work?

Very interesting workaround, but the values required for Shift X/Y don’t seem to be very intuitive. Any tips for matching the original frame? edit: Oh, I see, looking through the camera view makes it more obvious; it’s going to have to deal with the aspect ratio.

edit2: It seems more complicated than just changing the camera Shift. You also need to change the focal length to adjust the frustum for the 1/3rd size (assuming 3x3), but then the Shift values are back to not being obvious. I’m sure there’s some math, just need to find it.
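edit3: I think I’ve got the math, at least for a camera with zero initial shift and a landscape render (horizontal sensor fit, square pixels; those are assumptions on my part). Once the focal length is multiplied by N, the Shift values behave like fractions of the new frame width, so for an N x N split:

```python
# Sketch of the tile math for an N x N split (assumes zero initial
# shift, horizontal sensor fit, and square pixels).
N = 3
aspect = 1080 / 1920                  # resolution_y / resolution_x

# Focal length gets multiplied by N to shrink the frustum to one tile.
for j in range(N):                    # rows, bottom to top
    for i in range(N):                # columns, left to right
        shift_x = i - (N - 1) / 2                 # -1, 0, +1 for N = 3
        shift_y = (j - (N - 1) / 2) * aspect      # scaled by aspect ratio
        print(f"tile ({i},{j}): shift_x={shift_x}, shift_y={shift_y:.4f}")
```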

I mentioned the crop because I did take into account the idea that by default, rendering a region still includes the entire camera view, so it could be a GPU memory thing, but I thought for sure checking Crop would cut that memory down. Guess not.

CryEngine had a way to make high-resolution screenshots that could be a workaround, if something like it worked automatically in Blender.

In CryEngine the viewport was rendered in small parts, maybe 9, and then the engine merged the images to make a single output image. This solved a lot of problems with graphics card memory and performance.

It can have problems with screen-space effects, but it could be a perfect solution for a lot of work.

Good point, this would probably break Eevee’s ambient occlusion (at least in the sense that when you stitch all the images together, it may look wrong at the seams).

Actually, Eevee already has a workaround for that: the Overscan parameter. It renders a bit of extra space at each border of every render to solve that problem.
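For reference, it can also be toggled from Python; a minimal sketch, assuming the 2.81+ property names under scene.eevee:

```python
import bpy

# Enable Eevee's overscan so screen-space effects get extra border
# data to work with before the image is cropped back down.
eevee = bpy.context.scene.eevee
eevee.use_overscan = True
eevee.overscan_size = 5.0   # extra border, as a percentage of the render size
```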

I have done a comparison of the render times of a simple scene at different resolutions, with a 1024x1024 base resolution.

The best render performance was at 4098x4098; after that, performance drops a lot.

With a 4k tile system, the render time of a 10kx10k image could be reduced by 70%. Of course that’s only on paper, but the render time could probably be reduced a lot with this solution.

@fclem


That’s the best part about Eevee’s render times: scene complexity (object count) has hardly anything to do with it. A scene with 51,000 objects renders almost as quickly as just the default cube.

I just need a way to get higher resolutions out of it.

Hi @TedMilker, I think @dgsantana made a good guess. Most likely it’s a hard GPU memory limit it’s running up against. I think it’s not just slow; I think it won’t finish the render at all. Did any of your tests above 1000% finish rendering? And did all your test machines have an 11GB VRAM GPU?

I tested with 6GB and 8GB cards too, and they can go just as high as the 11GB machine: 1920x1080 @ 975%. It can actually go a little higher than that; last night I experimented with higher numbers, edging up to just over 200 million pixels but no further. I don’t have access to that file at the moment. I’ll post the number when I get home, though.

It sure seems to me like there’s an internal buffer somewhere that’s hardcoded to 204,800,000 and I’m hitting its limit. Also, like I’ve said before, even trying to render a very small region of 1920x1080 @ 1000% will not finish, and that should use nowhere near as much GPU memory, which seems like a bug.
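The numbers line up suspiciously well around that figure (the 204,800,000 ceiling is just my guess):

```python
# Pixel counts on either side of the apparent ceiling:
print(int(1920 * 9.75) * int(1080 * 9.75))  # 975%  -> 197,121,600 (renders fine)
print(19200 * 10800)                        # 1000% -> 207,360,000 (never finishes)
# Suspected hardcoded limit: 204,800,000 (= 2048 * 100,000)
```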

I’ve tried looking at Blender’s code, but I can’t even figure out how Eevee gets resolution_x, resolution_y, and resolution_percentage from the scene when it renders. It’s tough with no real experience with Blender’s codebase, even with the diagrams they provide.

Existing bug on this: https://developer.blender.org/T70305

It hasn’t been looked at yet. The initial problem in that report was trying to use a smaller render %, but Blender seems to still allocate the full resolution. It still ties in, in that it may be an OpenGL texture size limitation if Eevee is rendering to a texture in some form. Many OpenGL implementations have a 16k x 16k maximum texture size, I think.
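You can check what your own driver reports from Blender’s Python console (a quick sketch; the bgl module needs a live GL context, which the console running inside Blender has):

```python
import bgl

# Ask the OpenGL driver for its maximum 2D texture dimension.
buf = bgl.Buffer(bgl.GL_INT, 1)
bgl.glGetIntegerv(bgl.GL_MAX_TEXTURE_SIZE, buf)
print("GL_MAX_TEXTURE_SIZE:", buf[0])  # commonly 16384
```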

Thanks for the link!

Funny, I saw a Maya command line render output the OpenGL max texture size of 16384x16384 while rendering today and thought that could be related to this. 🙂

Yes, it might be related, I’m not sure; 480MB sounds pretty low. But it might be a good starting point to ask there whether it relates or whether a separate report should be made.

Hmm, OK, it is indeed strange that they all have exactly the same limit. I tested it here on an 8GB GPU on the latest 2.82 alpha, and somewhere between 800% and 900% Blender crashes.

I’m using Blender 2.80 release.

Just rendered the default scene (cube) at 1920x1080 @ 975% on two machines:

Windows 10, EVGA 8GB 2070 RTX: Time: 01:10.13 | Mem:4536.58M (0.00M, Peak: 8302.22M)

Ubuntu Linux 18.04, EVGA 6GB 1660 Ti Mini ITX OC: Time: 5:30.43 | Mem:4536.27M (0.00M, 8301.89M)
(Linux does not like this render; the machine becomes unresponsive and throws up warnings about a program failing, with no info)

My home machine is Windows 10 with an EVGA 11GB 1080 Ti SC Black; I can post timing if needed, but it’s similar to the 2070.

For OpenGL there is a 16kx16k limit for textures, and in Eevee a render can use many render textures for intermediate results for screen-space effects, SSS, SSR, AO and so on, not counting, of course, the textures used by the shaders.

I think the solution for this is similar to what @Alberto already said: use a tile rendering approach in a way that is seamless to the user, but it can be a lot of work to get the screen-space effects right.
I work with realtime 360 rendering in Unity all the time, and screen-space effects are almost unusable there, since a 360 render is in reality 6 renders stitched together.

As for region rendering, I may be wrong, but there will always be an allocation of the full-screen render texture, even if the resulting pixels occupy only a small subset of it. For regions to work the way you want, the “render” camera would in reality need to be transformed (position, rotation, FOV, …) to render just that part as if it were a full render, and the result then presented to the user in the right place in the viewport.

Ted, have you tried using Cycles with CUDA or OptiX? It’s also very fast.

Sure, it’s fast, but it’s not that fast compared to Eevee: an hour and 40 minutes to render with Cycles+CUDA at 128 samples.

I submitted a bug, but I’ll work on Nahuel’s idea of partitioning the frustum into a 3x3 grid of tiles and rendering each one individually, in case there really is no way to get around allocating the entire frame (which seems like an awful, awful waste of space). Here’s the rough sketch I’m starting from:
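(Assumptions on my part: a perspective camera with zero initial shift, horizontal sensor fit, square pixels, and a landscape resolution; the output path is made up.)

```python
import bpy

# Render an N x N grid of tiles by narrowing the camera frustum with
# the focal length and walking it across the frame with Shift X/Y.
N = 3
scene = bpy.context.scene
cam = scene.camera.data
render = scene.render

orig_lens = cam.lens
orig_res = (render.resolution_x, render.resolution_y)
aspect = orig_res[1] / orig_res[0]

render.resolution_x = orig_res[0] // N
render.resolution_y = orig_res[1] // N
cam.lens = orig_lens * N            # shrink the FOV to one tile

try:
    for j in range(N):              # rows, bottom to top
        for i in range(N):          # columns, left to right
            cam.shift_x = i - (N - 1) / 2
            cam.shift_y = (j - (N - 1) / 2) * aspect
            render.filepath = f"//tiles/tile_{i}_{j}.png"  # made-up path
            bpy.ops.render.render(write_still=True)
finally:
    # Put the camera and render settings back the way they were.
    cam.lens = orig_lens
    cam.shift_x = 0.0
    cam.shift_y = 0.0
    render.resolution_x, render.resolution_y = orig_res
```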


@Alberto
I would like to remind you to redo these tests after the new optimizations that Clement and Jeroen are developing have been implemented:

https://developer.blender.org/diffusion/B/history/tmp-overlay-engine/

EEVEE: GLSL Renderpasses

RenderViewport: Texture Format

Why not script it? For a company I once worked for, I divided the render area and then joined the pieces again in the compositor afterwards, and it worked nicely. My reason for it was making A2-size prints. I don’t have the script anymore, but I recall it’s not that extreme: with divide-and-shift, one can work at normal sizes while a script does the final render.
One could even distribute the render tasks to other PCs on the network, but that’s a bit more coding; I did it over 4 PCs back then.
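For the joining half, if you’d rather stitch outside Blender’s compositor, a minimal sketch (assuming the Pillow package is installed and the tile naming from the script earlier in the thread):

```python
from PIL import Image  # assumes the Pillow package is installed

# Stitch an N x N grid of equally sized tiles back into one image.
N = 3
first = Image.open("tiles/tile_0_0.png")
w, h = first.size
out = Image.new("RGB", (w * N, h * N))

for j in range(N):          # j = 0 was the bottom row of the render
    for i in range(N):
        tile = Image.open(f"tiles/tile_{i}_{j}.png")
        # Image coordinates count down from the top, so flip the row.
        out.paste(tile, (i * w, (N - 1 - j) * h))

out.save("stitched.png")
```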