Realtime GPU smoke simulation

Just finished the CPU version of my solver. The good thing → it uses the whole CPU almost all the time, because everything is parallel. The (bad) thing → it simulates a domain of resolution 400 (the exact same scene as the explosion in the animation before) at about 18.5sec/frame on a 32-thread machine, compared to 0.114sec/frame on the GPU. The good thing → the CPU is quite old, as it’s a second-generation Threadripper, so newer ones will be faster, and the GPU is an RTX 3090, so the comparison is not fair by any means.
The potential good thing → there is no “adaptive domain” feature for now, so it could be much faster.

As for the comparison with Mantaflow, I have no idea for now: my computer is quite busy with the current test, as I want to compare the results from the CPU and GPU to check that the solver is working correctly.

10 Likes

AFAIK Mantaflow has some problems with CPU utilization. There was some improvement patch recently but even after it, the CPU usage isn’t as good as in your solver.

I don’t know how much your work differs from the concepts used in Manta. It would be good to compare the exact same simulation on both to establish a performance baseline. There are a lot of settings which can alter baking time without changing the smoke shape significantly.

Some time ago I did a 300 res smoke simulation in Manta for one of my projects. I used a pyroclastic flow as a loose reference. The domain size was around 20x30x10 meters. The emitter was 18x1x1 meters and was placed around 4 meters above the ground. The smoke had around a hundred collision objects (hidden in the render below). Upres factor 4, noise and some vorticity.
One more thing - one of the collision meshes had a massive size of about 60 million polys, so this also impacted the smoke baking time.
Actually scratch that. I checked the scene and baking was done with a much simplified clone of this giant mesh, so nothing unusual.

Only half of the sim (seen in lower left corner) was visible in camera in final render, so I didn’t care about residual smoke artifacts left in the place of the emitter (elevated red part in the upper right).

Baking 50 frames took almost 2 days on a Threadripper 3970X. The CPU was running at around 25%-30% on average. So 18.5 sec per frame is off the effing charts. If your solver can bake a sim of this size and complexity on a CPU even in a matter of a couple of hours, that is phenomenal.

Different angle:

10 Likes

Alright. I’ve just finished the CPU adaptive domain feature :smiley: . It took me almost 3h from the concept to the final product :frowning: however it works and, to be honest, quite nicely. It manages to reduce the 400^3 domain simulation of the explosion above from a constant 18.5sec/frame to about 7sec/frame on average (about 1-2 in the beginning, growing later). The CPU usage is again a constant 100%, so that’s great.
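For readers unfamiliar with the idea, an adaptive domain can be sketched roughly as follows. This is a minimal illustration under assumptions, not JFlow's actual implementation (which is not shown in this thread): it scans a dense grid for cells whose density exceeds a threshold, takes their bounding box, and pads it so the smoke has room to move. The names `ActiveBox`, `computeActiveBox`, `cellCount`, `threshold` and `padding` are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Inclusive index range of the active sub-box; `empty` is true when no cell
// is above the threshold. (Hypothetical type, for illustration only.)
struct ActiveBox {
    int min[3];
    int max[3];
    bool empty;
};

// Scan a dense n^3 density grid (x varies fastest) and return the bounding
// box of all cells above `threshold`, grown by `padding` cells per side and
// clamped to the grid.
ActiveBox computeActiveBox(const std::vector<float>& density, int n,
                           float threshold, int padding) {
    ActiveBox box{{n, n, n}, {-1, -1, -1}, true};
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x) {
                const std::size_t i =
                    (static_cast<std::size_t>(z) * n + y) * n + x;
                if (density[i] <= threshold) continue;
                const int p[3] = {x, y, z};
                for (int a = 0; a < 3; ++a) {
                    box.min[a] = std::min(box.min[a], p[a]);
                    box.max[a] = std::max(box.max[a], p[a]);
                }
                box.empty = false;
            }
    if (box.empty) return box;
    for (int a = 0; a < 3; ++a) {
        box.min[a] = std::max(0, box.min[a] - padding);
        box.max[a] = std::min(n - 1, box.max[a] + padding);
    }
    return box;
}

// Number of cells a solver would visit if it only iterated over the box.
std::size_t cellCount(const ActiveBox& b) {
    if (b.empty) return 0;
    std::size_t count = 1;
    for (int a = 0; a < 3; ++a)
        count *= static_cast<std::size_t>(b.max[a] - b.min[a] + 1);
    return count;
}
```

With a box like this, the per-frame cost scales with `cellCount(box)` rather than with the full grid, which would be consistent with frames being cheap at the start of an explosion and more expensive once the smoke has spread.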

10 Likes

Hi, thanks for your feedback :smiley: . When I looked at it some time ago, the main problem of Mantaflow seemed to be that it’s highly object-oriented, and that makes it so slow (in my opinion), but there may be other problems too. I’ve played with that solver a lot in the past (with some OK results), however even with some tricks it’s a bit laggy. Maybe tomorrow I will make a little comparison scene between those 2 solvers. That will require me to write the Blender particle system importer for JFlow first, but it shouldn’t be so difficult.

As for your scene, it’s hard to tell how much of a performance hit all those colliders are. Such a high-poly one could be the problem, but we can test it when I add object collision to my solver.

3 Likes

Rechecked, and the high-poly mesh wasn’t used for baking after all. I used an optimized copy, around 1k polys. I just forgot about it and needed to check this in the file.

1 Like

Alright, so we shall see in the future.
(can’t tell when exactly as creating collisions is difficult and sometimes I’m as sharp as a round rubber ball for physiotherapy)

I just went through about 12 bluescreens before realizing, again, that CUDA does not like std::vector anywhere in the code. I had to make a struct wrapper for the vector and pass that to the rest of the code; with the wrapper it works, without it it does not.
That’s quite annoying.
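One common way to keep std::vector away from GPU-facing code is sketched below. It is only an illustration under assumptions: the actual wrapper is not shown in this thread, and `FloatView`, `FloatBuffer` and `scale` are hypothetical names. The sketch is plain C++ so that it compiles without nvcc; in real CUDA code the pointer inside the view would refer to device memory rather than to the vector's own storage.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Plain-data view: only a pointer and a length, so it can be passed by value
// to code that must never see std::vector. (Hypothetical, not from JFlow.)
struct FloatView {
    float* data;
    std::size_t size;
};

static_assert(std::is_trivially_copyable<FloatView>::value,
              "the view must stay plain data");

// Host-side owner: the only place where std::vector appears.
struct FloatBuffer {
    std::vector<float> storage;

    explicit FloatBuffer(std::size_t n, float value = 0.0f)
        : storage(n, value) {}

    // Hand out the plain view; it is valid only until `storage` is resized.
    FloatView view() { return FloatView{storage.data(), storage.size()}; }
};

// Stand-in for a kernel body: it touches the view only, never the vector.
void scale(FloatView v, float factor) {
    for (std::size_t i = 0; i < v.size; ++i) v.data[i] *= factor;
}
```

The design choice here is that everything downstream of `view()` depends only on a trivially copyable struct, so the vector type never leaks into headers that GPU code has to include.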

1 Like

Alright, importing Blender particles works fine, also with subsampling. Now I’m simulating a simple explosion scene in JFlow in a 300^3 domain (it takes about 1.6sec/frame to simulate on the CPU). Later I’ll try to get the same looking results in Mantaflow to compare the speed.
On the GPU it is 0.075sec/frame

6 Likes

Aaand Mantaflow with the same particle system and settings set to match JFlow results with 300^3 resolution and adaptive domain turned on took about 20sec/frame.

The animation is currently being rendered so later I’ll upload a short comparison video, however looking at raw data for the 300^3 domain JFlow is around 14x faster than Mantaflow on the CPU and 267x faster using the GPU. In terms of quality both are stable, however without the final renders I cannot tell which one is better looking.

Edit: I’ve just checked the final Mantaflow render and the adaptive domain ruined the whole sim with bugs :frowning: . So I would have to turn it off for the final sim, which would take around 60sec/frame :upside_down_face: while JFlow’s adaptive domain works fine even tho it was a 3h implementation. So the speed advantage is even bigger then.

9 Likes

Simple Mantaflow vs JFlow comparison video

24 Likes

Looking at your numbers, it’s pretty impressive, I must say. Gonna be blunt :wink: I think the Mantaflow sim looks more pleasing and physically correct atm. For me it comes down to the secondary shapes, the flowery heads of the broccoli, if you will. I think they call it shredding in a new, and very popular, sim application :wink: I realize it’s early days tho, and you’re making phenomenal strides.

5 Likes

The overall shape of the Mantaflow sim is also a bit better in my opinion, and I like the look of its fire in still frames more. However, in motion I prefer mine a bit more in terms of stability and consistency. I also realized after the fact that I totally messed up the noise settings in my simulator, so I will upload an improved version soon (I’m tweaking them now and the results are much better, but still not ideal). The settings of the two solvers are also very different, so I will play with them a bit more to make the results look more similar. So in summary I think both have some advantages at this point.

*But I will work to finally make mine a bit better.

6 Likes

Looking forward to an update. This, to me, is the most exciting thing going on in Blender atm. Cannot thank you enough! :slight_smile:

2 Likes

It does look very nice. As far as I understand it, this will be an external utility which can export to Blender?

Or are you planning on adding a new fluid sim inside Blender?

The first is also cool, but the second would be even cooler imho :wink:

2 Likes

If everything goes according to plan, then both. At first it would be separate, as Blender for now is not able to handle so much data in reasonable time and I’m too stupid to implement all the necessary improvements at this moment. However, when the solver is ready I will sit down and try to make it an addon or an internal feature. But first I want to make it more useful.

8 Likes

My advice is not to wait too long with publishing your code somewhere. If you keep a project unpublished it can be tempting to keep tweaking it forever. At least that’s how it works for me :-D.

But it’s your project, so you can just ignore my prattling…

1 Like

Thanks for the advice :smiley:
The last solver was public from the beginning, however it did not help at all with the speed of development and sometimes it even slowed it down as I had to write more comments than I usually do so this time I’m trying the opposite.
I mean not like 100% opposite, but I want to have at least a GUI when I publish it.

3 Likes

Ok, so it’s almost 5AM here, so no sleeping. I just wanted to find a bug in the code to make the simulation more stable, however I had an idea, soooooo now there is an adaptive domain for the GPU as well, and that means simulating a few times faster than before, which was already fast. And the bug is also gone. So a win-win situation :slight_smile: .
I’m going to bed for now and later I’ll do some more tests

22 Likes

Here is a test of the new shader I’ve made for the explosions.
The resolution is 600^3. Simulated on the GPU with the new adaptive domain feature. It took around 0.06sec/frame.

28 Likes

That looks amazing! The new shader really helps to sell it!

My only slight feedback would be that the smoke tends to disappear very quickly after it is created, and it ends up looking a little unnatural, as if it just evaporates very suddenly…

Otherwise, it’s pretty incredible, great job!

2 Likes