How does Cycles render volumes that fast?

First of all: I have no idea if this is the right forum for this question, but it came to mind, so here we go…
I'm coding my own voxel/volume/smoke renderer and I've applied pretty much every optimization I know of, but I'm still several times slower than Cycles.
I'm just curious how Cycles handles volumes and what performance improvements were made.

This is not directly related to Cycles, but you can find a lot of information on efficient volume rendering here:

