GSoC 2019: Fast Import and Export

I wrote a quick and dirty C++ OBJ importer last year (@someonewithpc has the code, but it’ll probably be less work to write a new one than to clean that thing up), and memory usage was much, much lower than the Python one’s.

Importing the San Miguel low-poly OBJ from http://casual-effects.com/data/index.html (628 MB). (Great site to get some datasets to benchmark with, btw, @someonewithpc.)

| language | time (seconds) | peak memory |
| --- | --- | --- |
| C++ | 11.2 | 1.7 GB |
| Python | 321.9 | 10.3 GB |

Since I saw the first commits land, I have been providing a daily-updated Win64 build of the Fast IO branch over here on Graphicall.


Stumbling across this thread, I’m wondering whether the Python importers are slow because of Python’s nature, or because the Python code itself (or even the algorithms involved) has never been optimized after profiling.

And an in-between solution, if the answer to my first question is that Python’s execution speed really is the limit, might be Cython, which could end up being a very reasonable balance between “hackability”, reuse of existing code, and speeding up bottlenecks with the performance one usually gets from C.


To be honest, the only “hacking” I’ve done on the importers is trying to make them faster, because I have work that needs to get done and can’t wait 45+ minutes for an OBJ to load lol

To the point about speed vs memory tradeoffs:
Often (though not always) what helps one also helps the other.
For example, if (as I suspect) the Python code is slowed by lots of splitting lines into strings (which involves allocating memory and copying characters into each new substring), making lists, etc., then C/C++ code can avoid much of that, going faster and using less memory, by keeping the input in its original form and using pointers into it (say, with a length) to represent strings internally; see the sketch below. Like LazyDodo, I also prototyped a faster importer once, and just using that trick made it a lot faster and use less memory.
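To make that concrete, here is a minimal sketch of the pointer-plus-length trick (hypothetical code, not from either prototype): with C++17’s `std::string_view`, every token aliases the original buffer instead of being copied into its own allocation.

```cpp
#include <cstdio>
#include <string_view>
#include <vector>

// Split `line` on spaces into views that alias the caller's buffer;
// no memory is allocated for the token text itself.
static void split_tokens(std::string_view line,
                         std::vector<std::string_view> &out)
{
    out.clear();
    size_t pos = 0;
    while (pos < line.size()) {
        while (pos < line.size() && line[pos] == ' ')
            pos++;
        size_t start = pos;
        while (pos < line.size() && line[pos] != ' ')
            pos++;
        if (pos > start)
            out.push_back(line.substr(start, pos - start));
    }
}

int main()
{
    std::vector<std::string_view> tokens;
    split_tokens("v 1.0 2.0 3.0", tokens);
    for (std::string_view t : tokens)
        printf("[%.*s]\n", int(t.size()), t.data());
}
```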

Of course, other times there really is a time/space tradeoff. For instance, having extra data structures that allow fast lookups, rather than scanning a long list each time, is typically worth it (if the lists are long-ish) but takes more memory.

Yep, that was my thinking too. But I’m adding an option to skip deduplicating normals and UVs (and I guess I should add one for deduping vertices, it would be easy), if you’d rather use less memory. This is for exporting, though; I haven’t looked much into the importing side, and it seems that is where most people have problems, contrary to what I thought…
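As a rough illustration of that tradeoff (a hypothetical sketch, not the branch’s actual code), deduplication typically pays for a hash map in exchange for average O(1) lookups instead of rescanning everything seen so far:

```cpp
#include <cstdio>
#include <unordered_map>
#include <vector>

struct Vec3 {
    float x, y, z;
    bool operator==(const Vec3 &o) const { return x == o.x && y == o.y && z == o.z; }
};

struct Vec3Hash {
    size_t operator()(const Vec3 &v) const {
        // Simple combine; a real implementation would use a better mix.
        std::hash<float> h;
        return h(v.x) ^ (h(v.y) << 1) ^ (h(v.z) << 2);
    }
};

// Returns the index of `v` in `unique`, appending it if unseen.
// The `seen` map is the extra memory bought for fast lookups.
static int dedup_index(const Vec3 &v,
                       std::vector<Vec3> &unique,
                       std::unordered_map<Vec3, int, Vec3Hash> &seen)
{
    auto it = seen.find(v);
    if (it != seen.end())
        return it->second;
    int index = int(unique.size());
    unique.push_back(v);
    seen.emplace(v, index);
    return index;
}

int main()
{
    std::vector<Vec3> unique;
    std::unordered_map<Vec3, int, Vec3Hash> seen;
    printf("%d\n", dedup_index({1, 2, 3}, unique, seen));  // 0
    printf("%d\n", dedup_index({1, 2, 3}, unique, seen));  // 0 again: deduped
    printf("%d\n", dedup_index({4, 5, 6}, unique, seen));  // 1
}
```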

One fundamental design question for import would be: do you read the whole file into memory (or memory-map it), or try to process it in large chunks? The former is a little easier to program, and, if you want to use the point-into-the-original-data trick mentioned earlier, may be necessary if strings need to persist after a line is processed. But it may also be slower, and even though this is virtual memory, it may slow things down if the file size much exceeds physical memory size. There’s lots of discussion on this topic to be found on the web, with the soundest advice being: try both ways if you can on a modern computer and see which wins.
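The read-everything option can be as simple as a single allocation that string views like the ones sketched above can point into (a minimal sketch, with error handling kept to a bare minimum):

```cpp
#include <cstdio>
#include <string>

// Read the entire file at `path` into `out` in one allocation.
static bool read_whole_file(const char *path, std::string &out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return false;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (size < 0) {
        fclose(f);
        return false;
    }
    out.resize(size_t(size));
    bool ok = fread(&out[0], 1, out.size(), f) == size_t(size);
    fclose(f);
    return ok;
}

int main(int argc, char **argv)
{
    std::string contents;
    if (argc > 1 && read_whole_file(argv[1], contents))
        printf("read %zu bytes\n", contents.size());
}
```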


In my experience, memory mapping may work well on local disks, but if you use it to read files from, e.g., a Windows network share, you can hit cases where it makes things orders of magnitude slower. Benchmarking this for all relevant real-world cases is tricky.

If the memory usage of reading the whole file is a concern (which I think is usually not the case, even for big files), then manual reading in chunks of, e.g., hundreds of MB may give more predictable performance.
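A minimal sketch of that chunked approach, assuming a caller-supplied `process_chunk` callback (hypothetical; a real parser would also have to handle tokens that straddle chunk boundaries):

```cpp
#include <cstdio>
#include <vector>

// Read the file in fixed-size blocks so peak memory stays bounded
// regardless of file size.
static bool read_in_chunks(const char *path, size_t chunk_size,
                           void (*process_chunk)(const char *, size_t))
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return false;
    std::vector<char> buffer(chunk_size);
    size_t n;
    while ((n = fread(buffer.data(), 1, buffer.size(), f)) > 0)
        process_chunk(buffer.data(), n);
    fclose(f);
    return true;
}

static void count_bytes(const char *, size_t n)
{
    static size_t total = 0;
    total += n;
    printf("total so far: %zu\n", total);
}

int main(int argc, char **argv)
{
    if (argc > 1)
        read_in_chunks(argv[1], 256 * 1024 * 1024, count_bytes);
}
```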


I don’t think peak memory usage is that big of a concern; however, using 10.3 GB of memory to deal with what is essentially a 600 MB text file does rate pretty high on my “something is screwy” radar. We can, and without a doubt should, do better there.

Right, that’s what I mean: having that 600 MB in memory is usually not a problem. Or at least not the main problem that we have now.

Is it possible to break the slow parts of the import process down into abstract concepts that could be performed in C++ land, and leave the bulk of the import process in Python? I ask because, while it will be awesome to have faster OBJ and STL importing, other importers (e.g. FBX) will always end up behind the curve.

That’s part of a stretch goal, but it should be doable. But it would just be better to port them too, imo.

I’ve posted my proposal on the wiki, finally.

My concern is that this might not work for certain formats, but that’s just speculation! For example, a format might require that there are no loose vertices (not part of any face) and that vertices are not shared among faces. To determine whether a vertex is used by multiple faces, it is not sufficient to look at one set of verts at a time (iteratively) unless you build up additional data structures tracking which vertices you have already seen used by faces, if I’m not mistaken (a sketch of that extra state is below). I just hope that the framework permits copying data or creating additional state if absolutely necessary.
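For concreteness, the extra state described above might look something like this (a hypothetical sketch, not part of any actual framework):

```cpp
#include <cstdio>
#include <vector>

// Marks the vertices of one face in `seen`; returns true if any vertex
// was already used by an earlier face (i.e. it is shared), which looking
// at one face at a time cannot tell you.
static bool face_shares_vertex(const std::vector<int> &face_verts,
                               std::vector<bool> &seen)
{
    bool shared = false;
    for (int v : face_verts) {
        if (size_t(v) >= seen.size())
            seen.resize(v + 1, false);
        if (seen[v])
            shared = true;
        seen[v] = true;
    }
    return shared;
}

int main()
{
    std::vector<bool> seen;
    std::vector<int> tri1 = {0, 1, 2}, tri2 = {2, 3, 4};
    printf("%d\n", int(face_shares_vertex(tri1, seen)));  // 0: all new
    printf("%d\n", int(face_shares_vertex(tri2, seen)));  // 1: vertex 2 shared
}
```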

If their FBX implementation works very well, then I guess yes? Maybe you can assess whether it’s compatible with the framework at all?

To me it sounds like a real UX issue: if you were to select multiple files in different formats, the file dialog would close, and for each file you would be presented with a popup to select the desired importer and configure the import settings (waiting for the actual import in between). You would probably want to batch-configure by format upfront, but still be able to abort the process. If one import fails, you would probably want to continue anyway. I don’t see a pleasant solution for the interaction yet.

If you mean a single importer being partially implemented in C++ and partially in Python, then I guess the answer is no: even if technically possible, it would be a performance killer to convert the data structures back and forth. But of course not all importers will be rewritten in C++, and the existing Python importers will keep working, I assume.

Is the new importer going to import/export .obj sequences? Blender currently has the ability to EXPORT .obj sequences, but no easy way to import them. Thanks.


I don’t know if this is the right place to ask, but I think an option to stop importing or exporting via the Escape key or something like that would be very convenient.

Several times I got screwed by exporting an entire scene because I didn’t check “selected only”. The only way out is to kill the Blender process and recover an autosave, which is not convenient.


I agree with this; in addition, however, I would suggest that “selected only” be enabled by default for the export operators. I think it’s the more intuitive behaviour: most actions in Blender only affect the selection (or only the active object).


Thank you very much. I would really appreciate it if you could smuggle in proper Alembic support (all attributes), plus maybe a live-link option (i.e. streaming, no data preloading).

Best!


Is there any possibility of improving DXF/DWG import in this GSoC? Just asking because DXF/DWG files usually have A LOT of data and import so slowly that they sometimes even hang Blender.
The same thing happens with FBX.

(It may not be in the goals for this GSoC, I’m just asking.)

Cheers!


You can create data structures that can be read by both C++ and Python, so you don’t have to convert anything.
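For instance, one way this could look (a hedged sketch, not a statement about Blender’s actual internals) is a contiguous plain-old-data layout that C++ fills directly and Python could view without copying, e.g. through the buffer protocol or numpy.frombuffer:

```cpp
#include <cstdio>
#include <vector>

// Flat arrays of raw floats/ints: no Python objects, no per-element
// boxing, so the same memory could in principle be exposed to Python
// as-is, with no back-and-forth conversion.
struct ImportedMesh {
    std::vector<float> positions;   // x0, y0, z0, x1, y1, z1, ...
    std::vector<int> face_indices;  // i0, i1, i2 per triangle
};

int main()
{
    ImportedMesh mesh;
    mesh.positions = {0, 0, 0, 1, 0, 0, 0, 1, 0};  // one triangle
    mesh.face_indices = {0, 1, 2};
    // mesh.positions.data() is the pointer a Python-side view would wrap.
    printf("%zu floats at %p\n", mesh.positions.size(),
           (void *)mesh.positions.data());
}
```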

1 Like