GSoC 2020: Fast Importer and Exporter for PLY and STL Formats

Project Description

Currently in Blender, importing and exporting STL and PLY models with millions of vertices takes minutes. The importers and exporters are written in Python, which is the main area of concern, since Python is slow compared to statically typed, compiled languages such as C and C++. The goal of the project is to reduce the import and export time for STL and PLY files. These are simple file formats to implement, and they can pave the way for more complex formats such as FBX.

Predesign:
The first significant improvement I found is that memory-mapped file read/write in C tends to be considerably faster than Python file I/O.

Memory-mapped file I/O
Note: the benchmark was edited due to major variation.

Parsing:
Another area where performance can be improved is the parsing of STL and PLY files.
The whole parsing routine can be ported to C for a performance gain.
The core import and export logic for both file formats can be ported to high-performance C and optimized.
C allows values such as doubles to be parsed directly, unlike Python, where a line is read as a string and then converted to a float.
For example:
In C:

fscanf(fptr, "%f", &blocks);

In Python:

for i in line.split():
    if i.isdigit():
        value = float(i)  # the string is converted only after the check

Because of the extra step of converting each value from a string to a number, C parsing is more efficient than its Python equivalent. Since we are dealing mostly with numbers, reading and writing them natively will be faster.
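As a rough illustration of the direct parsing described above, here is a minimal sketch in C that reads whitespace-separated floats from one text line using `strtof`. The helper name `parse_floats` is hypothetical and not part of Blender; it only assumes an ASCII line such as a PLY vertex record.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a Blender API): parse up to `max`
 * whitespace-separated floats from `line` into `out`.
 * Returns the number of values actually parsed. */
size_t parse_floats(const char *line, float *out, size_t max)
{
    assert(line != NULL && out != NULL);
    size_t n = 0;
    while (n < max) {
        char *end;
        float v = strtof(line, &end); /* skips leading whitespace itself */
        if (end == line)              /* no conversion: no more numbers */
            break;
        out[n++] = v;
        line = end;                   /* continue right after this number */
    }
    return n;
}
```

Calling `parse_floats("0.5 -1.25 3e2\n", v, 3)` fills `v` with the three coordinates in one pass over the line, with no intermediate token strings.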

The PLY file specification puts the vertex count and face count in the header, so we can use them to speed up import, for example by allocating memory for the mesh up front.

element vertex 8 { define "vertex" element, 8 of them in file }
element face 6
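To show how those header counts could be picked up before any mesh data is read, here is a minimal sketch in C. The function name `ply_header_counts` is hypothetical, and the sketch assumes the header text is already in memory as a NUL-terminated string; a real importer would also need to handle the property lines and the binary variants.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Blender API): scan PLY header text and
 * extract the vertex and face counts. Returns 0 on success, or -1 if
 * "end_header" or either count is missing. */
int ply_header_counts(const char *header, long *n_vertices, long *n_faces)
{
    assert(header != NULL && n_vertices != NULL && n_faces != NULL);
    *n_vertices = *n_faces = -1;
    const char *p = header;
    while (*p) {
        long count;
        if (sscanf(p, "element vertex %ld", &count) == 1)
            *n_vertices = count;
        else if (sscanf(p, "element face %ld", &count) == 1)
            *n_faces = count;
        else if (strncmp(p, "end_header", 10) == 0)
            return (*n_vertices >= 0 && *n_faces >= 0) ? 0 : -1;
        /* advance to the start of the next header line */
        const char *nl = strchr(p, '\n');
        if (nl == NULL)
            break;
        p = nl + 1;
    }
    return -1; /* header ended without "end_header" */
}
```

With both counts known, the importer can allocate the vertex and face arrays once instead of growing them while parsing.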

Advantages:
Code ported to C can import and export meshes with high vertex counts quickly. The user experience can also be improved with a progress indicator.

End result: C modules (that can be loaded from Python inside Blender) to do the slow parts of importing and exporting (likely file reading/writing and the parsing/formatting of raw text).

Bio
Name:

Shivendra Pratap Singh

Contact:
sps014 on blender.chat and developer.blender.org


I redid your benchmark, and I'm not seeing the 2x perf difference on a PLY export of an 8x subdivided cube (524,222,597 bytes).

SSD

fgets                    :  2.48100s
python readlines         :  2.84170s

HDD

fgets                    : 17.61400s
python readlines         : 17.91237s

So while fgets is still fastest, it’s not by all that much.

When doing benchmarks like this, be aware of any OS-level file caching going on. Without taking that out of the equation, I got suspiciously good results from my HDD.

HDD without clearing the filesystem cache first

fgets                    :  2.44900s
python readlines         :  2.85088s

If you are on Windows, you can use this helper header to clear the “Stand By List”, which will result in the filesystem cache being flushed.


There is a memory-mapped approach (on Linux it will work for files smaller than 2.1 GB). The idea is to load the file into memory and deallocate the parts that have already been read as you go.
If RAM fills up, paging takes place. By taking the available RAM into account, we can let the user load files faster.
There is a tradeoff between space and time.
I have not fully researched this approach; I read about it in an article, and there are visible flaws, such as wasted memory.
I would love to hear feedback on this.

#include <stdio.h>
#include <stdlib.h>
#include <err.h>
#include <fcntl.h>
#include <sysexits.h>
#include <unistd.h>

int main(void)
{
    int fd;
    /* read() returns ssize_t so that -1 can signal an error */
    ssize_t bytes_read;
    size_t bytes_expected = 100000000 * sizeof(double);
    double *data;
    const char *infile = "file.dat";

    if ((fd = open(infile, O_RDONLY)) < 0)
        err(EX_NOINPUT, "%s", infile);
    if ((data = malloc(bytes_expected)) == NULL)
        err(EX_OSERR, "data malloc");
    /* one large read() instead of many small ones */
    bytes_read = read(fd, data, bytes_expected);
    if (bytes_read < 0)
        err(EX_IOERR, "read %s", infile);
    if ((size_t)bytes_read != bytes_expected)
        errx(EX_DATAERR, "Read only %zd of %zu bytes",
             bytes_read, bytes_expected);
    /* ... operate on data ... */
    free(data);
    close(fd);
    exit(EX_OK);
}

I have not tested this code, and the speed may differ due to various factors.


Sounds like a poor man's memory-mapped file. I did benchmark those too, but left them out so as not to confuse things further than necessary.

SSD Reading 500M text file (Hot read)
Boost_mem_mapped         :  0.18100
Boost_mem_mapped_istream :  3.67800
fgets                    :  2.48900
python readlines         :  2.89192

HDD Reading 500M text file (Hot read)
Boost_mem_mapped         :  0.18200
Boost_mem_mapped_istream :  3.69100
fgets                    :  2.44900
python readlines         :  2.85088

SSD Reading 500M text file (FS Cache flushed)

Boost_mem_mapped         :  2.13100
Boost_mem_mapped_istream :  5.72900
fgets                    :  2.48100
python readlines         :  2.84170

HDD Reading 500M text file (FS Cache flushed)
Boost_mem_mapped         : 14.40200
Boost_mem_mapped_istream : 15.67800
fgets                    : 17.61400
python readlines         : 17.91237

Test code (requires the header linked earlier):

#define _CRT_SECURE_NO_WARNINGS 1
#include <iostream>
#include <time.h>
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/iostreams/stream.hpp>
#include <algorithm>
#include <cstdint>  // uintmax_t
#include <cstdio>   // fopen, fgets, printf
#include <cstring>  // memchr
#include <Windows.h>
#include <string>
#include "MemHelpers.h"

//HDD
//std::string path = "f:\\downloads\\my.ply";
//SSD
std::string path = "c:\\my.ply";

double boost_mapped()
{
    boost::iostreams::mapped_file mmap(path, boost::iostreams::mapped_file::readonly);
    auto f = mmap.const_data();
    auto l = f + mmap.size();
    double t = clock();
    uintmax_t m_numLines = 0;
    while (f && f != l)
        if ((f = static_cast<const char*>(memchr(f, '\n', l - f))))
            m_numLines++, f++;
    t = clock() - t;
    t = t / (CLOCKS_PER_SEC);
    std::cout << "m_numLines = " << m_numLines << " in " << t << " seconds\n";
    return t;
}

double boost_mapped_istream()
{
    using boost::iostreams::mapped_file_source;
    using boost::iostreams::stream;
    mapped_file_source mmap(path);
    stream<mapped_file_source> is(mmap, std::ios::binary);

    std::string line;
    
    uintmax_t m_numLines = 0;
    double t = clock();
    while (std::getline(is, line))
    {
        m_numLines++;
    }
    t = clock() - t;
    t = t / (CLOCKS_PER_SEC);
    std::cout << "m_numLines = " << m_numLines << " in " << t << " seconds\n";
    return t;
}

double fgets_bench()
{
    FILE* F = fopen(path.c_str(), "rb");
    if (!F)
        return 0.0f;
    char buffer[64];
    double t = clock();
    uintmax_t m_numLines = 0;
    while (fgets(buffer, sizeof(buffer), F))
    {
        m_numLines++;
    }
    t = clock() - t;
    t = t / (CLOCKS_PER_SEC);
    printf("m_numLines = %llu in %.4f seconds\n", (unsigned long long)m_numLines, t);
    return t;
}

bool cache = true;
int main()
{
    MemHelpers::InitMemHelpers();
    if (cache) boost_mapped();
    if (!cache) MemHelpers::ClearStandbyList();
    printf("Boost_mem_mapped : %.5f\n", boost_mapped());
    if (!cache) MemHelpers::ClearStandbyList();
    printf("Boost_mem_mapped_istream : %.5f\n", boost_mapped_istream());
    if (!cache) MemHelpers::ClearStandbyList();
    printf("fgets : %.5f\n", fgets_bench());
    printf("press a key...\n");
    getchar();
    MemHelpers::ClearStandbyList(); //Wipe the FS Cache for the python test
    return 0;
}

Reads and writes in any language end up invoking the kernel's read/write functions. What we try to minimize is the number of top-level read calls, i.e. the language-level call overhead.
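As an illustration of keeping the number of read calls low, here is a minimal sketch in C that counts lines by pulling the file in 64 KiB blocks and scanning each block with `memchr`. The function name `count_lines_buffered` and the block size are assumptions chosen for the example, not measured optima.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count '\n' bytes in `f`, issuing one fread()
 * per 64 KiB block instead of one library call per line. */
unsigned long long count_lines_buffered(FILE *f)
{
    assert(f != NULL);
    static char buf[1 << 16];
    unsigned long long lines = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        const char *p = buf;
        const char *end = buf + n;
        /* memchr jumps from newline to newline inside the block */
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            lines++;
            p++;
        }
    }
    return lines;
}
```

This mirrors the newline-counting loop used in the benchmark code above, so its timing could be compared against `fgets` on the same file.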

I guess we can’t use Boost in Blender for memory-mapped files.

Thanks, it was really helpful for my further understanding.

Perhaps. However, there is a win32 implementation of mmap available in Blender (which means mmap is available on all platforms), so you could do it without Boost. I just used Boost since it was easy for testing; I’m not expecting any big performance difference between Boost and mmap. However, since memory-mapped files offer little perf increase over fgets and come with some rather large downsides, I doubt it’s worth the hassle.
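For reference, a Boost-free version of the mapped-file line count might look roughly like the sketch below. It uses plain POSIX `mmap`, so it assumes a POSIX system; the function name `count_lines_mmap` is hypothetical, and Blender's own win32 mmap wrapper is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in the file at `path` through a
 * read-only memory mapping. Returns -1 on error. POSIX only. */
long long count_lines_mmap(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return -1;
    long long lines = 0;
    const char *p = base;
    const char *end = base + len;
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++;
    }
    munmap(base, len);
    return lines;
}
```

As with the Boost numbers above, any timing of this sketch depends heavily on whether the file is already in the filesystem cache.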


I will try mmap and post my benchmarks with memory mapping.

There was a GSoC project last year about fast I/O for OBJ; IIRC it didn’t make it into master. But I think you should still look into it!

OBJ is a complex format to implement; animation and material data require a lot of work. Hence the idea is to take a small step and implement STL and PLY first.

That’s probably a good way to ensure you’ll be able to finish on time and do things well, and not fall into the same difficulties that caused the last project to fail.


Please also read the discussion in Clarifications for Fast IO project, and the proposal about memory mapping files.


I have already started working on this project. I don’t think completing this task should depend on my chances of being selected; I am happy to contribute to Blender and will keep trying to contribute regardless of my selection.


I have messaged you my draft; kindly provide feedback.

This topic is dear to my heart! I do a lot of work with large (>5M) point clouds, and while my own C++ PLY importer is fast, the entire pipeline slows down when it comes time to import into Blender. Thank you for working on this!