Because no one mentioned it directly: Often enough, it is not about the programming or scripting language you use or the hardware you run it on. The most important thing is the algorithm.
If the algorithm that solves the problem has bad runtime complexity, then it does not matter whether you write perfect C++ code or a sloppy Python script. If the time it takes to compute the result grows exponentially with the input size, it might work for very small datasets, but you will very soon reach the point at which it takes forever (and that is not an exaggeration), even on the best computer in the world. The same can happen with quadratic runtime complexity, although that might still finish in an acceptable time frame for medium-sized datasets.
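To make that concrete, here is a toy Python comparison (nothing Blender-specific, just an illustration of growth rates): the naive recursive Fibonacci implementation has exponential runtime, while the iterative one is linear. The exponential version is fine for tiny inputs and hopeless shortly after:

```python
import time

def fib_exponential(n):
    # Naive recursion: O(2^n), recomputes the same values over and over.
    if n < 2:
        return n
    return fib_exponential(n - 1) + fib_exponential(n - 2)

def fib_linear(n):
    # Iterative: O(n), each value is computed exactly once.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in (20, 28, 32):
    t0 = time.perf_counter()
    fib_exponential(n)
    print(f"exponential n={n}: {time.perf_counter() - t0:7.3f}s")

t0 = time.perf_counter()
fib_linear(32)
print(f"linear      n=32: {time.perf_counter() - t0:7.3f}s")
```

Every step up in n multiplies the runtime of the exponential version by a roughly constant factor; no language switch or hardware upgrade saves you from that kind of growth.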
Ideally, you want linear runtime complexity: if you double the input data size, it should not take more than double the time. Constant time would be best, but that is unlikely to be achievable in the domain of data processing (hash indexes have amortized constant time complexity, but you still need to account for I/O-bound operations reading/writing the data, which will scale linearly at best). Something in the ballpark of linear or logarithmic complexity should be possible for importers in general, I would say, but only if there is no operation that takes more time on each call as data gets imported - also see https://blender.stackexchange.com/questions/7358/python-performance-with-blender-operators/7360#7360
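As a sketch of what I mean by an operation that takes more time on each call (the function names and dataset here are made up for illustration): suppose an importer checks every new name against everything imported so far. With a linear scan over a list, each check gets slower as the import progresses and the whole import becomes quadratic; with a hash index (a Python set here) the checks stay amortized constant time, at the cost of some extra memory:

```python
import time

def import_with_scan(names):
    # Each collision check scans everything imported so far: O(n^2) overall.
    imported = []
    for name in names:
        if name not in imported:   # linear scan, slower on every call
            imported.append(name)
    return imported

def import_with_index(names):
    # A hash set gives amortized O(1) lookups: O(n) overall, more memory.
    imported = set()
    for name in names:
        if name not in imported:   # amortized constant-time lookup
            imported.add(name)
    return imported

for n in (2_000, 4_000, 8_000):
    names = [f"object_{i}" for i in range(n)]
    for fn in (import_with_scan, import_with_index):
        t0 = time.perf_counter()
        fn(names)
        print(f"{fn.__name__:18s} n={n:5d}  {time.perf_counter() - t0:7.3f}s")
```

Doubling n roughly quadruples the scan variant and roughly doubles the indexed one - exactly the difference between quadratic and linear behavior.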
Carlo said that the Split Geometry feature is very costly in terms of performance. My guess is that the algorithm behind it has bad runtime complexity. It might be possible to trade memory for speed using index data structures, but there are also problems for which no known algorithm can bring the running time down to something reasonable for the dataset at hand. Multi-threading does not help here at all: you need a fast algorithm first, and if you want to utilize multiple cores, the algorithm needs to allow splitting the work into reasonable sub-tasks that can be carried out in parallel. You should always benchmark such a multi-threaded solution against a single-threaded variant, because multi-threading comes at a cost (a lot of coordination overhead, i.e. work splitting and result merging) and may turn out to be slower in the end - this depends heavily on the hardware you run it on, the compiler and build settings used, and the actual implementation.
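To illustrate the benchmarking point, here is a minimal, self-contained sketch (not Blender code, and the workload is artificial): it splits a CPU-bound task into sub-tasks with Python's multiprocessing module (plain threads would not help for CPU-bound Python code because of the GIL) and times it against a single-threaded run. Depending on your machine, the parallel version may win or lose - which is exactly why you measure:

```python
import time
from multiprocessing import Pool

def chunk_sum(chunk):
    # CPU-bound work on one sub-range.
    return sum(i * i for i in chunk)

def single_thread(data):
    return chunk_sum(data)

def multi_process(data, workers=4):
    # Work splitting: divide the input into roughly equal sub-tasks.
    size = len(data) // workers
    chunks = [data[i * size:(i + 1) * size] for i in range(workers - 1)]
    chunks.append(data[(workers - 1) * size:])
    with Pool(workers) as pool:
        # Result merging: combine the partial sums.
        return sum(pool.map(chunk_sum, chunks))

if __name__ == "__main__":
    data = range(5_000_000)
    for fn in (single_thread, multi_process):
        t0 = time.perf_counter()
        result = fn(data)
        print(f"{fn.__name__:14s} {time.perf_counter() - t0:6.2f}s  result={result}")
```

Process startup, pickling the chunks and merging the partial results are the coordination overhead mentioned above; for small datasets the single-threaded loop often wins despite using only one core.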
Aside from all that, you should also weigh the few hours it takes to write a basic importer with a naive algorithm against the months it could take to write an importer that is as good as possible, using all kinds of optimizations. Is it worth spending a lot of time to cut down on import time (or paying someone to do it for you), or can you just sit and wait a bit longer on each import? Maybe buy better hardware to save a few minutes instead of spending thousands of dollars on a professional programmer for the perfect importer? If the actual bottleneck turns out to be the Blender API, it won't help much anyway.
TL;DR: If an implementation uses a very naive algorithm or has flawed logic, it may not even matter whether the code is written in Python, C++ or something else. Regarding Python vs. Cython performance and multi-threading, the question should be whether it is even worth re-implementing the existing addons, or whether the algorithms they use can be improved for much larger gains at lower cost.