The “official” fast way to extract mesh data from Python is foreach_get().
However, foreach_get only takes the fast path (a memcpy in rna_raw_access) when the layout exposed by the Python API matches the underlying C data layout, which, for the Mesh API, is not often the case.
Most Mesh data in Blender is stored in structure-of-arrays (SoA) fashion (MCol, MLoopUV, FreestyleEdge, CustomDataLayer…), but the Python API usually exposes it as arrays of structures (AoS) (MeshVertices, MeshEdges, MeshFaces…), so in most cases foreach_get has to retrieve the data through Python, at a great performance cost.
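To make the layout argument concrete, here is a toy model in plain Python with ctypes of the distinction drawn above: one bulk copy when the requested buffer matches the element layout, element-by-element gathering when it does not. It does not reproduce Blender's rna_raw_access; `FakeVert`, `FakeUV` and `foreach_get_like` are hypothetical names made up for this sketch.

```python
import ctypes

# Hypothetical AoS element types, loosely modelled on mesh structs
# (not Blender's real DNA layouts).
class FakeVert(ctypes.Structure):
    _fields_ = [("co", ctypes.c_float * 3), ("flag", ctypes.c_int)]

class FakeUV(ctypes.Structure):
    _fields_ = [("uv", ctypes.c_float * 2)]

def foreach_get_like(items, attr, out):
    """Copy the array-valued field `attr` of every element of the ctypes
    array `items` into the flat ctypes float array `out`."""
    elem_type = items._type_
    field = getattr(elem_type, attr)
    if field.offset == 0 and field.size == ctypes.sizeof(elem_type):
        # Layout match: the field is the whole element, so the source bytes
        # already look like the destination and one bulk copy is enough.
        ctypes.memmove(ctypes.addressof(out), ctypes.addressof(items),
                       len(items) * field.size)
        return "bulk"
    # Layout mismatch: the field is interleaved with other data, so it is
    # gathered one element at a time in Python (the slow path).
    pos = 0
    for item in items:
        for value in getattr(item, attr):
            out[pos] = value
            pos += 1
    return "per-item"

# Usage: "co" is interleaved with "flag", "uv" fills its whole element.
verts = (FakeVert * 2)(FakeVert((1, 2, 3), 0), FakeVert((4, 5, 6), 1))
co_out = (ctypes.c_float * 6)()
co_path = foreach_get_like(verts, "co", co_out)

uvs = (FakeUV * 2)(FakeUV((0.5, 0.25)), FakeUV((1.0, 0.0)))
uv_out = (ctypes.c_float * 4)()
uv_path = foreach_get_like(uvs, "uv", uv_out)
```

Both paths produce the same flat buffer; only the amount of per-element work differs.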
The alternative is to retrieve pointers with as_pointer() and use C/C++ with the internal Blender structs to copy the data.
I do this in Malt, and other render engines such as LuxCore and appleseed take similar approaches.
(Cycles follows this path too, although it also has easier access to the internal Blender API.)
This is much faster, but not every mesh buffer type is easily accessible with this method.
Full access requires a copy of the Mesh DNA headers, which is harder to maintain because the Blender DNA data layout can change often.
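The same pointer-based idea can be sketched from Python with ctypes; in practice the copy loop would live in C/C++ for speed. Outside Blender there is no as_pointer(), so `ctypes.addressof` on a local buffer stands in for the integer address it returns (for example, the address of the first element of a contiguous array). `LoopUVMirror` and its field layout are assumptions for this example and would have to match the DNA definition of the Blender version being targeted, which is exactly the maintenance burden described above.

```python
import ctypes

# Hypothetical mirror of a small DNA struct; the real layout must be copied
# from the DNA headers of the targeted Blender version.
class LoopUVMirror(ctypes.Structure):
    _fields_ = [("uv", ctypes.c_float * 2), ("flag", ctypes.c_int)]

def read_uvs(address, count):
    """Interpret `count` consecutive LoopUVMirror structs starting at the
    integer `address` and return the UVs as a flat list of floats."""
    # from_address creates a view over existing memory: no copy is made,
    # and the memory must stay alive while the view is used.
    data = (LoopUVMirror * count).from_address(address)
    flat = []
    for item in data:
        flat.extend(item.uv)
    return flat

# Stand-in for Blender-owned memory; in Blender the address would come
# from as_pointer() instead of ctypes.addressof().
buffer = (LoopUVMirror * 2)(LoopUVMirror((0.0, 1.0), 0),
                            LoopUVMirror((0.5, 0.5), 0))
uvs = read_uvs(ctypes.addressof(buffer), len(buffer))
```

If the mirrored struct drifts from the real one (a field added, reordered, or resized), this code silently reads garbage rather than failing, which is why keeping such copies in sync is costly.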
If the CustomData_get_layer functions were exposed to Python, add-ons would only need a few rarely changed structs from DNA_meshdata_types.h, and the maintenance cost would be much lower.
I think even something like Mesh.get_data_as_pointer(domain, data_type) would be enough, where domain is an enum value from {VERTEX, EDGE, POLY, LOOP}, data_type is an enum value from CustomDataType, and the return value is a void pointer.
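To illustrate how an add-on might consume such a function, here is a sketch that mocks the proposal. Blender has no `get_data_as_pointer` method; `FakeMesh`, the `Domain` enum, the string "CD_MLOOPUV" used as the data_type identifier, and the `LoopUV` layout are all assumptions for the example. The point is that the add-on only mirrors the small per-element struct, not Mesh or CustomData. The element count is passed in separately, since the proposed function returns only a pointer.

```python
import ctypes
from enum import Enum

class Domain(Enum):
    VERTEX = "VERTEX"
    EDGE = "EDGE"
    POLY = "POLY"
    LOOP = "LOOP"

# The one small struct the add-on still has to mirror (layout assumed).
class LoopUV(ctypes.Structure):
    _fields_ = [("uv", ctypes.c_float * 2), ("flag", ctypes.c_int)]

class FakeMesh:
    """Hypothetical stand-in for a Mesh implementing the proposed
    get_data_as_pointer(domain, data_type). Blender has no such method."""

    def __init__(self):
        self._layers = {}  # (domain, data_type) -> contiguous ctypes array

    def add_layer(self, domain, data_type, array):
        self._layers[(domain, data_type)] = array

    def get_data_as_pointer(self, domain, data_type):
        # A bare integer address plays the role of the void pointer;
        # 0 stands for "no such layer".
        array = self._layers.get((domain, data_type))
        return ctypes.addressof(array) if array is not None else 0

def read_loop_uvs(mesh, loop_count):
    """Add-on side: only LoopUV's layout is needed, not Mesh or CustomData."""
    ptr = mesh.get_data_as_pointer(Domain.LOOP, "CD_MLOOPUV")
    if not ptr:
        return []
    layer = (LoopUV * loop_count).from_address(ptr)
    return [tuple(item.uv) for item in layer]

# Usage with fake data; in Blender, loop_count would come from len(mesh.loops).
mesh = FakeMesh()
mesh.add_layer(Domain.LOOP, "CD_MLOOPUV",
               (LoopUV * 3)(LoopUV((0.0, 0.0), 0), LoopUV((1.0, 0.0), 0),
                            LoopUV((1.0, 1.0), 0)))
uvs = read_loop_uvs(mesh, 3)
```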
Does this seem feasible?
I’m open to contributing the patch myself if I get the ok.