CSV Import Node
I have been thinking about this for a while, and I think it's about time I get started. I'm sharing a rough outline of how this node would function; for the first implementation I'll be keeping it simple.
Output of the node
A bit weird to start directly with the output, but nailing this down first makes things clear. The CSV Import node would output a PointCloud, where the headers (first row) signify the attributes of a point and each subsequent row becomes an individual point in the point cloud. Geometry Nodes has good support for point clouds, so point cloud operations can naturally be chained after the CSV Import node.
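For instance, a small hypothetical file (the column names here are made up for illustration):

```
x,y,z,temperature
0.0,1.5,2.0,21.4
3.2,0.7,1.1,19.8
```

would become a point cloud with two points, each carrying `x`, `y`, `z`, and `temperature` attributes taken from its row.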
Columns and their data types
Before parsing the full CSV, a pre-parse step would go over the first row to figure out the headers/attributes. The second row would then be parsed to figure out each column's data type: we try to parse the value against the supported data types, following a precedence order.
- Supported data types (I just came up with this list while writing the post out; nothing is concrete, and the required set still needs to be figured out):
  - int 8/16/32/64
  - float 32/64
  - string
- Precedence order (similarly, a 'good' precedence order still needs to be figured out): this is the order in which we try the data types, breaking on the first successful parse; see the sketch after this list.
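To make the inference concrete, here is a minimal sketch in C++, assuming a precedence of int32 → int64 → float32 → float64 → string; the enum, names, and order are all illustrative, not a settled design:

```cpp
#include <charconv>
#include <cstdint>
#include <iostream>
#include <string_view>

/* Hypothetical attribute types, listed in the precedence tried below. */
enum class ColumnType { Int32, Int64, Float32, Float64, String };

/* Return true if the whole token parses as T (no trailing characters). */
template<typename T> static bool parses_as(std::string_view token)
{
  T out{};
  const char *end = token.data() + token.size();
  auto [ptr, ec] = std::from_chars(token.data(), end, out);
  return ec == std::errc() && ptr == end;
}

/* Try each type in precedence order and break on the first success. */
static ColumnType infer_type(std::string_view token)
{
  if (parses_as<int32_t>(token)) return ColumnType::Int32;
  if (parses_as<int64_t>(token)) return ColumnType::Int64;
  if (parses_as<float>(token))   return ColumnType::Float32;
  if (parses_as<double>(token))  return ColumnType::Float64;
  return ColumnType::String; /* Anything unparseable falls back to string. */
}

int main()
{
  std::cout << int(infer_type("42")) << ' '      /* 0: Int32 */
            << int(infer_type("3.5")) << ' '     /* 2: Float32 */
            << int(infer_type("hello")) << '\n'; /* 4: String */
}
```

Note that under this naive order float64 is effectively unreachable, since any token a double accepts is also accepted (with less precision) by a float; that is exactly the problem the overrides below are meant to solve.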
User overrides
Looking at the above parsing logic, the first question that comes to mind is: "what if I want to use float64, but float32 has higher precedence?" For this, the CSV Import node would allow overrides (most probably from the N-panel of the node).
Essentially, the user overrides would be a map of column name → data type. While parsing the second row, we check this map before trying to infer the data type: if the column name is present in the map, the parser only tries that single data type and, on error, fails while notifying the user.
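As a rough sketch of that lookup (continuing the `ColumnType`/`infer_type` sketch above; the map type and function names are assumptions), the override map is consulted first and inference only runs when a column has no entry:

```cpp
#include <map>
#include <string>
#include <string_view>

/* Same hypothetical ColumnType as above, plus a Skip entry for the
 * null/void idea discussed next. */
enum class ColumnType { Int32, Int64, Float32, Float64, String, Skip };

ColumnType infer_type(std::string_view token); /* Precedence-order inference from the previous sketch. */

/* Hypothetical user overrides: column name -> forced data type. */
using Overrides = std::map<std::string, ColumnType, std::less<>>;

ColumnType resolve_type(std::string_view column, std::string_view sample,
                        const Overrides &overrides)
{
  if (auto it = overrides.find(column); it != overrides.end()) {
    /* Only this single type will be attempted; a later parse failure
     * should be reported to the user rather than silently recovered. */
    return it->second;
  }
  return infer_type(sample); /* No override: fall back to inference. */
}
```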
I also thought of adding null/void as a data type that tells the parser to ignore a column and not import its data. This can be helpful when working with large CSV files, saving processing time and memory.
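Continuing the sketch, the skip behaviour could then be just one more entry in the hypothetical `Overrides` map (column names made up); skipped cells are never converted or stored as attributes:

```cpp
Overrides overrides = {
    {"temperature", ColumnType::Float64}, /* beat float32's higher precedence */
    {"debug_id", ColumnType::Skip},       /* do not import this column at all */
};
```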
I believe this implementation is simple and straightforward and can serve as a good starting point. I'm looking for feedback and any other ideas.