Hello folks,
TL;DR: I think OpenAPI might be nice for Blender to use, but it likely means a binary Python package that needs bundling with Blender.
Feel free to scroll down to the “Open Questions” section below if you just want to know the questions I have & join in the discussion. The rest of this post illustrates the issues I’ve seen Blender face when dealing with JSON APIs, and my ideas about why OpenAPI would be nice.
Recently I started to look into the technical side of the Online Assets project. For that I’ve also investigated a bit how the Extensions system deals with communication between Blender and the extensions server.
In short, the JSON data is parsed “by hand” in Python code. There is no schema file, and the little code that validates responses from the server is all hand-written. The same goes for the JSON-handling C++ code (used by the Asset Browser).
This approach allows for quickly building something that “works”, but it makes it hard to later lift that into something that’s easy to understand and maintain. In my experience this happens a lot when JSON data is parsed: typically the code ends up shoveling dictionaries from function to function, and it’s unknown (or at least hard to figure out) what is actually in there.
Concrete Examples
Here are some examples of JSON parsing in Blender. I don’t want to shame anyone; all the code was written by people who had limited time and a lot on their plate.
```python
def pkg_repo_data_from_json_or_error(json_data: dict[str, Any]) -> PkgRepoData | str:
    # ... snipped for brevity ...
    if not isinstance((blocklist := json_data.get("blocklist", [])), list):
        return "expected \"blocklist\" to be a list"
    for item in blocklist:
        if not isinstance(item, dict):
            return "expected \"blocklist\" contain dictionary items"
    # ... snipped for brevity ...
    return PkgRepoData(version=version, blocklist=blocklist, data=data)
```
Although the above code does try to put some semantics in the `PkgRepoData` class, the `blocklist` and `data` fields are still just dictionaries. And this is actually where 99% of the information in this particular JSON document is stored.
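For illustration, here’s a minimal sketch of what parsing into typed objects could look like. The field names (`pkg_id`, `reason`) are invented for this example and do not reflect the actual blocklist format; the point is only the shape of the approach:

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical typed model; field names are invented for illustration.
@dataclass
class BlocklistEntry:
    pkg_id: str
    reason: str


def blocklist_entry_from_json_or_error(item: dict[str, Any]) -> BlocklistEntry | str:
    pkg_id = item.get("pkg_id")
    reason = item.get("reason")
    if not (isinstance(pkg_id, str) and isinstance(reason, str)):
        return 'expected "blocklist" items to have string "pkg_id" and "reason" fields'
    return BlocklistEntry(pkg_id=pkg_id, reason=reason)
```

Of course, writing this out by hand for every field is exactly the kind of boilerplate I’d like to avoid, which is where code generation comes in (more on that below).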
```cpp
static void init_indexer_entry_from_value(FileIndexerEntry &indexer_entry,
                                          const DictionaryValue &entry)
{
  // ... snipped for brevity ...
  if (const std::optional<StringRef> value = entry.lookup_str(ATTRIBUTE_ENTRIES_DESCRIPTION)) {
    asset_data->description = BLI_strdupn(value->data(), value->size());
  }
}
```
The above code takes a `DictionaryValue`, so it’s unknown what is expected in there until you read the function body. The caller of this function also gets a `const DictionaryValue &value` parameter, and at the top level it turns out to be the return value of a `Value::as_dictionary_value()` call.
That call is declared in the public API of that class. To actually understand what should be fed into it, you’d have to dive multiple private/static functions deep into the code. This makes it not only hard to use, but also hard to find what code needs to change if that JSON data ever changes format.
My Investigation
As you can see, with this “it works” approach little time is spent on making it actually clear what the code is doing, and what data it is operating on. This is also why I want to address this topic: people always have limited time, and always have a lot on their plate. I want to take this opportunity to adopt a way of working that makes it easy to get clear, unambiguous code.
One of the many standards for creating APIs over HTTP is OpenAPI. It is at the core of Flamenco, and used for all communication between Flamenco Manager, Worker, the Manager’s web frontend, and the Blender add-on. For me, the advantages are clear:
- Contract-first approach. This gives a single source of truth, with all the clarity that this brings. Especially when others are expected to also offer online asset libraries that Blender can interact with.
- Code generators are available for multiple languages. After writing the specification file, it’s easy to generate code, removing the need to write boilerplate all the time.
- The generated code mirrors the schema (see the sketch after this list). No more passing `dict[str, Any]` around in your Python code, but rather explicitly typed objects.
- Depending on the generator, the JSON returned by the server is validated against the schema. Even though extensions.blender.org is a Blender-managed server, 3rd parties can also create such servers, and this expands when we include online asset libraries in the mix.
- The generated Python client code could be published as a package on the Python Package Index, making it easy for others to interact with our OpenAPI services.
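To make the schema-mirroring and validation points concrete, below is roughly the kind of model code a Pydantic-based generator emits. This is a hand-written sketch assuming Pydantic v2’s `model_validate_json()` API, not actual generator output, and the schema is the invented one from the earlier example:

```python
from pydantic import BaseModel


# Hypothetical models mirroring an (invented) repository schema.
class BlocklistEntry(BaseModel):
    pkg_id: str
    reason: str


class PkgRepoData(BaseModel):
    version: str
    blocklist: list[BlocklistEntry]


# Parsing validates against the schema: malformed JSON raises a
# pydantic.ValidationError instead of silently passing dicts around.
body = '{"version": "v1", "blocklist": [{"pkg_id": "spam", "reason": "malware"}]}'
repo_data = PkgRepoData.model_validate_json(body)
print(repo_data.blocklist[0].reason)  # fully typed access, IDE-completable
```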
My Proposal
If we are to move forward with OpenAPI, my proposal would be:
- Generate Python code only. OpenAPI generators are meant to produce not just the data model but also the client itself, which means the C++ generators are tightly bound to specific HTTP libraries.
- Track the generated code in Git, just like hand-written code. That way Blender can be rebuilt without needing any OpenAPI tooling installed. These files are expected to change very little once development of the feature is done, anyway. This also makes it simple to see the effects on the code when our OpenAPI specification file changes, and IDEs can do code completion as normal, making the surrounding code easier to write.
Tracking the generated code in Git has worked quite well for Flamenco.
Open Questions
Now to the biggest open question: which OpenAPI generator would we use?
Of the three OpenAPI code generators I’ve investigated, two (OpenAPITools/openapi-generator and MarcoMuellner/openapi-python-generator) use Pydantic (for validation, type annotation checking/handling, etc.). This is a very common library in the Python ecosystem, so it’s not really a surprise that they lean on it. At its core there is a compiled module, however, and that is always something hairy to deal with. There are pre-built wheel files for all our supported platforms, though.
The third code generator (openapi-generators/openapi-python-client) relies on the `attrs` package for that, which is pure Python. Their repository seems active, but the maintainer is still calling out for help, they document the project as a “work in progress” that does “not support all of OpenAPI”, and some fairly trivial bugs are still open. So I don’t know how much we want to rely on it.
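For comparison, attrs-based models look roughly like the sketch below. This is again hand-written for illustration with the same invented field names, not actual openapi-python-client output. Note that `attrs` itself doesn’t type-check at runtime by default, so validation has to live in generated parsing code:

```python
from typing import Any

from attrs import define


# Hand-written sketch in the attrs style; field names invented as before.
@define
class BlocklistEntry:
    pkg_id: str
    reason: str

    @classmethod
    def from_dict(cls, src: dict[str, Any]) -> "BlocklistEntry":
        # attrs doesn't validate types at runtime by default, so
        # conversion and checking live in generated code like this.
        return cls(pkg_id=str(src["pkg_id"]), reason=str(src["reason"]))
```

The upside is that this stays pure Python; the downside is that correctness depends entirely on the quality of that generated parsing code.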
So that’s my question for my fellow developers, mostly the platform maintainers: what’s your view on adding more Python packages, one of which is a binary one?
And the second question: is OpenAPI really the thing to use here? Or are there better solutions? I’ve also looked at Protobuf (#129626), which we could use from Python or directly from C++ once files are downloaded. But having a binary format “on the wire” also has downsides (mostly: it’s not human-writable or human-verifiable), and of course the Python library for that is also binary.