OpenAPI for services (Extensions & Online Asset Libraries)

Hello folks,

TL;DR: I think OpenAPI might be nice for Blender to use, but it likely means a binary Python package that needs bundling with Blender.

Feel free to scroll down to the “Open Questions” section below if you just want to know the questions I have & join in the discussion. The rest of this post illustrates the issues I’ve seen Blender face when dealing with JSON APIs, and my ideas about why OpenAPI would be nice.

Recently I started to look into the technical side of the Online Assets project. For that I’ve also investigated a bit how the Extensions system deals with communication between Blender and the extensions server.

In short, the JSON data is parsed “by hand” in Python code. There is no schema file, and the little code that validates responses from the server is all hand-written. The same goes for the JSON-handling C++ code (in the Asset Browser).

This approach allows for quickly building something that “works”, but it makes it hard to then lift that to something that’s easy to understand and maintain. In my experience this happens a lot when JSON data is parsed: typically the code ends up shoveling dictionaries from function to function, and it’s unknown (or at least hard to figure out) what is actually in there.

Concrete Examples

Here are some examples of JSON parsing in Blender. I don’t want to shame anyone, all the code was written by people who had limited time and a lot on their plate.

def pkg_repo_data_from_json_or_error(json_data: dict[str, Any]) -> PkgRepoData | str:
    # ... snipped for brevity ...

    if not isinstance((blocklist := json_data.get("blocklist", [])), list):
        return "expected \"blocklist\" to be a list"
    for item in blocklist:
        if not isinstance(item, dict):
            return "expected \"blocklist\" to contain dictionary items"

    # ... snipped for brevity ...

    return PkgRepoData(version=version, blocklist=blocklist, data=data)

Although the above code does try to put some semantics in the PkgRepoData class, the blocklist and data fields are still just dictionaries. And this is actually where 99% of the information in this particular JSON document is stored.
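For contrast, this is roughly what a fully typed model could look like (a hand-written sketch using plain dataclasses; the field names are invented for illustration and are not Blender’s actual schema):

```python
from dataclasses import dataclass, field

# Hypothetical sketch: field names are invented for illustration,
# they are not the actual extensions-repository schema.
@dataclass
class BlocklistEntry:
    package_id: str
    reason: str = ""

@dataclass
class PkgRepoData:
    version: str
    blocklist: list[BlocklistEntry] = field(default_factory=list)
```

With something like this, any function receiving a `PkgRepoData` shows in its signature exactly what data it operates on, instead of shoveling `dict[str, Any]` around.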

static void init_indexer_entry_from_value(FileIndexerEntry &indexer_entry,
                                          const DictionaryValue &entry)
{
  // ... snipped for brevity ...
  if (const std::optional<StringRef> value = entry.lookup_str(ATTRIBUTE_ENTRIES_DESCRIPTION)) {
    asset_data->description = BLI_strdupn(value->data(), value->size());
  }
}

The above code takes a DictionaryValue, so it’s unknown what is expected in there until you read the function body. The caller of this function also gets a const DictionaryValue &value parameter, and at the top level it turns out to be the return value of a Value::as_dictionary_value() call.

That value is declared in the public API of that class. To actually understand what should be fed into it, you’d have to dive multiple private/static functions deep into the code. This makes it not only hard to use, but also hard to find what code needs to change if that JSON data ever changes format.

My Investigation

As you can see, with this “it works” approach little time is spent on making it actually clear what the code is doing, and what data it is operating on. This is also why I want to address this topic: people always have limited time, and always have a lot on their plate. I want to take this opportunity to adopt a way of working that makes it easy to get clear, unambiguous code.

One of the many standards for creating APIs over HTTP is OpenAPI. It is at the core of Flamenco, and used for all communication between Flamenco Manager, Worker, the Manager’s web frontend, and the Blender add-on. For me, the advantages are clear:

  1. Contract-first approach. This gives a single source of truth, with all the clarity that this brings. Especially when others are expected to also offer online asset libraries that Blender can interact with.
  2. Code generators available for multiple languages. After writing the specification file, it’s easy to generate code, removing the need to write boilerplate code all the time.
  3. The generated code mirrors the schema. No more passing dict[str, Any] in your Python code, but rather explicitly typed objects.
  4. Depending on the generator, the JSON returned by the server is validated against the schema. Even though extensions.blender.org is a Blender-managed server, 3rd parties can also create such servers, and this expands when we include online asset libraries in the mix.
  5. The generated Python client code could be published as package on the Python Package Index, making it easy for others to interact with our OpenAPI services.
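To make points 3 and 4 concrete, a Pydantic-based generator would emit models roughly along these lines (a hand-written sketch, not actual generator output; class and field names are invented):

```python
from pydantic import BaseModel, ValidationError

# Sketch of what a Pydantic-backed generator might emit; the real output
# depends on the spec file and the chosen generator.
class BlocklistEntry(BaseModel):
    package_id: str
    reason: str = ""

class PkgRepoData(BaseModel):
    version: str
    blocklist: list[BlocklistEntry] = []

# Validation happens on construction: a malformed server response raises
# a descriptive ValidationError instead of surfacing as a KeyError later.
try:
    PkgRepoData.model_validate({"version": "v1", "blocklist": "oops"})
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} validation error(s)")
```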

My Proposal

If we are to move forward with OpenAPI, my proposal would be:

  1. Generate Python code only. Because OpenAPI is meant to also generate the client, and not just the data model, generators for C++ are tightly bound to specific HTTP libraries.
  2. Track the generated code in Git, just like hand-written code. That way Blender can be rebuilt without needing any OpenAPI tooling installed. These files are expected to change very little once development of the feature is done, anyway. This also makes it simple to see the effects on the code when our OpenAPI specification file changes, and IDEs can do code completion as normal, making the surrounding code easier to write.

Tracking the generated code in Git has worked quite well for Flamenco.

Open Questions

Now to the biggest open question: which OpenAPI generator would we use?

Of the three OpenAPI code generators I’ve investigated, two (OpenAPITools/openapi-generator, MarcoMuellner/openapi-python-generator) use Pydantic (for validation, type annotation checking/handling, etc.). This is a very common library in the Python ecosystem, so it’s not really a surprise that they are leaning on it. At its core there’s a compiled module, and that is always something hairy to deal with. There are pre-built wheel files for all our supported platforms, though.

The third code generator (openapi-generators/openapi-python-client) relies on the attrs package for that, which is pure Python. Their repository seems active, but the maintainer is still calling out for help; they describe the project as “work in progress” and “not supporting all of OpenAPI”, and some fairly trivial bugs are still open. So I don’t know how much we want to rely on it.

So that’s my question for my fellow developers, mostly the platform maintainers: what’s your view on adding more Python packages, one of which is a binary one?

And the second question: is OpenAPI really the thing to use here? Or are there better solutions? I’ve also looked at Protobuf (#129626), which we could use from Python or directly from C++ once files are downloaded. But having a binary format “on the wire” also has downsides (mostly: not human-writable or human-verifiable), and of course the Python library for that is also binary.


For a different kind of project I’ve ended up using OpenAPI Generator through its Docker container to generate the end-user SDK (Python, JS, Go, etc…)

hope it helps.
L.

Thanks for the idea. I’m not too worried about the code generating part, that’s done by only a few developers who actually choose to work on this project. My concern is mostly about the Python runtime dependency (Pydantic), which will always be needed by Blender if we are to work with the generated code.

Those come as part of the (Python) module dependencies (from my pyproject.toml):

[tool.poetry.dependencies]
python = "^3.8"
urllib3 = ">= 1.25.3, < 3.0.0"
python-dateutil = ">= 2.8.2"
pydantic = ">= 2"
typing-extensions = ">= 4.7.1"

so they will get installed when you install the SDK Python module.

One suggestion: remember to set the operationId for each method of the schema, so the API will come out clean and readable :wink:

Thanks for the detailed description!

I have a few thoughts after reading it. There is no specific order to them, and they might or might not be related. Here we go :slight_smile:

I agree that having a clear schema is very helpful. But then there is the question about the tool: it needs to offload burden, not add it. If it is hard to get things set up and running, then it’s probably not unfair to ask: is it the right tool we’re choosing? And we shouldn’t forget that it should also be easy for people to maintain and build on top of; otherwise any tool will just fade away. Basically, think of it this way: is it something developers will keep using naturally once you’re no longer paying 100% attention, after the end of the project?

Another question is about this binary Python package. Is it something that is needed at runtime, or only at build time? It is not that intuitive to me that you have something code-generated, but still need a binary package (or any other library, for that matter) to use the generated code.

How big is the package anyway? :slight_smile:

This and similar tools (like TypeAPI, and also the things you’ve mentioned in the post about C/C++ and OpenAPI) all seem to generate a client that takes care of the client-server communication. In a way it seems odd. Regression testing, dependency injection, and mock setups all seem rather hard to hook up. Is it just a quirk of the designs, or are we missing something? Somehow it feels it should be a separate layer.

Maybe it’s worth mentioning why plain JSON validation (instead of the full REST API generator) is not considered? Maybe it is a better compromise, as it solves some major issues of type-less, schema-less JSON, while not leading to a “bloated” client or development environment?
I’d imagine one point against it would be that validation would then need to be implemented on both sides (client and server).

Who is the source of truth for the OpenAPI schema in your vision? Is it in the server-side repo, or in blender.git? Maybe it can be on the server side, and then a lot of things are easier on the client: we just copy some generated Python files and are good?

I agree. My hope is still that we can use the MarcoMuellner generator, as that would avoid having to install a JRE to run the generator. Among other things, that depends on their answer to my question on their tracker.

Yes, it’s needed at runtime. It’s used to do the validation of the received JSON, and to convert the JSON-dict into explicit Python classes.

How big is the package anyway? :slight_smile:

On Linux, it adds ~11 MB, on macOS/arm64 ~12 MB.

This is by design. OpenAPI is meant to generate the client & the server stubs as well as the data model. In the Python client code there is a decent separation between the code that handles the HTTP stuff and the data model. I’ve already used the generated code to construct JSON that’s saved to a file on disk, and then loaded from that file & converted into Python classes again. No HTTP necessary, but still the advantage of clear code, JSON validation, etc.
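As an illustration of that file round trip (a sketch assuming a Pydantic-backed generator; `Asset` is a made-up model, not actual generated code):

```python
import tempfile
from pathlib import Path

from pydantic import BaseModel

# Made-up model standing in for a generated one.
class Asset(BaseModel):
    name: str
    tags: list[str] = []

asset = Asset(name="Chair", tags=["furniture"])

# Serialize to a JSON file on disk, no HTTP involved.
path = Path(tempfile.mkdtemp()) / "asset.json"
path.write_text(asset.model_dump_json())

# Load the file and validate it back into a typed object.
loaded = Asset.model_validate_json(path.read_text())
assert loaded == asset
```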

JSON validation is part of OpenAPI. Fortunately, OpenAPI actually builds on JSON-Schema, a standard for validating JSON. One advantage of OpenAPI is that it also generates the code for us, reducing our dependency on hand-writing (and maintaining) more boilerplate.

The source of truth is the OpenAPI spec file: a YAML file that (in my vision) is hand-crafted. It’s what gets fed into the code generators.
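For reference, such a hand-crafted spec file could start out like this (a minimal, hypothetical fragment; the paths and field names are invented for illustration):

```yaml
openapi: 3.0.3
info:
  title: Online Asset Library API
  version: 0.1.0
paths:
  /v1/assets:
    get:
      operationId: listAssets  # explicit operationId keeps generated method names clean
      responses:
        '200':
          description: List of available assets
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/Asset'
components:
  schemas:
    Asset:
      type: object
      required: [name]
      properties:
        name:
          type: string
        tags:
          type: array
          items:
            type: string
```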

We could generate the Python client code in a project separated from Blender. It can then even be published on the Python Package Index and/or bundled with Blender as a wheel. Not sure if that’s the way we want to go, though. It wouldn’t solve the runtime dependencies, as that code would still depend on Pydantic.

How many network requests are we actually looking at, and how will those numbers scale when folks try to access absurdly large libraries? There’s a theoretical threshold here where the performance benefit of pydantic handling its core logic in Rust is worth the package size, but it isn’t clear when/where that threshold would be hit.

Overall, I’d say that reaching out to the folks at Ynput would be the most productive thing to do. They’re in the best position to comment on request validation at-scale in python for VFX-specific workflows.

This API is not meant for the huge asset libraries that some websites offer. It is meant for smaller libraries that can be served simply as static files.

Oh man, that’s like orders of magnitude more than what was needed to fly to the moon!
Is it before xz compression or after?

It is weird that you need all this code to “just” validate the received JSON. On the other hand, it’s not THAT much compared to a lot of other things we’ve been adding to Blender.

If it allows us to have better-quality, easier-to-develop online aspects of Blender, it’s probably worth it.

But still asking myself whyyy it’s so big :slight_smile:

Sure, otherwise we’d not consider it, right :wink:

What I meant is that maybe, with a reduced set of requirements for the parametric search of libraries, there are some other alternatives we can look into.

Like, if we say: validation is essential, HTTP codegen not so much, then maybe there is a library that we can just add without too much discussion because it’ll be so small.
The point about generating boilerplate for converting JSON to real objects is valid, but maybe we can do it in less than 11 megabytes of code, because how hard could it be to just assign some properties and arrays once the schema is validated?
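As a thought experiment along those lines, a minimal “validate, then assign” layer built on the jsonschema package might look like this (a hypothetical sketch, not a proposal for the actual implementation; jsonschema’s own dependency footprint would need checking too):

```python
from dataclasses import dataclass, field
from typing import Any

import jsonschema  # JSON-Schema validator; assumed available for this sketch

# Hand-written schema; in practice it could be derived from the OpenAPI file.
REPO_SCHEMA = {
    "type": "object",
    "required": ["version"],
    "properties": {
        "version": {"type": "string"},
        "blocklist": {"type": "array", "items": {"type": "string"}},
    },
}

@dataclass
class RepoData:
    version: str
    blocklist: list[str] = field(default_factory=list)

def repo_data_from_json(data: dict[str, Any]) -> RepoData:
    # Validate once up front; after that, plain attribute assignment is safe.
    jsonschema.validate(instance=data, schema=REPO_SCHEMA)
    return RepoData(version=data["version"], blocklist=data.get("blocklist", []))
```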

If dependencies are needed at runtime, then indeed it is not SO relevant where the YAML file is stored.

Pydantic is not that big, though: the wheel is 456 KiB, unpacked about 1.6 MiB.
What do those 11 MiB consist of?

Treat “absurdly large” as a stand-in for “the largest asset library you’ll ever use in performance benchmarks.” My question was about how the number of requests per second scales as the size of a library increases — are you dealing with bursts of requests, or sustained request rates?