Developer Discussion: Handling Errors

In the past few days, the topic of handling errors came up. It seemed like there was interest from multiple developers to have a discussion, so here is a thread!

The Error Problem

Currently in Blender, errors are handled in multiple ways:

  • Returning an “empty value” (e.g. nullptr, -1, std::nullopt) for all sorts of errors.
  • Returning a special enum that encodes the result (e.g. eV3DProjStatus) or even a special value (e.g. #define IS_CLIPPED 12000)
  • Using BLI_assert to “catch” errors

We do have ways of reporting errors to the user (e.g. with the BKE_report systems) but this system is often used in a way that makes the error not very useful. Here is a classic example (can be found everywhere):

if (something == nullptr) {
   BKE_report(reports, RPT_ERROR, "<something> not found");
}

or

if (try_do_something() == false) {
   BKE_report(reports, RPT_ERROR, "Could not <do something>!");
}

If a developer wanted to report an error with a bit more information, they likely would have to resort to implementing a custom enum or something of that sort.

With most of Blender using C++ now, surely we can do better than this!

An Error Type

This is my personal proposal, but I’m sure other developer will have other ideas and I welcome them to post theirs in this thread!

I think most of us agree that we want some type that bundles a value_type and an error_type together. Here is one idea that I saw other projects use:

template<typename T, typename E>
class [[nodiscard]] ErrorOr {
   ...
 private:
   std::variant<T, E> value_or_error_;
};

And here is some pseudocode on how you’d use it:

ErrorOr<MyData> do_something_that_can_fail() {
   MyData data;
   ...
   if (we_failed) {
      return Error::from_string("Oh no we failed!");
   }
   ...
   return data;
}

static void here_we_go() {
   ...
   ErrorOr<MyData> result = do_something_that_can_fail();
   if (result.is_error()) {
      BKE_report(reports, RPT_ERROR, result.error().message());
      return;
   }
   MyData data = result.value();
   ...
}

Now this pattern of checking for is_error() then getting the value is something that would be done a lot. Here is one way to make it (maybe) better:

Macros (I know what you’re thinking)

You can use a macro like this one to make things a bit nicer. Obviously this has pros and cons. with the example above you could use it like this:

ErrorOr<void> do_some_computation() {
   ...
   MyData data = TRY(do_something_that_can_fail());
   ...
}

So the TRY would give you the value if it’s not an error, otherwise it returns (from do_some_computation !) the error. You can imagine doing similar things, like asserting if there is an error, etc.

Now this would be straightforward to use in new code, but what if you’re dealing with old code that can’t handle errors yet (e.g. when it would be a pain to refactor the return type to ErrorOr) ?

Not Handling the Error

We could imagine having a function on the error type that allows us to explicitly ignore errors, when we can’t (yet) handle them.

Something like

...
MyData result = do_something_that_can_fail().release_value_and_ignore_errors();
...

Ideally this function would be easily searchable so that the code can be found and refactored at some point.

That’s it! Let me know your thoughts!

1 Like

We can indeed improve the error handing a lot!

One thing I’d separate is the error handling and error reporting. I think those are separate topics:

  • How function can return value, or clearly indicate that value could not be calculated because of reasons
  • How interface-facing code could report the internal error in the interface

The ErrorOr<MyData> is something close to what I believe is the good way forward. It is very similar to Abseil’s StatusOr<T>, and to Pigweed’s pw::Result<T>.

The comments I have about ErrorOr:

  • It sounds quite negative. Why Error is the first thing it mentions, making it sound that we actually expect thing to fail most of the time.
  • Often it is not really an error, but more of a status. Like, “hey, you reached end of file”, or “currently no data for you, but come back later, it might become available”.
2 Likes

I am in favour of the used pattern. It makes the error handling explicit. Is also close to how rust-lang deals with the issue. Here the object is Result<Value, Error> The error is often just an enum and in our case we could use it as it gives more information during debugging. In release builds we might want to compile it away.

1 Like

Not clear whether you want to open the exceptions can of worms here, but there is also the question about what to do about very rare but possible errors that we typically don’t check for (at least not consistently).
E.g., memory allocation fails
or more realistic scenario: some user tries to create more than 31 bits worth of some element and some int usage causes things to fail silently and catastrophically.

@filedescriptor’s proposal already covers a lot of what my proposal was going to.

I am very much in support of an ErrorOr-like approach to error handling in Blender, where errors are explicit return values. It’s the same approach as Rust, which has an equivalent Result type.

I’ve had a lot of experience with this error handling approach over the last 8-9 years coding in Rust. In my opinion and experience, it’s far more robust and easy to reason about than something like exceptions. It does introduce some additional boiler plate, but a lot of that can be mitigated, and it ends up being worth it.

With that said, Result is not the entirety of Rust’s error handling story. And I think there are some additional things that would be really good for us to steal/copy from that. Specifically, Rust has a double-pronged approach to error handling that I think is important, because ErrorOr/Result is not appropriate for all kinds of errors.

Two Different Kinds of Errors

Rust makes a distinction between two kinds of errors:

  • Expected errors. These are errors that occur as a normal part of program execution. For example, the user may try to open a file of the wrong file type, or that is corrupted. Or the user might try to insert a key on something that’s not animatable. These are expected cases that need to be handled.
  • Unexpected errors. These are violations of assumptions/invariants in the code, and thus don’t have a reasonable path for recovery or handling. For example, a function that assumes that the list given to it is never empty, and thus has no code path to handle that case, but which is then given an empty list.

Importantly, unexpected errors almost always indicate a bug somewhere in the code. In the empty list example, it might be that the function shouldn’t assume non-empty lists, or it might be that the calling code misunderstood the semantics of the function and is using it incorrectly. Or it might be something else. But reaching that state means there’s a bug somewhere, and normal operation of the software can’t be guaranteed to continue correctly.

Handling Unexpected Errors

Rust takes the stance that unexpected errors should simply abort the program.

This might sound extreme, but the rationale is that if you reach a truly unexpected state, it’s hard to know what will happen. You might corrupt user data, for example, which could actually be worse than losing the user’s work since the last save.

So for these kinds of errors, programmers are expected to use asserts. To use the non-empty-list example again: if a function really believes it will never be passed an empty list, and is therefore ignoring that case (and hopefully documents that fact in its doc comment), it should assert that the list is non-empty.

Returning something like a Result for these cases is not only overkill, and not only can it make code unnecessarily verbose, but it is also misleading because it gives the impression that such an error could/should be a normal part of correct program execution.

Blender already has BLI_assert for this kind of error, and I think it makes sense to continue using it for that. Rust also makes a distinction between normal asserts (which trigger in all builds, and represent critical invariants) and debug asserts (which only trigger in debug builds) that might be useful to adopt.

Handling Expected Errors

@filedescriptor already covered this well with his description of ErrorOr. Rust has an equivalent type Result<T, E>.

Just to resummarize: Result essentially amounts to a tagged union, and can be either T (the normal return type) or E (the error type), but never both. (Rust calls these “ok” and “error”, respectively.) This forces calling code to handle the possibility of an error if it wants access to the return value. Which is a really good thing.

Rust has a lot of nice features that make working with Result ergonomic. I strongly suspect not all of these can be reproduced in C++. But I think we can get at least some of them. The TRY macro that @filedescriptor highlighted is a good example (in Rust it’s the ? operator).

Turning Expected Errors Into Unexpected Errors

Higher-level code very often benefits from making assumptions that lower-level code doesn’t or can’t.

For example, say we have the following “lower level” function:

Result<int, ParseError> parse_unsigned_int(char *text) {
  // Try to parse the string into an integer.
}

The higher-level code that calls this may already know that the string being passed is a valid unsigned integer string, due to the additional context and knowledge it has at the higher level.

In these cases, the higher-level code essentially wants to “convert” the expected error of the lower level function into an unexpected error, because it knows that in this context it is indeed unexpected.

This can be done fairly straightforwardly with an assert:

auto int_result = parse_unsigned_int("123456");
BLI_assert(int_result.is_ok());
int my_int = int_result.get_ok();

However, this is pretty verbose if you’re doing it often. It also can’t be inlined into expressions, which forces the creation of temporary variables that clutter things up.

Rust provides various convenience methods on Result to handle cases like this. Here are some examples:

// Extracts the inner `T` value if ok, otherwise aborts.
// This essentially turns an otherwise expected error into an
// unexpected error.  It makes a lot of sense in cases where
// the context should ensure that the error case never happens.
int my_int = int_result.unwrap();

// Same as `unwrap()` above, except that no check is done for
// the error case at all, and thus is undefined behavior if
// it's an error.  This can make sense in performance-sensitive
// code.
int my_int = int_result.unwrap_unchecked();

// Extracts the inner `T` value if ok, otherwise returns a default
// of twelve.
int my_int = int_result.unwrap_or(12);

// The converse of `unwrap()`: extracts the inner error `E` if
// it's an error, otherwise aborts.
int my_int = int_result.unwrap_err();

In general, I recommend stealing a lot of things from Result’s API.

What If You Don’t Need a Value, Only An Error?

There are situations where you may want to indicate a possible error, but there is no “ok” value to return because conceptually the function has no return value.

In Rust this is addressed by the unit type (), which is simply a memberless tuple. Functions that might return an error but have no ok value to return can return Result<(), ErrorType>.

It might seem odd to use Result when there is no success value. Why not just return ErrorType directly, then?

The main reason is to allow uniform handling of errors. For example, Result has a lot of useful helper methods like unwrap() and is_ok() that can conveniently assert invariants, check success, etc. And things like a TRY macro will still work to propagate errors up the chain.

C++ can accomplish something like Rust’s unit type with a memberless struct. We could either provide one specifically for this purpose, or use std::monostate from C++17.

2 Likes

Yes I agree that error handling and error reporting are different discussions. This is meant to be only about the error handling part. And hopefully it would be a solution that could be used across all parts of blender :slight_smile:

Right, maybe ErrorOr is not the best name choice. The one thing I like about it is that it reads very naturaly: ErrorOr<T>this function returns an error or a T value. This doesn’t work so well for e.g. Result<T>.

I guess we could go a similar route to std and name it Expect<T>. I.e. the result of this function is expected to be a T value, (but could be an error or something else).

Yeah, that’s a good point.

I don’t think “status” is really a great name either, though, because that sounds like information that might be passed along with a successful return value as well.

IMO Result is a good name. And maybe we could name the error case “failure” instead of error, since that still clearly communicates that the function didn’t succeed, but without necessarily being an error in the deeper sense.

But I think this is kind of bike shedding. We do need to decide on names at some point, of course, but I think we first need to get aligned on the approach.

that sounds like information that might be passed along with a successful return value as well.

While it is not that much applicable in Blender, you can imagine that you want both status and data. Like, you get a message from a wire, but it fails a checksum check. It might be useful to return both message and status saying “hey look, we got something, but it might be a white noise, or could be a real thing with some minor data loss, we don’t know, do whatever with it”.

That being said, I do personally prefer name Result.

For the practical moving forward, I think it is more important to agree on the mechanism, and requirements, and figure out exact naming after that.

It does seem that generally we agree on the Result-type of approach. Which is good. What I am not sure yet is what would be the good definition of Error, on a conceptual level. Is it an error code? Is it an error code and some string message? That sort of things.

While it is not that much applicable in Blender, you can imagine that you want both status and data.

Absolutely. In Rust, at least, when there’s a failure but still questionable data that might be worth passing on, that would probably get passed as side data in the error/failure value.

What I am not sure yet is what would be the good definition of Error, on a conceptual level.

In Rust it’s arbitrary, which is why it’s a type parameter. Although in practice it’s usually an enum, possibly with side data. And ideally all error types are supposed to implement the Error trait, which provides methods for producing a printable string that describes the error, among other things.

I think it could make sense to do something similar, where we have an abstract base class that all error types are expected to derive from, which provides a base level of common functionality. The only annoying thing is that (as far as I know) you can’t do class inheritance for enums, so that would add annoying boiler plate when all the error otherwise needs to be is an enum. (In Rust you can implement traits on anything, including enums, so that’s not an issue.)

Conversely, I think it’s specifically not a good idea to enforce that all errors are literally the same type throughout the entire code base. Different parts of the code base likely need different error types with different error codes and different side data.

Right, I think this is the sensible thing to do. To me, this base class Error should be as small as possible. Custom errors can always tack on their own data if they want to. Maybe the only required data in an error is a std::string message_, so that it’s always possible to report the error. But I could also see an approach that would only store an int error_code_ for example.

Agreed. In fact, I think it’s best if it has no data members at all, and only defines methods.

For example:

Maybe the only required data in an error is a std::string message_, so that it’s always possible to report the error.

If the error string is a public data member, that would necessitate building it up front even if it never gets used. For errors in lower-level components that are called in tight loops, that could be unacceptably detrimental to performance, and would (justifiably) push people to use other error handling methods in those areas.

Whereas a method that returns an error string allows the string to be built as late as possible, on demand when actually needed. And then even lower-level, performance-critical components can use the same error handling approach as everywhere else.

Error types that want to could still build and store the error string in advanced. But it wouldn’t be required.

1 Like

Right, I think that we should have some of these error types built-in then, so that you don’t have to define a custom class just to get an error that has a message.

I think we might be talking about different things, then. (Edit: nevermind, my brain is foggy and I think I misinterpreted you. In any case, below is a rough proposal.)

I’m talking about an abstract base class that defines a minimal common interface for all (or at least most) error types. It might look something like this:

class Error {
public:
    // Returns a human-readable error message.
    virtual std::string error_message() const = 0;
}

It specifically doesn’t have data members, because it should be implementable by any and all concrete error types.

A concrete error might then look something like this:

class FileReadError: Error {
public:
    enum class Reason {
        FileDoesNotExist,
        PermissionDenied,
        WrongFileFormat,
        // Etc.
    };

    Reason reason;

    std::string error_message() const {
        switch reason {
            case Reason::DoesNotExist:
                return "File does not exist."
            case Reason::PermissionDenied:
                return "File cannot be read due to lack of permission."
            case Reason::WrongFileFormat:
                return "File has the wrong file format."
            // Etc.
        }
    }
}

(A real file IO error would likely store things like the file path as well, of course.)

I think that we should have some of these error types built-in then

We could have some kind of “general” error type, that only contains a message string, and no programmatically meaningful data:

class GeneralError: Error {
public:
    std::string message;

    GeneralError(std::string message): message(message) {}

    // Constructor that can take anything that inherits from Error, to
    // make it easy to propagate any error type.
    GeneralError(Error error): message(error.error_message()) {}

    std::string error_message() const {
        return message;
    }
}

But in general, my experience with this style of error handling is that you actually do want to create custom error types for different parts of a code base. And usually you want the core of those error types to be an enum, so that it’s clear what the different error cases are and it’s easy to branch on them.

I think if you want even very low level code uses the same error reporting mechanism, it is important that there is no virtual method. Having that makes the error type at least 8 bytes large (for the vtable) and copying it also becomes more tricky. Having a virtual method there (even if it is never called) likely has more overhead than using the Error trait in Rust.

I think something as simple as the following could also just work.

class FileReadError {
 public:
  enum class Reason {
    FileDoesNotExist,
    PermissionDenied,
    WrongFileFormat,
  };

  Reason reason;

  std::string to_string() const;
};

The name to_string is just used by convention (this convention is semantically similar to a trait). It allows error handling code to look the same everywhere and is easy to use in a template.

We can of course also provide a type-erased AnyError type that uses a virtual to_string method.

2 Likes

How would this proposed error type handle wrapping?

To throw another language in the mix of comparisons, Go has the concept of wrapping errors, where you can query “is this error of that type?” in a way that understands wrapping. So an end-of-file error can be wrapped to include more detailed information, and the code handling the error can call errors.Is(err, ErrEndOfFile) to understand that this is an EOF situation. Or errors.As(err, ErrEndOfFile) to get the actual wrapped error of that type.

This helps to keep error-returning and error-handling code decoupled, where the concrete error type doesn’t matter that much; errors can be handled based on what they wrap, rather than get very specific.

Unless all possible error types are known (which kind of defeats the purpose of custom error/result types), C++ discourages this. And I would argue for good reason.

If a function should return some MyCustomError result then I think it’s best that any function that handles this error also gets a MyCustomError from said function. Otherwise we’re basically checking if the type of an error when we should have known that type in the first place.

C++ allows you to try and dynamic_cast one thing to another, but honestly, it often results in over-engineered code. I’d avoid doing that.

Note that the AnyError type I mentioned previously could be implemented in a way that makes it possible to extract the original error from it again. This has some overhead that should be avoided in low-level code, but for higher level code that seems reasonable.

1 Like

I’m not arguing that we should be all-in with the Go approach and only return the one error type. What I’m asking about is what happens in this case:

  • A lower-level function returns a known error, for example a function operating on a file, returns a FileNotFound error. Since the caller knows which path the function was operating on, that path doesn’t need to be inlcuded in the error.
  • A higher-level function loops over several paths, calls the low-level function, and aborts whenever there was an error. Because here the caller needs to know which of the paths had the error, the FileNotFound error can be wrapped in another error type that does include this info.

In Go it would be possible to call errors.Is(combined_error, FileNotFound) to see if the wrapped error is of the type FileNotFound, or whether it was some other error. There’s also errors.As(combined_error, FileNotFound) which would actually return the low-level error object.

In order to check which kind of error was returned, you’d always use errors.Is() or errors.As(). This makes it possible to, at a later time, change the code to return a different kind of error, without having to change every place where the error is inspected. A concrete example is the one above, where in the initial version the FileNotFound error could have been returned, and later the error type that carries more information. Basically it makes growth / refactors easier.

Yeah I agree, let’s not go that route.

Oh, damn, right. C++. Thanks for catching that.

Having a virtual method there (even if it is never called) likely has more overhead than using the Error trait in Rust.

Yeah, Rust traits don’t embed anything into the types that implement them, unlike C++'s vtables.

I agree, in that case just going by convention probably makes the most sense here. It’s unfortunate, though.