Exception: Access Violation- where to get more information?

If you’ve done a lot of addon development or scripting in Blender, there is a very high probability that you’ve seen Blender crash to the desktop without any warning or error at all. If you’re running with a terminal open, you probably also know that this is almost always because of an Access Violation crash.

Hopefully, you know right away what caused it so you can go and fix it. But if you’re like me, you work in a large studio with a lot of people writing a lot of different scripts, so you probably have non-technical artists coming and saying “Hey, Blender just crashed, I wasn’t doing anything in particular”. I’ve been able to repro the crash on their machine just by doing random things, but I haven’t been able to narrow down where it’s coming from or why it’s happening. I’ve tried running with the --debug-all flag and the output I got wasn’t super useful. It tells me there’s an access violation, okay great- WHAT script or operator caused it? What data was being accessed that no longer existed? How are we supposed to debug these scripts in a large studio environment? These guys are running lots of different scripts and addons, and I don’t know if it’s HardOps causing it, something in-house and custom, or Blender itself. How do I get more information on what is causing a crash?

You can start by building a debug build of Blender yourself. If you also enable ASAN (address sanitizer), you get a much better idea of exactly where in the C code the crash happens.
To figure out which line of Python caused it, you can either attach a Python debugger to Blender, or just insert some print statements to narrow down the bug.

That’s certainly an option on my machine- but I suppose there are no options for when a random artist at my studio asks me to come to his desk and track down a problem? I don’t think I’ll be able to convince all of our artists to pull down the source from git and do a debug build, and replicating each artist’s individual setup so I can debug it on my machine is going to prove difficult as well. If these are the only options then that’s what I’ll have to do I suppose.

That said- I feel like this is a solvable problem with slightly more verbose logging when the --debug-all flag is enabled. It dumps out a lot of information but it’s still missing things. For example, here is one crash I was able to capture with the --debug-all flag on:

wm_event_do_handlers: Handling event
wmEvent type:224 / DEL, val:1 / PRESS,
shift:0, ctrl:0, alt:0, oskey:0, keymodifier:0,
mouse:(851,538), ascii:' ', utf8:'', keymap_idname:(null), pointer:0000021758D841F8
graph_id_tag_update: id=MECylinder.002 flags=GEOMETRY source=USER_EDIT
graph_id_tag_update: id=MECylinder.002 flags=GEOMETRY source=USER_EDIT
graph_id_tag_update: id=MECylinder.002 flags=SELECT source=USER_EDIT
graph_id_tag_update: id=MECylinder.002 flags=SELECT source=USER_EDIT
[SCScene :: View Layer]: Operation is entry point for update: GEOMETRY_EVAL()
[SCScene :: View Layer]: Operation is entry point for update: COPY_ON_WRITE()
[SCScene :: View Layer]: Operation is entry point for update: GEOMETRY_EVAL_DONE()
[SCScene :: View Layer]: Operation is entry point for update: GEOMETRY_SELECT_UPDATE()
[SCScene :: View Layer]: Accumulated recalc bits for OBCylinder.002: 8322
[SCScene :: View Layer]: Accumulated recalc bits for MECylinder.002: 8834
[SCScene :: View Layer]: deg_evaluate_copy_on_write on MECylinder.002 (0000021758EC70C8)
[SCScene :: View Layer]: deg_evaluate_copy_on_write on OBCylinder.002 (0000021758EE3278)
[SCScene :: View Layer]: BKE_object_data_select_update on MECylinder.002 (0000021758EC70C8)
[SCScene :: View Layer]: BKE_mesh_eval_geometry on MECylinder.002 (0000021758EC70C8)
[SCScene :: View Layer]: BKE_object_eval_uber_data on OBCylinder.002 (0000021758EE3278)
[SCScene :: View Layer]: BKE_object_handle_data_update on OBCylinder.002 (0000021758EE3278)
[SCScene :: View Layer]: BKE_object_select_update on OBCylinder.002 (0000021758EE3278)
[SCScene :: View Layer]: BKE_object_data_select_update on MECylinder.002 (0000021758EC70C8)
Depsgraph updated in 0.030844 seconds.
Error : EXCEPTION_ACCESS_VIOLATION
Address : 0x00007FF7E4DB3F35
Module : E:\blender-2.80\blender.exe

The events that preceded this were innocuous selection/orbit camera events. On this user’s machine, his delete key is bound to a context-sensitive delete operator which has been in use at our studio since the 2.78 days and has never failed before, so the final ‘event’ before the crash appears to be a red herring. What happened between that event and the crash? That’s the type of information I wish the debug output showed.

Figuring out what is wrong at an artist’s desk is not a great way to go about this. We’re working on adding automated crash reporting to Blender so you’ll have a nice report with a proper stack trace on a centralized server in case of crashes. It will not solve the issues, but you’ll have a better idea where the problems are without having to visit any desks and go ‘uhh yeah dunno…weird…right?’

Keep an eye on https://developer.blender.org/D3576

This is great to hear, thanks for the link!

I see that @dfelinto put it on the 2.82 project. Any idea how likely it is that we’ll see it in that release? I’m trying to give people at my studio a light at the end of the tunnel; there’s some growing resentment about Blender here due to crashes that we can’t track down, and I’d love to be able to stem the tide if possible.

Are you looking to report these crashes to us, or are you looking to run this internally for your studio and see where the pain points are?

It slipped to a later version due to time constraints on setting up the server side. The patch hasn’t cleared review yet, but I’m not expecting to have to change how it fundamentally works. If you are in a rush, you could apply it to your local build and just hook it up to sentry.io.

Both, ideally. If it’s a built-in Blender issue we’d love to be able to submit those reports, but what we would really love to know is: when a crash happens, what were the events that led to it, so we can tell whether it’s an addon or script that’s causing it? I think that’s the rub most people are feeling right now in our studio, and it’s a bit of a self-inflicted wound admittedly. There’s a lot of functionality Blender is missing that people transferring from Autodesk software rely on, so our tech artists are helping to bridge the gap by writing lots of scripts and operators.

Unfortunately we’re now at a point where there are so many custom operators and scripts floating around in our pipeline that it’s nearly impossible to narrow it down to what is causing a crash, and I have no doubt that most if not all of these crashes are being caused by one or more of these scripts.

Just as an example, by sheer luck I happened to stumble across a 100% repro case on a script just yesterday that could have been easy to find if I had a crash log callstack to work with.

Anecdotally- here’s a rough transcript from a conversation I just had with an artist:

Artist: Hey I just had another Blender crash
Me: okay, walk me through what you were doing when it crashed
Artist: I had this object selected, then I scrolled in the outliner down to this other object, then I selected this object and then clicked in the 3D viewport. Instant crash to desktop with no error.
Me: Okaaayyyyy… that shouldn’t happen. I have no idea why it did that. Let me know if it happens again.

crickets

We have no troubleshooting tools for this. Obviously clicking from outliner to viewport shouldn’t cause a crash; something else is going on, but where do we even start? There’s no clear repro for any of this stuff. It’s one thing if we can say ‘yeah, that’s a known bug’ or ‘disable that script, it’s problematic’- but right now all we can do is shrug. It really chips away at people’s interest in using Blender- we’ve had a few guys switch back to Maya “until Blender gets more stable”. It’s no use explaining that Blender itself is already fairly stable and it’s likely third-party code causing Blender to crash, because to them it’s all the same.

All of this you likely already know- I’m not saying it to whine but to emphasize the importance of what you’re doing/have done, and to hopefully shine some light on how unstable scripts and addons can make Blender appear. Any crash reporting solution should absolutely include as much data leading up to a crash as possible (e.g. operator was called via Python module at line X, etc.).
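To make the “operator was called via Python module at line X” idea concrete, here is a minimal sketch of a breadcrumb log that in-house scripts could write themselves until real crash reporting exists. Everything in it (`BREADCRUMB_PATH`, `write_breadcrumb`, `breadcrumb`) is a hypothetical name, and it deliberately uses no Blender API: each entry is flushed and synced to disk before the wrapped function runs, so the last line in the file should survive a hard crash.

```python
import functools
import os
import time

# Hypothetical log location; a real setup would pick a per-user, per-session path.
BREADCRUMB_PATH = os.path.join(os.path.expanduser("~"), "blender_breadcrumbs.log")


def write_breadcrumb(message, path=None):
    """Append one line and force it to disk so it survives a hard crash."""
    with open(path or BREADCRUMB_PATH, "a", encoding="utf-8") as log:
        log.write("%.3f %s\n" % (time.time(), message))
        log.flush()
        os.fsync(log.fileno())


def breadcrumb(path=None):
    """Decorator: record module, function and line number around each call."""
    def decorate(func):
        code = func.__code__
        label = "%s.%s (%s:%d)" % (
            func.__module__,
            func.__qualname__,
            os.path.basename(code.co_filename),
            code.co_firstlineno,
        )

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            write_breadcrumb("ENTER " + label, path)
            result = func(*args, **kwargs)
            # Only reached if the call returned normally.
            write_breadcrumb("EXIT  " + label, path)
            return result

        return wrapper

    return decorate
```

Decorating the entry points of custom operators would leave an ENTER line with no matching EXIT for whatever was in flight when Blender died. That only tells you what was running at the time, not necessarily what corrupted the data in the first place.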

Not to neg your positive can-fix attitude, but I don’t believe you’re giving debugging enough respect as an art form.

What does that even mean? Care to elaborate?

It means open up x64dbg and look at the memory address while the process is running and find out

Thanks for the suggestion, but it’s clear to me (especially from your edits) that you just skimmed the thread and haven’t bothered to gain the full context here.

No, I read it.
I’m telling you that you have to debug. Messages printed within 1 second of a crash don’t necessarily have anything to do with why it crashed.

I’ll give you a naive example, because I’m cool like that.
Imagine we have two different objects in our program, and together they share a pointer. At some point in a 60 minute time period, one of the objects calls delete/free on the pointer. Later, the other object tries to use it. It’s going to crash now. All the debug messages will be about the second object, but the second object wasn’t the one which deleted a pointer with references.
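Here is that scenario as a toy Python simulation. `FakeHeap` and `Owner` are made up for illustration, and a real native use-after-free would not raise a tidy exception like this; the point is only that the failure is reported at the second object’s access, not at the first object’s free.

```python
class FakeHeap:
    """Toy allocator: hands out integer handles and forgets them on free()."""

    def __init__(self):
        self._blocks = {}
        self._next_handle = 1

    def alloc(self, value):
        handle = self._next_handle
        self._next_handle += 1
        self._blocks[handle] = value
        return handle

    def free(self, handle):
        del self._blocks[handle]

    def read(self, handle):
        if handle not in self._blocks:
            # Stand-in for the access violation a native program would hit.
            raise RuntimeError("access violation: handle %d was freed" % handle)
        return self._blocks[handle]


class Owner:
    """Holds a handle it believes is still valid."""

    def __init__(self, name, heap, handle):
        self.name = name
        self.heap = heap
        self.handle = handle

    def release(self):
        self.heap.free(self.handle)

    def use(self):
        return self.heap.read(self.handle)


heap = FakeHeap()
shared = heap.alloc("mesh data")
first = Owner("first", heap, shared)
second = Owner("second", heap, shared)

first.release()       # the real bug happens here, silently
try:
    second.use()      # ...but the failure only shows up here, much later
except RuntimeError as error:
    report = str(error)
```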

Even once a crash is reproduced, it isn’t necessarily as simple as, “this undo action is why, so quit undoing”

Let’s not get too far off track here. Yes, debugging is still going to be needed. However, if you have a catalog of all the crashes your artists are experiencing, it’s easier to focus on the one that happens 10 times a day for them rather than sinking time into this one crash that happened to this one guy this one time.

That’s why having crash reporting is a valuable tool in your toolbox.

It won’t magically make debugging obsolete, nor will it make having repro cases obsolete; you’re just spending your time more effectively on the things that cause the most disturbances.
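As a rough sketch of what that catalog buys you: once every report carries some kind of signature (for example the top frame of the stack trace), ranking them is a few lines. The report format, the `signature` field and the sample data below are all made up for illustration and are not what any real crash server produces.

```python
from collections import Counter


def rank_crashes(reports):
    """Group reports by signature; return (signature, count), most frequent first."""
    counts = Counter(report["signature"] for report in reports)
    return counts.most_common()


# Made-up sample data, for illustration only.
sample_reports = [
    {"user": "artist_a", "signature": "signature_1"},
    {"user": "artist_b", "signature": "signature_2"},
    {"user": "artist_c", "signature": "signature_1"},
    {"user": "artist_a", "signature": "signature_1"},
]

ranking = rank_crashes(sample_reports)
```

The first entry in `ranking` is the crash worth looking at first.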

Crashpad integration is still on my todo-list, of course, and indeed it is scheduled for 2.82 since we otherwise won’t have enough time to test the entire pipeline.

Anyway, while we don’t have that possibility right now, you should be able to use a debugger and attach it to the running process before it crashes.

If the artist is in-house I would assume it isn’t too hard to get together to make it happen.

If you’re in separate locations you should be able to do a remote debugging session. See for instance this page on remote debugging on Windows. It takes a moment to set up, but you’ll be able to see at least some form of callstack, even with a release build of Blender.

While you’re not technically wrong, it’s not that helpful either: you’ll get a list of addresses and nothing else, which is not all that useful. To get more (function names, source files, line numbers, etc.) you’ll need the debugging symbols, which we do not ship due to their size.

I’m not really familiar with how that would work exactly, but couldn’t you ship the debugging symbols as a separate file for every build? Like, enter the hash, and you get the download for the exact pdb file you need? Or do you mean you don’t ship them because of their size, since the files would quickly occupy a lot of server storage?

Hard drives are cheap, so I doubt it’s that; if anything, it’s bandwidth. Though I would imagine that if it were set up the way you suggested, only an extremely small percentage of users would download those debug symbols, so the additional traffic would be minuscule.

We don’t ship/generate them right now, since it would more than double the size of the download.

Once the server side of crash reporting is set up, the build scripts will build the symbol files and upload them to the processing server all in one go. So this problem should solve itself over time.
