Blender 2.80.58 crashing when rendering using our addon

Hi there,

Ok, so we’ve been carefully updating our addon to follow the API changes. Today, when testing 2.80.58, we noticed crashes during rendering that were never an issue in 2.79 and, as far as we could tell, had not been an issue in the 2.80 builds we had tested so far.

The problem seems to be in our custom render engine class. The crash comes in two distinct varieties: a segfault (SIGSEGV) and an abort (SIGABRT). Unfortunately, the segfault does not seem to give us a line number (we’re using Python’s faulthandler module to try to get line number info), but the abort does: it points to a line in our code that polls a message queue.
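A minimal sketch of the faulthandler setup mentioned above, assuming the addon enables it once (for example in its `register()` function). `all_threads=True` matters here: the crash report further down shows the render running on Blender’s render job thread, which is not the main thread, so dumping every Python thread gives the best chance of seeing which addon line was active. Writing to a file means the dump survives even if the console window closes. The log path and the `enable_crash_log` name are hypothetical. faulthandler can only show Python frames, so a crash entirely inside Blender’s C code may still produce no useful line.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; pick whatever suits your addon.
CRASH_LOG = os.path.join(tempfile.gettempdir(), "addon_faulthandler.log")


def enable_crash_log(path=CRASH_LOG):
    """Dump the Python tracebacks of *all* threads to `path` on SIGSEGV, SIGABRT, etc.

    The returned file object must stay open while the handler is enabled,
    because faulthandler writes to its file descriptor at crash time.
    """
    log_file = open(path, "a", buffering=1, encoding="utf-8")  # line-buffered text file
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Keep the returned file object referenced at module level (e.g. `_crash_log = enable_crash_log()`) so it is not closed while the handler is active.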

The segfault is also intermittent, whereas the abort is fairly easy to reproduce and seems consistent. Both occur on Windows and macOS.

To test, we’ve used the Mike Pan benchmark scene and the latest build of our addon, which is being updated for Blender 2.80.

The abort produces the following output in the system console:

location: <unknown location>:-1
blender(3030,0x113faa5c0) malloc: *** error for object 0x7fba86ca7250: pointer being freed was not allocated
blender(3030,0x113faa5c0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

The type of blend file seems to have an impact as well. I can render the default scene over and over with no issues, but with the Mike Pan scene I get three, maybe four renders before the segfault happens. I’ve also tested scenes built specifically for 2.80, such as the racer file, and it shows the same segfault and abort behaviour.

We think something has changed that affects the RenderEngine-based class we’ve built. This class accepts rendered tiles from other computers and loads them as render results. To do this it has to poll a message queue, and it also updates parts of the UI to show the % complete status for each computer.
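To make the polling step concrete, here is a sketch of a render loop that never blocks on the queue and checks for cancellation between polls. The crash report below shows two libzmq threads, so the real transport is presumably ZeroMQ; to keep this self-contained, `queue.Queue` stands in for it (with pyzmq the equivalent non-blocking call is `socket.recv(flags=zmq.NOBLOCK)`, which raises `zmq.Again` when nothing is waiting). Inside a RenderEngine, `should_stop` would naturally be `self.test_break`, and `apply_tile` would wrap `begin_result()` / `end_result()`. The tile format and all function names here are hypothetical.

```python
import queue
import time


def drain(tile_queue, max_items=64):
    """Return up to max_items queued messages without ever blocking."""
    items = []
    for _ in range(max_items):
        try:
            items.append(tile_queue.get_nowait())
        except queue.Empty:
            break
    return items


def poll_tiles(tile_queue, expected, apply_tile, should_stop, interval=0.05):
    """Apply incoming tiles until `expected` have arrived or should_stop() is True.

    Returns the number of tiles applied.
    """
    received = 0
    while received < expected and not should_stop():
        batch = drain(tile_queue, max_items=expected - received)
        for tile in batch:
            apply_tile(tile)
        received += len(batch)
        if not batch:
            time.sleep(interval)  # nothing arrived yet; avoid a busy spin
    return received
```

Polling with a timeout-free `get_nowait()` keeps the loop responsive to Esc (via `test_break`) instead of hanging inside a blocking receive if a node stops sending.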

As far as I can recall, those are the most significant things our code does, and something in there no longer seems to work!

We would love some insights from the blender devs as to how to troubleshoot and adapt the addon.

Just wanted to update this post: further testing shows that animation renders done with the addon are not affected. The segfault does not happen with the Mike Pan scene when rendering an animation; I managed to render 30+ frames without a crash.

Ok, I reproduced the segfault again and macOS produced a crash report, see below. I was using the Mike Pan scene on macOS 10.14.4 with Blender 2.80.58.

crash report (a bit long)

Process: blender [3332]
Path: /Applications/Blender_exp/*/blender.app/Contents/MacOS/./blender
Identifier: org.blenderfoundation.blender
Version: 2.80 (2.80 2019-04-24, Blender Foundation)
Code Type: X86-64 (Native)
Parent Process: bash [2468]
Responsible: blender [3332]
User ID: 501

Date/Time: 2019-04-25 13:12:08.782 +1000
OS Version: Mac OS X 10.14.4 (18E226)
Report Version: 12
Anonymous UUID: 17EF2D3A-0AF3-6DC2-5291-566FA57E29F7

Sleep/Wake UUID: 244AC1EF-44D2-4B78-B922-74A2266D8739

Time Awake Since Boot: 35000 seconds
Time Since Wake: 12000 seconds

System Integrity Protection: enabled

Crashed Thread: 1

Exception Type: EXC_CRASH (SIGSEGV)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY

Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: blender [3332]

Thread 0:: Dispatch queue: com.apple.main-thread
0 org.blenderfoundation.blender 0x000000010536cd70 wm_handlers_do_intern + 144
1 org.blenderfoundation.blender 0x0000000105368e5f wm_handlers_do + 31
2 org.blenderfoundation.blender 0x0000000105368313 wm_event_do_handlers + 835
3 org.blenderfoundation.blender 0x00000001053625b0 WM_main + 32
4 org.blenderfoundation.blender 0x0000000104e7050f main + 927
5 libdyld.dylib 0x00007fff74a3f3d5 start + 1

Thread 1 Crashed:
0 libsystem_kernel.dylib 0x00007fff74b7786a __psynch_cvwait + 10
1 libsystem_pthread.dylib 0x00007fff74c3056e _pthread_cond_wait + 722
2 libc++.1.dylib 0x00007fff71a4ea0a std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3 org.blenderfoundation.blender 0x00000001064ea1d3 IlmThread_2_3::Semaphore::wait() + 147
4 org.blenderfoundation.blender 0x00000001064e7732 IlmThread_2_3::(anonymous namespace)::DefaultWorkerThread::run() + 66
5 org.blenderfoundation.blender 0x00000001064e9c27 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (IlmThread_2_3::thread::)(), IlmThread_2_3::Thread> >(void*) + 663
6 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
7 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
8 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 2:
0 libsystem_kernel.dylib 0x00007fff74b7786a __psynch_cvwait + 10
1 libsystem_pthread.dylib 0x00007fff74c3056e _pthread_cond_wait + 722
2 libc++.1.dylib 0x00007fff71a4ea0a std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3 org.blenderfoundation.blender 0x00000001064ea1d3 IlmThread_2_3::Semaphore::wait() + 147
4 org.blenderfoundation.blender 0x00000001064e7732 IlmThread_2_3::(anonymous namespace)::DefaultWorkerThread::run() + 66
5 org.blenderfoundation.blender 0x00000001064e9c27 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (IlmThread_2_3::thread::)(), IlmThread_2_3::Thread> >(void*) + 663
6 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
7 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
8 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 3:
0 libsystem_kernel.dylib 0x00007fff74b7786a __psynch_cvwait + 10
1 libsystem_pthread.dylib 0x00007fff74c3056e _pthread_cond_wait + 722
2 libc++.1.dylib 0x00007fff71a4ea0a std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3 org.blenderfoundation.blender 0x00000001064ea1d3 IlmThread_2_3::Semaphore::wait() + 147
4 org.blenderfoundation.blender 0x00000001064e7732 IlmThread_2_3::(anonymous namespace)::DefaultWorkerThread::run() + 66
5 org.blenderfoundation.blender 0x00000001064e9c27 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (IlmThread_2_3::thread::)(), IlmThread_2_3::Thread> >(void*) + 663
6 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
7 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
8 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 4:
0 libsystem_kernel.dylib 0x00007fff74b7786a __psynch_cvwait + 10
1 libsystem_pthread.dylib 0x00007fff74c3056e _pthread_cond_wait + 722
2 libc++.1.dylib 0x00007fff71a4ea0a std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
3 org.blenderfoundation.blender 0x00000001064ea1d3 IlmThread_2_3::Semaphore::wait() + 147
4 org.blenderfoundation.blender 0x00000001064e7732 IlmThread_2_3::(anonymous namespace)::DefaultWorkerThread::run() + 66
5 org.blenderfoundation.blender 0x00000001064e9c27 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (IlmThread_2_3::thread::)(), IlmThread_2_3::Thread> >(void*) + 663
6 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
7 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
8 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 5:: com.apple.audio.IOThread.client
0 libsystem_kernel.dylib 0x00007fff74b7422a mach_msg_trap + 10
1 libsystem_kernel.dylib 0x00007fff74b7476c mach_msg + 60
2 com.apple.audio.CoreAudio 0x00007fff48041eda HALB_MachPort::SendMessageWithReply(unsigned int, unsigned int, unsigned int, unsigned int, mach_msg_header_t*, bool, unsigned int) + 122
3 com.apple.audio.CoreAudio 0x00007fff48041e4f HALB_MachPort::SendSimpleMessageWithSimpleReply(unsigned int, unsigned int, int, int&, bool, unsigned int) + 45
4 com.apple.audio.CoreAudio 0x00007fff4803e39f HALC_ProxyIOContext::IOWorkLoop() + 1017
5 com.apple.audio.CoreAudio 0x00007fff4803ddf4 HALC_ProxyIOContext::IOThreadEntry(void*) + 122
6 com.apple.audio.CoreAudio 0x00007fff4803d956 HALB_IOThread::Entry(void*) + 72
7 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
8 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
9 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 6:
0 libsystem_kernel.dylib 0x00007fff74b7786a __psynch_cvwait + 10
1 libsystem_pthread.dylib 0x00007fff74c3056e _pthread_cond_wait + 722
2 org.blenderfoundation.blender 0x00000001050d942c task_scheduler_thread_run + 124
3 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
4 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
5 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 7:
0 libsystem_kernel.dylib 0x00007fff74b7786a __psynch_cvwait + 10
1 libsystem_pthread.dylib 0x00007fff74c3056e _pthread_cond_wait + 722
2 org.blenderfoundation.blender 0x00000001050d942c task_scheduler_thread_run + 124
3 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
4 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
5 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 8:
0 libsystem_kernel.dylib 0x00007fff74b7786a __psynch_cvwait + 10
1 libsystem_pthread.dylib 0x00007fff74c3056e _pthread_cond_wait + 722
2 org.blenderfoundation.blender 0x00000001050d942c task_scheduler_thread_run + 124
3 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
4 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
5 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 9:
0 libsystem_kernel.dylib 0x00007fff74b7a78e kevent + 10
1 libzmq.cpython-37m-darwin.so 0x000000011a58994b zmq::kqueue_t::loop() + 171
2 libzmq.cpython-37m-darwin.so 0x000000011a57f70e thread_routine(void*) + 46
3 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
4 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
5 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 10:
0 libsystem_kernel.dylib 0x00007fff74b7a78e kevent + 10
1 libzmq.cpython-37m-darwin.so 0x000000011a58994b zmq::kqueue_t::loop() + 171
2 libzmq.cpython-37m-darwin.so 0x000000011a57f70e thread_routine(void*) + 46
3 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
4 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
5 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 11:: com.apple.NSEventThread
0 libsystem_kernel.dylib 0x00007fff74b7422a mach_msg_trap + 10
1 libsystem_kernel.dylib 0x00007fff74b7476c mach_msg + 60
2 com.apple.CoreFoundation 0x00007fff485c813e __CFRunLoopServiceMachPort + 328
3 com.apple.CoreFoundation 0x00007fff485c76ac __CFRunLoopRun + 1612
4 com.apple.CoreFoundation 0x00007fff485c6e0e CFRunLoopRunSpecific + 455
5 com.apple.AppKit 0x00007fff45c55d1a _NSEventThread + 175
6 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
7 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
8 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 12:
0 libsystem_pthread.dylib 0x00007fff74c2c3f0 start_wqthread + 0

Thread 13:
0 libsystem_pthread.dylib 0x00007fff74c2c3f0 start_wqthread + 0

Thread 14:
0 libsystem_pthread.dylib 0x00007fff74c2c3f0 start_wqthread + 0

Thread 15:
0 libsystem_pthread.dylib 0x00007fff74c2c3f0 start_wqthread + 0

Thread 16:
0 org.blenderfoundation.blender 0x00000001051aca49 MEM_lockfree_mallocN + 73
1 org.blenderfoundation.blender 0x000000010507da0c BLI_mempool_create + 284
2 org.blenderfoundation.blender 0x0000000105075760 BLI_ghash_new + 112
3 org.blenderfoundation.blender 0x0000000105168deb DEG::DepsNodeFactoryImpl<DEG::GeometryComponentNode>::create_node(ID const*, char const*, char const*) const + 171
4 org.blenderfoundation.blender 0x000000010516b456 DEG::IDNode::add_component(DEG::NodeType, char const*) + 86
5 org.blenderfoundation.blender 0x0000000105150c22 DEG::DepsgraphNodeBuilder::build_object_data_geometry(Object*, bool) + 114
6 org.blenderfoundation.blender 0x000000010514e49e DEG::DepsgraphNodeBuilder::build_object(int, Object*, DEG::eDepsNode_LinkedState_Type, bool) + 446
7 org.blenderfoundation.blender 0x0000000105153f07 DEG::DepsgraphNodeBuilder::build_view_layer(Scene*, ViewLayer*, DEG::eDepsNode_LinkedState_Type) + 167
8 org.blenderfoundation.blender 0x000000010516d594 DEG_graph_build_from_view_layer + 116
9 org.blenderfoundation.blender 0x0000000105004af0 BKE_scene_graph_update_for_newframe + 128
10 org.blenderfoundation.blender 0x00000001051d5831 RE_engine_render + 881
11 org.blenderfoundation.blender 0x00000001051dff83 do_render_all_options + 435
12 org.blenderfoundation.blender 0x00000001051dfa98 RE_BlenderFrame + 200
13 org.blenderfoundation.blender 0x000000010a622f47 render_startjob + 119
14 org.blenderfoundation.blender 0x00000001053776d4 do_job_thread + 36
15 libsystem_pthread.dylib 0x00007fff74c2d2eb _pthread_body + 126
16 libsystem_pthread.dylib 0x00007fff74c30249 _pthread_start + 66
17 libsystem_pthread.dylib 0x00007fff74c2c40d thread_start + 13

Thread 1 crashed with X86 Thread State (64-bit):
rax: 0x0000000000000004 rbx: 0x0000000000000002 rcx: 0x0000700005e1b988 rdx: 0x0000000000000000
rdi: 0x00007f8e13600710 rsi: 0x0000000100000100 rbp: 0x0000700005e1ba10 rsp: 0x0000700005e1b988
r8: 0x0000000000000000 r9: 0x00000000000000a0 r10: 0x0000000000000000 r11: 0x0000000000000202
r12: 0x00007f8e13600710 r13: 0x0000000000000016 r14: 0x0000000100000100 r15: 0x0000700005e1c000
rip: 0x00007fff74b7786a rfl: 0x0000000000000203 cr2: 0x0000700005e1bf78

Logical CPU: 0
Error Code: 0x02000131
Trap Number: 133

There is also a crash report from Blender, but there’s not much in it:

Blender crash report

Blender 2.80 (sub 58), Commit date: 2019-04-24 02:30, Hash 1b839e85e142

Connected to node manager # Info
Saving copy of blend file. # Info
local ready # Info
local is synced # Info
local is synced # Info
Connected to render node server # Info
local is synced # Info
local is synced # Info
local is synced # Info
local is synced # Info
local is synced # Info
local is synced # Info
local is synced # Info
local is synced # Info
local is synced # Info
local is synced # Info
local is synced # Info
local is synced # Info

backtrace

Finally, I got a line number for the segfault in our code. I had to add a print statement to the update_render_passes method of our RenderEngine class. I have no idea why that worked, but anyway, here’s the line in our code that fails:

if srl.use_pass_environment:
    self.register_pass(scene, srl, "Env", 3, "RGB", 'COLOR')

The segfault happens when this line executes. What’s more, the enclosing method, which Blender calls to update the render passes, is called twice every time a render starts. I have no idea if that is normal. It is always called twice, whether or not the segfault happens.

A question here, @brecht (I really hope you or someone can help): is this double calling of the method that registers the render passes normal? Could the segfault on the line above be related to it, or even caused by it?

I should point out that at this point, none of the code we’ve written for our RenderEngine subclass has run apart from this one method, and we’ve pretty much copied what Cycles does to register the passes. So at this stage I think this is a bug in Blender. Happy to be corrected if we’ve made a mistake, though; right now I can’t see one.

Thanks!

Hi @brecht, hope you’re having a great Easter break. We’ve isolated the problem. The render engine instance itself seems to be fine: I can render repeatedly as many times as I like. The actual issue appears when I use the button we created in the Crowdrender UI panel; then we get a segfault in a seemingly random location somewhere in the RenderEngine class.

2.79 doesn’t do this, so I think a bug has been introduced that hasn’t been caught.

We have spent the past two days testing extensively and collecting data. I can safely rule out our implementation of the Blender RenderEngine class, since we commented out everything in it except for

def render(self, depsgraph):
    pass

This minimal version still segfaulted. It was only when we carefully compared rendering from the command line with rendering from the UI that we found the culprit: the button we use to render stills.

This button calls an operator that configures our addon to do the render, then calls

bpy.ops.render.render()

from the execute method of that render-configuring operator. Again, this works fine as long as the operator is run from the Python console. If I press the render still button on our Crowdrender panel (which runs exactly the same operator), a segfault happens after a few renders.
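Since the crash only shows up when the operator is launched from the panel button, and the addon also updates per-computer progress in the UI while rendering, one thing worth ruling out (an assumption, not a confirmed cause) is Blender data or UI properties being written from a thread other than the main one; Blender’s Python API is not thread-safe. A common pattern is to queue the change and apply it on the main thread. This sketch is modelled on the queue plus `bpy.app.timers` example in the Blender 2.80 API docs; `run_in_main_thread` and `run_pending` are hypothetical names, and it is kept free of `bpy` so it runs standalone.

```python
import queue

_main_thread_calls = queue.Queue()


def run_in_main_thread(func, *args):
    """Safe to call from any thread (e.g. a network or render thread): defer func."""
    _main_thread_calls.put((func, args))


def run_pending(max_calls=100):
    """Run queued calls; must only be called on the main thread.

    Returns the delay in seconds until the next call, which is what a
    bpy.app.timers callback is expected to return.
    """
    for _ in range(max_calls):
        try:
            func, args = _main_thread_calls.get_nowait()
        except queue.Empty:
            break
        func(*args)
    return 0.1
```

In the addon, `run_pending` would be registered once with `bpy.app.timers.register(run_pending)`, and the background thread would call something like `run_in_main_thread(update_progress, node_name, percent)`, where `update_progress` is a hypothetical function that writes the UI property.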

We’d love to help you fix this. Happy to provide more data and write a bug report, just let me know 🙂