GSoC 2025 Draft - Mesh Sculpting Performance: Direct GPU Uploads & Spatial Data Optimization

Hello all, I am interested in contributing to the optimization of mesh sculpting performance this summer! Here is my draft proposal. I would appreciate any feedback and questions about the implementation details in the comments or on the Google Doc here.

Project Title
Mesh Sculpting Performance: Direct GPU Uploads & Spatial Data Optimization

Name
Yue Sun

Contact
Email: [email protected]
Blender Chat: @yues:blender.org

Synopsis
Blender’s current sculpting pipeline involves copying vertex data from CPU buffers to GPU memory. This incurs extra memory overhead and increased latency during brush operations, especially on high-poly meshes. The two optimizations proposed here target different layers of this process:
• Direct GPU Vertex Buffers:
Allocate GPU memory directly and extract vertex data into it, avoiding an intermediate CPU copy.
• Spatially Sorted Meshes:
Reorganize vertex and face data to match BVH node spatial partitions, ensuring contiguous memory access and further reducing data transfer latency.
Both approaches are complementary: direct GPU uploads cut down redundant memory copies, while spatial sorting ensures that data is optimally arranged for fast access during sculpt operations.

Benefits
Direct GPU Vertex Buffers: Reduction in CPU-to-GPU data transfer time and lower memory overhead, leading to smoother brush strokes and faster mode switching.
Spatially Sorted Meshes: Improved cache performance and reduced latency in CPU-GPU data access, especially for complex, high-poly meshes. This should translate into more responsive sculpting operations.
Combined Impact: While Direct GPU Vertex Buffers minimize redundant data transfers, Spatially Sorted Meshes ensure that the data is optimally arranged for those transfers. Together, they address both the organization and transfer inefficiencies in the current sculpting pipeline.
Future-Proofing: These improvements pave the way for GPU-driven workflows (e.g., compute shaders or Vulkan/Metal backends) by establishing a more unified and efficient GPU buffer management strategy.

Deliverables

  1. GPU Direct Vertex Buffers
    Code Changes:
    Modify draw_sculpt.cc to replace MEM_mallocN heap allocations with stack buffers.
    Implement backend-specific synchronization (Metal: didModifyRange, Vulkan: staging buffers).
    Metrics:
    Profile CPU/GPU memory usage with blender --debug-gpu.
  2. Spatial Mesh Sorting
    Features:
    Develop an operator for spatial sorting on sculpt mode entry.
    Integrate with BKE_pbvh to handle sorted/unsorted indices via IndexMask.
    Undo/Redo: Optimize undo/redo handling for sorted meshes.
  3. Performance Reports
    Benchmarks: Compare pre/post-optimization metrics for:
    CPU-GPU copy time and peak memory overhead.
    Sculpt mode entry time on meshes with high face counts.
    Brush stroke latency and sculpting FPS (using CPU and GPU profiling tools).
  4. Documentation
    User Guide: How to enable spatial sorting via the operator panel.
    Technical Docs: GPU buffer API changes and spatial sorting integration.

Project Details

  1. Direct GPU Vertex Buffers
    Current Problem:
    The function sculpt_batches_get() extracts dirty nodes from the PBVH and creates SculptBatch vectors by allocating CPU buffers and then copying data to the GPU. This extra step increases peak memory usage and delays sculpting responsiveness.
    Proposed Solution:
    Allocation: Use GPU_vertbuf_alloc() upfront for Metal/Vulkan.
    Parallel Extraction:
    Fixed-size thread-local buffers (e.g., thread_local float tls_buffer[len], with len a compile-time constant) for per-node data.
    Chunked uploads to avoid GPU read/write conflicts.
    Synchronization:
    Metal: [MTLBuffer didModifyRange] post-extraction.
    Vulkan: vkCmdPipelineBarrier after staging buffer copies.
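As a rough illustration of the thread-local, chunked extraction proposed for Direct GPU Vertex Buffers, here is a minimal standalone C++ sketch. It does not use Blender's GPU module: NodeData, kChunkFloats, extract_node and extract_all are hypothetical names, and a plain std::vector stands in for the mapped GPU buffer. Backend synchronization (didModifyRange, pipeline barriers) is deliberately left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical stand-in for one PBVH node's vertex positions (x, y, z per vertex).
struct NodeData {
  std::vector<float> positions;
};

// Size of the per-thread staging buffer, in floats. Assumed value for illustration.
constexpr std::size_t kChunkFloats = 3 * 256;

// Copy one node into its slice of the destination, one chunk at a time.
// `dst` stands in for a mapped GPU vertex buffer; here it is ordinary memory.
void extract_node(const NodeData &node, float *dst)
{
  // Fixed-size, thread-local staging area: no heap allocation per node.
  thread_local std::array<float, kChunkFloats> tls_buffer;
  const std::size_t total = node.positions.size();
  for (std::size_t start = 0; start < total; start += kChunkFloats) {
    const std::size_t count = std::min(kChunkFloats, total - start);
    // "Extraction" step: a real pipeline would convert or pack attributes here.
    std::copy_n(node.positions.data() + start, count, tls_buffer.data());
    // "Upload" step: write the finished chunk into the node's own range.
    std::memcpy(dst + start, tls_buffer.data(), count * sizeof(float));
  }
}

// Extract all nodes using `thread_count` threads (the caller counts as one).
// Each node writes to a disjoint range of the destination, so threads never
// touch the same destination bytes.
std::vector<float> extract_all(const std::vector<NodeData> &nodes, unsigned thread_count)
{
  assert(thread_count > 0);
  // Prefix sum of node sizes gives each node's start offset in the destination.
  std::vector<std::size_t> offsets(nodes.size() + 1, 0);
  for (std::size_t i = 0; i < nodes.size(); i++) {
    offsets[i + 1] = offsets[i] + nodes[i].positions.size();
  }
  std::vector<float> dst(offsets.back());

  // Static striding: stripe t handles nodes t, t + thread_count, ...
  auto run_stripe = [&](unsigned t) {
    for (std::size_t i = t; i < nodes.size(); i += thread_count) {
      extract_node(nodes[i], dst.data() + offsets[i]);
    }
  };
  std::vector<std::thread> workers;
  for (unsigned t = 1; t < thread_count; t++) {
    workers.emplace_back(run_stripe, t);
  }
  run_stripe(0);  // The calling thread processes the first stripe itself.
  for (std::thread &worker : workers) {
    worker.join();
  }
  return dst;
}
```

Calling extract_all(nodes, 4) returns the concatenated node data in node order. In the actual implementation, the destination would be the GPU-allocated vertex buffer rather than a std::vector, and the chunk size would need tuning per backend.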
  2. Spatially Sorted Meshes
    Current Problem:
    Mesh elements (vertices, faces) are accessed indirectly via index arrays, resulting in scattered memory access patterns. This fragmentation hinders performance during sculpt operations.
    Proposed Solution:
    Spatial Sorting:
    Reorder Mesh::CustomData based on BVH node spatial bounds, then rebuild the BVH after sorting so that each BVH node references a contiguous vertex range.
    Add a Sculpt Mode operator to trigger sorting on demand.
    IndexMask Integration: Integrate IndexMask::from_range() to handle our sorted indices.
    Undo/Redo Optimization:
    Track deltas (changes in vertex order) instead of full mesh snapshots.
    Store pre-sorted indices in SculptUndoNode to minimize memory overhead.
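To make the spatial sorting and undo ideas above more concrete, here is a minimal standalone C++ sketch. It assumes each vertex has already been assigned to a BVH leaf node (the vert_node input) and uses a generic counting sort to group vertices by node. SortedMesh, sort_by_node and restore_order are hypothetical names, not Blender APIs, and real mesh data would include many more attributes than positions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Float3 = std::array<float, 3>;

struct SortedMesh {
  std::vector<Float3> positions;  // Vertex positions in the new order.
  std::vector<int> corner_verts;  // Face corner indices remapped to the new order.
  std::vector<int> new_to_old;    // new_to_old[new_index] == old_index.
  std::vector<int> node_offsets;  // Node n owns [node_offsets[n], node_offsets[n + 1]).
};

// Group vertices so that every node's vertices are contiguous in memory.
// `vert_node[v]` is the (assumed, precomputed) node index of vertex v.
SortedMesh sort_by_node(const std::vector<Float3> &positions,
                        const std::vector<int> &corner_verts,
                        const std::vector<int> &vert_node,
                        int node_count)
{
  assert(positions.size() == vert_node.size());
  const int vert_count = static_cast<int>(positions.size());
  SortedMesh result;

  // Count vertices per node, then prefix-sum the counts into start offsets.
  result.node_offsets.assign(static_cast<std::size_t>(node_count) + 1, 0);
  for (int node : vert_node) {
    assert(node >= 0 && node < node_count);
    result.node_offsets[node + 1]++;
  }
  std::partial_sum(result.node_offsets.begin(),
                   result.node_offsets.end(),
                   result.node_offsets.begin());

  // Stable placement: vertices keep their relative order inside each node.
  std::vector<int> cursor(result.node_offsets.begin(), result.node_offsets.end() - 1);
  std::vector<int> old_to_new(vert_count);
  result.new_to_old.resize(vert_count);
  for (int old_index = 0; old_index < vert_count; old_index++) {
    const int new_index = cursor[vert_node[old_index]]++;
    old_to_new[old_index] = new_index;
    result.new_to_old[new_index] = old_index;
  }

  // Move vertex data into the new order and remap face corners to match.
  result.positions.resize(vert_count);
  for (int new_index = 0; new_index < vert_count; new_index++) {
    result.positions[new_index] = positions[result.new_to_old[new_index]];
  }
  result.corner_verts.reserve(corner_verts.size());
  for (int vert : corner_verts) {
    result.corner_verts.push_back(old_to_new[vert]);
  }
  return result;
}

// Undo the sort using only the stored permutation, without a full mesh snapshot.
std::vector<Float3> restore_order(const SortedMesh &sorted)
{
  std::vector<Float3> original(sorted.positions.size());
  for (std::size_t new_index = 0; new_index < sorted.positions.size(); new_index++) {
    original[sorted.new_to_old[new_index]] = sorted.positions[new_index];
  }
  return original;
}
```

After sorting, each node is described by a single [start, end) range from node_offsets rather than an index array, which is the property the range-based IndexMask integration relies on. The new_to_old permutation plays the role of the pre-sorted indices that the undo step would store.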

Project Schedule
Total Duration: 12 weeks plus a 1-week buffer (May 27 – August 26)
Weeks 1-2
• Discuss design with mentors
• Set up the development environment and profiling tools, and record pre-optimization baseline benchmarks
Weeks 3-4
• Implement Direct GPU Vertex Buffers with stack-based chunking.
• Benchmark memory usage.
Weeks 5-6
• Optimize parallel GPU writes and validate thread safety.
• Handle different backend APIs and address API-specific challenges.
Weeks 7-8
• Develop the spatial sorting operator
• Integrate with IndexMask support.
Weeks 9-10
• Profile sorting impact on sculpting workflows
• Refine undo/redo handling.
Weeks 11-12
• Final testing, documentation, and code cleanup.
Buffer (1 week)
• Contingency for unexpected delays.
The schedule is subject to dynamic adjustments based on progress and mentor feedback. I am fully available this summer, allowing flexibility to extend the schedule if necessary.

Bio
I am a computer science student at the University of Montreal with a passion for graphics and high-performance rendering. My experience as a technical artist in the game industry has given me firsthand familiarity with Blender as a user. I am now pursuing a career as a graphics engineer, and contributing to Blender’s open-source development aligns perfectly with my skills and interests.
My background includes:
• GPU Programming: Proficiency in Vulkan and OpenGL.
• BVH Structures: Experience with acceleration structures from C++ ray tracing projects.
• Rendering Pipelines: A deep understanding of rendering pipelines and strong C++ proficiency.
My previous work in rendering and exposure to GPU pipelines motivate me to refine Blender’s sculpting performance. I have already begun familiarizing myself with Blender’s project structure and am eager to tackle the complexities of its sculpting pipeline to deliver a smoother and more efficient sculpting experience for users.
