GSOC 2025 Draft - Pitch Correction for Sound Playback in Sequencer

TheKaceFiles · March 26, 2025, 7:26pm

NOTE: This is currently a draft and subject to change. It probably needs many changes, especially to the schedule and the technical details of the project.

Feel free to leave any feedback or questions in the thread below or on the Google Doc here, and I’ll do my best to answer!

Project Title:

Pitch Correction for Sound Playback in Sequencer

Name:

Kacey La

Contact:

Email: [email protected]

Blender: TheKaceFiles

Synopsis:

Blender comes with a built-in video sequence editor (VSE) that allows users to do basic to intermediate video editing tasks. While the editor supports retiming video and audio through retiming keys, one feature that is missing for audio is the ability to preserve the original pitch when the audio is sped up or slowed down.

Thus, this project will focus on adding a toggle option to preserve pitch in the speed intervals between the retiming keys.

Benefits:

Pitch correction is important in video editing software because it lets users change the timing and duration of audio clips while retaining the natural quality of voices, music, and other sound effects. Integrating pitch correction into Blender’s VSE would make Blender a stronger open-source alternative to paid video editing software. It would also fit the workflow of users who already rely on the VSE, since it eliminates the need to adjust audio pitch in an external program and lets them stay within Blender’s packaged 3D modeling and video editing suite.

Deliverables:

Investigation document - explore research papers, voice and music
Isolated implementation of pitch-preserving algorithm outside of Blender
Integration of pitch preserving algorithm in Blender
End-user documentation
Super Stretch Goal - Start a framework for a pitch shifter, which will allow users to shift the audio up or down by a specified number of semitones

Project Details:

First, I will explore audio papers and 3rd party libraries that go into depth about pitch-shifting. I have so far compiled the following papers and libraries for approaching pitch shifting.

Papers:

See potential papers on Google doc

Libraries:

Rubber Band Library - Note the license for open source here.

After exploring the research papers, 3rd party libraries, and how others have approached this problem, I will compile my findings into a document where, if possible, I will compare the benefits and tradeoffs of the different approaches and choose the one that best fits the needs of Blender. If we decide to implement the correction algorithm ourselves, then time will be put towards implementing it outside of the Blender codebase first. After receiving approval from my mentor and other developers, the algorithm will be integrated into Blender’s audio library, Audaspace, which will then be used by Blender’s main codebase.

Blender Integration:

In Blender, retiming keys can be added through the shortcut I → Add Retiming Key, which can be used to adjust the speed of strips. These retiming keys can be repositioned to achieve the effect of speeding up or slowing down the audio as indicated by the audio speed percentage. However, the change in the audio playback speed has the effect of distorting the natural tonal quality of the audio.
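To make the size of this distortion concrete, here is a minimal sketch. It assumes the retimed audio is played back by plain resampling, so that every frequency is scaled by the speed factor; the function name is hypothetical and not part of Blender or Audaspace.

```python
import math


def pitch_shift_semitones(speed_percent: float) -> float:
    """Pitch change, in semitones, when audio is simply played faster or slower.

    Assumes plain resampling playback: all frequencies are multiplied by the
    speed factor, and one octave (a factor of 2) equals 12 semitones.
    """
    if speed_percent <= 0:
        raise ValueError("speed must be positive")
    return 12.0 * math.log2(speed_percent / 100.0)


print(pitch_shift_semitones(200.0))  # 12.0, one octave up
print(pitch_shift_semitones(50.0))   # -12.0, one octave down
```

A strip retimed to 150% therefore plays roughly 7 semitones sharp, which is the shift the Preserve Pitch option would need to cancel.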

I propose adding a Preserve Pitch toggle option under the Sound tab for each audio strip instance (including the ones created by the “Split Strip” operation) as demonstrated below.

The pitch correction algorithm itself will be implemented in Blender’s high-level audio library, Audaspace, where it will need to account for any audio speed percentage. It will then be exposed as a binding somewhere in extern/audaspace/bindings, which will in turn be wrapped by a function in blenkernel in sound.cpp for use in the video sequencer code. Further details will be worked out with my mentor and other developers.
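As an illustration of what "changing speed without changing pitch" can look like, below is a minimal sketch of naive overlap-add (OLA) time stretching in pure Python. This is not the algorithm chosen for the project (that is what the investigation document is for), and the function name and parameters are hypothetical. Plain OLA is known to produce audible artifacts on tonal material; techniques such as WSOLA or a phase vocoder exist to reduce them.

```python
import math


def ola_time_stretch(samples, speed, frame_size=1024):
    """Naive overlap-add time stretch (illustrative only).

    Frames are read from the input every `analysis_hop` samples and written
    to the output every `synthesis_hop` samples. The samples inside each
    frame are copied unchanged, so the local waveform (and hence the pitch)
    is kept while the overall duration changes by about 1 / speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    synthesis_hop = frame_size // 2
    analysis_hop = max(1, round(synthesis_hop * speed))
    # Hann window so that overlapping frames cross-fade smoothly.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_size)
              for n in range(frame_size)]

    num_frames = max(1, (len(samples) - frame_size) // analysis_hop + 1)
    out_len = (num_frames - 1) * synthesis_hop + frame_size
    out = [0.0] * out_len
    norm = [0.0] * out_len  # accumulated window weight per output sample

    for i in range(num_frames):
        src = i * analysis_hop
        dst = i * synthesis_hop
        for n in range(frame_size):
            if src + n >= len(samples):
                break
            out[dst + n] += samples[src + n] * window[n]
            norm[dst + n] += window[n]

    # Divide by the accumulated weight to undo the window's gain.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


# One second of a 440 Hz tone at an (assumed) 8 kHz sample rate.
rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / rate) for t in range(rate)]
fast = ola_time_stretch(tone, speed=2.0)
print(len(tone), len(fast))  # the stretched tone is roughly half as long
```

A real implementation in Audaspace would also have to process audio in streaming blocks and handle a speed that varies between retiming keys, which this sketch ignores.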

As for the UI, this will require defining an RNA property in the function rna_def_sound() in rna_sequencer.cc, which will correspond to a newly defined DNA flag in DNA_sequence_types.h. Additionally, the UI for the Preserve Pitch toggle will need to be added in the draw() method of class SEQUENCER_PT_adjust_sound in space_sequencer.py. By default, the Preserve Pitch option will be turned on. When the option is on, the audio stored in the strip will be passed through the pitch correction function in blenkernel. Otherwise, if the toggle is off, the audio will play as it normally does in Blender today.

Project Schedule:

This is a large-sized project (350 hours) with a predicted completion time frame of 17-18 weeks. I will likely be working part-time over the summer and expect to commit at least ~20 hours per week to this project after my final semester ends on May 17th. I will probably be on vacation during one of the weeks over the summer. Regardless of whether the project is accepted to GSOC, I still intend to explore different approaches for pitch correction over the summer and refresh my digital signal processing knowledge.

Week 1-4
- Read and explore different approaches; look at research papers, 3rd party libraries, or what others have implemented
- Create an investigation document listing benefits and tradeoffs for each approach (Deliverable #1)
- Research Blender’s codebase further and solidify details with mentor
Week 5-7
- Continue experimenting with implementations of pitch correction algorithms
- Finalize isolated implementation of pitch correction in either Python or C++ (Deliverable #2)
- If we’re using a 3rd party library, these weeks can become additional buffer weeks or time spent integrating the external library into Blender
Week 7-9
- Begin implementing pitch correction algorithm into Audaspace
- Add bindings to extern/audaspace/bindings and integrate into blenkernel
- Ask for feedback and fix any issues from the community
Week 10-13
- Implement UI changes for pitch correction toggle
- Finish integrating pitch correction toggle functionality (Deliverable #3)
- Continue to ask for feedback and fix any issues from the community relating to pitch correction functionality
Week 14-16
- Prepare for final submission
- Make sure the code is well-optimized and memory-efficient (i.e., no major bottlenecks or delays when audio is previewed with pitch preservation enabled)
- Clean up code and add test cases (if needed)
- Finalize user and developer documentation (Deliverable #4)
Week 17-18
- Buffer weeks: start thinking about the pitch shifter stretch goal if enough time remains, or fix any other bugs in the VSE

Bio:

My name is Kacey, and I’m currently a senior at a small college called Ursinus College studying Computer Science and Math with an interest in computer graphics, game development, and a bit of digital signal processing. I am currently a Blender beginner who has used Blender’s VSE to render a small video, and I have some intermediate experience with Python and C++. In my free time, I’m a hobbyist game developer: I frequently contribute to a roguelike library called RogueEssence/PMDC, help modders troubleshoot bugs, and write developer guides in the wiki.

I took a course called Digital Music Processing in 2023, which broadly covered how to represent, analyze, and morph/transform digital musical audio. Topics included the Fourier Transform, beat tracking, spectrograms, autocorrelation, and implementing a few audio algorithms from papers such as Let it Bee - Towards NMF-inspired Audio Mosaicing.

I’ve been meaning to contribute to a large open-source project for a long time. Almost all the tools that I use very frequently and love are open source (Obsidian, Typst, Godot, and of course, RogueEssence!) and I feel that Blender best fits my interests and skills. I have lots to learn, but I hope I can become a lifelong contributor to a project such as Blender and further my understanding of this codebase.

My previous contributions so far (note that one PR has not been reviewed yet as of 3/30/2025):

#136348: VSE: Fix delete retiming key and description

#133747: Improved UI for File Output node panel

Lastly, something very minor, but I reported a Blender Mac issue where the playhead was inconsistently resetting back to the beginning, and a small UI issue with the retiming keys.

iss · April 1, 2025, 11:32pm

Overall proposal looks great!

Perhaps rather than a suggestion for the proposal itself, I wanted to “warn” you about doing too much programming in Python during the project. At least I found FFTW much more unfriendly and harder to debug than doing the same stuff with numpy, so I would suggest trying to implement a super simple, even naive algorithm in C++ to dip your toes in at least.
Using Python is great for scanning through various techniques quickly, but it does a lot of stuff for you.

OK, maybe a bit of actual feedback. Maybe it’s just me, but this reads to me as if it is perhaps a bit confusing to you. I apologize if I am incorrect, but if I am not, let me explain this a bit:

The high-level overview is that Audaspace supports animating sound properties. One of these properties is pitch. The way it’s currently animated is that you just “upload” a big array of floats to a buffer in Audaspace. This is done with a function defined in the bindings header (it’s really just a normal API). Now, whether you reuse the pitch property or introduce a new property is largely irrelevant for the proposal IMO. What is important is that in the end you will have two behaviors, and you need a way to switch between the two.
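A rough sketch of that idea, with loud caveats: the function, the segment format, and the property names "pitch" and "time_stretch" are hypothetical placeholders for illustration, not Audaspace or Blender identifiers. It only shows how retiming segments could be flattened into one float per frame, and how a single flag could pick which of the two behaviors receives that array.

```python
def build_animation_upload(segments, total_frames, preserve_pitch):
    """Flatten retiming segments into a per-frame float array.

    segments: list of (start_frame, end_frame, speed), end_frame exclusive.
    Returns (property_name, values). Both property names are placeholders
    for "the current behavior" and "the new pitch-preserving behavior".
    """
    values = [1.0] * total_frames  # frames outside any segment play at 1x
    for start, end, speed in segments:
        for frame in range(max(0, start), min(end, total_frames)):
            values[frame] = speed
    # The per-frame data is identical; only the receiving behavior differs.
    property_name = "time_stretch" if preserve_pitch else "pitch"
    return property_name, values


name, values = build_animation_upload(
    [(0, 10, 2.0), (10, 30, 0.5)], total_frames=40, preserve_pitch=False)
print(name, values[0], values[10], values[35])  # pitch 2.0 0.5 1.0
```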

Also, reading further: Blender does not do sound at all. It does not read the files and it does not communicate with sound hardware. It is all handled by Audaspace. So to play sound, we basically copy Blender’s timeline to Audaspace’s own timeline, along with the properties, and say “now play” via the API.
