GSoC 2024: Sample Sound Node

Update 4/2/2024: Revision 3 (GSoC version).


Project Title

Sample Sound Node

Name

Lleu Yang

Contact

E-mail: [email protected]

Blender developer account: @megakite

Blender chat: Lleu Yang

Synopsis

This project adds a Sample Sound node that retrieves audio from sound files and provides their frequency spectrum over time for use in Geometry Nodes.

Benefits

The built-in Geometry Nodes system currently lacks a way to retrieve sounds and generate useful information from them. This makes certain tasks (e.g. music visualization) hard to accomplish in vanilla Blender.

Therefore, this project aims to:

  • Provide the ability to retrieve sounds from files in Geometry Nodes,
  • Generate amplitude/frequency response information based on several customizable parameters, and
  • Do so in native C++ with caching/proxy operations to speed up execution.

This project will empower creators to easily access useful information about sounds, which opens up countless possibilities in their creative work. The whole Geometry Nodes system will also benefit, since it gains a brand new dimension: the audible.

Deliverables

  • A new socket type in Geometry Nodes, called Sound, that corresponds to Blender’s Sound data-block type;

  • A Sound Input node that reads a single Sound from the Sound data-blocks available in the file;

  • A Sample Sound node that takes a Sound and runs it through several tunable internal processes, including:

    • Gain control,
    • Playback progress (sample time) control,
    • Temporal smoothing (see the sketch after this list),
    • Audio channel selection,
    • Frequency specification (FFT).

    The Sound is finally converted to a corresponding amplitude/power value (as a Float) according to the options above.

  • A series of usage examples and documentation related to all the deliverables listed above.
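
To make the temporal smoothing step above concrete, here is a minimal sketch of a one-pole exponential smoother, a common way to damp frame-to-frame jumps in sampled amplitude values. The class name and parameters are illustrative only, not the final design:

    #include <cmath>

    // One-pole exponential smoother: each output moves a fixed fraction of
    // the way toward the new input, damping frame-to-frame jumps.
    // Hypothetical sketch; names and parameters are illustrative only.
    class ExponentialSmoother {
     public:
      // `time_constant` (seconds) controls how quickly the output follows
      // the input; `rate` is how often process() is called per second.
      ExponentialSmoother(float time_constant, float rate)
          : alpha_(1.0f - std::exp(-1.0f / (time_constant * rate))) {}

      float process(float input)
      {
        state_ += alpha_ * (input - state_);
        return state_;
      }

     private:
      float alpha_;
      float state_ = 0.0f;
    };

A common refinement for visualization is to use separate time constants for rising and falling values, so bars snap up quickly and fall back slowly.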

Project Details

User Interface

Below is a simple user interface mockup of the Sound Input and Sample Sound nodes:

And one of their possible use cases, i.e. visualizing the spectrum of a given Sound:

Details may be further determined during the coding period.

Sound Socket

Blender’s Sound data-block struct bSound already packs all the data needed for audio processing, e.g. channel count and sample rate. Thus, the Sound socket should be a relatively thin wrapper around the Sound data-block type.
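
As a rough illustration of what “thin wrapper” means here, the socket’s value could be little more than a non-owning handle to the data-block. The names below are hypothetical, not actual Blender API:

    struct bSound;  // Blender's Sound data-block (DNA_sound_types.h).

    // Hypothetical sketch of the value carried by a Sound socket: a thin,
    // non-owning handle around the existing data-block. The accessor is
    // illustrative only.
    class SoundSocketValue {
     public:
      explicit SoundSocketValue(bSound *sound) : sound_(sound) {}

      // Channel count, sample rate, etc. are read from the underlying
      // data-block rather than duplicated in the socket value.
      bSound *data_block() const { return sound_; }

     private:
      bSound *sound_ = nullptr;  // Lifetime managed by Blender's ID system.
    };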

Library for Audio Processing

Blender uses Audaspace as its all-purpose audio engine. It already provides nearly all the operations this project will make use of, for example:

  • Retrieving audio from a sound data-block through aud::SoftwareDevice, which supports various mixing operations as well as 3D audio processing,

  • Directly reading sound samples through aud::SequenceReader, and

  • Performing FFT calculations using FFTPlan (a wrapper around fftw3).

The actual functionality of this project will primarily be based on Audaspace.
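
For reference, the core of such an FFT step using fftw3 directly looks roughly like the sketch below. This is independent of FFTPlan’s exact wrapper API and only illustrates the underlying computation:

    #include <cmath>
    #include <vector>
    #include <fftw3.h>

    // Compute the magnitude spectrum of one window of mono samples with
    // fftw3. A real-to-complex FFT of n samples yields n/2 + 1 complex
    // frequency bins.
    std::vector<double> magnitude_spectrum(std::vector<double> window)
    {
      const int n = static_cast<int>(window.size());
      const int bins = n / 2 + 1;

      fftw_complex *out = static_cast<fftw_complex *>(
          fftw_malloc(sizeof(fftw_complex) * bins));
      fftw_plan plan =
          fftw_plan_dft_r2c_1d(n, window.data(), out, FFTW_ESTIMATE);
      fftw_execute(plan);

      std::vector<double> magnitudes(bins);
      for (int i = 0; i < bins; i++) {
        magnitudes[i] = std::hypot(out[i][0], out[i][1]);
      }

      fftw_destroy_plan(plan);
      fftw_free(out);
      return magnitudes;
    }

In practice the samples would first be multiplied by a window function (e.g. Kaiser, see the caching section below), and the plan would be created once and reused across windows instead of being rebuilt per call.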

Caching

Sound Spectrum, part of the original Animation Nodes project that calculates a spectrum from audio, uses an LRU cache to store Kaiser window function results.

Caching mechanisms like this should be implemented at a larger scale in order to speed up overall execution. One of Blender’s dependencies, Boost, already provides several ready-to-use classes, e.g. boost::compute::detail::lru_cache, which can be a good start.
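
For illustration, the core of such a cache is small; the sketch below mirrors the usual list-plus-hash-map design (in the same spirit as the Boost class) and is not the final implementation:

    #include <cstddef>
    #include <list>
    #include <optional>
    #include <unordered_map>
    #include <utility>

    // Minimal LRU cache sketch: a doubly linked list tracks recency and a
    // hash map gives O(1) lookup. Assumes capacity > 0.
    template<typename Key, typename Value> class LruCache {
     public:
      explicit LruCache(size_t capacity) : capacity_(capacity) {}

      std::optional<Value> get(const Key &key)
      {
        auto it = map_.find(key);
        if (it == map_.end()) {
          return std::nullopt;
        }
        // Move the entry to the front: it is now the most recently used.
        order_.splice(order_.begin(), order_, it->second.second);
        return it->second.first;
      }

      void put(const Key &key, Value value)
      {
        auto it = map_.find(key);
        if (it != map_.end()) {
          it->second.first = std::move(value);
          order_.splice(order_.begin(), order_, it->second.second);
          return;
        }
        if (map_.size() == capacity_) {
          // Evict the least recently used entry (back of the list).
          map_.erase(order_.back());
          order_.pop_back();
        }
        order_.push_front(key);
        map_.emplace(key, std::make_pair(std::move(value), order_.begin()));
      }

     private:
      size_t capacity_;
      std::list<Key> order_;
      std::unordered_map<Key, std::pair<Value, typename std::list<Key>::iterator>>
          map_;
    };

A cache like this could, for example, store Kaiser window tables keyed on window length, or whole FFT frames keyed on the window index within a sound.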

FFT Result Storing

General audio processing software and DAWs usually generate a peak file for each audio file to speed up waveform drawing, e.g. REAPER’s .reapeaks files:

A similar method can be used for storing calculated FFT results. The exact file format will be designed later in the coding period.
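
Purely to illustrate the idea, such a sidecar file could consist of a small header followed by contiguous magnitude frames. Every field below is hypothetical; the real format will be designed during the coding period:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Hypothetical on-disk layout for cached FFT frames, analogous to a
    // peak file. A real format would also pin down endianness and padding.
    struct FftCacheHeader {
      char magic[4];         // e.g. "BFFT"
      uint32_t version;      // Format version, for forward compatibility.
      uint32_t fft_size;     // Samples per analysis window.
      uint32_t hop_size;     // Samples advanced between windows.
      uint32_t channels;     // Number of audio channels.
      uint32_t frame_count;  // Number of stored FFT frames.
      uint32_t sample_rate;  // In Hz, so frames map back to timestamps.
    };

    // After the header: frame_count frames, each holding
    // channels * (fft_size / 2 + 1) float magnitudes, stored contiguously.
    bool write_fft_cache(FILE *fp, const FftCacheHeader &header,
                         const std::vector<float> &magnitudes)
    {
      if (std::fwrite(&header, sizeof(header), 1, fp) != 1) {
        return false;
      }
      return std::fwrite(magnitudes.data(), sizeof(float), magnitudes.size(),
                         fp) == magnitudes.size();
    }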

Project Schedule

Week 0: Bug fixes related to Geometry Nodes, RNA/DNA, and media processing
Weeks 1-2: Basic functionality of the Sample Sound node: gain control, playback progress control, channel selection, time-domain information (overall amplitude)
Weeks 3-5: Full implementation of the Sample Sound node: temporal smoothing, frequency-domain information (FFT), and the caching mechanism
Week 6: Midterm evaluation
Weeks 7-8: Implementation of the Sound socket type
Week 9: Implementation of the Sound Input node
Weeks 10-11: Documentation, refinement, and bug fixes
Week 12: Final evaluation
Week 13: Padding

References

  • Sound Spectrum: One of the nodes from the original Animation Nodes system. It already provides nearly all the features this project plans to implement, and is therefore a very valuable reference.

  • Sound Nodes: An add-on that enables users to analyze and make use of sound clips. Its functionality is based on pre-applied keyframes, which is not as flexible as real-time calculation. Despite this limitation, it can still serve as a reference for the bigger picture.

  • Mic Visualization System: A real-time microphone visualization system built on UPBGE. Although real-time input is not among the current deliverables, it can inspire possible future extensions.

Bio

I am Lleu Yang, an undergraduate Computer Science student from China, and also a hobbyist music producer and visual designer.

I have a solid grasp of Blender’s usage, and have made several 3D stills and movies using this amazing software.

I am proficient in C and Python, fluent enough in C++, and have been consistently learning modern C++ features and Rust.

I have prior experience in audio/video processing using FFmpeg, and extensive knowledge of common audio/video codecs.

I have previously worked on signal-processing projects that utilize FFT, e.g. EEG analysis.

Here are my contributions to Blender so far:


Nice!

I have done raw input in UPBGE using a thread / Python before.

It would be interesting to have that as well (system audio in), for stuff like real-time virtual set extension.


Very exciting! Any ideas how the Sample Sound node could work with fields? In your mockup I assume the node outputs a single amplitude value for each Sound Input given a specified frequency. I can see how one would want to split a given sound input into several amplitude values, to have geometry respond to them separately. Just food for thought; I am not particularly well versed in motion graphics but am trying to think of various use cases. Good luck!

Wondering, why use FFmpeg plus some external FFT library instead of Audaspace? Audaspace is the audio library that Blender already uses for all things audio related, and it does file reading, mixing, etc. Plus (I think?) it can do FFT too.


Audaspace uses fftw3 internally AFAIK, which ships in both float and double flavors in our standard library set. It’s also used in the ocean modifier, so it’s already integrated into the build system, ready to use.


Some current limitations of the design, judging from this mockup:

  1. Built-in nodes cannot have menu sockets. This is related to the fact that multiple menus cannot be joined from multiple sources; so if someone wants to use two or more Sample Sound nodes together, the menus will have to be connected to an Index Switch and controlled that way…
  2. I am not sure about using two new kinds of input data: Speaker and Audio File. I have used OpenAL only once, so I am not really an expert in this area, but I am fairly sure that it is enough to just input positions in space to sample sound. If an object is used as a container for such data, it becomes too hard to sample sound for Speakers generated by geometry nodes (geometry nodes cannot generate objects).
  3. Same for Audio. I am not sure what this is, but it might be redundant to have a container of sound data per time/location; instead, this should be the kind of data that is used to sample sound. But this might depend on technical details…?

All these details can be resolved in Blender, but that might be too complicated, so it may be better to solve them on the design side instead.


@Jacob_Merrill Nice work! Real-time audio input is definitely worth checking out. Will consider adding this feature to the deliverables.

@Hadriscus It works exactly like that! A very common usage would be visualizing an audio spectrum: you can create a series of evenly distributed points, connect them to an Instance on Points node, then manipulate the scale of each instance (typically a cube) by specifying a range in the Sample Sound node’s Frequency option. Further exploration is all in your hands :wink:

@aras_p @LazyDodo Thank you both for suggesting Audaspace! I was previously not aware of it, so I simply decided to combine FFmpeg with an external FFT library. I will take a deeper look into Audaspace and reconsider the project details.

@modmoderVAAAA Thank you very much for this detailed feedback! The following is what I’ve considered so far:

  1. The idea of having a menu socket instead of an integer input is that the FFT size should be a power of 2 (or of 3 or 5, if the library supports them), so only a fixed set of values is valid. If menu sockets cannot be used in built-in nodes, I believe there should be another way to constrain input values?
  2. If a position is all we need to sample sound from a scene, then there is definitely no need for an object input; a position input should be enough. Will change.
  3. The Audio container is primarily designed for audio data read from files. Although we could technically use a raw sequence to represent audio data, that is not very convenient, since audio files may have various specifications, i.e. different numbers of channels, sample rates, etc. I guess it could also help with sounds sampled directly from the scene, but I need to take a closer look at OpenAL-related things.

I am mainly interested in whether it is possible to just use a time point value, a distance value, and an amplitude channel value (all floating-point input fields) as the way to sample a single float of sound volume. There is no way to play a song to a speaker in geometry nodes, so I don’t know whether it really matters to have such detailed information for a time point, like FFT size or the other things I don’t know about.


I love the idea. Another node that could be very useful would be one for handling MIDI files. Currently, to process MIDI files you have to use Python or other alternatives.


I see. After looking at the code of Audaspace, I have discovered that your point is totally valid.

If we are sampling sound from the scene, it would be perfectly fine to just use the three inputs you mentioned. When a frequency is specified, varying amounts of latency may be introduced depending on the FFT size, since the FFT needs time to finish its calculation; but that is definitely not a reason to pack a whole bunch of extra information together. We can do simple buffering to get over it.

There is another thing: I just realized that Blender already has a Sound data type…which contains just about everything I was going to put into the Audio type :sweat_smile: That means the Audio type is certainly redundant. Sorry for the confusion, and I will redesign the node right away. Thank you!

The Sound data-block is still valid as a new type of socket: Blender will keep the sound file open, and geometry nodes will only need to care about sampling it.


Just wanted to share that we created a similar node for Animation Nodes with some extra important features like temporal smoothing; maybe the implementation or the design could be useful for you. We used the FFT in NumPy.

Example usage:

https://twitter.com/OmarEmaraDev/status/1084881010234404864

Documentation:

Code:


Thank you very much for creating & sharing this! It would definitely be of great help! :hugs: