GSoC 2024 Proposal: Sample Sound Node

megakite · March 23, 2024, 9:18am

Update 4/2/2024: Revision 3 (GSoC version).

Project Title

Sample Sound Node

Name

Lleu Yang

Contact

E-mail: [email protected]

Blender developer account: @megakite

Blender chat: Lleu Yang

Synopsis

This project adds a Sample Sound node that retrieves audio from sound files, and provides their frequency response over time for use in Geometry Nodes.

Benefits

The built-in Geometry Nodes system currently lacks a way to retrieve sounds and generate useful information from them. This makes certain kinds of tasks (e.g. music visualization) hard to accomplish using vanilla Blender.

Therefore, this project aims to:

Provide the ability to retrieve sounds from files in Geometry Nodes,
Generate amplitude/frequency response information based on several customizable parameters, and
Be written in native C++ with caching/proxy operations to speed up execution.

This project will empower creators to easily access useful information about sounds, which opens up countless possibilities in their creative works. The whole Geometry Nodes system will also benefit from this project, since it gains a brand new dimension: the dimension of audibles.

Deliverables

A new socket type in Geometry Nodes called Sound that corresponds to Blender’s data-block type of Sound;
A Sound Input node that can read a single Sound from Sounds in data-blocks;
A Sample Sound node that takes a Sound, which then goes through several tunable internal processes, including:
- Gain control,
- Playback progress (sample time) control,
- Temporal smoothing,
- Audio channel selection,
- Frequency specification (FFT).
The Sound will finally be converted to a corresponding amplitude/power value (as Float) using the options above.
A series of usage examples and documentation related to all the deliverables listed above.
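
As a rough illustration of the time-domain part of the Sample Sound deliverable, the sketch below picks a channel, applies gain, and returns the RMS amplitude of a short window at a given sample time. It is plain Python rather than the planned C++ implementation, and the function name, its parameters, and the interleaved sample layout are assumptions made for illustration only.

```python
import math

def sample_amplitude(samples, channels, sample_rate, time_s,
                     channel=0, gain=1.0, window=1024):
    """Return the RMS amplitude of one channel, starting at time_s.

    samples is assumed to be a flat list of interleaved floats
    (hypothetical layout: frame 0 ch 0, frame 0 ch 1, frame 1 ch 0, ...).
    """
    if not 0 <= channel < channels:
        raise ValueError("channel out of range")
    frame_count = len(samples) // channels
    start = max(0, int(time_s * sample_rate))
    end = min(frame_count, start + window)
    if end <= start:
        return 0.0  # sampling outside the sound yields silence
    # Channel selection: step through the interleaved buffer, applying gain.
    picked = [samples[f * channels + channel] * gain for f in range(start, end)]
    # Time-domain amplitude: root mean square of the window.
    return math.sqrt(sum(x * x for x in picked) / len(picked))

# Stereo test signal: left is a 440 Hz sine, right is silence.
rate = 8000
stereo = []
for n in range(rate):
    stereo += [math.sin(2 * math.pi * 440 * n / rate), 0.0]

left = sample_amplitude(stereo, 2, rate, 0.25, channel=0)   # about 0.707
right = sample_amplitude(stereo, 2, rate, 0.25, channel=1)  # 0.0
```

Sampling past the end of the sound returns silence here instead of raising an error, which seems convenient for animation, but that is a design choice of this sketch and not something the proposal specifies.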

Project Details

User Interface

Below is a simple user interface mockup of Sound Input and Sample Sound node:

And one of their possible use cases, i.e. visualizing spectrum for a given Sound:

Details may be further determined during the coding period.

Sound Socket

Blender’s Sound data-block struct bSound already packs up all the data needed for audio processing, e.g. channel count and sample rate. Thus, the Sound socket should be a relatively thin wrapper around the Sound data-block type.

Library for Audio Processing

Blender uses Audaspace as its all-purpose audio engine. It already provides nearly all the operations that this project will make use of, for example:

Retrieving audio from a sound data-block through aud::SoftwareDevice, which supports various mixing operations as well as 3-D audio processing,
Directly reading sound samples through aud::SequenceReader, and
Performing FFT calculations using FFTPlan (which is a wrapper around fftw3).

Actual functionality of this project will be primarily based on Audaspace.
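
To make the FFT step concrete, here is a minimal sketch of what "amplitude at a frequency" could mean for one window of samples. It does not use Audaspace or fftw3: the small radix-2 FFT is a pure-Python stand-in, and the names `fft` and `amplitude_at` as well as the scaling are assumptions for illustration.

```python
import cmath
import math

def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def amplitude_at(samples, sample_rate, frequency):
    """Amplitude of the FFT bin closest to `frequency` (in Hz)."""
    n = len(samples)
    spectrum = fft(samples)
    # Bin k covers the frequency k * sample_rate / n.
    bin_index = round(frequency * n / sample_rate)
    bin_index = min(max(bin_index, 0), n // 2)
    # Scale so that a full-scale sine sitting exactly on a bin reads about 1.0.
    return 2.0 * abs(spectrum[bin_index]) / n

rate, size = 8000, 1024
tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(size)]
peak = amplitude_at(tone, rate, 1000)   # bin 128, exactly on the tone
quiet = amplitude_at(tone, rate, 3000)  # far away from the tone
```

With these numbers the 1000 Hz tone lands exactly on bin 128, so `peak` is close to 1.0 and `quiet` is close to 0. A tone between two bins would leak into its neighbours, which is where a window function becomes useful.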

Caching

Sound Spectrum, which is part of the original Animation Nodes project that calculates spectrum from audio, uses an LRU cache to store Kaiser window function results.

Caching mechanisms like this should be implemented on a larger scale in order to speed up overall execution. One of Blender’s dependencies, Boost, already provides several ready-to-use classes, e.g. boost::compute::detail::lru_cache, which can be a good start.
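
The idea of caching window function results can be sketched in a few lines of Python with `functools.lru_cache`. This is only an illustration of the technique, not the Animation Nodes code and not the planned C++ implementation; `bessel_i0`, `kaiser_window`, and the cache size of 32 are hypothetical choices.

```python
import math
from functools import lru_cache

def bessel_i0(x, terms=25):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

@lru_cache(maxsize=32)
def kaiser_window(size, beta):
    """Kaiser window as a tuple, cached per (size, beta) pair."""
    if size == 1:
        return (1.0,)
    denom = bessel_i0(beta)
    window = []
    for n in range(size):
        ratio = 2.0 * n / (size - 1) - 1.0  # runs from -1 to 1
        inner = math.sqrt(max(0.0, 1.0 - ratio * ratio))
        window.append(bessel_i0(beta * inner) / denom)
    return tuple(window)

first = kaiser_window(1024, 14.0)   # computed once
second = kaiser_window(1024, 14.0)  # served from the cache
info = kaiser_window.cache_info()
```

Returning a tuple keeps the cached value immutable, so one caller cannot accidentally modify the window that another caller receives later.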

FFT Result Storing

General audio processing software and DAWs usually generate a peak file for each audio file so that waveforms can be drawn faster, e.g. REAPER’s .reapeaks file:

A similar method can be used for storing calculated FFT results. The detailed file format may be designed later in the coding period.
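
Since the file format is still undecided, the following is only a sketch of what such a sidecar file could look like: a small header followed by fixed-size frames of float32 magnitudes, so that a single frame can be read with one seek. The `SPEC` tag, the header layout, and both function names are hypothetical and do not describe any existing Blender or REAPER format.

```python
import io
import struct

MAGIC = b"SPEC"  # hypothetical 4-byte tag, not an existing format
HEADER = struct.Struct("<4sIII")  # magic, sample rate, bins per frame, frame count

def write_spectra(stream, sample_rate, frames):
    """Write equally sized frames of float magnitudes to a binary stream."""
    bins = len(frames[0]) if frames else 0
    stream.write(HEADER.pack(MAGIC, sample_rate, bins, len(frames)))
    for frame in frames:
        if len(frame) != bins:
            raise ValueError("all frames must have the same number of bins")
        stream.write(struct.pack(f"<{bins}f", *frame))

def read_frame(stream, index):
    """Seek directly to one frame, so the whole file need not be loaded."""
    stream.seek(0)
    magic, sample_rate, bins, count = HEADER.unpack(stream.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a spectrum cache")
    if not 0 <= index < count:
        raise IndexError("frame index out of range")
    # Each magnitude is a 4-byte float32, so frames sit at fixed offsets.
    stream.seek(HEADER.size + index * bins * 4)
    return list(struct.unpack(f"<{bins}f", stream.read(bins * 4)))

buffer = io.BytesIO()
write_spectra(buffer, 48000, [[0.0, 0.5, 1.0], [0.25, 0.75, 0.125]])
frame = read_frame(buffer, 1)
```

An in-memory `io.BytesIO` stands in for a real file here; the same functions would work on a file opened in binary mode.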

Project Schedule

Week	Task
0	Bugfix related to Geometry Nodes, RNA/DNA, and media processing
1-2	Basic functionality of Sample Sound node: gain control, playback progress control, channel selection, time domain information (overall amplitude)
3-5	Full implementation of the Sample Sound node: temporal smoothing, frequency domain information (FFT), and caching mechanism
6	Midterm evaluation
7-8	Implementation of the Sound socket type
9	Implementation of the Sound Input node
10-11	Documentation, refinement and bugfix
12	Final evaluation
13	Padding

References

Sound Spectrum: One of the nodes from the original Animation Nodes system. It already provides nearly all features that this project plans to implement, and therefore is a very valuable reference for this project.
Sound Nodes: An add-on that enables users to analyze and make use of sound clips. Its functionality is based on pre-applied keyframes, which is not as flexible as real-time calculation. Despite this limitation, it can still be a possible reference for the bigger picture.
Mic Visualization System: A real-time microphone visualization system utilizing UPBGE. Although real-time input is not listed in current deliverables, it can be inspiring for possible extension in the future.

Bio

I am Lleu Yang, an undergraduate student in Computer Science from China, and also a hobbyist music producer and visual designer.

I have a solid grasp of Blender’s usage, and have made several 3D stills and movies using this amazing software.

I am proficient in C and Python, fluent enough in C++, and have been consistently learning modern C++ features and Rust.

I have prior experience in audio/video processing using FFmpeg. I also have extensive knowledge in common audio/video codecs.

I have previously worked on some projects related to signal processing that utilize FFT, e.g. EEG analysis.

Here are my contributions to Blender so far:

Jacob_Merrill · March 23, 2024, 2:25pm

nice!

I have done raw input in upbge using a thread / py before

would be interesting to have that as well - (system audio in) - for stuff like realtime virtual set extension

Hadriscus · March 23, 2024, 5:25pm

Very exciting! Any ideas how the Sample Sound node could perhaps work with fields? In your mockup I assume the node outputs a single amplitude value for each Sound Input given a specified frequency. I can see how one would want to split up a given sound input into several amplitude values, to have geometry respond to them separately. Just food for thought, I am not particularly well versed in motion graphics but trying to think of various use cases. Good luck!

aras_p · March 24, 2024, 8:10am

Wondering, why use ffmpeg plus some external FFT library, instead of Audaspace? Audaspace is the audio library that blender uses for all things audio related already, and it does file reading, mixing, etc. etc. Plus (I think?) it can do FFT too.

LazyDodo · March 24, 2024, 8:19am

Audaspace uses fftw3 internally afaik, which ships in both float and double flavors in our standard library set. It’s also used in the ocean modifier, so it’s already integrated into the build system, ready to use.

modmoderVAAAA · March 24, 2024, 9:17am

Some current limitations for the design in this mockup:

Built-in nodes cannot have a menu socket. This is related to the fact that multiple menus cannot be joined from multiple sources. So, if someone wants to use 2 or more Sample Sound nodes together, the menus will have to be connected to an Index Switch and controlled that way…
Not sure about using 2 new kinds of input data: Speaker and Audio File. I have used OpenAL only once, so I am not really an expert in this area, but I am pretty sure that it is enough to just input positions in space to sample sound. If an object is used as the container for such data, it becomes hard to sample sound for Speakers that are generated by geometry nodes (geometry nodes can’t generate objects).
Same for Audio. Not sure what this is, but it might be redundant to have a container of sound data per time/location; instead, this should be the kind of data that is used to sample sound. But this might depend on technical details…?

All these details can be resolved in Blender, but that might be too complicated, so it might be better to solve them on the design side instead.

megakite · March 25, 2024, 4:53pm

@Jacob_Merrill Nice work! Real-time audio input is definitely worth checking out. Will consider adding this feature into deliverables.

@Hadriscus It works exactly like that! A very common usage would be visualizing an audio spectrum, where you can get a series of evenly distributed points connected to an Instance on Points node, then manipulate the scale value of each instance (typically a cube) by specifying a range on the Sample Sound node’s Frequency option. Further exploration is all in your hands.

@aras_p @LazyDodo Thank you both for suggesting Audaspace! I was previously not aware of it, so I simply decided to combine FFmpeg + an external library together. I will take a deeper look into Audaspace and reconsider the project details.

@modmoderVAAAA Thank you very much for this detailed feedback! The following is what I’ve considered so far:

The idea of having a menu socket instead of an integer input is that the FFT size should be a power of 2 (or of 3 or 5, if the library supports it), so only a fixed number of values are valid. If menu sockets cannot be used in built-in nodes, I believe there should be another way to constrain input values?
If a position is all we need to sample sound from a scene, then there is definitely no need to have an object input. A position input should be enough. Will change.
The Audio container is primarily designed for audio data read from files. Although technically we can use a raw sequence to represent audio data, it is not very convenient since audio files may have various specifications, i.e. different number of channels, sample rate, etc. I guess it can also help with sounds directly sampled from scene, but I need to take a closer look at OpenAL-related things.
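
One possible way to constrain an integer input without a menu socket is to clamp it and snap it to the nearest power of two. The sketch below shows only that idea; the function names and the limits of 64 and 16384 are hypothetical, and the limits themselves are assumed to be powers of two.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0

def snap_fft_size(requested, minimum=64, maximum=16384):
    """Clamp to [minimum, maximum], then round to the nearest power of two.

    minimum and maximum are assumed to be powers of two themselves.
    Exact ties (e.g. 768) round up.
    """
    clamped = min(max(int(requested), minimum), maximum)
    lower = 1 << (clamped.bit_length() - 1)  # largest power of two <= clamped
    upper = lower << 1
    return lower if clamped - lower < upper - clamped else upper

sizes = [snap_fft_size(v) for v in (3, 700, 1000, 1024, 100000)]
# sizes == [64, 512, 1024, 1024, 16384]
```

Snapping silently changes the value the user typed, so a real node would probably want to show the effective size somewhere.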

modmoderVAAAA · March 25, 2024, 6:56pm

I am mainly interested in whether it is possible to just use a time point value, a distance value, and an amplitude channel value (all floating point input fields) as the way to sample a single float of sound volume. There is no way to play a song to a speaker in geometry nodes, so I don’t know whether it really matters to have such detailed info for a time point, like FFT size or other things that I don’t know about.

antonioya · March 26, 2024, 8:51am

I love the idea. Another node that could be very useful would be to handle MIDI files. Currently to process MIDI files you have to use python or other alternatives.

megakite · March 26, 2024, 11:52am

I see. After looking at the code of Audaspace, I have discovered that your point is totally valid.

If we are sampling sound from scene, it would be perfectly fine to just use the three inputs that you mentioned. When a frequency is specified, depending on the FFT Size, various amounts of latency may be introduced, since we need time for FFT to finish its calculation; but that is definitely not the reason we want to pack a whole bunch of extra information together. We can do simple buffering to get over it.
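
One way to put numbers on this: an FFT of N samples needs N samples of audio, which spans N / sample_rate seconds regardless of how fast the transform itself runs. The helper below is a hypothetical illustration of that arithmetic, assuming a 48 kHz sample rate.

```python
def fft_window_seconds(fft_size, sample_rate):
    """Duration of audio that one FFT of fft_size samples spans."""
    return fft_size / sample_rate

for size in (512, 2048, 8192):
    ms = 1000.0 * fft_window_seconds(size, 48000)
    print(f"{size:5d} samples at 48 kHz -> {ms:.1f} ms")
# prints 10.7 ms, 42.7 ms and 170.7 ms respectively
```

At 24 frames per second one frame lasts about 41.7 ms, so an FFT size of 2048 at 48 kHz already covers roughly a whole frame of audio.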

There is another thing: I just came to realize that Blender already has a Sound data type, which contains just about everything that I was going to put into the Audio type. That means the Audio type is certainly redundant. Sorry for the frustration and I will redesign the node right away. Thank you!

modmoderVAAAA · March 26, 2024, 11:57am

A sound data-block is still a valid new type of socket, so Blender will keep the sound file open and geometry nodes will only care about sampling this file.

OmarEmaraDev · March 26, 2024, 12:41pm

Just wanted to share that we created a similar node for Animation Nodes with some extra important features like temporal smoothing, maybe the implementation or the design could be useful for you. We used the FFT in numpy.
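
For readers unfamiliar with temporal smoothing, here is a minimal sketch of one common approach: exponential smoothing with separate attack and release factors, applied to per-frame amplitudes. It is not taken from the Animation Nodes implementation; the function name and both parameters are assumptions for illustration.

```python
def smooth(values, attack=0.6, release=0.1):
    """Exponentially smooth per-frame amplitudes.

    attack and release are blend factors in (0, 1]; a higher value follows
    the input faster. Rising input uses attack, falling input uses release.
    """
    result = []
    current = 0.0
    for value in values:
        factor = attack if value > current else release
        current += factor * (value - current)
        result.append(current)
    return result

raw = [0.0, 1.0, 1.0, 0.0, 0.0]
smoothed = smooth(raw, attack=0.5, release=0.5)
# smoothed == [0.0, 0.5, 0.75, 0.375, 0.1875]
```

A fast attack with a slow release makes bars jump up with the beat and fall back gently, which tends to look calmer than raw per-frame values.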

Example usage:

Documentation:

Code:

github.com

JacquesLucke/animation_nodes/blob/master/animation_nodes/nodes/sound/sound_spectrum.py

import bpy
import numpy
from math import expm1
from bpy.props import *
from ... utils.scene import getFPS
from ... base_types import AnimationNode
from ... data_structures import DoubleList

samplingMethodItems = [
    ("EXP", "Exponential", "Sample frequency bins exponentially", "", 0),
    ("CUSTOM", "Custom", "Sample frequency bins based on an input ranges list", "", 1),
    ("SINGLE", "Single", "Get a single frequency bin in the input frequency range", "", 2),
    ("FULL", "Full", "Get all frequency bins", "", 3)
]
reductionFunctionItems = [
    ("MEAN", "Mean", "Sample the frequency bins by computing the mean of frequency bins", "", 0),
    ("MAX", "Max", "Sample the frequency bins by computing the maximum of frequency bins", "", 1)
]
reductionFunctions = {
    "MEAN" : numpy.mean,

This file has been truncated.

github.com

JacquesLucke/animation_nodes/blob/master/animation_nodes/data_structures/sounds/sound.py

import numpy
from math import ceil, log
from functools import lru_cache
from . sound_sequence import sampleRate

class Sound:
    def __init__(self, soundSequences):
        self.soundSequences = soundSequences

    def getSamplesInRange(self, start, end):
        if end <= start: raise ValueError("Invaild range!")
        start, end = int(start * sampleRate), int(end * sampleRate)
        samples = numpy.zeros(end - start + 1)

        for sequence in self.soundSequences:
            sequenceStart = int(sequence.start * sampleRate)
            sequenceEnd = int(sequence.end * sampleRate)
            if start > sequenceEnd or end < sequenceStart: continue
            
            sequenceStartOffset = int(sequence.startOffset * sampleRate)

This file has been truncated.

megakite · March 26, 2024, 12:50pm

Thank you very much for creating & sharing this! It would definitely be of great help!
