One of the most significant opportunities to combine these areas of expertise lies in production for object-based audio, which consists of a number of audio streams together with metadata specifying how the audio is arranged to form a scene. Object-based audio cannot currently be captured directly; instead, clean audio channels are captured and then combined with spatial information at the production stage.
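As an illustration of the "streams plus metadata" idea, the sketch below models an object-based scene as clean audio streams paired with positional metadata. The class and field names (`AudioObject`, `Scene`, `azimuth_deg`, and so on) are hypothetical, chosen for this example rather than taken from any particular production format.

```python
from dataclasses import dataclass, field

@dataclass
class AudioObject:
    # One clean audio stream plus the metadata that places it in the scene.
    name: str
    samples: list[float]          # mono audio for this object
    azimuth_deg: float = 0.0      # horizontal position of the object
    elevation_deg: float = 0.0    # vertical position
    gain: float = 1.0             # per-object level

@dataclass
class Scene:
    # An object-based programme: a set of streams rendered according
    # to their metadata, rather than a fixed channel layout.
    objects: list[AudioObject] = field(default_factory=list)

# A scene with one performer placed 30 degrees to the left.
scene = Scene([AudioObject("vocalist", [0.0] * 480, azimuth_deg=-30.0)])
```

Because position and gain travel as metadata rather than being baked into a channel mix, a renderer can adapt the same scene to different loudspeaker layouts.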
We are developing techniques to capture the audio portion by applying spatial filtering and blind source separation. This work could enable a single microphone array to capture clean object audio for a number of performers or other sound sources. We are also developing multimodal audio-visual tracking approaches to estimate the metadata portion of object-based audio automatically. Combined, these approaches could remove interfering audio from a mixture and initialise the spatial position of an audio object, providing a starting point for producers working on 3D audio content.
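To make the spatial-filtering idea concrete, here is a minimal delay-and-sum beamformer sketch: a uniform linear array of microphones is steered towards a source direction by undoing each channel's propagation delay before averaging. All parameters (array geometry, sample rate, source angle) are illustrative assumptions, and the simulation assumes a far-field plane wave in a free field; it is not the specific method described above.

```python
import numpy as np

fs = 16000.0               # sample rate (Hz) - assumed
c = 343.0                  # speed of sound (m/s)
d = 0.05                   # microphone spacing (m) - assumed
M = 8                      # number of microphones - assumed
theta = np.deg2rad(30.0)   # true source direction

t = np.arange(1024) / fs
source = np.sin(2 * np.pi * 500.0 * t)  # a 500 Hz tone as the source

def delay(x, tau):
    # Apply a (fractional) time delay as a phase shift in the frequency domain.
    f = np.fft.rfftfreq(len(x), 1 / fs)
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * f * tau), len(x))

# Simulate what each microphone hears: the tone with a direction-dependent delay.
mics = np.stack([delay(source, m * d * np.sin(theta) / c) for m in range(M)])

def beamform(channels, steer):
    # Advance each channel to undo the assumed propagation delay, then average,
    # so signals from the steered direction add coherently.
    taus = np.arange(M) * d * np.sin(steer) / c
    return np.mean([delay(x, -tau) for x, tau in zip(channels, taus)], axis=0)

on_target = beamform(mics, np.deg2rad(30.0))    # steered at the source
off_target = beamform(mics, np.deg2rad(-60.0))  # steered away from it
```

Steering at the true direction preserves the tone while sounds from other directions add incoherently and are attenuated, which is the sense in which an array can deliver a cleaner per-object signal.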