Audio-Visual DSP for spatial audio object localisation and separation

Two core challenges of Stream 3 are enabling object-based 3D spatial audio recording and production, and exploiting audio-visual information to achieve the advance in machine perception required for object-based audio representation. The computer vision expertise is also being used to track listeners, thereby advancing spatial audio reproduction technologies. The research fellows’ interests cover a broad range of topics, including acoustics, spatial audio, blind source separation, machine audition and computer vision.

One of the most significant opportunities to combine this expertise is in production for object-based audio, which consists of a number of audio streams together with metadata specifying how they are arranged to form a scene. Currently, object-based audio cannot be captured directly. Instead, clean audio channels are captured and then combined with spatial information at the production stage.
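
To make this representation concrete, the Python sketch below models a scene as a list of objects, each pairing a clean audio stream with the rendering metadata. It is a minimal illustration under our own naming, not a published S3A or broadcast metadata format.

    from dataclasses import dataclass

    import numpy as np

    @dataclass
    class AudioObject:
        # One object in the scene: a clean audio stream plus the
        # metadata a renderer needs to place it in 3D space.
        samples: np.ndarray      # mono waveform, shape (n_samples,)
        sample_rate: int         # Hz
        azimuth_deg: float       # horizontal angle to the source
        elevation_deg: float     # vertical angle to the source
        distance_m: float        # distance from the listening position
        gain: float = 1.0

    # A scene is a collection of objects; the renderer, not the
    # recording, decides how they are combined for a given loudspeaker
    # or headphone layout.
    scene = [
        AudioObject(np.zeros(48000), 48000, 30.0, 0.0, 2.0),
        AudioObject(np.zeros(48000), 48000, -45.0, 10.0, 3.5),
    ]

Because the metadata lives alongside the audio rather than being baked into it, the same scene can be re-rendered for any reproduction system.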
 
We are developing techniques to capture the audio portion by applying spatial filtering and blind source separation. This work could enable a single microphone array to capture clean object audio for a number of performers or other sound sources. We are also developing multimodal audio-visual tracking approaches to estimate the metadata portion of the object-based representation automatically. Combined, these approaches could remove interfering audio from a mixture and initialise the spatial position of an audio object, giving producers a starting point for 3D audio content.
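
As a concrete example of the spatial filtering step, the Python sketch below implements a basic frequency-domain delay-and-sum beamformer: each channel of a microphone array is delayed so that sound from a chosen direction adds coherently, attenuating interferers arriving from elsewhere. This is a textbook baseline rather than the project's actual algorithm, and the far-field assumption, array geometry and steering direction are illustrative; in the combined system described above, the steering direction would be supplied by the audio-visual tracker.

    import numpy as np

    SPEED_OF_SOUND = 343.0  # metres per second

    def delay_and_sum(mic_signals, mic_positions, steer_direction, fs):
        # mic_signals:     (n_mics, n_samples) captured waveforms
        # mic_positions:   (n_mics, 3) microphone coordinates in metres
        # steer_direction: unit vector pointing from the array to the source
        # fs:              sample rate in Hz; assumes a far-field source
        n_mics, n_samples = mic_signals.shape
        # Relative arrival times across the array for a plane wave from
        # steer_direction; compensating aligns the target on every channel.
        delays = mic_positions @ steer_direction / SPEED_OF_SOUND
        delays -= delays.min()  # shift so all delays are non-negative
        freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
        output = np.zeros(n_samples)
        for signal, tau in zip(mic_signals, delays):
            # A fractional-sample delay applied as a phase ramp in frequency.
            spectrum = np.fft.rfft(signal) * np.exp(-2j * np.pi * freqs * tau)
            output += np.fft.irfft(spectrum, n=n_samples)
        return output / n_mics  # coherent average favours the steered source

Blind source separation goes further by estimating the mixing itself rather than assuming a known geometry, and the tracked source position can double as the initial position metadata for the corresponding audio object.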

Stream 3 is based at the University of Surrey, with Dr Philip Jackson leading the team in CVSSP (Luca Remaggi, Hansung Kim, Will Morris, Sandeep Chitreddy and Qingju Liu). Research is focused on audio-visual DSP for spatial audio object localisation and separation.

S3A is funded by the Engineering and Physical Sciences Research Council (EPSRC).
Programme Grant Scheme – Grant Ref: EP/L000539/1
© Copyright 2018 S3A