A sound source’s complex interaction with its surroundings causes reflections and reverberation that provide cues to the presence of walls and obstacles, the size of the room and the distance of the source from the listener. The room acoustics also support and colour the sound, an effect exploited artistically to enhance sound quality, maintain scene continuity and increase immersion using natural and artificial reverbs. S3A research has pioneered object-based reverb, proposing methods to capture, encode and reproduce this effect by combining an understanding of what matters for timbral and spatial sound quality with audio processing algorithms into a compound object description of reverb.
The most commonly used dialogue object is composed of a clean speech signal and associated metadata such as positional information. However, in non-ideal recording environments, e.g. in the presence of interfering speakers and background noise, the dialogue object may be corrupted. It is therefore necessary to extract the dialogue object from the recorded sound scene: speech enhancement techniques such as blind source separation are applied to microphone recordings to extract dry speech signals, and person-tracking technologies are applied to audio and/or video recordings to obtain the 3D position of the speaker associated with a particular dialogue object.
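The composite nature of the dialogue object described above can be sketched as a small data structure. This is an illustrative sketch only: the field names are assumptions for the example, not a published S3A object schema.

```python
from dataclasses import dataclass

@dataclass
class DialogueObject:
    """Minimal sketch of a dialogue object: a clean speech signal plus metadata."""
    speech: list          # dry speech samples after enhancement / source separation
    sample_rate: int      # sampling rate in Hz
    position: tuple       # (x, y, z) in metres, from audio and/or video speaker tracking
    speaker_id: str = "unknown"

# Example: wrap an extracted speech signal and a tracked speaker position
obj = DialogueObject(speech=[0.0, 0.1, -0.05], sample_rate=48000,
                     position=(1.5, 0.0, 1.2), speaker_id="speaker_A")
```

Keeping the enhanced signal and the tracked position in one object lets a renderer treat the dialogue as a single editable entity.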
Being able to estimate the acoustics of enclosed environments and describe them via parameters is fundamental for spatial audio production and reproduction. With a suitable set of parameters, the concept of a spatial audio object, which is typically limited to either dry sound or a diffuse field, can be extended to allow an editable spatial description of the room acoustics. Here, I present a method for parameterizing measured room impulse responses (RIRs). The proposed set of parameters describes the recorded RIRs through their three main components (i.e. direct sound, early reflections and late reflections), in both the time and frequency domains.
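The three-way split described above can be sketched as follows. The time thresholds (a few milliseconds around the direct sound, roughly 80 ms for early reflections) and the octave-band edges are common illustrative choices, not the specific parameter set proposed in the talk.

```python
import numpy as np

def parameterize_rir(rir, fs, early_ms=80.0):
    """Split a measured RIR into direct / early / late parts and describe each
    by its energy in octave-wide frequency bands (illustrative thresholds)."""
    onset = int(np.argmax(np.abs(rir)))             # direct-sound arrival sample
    direct_end = onset + int(0.005 * fs)            # ~5 ms window for the direct sound
    early_end = onset + int(early_ms / 1000 * fs)   # early reflections up to ~80 ms

    segments = {"direct": rir[:direct_end],
                "early": rir[direct_end:early_end],
                "late": rir[early_end:]}

    # Frequency-domain descriptor: energy per band (assumed octave bands)
    edges = [125, 250, 500, 1000, 2000, 4000, 8000]
    freqs = np.fft.rfftfreq(len(rir), 1 / fs)

    def band_energy(x):
        spec = np.abs(np.fft.rfft(x, n=len(rir))) ** 2
        return [float(spec[(freqs >= lo) & (freqs < hi)].sum())
                for lo, hi in zip(edges[:-1], edges[1:])]

    return {name: band_energy(seg) for name, seg in segments.items()}
```

Each component is thus reduced to a small set of band energies that can be stored, edited and re-synthesised as part of an object-based description.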
A plausible interactive audio reproduction can be achieved by understanding the scene geometry and its materials, but acoustic measurements are sometimes difficult to obtain for a given environment. In this talk, a simple and efficient system is introduced that estimates room acoustics for plausible reproduction of spatial audio using a pair of 360° cameras. The full surrounding scene is captured by the two 360° cameras, and a 3D scene is reconstructed via depth estimation and semantic object segmentation. Acoustic properties are then assigned to the segmented objects, and the acoustic scene is rendered on a VR platform.
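The final step above, assigning acoustic properties to segmented objects, can be sketched as a lookup from semantic labels to absorption coefficients, from which a simple Sabine estimate of reverberation time follows. The labels and coefficient values here are illustrative assumptions, not the material database used in the talk.

```python
# Hypothetical mapping from semantic segmentation labels to
# frequency-averaged absorption coefficients (illustrative values).
ABSORPTION = {
    "concrete_wall": 0.02,
    "carpet": 0.30,
    "curtain": 0.45,
    "glass_window": 0.05,
    "sofa": 0.55,
}

def mean_absorption(surfaces):
    """Area-weighted mean absorption for (label, area_m2) surfaces;
    unknown labels fall back to an assumed default of 0.1."""
    total_area = sum(area for _, area in surfaces)
    absorbed = sum(ABSORPTION.get(label, 0.1) * area for label, area in surfaces)
    return absorbed / total_area

def sabine_rt60(volume_m3, surfaces):
    """Sabine reverberation-time estimate from the reconstructed geometry."""
    total_area = sum(area for _, area in surfaces)
    return 0.161 * volume_m3 / (total_area * mean_absorption(surfaces))
```

With the room volume from depth estimation and the surface list from segmentation, such an estimate can drive a plausible (if approximate) reverb in the VR renderer.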
Source distance perception in spatial sound reproduction has been less explored than the angular dimension. In closed room environments, perception of both of these spatial dimensions relies on reverberant cues. One focus of the S3A project is to explore the source distance dimension by capturing room reverberation information, parameterizing it, and reproducing it in an object-based reverb framework. Applying the Reverberant Spatial Audio Object (RSAO) across various source distances, room conditions and source types enabled us to understand how humans perceive source distance. Subjective tests using RSAO-based reproduction allowed us to identify the just noticeable differences (JNDs) in distance perception and also confirmed the nonlinear nature of human distance perception.