Listener-centred reproduction

4.1. The VISR Framework

The VISR (Versatile Interactive Scene Renderer) is a software framework for general-purpose, multichannel and spatial audio processing, created in the S3A project. It enables fast prototyping and efficient processing of many spatial reproduction techniques and complex object-based rendering schemes. Its modular design allows processing components to be reused flexibly in different settings, including standalone real-time renderers, offline simulations, Max/MSP externals and DAW plugins. Example applications include the S3A object-based rendering system, the binaural synthesis toolkit, adaptive array processing for sound bars, and semantically informed rendering. The VISR framework has been publicly released under an open source license, making the object-based technology created in S3A available to researchers and software creators and fostering reproducible research.

4.2. Adaptive cross-talk cancellation

Cross-talk cancellation (CTC) is a method of reproducing true binaural 3D audio with a small number of loudspeakers placed in front of a listener. Unlike reproduction methods that need many loudspeakers around the listener to provide surround sound, CTC can deliver realistic immersion, including distance cues, from a compact sound system. The traditional drawback of CTC is a very narrow sweet spot: the effect breaks down when the listener moves away from the position the system was designed for. To overcome this limitation, the S3A programme developed technology for listener-adaptive CTC. Video-based sensors track the instantaneous position of the listeners, and the CTC control filters are adapted in real time, ensuring full 3D immersion regardless of the listeners’ position. The S3A programme has built prototypes to test CTC in various applications (gaming, laptop, Hi-Fi), showing that this technology can be an alternative to current surround reproduction methods.
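The idea of adapting the cancellation filters to the tracked listener position can be illustrated with a minimal single-frequency sketch. It is not the S3A implementation: it assumes free-field point-source loudspeakers, a head modelled as two points 16 cm apart, and a Tikhonov-regularised matrix inverse as the filter design. All function names and parameters (`ear_positions`, `plant_matrix`, `ctc_filters`, `crosstalk`, `beta`) are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def ear_positions(head_centre, ear_spacing=0.16):
    """Hypothetical head model: two ear points on a line parallel to the x axis."""
    centre = np.asarray(head_centre, dtype=float)
    offset = np.array([ear_spacing / 2.0, 0.0])
    return np.stack([centre - offset, centre + offset])


def plant_matrix(speakers, ears, freq):
    """Free-field transfer matrix at one frequency (rows: ears, columns: loudspeakers)."""
    k = 2.0 * np.pi * freq / SPEED_OF_SOUND
    dist = np.linalg.norm(ears[:, None, :] - speakers[None, :, :], axis=-1)
    return np.exp(-1j * k * dist) / (4.0 * np.pi * dist)


def ctc_filters(plant, beta=1e-9):
    """Tikhonov-regularised inverse of the plant matrix (beta is an assumed setting)."""
    gram = plant @ plant.conj().T
    return plant.conj().T @ np.linalg.inv(gram + beta * np.eye(plant.shape[0]))


def crosstalk(plant, filters):
    """Largest off-diagonal magnitude of the combined ear response."""
    response = plant @ filters
    return np.abs(response - np.diag(np.diag(response))).max()


# Two loudspeakers 20 cm apart; listener 1 m away, then moved 30 cm sideways.
speakers = np.array([[-0.1, 0.0], [0.1, 0.0]])
design = plant_matrix(speakers, ear_positions([0.0, 1.0]), 1000.0)
moved = plant_matrix(speakers, ear_positions([0.3, 1.0]), 1000.0)

stale = ctc_filters(design)    # filters designed for the original position
adapted = ctc_filters(moved)   # filters recomputed from the tracked position

print("cross-talk with stale filters:  ", crosstalk(moved, stale))
print("cross-talk with adapted filters:", crosstalk(moved, adapted))
```

In this toy model the filters designed for the original position leak strongly into the opposite ear once the listener moves, while recomputing them from the new position restores the cancellation. A practical system would do this per frequency bin with measured or modelled head-related transfer functions.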

4.3. Compensated Amplitude Panning

One challenge of reproduction with compact loudspeaker systems is producing images behind the listener, on the side opposite the loudspeakers. Static systems of this kind cannot produce stable rear images because the dynamic cues caused by small head movements point to the front. However, tracking the head orientation makes it possible to modify the loudspeaker feeds dynamically so that they produce accurate dynamic cues. Compensated Amplitude Panning (CAP) is a method that achieves this in the frequency band where the dominant directional cues are most active. CAP is simple and efficient, and can provide a fully immersive experience using two loudspeakers. It can directly play back object-based content, channel-based content such as 7.0, and Ambisonic content like that used in 360 video. Furthermore, images can be placed at precise positions that the listener can walk around, a form of audio augmented reality.
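How head tracking can separate front and rear images is easiest to see in a toy calculation. The sketch below is an illustration under stated assumptions, not the published CAP algorithm: it assumes a simple low-frequency model in which the interaural cue follows the projection of the gain-weighted loudspeaker direction onto the listener's interaural axis, and it solves for two gains that sum to one. The function names (`unit`, `compensated_gains`) and the angle conventions are hypothetical.

```python
import numpy as np


def unit(azimuth_deg):
    """Unit vector in the horizontal plane (x forward, y left, angles anticlockwise)."""
    a = np.radians(azimuth_deg)
    return np.array([np.cos(a), np.sin(a)])


def compensated_gains(speaker_az_deg, image_az_deg, head_yaw_deg):
    """Two loudspeaker gains matching the assumed low-frequency cue of the image.

    Assumed model (not taken from the source): the cue is the projection of
    sum(g_i * r_i) / sum(g_i) onto the interaural axis, which rotates with the head.
    """
    ear_axis = unit(head_yaw_deg + 90.0)  # points towards the listener's left ear
    speaker_proj = np.array([unit(az) @ ear_axis for az in speaker_az_deg])
    image_proj = unit(image_az_deg) @ ear_axis
    # Solve g1*s1 + g2*s2 = image_proj together with g1 + g2 = 1.
    # The system is singular when both loudspeakers project equally onto the
    # interaural axis (e.g. head turned by 90 degrees for a symmetric pair).
    system = np.array([speaker_proj, [1.0, 1.0]])
    return np.linalg.solve(system, np.array([image_proj, 1.0]))


speakers = [30.0, -30.0]  # conventional stereo pair, left then right
for yaw in (0.0, 10.0):
    front = compensated_gains(speakers, 0.0, yaw)
    rear = compensated_gains(speakers, 180.0, yaw)
    print(f"yaw {yaw:5.1f} deg  front image gains {front}  rear image gains {rear}")
```

With the head facing forward, the front and rear images call for the same gains, which is the static front-back ambiguity. Once the head turns, the two images need different gains, and that difference is the dynamic cue a tracked system can supply. Gains from this model may become negative or exceed one for some head orientations.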

4.4. Generalised framework for multi-loudspeaker amplitude panning

While amplitude panning techniques such as VBAP are widely used for practical spatial sound reproduction, they are typically pragmatic extensions of two-channel panning laws. We introduce a theoretical framework that describes multi-loudspeaker panning in terms of convex optimisation, and show how advantageous properties of amplitude panning, such as locality, sparsity, and nonnegativity of gains, are reflected in a convex optimisation formulation. This framework is then used to create new panning algorithms. Here we describe extensions that reproduce audio objects with constant or variable spatial extent (spread). Using the proposed framework, we create algorithms that allow precise control of the spread while retaining the efficiency and sound quality of amplitude panning.
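To make the convex-optimisation view concrete, the following sketch poses 2D panning as a small convex problem: nonnegative least squares with an l1 penalty, solved by plain projected gradient descent. This is one possible formulation chosen for illustration and is not necessarily the one used in the S3A framework; the function names, the penalty weight `l1_weight`, and the loudspeaker layout are assumptions.

```python
import numpy as np


def unit_vectors(azimuths_deg):
    """Direction vectors as columns of a (2, n) matrix (x forward, y left)."""
    a = np.radians(np.asarray(azimuths_deg, dtype=float))
    return np.stack([np.cos(a), np.sin(a)])


def pan_gains(speaker_az_deg, source_az_deg, l1_weight=1e-3, iterations=5000):
    """Minimise 0.5*||L g - p||^2 + l1_weight*sum(g) subject to g >= 0.

    L holds the loudspeaker directions and p the source direction. The solver
    is projected gradient descent, a simple stand-in for a dedicated solver.
    """
    L = unit_vectors(speaker_az_deg)
    p = unit_vectors([source_az_deg])[:, 0]
    step = 1.0 / np.linalg.norm(L, 2) ** 2  # 1 / Lipschitz constant of the gradient
    g = np.zeros(L.shape[1])
    for _ in range(iterations):
        grad = L.T @ (L @ g - p) + l1_weight
        g = np.maximum(g - step * grad, 0.0)  # projection onto g >= 0
    return g / np.linalg.norm(g)  # power normalisation


layout = [30.0, -30.0, 110.0, -110.0, 0.0]  # assumed five-loudspeaker layout
gains = pan_gains(layout, 20.0)
print(dict(zip(layout, np.round(gains, 4))))
```

For this layout and a source at 20 degrees, only the two loudspeakers adjacent to the source (0 and 30 degrees) end up with nonzero gains, so locality, sparsity and nonnegativity all follow from the problem statement rather than from an explicit pair-selection rule as in VBAP.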

4.5. Device orchestration for customising a listening experience

The cost and complexity of installing spatial audio systems can be a barrier to entry. However, many homes already contain addressable devices that can produce audio (e.g. smartphones, laptops). S3A has investigated the concept of “Media device orchestration” (MDO): using ad hoc sets of devices to deliver or augment an experience. Both qualitative and quantitative studies have shown that MDO delivers immersive listening experiences comparable with conventional two- and five-channel systems. A specially commissioned BBC Taster audio drama, “The Vostok-K Incident”, explored production and distribution methods, and allowed listener experience to be evaluated outside the lab.

S3A is funded by the Engineering and Physical Sciences Research Council (EPSRC).
Programme Grant Scheme – Grant Ref: EP/L000539/1
© Copyright 2020 S3A