Spatial audio perception

2.1. Diffuseness and decorrelation

Diffuse sound fields such as applause, rain or reverberation are particularly difficult to reproduce in object-based audio: the concept of an object with a direction does not apply, and an accurate representation would require an infinite number of uncorrelated signals arriving from all directions. However, because of the intrinsic randomness of these sound fields, a plausible reproduction can be achieved using decorrelation methods. Early listening tests showed that decorrelation filters alone could not produce acceptable results: long filters introduced a subjectively unacceptable level of artefacts, while short ones did not increase envelopment. However, by first extracting the transients using harmonic-percussive separation, longer filters could be used, giving much improved results. This led to the S3A decorrelation algorithm, which can be used both for rendering diffuse sound objects and for upmixing audio to multichannel formats. The algorithm is implemented in Python and includes tools for rendering ambient/diffuse sources as well as general upmix applications.
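The core idea of a decorrelation filter can be sketched in a few lines of NumPy. The example below is illustrative only and omits the transient-extraction stage of the S3A algorithm described above: it builds an FIR filter with unit magnitude and random phase, so the filtered copy keeps a similar spectrum to the input but is statistically decorrelated from it. All function names here are of our own choosing, not from the S3A toolbox.

```python
import numpy as np

def decorrelation_filter(length, rng):
    """Build an FIR decorrelator: unit magnitude, random phase.

    The filter preserves the magnitude spectrum but scrambles phase,
    so the output sounds similar to the input while being
    statistically decorrelated from it.
    """
    # Random phase for the positive frequencies; irfft enforces the
    # conjugate symmetry needed for a real impulse response.
    n_bins = length // 2 + 1
    phase = rng.uniform(-np.pi, np.pi, n_bins)
    phase[0] = 0.0                      # keep DC real
    if length % 2 == 0:
        phase[-1] = 0.0                 # keep Nyquist real
    spectrum = np.exp(1j * phase)       # |H(f)| = 1 at every bin
    return np.fft.irfft(spectrum, n=length)

rng = np.random.default_rng(0)
x = rng.standard_normal(48000)          # 1 s of noise at 48 kHz
h = decorrelation_filter(1024, rng)
y = np.convolve(x, h)[:len(x)]          # decorrelated copy of x

# Correlation between the original and the decorrelated copy is
# near zero, while the filter passes energy unchanged (Parseval).
r = np.corrcoef(x, y)[0, 1]
```

As the text notes, applying such a filter directly to transient material (claps, raindrops) smears the attacks; hence the benefit of separating transients first and decorrelating only the steady-state part.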

2.2. Speech intelligibility metering

The problem of unintelligible speech on TV has attracted much media coverage and has been debated in the House of Lords. A number of factors contribute to the problem: the clarity of the speech delivery, the balance between audio elements, and the reproduction equipment. A spatial object-based format allows consumer devices to manipulate the dialogue and background sounds (their level, spectral content and position), enabling personalised mixes that improve intelligibility. To enable this, a binaural speech intelligibility metric was developed and later refined using machine learning to allow real-time operation. Optimal mixes can then be made on the fly, and the metric can also provide feedback to sound engineers during mixing via a Speech Intelligibility Meter.
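To illustrate the general shape of such a metric, the sketch below computes a toy band-weighted speech-to-background ratio in the spirit of SII-style indices. It is emphatically not the S3A binaural metric (which models binaural hearing and was refined with machine learning); the function name, band layout and clamping range are all assumptions made for this example.

```python
import numpy as np

def band_snr_intelligibility(speech, background, sr, n_bands=8):
    """Toy band-weighted SNR index, 0.0 (masked) .. 1.0 (clear).

    Splits the spectrum into log-spaced bands, clamps each band's
    speech-to-background power ratio to [-15, +15] dB, and averages
    the normalised per-band scores.
    """
    S = np.abs(np.fft.rfft(speech)) ** 2
    B = np.abs(np.fft.rfft(background)) ** 2
    freqs = np.fft.rfftfreq(len(speech), d=1.0 / sr)
    edges = np.geomspace(100.0, sr / 2, n_bands + 1)
    scores = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = (freqs >= lo) & (freqs < hi)
        if not band.any():
            continue
        snr_db = 10 * np.log10((S[band].sum() + 1e-12) /
                               (B[band].sum() + 1e-12))
        snr_db = np.clip(snr_db, -15.0, 15.0)
        scores.append((snr_db + 15.0) / 30.0)  # map to [0, 1]
    return float(np.mean(scores))

rng = np.random.default_rng(1)
speech = rng.standard_normal(16000)
score_quiet = band_snr_intelligibility(speech, 0.01 * rng.standard_normal(16000), 16000)
score_loud = band_snr_intelligibility(speech, 10.0 * rng.standard_normal(16000), 16000)
```

A renderer could evaluate such a metric per candidate mix and raise the dialogue level (or attenuate backgrounds) until a target score is reached, which is the kind of on-the-fly optimisation the text describes.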

2.3. Accessible object-based audio using narrative importance metadata

The first step towards accessible broadcast audio is speech intelligibility, but making broadcast audio truly accessible for a wide range of consumers goes beyond this. For those with hearing loss, or those listening in a noisy environment, the number of audio objects and the volume balance between them affect the content’s accessibility. We have developed end-user and production tools that allow the balance of objects in an audio mix to be personalised based on individual user needs. This leverages object-based audio formats to make personalisation simple whilst still maintaining the creative integrity of the producer’s content. These tools have been developed in collaboration with producers and integrated into the BBC’s online media player ahead of an upcoming trial of the technology on Taster.
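One way to picture narrative importance metadata is as a mapping from a single end-user control to per-object gains. The sketch below is a hypothetical mapping of our own devising, not the BBC/S3A implementation: the importance scale (0 for ambience up to 3 for dialogue), the maximum attenuation of 60 dB, and the linear mapping are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    name: str
    importance: int  # hypothetical scale: 0 (ambience) .. 3 (dialogue)

def narrative_importance_gain(obj, control):
    """Map one end-user accessibility dial (0..1) to a linear gain.

    At control = 0.0 every object plays at full level; as control
    rises towards 1.0, objects are attenuated in proportion to how
    far their importance falls below the maximum, so dialogue
    (importance 3) is never turned down.
    """
    max_importance = 3
    deficit = (max_importance - obj.importance) / max_importance
    attenuation_db = -60.0 * control * deficit  # up to -60 dB for ambience
    return 10 ** (attenuation_db / 20.0)

mix = [AudioObject("dialogue", 3), AudioObject("music", 2),
       AudioObject("effects", 1), AudioObject("crowd", 0)]
gains = {o.name: narrative_importance_gain(o, control=0.5) for o in mix}
```

Because the dial moves all objects according to their metadata rather than exposing raw faders, the producer retains control over the mix's creative intent while the listener gets a single, simple adjustment.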

S3A is funded by the Engineering and Physical Sciences Research Council (EPSRC).
Programme Grant Scheme – Grant Ref: EP/L000539/1
© Copyright 2020 S3A