Auralization is the process used in virtual technologies to render sound audible in a physically realistic or perceptually plausible manner. The generation of walking sound, which consists of transient signals, is a difficult auralization task due to the complexity of foot-floor interactions. These interactions depend not only on the acoustic surroundings and the material properties of shoes and floor surfaces, but also on the diversity of walking styles. Moreover, very few studies have analysed the relationships between footsteps within a sequence. In this study, we compute variations of the spectral envelope, which are then applied to render perceptually plausible sequences of footsteps. To capture different aspects of foot-floor interactions, we recorded a variety of human footsteps that serve as the initial layer of sound generation in our method. Previous experiments with diffuse-field recordings of footsteps showed that modifications of the spectral envelope are adequate to alter the perceived qualities associated with different types of shoes.
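As a rough illustration of what applying a spectral envelope variation to a single footstep can mean, the sketch below works on one magnitude spectrum. It assumes a moving average across neighbouring frequency bins as the envelope estimator and a spectral tilt in dB per octave as the variation; neither is stated as the method used in this study, and the names `spectral_envelope` and `apply_tilt` are hypothetical. Resynthesis to audio (for example an inverse STFT reusing the original phase) is omitted.

```python
import math


def spectral_envelope(magnitudes, half_width=3):
    """Coarse spectral envelope: moving average over +/- half_width bins.

    Assumed smoothing method for illustration; the edges use a shorter window.
    """
    n = len(magnitudes)
    env = []
    for k in range(n):
        lo = max(0, k - half_width)
        hi = min(n, k + half_width + 1)
        env.append(sum(magnitudes[lo:hi]) / (hi - lo))
    return env


def apply_tilt(magnitudes, tilt_db_per_octave, ref_bin=1):
    """Apply a hypothetical envelope variation: a spectral tilt in dB/octave.

    The gain is 0 dB at ref_bin and changes by tilt_db_per_octave for every
    doubling of the bin index. Bin 0 (DC) is left unchanged.
    """
    out = []
    for k, m in enumerate(magnitudes):
        if k == 0:
            out.append(m)
            continue
        gain_db = tilt_db_per_octave * math.log2(k / ref_bin)
        out.append(m * 10 ** (gain_db / 20))
    return out
```

Drawing a different tilt value for each footstep in a sequence would be one simple way to make the envelope vary from step to step.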
Two consecutive footsteps, also known as a stride, can be regarded as the unit of the periodic activity of human walking. Stride-to-stride temporal fluctuations have been identified as predictors of fall risk, but they have also been shown to facilitate the detection of human walking regardless of the perceptual plausibility of the footstep sounds. Here, we focus instead on the spectral variation between consecutive footsteps. Our aim is to investigate the effect of spectral envelope variation on the perceived plausibility of walking sound. Our hypothesis is that invariant spectral envelopes applied across time-varying sequences of footsteps will be perceived as significantly less plausible than sequences whose spectral envelopes vary, even when those variations are only quasi-random modifications.
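Since a stride is defined here as two consecutive footsteps, stride-to-stride temporal fluctuations can be quantified from footstep onset times. The sketch below computes each stride interval as the time between footstep i and footstep i + 2 and summarises their variability with the coefficient of variation, a common gait-variability measure. The onset times and function names are hypothetical, and the study does not state which variability measure it used.

```python
import statistics


def stride_intervals(onsets):
    """Stride time (s): interval between footstep i and footstep i + 2,
    since two consecutive footsteps form one stride."""
    return [onsets[i + 2] - onsets[i] for i in range(len(onsets) - 2)]


def stride_cv(onsets):
    """Coefficient of variation (%) of the stride intervals."""
    strides = stride_intervals(onsets)
    if len(strides) < 2:
        raise ValueError("need at least four footstep onsets")
    return 100 * statistics.stdev(strides) / statistics.mean(strides)
```

For perfectly regular onsets such as `[0.0, 0.5, 1.0, 1.5, 2.0]` every stride lasts 1.0 s and the coefficient of variation is zero; natural walking yields small positive values.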
Our methodology has three parts: measurements, modelling and a listening evaluation. The experimental design aims to build upon the subjective percepts of walking. For this reason, we asked participants to walk as naturally as possible while we made repeated stride recordings in a semi-anechoic chamber. The recordings served as the raw material for a data-driven algorithm that models sequences of footstep sounds. The plausibility of the generated sequences was evaluated in a two-part, web-based listening test using headphones. In the first part, we investigated the plausibility of applying a time-stretching algorithm to the footstep recordings; this algorithm was used to modify the stride-to-stride temporal fluctuations in the generated sequences. In the second part, we tested a simplified model, developed from existing evidence of how the system fundamentally works. The idea behind this simplified model is to modify the spectral envelope smoothly by applying quasi-random shifts to the spectral centroid of the footsteps. We compared our data-driven model against both the simplified model and the invariant sequences (the control condition).
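The idea of the simplified model can be sketched as follows, assuming one-sided magnitude spectra with a fixed bin spacing. `spectral_centroid` computes the magnitude-weighted mean frequency, and `smoothed_offsets` produces one quasi-random relative centroid offset per footstep by low-pass filtering uniform noise, so that consecutive footsteps change gradually rather than jumping. The one-pole smoothing filter, the default range of ±10% and all function names are assumptions for illustration, not the parameters of the model evaluated here.

```python
import random


def spectral_centroid(magnitudes, bin_hz):
    """Magnitude-weighted mean frequency (Hz) of a one-sided spectrum."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(k * bin_hz * m for k, m in enumerate(magnitudes)) / total


def smoothed_offsets(n_steps, max_dev=0.1, alpha=0.7, seed=None):
    """Quasi-random relative centroid offsets, one per footstep.

    Uniform noise in [-max_dev, max_dev] passes through a one-pole low-pass
    filter with coefficient alpha, so |offset| never exceeds max_dev and
    consecutive offsets differ by at most 2 * (1 - alpha) * max_dev.
    """
    rng = random.Random(seed)
    offsets, state = [], 0.0
    for _ in range(n_steps):
        state = alpha * state + (1 - alpha) * rng.uniform(-max_dev, max_dev)
        offsets.append(state)
    return offsets


def target_centroids(base_centroid_hz, offsets):
    """Target centroid (Hz) for each footstep: base * (1 + offset)."""
    return [base_centroid_hz * (1 + o) for o in offsets]
```

Realising these target centroids in audio still requires a spectral shaping step (for example a frequency-dependent gain), which is not shown here.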
The listening test results showed that the original footstep recordings were rated significantly more plausible than their time-stretched versions. Furthermore, the data-driven model was rated significantly more plausible than the control condition, and more plausible than the simplified model. The simplified model was not rated significantly more plausible than the control condition, although this may be due to the limited experimental control in a web-based listening test. These results suggest that spectral variations may significantly improve self-reported ratings of the perceived plausibility of walking sound. The effect may be attributed to the listener's inherent search for contextual cues during perceptual processing, cues which arise from spectral envelope shifts. This strengthens the argument that stride-to-stride temporal fluctuations between footsteps are not a sufficient condition for the perceived plausibility of walking sound, although they may account for the recognition of human walking. Such recognition cues may be important for biological motion perception, which is linked to evolutionary factors such as survival value. In addition, the data-driven modelling links listening evaluations of plausibility to the subjective percepts of walking activity. Future studies may consider combining data-driven models with spectral weighting functions that facilitate the perceived continuity of sequential sounds, towards a hybrid modelling approach.