The strong link between prosodic expression in speech and gesture has been established through ample empirical evidence that prosodic prominence, prosodic phrasing, and floor management are expressed and processed in a multi-modal fashion.
However, the exact functional and formal relationship between the different modalities is still not well understood:
First, we lack knowledge about which aspects of prosodic expression are actually reproduced across modalities: signal features such as pitch movements, or rather structural features such as information structure. To shed light on this issue, I will present a recent series of analyses (Wagner et al., in press) in which this question was tackled with a newly developed methodological approach: listeners were asked to "reproduce" acoustically presented utterances in a drumming task. The results revealed that the patterns of drumming intensities closely resemble patterns of perceptual prominence gathered with established annotation schemes, involving both prosodic experts and naive listeners. However, as unveiled by a Random Forest analysis, the gestural reproductions of prosody were driven comparatively more strongly by signal cues than by linguistic structure. We also found different strategies for the gestural interpretation of prosodic patterns: a largely signal-driven strategy and a more integrative one.
Second, we do not know much about the flexibility or stability of the cross-modal link in prosodic expression. While speech economy models predict a flexible relationship depending on communicative demands and cross-modal compensation, a strong view of cross-modal congruence predicts a stable cross-modal co-ordination. To explore this issue, I will present a series of recent studies of semi-spontaneous, task-oriented interaction (Wagner and Bryhadyr, 2017; Wagner et al., 2019a; 2019b; in prep.) aiming at a better understanding of the impact of (1) information structure and (2) visibility between interlocutors on the cross-modal link in prosodic expression. Our results once again confirm a strong cross-modal temporal co-ordination. Furthermore, we detected a systematic modulation of this co-ordination as a function of communicative demands: in important or unpredictable contexts, co-speech movements occur later and align tightly with corresponding pitch peaks if interlocutors can see each other's hands. Also, a lack of facial visibility between interlocutors leads to a significantly earlier production of corresponding co-speech movements.
In summary, our results show that co-speech movements in general can express a rich set of signal and structural cues inherent in speech prosody, and that the degree of temporal co-ordination between speech and co-speech movements is a function of communicative needs.
As a side result, we found that the cross-modal prosodic link also extends to co-speech movements such as drumming or manual moves on a game board, which are not gestures in the traditional understanding of the term.
Wagner, P. and N. Bryhadyr (2017). Mutual Visibility and Information Structure Enhance Synchrony between Speech and Co-Speech Movements. Journal of Multimodal Communication Studies 4(1-2): 69-74.
Wagner, P., Cwiek, A., and B. Samlowski (in press). Exploiting the speech-gesture link to capture fine-grained prominence impressions and listening strategies. Journal of Phonetics.
Wagner, P., Bryhadyr, N., Schröer, M., and B. Ludusan (2019a). Does information-structural acoustic prosody change under different visibility conditions? In: Proceedings of the International Congress of Phonetic Sciences 2019, Melbourne, Australia.
Wagner, P., Bryhadyr, N., and M. Schröer (2019b). Pitch Accent Trajectories across Different Conditions of Visibility and Information Structure - Evidence from Spontaneous Dyadic Interaction. In: Proceedings of Interspeech 2019, Graz, Austria.
Wagner, P. et al. (in prep.). The temporal co-ordination between speech prosody and co-speech movements as a function of communicative needs.