Beyond Language: Multimodal Semantic Representations

Welcome to MMSR II, the second workshop on Multimodal Semantic Representations!

The demand for more sophisticated natural human-computer and human-robot interactions is rapidly increasing as users become more accustomed to conversation-like interactions with AI and NLP systems. Such interactions require not only the robust recognition and generation of expressions through multiple modalities (language, gesture, vision, action, etc.), but also the encoding of situated meaning.

When communications become multimodal, each modality in operation provides an orthogonal angle through which to probe the computational model of the other modalities, including the behaviors and communicative capabilities afforded by each. Multimodal interactions thus require a unified framework and control language through which systems interpret inputs and behaviors and generate informative outputs. This is vital for intelligent and often embodied systems to understand the situation and context that they inhabit, whether in the real world or in a mixed-reality environment shared with humans.

Furthermore, multimodal large language models appear to offer the possibility for more dynamic and contextually rich interactions across various modalities, including facial expressions, gestures, actions, and language. We invite discussion on how representations and pipelines can potentially integrate such state-of-the-art language models.

Goals

This workshop intends to bring together researchers who work to capture elements of multimodal interaction such as language, gesture, gaze, and facial expression with formal semantic representations. We provide a space for both theoretical and practical discussion of how linguistic co-modalities support, inform, and align with “meaning” found in the linguistic signal alone. In so doing, the MMSR workshop has several goals:

To provide an opportunity for computational semanticists to critically examine existing NLP semantic frameworks for their validity to express multimodal elements;
To explore and identify challenges in the semantic representation of co-modalities across domains and tasks;
To better understand how neurosymbolic architectures can be utilized for multimodal processing and interpretation;
To demonstrate functionality of systems that incorporate representations designed for multimodal communication.

Venue

MMSR II will be held on September 24, 2025, in Düsseldorf, Germany in conjuction with IWCS 2025!