Language model–based AI systems such as Articulate Medical Intelligence Explorer (AMIE, our research diagnostic conversational AI agent recently published in Nature) have shown considerable promise for conducting text-based medical diagnostic conversations but the critical aspect of how they can integrate multimodal data during these dialogues remains unexplored. Instant messaging platforms are a popular tool for communication that allow static multimodal information (e.g., images and docum...