As humans, we often experience moments that are difficult to put into words alone — from the subtle body language during business negotiations to the joy of witnessing a baby’s first steps. Can we equip large language models (LLMs) with the ability to see, like humans do? Current multimodal models that reason about images need […]