Topic: Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models