Topic: Building a Visual Language Model from scratch