I-JEPA methodoly teaches a vision transformer model to predict parts of an image in the latent space rather than the pixel space.| DebuggerCafe
DINOv2 is a self-supervised computer vision model which learns robust visual features that can be used for downstream tasks.| DebuggerCafe
OpenELM is a family of efficient language models from Apple with completely open-source weights, training, and evaluation code.| DebuggerCafe