In this article, we use a pretrained I-JEPA model for image similarity. We specifically use the ViT-H I-JEPA trained with 14x14 patches. The post JEPA Series Part 2: Image Similarity with I-JEPA appeared first on DebuggerCafe.| DebuggerCafe
The I-JEPA methodology teaches a vision transformer model to predict parts of an image in the latent space rather than the pixel space.| DebuggerCafe
In this article, we build a simple video summarizer application using the Qwen2.5-Omni 3B model, with the UI powered by Gradio. The post Video Summarizer Using Qwen2.5-Omni appeared first on DebuggerCafe.| DebuggerCafe
In this article, we introduce BAGEL, a unified multimodal model for image generation, image editing, and free-form image manipulation with non-thinking and thinking capabilities. The post Introduction to BAGEL: An Unified Multimodal Model appeared first on DebuggerCafe.| DebuggerCafe
Fine-tuning the SmolLM2-135M Instruct model on the WMT14 French-to-English subset for machine translation using a small language model.| DebuggerCafe
Implementation of the Vision Transformer model from scratch (Dosovitskiy et al.) using the PyTorch Deep Learning framework.| DebuggerCafe
In this article, we explore LitGPT. We cover chatting with pretrained models, fine-tuning on a custom dataset, and evaluating the model after fine-tuning. The post LitGPT – Getting Started appeared first on DebuggerCafe.| DebuggerCafe
Qwen3, the latest LLM in the Qwen family, uses a unified architecture for thinking and non-thinking modes, using the same LLM for reasoning.| DebuggerCafe
Mask2Former for universal image segmentation, tackling semantic, instance, and panoptic segmentation with one architecture.| DebuggerCafe
In this article, we are modifying the Web-DINO 300M architecture for semantic segmentation. We will add a simple segmentation decoder head and train the model for person segmentation. The post Semantic Segmentation using Web-DINO appeared first on DebuggerCafe.| DebuggerCafe
In this article, we use the Web-DINO model for image classification. We modify the Web-DINO 300M model by adding a classification head on top, freezing the backbone, and training on a cotton disease classification task. The post Image Classification with Web-DINO appeared first on DebuggerCafe.| DebuggerCafe
Web-SSL 2.0 is a framework to scale DINOv2 models from 1B to 7B parameters by training them on the MC-2B (MetaCLIP-2B) dataset.| DebuggerCafe
Qwen2.5-Omni is a multimodal generative AI model capable of accepting text, image, audio, and video as input while outputting text and audio.| DebuggerCafe
Qwen2.5-VL is the newest member in the Qwen Vision Language family, capable of image captioning, video captioning, and object detection.| DebuggerCafe
Phi-4 Mini and Phi-4 Multimodal are the latest Small Language Models for Chatting and Multimodal instruction following by Microsoft.| DebuggerCafe
In this article, we summarize the Phi-3 technical report, including the architecture, the dataset curation strategy, benchmarks, and Phi-3 vision capabilities.| DebuggerCafe
Understand the components, pretraining, and results of the Transformer Neural Network by breaking down the Attention is All You Need paper.| DebuggerCafe
In this article, we follow a code-first approach to text classification using PyTorch, NLP, and Deep Learning.| DebuggerCafe
In disaster tweet classification, we train a simple PyTorch language model using a single embedding layer and a linear layer.| DebuggerCafe
Qwen2 VL is a Vision Language model with the Qwen2 Language Decoder and Vision Transformer model from DFN as the image encoder.| DebuggerCafe
Fine-tuning Llama 3.2 Vision on a LaTeX2OCR dataset to predict raw LaTeX equations from images and creating a Gradio application.| DebuggerCafe
Llama 3.2 Vision model is a multimodal VLM from Meta belonging to the Llama 3 family that brings the capability to feed images to the model.| DebuggerCafe
Unsloth provides memory-efficient and fast inference and training of LLMs, with support for several models like Meta Llama, Google Gemma, and Phi.| DebuggerCafe
Carrying out DINOv2 segmentation experiments for fine-tuning and transfer learning and comparing the results.| DebuggerCafe
Modifying the DINOv2 model for semantic segmentation and training the model on the Penn-Fudan Pedestrian Segmentation Dataset.| DebuggerCafe
Exploring DINOv2 for image classification and comparing fine-tuning and transfer learning results on a custom dataset.| DebuggerCafe
DINOv2 is a self-supervised computer vision model which learns robust visual features that can be used for downstream tasks.| DebuggerCafe
Molmo is a family of new VLMs trained using the PixMo group of datasets that can describe images and also point to and count objects in images.| DebuggerCafe
Multimodal RAG Chat application to chat with PDFs, text files, images, and videos using the Phi-3.5 family of language models.| DebuggerCafe
Instruction tuning the OPT-125M model by training it on the Open Assistant Guanaco dataset using Hugging Face Transformers.| DebuggerCafe
Instruction tuning the GPT2 model on the Alpaca dataset using the Hugging Face Transformers library and the SFT Trainer pipeline.| DebuggerCafe
OpenELM is a family of efficient language models from Apple with completely open-source weights, training, and evaluation code.| DebuggerCafe
Contact DebuggerCafe for Machine Learning, Deep Learning, and AI.| DebuggerCafe
In this article, we fine-tune the LRASPP MobileNetV3 segmentation model on the KITTI dataset with two different approaches and compare the results.| DebuggerCafe
Fine-tuning the Phi 1.5 model on the BBC News Summary dataset for Text Summarization using Hugging Face Transformers.| DebuggerCafe
Phi 1.5 is a 1.3-billion-parameter LLM by Microsoft that is capable of coding and common-sense reasoning, and is adept at chain-of-thought reasoning.| DebuggerCafe
Instruction following Jupyter Notebook interface with a QLoRA fine-tuned Phi 1.5 model and the Hugging Face Transformers library.| DebuggerCafe
Fine-tuning Phi 1.5 using QLoRA on the Stanford Alpaca instruction tuning dataset with the Hugging Face Transformers library.| DebuggerCafe
Training a robust facial keypoint detection model by fine-tuning the pretrained ResNet50 model with the PyTorch framework.| DebuggerCafe
RT-DETR is a Real-Time Detection Transformer model with state-of-the-art performance and speed on image and video inference using PyTorch.| DebuggerCafe
Training a spelling correction model with Hugging Face Transformers, using the T5 Transformer model and the PyTorch framework.| DebuggerCafe
Text generation with Transformers - creating and training a Transformer decoder neural network for text generation using PyTorch.| DebuggerCafe
Training an LSTM (Long Short Term Memory) model for Word Level Text Generation using the PyTorch deep learning framework.| DebuggerCafe
In this blog post, we train a character level text generation LSTM model using the PyTorch deep learning framework.| DebuggerCafe
DebuggerCafe is a website hosting articles on Deep Learning, Machine Learning, and Artificial Intelligence.| DebuggerCafe