CAPS: A Content Attribution Payment Scheme for the AI Era| dejan.ai
The Problem: A Broken Content Ecosystem We’re watching the collapse of the web’s economic model in real time, and everyone knows it. AI assistants have fundamentally changed how people consume information. Why wade through ten articles when Claude, ChatGPT, or Gemini can synthesize an answer in seconds? Why maintain 100 browser tabs for research when AI […]| DEJAN
This is the raw data dump from our citation mining pipeline demo on social media. Entered Entities ✅ AEO (10 prompts) ✅ AI Marketing (10 prompts) ✅ AI Optimization (10 prompts) ✅ AI SEO (10 prompts) ✅ AIO (10 prompts) ✅ Answer Engine Optimization (10 prompts) Mining Parameters Available Prompts: 60 GPT-5 Citations: 141 Gemini Citations: 400 Total […]| DEJAN
When you populate your website with language model–generated text, you inherit a subtle but real risk: AI-specific artifacts may leak into the published content. These markers aren’t always obvious to human readers, but they can be highly visible to search engines, researchers, and competitors. One such artifact is the structured output marker that GPT-5 (and […]| DEJAN
In-Context Fine-Tuning for Time-Series: The Next Evolution Beyond Prophet and Traditional Forecasting How Google’s TimesFM-ICF achieves fine-tuned model performance without training – and why this changes everything for production forecasting systems If you’re reading this, you’ve likely wrestled with time-series forecasting in production. Perhaps you’ve implemented Facebook Prophet for its interpretable seasonality decomposition, experimented with […]| DEJAN
├───aocr│ └───google_ocr│ └───engine│ └───page_layout_mutators│ group_rpn_text_detection_mutator_runtime_options.proto│├───aphotos│ └───vision│ └───visionkit│ ├───drishti│ │ hexagon_delegate_calculator.proto│ ││ ├───engines│ │ └───proto│ │ audio_classifications.proto│ ││ ├───pipeline│ │ ├───drishti│ │ │ └───calculators│ │ │ tflite_task...| DEJAN
RexBERT is a domain-specialized language model trained on massive volumes of e-commerce text (product titles, descriptions, attributes, reviews, FAQs). Unlike general-purpose transformers, it is optimized to understand the quirks of product data and the way consumers phrase queries. For a technical SEO professional, this means better alignment between how search engines interpret product content and […]| DEJAN
1. Introduction What is APC? Annotated Page Content (APC) is a structured and actionable representation of a webpage’s content and layout. Its primary function is to enable a deep understanding of page structure, content, and interactive elements by downstream clients, who can receive the information as a protobuf tree. Core Principles APC is designed with […]| DEJAN
Chrome’s “Reader Mode” and its underlying engine, DomDistiller, provide a transparent look into the principles of machine readability. It’s a valuable, real-world model of how a sophisticated Google technology parses, evaluates, and isolates main content from boilerplate. Understanding its mechanics is not about optimizing for a browser feature; it’s about reverse-engineering a proxy for how […]| DEJAN
Classic IR: crawl, index, retrieve, rank remain with search engines. There is a persistent myth that large language models (LLMs) have fundamentally replaced search. In truth, LLMs do not crawl the web, do not maintain indexes, and do not enforce ranking algorithms at internet scale. They operate as presentation and reasoning layers on top of […]| dejan.ai
At its core, Gemini operates as an orchestration layer managing a foundational large language model (LLM). Its primary function is to deconstruct a user prompt into a directed acyclic graph (DAG) of executable tasks. These tasks are then delegated to a suite of specialized tools accessed via synchronous API calls.| DEJAN
This is the story of how AI transitioned from niche to mainstream and the pieces that fell into place to make that happen. Picture this. It’s 2017. We’re in the era dominated by Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs), and LSTM is cutting edge. These models are tiny, and the common wisdom is […]| dejan.ai
The Temperature parameter is a crucial setting used in generative AI models, such as large language models (LLMs), to influence the randomness and perceived creativity of the generated output. It directly affects the probability distribution of potential next words. Understanding the Basics What the Temperature Value Does In Practical Terms Using the sentence “The cat sat on […]| dejan.ai
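The excerpt above says temperature reshapes the probability distribution over next words. A minimal sketch of the usual mechanism, assuming the common formulation in which logits are divided by the temperature before a softmax; the function name and the example logits are hypothetical and not taken from the post:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature.

    Assumes the common formulation: divide each logit by the temperature,
    then apply softmax. Temperature must be positive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next words.
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, 0.5)   # sharper: top word dominates
base = softmax_with_temperature(logits, 1.0)  # unscaled softmax
high = softmax_with_temperature(logits, 2.0)  # flatter: more randomness
```

Lower temperatures concentrate probability on the highest-scoring word, while higher temperatures spread it across the alternatives.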
GEO stands for Generative Engine Optimisation, an acronym easily confused with the well-established “geo-” prefix commonly associated with geosciences. What is a ‘Generative Engine’? “Generative engine” is a term recently made up by the marketing community in an attempt to rename chatbots, more recently known as AI assistants, including ChatGPT, Claude, Grok, Gemini and Perplexity. Basically […]| dejan.ai
John Botman For nearly two centuries, journalism operated under the assumption that truth mattered, stories should be original, and humans should write things for other humans to read. Quaint, right? We trusted journalists—those quirky creatures who collected facts, verified sources, and occasionally spelled words correctly—to give us nuanced, insightful accounts of the world. Oh, how […]| dejan.ai
In our previous post, Training a Query Fan-Out Model, we demonstrated how to generate millions of high-quality query reformulations without human labelling, by navigating the embedding space between a seed query and its target document and then decoding each intermediate vector back into text using a trained query decoder. That decoder’s success critically depends on […]| dejan.ai
Google discovered how to generate millions of high-quality query reformulations without human input by literally traversing the mathematical space between queries and their target documents. Here’s How it Works This generated 863,307 training examples for a query suggestion model (qsT5) that outperforms all existing baselines. Query Decoder + Latent Space Traversal Step 1: Build a […]| dejan.ai
Google’s embedder uses the dot product between normalized vectors, which is computationally more efficient than, but mathematically equivalent to, cosine similarity. How Googlers work and think internally typically aligns with their open source code (Gemini -> Gemma), and Chrome is no exception. It’s why I look there for answers and clarity on Google’s machine learning approaches. After […]| dejan.ai
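The equivalence stated in the excerpt above can be checked numerically. This is a minimal sketch in plain Python, not Chrome's embedder code; the helper names and the two example vectors are hypothetical:

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical embedding vectors, kept tiny for readability.
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

cos = cosine_similarity(a, b)
dot_normalized = dot(normalize(a), normalize(b))
```

If vectors are normalized once when they are stored, each later comparison needs only the dot product, which is where the efficiency gain comes from.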
Generalist, Open‑Set Classification for Any Label Taxonomy We’ve developed a search query classifier that takes any list of labels you hand it at inference time and tells you which ones match each search query. No retraining, ever. Just swap in new labels as they appear. Old workflow Pain New workflow Build + label data + retrain […]| dejan.ai
If Marie Haynes, Barry Schwartz or Cindy Krum had written an article declaring SEO dead and proposing we rebrand our industry, you’d seriously consider it. Wouldn’t you? What about Zach Cohen and Seema Amble? I don’t know either. Looked them up just now. Two VC people with no significant footprint or long-term interest in SEO, Machine […]| dejan.ai
Embedding Methods Evaluation: Results, Key Findings, and a Surprising Insight On June 6, 2025, we ran a comprehensive evaluation comparing four different embedding methods—regular, binary, mrl, and mrl_binary—on a dataset of paired sentences. The goal was to measure each method’s speed, storage footprint, similarity quality, and accuracy against a ground-truth of sentence pairs. Below, we […]| dejan.ai
As a technical SEO, you might be diving into machine learning (ML) to understand how tools like Google’s Gemini process text. One foundational concept is subword tokenization—breaking words into smaller pieces called “tokens.” While tokens themselves are context-agnostic (they don’t consider surrounding words), they do carry an inherent bias: each token’s likelihood reflects how prominent […]| dejan.ai
1. ULM128M 2. LLMIT1B 3. GEMINI2_NANOV2 4. GEMINI2_NANOV2_EE2Q 5. GEMINI_XS 6. GEMINI_XS_DRAFTER_6LAYER_CAUSAL_USM_700M_RESIDUAL 7. GEMINI_XS_LUSM_700M_RESIDUAL_BOTTOM15 8. GEMINI2_NANOV2_EE12Q 9. GEMINI2_NANOV2_EE2_LUSM_700M 10. GEMINI2_NANOV2_CAUSAL_700M 11. GEMINI2_NANOV2_EE20_CAUSAL_LUSM_700M 12. GEMINI_XL_DRAFTER_24LAYER 13. GEMINI_XS_FA1 14. GEMMA2_8B 15. GEMMA2_7B 16. GEMMA2_2B 17. GEMMA3_1B 18. GEMMA3_4B 19. GEMMA3_12B 20. GEMMA3_27B 21. STABLELM_4E1T_3B_PHI_2_TF_LITE| dejan.ai
Using the same tech behind AI Rank, we prompted Google’s latest Gemini 2.5 Pro model with search grounding enabled in the API request. A total of 10,000 prompts were collected and analysed to determine the grounding status of the prompt. The resulting data was then used to train a replica of Google’s internal classifier which […]| dejan.ai
The “Probability Threshold for Top-p (Nucleus) Sampling” is a parameter used in generative AI models, like large language models (LLMs), to control the randomness and creativity of the output text. Here’s a breakdown of what it does: Understanding the Basics What the Threshold Value Does In Practical Terms Imagine you’re asking the model to complete […]| dejan.ai
Google’s Gemini models are designed to provide users with accurate, timely, and trustworthy responses. A key innovation in this process is grounding, the ability to enhance model responses by anchoring them to up-to-date information from Google Search. However, not every query benefits from grounding, and Google has implemented a smart mechanism to decide when to […]| dejan.ai
It’s an exciting time to be in SEO. Honestly, it feels like 2006 all over again – a period of rapid change, innovation, and frankly, a whole lot of fun. For a while there, things had gotten a little… predictable. Technical SEO, keyword research, competitor analysis, link building, schema… it was all necessary, of course, […]| dejan.ai
UPDATE: Addressing guardrails, hallucinations and context size. 1. People are reporting difficulties in recreating the output due to guardrails and hallucinations. 2. Snippet context sometimes grows to several chunks. Guardrails Google attempts to block these requests (and in many cases succeeds), but it does so in a very clumsy way so that we actually get […]| dejan.ai