Google discovered how to generate millions of high-quality query reformulations without human input by traversing the latent space between queries and their target documents. Here's how it works: the process generated 863,307 training examples for a query suggestion model (qsT5) that outperforms all existing baselines. Query Decoder + Latent Space Traversal. Step 1: Build a […]
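A minimal sketch of the traversal idea, assuming simple linear interpolation between the query and document embeddings; the post's actual traversal scheme and query decoder are not reproduced here, and the vector sizes are placeholders:

```python
import numpy as np

def traverse(query_vec: np.ndarray, doc_vec: np.ndarray, steps: int = 5):
    """Yield points on the straight line between a query embedding and its
    target document embedding (linear interpolation in latent space)."""
    for t in np.linspace(0.0, 1.0, steps):
        yield (1.0 - t) * query_vec + t * doc_vec

# Stand-in vectors; in the pipeline described, these would come from the
# embedder for a query and its target document.
query_vec = np.random.randn(768)
doc_vec = np.random.randn(768)

intermediate_points = list(traverse(query_vec, doc_vec))
# Each intermediate point would then be passed to the query decoder to
# produce a reformulated query string (decoder not shown here).
print(len(intermediate_points), intermediate_points[0].shape)
```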
Google’s embedder uses the dot product between normalized vectors, which is computationally more efficient than, but mathematically equivalent to, cosine similarity. How Googlers work and think internally typically aligns with their open-source code (Gemini -> Gemma), and Chrome is no exception. It’s why I look there for answers and clarity on Google’s machine learning approaches. After […]
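A short NumPy check, not Google's code, showing why the two measures coincide once vectors are L2-normalized:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity: dot product divided by both norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two arbitrary embedding vectors.
a = np.random.randn(768)
b = np.random.randn(768)

# Normalize once up front; similarity then reduces to a plain dot product.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

assert np.isclose(np.dot(a_unit, b_unit), cosine_similarity(a, b))
print(np.dot(a_unit, b_unit))
```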
Generalist, Open-Set Classification for Any Label Taxonomy. We’ve developed a search query classifier that takes any list of labels you hand it at inference time and tells you which ones match each search query. No retraining, ever. Just swap in new labels as they appear. Old workflow pain vs. new workflow: Build + label data + retrain […]
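As a rough illustration of how open-set label matching can work without retraining, here is a sketch using a generic sentence-embedding model; the model name and threshold are placeholders, not the classifier described in the post:

```python
from sentence_transformers import SentenceTransformer

# Illustrative encoder choice; the post does not name its underlying model.
model = SentenceTransformer("all-MiniLM-L6-v2")

def classify(query: str, labels: list[str], threshold: float = 0.4) -> list[str]:
    """Return every label whose embedding is close enough to the query's.
    The label list can change on every call; nothing is retrained."""
    vectors = model.encode([query] + labels, normalize_embeddings=True)
    query_vec, label_vecs = vectors[0], vectors[1:]
    scores = label_vecs @ query_vec  # cosine similarity, since vectors are normalized
    return [label for label, score in zip(labels, scores) if score >= threshold]

print(classify("cheap flights to tokyo", ["travel", "finance", "food delivery"]))
```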
If Marie Haynes, Barry Schwartz, or Cindy Krum had written an article declaring SEO dead and proposing we rebrand our industry, you’d seriously consider it. Wouldn’t you? What about Zach Cohen and Seema Amble? I don’t know them either; I looked them up just now. Two VC people with no significant footprint in, or long-term interest in, SEO, Machine […]
Embedding Methods Evaluation: Results, Key Findings, and a Surprising Insight. On June 6, 2025, we ran a comprehensive evaluation comparing four embedding methods (regular, binary, mrl, and mrl_binary) on a dataset of paired sentences. The goal was to measure each method’s speed, storage footprint, similarity quality, and accuracy against a ground-truth set of sentence pairs. Below, we […]
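For context, a sketch of how the four representations relate to one another; the embedding size and the Matryoshka truncation width are illustrative, not the values used in the evaluation:

```python
import numpy as np

def variants(embedding: np.ndarray, mrl_dims: int = 256) -> dict[str, np.ndarray]:
    """Produce the four representations compared in the evaluation:
    full-precision, sign-binarized, Matryoshka-truncated, and truncated+binarized."""
    binary = (embedding > 0).astype(np.uint8)   # one bit per dimension
    mrl = embedding[:mrl_dims]                  # Matryoshka-style truncation of leading dims
    return {
        "regular": embedding,
        "binary": binary,
        "mrl": mrl,
        "mrl_binary": (mrl > 0).astype(np.uint8),
    }

emb = np.random.randn(768).astype(np.float32)
for name, vec in variants(emb).items():
    print(name, vec.shape, vec.dtype)
```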
As a technical SEO, you might be diving into machine learning (ML) to understand how tools like Google’s Gemini process text. One foundational concept is subword tokenization—breaking words into smaller pieces called “tokens.” While tokens themselves are context-agnostic (they don’t consider surrounding words), they do carry an inherent bias: each token’s likelihood reflects how prominent […]
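A quick way to see subword pieces and their vocabulary IDs, using a generic WordPiece tokenizer rather than Gemini's; the words chosen are arbitrary examples:

```python
from transformers import AutoTokenizer

# Any subword tokenizer demonstrates the idea; this checkpoint is just a public example.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

for word in ["search", "tokenization", "reformulation"]:
    pieces = tokenizer.tokenize(word)
    ids = tokenizer.convert_tokens_to_ids(pieces)
    # Vocabulary IDs roughly track how early (i.e., how frequently) a piece was
    # added during vocabulary construction, which relates to the inherent
    # frequency bias the post describes.
    print(word, pieces, ids)
```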
1. ULM128M
2. LLMIT1B
3. GEMINI2_NANOV2
4. GEMINI2_NANOV2_EE2Q
5. GEMINI_XS
6. GEMINI_XS_DRAFTER_6LAYER_CAUSAL_USM_700M_RESIDUAL
7. GEMINI_XS_LUSM_700M_RESIDUAL_BOTTOM15
8. GEMINI2_NANOV2_EE12Q
9. GEMINI2_NANOV2_EE2_LUSM_700M
10. GEMINI2_NANOV2_CAUSAL_700M
11. GEMINI2_NANOV2_EE20_CAUSAL_LUSM_700M
12. GEMINI_XL_DRAFTER_24LAYER
13. GEMINI_XS_FA1
14. GEMMA2_8B
15. GEMMA2_7B
16. GEMMA2_2B
17. GEMMA3_1B
18. GEMMA3_4B
19. GEMMA3_12B
20. GEMMA3_27B
21. STABLELM_4E1T_3B_PHI_2_TF_LITE