Embedding Model

An embedding is a fixed-length vector that represents the meaning of a piece of text in a continuous space.

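Example models by modality: CLIP for images (it also embeds text into the same shared space), GloVe for text (one vector per word), wav2vec for audio.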

An embedding model produces the vectors that are stored and searched in a vector database (vector DB).

An embedding model takes text and outputs a single fixed-size vector that captures its meaning. It is trained so that semantically similar texts land near each other in vector space, even when they use different words.
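
"Near each other" is usually measured with cosine similarity, the cosine of the angle between two vectors. A minimal sketch in plain Python; the three 4-dimensional vectors are made-up toy values, not outputs of any real model.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; real models output hundreds or thousands of dimensions.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
invoice = [0.0, 0.9, 0.8, 0.0]

print(round(cosine_similarity(cat, kitten), 3))   # 0.977: similar meaning, close together
print(round(cosine_similarity(cat, invoice), 3))  # 0.081: unrelated, far apart
```

Cosine ignores vector length and compares only direction, which is why many pipelines L2-normalise embeddings and then use a plain dot product.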

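Common dimensions

384 (e.g. all-MiniLM-L6-v2), 768 (BERT-base-sized encoders), 1024 (BERT-large-sized encoders), 1536 and 3072 (OpenAI text-embedding-3-small and text-embedding-3-large).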

Inside an embedding model (typically an encoder-only, BERT-style transformer, or a transformer adapted for embeddings):

Input passes through all transformer blocks.
At the final layer, you have a representation for every token in the input.
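Pooling step: collapse those per-token vectors into a single vector. Common methods: CLS (use the first token's vector), mean (average all non-padding token vectors), last-token (use the final token's vector).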
Output: one fixed-size vector representing the whole input.
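
The three pooling methods can be sketched on a toy matrix of final-layer token vectors. This assumes a BERT-style layout (position 0 is [CLS]), right padding marked with 0 in a hypothetical `attention_mask`, and made-up numbers; real implementations do the same operations on batched tensors.

```python
from __future__ import annotations

# Hypothetical final-layer output for one input: 4 token positions x 3 dimensions.
# Assumes BERT-style layout (position 0 is [CLS]) and right padding.
token_vectors = [
    [1.0, 0.0, 2.0],  # [CLS]
    [3.0, 1.0, 0.0],
    [2.0, 2.0, 1.0],
    [9.0, 9.0, 9.0],  # padding position: must not affect the result
]
attention_mask = [1, 1, 1, 0]  # 1 = real token, 0 = padding


def cls_pool(vectors: list[list[float]], mask: list[int]) -> list[float]:
    """CLS pooling: use the vector at position 0 (the [CLS] token)."""
    return list(vectors[0])


def mean_pool(vectors: list[list[float]], mask: list[int]) -> list[float]:
    """Mean pooling: average every non-padding token vector, dimension by dimension."""
    kept = [v for v, m in zip(vectors, mask) if m]
    return [sum(column) / len(kept) for column in zip(*kept)]


def last_token_pool(vectors: list[list[float]], mask: list[int]) -> list[float]:
    """Last-token pooling: use the vector of the final non-padding token."""
    last = max(i for i, m in enumerate(mask) if m)
    return list(vectors[last])


print(cls_pool(token_vectors, attention_mask))         # [1.0, 0.0, 2.0]
print(mean_pool(token_vectors, attention_mask))        # [2.0, 1.0, 1.0]
print(last_token_pool(token_vectors, attention_mask))  # [2.0, 2.0, 1.0]
```

The method is fixed by how the model was trained, not chosen freely at inference. Last-token pooling is typical for decoder-style models adapted for embeddings, because under causal attention only the final token has attended to the whole input.
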

RAG Embedding pipeline

RAG retrieval failures

Layer 1: Chunking problems
Layer 2: Embedding quality
Layer 3: Retrieval mechanics
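
The three layers line up with the stages of a minimal retrieval pipeline, so each stage is a place where retrieval can fail. A sketch in plain Python under loud assumptions: chunking uses fixed-size word windows with overlap; `toy_embed` is a hypothetical hashed bag-of-words stand-in with no notion of meaning, used in place of a real embedding model; and retrieval is brute-force top-k cosine search instead of a vector DB index.

```python
from __future__ import annotations

import math
import re
import zlib


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Layer 1 (chunking): windows of `size` words; neighbouring windows share `overlap` words."""
    assert 0 <= overlap < size
    words = text.split()
    if not words:
        return []
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + size]) for i in starts]


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Layer 2 (embedding) STAND-IN: hashed bag of words, L2-normalised. Not a learned model."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0  # crc32 is deterministic across runs
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Layer 3 (retrieval): brute-force search; dot product equals cosine for unit vectors."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, toy_embed(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


doc = ("Embedding models map text to vectors. Similar texts land close together. "
       "A vector database stores the vectors and answers nearest neighbour queries quickly.")
chunks = chunk_words(doc)
for score, chunk in top_k("where are the vectors stored?", chunks):
    print(f"{score:.2f}  {chunk}")
```

For brevity the chunks are re-embedded on every query; a real system embeds them once at indexing time and stores the vectors.
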

Three things called "embedding": token embeddings (the input lookup table inside every transformer), document embeddings (the output of an embedding model), and the output projection (the "unembedding" in generation models, which maps hidden states back to vocabulary logits)
One transformer per model: generation and embedding are usually separate deployed models, with different training objectives and different output heads
Embedding model = transformer + pooling head + contrastive training
Pooling methods: CLS, mean, last-token
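
Contrastive training pulls each text toward its matching partner and pushes it away from non-matching texts. These notes do not name a specific loss; one common choice is InfoNCE with in-batch negatives, sketched here as an assumption. The code only computes the loss value (no gradients), and the temperature of 0.05 and the 2-dimensional vectors are illustrative.

```python
from __future__ import annotations

import math


def _normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce_loss(queries: list[list[float]], docs: list[list[float]],
                  temperature: float = 0.05) -> float:
    """Mean cross-entropy where docs[i] is the positive for queries[i]
    and every other doc in the batch acts as a negative."""
    q = [_normalize(v) for v in queries]
    d = [_normalize(v) for v in docs]
    total = 0.0
    for i, qi in enumerate(q):
        # Scaled cosine similarity of query i against every doc in the batch.
        logits = [sum(a * b for a, b in zip(qi, dj)) / temperature for dj in d]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / len(q)


queries = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]  # docs[i] points the same way as queries[i]
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives point away from their queries

print(info_nce_loss(queries, matched))  # tiny (about 2e-08): pairs already close
print(info_nce_loss(queries, swapped))  # about 17.67: training would pull these pairs together
```

In real training this loss is minimised by gradient descent over the transformer's weights, and the vectors fed in are the pooled outputs described above.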