An embedding is a fixed-length vector that represents the meaning of a piece of text in a continuous space.
Images - CLIP, Text - GloVe, Audio - wav2vec
An embedding model is used to create the vectors stored in a vector DB.
An embedding model takes text and outputs a single fixed-size vector that captures its meaning.
The model is trained so that texts with similar meaning land near each other in vector space, even when the words differ.
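A minimal sketch of what "near each other" means in practice, assuming cosine similarity as the distance measure (a common choice, not the only one). The 3-d vectors, the example sentences, and the helper names `cosine_similarity` and `top_k` are made up for illustration; a real model would output hundreds of dimensions, and a vector DB would replace the plain dict.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query, index, k=2):
    """Return the k (text, score) pairs whose vectors are closest to the query."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hand-written vectors standing in for real model output (illustrative only).
index = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "I forgot my login credentials": [0.8, 0.2, 0.1],
    "Best pizza places nearby": [0.0, 0.1, 0.9],
}

# Pretend embedding of "can't sign in to my account": no shared words,
# but the vector points in roughly the same direction as the login texts.
query_vec = [0.85, 0.15, 0.05]
results = top_k(query_vec, index, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The two login-related texts score far higher than the pizza one, which is the behavior retrieval relies on.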
Embedding model (typically an encoder-only, BERT-style transformer, or another transformer adapted to produce embeddings):
- Input passes through all transformer blocks.
- At the final layer, you have a representation for every token in the input.
- Pooling step: collapse those per-token vectors into one single vector. Common methods: CLS token, mean pooling, last-token pooling.
- Output: one fixed-size vector representing the whole input.
- RAG Embedding pipeline
RAG retrieval failures
- Layer 1: Chunking problems
- Layer 2: Embedding quality
- Layer 3: Retrieval mechanics
- Three things called "embedding": token embeddings (inside every transformer), document embeddings (output of an embedding model), output projection (unembedding in generation models)
- One transformer per model: generation and embedding are separately deployed models, with different training objectives and different output heads
- Embedding model = transformer + pooling head + contrastive training
- Pooling methods: CLS, mean, last-token
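The three pooling methods can be sketched on a toy final-layer output. This is an illustration under stated assumptions, not any particular library's implementation: `hidden` is a hand-written (tokens x dims) array, `mask` marks real tokens with 1 and padding with 0, padding is assumed to be on the right, and the function names are hypothetical.

```python
import numpy as np

def cls_pool(hidden, mask):
    """Use the first token's vector (the [CLS] position in BERT-style models)."""
    return hidden[0]

def mean_pool(hidden, mask):
    """Average the vectors of real tokens, ignoring padding positions."""
    weights = mask[:, None].astype(float)
    return (hidden * weights).sum(axis=0) / weights.sum()

def last_token_pool(hidden, mask):
    """Use the vector of the last real (non-padding) token."""
    last = int(np.nonzero(mask)[0][-1])
    return hidden[last]

# Toy final-layer output: 4 positions x 2 dims; the last position is padding.
hidden = np.array([[1.0, 0.0],
                   [3.0, 2.0],
                   [2.0, 4.0],
                   [9.0, 9.0]])
mask = np.array([1, 1, 1, 0])

cls_vec = cls_pool(hidden, mask)          # first row
mean_vec = mean_pool(hidden, mask)        # average of the three real rows
last_vec = last_token_pool(hidden, mask)  # third row, the last real token
```

Whichever method is used, the result has the same size as one token vector, so inputs of any length map to one fixed-size vector. The mask matters for mean and last-token pooling: without it, the padding row would leak into the result.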