Generation models (decoder-only)
Google / Vertex AI:
- Gemini 2.5 Pro — flagship, ~2M context, MoE, multimodal (text/image/video/audio)
- Gemini 2.5 Flash — faster, cheaper, smaller, still 1M+ context
- Gemini 2.5 Flash-Lite — cheapest, fastest, for high-volume simple tasks
- Gemma 3 — open-weights, smaller (1B-27B), for on-prem / edge
Anthropic:
- Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5 — tiered by capability and cost
OpenAI:
- GPT-5 family — flagship + mini variants
Open weights:
- Llama 3.x / 4.x (Meta) — common for on-prem deployments
- DeepSeek V3 / R1 — strong MoE, reasoning-focused
- Mistral / Mixtral — European, often used for sovereignty/compliance
Embedding models (separate deployment)
Google:
gemini-embedding-001 — current default, 3072 dims, supports Matryoshka truncation
- Older:
text-embedding-004, textembedding-gecko
OpenAI:
text-embedding-3-large (3072 dims), text-embedding-3-small (1536 dims)