Core Storage & Ingestion

Google Cloud Storage (GCS): Blob/object storage. Think S3. In a GenAI pipeline, this is where raw documents (PDFs, CSVs, HTML) live before processing. Your RAG system's "source of truth" before any parsing or embedding happens.

Pub/Sub: Asynchronous publish/subscribe messaging service. Decouples producers from consumers. In a GenAI context, use it for async document ingestion pipelines: a new document drops into GCS → a Pub/Sub event fires → the event triggers an embedding job, so your main app isn't blocked waiting.
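The consumer side of that flow can be sketched roughly as follows. This assumes the bucket's notifications are delivered through a Pub/Sub push subscription, so the handler receives a JSON envelope with a base64-encoded `data` field plus `attributes`. The function name `parse_gcs_notification` is made up for illustration, and what you do with the result (for example, kicking off an embedding job) is left out.

```python
import base64
import json
from typing import Optional, Tuple


def parse_gcs_notification(envelope: dict) -> Optional[Tuple[str, str]]:
    """Return (bucket, object_name) for a newly created object, else None.

    `envelope` is the parsed JSON body of a Pub/Sub push request (assumption:
    push delivery; a pull subscriber would read the same fields off the message).
    """
    message = envelope.get("message", {})
    attributes = message.get("attributes", {})

    # Ignore deletes, metadata updates, and anything that is not a new object.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None

    data = message.get("data")
    if data:
        # With the JSON payload format, `data` is the base64-encoded object resource.
        payload = json.loads(base64.b64decode(data).decode("utf-8"))
        return payload["bucket"], payload["name"]

    # With no payload configured, fall back to the message attributes.
    return attributes["bucketId"], attributes["objectId"]
```

Filtering on the event type early keeps the embedding job from firing on deletes or metadata-only changes.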


Document & Data Processing

Document AI: Managed OCR + document parsing. Turns unstructured PDFs and scanned docs into structured text with layout awareness (tables, headers, paragraphs). Critical for enterprise RAG where documents aren't clean text.
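To make "structured text with layout awareness" concrete, here is a minimal sketch that pulls paragraph strings out of a processed document. It assumes the JSON-serialized output shape, where the full text sits in a top-level `text` field and each layout element points into it through `textAnchor.textSegments` offsets. The helper names are illustrative, and tables or other elements would need their own handling.

```python
from typing import List


def anchor_text(full_text: str, text_anchor: dict) -> str:
    """Resolve a text anchor into the substring(s) of the document text it covers."""
    parts = []
    for segment in text_anchor.get("textSegments", []):
        # Offsets are serialized as strings; a start of 0 may be omitted entirely.
        start = int(segment.get("startIndex", 0))
        end = int(segment["endIndex"])
        parts.append(full_text[start:end])
    return "".join(parts)


def extract_paragraphs(document: dict) -> List[str]:
    """Return the text of every paragraph, page by page, in reading order as given."""
    full_text = document.get("text", "")
    paragraphs = []
    for page in document.get("pages", []):
        for paragraph in page.get("paragraphs", []):
            anchor = paragraph.get("layout", {}).get("textAnchor", {})
            paragraphs.append(anchor_text(full_text, anchor).strip())
    return paragraphs
```

Paragraph-level strings like these are a natural unit to feed into chunking and embedding downstream.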

BigQuery: Serverless data warehouse for analytics at scale. Relevant when a customer wants to run LLM-powered analytics over structured/tabular business data, or when you need to log and analyze eval results at scale.


AI/ML Platform (Vertex AI Suite)

This is the most important family to know. Think of Vertex AI as the unified umbrella.

Vertex AI (Platform): The orchestration layer for everything ML/GenAI on GCP. Hosts models, manages pipelines, handles training jobs, serves predictions. When you say "I'd deploy this on Vertex AI," this is what you mean.

Vertex AI Model Garden / Gemini API: Where you access Gemini models (Pro, Flash, Flash-Lite) and other foundation models. You call these for generation, summarization, and classification; embeddings come from dedicated embedding models exposed through the same platform. Know the Gemini family: Pro = most capable, Flash = fast/cheap, Flash-Lite = cheapest for high-volume triage.

Vertex AI Vector Search: Managed approximate nearest neighbor (ANN) search index. This is where you store and query embeddings at scale. The retrieval engine in a production RAG system. Key advantage: fully managed, auto-scales, integrates natively with the rest of Vertex AI.
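For intuition about what the retrieval step returns, here is a tiny brute-force version in plain Python. This is exact cosine-similarity search over an in-memory dict, not ANN and not the Vector Search API; a managed index answers the same "top-k most similar vectors" question without scanning every vector. The names `cosine` and `top_k` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every stored embedding against the query and keep the k best."""
    scored = [(doc_id, cosine(query, vector)) for doc_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The full scan costs time linear in the number of stored vectors per query, which is the cost an ANN index trades a little recall to avoid.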

Vertex AI Evaluation Service: Managed service for running evals on GenAI outputs (listed in Google's docs as the Gen AI evaluation service). Supports LLM-as-judge, metric computation (faithfulness/groundedness, answer relevancy, etc.), and batch eval pipelines. Use this when you need structured eval rather than ad-hoc scripts.
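The shape of a batch eval with a pluggable judge can be sketched as below. This is not the managed service's API: `token_overlap_judge` is a deliberately crude stand-in for an LLM judge (it only checks how many answer tokens appear in the retrieved context), and the row keys `question`, `context`, and `answer` are assumptions chosen for the example.

```python
from statistics import mean
from typing import Callable, Dict, List

# A judge maps (question, context, answer) to a score in [0, 1].
Judge = Callable[[str, str, str], float]


def token_overlap_judge(question: str, context: str, answer: str) -> float:
    """Stand-in judge: fraction of answer tokens that also occur in the context."""
    answer_tokens = answer.lower().split()
    if not answer_tokens:
        return 0.0
    context_tokens = set(context.lower().split())
    supported = sum(1 for token in answer_tokens if token in context_tokens)
    return supported / len(answer_tokens)


def run_batch_eval(rows: List[Dict[str, str]], judge: Judge) -> Dict[str, float]:
    """Score every row with the judge and return simple aggregate statistics."""
    scores = [judge(row["question"], row["context"], row["answer"]) for row in rows]
    if not scores:
        return {"n": 0, "mean_score": 0.0, "min_score": 0.0}
    return {"n": len(scores), "mean_score": mean(scores), "min_score": min(scores)}
```

Keeping the judge behind a plain callable means the stand-in can later be swapped for a model-backed judge without touching the aggregation code.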


Compute & Serving

Cloud Run: Serverless container platform. You package your app (RAG API, agent endpoint) into a Docker container and deploy it to Cloud Run; it scales to zero when idle and scales out under load. No cluster management. This is your default serving layer for GenAI microservices unless you need heavy GPU workloads or persistent state.
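What goes inside the container can be as small as the sketch below, which uses only the Python standard library. The one platform detail it relies on is that the container should listen on the port given in the `PORT` environment variable (8080 is used here as the fallback). The `/`-anything health response and the names `StatusHandler` and `make_server` are placeholders; a real RAG API would more likely use a web framework.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


class StatusHandler(BaseHTTPRequestHandler):
    """Answers every GET with a small JSON status document."""

    def do_GET(self) -> None:
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: Optional[int] = None) -> HTTPServer:
    """Build a server bound to all interfaces on the platform-provided port."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return HTTPServer(("0.0.0.0", port), StatusHandler)


# Container entrypoint (left as a comment so importing this module has no side effects):
#     make_server().serve_forever()
```

Because the process holds no state between requests, the platform is free to start and stop instances as traffic changes.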

Google Kubernetes Engine (GKE): Managed Kubernetes. Use when you need more control than Cloud Run: GPU workloads, long-running processes, stateful services, complex networking. Heavier lift than Cloud Run, but more flexible. Know the trade-off: Cloud Run = simplicity, GKE = control.

Cloud Functions (now branded Cloud Run functions): Lightweight event-driven compute. For simple triggers (a file lands in GCS → run a function). Simpler than Cloud Run for small event handlers, but Cloud Run is more versatile for anything beyond trivial tasks.