2. Offline Eval — The Structured Part

2.6 Frameworks to Name

RAGAS — one of the most widely used open-source RAG eval libraries. Implements Faithfulness, Answer Relevancy, Context Precision, and Context Recall out of the box.
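DeepEval / Promptfoo — general-purpose LLM eval frameworks with assertion-based tests. DeepEval builds on pytest (test cases plus metric assertions); Promptfoo is a CLI driven by YAML configs with per-test assertions.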
Vertex AI Gen AI Evaluation Service — Google's managed offering, computational and model-based metrics, integrates with Vertex AI Pipelines. Name this in GCP-flavored answers.
LangSmith / Langfuse — used for both tracing and evals; their datasets and experiments features are the eval-specific parts.
TruLens — eval + observability, built around the RAG triad (groundedness, context relevance, answer relevance).
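To make two ideas from this list concrete (a Context Precision-style metric and a pytest-style assertion gate), here is a minimal, dependency-free sketch. It assumes per-chunk relevance verdicts are already available as booleans; a library like RAGAS would obtain them from an LLM judge, whereas here they are hand-labeled. The scoring follows the rank-weighted form commonly described for Context Precision (average of precision@k taken at each relevant rank). The function names, the test names, and the 0.7 threshold are hypothetical illustrations, not the RAGAS or DeepEval API.

```python
"""Toy, LLM-free sketch of a Context Precision-style score plus
pytest-style threshold checks. Relevance labels are hand-supplied
(hypothetical); real frameworks typically get them from an LLM judge."""


def context_precision(relevant: list[bool]) -> float:
    """Rank-aware precision over retrieved chunks.

    relevant[k] is True if the chunk at rank k+1 is relevant.
    Score = sum over relevant ranks k of precision@k,
            divided by the number of relevant chunks in the list.
    Returns 0.0 when nothing relevant was retrieved.
    """
    hits = 0
    weighted = 0.0
    for k, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            weighted += hits / k  # precision@k, counted only at relevant ranks
    return weighted / hits if hits else 0.0


THRESHOLD = 0.7  # hypothetical quality gate; tune per use case


def test_relevant_chunks_ranked_first():
    # Same relevant chunks, but ranked higher, should score higher.
    assert context_precision([True, True, False]) > context_precision([False, True, True])


def test_context_precision_meets_threshold():
    labels = [True, False, True]  # hypothetical hand-labeled retrieval result
    assert context_precision(labels) >= THRESHOLD
```

Running `pytest` on a file containing this picks up the two `test_` functions. The rank weighting is the point of the metric: the same set of relevant chunks scores lower when the retriever pushes them further down the list.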